Dependency-based clustering using correspondence and Ward's hierarchical analysis: a case study on clean water and sanitation indicators in West Java's cities and regencies

Theresia Samaria Nauli, Irlandia Ginanjar, Defi Yusti Faidah

Abstract


Correspondence analysis is a graphical technique for depicting relationships between variables in a low-dimensional space, which makes it well suited to non-metric data and non-linear associations. Multiple Correspondence Analysis (MCA) extends this to patterns among several categorical variables by using the Burt matrix, a multidimensional contingency table. MCA aims for a cumulative variance of at least 70% across two dimensions; if this threshold is not met, Euclidean distance can improve object characterization by drawing on results that extend beyond two dimensions. Although the dependency information from correspondence analysis is objective, it cannot reveal groups with fewer members based on available resources. Hence, cluster analysis is conducted on the principal coordinates obtained from MCA. This study aims to identify the unique characteristics of each object, allowing more focused evaluations based on specific attributes. By retaining a cumulative variance of 100%, the method captures all relevant dependency information and offers a deeper understanding of variable relationships. The study stresses the importance of selecting the clustering model that best aligns with the correspondence analysis results. By combining MCA and hierarchical clustering, the study visualizes and groups regencies and cities according to their clean water and sanitation conditions. Initial MCA results showed a cumulative variance of only 22.3% in two dimensions, so further adjustment was required for accurate interpretation. The innovation of this research lies in integrating MCA with hierarchical clustering based on Euclidean distance to explore characteristics comprehensively. This approach maintains a cumulative variance of 100% and thus a complete representation of dependency relationships; a Euclidean distance matrix across 63 dimensions was used to enhance objectivity. The results identify 18 groups of regencies and cities with similar clean water and sanitation characteristics. Among the clustering methods, the Ward method was the most consistent with the MCA findings. Cluster analysis was performed by forming three, four, and five clusters, in line with government budget constraints.
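
The pipeline described in the abstract (MCA, then Euclidean distances over all principal coordinates, then Ward clustering cut at three, four, and five groups) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes MCA from the indicator matrix rather than the Burt matrix (the two forms yield the same axes, with Burt inertias equal to the squared indicator inertias), the function name `mca_row_coordinates` is hypothetical, and the data are an invented toy table, not the West Java indicators.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def mca_row_coordinates(codes):
    """Row principal coordinates from MCA on the indicator matrix (assumed form)."""
    codes = np.asarray(codes)
    # One-hot encode each categorical column; stack the blocks side by side.
    blocks = []
    for j in range(codes.shape[1]):
        levels = np.unique(codes[:, j])
        blocks.append((codes[:, [j]] == levels).astype(float))
    Z = np.hstack(blocks)

    P = Z / Z.sum()      # correspondence matrix
    r = P.sum(axis=1)    # row masses
    c = P.sum(axis=0)    # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    keep = sv > 1e-10    # drop numerically null dimensions
    F = (U[:, keep] * sv[keep]) / np.sqrt(r)[:, None]
    return F, sv[keep] ** 2  # coordinates, principal inertias


# Hypothetical toy data: 8 objects, 3 categorical indicators (NOT the study's data).
toy = np.array([
    ["low",  "no",  "well"],
    ["low",  "no",  "piped"],
    ["low",  "yes", "well"],
    ["mid",  "no",  "well"],
    ["mid",  "yes", "piped"],
    ["high", "yes", "piped"],
    ["high", "yes", "well"],
    ["high", "no",  "piped"],
])

F, inertia = mca_row_coordinates(toy)
# Cumulative share of inertia; the last entry is 1.0 because every dimension is kept.
share = np.cumsum(inertia) / inertia.sum()

# Ward linkage on Euclidean distances computed over ALL retained dimensions.
tree = linkage(F, method="ward")

# Cut the same tree into three, four, and five clusters.
clusters = {k: fcluster(tree, t=k, criterion="maxclust") for k in (3, 4, 5)}
for k, labels in clusters.items():
    print(k, labels)
```

Keeping every dimension means the Euclidean distances between rows of `F` equal the chi-square distances between the objects' category profiles, so no dependency information is discarded before clustering. Plotting only the first two columns of `F` would show just the fraction of inertia given by `share[1]`.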

Published: 2025-05-27

How to Cite this Article:

Theresia Samaria Nauli, Irlandia Ginanjar, Defi Yusti Faidah, Dependency-based clustering using correspondence and Ward's hierarchical analysis: a case study on clean water and sanitation indicators in West Java's cities and regencies, Commun. Math. Biol. Neurosci., 2025 (2025), Article ID 71

Copyright © 2025 Theresia Samaria Nauli, Irlandia Ginanjar, Defi Yusti Faidah. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Commun. Math. Biol. Neurosci.

ISSN 2052-2541
