20 Hierarchical clustering

20.1 Hierarchical clustering

source('functions.R')
library(haven)  # provides read_spss(); harmless if functions.R already attaches it

set.seed(311265)
data <- read_spss('datos_para_cluster.sav')

bw8hcluster(data,
            vars = c("X1", "X2", "X3", "X4", "X5", "X6", "X7"),
            weight_var = NULL,
            id_var = 'ID',
            standardize = FALSE,
            distance = "squared_euclidean",
            method = "ward",
            dist_matrix = TRUE,
            agglom_schedule = TRUE,
            dendrogram = TRUE,
            membership = "range",
            n_clusters = 3,            # if "single"
            min_clusters = 2,          # if "range"
            max_clusters = 4,          # if "range"
            run_nbclust = TRUE,        # check
            elbow = TRUE,              # check
            silhouette = TRUE,         # check
            json_path = "hcluster_export.json",  # check
            limit_dist_matrix = 100,   # maximum N for printing the distance matrix
            limit_agglom_steps = 50,   # maximum number of final steps shown in the agglomeration schedule
            publish = TRUE)

NULL

Hierarchical cluster analysis (HC)

Method: WARD | Measure: SQUARED_EUCLIDEAN | Standardized: FALSE
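
How bw8hcluster maps these settings onto R's clustering routines is not shown here. As a minimal sketch in base R, assuming simulated stand-in data rather than datos_para_cluster.sav, Ward's method on squared Euclidean distances can be run in two equivalent ways with stats::hclust: "ward.D" on squared distances, or "ward.D2" on plain distances.

```r
# Stand-in data (an assumption): 30 cases, 7 numeric variables X1..X7.
set.seed(311265)
X <- matrix(rnorm(30 * 7), nrow = 30,
            dimnames = list(paste0("case", 1:30), paste0("X", 1:7)))

d  <- dist(X, method = "euclidean")  # Euclidean distances, no standardization
d2 <- d^2                            # squared Euclidean distances

# "ward.D" on squared distances and "ward.D2" on plain distances
# (which squares them internally) build the same tree.
hc_sq <- hclust(d2, method = "ward.D")
hc_d2 <- hclust(d,  method = "ward.D2")

# Fusion heights from the first tree are the squares of those from the second.
same_heights <- all.equal(hc_sq$height, hc_d2$height^2)
print(same_heights)

# Cluster assignments agree at any cut, for example at 3 clusters.
print(all(cutree(hc_sq, k = 3) == cutree(hc_d2, k = 3)))
```

The two variants differ only in the scale of the fusion coefficients, which matters when comparing an agglomeration schedule against output from other software.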

Agglomeration schedule

Note: Only the last 50 merge steps are shown, for reasons of performance and analytical relevance.
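
The schedule table itself is not reproduced here. The sketch below shows one way such a table could be built from an hclust object and truncated to its last 50 steps; the stand-in data and the column names are assumptions, not the output format of bw8hcluster.

```r
# Stand-in data (an assumption): 80 cases, so the tree has 79 merge steps.
set.seed(311265)
X  <- matrix(rnorm(80 * 7), nrow = 80, dimnames = list(NULL, paste0("X", 1:7)))
hc <- hclust(dist(X)^2, method = "ward.D")

# One row per merge. In hc$merge, negative entries are single cases and
# positive entries refer to clusters formed at an earlier step.
schedule <- data.frame(
  step        = seq_len(nrow(hc$merge)),
  item_1      = hc$merge[, 1],
  item_2      = hc$merge[, 2],
  coefficient = hc$height
)

limit_agglom_steps <- 50  # same limit as in the call to bw8hcluster
last_steps <- tail(schedule, limit_agglom_steps)
print(head(last_steps, 3))
```

The final rows are the ones usually inspected, since large jumps in the coefficient near the end of the schedule suggest where to stop merging.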

Distance matrix
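
As an illustration of what this matrix contains, the sketch below computes squared Euclidean distances for a few simulated cases and prints them only when the number of cases does not exceed the limit. The data are a stand-in, and the size check is an assumption about how limit_dist_matrix is applied.

```r
# Stand-in data (an assumption): 5 cases with hypothetical IDs.
set.seed(311265)
X <- matrix(rnorm(5 * 7), nrow = 5,
            dimnames = list(paste0("id", 1:5), paste0("X", 1:7)))

limit_dist_matrix <- 100   # same limit as in the call to bw8hcluster
d2 <- dist(X)^2            # squared Euclidean distances
D  <- as.matrix(d2)        # full symmetric matrix with a zero diagonal

if (nrow(X) <= limit_dist_matrix) {
  print(round(D, 2))
}

# Each entry is the sum of squared differences across the 7 variables.
manual_12 <- sum((X[1, ] - X[2, ])^2)
print(all.equal(unname(D[1, 2]), manual_12))
```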

Cluster membership
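
With membership = "range", one assignment per case is expected for each solution from min_clusters to max_clusters. A minimal base R sketch of that idea, using stats::cutree on simulated stand-in data (the column names are hypothetical):

```r
# Stand-in data (an assumption): 40 cases, 7 variables.
set.seed(311265)
X  <- matrix(rnorm(40 * 7), nrow = 40,
             dimnames = list(paste0("id", 1:40), paste0("X", 1:7)))
hc <- hclust(dist(X)^2, method = "ward.D")

# cutree() accepts a vector of k and returns one column per solution.
membership <- cutree(hc, k = 2:4)
colnames(membership) <- paste0("clusters_", 2:4)  # hypothetical column names

print(head(membership))
print(table(membership[, "clusters_3"]))  # cluster sizes for the 3-cluster solution
```

Because the solutions come from the same tree, they are nested: each cluster in the 4-cluster solution lies entirely inside one cluster of the 3-cluster solution.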

Validation: elbow method
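
The elbow plot is not reproduced here. The quantity usually plotted is the total within-cluster sum of squares for each number of clusters; the sketch below computes it on simulated stand-in data. The helper within_ss is hypothetical and is not part of functions.R.

```r
# Stand-in data (an assumption): 60 cases, 7 variables.
set.seed(311265)
X  <- matrix(rnorm(60 * 7), nrow = 60, dimnames = list(NULL, paste0("X", 1:7)))
hc <- hclust(dist(X)^2, method = "ward.D")

# Hypothetical helper: total within-cluster sum of squares for a partition.
within_ss <- function(X, groups) {
  sum(vapply(split(seq_len(nrow(X)), groups), function(idx) {
    block <- X[idx, , drop = FALSE]
    sum(sweep(block, 2, colMeans(block))^2)  # squared deviations from the centroid
  }, numeric(1)))
}

ks  <- 1:6
wss <- vapply(ks, function(k) within_ss(X, cutree(hc, k = k)), numeric(1))
print(data.frame(k = ks, within_ss = round(wss, 2)))
```

Plotting these values, for instance with plot(ks, wss, type = "b"), gives the curve in which one looks for the point where further splits stop reducing the sum of squares noticeably.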

Validation: silhouette analysis
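
As a sketch of how average silhouette widths can be obtained for the 2 to 4 cluster solutions, the example below uses silhouette() from the cluster package on simulated stand-in data. Whether bw8hcluster uses this package, or plain rather than squared distances for this step, is an assumption.

```r
# Stand-in data (an assumption): 60 cases, 7 variables.
set.seed(311265)
X  <- matrix(rnorm(60 * 7), nrow = 60, dimnames = list(NULL, paste0("X", 1:7)))
d  <- dist(X)                           # Euclidean distances for the silhouette
hc <- hclust(d^2, method = "ward.D")    # Ward on squared Euclidean distances

ks <- 2:4
avg_sil <- vapply(ks, function(k) {
  sil <- cluster::silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])              # average silhouette width for this k
}, numeric(1))
names(avg_sil) <- ks

print(round(avg_sil, 3))
```

Higher average widths indicate more compact, better separated clusters, so the k with the largest value is the one this criterion favors.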

Optimal solution diagnostics

*** : The Hubert index is a graphical method of determining the number of clusters.
                In the plot of Hubert index, we seek a significant knee that corresponds to a 
                significant increase of the value of the measure i.e the significant peak in Hubert
                index second differences plot. 
 
*** : The D index is a graphical method of determining the number of clusters. 
                In the plot of D index, we seek a significant knee (the significant peak in Dindex
                second differences plot) that corresponds to a significant increase of the value of
                the measure. 
 
******************************************************************* 
* Among all indices:                                                
* 14 proposed 2 as the best number of clusters 
* 5 proposed 3 as the best number of clusters 
* 5 proposed 4 as the best number of clusters 

                   ***** Conclusion *****                            
 
* According to the majority rule, the best number of clusters is  2 
 
 
*******************************************************************
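
The majority rule in the output above is a simple vote count. As a small sketch, using only the tallies printed above:

```r
# Votes per candidate number of clusters, copied from the output above.
votes <- c("2" = 14, "3" = 5, "4" = 5)

total_indices <- sum(votes)                               # indices that cast a vote
best_k <- as.integer(names(votes)[which.max(votes)])      # most voted solution
share  <- round(100 * max(votes) / total_indices, 1)      # its share of the votes

print(total_indices)
print(best_k)
print(share)
```

The tallies add up to 24 indices, of which 14 (about 58%) favor the 2-cluster solution, while 3 and 4 clusters are tied with 5 votes each.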

Dendrogram
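
The dendrogram image is not reproduced here. A minimal base R sketch of drawing one and marking a 3-cluster cut, assuming simulated stand-in data and a temporary output file:

```r
# Stand-in data (an assumption): 25 cases with hypothetical IDs.
set.seed(311265)
X  <- matrix(rnorm(25 * 7), nrow = 25,
             dimnames = list(paste0("id", 1:25), paste0("X", 1:7)))
hc <- hclust(dist(X)^2, method = "ward.D")

# Draw to a PDF in the session's temporary directory.
out_file <- file.path(tempdir(), "dendrogram.pdf")
pdf(out_file, width = 8, height = 5)
plot(hc, main = "Dendrogram (Ward, squared Euclidean)",
     xlab = "", sub = "", cex = 0.7)
rect.hclust(hc, k = 3, border = 2:4)  # boxes around the 3-cluster solution
dev.off()
```

In an interactive session the pdf() and dev.off() calls can be dropped so that the plot appears in the active graphics device.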