Hierarchical Clustering in R

What’s Hierarchical Clustering?

Hierarchical clustering is another clustering technique for identifying groups in a dataset. Unlike K-means clustering, it does not require the number of clusters to be specified in advance and does not depend on a random initialization, which addresses two of the major issues associated with K-means.

There are two main types of hierarchical clustering: Agglomerative and Divisive.

1: Agglomerative: It starts by treating each data point as its own cluster. At each iteration, the two closest clusters are merged, until only a single cluster remains. The merge history is plotted as a dendrogram.

Figure: Agglomerative clustering

2: Divisive: In divisive clustering, all data points start in a single cluster. At each iteration, the most heterogeneous cluster is split into two, until every data point is in its own cluster.

Figure: Divisive clustering

Dissimilarity Measures

Single Linkage: Also known as the Minimum Linkage, it computes all pairwise dissimilarities between two clusters and uses the minimum of these dissimilarities as the distance between the two clusters.

Complete Linkage: Also known as the Maximum Linkage, it computes all pairwise dissimilarities between two clusters and uses the maximum of these dissimilarities as the distance between the two clusters.

Average Linkage: The average of all pairwise dissimilarities between cluster 1 and cluster 2 is taken as the distance between the two clusters.

Ward’s Method: It aims to minimize the total within-cluster variance. At each iteration, it merges the pair of clusters whose merger leads to the smallest increase in total within-cluster variance.

Centroid Linkage: It uses the distance between the centroid of cluster 1 (a mean vector of length p, where p is the number of variables) and the centroid of cluster 2. At each iteration, the pair of clusters with the minimum between-cluster distance is merged.
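
To make the linkage definitions concrete, here is a small sketch that computes the single, complete, and average linkage distances between two tiny clusters by hand. The toy matrices c1 and c2 are made up purely for illustration.

c1 <- matrix(c(1, 2,
               2, 1), ncol = 2, byrow = TRUE)  # Toy cluster 1: two points, two variables
c2 <- matrix(c(6, 7,
               7, 8), ncol = 2, byrow = TRUE)  # Toy cluster 2: two points, two variables
# All pairwise Euclidean dissimilarities between points of c1 and points of c2
pairwise <- as.matrix(dist(rbind(c1, c2)))[1:2, 3:4]
min(pairwise)   # Single (minimum) linkage distance
max(pairwise)   # Complete (maximum) linkage distance
mean(pairwise)  # Average linkage distance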

Getting Started with Hierarchical Clustering in R

The following packages are needed for this tutorial.

We will be using the iris dataset that ships with R.

library(dplyr)       # Data manipulation (select, mutate, the %>% pipe)
library(purrr)       # map_dbl, used for the agglomerative coefficients below
library(cluster)     # agnes, diana, clusGap
library(factoextra)  # hcut and the fviz_* plotting helpers
data("iris")
df <- iris %>% select(-Species)  # Drop the label column, keeping the four numeric variables
glimpse(df)
## Rows: 150
## Columns: 4
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4,...
## $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7,...
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5,...
## $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2,...

Agglomerative Clustering

We start by computing the distance matrix with the dist function in R. We can then perform agglomerative hierarchical clustering with the hclust function.

In the hclust function, we specify the linkage method we want, i.e. "complete", "single", "average", or "ward.D2".

d <- dist(df, method = "euclidean") # Distance matrix
ag_1 <- hclust(d, method = "complete" ) # Using the complete Linkage

Dendrogram of Agglomerative Clustering

plot(ag_1, cex = 0.0002, hang = -0.5)  # Dendrogram; the very small cex hides the 150 leaf labels

The dendrogram displays the clustering structure and helps suggest how many clusters to cut the tree into.
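
If you already have a candidate number of clusters in mind, you can draw it on the dendrogram with rect.hclust. This is only an illustrative sketch; the choice of k = 3 here anticipates the value we settle on later.

plot(ag_1, cex = 0.6, hang = -1)        # Re-plot the dendrogram with readable labels
rect.hclust(ag_1, k = 3, border = 2:4)  # Draw boxes around a 3-cluster cut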

Agglomerative Coefficient

We need the agglomerative coefficient (AC) to determine which method (complete, single, ward, average) gives the strongest clustering structure. The hclust function does not report this coefficient, so we use the agnes function from the cluster package to obtain it.

ag_methods <- list(average = "average", single = "single",
                   complete = "complete", ward = "ward")
# Agglomerative coefficient for a given linkage method
a_coeff <- function(x) {
  agnes(df, method = x)$ac
}
map_dbl(ag_methods, a_coeff)  # Requires purrr
##   average    single  complete      ward 
## 0.9300174 0.8493364 0.9574622 0.9908772

The Ward method has the highest agglomerative coefficient, so we will use it as the linkage method for our agglomerative clustering.
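
As a sketch of what this looks like in practice, we can refit the agglomerative clustering with agnes using the Ward method and plot its dendrogram with pltree, both from the cluster package.

ag_ward <- agnes(df, method = "ward")  # Agglomerative clustering with Ward's method
pltree(ag_ward, cex = 0.6, hang = -1,
       main = "Dendrogram of agnes (Ward)")  # Dendrogram of the agnes fit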

Divisive Clustering

The diana function in the cluster package can be used to perform divisive clustering in R. Just as we obtained the agglomerative coefficient using agnes, we can obtain the divisive coefficient using the diana function.

d_clus<-diana(df) # Implementing Divisive clustering
d_clus$dc  # Divisive coefficient
## [1] 0.953798

Dendrogram for Divisive Clustering

plot(d_clus)  # Banner plot and dendrogram of the diana fit

Optimal Number of Clusters (k)

In our tutorial on K-means clustering, we examined some methods for obtaining the optimal k for a K-means model. We can use the same methods to obtain the optimal k for hierarchical clustering.

Gap Statistic

The estimate of the optimal number of clusters k is the value that maximizes the gap statistic.

g_p <- clusGap(df, FUN = hcut, nstart = 25, K.max = 10, B = 50)  # Gap statistic for k = 1..10 with 50 bootstrap reference sets

Gap Plot

fviz_gap_stat(g_p)
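
The chosen k can also be read off programmatically from the clusGap output with maxSE from the cluster package. This is a sketch; the "firstSEmax" selection rule used below is one of several that maxSE supports.

maxSE(g_p$Tab[, "gap"], g_p$Tab[, "SE.sim"], method = "firstSEmax")  # Pick k by the first-SE-max rule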

Elbow Method

In the Elbow Method, the optimal k is usually the point where the curve starts to flatten, i.e. where adding more clusters yields diminishing returns in the total within-cluster sum of squares.

Elbow Plot

fviz_nbclust(df, FUN = hcut, method = "wss")  # Total within-cluster sum of squares for each candidate k

Average Silhouette Method

In the Average Silhouette Method, the optimal k is the value at which the average silhouette width is maximized.

Silhouette Plot

fviz_nbclust(df, FUN = hcut, method = "silhouette")
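
For a closer look at a specific solution, hcut from factoextra can cut the tree at a given k, and the resulting object can be passed to fviz_silhouette. This sketch assumes k = 3, which the plots above point to.

hc_3 <- hcut(df, k = 3, hc_method = "ward.D2")  # Hierarchical clustering cut into 3 groups
fviz_silhouette(hc_3)                           # Silhouette widths of the 3-cluster solution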

Choosing the optimal k as 3, as seen in the elbow method, we move ahead to implement our agglomerative clustering.

Implementing the clustering using optimal K

ag_final <- hclust(d, method = "ward.D2")  # Agglomerative clustering with Ward's linkage
c_grp <- cutree(ag_final, k = 3)           # Cut the tree into the chosen number of clusters
table(c_grp)
## c_grp
##  1  2  3 
## 50 64 36
n_d <- df %>% mutate(cluster = c_grp)  # Adding the clusters to our data
head(n_d)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width cluster
## 1          5.1         3.5          1.4         0.2       1
## 2          4.9         3.0          1.4         0.2       1
## 3          4.7         3.2          1.3         0.2       1
## 4          4.6         3.1          1.5         0.2       1
## 5          5.0         3.6          1.4         0.2       1
## 6          5.4         3.9          1.7         0.4       1
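
With the cluster labels attached, a quick way to profile the clusters is to compute per-cluster means of each variable. This is a sketch using dplyr; the summary itself is not part of the original output.

n_d %>%
  group_by(cluster) %>%
  summarise(across(everything(), mean))  # Mean of each variable within each cluster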

Visualizing the Clusters

fviz_cluster(list(data = df, cluster = c_grp))
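
Since the iris data come with known species labels, we can also cross-tabulate the clusters against Species as an informal sanity check (this comparison is not part of the clustering itself).

table(c_grp, iris$Species)  # Clusters versus the true species labels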

Final Comments

In this tutorial, we examined the two main types of hierarchical clustering, namely agglomerative and divisive. The hclust and agnes functions were used to implement agglomerative clustering, while the diana function was used to implement divisive clustering. We also used different methods to determine the optimal k. Finally, using k = 3, we implemented the hierarchical clustering and visualized the three clusters.