Clustering

Overview

In this analysis, we applied three clustering methods: KMeans, Dendrogram (Hierarchical Clustering), and HDBSCAN. Each method has its own strengths and weaknesses, and each can reveal different structure in the data. Below, we compare and contrast the results of the three methods.

1. KMeans Clustering

2. Dendrogram (Hierarchical Clustering)

3. HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)

KMeans

KMeans - Temperature, Precipitation, and Snow Water Equivalent 3D Plot

Note that the cluster centroids are visible in the plot; you just need to zoom in a bit.
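The centroid-distance process behind KMeans can be sketched in a few lines. This is a from-scratch illustration of the standard assign-then-update loop (Lloyd's algorithm), not the code used to produce the plot above. The data is synthetic, and the blob centers, spreads, and the names `standardize` and `kmeans` are assumptions made for the example.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score each column so no single feature dominates the distances."""
    cols = list(zip(*rows))
    mus = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]


def kmeans(points, k, n_iter=100, seed=0):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its assigned points, until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct rows as starting centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: index of the closest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:  # an empty cluster keeps its previous centroid
                new_centroids.append(centroids[j])
                continue
            new_centroids.append(
                tuple(sum(c) / len(members) for c in zip(*members))
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: four "seasonal" blobs of
# (temperature, precipitation, snow water equivalent). Values are illustrative only.
season_means = [(-5.0, 2.0, 300.0), (8.0, 4.0, 150.0), (22.0, 1.0, 0.0), (10.0, 3.0, 20.0)]
spread = (2.0, 0.5, 10.0)
rng = random.Random(42)
rows = [
    tuple(rng.gauss(m, s) for m, s in zip(mean, spread))
    for mean in season_means
    for _ in range(20)
]

points = standardize(rows)
labels, centroids = kmeans(points, k=4)
for j, c in enumerate(centroids):
    print(j, labels.count(j), [round(v, 2) for v in c])
```

The features are standardized first because temperature, precipitation, and snow water equivalent are on very different scales; without that step, the largest-valued feature would dominate the Euclidean distances. KMeans can settle in a local optimum depending on the starting centroids, so library implementations typically rerun it from several initializations.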

Hierarchical
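The agglomerative grouping of dataset columns discussed in the conclusion can be illustrated with a small from-scratch sketch. It assumes average linkage and a distance of 1 minus the Pearson correlation; the linkage and metric actually used in the analysis are not stated, so treat both as assumptions. The column names and the synthetic signals are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def agglomerate(columns, n_clusters):
    """Average-linkage agglomerative clustering of named columns.

    Columns that move together (high correlation, small distance) merge first.
    Returns the final clusters and the merge history; the merge distances are
    the heights a dendrogram would draw.
    """
    names = list(columns)
    dist = {
        frozenset((a, b)): 1.0 - pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > n_clusters:
        pairs = [
            (linkage(c1, c2), c1, c2)
            for i, c1 in enumerate(clusters)
            for c2 in clusters[i + 1:]
        ]
        d, c1, c2 = min(pairs, key=lambda t: t[0])
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, d))
    return clusters, merges


# Hypothetical toy data: three independent signals, each observed twice with noise.
rng = random.Random(0)
n = 200


def noisy(signal, sd=0.1):
    return [v + rng.gauss(0, sd) for v in signal]


soil = [rng.gauss(0, 1) for _ in range(n)]
weather = [rng.gauss(0, 1) for _ in range(n)]
snow = [rng.gauss(0, 1) for _ in range(n)]
columns = {
    "soil_moisture_a": noisy(soil),
    "soil_moisture_b": noisy(soil),
    "precipitation": noisy(weather),
    "temperature": noisy(weather),
    "swe_a": noisy(snow),
    "swe_b": noisy(snow),
}

clusters, merges = agglomerate(columns, n_clusters=3)
for c1, c2, d in merges:
    print(f"merge {c1} + {c2} at distance {d:.3f}")
```

On this toy data, the columns built from the same underlying signal merge at a very small distance, which is the kind of like-with-like grouping the dendrogram is used to check.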

Density-based
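HDBSCAN's tolerance of varying density and noise comes largely from the distance it clusters on. The sketch below shows only that ingredient, the mutual reachability distance, and not the full algorithm (which goes on to build a spanning tree and extract stable clusters). The points are hypothetical standardized (temperature, precipitation) pairs, and the convention that a point's core distance excludes the point itself is an assumption; implementations differ on this.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest other point."""
    return [
        sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[k - 1]
        for i, p in enumerate(points)
    ]


def mutual_reachability(points, k):
    """d_mreach(a, b) = max(core(a), core(b), d(a, b)) for every pair."""
    core = core_distances(points, k)
    n = len(points)
    return [
        [
            0.0 if i == j
            else max(core[i], core[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical data: five points in a dense group and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (3.0, 3.0)]
mreach = mutual_reachability(points, k=3)

raw = math.dist(points[3], points[5])  # plain distance from (1, 1) to the outlier
print(f"raw distance:         {raw:.3f}")
print(f"mutual reachability:  {mreach[3][5]:.3f}")
```

Inside the dense group the distances are left essentially unchanged, while the isolated point is pushed farther from every neighbor because its own core distance is large. That is why sparse points tend to end up labeled as noise rather than being forced into a cluster.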

Conclusion

Each clustering method offers unique insights and has its own advantages and limitations.

KMeans is efficient for large datasets when the number of clusters is known in advance. Based on the centroid-distance calculation process, we get four distinct clusters in our results that roughly correspond to seasons.

The dendrogram provides a hierarchical perspective. Based on the agglomerative clustering, we see 4 distinct groupings of the columns of the streamflow prediction dataset. Generally, soil moisture is grouped together, precipitation and temperature are grouped together, and snow water equivalent is grouped together. This serves as a useful check that variables we expect to behave alike actually do.

HDBSCAN is suitable for data with varying densities and noise. We chose a comparison between temperature and precipitation to try to uncover clustered data. In general, HDBSCAN does a good job of identifying a gradient of like-behaving data. This may not correspond to seasons as strictly as the KMeans result, but it does give a more granular look at how temperature and precipitation interact.

Non-Technical

Based on the clustering exercises above, we draw three key conclusions. 1) The clusters in our data roughly correspond to seasons. This is a good sign, as it gives us confidence in using the data for future modeling. 2) Columns that we expect to behave alike do behave alike. This holds for snow, rain, and ground moisture. 3) Precipitation and temperature follow a logical trend.