CSCE 470 Lecture 17
Wednesday, October 2, 2013
Flat Clustering
Measuring "cluster goodness" for a k-means output:
Residual Sum of Squares
$\mathrm{RSS} = \sum_{k=1}^{K} \sum_{\vec{x} \in \omega_k} \left| \vec{x} - \vec{\mu}(\omega_k) \right|^2$
- $\vec{x}$ is the vector position of a document
- $\vec{\mu}(\omega_k)$ is the centroid of cluster $\omega_k$ (the component-wise average of its members)
Lower RSS means tighter clusters.
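A minimal NumPy sketch of computing RSS for a k-means output (the vectors, labels, and cluster count below are hypothetical stand-ins):

```python
import numpy as np

# Hypothetical k-means output: stand-in document vectors, a label per vector,
# and one centroid per cluster (the component-wise average of its members).
X = np.random.rand(100, 5)            # 100 documents, 5 term dimensions
labels = np.arange(100) % 3           # assignment of each document to a cluster
centroids = np.array([X[labels == k].mean(axis=0) for k in range(3)])

# RSS: squared distance of each vector to its cluster's centroid, summed.
rss = ((X - centroids[labels]) ** 2).sum()
print(rss)
```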
We've discussed meta-looping (restarting k-means with different random initializations) to avoid getting stuck in a bad local optimum, but what about choosing the number of clusters $K$?
We could add a penalty to the "goodness" measure that incorporates the number of clusters (i.e., more clusters = bad).
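One common form of that penalty (an assumption here, not spelled out above) is to pick the $K$ minimizing $\mathrm{RSS} + \lambda K$, so extra clusters must earn their keep. A minimal sketch using scikit-learn's KMeans, whose inertia_ attribute is the RSS of the fit; the penalty weight is a made-up value:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 5)   # stand-in document vectors
lam = 2.0                    # hypothetical penalty weight per cluster

# Penalized goodness: RSS(k) + lam * k, evaluated over a range of k values.
scores = {k: KMeans(n_clusters=k, n_init=10).fit(X).inertia_ + lam * k
          for k in range(1, 10)}
best_k = min(scores, key=scores.get)
print(best_k, scores[best_k])
```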
Purity
measure "good things" as the most prominent "type" in each cluster.
Hierarchical Clustering
Builds a tree of clusters rather than a flat partition: either lots of little clusters merge bottom-up into one overall cluster (agglomerative), or one big cluster is split top-down, as in bisecting k-means below.
Bisecting K-means
Input: document corpus (no $k$ parameter; implicitly $k = 2$ at each split)
"Recursively" call k-means on elements of each cluster to further break them up.
Stop case is when each leaf cluster is a document
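A minimal recursive sketch under those rules, using scikit-learn's KMeans for each two-way split (the random matrix stands in for real document vectors):

```python
import numpy as np
from sklearn.cluster import KMeans

def bisect(indices, X, leaves):
    # Base case: a leaf cluster that is a single document.
    if len(indices) <= 1:
        leaves.append(indices.tolist())
        return
    # Split this cluster in two with 2-means; no global k is ever needed.
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[indices])
    for side in (0, 1):
        bisect(indices[labels == side], X, leaves)

X = np.random.rand(8, 5)              # stand-in for real document vectors
leaves = []
bisect(np.arange(len(X)), X, leaves)
print(leaves)                         # one singleton cluster per document
```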
Hierarchical Agglomerative Clustering (HAC)
Whoa! New favorite word: Agglomerative