CSCE 470 Lecture 17


« previous | Wednesday, October 2, 2013 | next »


Flat Clustering

Measuring "cluster goodness" for a k-means output:

Residual Sum of Squares

RSS = ∑_{k=1}^{K} ∑_{x⃗ ∈ ω_k} ‖x⃗ − μ⃗(ω_k)‖²

  • x⃗ is the vector giving a document's position
  • μ⃗(ω_k) is the centroid of cluster ω_k (component-wise average of its vectors)
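The RSS computation can be sketched directly from the definition. A minimal version, assuming clusters are given as lists of document vectors (plain Python lists, no library):

```python
def centroid(cluster):
    """Component-wise average of the vectors in a cluster."""
    dim = len(cluster[0])
    return [sum(v[i] for v in cluster) / len(cluster) for i in range(dim)]

def rss(clusters):
    """Sum of squared distances from each vector to its cluster's centroid."""
    total = 0.0
    for cluster in clusters:
        mu = centroid(cluster)
        for v in cluster:
            total += sum((v[i] - mu[i]) ** 2 for i in range(len(mu)))
    return total

# Two clusters: the first has centroid (1, 0), each point squared-distance 1.
clusters = [[[0.0, 0.0], [2.0, 0.0]], [[5.0, 5.0]]]
print(rss(clusters))  # 2.0
```

Lower RSS means tighter clusters, so k-means output with smaller RSS is "better" for the same K.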

We've discussed meta-looping (restarting k-means with different random seeds) to reduce the chance of landing in a bad local optimum, but what about choosing the number of clusters K?

We could add a penalty to the "goodness" measure that incorporates the number of clusters (i.e. more clusters = bad), so adding a cluster only pays off if it buys a real improvement.
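One way to sketch this penalty idea: score each candidate K by RSS plus a term that grows with K, then take the K with the lowest score. The weight `lam` below is an assumed tuning parameter, not a value from the lecture:

```python
def penalized_rss(rss_value, k, lam=1.0):
    """RSS plus a penalty proportional to the number of clusters K.

    lam is a hypothetical tuning knob: higher lam demands a bigger RSS
    drop before an extra cluster is considered worthwhile.
    """
    return rss_value + lam * k

# With lam = 2: going from K=2 (RSS 10) to K=3 (RSS 9) is not worth it,
# since the 1-point RSS drop costs a 2-point penalty increase.
scores = {2: penalized_rss(10.0, 2, lam=2.0),   # 14.0
          3: penalized_rss(9.0, 3, lam=2.0)}    # 15.0
best_k = min(scores, key=scores.get)
print(best_k)  # 2
```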

Purity

Measure "good things" as the fraction of documents that match the most prominent "type" (gold-standard class) in each cluster.
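Purity can be computed by counting, per cluster, how many documents carry the majority label, then dividing by the total document count. A minimal sketch, assuming each cluster is given as a list of class labels:

```python
def purity(clusters):
    """Fraction of documents matching the majority class of their cluster."""
    n = sum(len(c) for c in clusters)
    # For each cluster, count occurrences of its most prominent label.
    correct = sum(max(c.count(label) for label in set(c)) for c in clusters)
    return correct / n

# Cluster 1: majority "x" (2 of 3); cluster 2: majority "o" (3 of 4).
print(purity([["x", "x", "o"], ["o", "o", "o", "x"]]))  # (2 + 3) / 7 ≈ 0.714
```

Note purity is trivially maximized by making every document its own cluster, which is another reason a penalty on the number of clusters matters.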

Hierarchical Clustering

Lots of little clusters merge to form overall cluster.

Bisecting K-means

Input: document corpus (no k given; k is implicitly 2)

"Recursively" call k-means on elements of each cluster to further break them up.

Stop case is when each leaf cluster is a single document.
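The recursion above can be sketched as follows. This assumes a `two_means(docs)` helper that splits a set of documents into two clusters (in practice one run of k-means with k=2); for brevity the sketch fakes it by splitting sorted 1-D "documents" at the median:

```python
def two_means(docs):
    """Stand-in for one round of k-means with k=2: split at the median."""
    s = sorted(docs)
    mid = len(s) // 2
    return s[:mid], s[mid:]

def bisect(docs):
    """Recursively split clusters until each leaf is a single document."""
    if len(docs) <= 1:
        return docs  # stop case: leaf cluster of one document
    left, right = two_means(docs)
    return [bisect(left), bisect(right)]

print(bisect([4, 1, 3, 2]))  # [[[1], [2]], [[3], [4]]]
```

The nested lists form the cluster hierarchy: the root split is the outer pair, and each level down is one more bisection.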


Hierarchical Agglomerative Clustering (HAC)

Whoa! New favorite word: Agglomerative