CSCE 470 Lecture 18
Jump to navigation
Jump to search
« previous | Friday, October 4, 2013 | next »
Clustering Review
- Top-down: Bisecting k-means
- Bottom-up: Hierarchical Agglomerative Clustering (HAC)
Hierarchical Agglomerative Clustering
Measure distance between point and every other point and put in a lower triangular matrix:
Constructing this thing takes
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | — | — | — | — | — | — | — | — | — |
| 2 | 0 | — | — | — | — | — | — | — | — | |
| 3 | 0 | — | — | — | — | — | — | — | ||
| 4 | 0 | — | — | — | — | — | — | |||
| 5 | 0 | — | — | — | — | — | ||||
| 6 | 0 | — | — | — | — | |||||
| 7 | 0 | — | — | — | ||||||
| 8 | 0 | — | — | |||||||
| 9 | 0 | — | ||||||||
| 10 | 0 |
Calculating distance between points is as easy as looking them up in the table!
Merge the smallest distance in the table.
Now we have a way to calculate distance between:
- point and point:
dist(p,q) = sqrt(sum((i-j)**2 for i,j in zip(p,q)) - point and cluster:
dist(p,c) = min(dist(p,q) for q in c)(ormax(); design choice) - cluster to cluster:
dist(c1,c2) = min(dist(p,q) for p in c1 for q in c2)(ormax(); design choice)