CSCE 470 Lecture 18
Jump to navigation
Jump to search
« previous | Friday, October 4, 2013 | next »
Clustering Review
- Top-down: Bisecting k-means
- Bottom-up: Hierarchical Agglomerative Clustering (HAC)
Hierarchical Agglomerative Clustering
Measure distance between point and every other point and put in a lower triangular matrix:
Constructing this thing takes
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | — | — | — | — | — | — | — | — | — |
2 | 0 | — | — | — | — | — | — | — | — | |
3 | 0 | — | — | — | — | — | — | — | ||
4 | 0 | — | — | — | — | — | — | |||
5 | 0 | — | — | — | — | — | ||||
6 | 0 | — | — | — | — | |||||
7 | 0 | — | — | — | ||||||
8 | 0 | — | — | |||||||
9 | 0 | — | ||||||||
10 | 0 |
Calculating distance between points is as easy as looking them up in the table!
Merge the smallest distance in the table.
Now we have a way to calculate distance between:
- point and point:
dist(p,q) = sqrt(sum((i-j)**2 for i,j in zip(p,q))
- point and cluster:
dist(p,c) = min(dist(p,q) for q in c)
(ormax()
; design choice) - cluster to cluster:
dist(c1,c2) = min(dist(p,q) for p in c1 for q in c2)
(ormax()
; design choice)