CSCE 470 Lecture 18

From Notes
Jump to navigation Jump to search

« previous | Friday, October 4, 2013 | next »


Clustering Review

  • Top-down: Bisecting k-means
  • Bottom-up: Hierarchical Agglomerative Clustering (HAC)

Hierarchical Agglomerative Clustering

Measure distance between point and every other point and put in a lower triangular matrix:

Constructing this thing takes Face-sad.svg

  1 2 3 4 5 6 7 8 9 10
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0

Calculating distance between points is as easy as looking them up in the table!

Merge the smallest distance in the table.

Now we have a way to calculate distance between:

  • point and point: dist(p,q) = sqrt(sum((i-j)**2 for i,j in zip(p,q))
  • point and cluster: dist(p,c) = min(dist(p,q) for q in c) (or max(); design choice)
  • cluster to cluster: dist(c1,c2) = min(dist(p,q) for p in c1 for q in c2) (or max(); design choice)