CSCE 470 Lecture 19

Monday, October 7, 2013

Overview

Document Space $X$

Classes $C$

For now, we will assume the classes are mutually exclusive and collectively exhaustive (i.e., they partition the document space).

Goal: a classifier $\gamma$ that maps $X$ to $C$. In symbols:

$$\gamma : X \to C$$

Ways to learn $\gamma$, the classifier:

1. Eyeballing it (manually)
2. Figuring out some "rules" (e.g., college friend if your.college = my.college and |your.grad_year − my.grad_year| ≤ 4)
3. Machine learning

Neither 1 nor 2 is very useful (they break and/or don't scale well), so we will focus on 3.
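To see why approach 2 is brittle, here is the college-friend rule written out as code. This is a minimal sketch: the `Person` record and the `is_college_friend` function are hypothetical names, with fields chosen to mirror `your.college` and `your.grad_year` from the rule above, and the sample people are made up.

```python
from dataclasses import dataclass

@dataclass
class Person:
    """Hypothetical record; fields mirror your.college / your.grad_year in the rule."""
    college: str
    grad_year: int

def is_college_friend(me: Person, you: Person) -> bool:
    """Hand-written rule: same college, graduation years at most 4 apart."""
    return you.college == me.college and abs(you.grad_year - me.grad_year) <= 4

me = Person("College A", 2014)
print(is_college_friend(me, Person("College A", 2011)))  # True
print(is_college_friend(me, Person("College A", 2008)))  # False: 6 years apart
print(is_college_friend(me, Person("College B", 2014)))  # False: different college
```

The rule works on the cases it was written for, but a friend who transferred, or whose college is spelled slightly differently, is misclassified, and each such exception needs yet another hand-written rule. That is the breakage and poor scaling that motivates learning $\gamma$ from data instead.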

Training: learn the classifier $\gamma$ by analyzing training data (example documents labeled with their classes) ^[1]
Testing/Application: apply $\gamma$ to new documents.

Vector space classification

Two assumptions:

1. Documents from the same class "bunch" together in a contiguous region of the space.
2. Classes are non-overlapping (i.e., a convex blob drawn around one class's points contains no points from other classes).

Training: compute the centroid of each class from the training data.

Classification: assign each new document to the class whose centroid is closest.

(For now, this assignment step is completely separate from the training step.)
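The centroid procedure (often called Rocchio classification) can be sketched in a few lines. The centroid of class $c$ is the mean of its training vectors, $\vec{\mu}(c) = \frac{1}{|D_c|} \sum_{d \in D_c} \vec{v}(d)$, where $D_c$ is the set of training documents labeled $c$ and $\vec{v}(d)$ is the vector for document $d$. This sketch assumes documents are already represented as vectors (e.g., tf-idf weights) and measures "closest" with Euclidean distance; the lecture does not fix a distance, and cosine similarity is a common alternative. The function names `train_centroids` and `classify` and the toy data are hypothetical.

```python
import math
from collections import defaultdict

Vector = list[float]

def train_centroids(examples: list[tuple[Vector, str]]) -> dict[str, Vector]:
    """Training step: the centroid of each class is the mean of its document vectors."""
    sums: dict[str, Vector] = {}
    counts: dict[str, int] = defaultdict(int)
    for vec, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in s] for label, s in sums.items()}

def classify(centroids: dict[str, Vector], vec: Vector) -> str:
    """Application step: pick the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

# Toy 2-D "document vectors" (made up for illustration).
training = [
    ([1.0, 0.0], "sports"),
    ([0.8, 0.2], "sports"),
    ([0.0, 1.0], "politics"),
    ([0.2, 0.9], "politics"),
]
centroids = train_centroids(training)
print(classify(centroids, [0.7, 0.3]))  # sports
print(classify(centroids, [0.1, 0.8]))  # politics
```

Training and classification are separate functions, matching the note that assignment is independent of training: once the centroids are computed, they are all the classifier needs to keep. The approach works best when the two assumptions above hold, since a class that is scattered or overlaps another is poorly summarized by a single mean vector.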