CSCE 470 Lecture 20

Wednesday, October 9, 2013

Quiz on Tuesday; likely over PageRank.

HW5 Out Tonight.

Classification

What? putting things (documents) in things (classes)

Why? spam filtering, enhanced results, topics of interest to me

How? assumption: we have training data (pre-classified examples)

Set up Phase

Both clustering and classification involve setting up document vectors. How are these vectors obtained? TF-IDF weights, other "features", and so on. We can choose different ones.
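
As one concrete way to get such vectors, here is a minimal sketch of TF-IDF weighting. It assumes raw term counts for TF and $\log(N/\mathit{df})$ for IDF, which is one common variant; the notes do not say which one the course uses. The function name `tfidf_vectors` and the toy documents are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        # Assumed weighting: tf * log(N / df). Terms in every document get weight 0.
        vectors.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return vectors

# Hypothetical toy corpus, already tokenized.
docs = [
    "cheap pills buy now".split(),
    "buy cheap watches".split(),
    "lecture notes on pagerank".split(),
]
vectors = tfidf_vectors(docs)
print(vectors[0])
```

Terms that appear in only one document ("pills") get a higher weight than terms shared by several ("cheap", "buy").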

Classifiers

Rocchio:
- Learn: find centroids of each class in training data
- Apply: assign new documents to nearest centroid's class
- Fail: "multi-modal" or overlapping classes (the decision boundary is linear)
K-Nearest Neighbors (KNN):
- Learn: just have pre-classified data; don't do anything with it (nada)
- Apply: find $k$ nearest neighbors and assign to class that shows up the most. What $k$ is best?
- Pass: allows for "pockets" (the decision boundary can be non-linear)
Naïve Bayes
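
The Rocchio and KNN learn/apply steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the course's reference implementation: vectors are plain lists of numbers, "nearest" means Euclidean distance, and the function names and the 2-D toy data are invented for the example. The data is chosen so that class A is "multi-modal" (two separate clumps), which is the case where Rocchio fails and KNN passes.

```python
import math
from collections import Counter, defaultdict

def distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rocchio_learn(training):
    """Learn: compute the centroid of each class. training: list of (vector, label)."""
    by_class = defaultdict(list)
    for vec, label in training:
        by_class[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_class.items()
    }

def rocchio_apply(centroids, vec):
    """Apply: assign the class of the nearest centroid."""
    return min(centroids, key=lambda label: distance(centroids[label], vec))

def knn_apply(training, vec, k=3):
    """Apply: majority vote among the k nearest training examples (no learn step)."""
    nearest = sorted(training, key=lambda pair: distance(pair[0], vec))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: class A has two clumps, class B sits off to the side.
training = [
    ([0, 0], "A"), ([1, 0], "A"), ([10, 0], "A"), ([11, 0], "A"),
    ([8, 2], "B"), ([8, 3], "B"), ([7, 2], "B"),
]
query = [10.5, 0]  # sits inside the right-hand clump of A

centroids = rocchio_learn(training)
print(rocchio_apply(centroids, query))   # B: A's centroid falls between its two clumps
print(knn_apply(training, query, k=3))   # A: two of the three nearest neighbors are A
```

The choice of `k=3` here is arbitrary; as the notes ask, which $k$ is best depends on the data.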
