CSCE 470 Lecture 20
Wednesday, October 9, 2013
Quiz on Tuesday; likely over PageRank.
HW5 out tonight.
Classification
What? Putting things (documents) into groups (classes).
Why? Spam filtering, enhanced search results, finding topics of interest to me.
How? Assumes we have training data (pre-classified examples).
Setup Phase
Both clustering and classification involve setting up document vectors. How are these vectors obtained? TF-IDF weights, or other "features"; there is no single right answer, and we can choose different representations.
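As one concrete option for the setup phase, here is a minimal sketch that turns raw text into sparse TF-IDF vectors. It assumes a naive lowercase whitespace tokenizer and one common weighting variant, w = (1 + log10 tf) * log10(N / df); the function name `tfidf_vectors` and the sample documents are hypothetical, not from the lecture.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector ({term: weight}) per document.

    Assumed weighting (one common variant): w = (1 + log10(tf)) * log10(N / df).
    Tokenization is a naive lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({
            term: (1 + math.log10(count)) * math.log10(n / df[term])
            for term, count in tf.items()
        })
    return vectors


# Hypothetical toy collection.
docs = ["cheap pills cheap", "meeting notes", "cheap meeting"]
vectors = tfidf_vectors(docs)
print({t: round(w, 3) for t, w in vectors[0].items()})  # {'cheap': 0.229, 'pills': 0.477}
```

Under this weighting, a term that appears in every document gets IDF log10(1) = 0, so it contributes nothing to any vector.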
Classifiers
- Rocchio:
  - Learn: find the centroid of each class in the training data
  - Apply: assign a new document to the class of the nearest centroid
  - Fail: "multi-modal" classes (one class split into separate clusters) or overlapping classes, because the decision boundaries are linear
- K-Nearest Neighbors (KNN):
  - Learn: just store the pre-classified data; no training work at all (nada)
  - Apply: find the k nearest neighbors and assign the class that appears most often among them. What value of k is best?
  - Pass: handles "pockets" of a class (non-linear decision boundaries)
- Naïve Bayes
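The Rocchio and KNN descriptions above can be sketched side by side. This is a toy illustration, not the course's implementation: it assumes small dense 2-D vectors and Euclidean distance (with length-normalized document vectors, Euclidean and cosine give the same ranking), and the data, labels, and function names are hypothetical. Class "A" is deliberately multi-modal (two pockets), so Rocchio's single centroid for A falls between the pockets, while KNN still classifies a point inside one of them correctly.

```python
import math
from collections import Counter


def rocchio_train(examples):
    """Learn: compute one centroid (mean vector) per class from (vector, label) pairs."""
    sums, counts = {}, Counter()
    for vec, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in total) for label, total in sums.items()}


def rocchio_classify(centroids, vec):
    """Apply: return the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def knn_classify(examples, vec, k=3):
    """KNN has no learning step: sort stored examples by distance, majority-vote the k nearest."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], vec))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class "A" has two separate pockets (x = 0 and x = 10).
train = [
    ((0, 0), "A"), ((0, 2), "A"), ((10, 0), "A"), ((10, 2), "A"),
    ((1, 5), "B"), ((3, 5), "B"), ((2, 3), "B"),
]
query = (0, 1)  # sits inside A's left pocket

centroids = rocchio_train(train)
print(centroids)                           # A -> (5.0, 1.0), B -> (2.0, 4.33...)
print(rocchio_classify(centroids, query))  # B: A's centroid lies between its pockets
print(knn_classify(train, query, k=3))     # A: two of the three nearest neighbors are A
```

The trade-off matches the notes: Rocchio's learning step compresses each class into one point, which is cheap at query time but assumes each class is a single blob; KNN keeps every example, so it pays at query time but can follow non-linear boundaries.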