CSCE 470 Lecture 36
Friday, November 22, 2013
WE NEED TO RENAME OUR BUNDLE NAMESPACE!!!!
"company" \ "product" + Bundle
???\Wikipedia\CategorizerBundle
Final Review
"term-document" matrix
- sparse 1s in a sea of 0s
- inverted index
- indexing pipeline → magic!
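As a rough illustration of why the sparse term-document matrix is stored as an inverted index, here is a minimal Python sketch. The function names (`build_inverted_index`, `boolean_and`) and the three toy documents are assumptions made up for this example; tokenization is just lowercasing and whitespace splitting, which is far simpler than a real indexing pipeline.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy tokenizer: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, term_a, term_b):
    """Intersect two postings lists (a Boolean AND query)."""
    return sorted(set(index.get(term_a, [])) & set(index.get(term_b, [])))


# Hypothetical toy collection.
docs = {1: "new home sales", 2: "home sales rise", 3: "new prices rise"}
index = build_inverted_index(docs)
```

Only the 1s of the matrix are stored: `index["home"]` holds the document IDs whose column entry for "home" would be 1, and a Boolean AND is an intersection of two such lists.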
Foundations
- Building an index
- Statistical properties of text (Zipf and Heaps)
- Evaluation (Precision, Recall, F-Measure, NDCG)
- MapReduce
- Interfaces
- Bow-tie structure of the web
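For the evaluation measures listed above, a small sketch of set-based precision, recall, and the balanced F-measure (F1) may help with review. The function name and the example document IDs are assumptions for illustration; NDCG is not covered because it needs graded, ranked judgments.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical query: 4 documents retrieved, 3 documents actually relevant.
p, r, f = precision_recall_f1([1, 2, 3, 4], [2, 4, 5])
```

Here two of the four retrieved documents are relevant, so precision is 2/4, recall is 2/3, and F1 works out to 4/7.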
Retrieval Models
- Boolean
- Vector Space (cosine, TF-IDF)
- Link Analysis (PageRank and HITS)
- Learning to Rank
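The vector space model above can be sketched in a few lines of Python. This assumes one common weighting variant, raw term frequency times log10(N/df); other TF-IDF variants exist and the notes do not say which one the course uses. The function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    return [
        {t: tf * math.log10(n / df[t]) for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy collection.
vecs = tfidf_vectors(["apple banana", "apple cherry", "date elderberry"])
```

Documents that share a term get a positive cosine score, documents with no terms in common score 0, and a term that appears in every document would get weight 0 under this variant.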
IR in Action
- Recommenders (collab filtering, content-based)
- Clustering (K-Means, HAC) and Classification (Rocchio, KNN, Naïve Bayes)
- Geo + Location-based
- Question answering
- Privacy
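Since K-Means is one of the algorithms to walk through by hand, here is a minimal sketch of its assign-then-update loop. The initial centroids are passed in explicitly (an assumption made to keep the example deterministic; in practice they are usually chosen at random), and the points are made up for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-Means with squared Euclidean distance and fixed iteration count."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D points with two obvious groups.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
```

On this data the assignments stop changing after the first pass, so the centroids settle at the means of the two groups.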
The Final
- Everything is fair game (cumulative)
- Like the in-class quizzes
- 90 minutes (planned as a "75-minute exam")
Logistics
- 20% of grade
- Monday
- Regular Class Time
- HRBB 131
- Closed Book
- Two Pages of notes, formulas, etc
- No calculators
Three Types of Questions:
- Short Answer
- Concept application (walk through the algorithm)
- K-Means clustering
- Naïve Bayes Classification
- Collaborative Filtering
- ...
- Synthesis
Practice
Do the posted previous finals
- Hub
- page that links to pages that directly answer the information need
- Authority
- page that directly answers an information need
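The hub and authority definitions above are mutually recursive: good hubs point to good authorities, and good authorities are pointed to by good hubs. A minimal sketch of the HITS iteration, assuming a tiny made-up link graph and L2 normalization after each step (the page names and the `hits` function are hypothetical):

```python
import math


def hits(links, iterations=50):
    """Iteratively compute hub and authority scores for a link graph.

    links maps each page to the list of pages it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking to this page.
        auth = {
            n: sum(hub[s] for s, targets in links.items() if n in targets)
            for n in nodes
        }
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub score: sum of authority scores of the pages this page links to.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


# Hypothetical graph: "portal" and "blog" link out, "faq" and "manual" answer.
links = {"portal": ["faq", "manual"], "blog": ["faq"]}
hub, auth = hits(links)
```

In this graph "portal" ends up with the highest hub score because it links to both answering pages, and "faq" ends up with the highest authority score because both linking pages point to it.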
Naïve Bayes
Three classes:
- Funny
- Rants
- Crap
Precalculated probabilities:
Probability | Funny | Rants | Crap |
---|---|---|---|
P(c) | 0.1 | 0.2 | 0.7 |
P(…\|c) | 0.05 | 0.8 | 0.1 |
P(…\|c) | 0.3 | 0.4 | 0.6 |
P(…\|c) | 0.05 | 0.01 | 0.5 |
P(…\|c) | 0.3 | 0.02 | 0.01 |
P(…\|c) | 0.5 | 0.01 | 0.2 |
Ignore words that are not in the table.
Options for handling a word that is not in the table:
- ignore
- smoothing: give unknown words probability 0.0001 for all categories
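To walk through the classification step, here is a minimal sketch that uses the priors and the five likelihood rows of the table in order. The words those rows belong to did not survive in these notes, so the keys `w1` through `w5` are placeholders, not the original vocabulary. The `classify` function and its `unknown` parameter are likewise assumptions for illustration. Scores are summed in log space to avoid multiplying many small probabilities.

```python
import math

# Priors P(c) from the table.
PRIORS = {"Funny": 0.1, "Rants": 0.2, "Crap": 0.7}

# Likelihoods P(word | c), one entry per table row.
# "w1".."w5" are placeholder names; the actual words are not in the notes.
LIKELIHOODS = {
    "w1": {"Funny": 0.05, "Rants": 0.8, "Crap": 0.1},
    "w2": {"Funny": 0.3, "Rants": 0.4, "Crap": 0.6},
    "w3": {"Funny": 0.05, "Rants": 0.01, "Crap": 0.5},
    "w4": {"Funny": 0.3, "Rants": 0.02, "Crap": 0.01},
    "w5": {"Funny": 0.5, "Rants": 0.01, "Crap": 0.2},
}


def classify(words, unknown="ignore", smooth_p=0.0001):
    """Return (best class, log scores) for a list of words.

    unknown="ignore" skips words not in the table;
    unknown="smooth" gives them probability smooth_p in every class.
    """
    scores = {}
    for c, prior in PRIORS.items():
        score = math.log(prior)
        for w in words:
            if w in LIKELIHOODS:
                score += math.log(LIKELIHOODS[w][c])
            elif unknown == "smooth":
                score += math.log(smooth_p)
        scores[c] = score
    return max(scores, key=scores.get), scores


label, scores = classify(["w1", "w4"])
```

For `["w1", "w4"]` the unnormalized products are 0.1 × 0.05 × 0.3 = 0.0015 (Funny), 0.2 × 0.8 × 0.02 = 0.0032 (Rants), and 0.7 × 0.1 × 0.01 = 0.0007 (Crap), so Rants wins. Because the smoothing option assigns the same probability in every class, it shifts all scores equally and picks the same class as ignoring the word.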