CSCE 470 Lecture 6
Friday, September 6, 2013
Search Ranking
Two Key Questions
- How to represent each query or document?
  - set of terms
  - bag of words → TF, log TF, TF-IDF
- How to measure similarity (or distance) between a query $q$ and a document $d$?
  - Jaccard
  - Manhattan Distance ($L_1$ norm)
  - Euclidean Distance ($L_2$ norm)
  - Cosine
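The two questions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the tokenizer is a naive whitespace split, log TF is taken as 1 + log10(tf), and IDF as log10(N / df). The notes do not fix any of these choices.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def tf(text):
    """Bag of words: raw term frequency of each term."""
    return Counter(tokenize(text))

def log_tf(text):
    """Dampened term frequency. Assumption: weight = 1 + log10(tf)."""
    return {t: 1 + math.log10(c) for t, c in tf(text).items()}

def tf_idf(text, corpus):
    """TF-IDF. Assumption: idf = log10(N / df); unseen terms get weight 0."""
    n = len(corpus)
    doc_sets = [set(tokenize(doc)) for doc in corpus]
    weights = {}
    for t, c in tf(text).items():
        df = sum(1 for s in doc_sets if t in s)
        weights[t] = c * math.log10(n / df) if df else 0.0
    return weights

def jaccard(a, b):
    """Set-of-terms similarity: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def manhattan(u, v):
    """L1 distance between two sparse term-weight mappings."""
    return sum(abs(u.get(t, 0) - v.get(t, 0)) for t in set(u) | set(v))

def euclidean(u, v):
    """L2 distance between two sparse term-weight mappings."""
    return math.sqrt(sum((u.get(t, 0) - v.get(t, 0)) ** 2 for t in set(u) | set(v)))

query = "cheap flights"
doc = "cheap cheap flights to texas"
print(jaccard(tokenize(query), tokenize(doc)))  # set-of-terms view
print(manhattan(tf(query), tf(doc)))            # bag-of-words view, L1
print(euclidean(tf(query), tf(doc)))            # bag-of-words view, L2
```

Note that Jaccard works on the set-of-terms representation and ignores how often a term occurs, while the two distances operate on the weighted bag-of-words vectors.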
Cosine Similarity
Measures the angle $\theta$ between the query vector $\vec{q}$ and the document vector $\vec{d}$. Two vectors are similar if
- the angle between them is smaller (i.e. $\theta \to 0$), and thus
- the cosine similarity is larger (i.e. $\cos\theta \to 1$)
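From vector algebra (the definition of the dot product), we can find the cosine of the angle $\theta$ between a query vector $\vec{q}$ and a document vector $\vec{d}$ as follows:

$$\cos\theta = \frac{\vec{q} \cdot \vec{d}}{\|\vec{q}\| \, \|\vec{d}\|} = \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2} \, \sqrt{\sum_i d_i^2}}$$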
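As a minimal sketch of this computation, the cosine is the dot product of the two vectors divided by the product of their Euclidean lengths. The function name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions made for this example, not part of the lecture.

```python
import math

def cosine_similarity(q, d):
    """cos(theta) = (q . d) / (|q| * |d|) for two equal-length vectors."""
    if len(q) != len(d):
        raise ValueError("vectors must have the same dimension")
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_q * norm_d)

# Same direction, different lengths: similarity is (approximately) 1.
print(cosine_similarity([1, 1, 0], [2, 2, 0]))
# Orthogonal vectors share no terms: similarity is 0.
print(cosine_similarity([1, 0], [0, 1]))
```

Because only the direction matters, a long document that repeats the query's terms in the same proportions scores the same as a short one.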
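If the vectors are stored in a normalized format (unit length, so $\|\vec{q}\| = \|\vec{d}\| = 1$), the similarity formula becomes much simpler, because the denominator is 1:

$$\cos\theta = \vec{q} \cdot \vec{d} = \sum_i q_i d_i$$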
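One way to exploit this, sketched below, is to normalize every document vector once when it is stored, so that scoring a query needs only a dot product. The helper names and the toy vectors are hypothetical and chosen purely for illustration.

```python
import math

def normalize(v):
    """Scale v to unit L2 length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Toy term-weight vectors (hypothetical data), normalized once up front.
docs = {"d1": [3.0, 4.0, 0.0], "d2": [0.0, 1.0, 1.0]}
unit_docs = {name: normalize(v) for name, v in docs.items()}

# At query time, normalize the query and rank by dot product alone.
unit_query = normalize([1.0, 1.0, 0.0])
scores = {name: dot(unit_query, v) for name, v in unit_docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

For ranking documents against a single query, normalizing the query is optional, since dividing every score by the same query length does not change the order.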