CSCE 470 Lecture 6

Friday, September 6, 2013


Search Ranking

Two Key Questions

  1. How to represent each query or document?
    • set of terms
    • bag of words → TF, log TF, TF-IDF
  2. How to measure similarity (or distance) between a query $q$ and a document $d$?
    • Jaccard
    • Manhattan Distance ($\sum_i |q_i - d_i|$)
    • Euclidean Distance ($\sqrt{\sum_i (q_i - d_i)^2}$)
    • Cosine
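
The two questions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy query and document, the whitespace tokenizer, and every function name are made up for the example, and log TF is taken here as 1 + log10(tf), which is one common variant rather than necessarily the one used in lecture.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    return text.lower().split()

def log_tf(counts):
    # One common log-TF variant: 1 + log10(tf) for terms that occur at least once.
    return {t: 1 + math.log10(c) for t, c in counts.items() if c > 0}

def jaccard(a, b):
    # Set-of-terms representation: |A intersect B| / |A union B|.
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def manhattan(u, v):
    # Bag-of-words representation: sum of absolute differences per term.
    terms = set(u) | set(v)
    return sum(abs(u.get(t, 0) - v.get(t, 0)) for t in terms)

def euclidean(u, v):
    # Bag-of-words representation: square root of summed squared differences.
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0) - v.get(t, 0)) ** 2 for t in terms))

query = tokenize("cheap flights")
doc = tokenize("cheap cheap flights to texas")
q_tf, d_tf = Counter(query), Counter(doc)

print(jaccard(query, doc))    # 2 shared terms out of 4 distinct terms
print(manhattan(q_tf, d_tf))  # raw term-frequency vectors
print(euclidean(q_tf, d_tf))
print(log_tf(d_tf))           # dampened term frequencies
```

Jaccard works on sets and so ignores how often a term occurs, while the two distances operate on the term-frequency vectors and are sensitive to document length.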

Cosine Similarity

Measures the angle $\theta$ between the query vector $\vec{q}$ and the document vector $\vec{d}$. Two vectors are similar if

  • the angle between them is smaller (i.e. $\theta \to 0$), and thus
  • the cosine similarity is larger (i.e. $\cos\theta \to 1$)

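From vector calculus, we can find the cosine between two vectors as follows:

$$\cos\theta = \frac{\vec{q} \cdot \vec{d}}{\|\vec{q}\| \, \|\vec{d}\|} = \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2} \, \sqrt{\sum_i d_i^2}}$$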

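If the vectors are stored in a normalized format (each scaled to unit length, so $\|\vec{q}\| = \|\vec{d}\| = 1$), the denominator drops out and the similarity formula becomes much easier:

$$\cos\theta = \vec{q} \cdot \vec{d} = \sum_i q_i d_i$$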
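
A minimal Python sketch of cosine similarity, computed once from raw vectors and once from pre-normalized vectors. The sparse dict representation, the function names, and the toy weights are assumptions chosen for illustration, not taken from the lecture.

```python
import math

def dot(u, v):
    # Dot product of two sparse vectors stored as {term: weight} dicts.
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def norm(u):
    # Euclidean length of a sparse vector.
    return math.sqrt(sum(w * w for w in u.values()))

def cosine(u, v):
    # General formula: dot product divided by the product of the lengths.
    denom = norm(u) * norm(v)
    return dot(u, v) / denom if denom else 0.0

def normalize(u):
    # Scale a vector to unit length; an all-zero vector is returned unchanged.
    n = norm(u)
    return {t: w / n for t, w in u.items()} if n else dict(u)

# Hypothetical term weights for a query and a document.
q = {"cheap": 1.0, "flights": 1.0}
d = {"cheap": 2.0, "flights": 1.0, "texas": 2.0}

print(cosine(q, d))                    # general formula
print(dot(normalize(q), normalize(d))) # unit vectors: a plain dot product
```

Both prints give the same value up to floating-point rounding. Normalizing document vectors once at index time means only a dot product is needed per query.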