CSCE 470 Lecture 6
Friday, September 6, 2013
Search Ranking
Two Key Questions
- How to represent each query or document?
  - set of terms
  - bag of words → TF, log TF, TF-IDF
- How to measure similarity (or distance) between $q$ and $d$?
  - Jaccard
  - Manhattan Distance ($|u_1 - v_1| + \dots + |u_n - v_n|$)
  - Euclidean Distance ($\sqrt{(u_1 - v_1)^2 + \dots + (u_n - v_n)^2}$)
  - Cosine
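The two questions above can be sketched in a few lines of Python. This is only an illustration, not the course's reference code: the whitespace tokenizer, the `1 + log10(tf)` form of log TF, the `log10(N / df)` form of IDF, and all function names are assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_set(text):
    """Representation 1: the set of distinct terms."""
    return set(tokenize(text))


def tf(text):
    """Representation 2: bag of words with raw term frequencies."""
    return Counter(tokenize(text))


def log_tf(text):
    """Dampened term frequency; the 1 + log10(tf) form is an assumption."""
    return {t: 1 + math.log10(c) for t, c in tf(text).items()}


def tf_idf(text, corpus):
    """Log TF times IDF, with IDF assumed to be log10(N / df)."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in term_set(doc))
    return {t: w * math.log10(n / df[t])
            for t, w in log_tf(text).items() if t in df}


def jaccard(a, b):
    """Size of the intersection over size of the union of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def manhattan(u, v):
    """Sum of absolute coordinate differences over all terms."""
    return sum(abs(u.get(t, 0) - v.get(t, 0)) for t in set(u) | set(v))


def euclidean(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((u.get(t, 0) - v.get(t, 0)) ** 2
                         for t in set(u) | set(v)))


q = "cheap flights"
d = "cheap cheap flights to texas"
print(jaccard(term_set(q), term_set(d)))  # 0.5
print(manhattan(tf(q), tf(d)))            # 3
print(euclidean(tf(q), tf(d)))
```

Note that Jaccard is a similarity (larger means closer), while Manhattan and Euclidean are distances (smaller means closer).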
Cosine Similarity
Measures the angle $\theta$ between the query vector $\vec{q}$ and the document vector $\vec{d}$. Two vectors are similar if
- the angle between them is smaller (i.e. $\theta \to 0$), and thus
- the cosine similarity is larger (i.e. $\cos{\theta} \to 1$)
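From vector calculus, we can find the cosine between two vectors as follows:
$$\mathrm{sim}\left( \vec{q}, \vec{d} \right) = \cos{\theta} = \frac{\vec{q} \cdot \vec{d}}{\left\| \vec{q} \right\| \, \left\| \vec{d} \right\|}$$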
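As a minimal sketch of the dot-product-over-norms computation, assuming both vectors are plain Python lists of equal length over the same term ordering (the function names are illustrative, and returning 0.0 for a zero vector is a convention chosen here, not something the lecture specifies):

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


def cosine_similarity(q, d):
    """Cosine of the angle between q and d."""
    denom = norm(q) * norm(d)
    if denom == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot(q, d) / denom


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction, close to 1
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal: 0.0
```

Because the result depends only on direction, scaling a document vector (for example, by repeating the document's text) does not change its cosine similarity to the query.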
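If the vectors are stored in normalized form (unit length), the similarity formula becomes much simpler:
$$\hat{q} = \frac{\vec{q}}{\left\| \vec{q} \right\|}, \qquad \hat{d} = \frac{\vec{d}}{\left\| \vec{d} \right\|}, \qquad \mathrm{sim}\left( \hat{q}, \hat{d} \right) = \cos{\theta} = \hat{q} \cdot \hat{d}$$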
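A short sketch of the normalized case, assuming vectors are lists of floats and that document vectors would be normalized once when they are stored (the helper names and the handling of the zero vector are assumptions made for this example):

```python
import math


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        return list(v)
    return [x / length for x in v]


def cosine_normalized(q_hat, d_hat):
    """For unit vectors the cosine is just the dot product."""
    return sum(a * b for a, b in zip(q_hat, d_hat))


q_hat = normalize([3.0, 4.0])
d_hat = normalize([4.0, 3.0])
print(cosine_normalized(q_hat, d_hat))
```

The design benefit is that the norms are paid for once per vector rather than once per query-document pair, so ranking many documents against a query reduces to one dot product each.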