CSCE 470 Lecture 32
Wednesday, November 13, 2013
Learning To Rank
Last time was just "learning relevance"
Machine Learning and Ranked Info Retrieval have been around for a long time... so why didn't the Machine Learning and Ranking communities get together earlier?
- "Progressive development": We're smarter now than all of the human race was $x$ years ago.
- They didn't know about each other
- The people didn't have access to much training data
- Just takes time to be appreciated
What we talked about (relevant or not) is just classic classification: mapping to an unordered set of classes.
Solution:
- Regression problems: map to a real value
- Ordinal regression: map to an ordered set of classes (buckets)
- Short Answer
- Work-out Question
- Thinking Question
- Synthesis Question (put things together)
Assume a set $C$ of relevance categories exists, ordered so that $C_1 < C_2 < \dots < C_j$.
Assume training data is available, consisting of document-query pairs represented as feature vectors $\psi_i$ with a relevance ranking $C_i$.
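To make the ordinal setup concrete, here is a minimal sketch of mapping a real-valued score onto ordered buckets. The linear scoring function, the threshold values, and the middle category names ("Fair", "Good") are assumptions for illustration; only "Poor" and "Excellent" appear in these notes.

```python
from bisect import bisect_right

# Ordered relevance categories C_1 < C_2 < ... < C_j.
# "Fair" and "Good" are hypothetical fillers between the two labels used in the notes.
CATEGORIES = ["Poor", "Fair", "Good", "Excellent"]

# Hypothetical cut points between adjacent categories (assumed, not learned here).
THRESHOLDS = [0.25, 0.5, 0.75]


def score(psi, weights):
    """Linear score w . psi for one document-query feature vector."""
    return sum(w * x for w, x in zip(weights, psi))


def to_category(s):
    """Map a real-valued score to its ordered bucket."""
    # bisect_right counts how many thresholds are <= s, which is the bucket index.
    return CATEGORIES[bisect_right(THRESHOLDS, s)]
```

In a real ordinal regression both the weights and the thresholds would be fit from the training data; here they are fixed by hand only to show the shape of the mapping.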
Two ways:
- point-wise learning
- pair-wise learning
Pair-Wise Learning
Main Idea: Take pairs of documents and determine which document is better.
Construct a vector of features $\psi_j = \psi(d_j, q)$.
Now each training example consists of two documents $d_i$ and $d_j$.
Example
| Title | query | cosine | pagerank | loadtime | label |
|---|---|---|---|---|---|
| My Blog | johnny football | 0.2 | 0.1 | 0.01 | Poor |
| ESPN | johnny football | 0.3 | 0.2 | 0.01 | Excellent |
Calculate Differences
| Title I | Title J | query | Δ cosine | Δ pagerank | Δ loadtime | label |
|---|---|---|---|---|---|---|
| My Blog | ESPN | johnny football | -0.1 | -0.1 | 0 | J |
| My Blog | Your Blog | johnny football | 0.0 | 0.1 | 0.01 | I |
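The difference step can be sketched in a few lines of Python. This is an illustration under assumptions: the `Doc` class, the integer `grade` encoding of the labels, and the choice to skip equally graded pairs are not from the lecture.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    cosine: float
    pagerank: float
    loadtime: float
    grade: int  # assumed ordinal encoding of the label, e.g. Poor=0, Excellent=3


def pair_example(di, dj):
    """Build one pairwise training row: (feature differences I - J, label)."""
    if di.grade == dj.grade:
        return None  # assumption: equally graded pairs carry no preference
    delta = (
        di.cosine - dj.cosine,
        di.pagerank - dj.pagerank,
        di.loadtime - dj.loadtime,
    )
    label = "I" if di.grade > dj.grade else "J"
    return delta, label


my_blog = Doc("My Blog", 0.2, 0.1, 0.01, grade=0)
espn = Doc("ESPN", 0.3, 0.2, 0.01, grade=3)
delta, label = pair_example(my_blog, espn)
```

For the My Blog / ESPN pair this gives differences of roughly -0.1, -0.1, and 0 (up to floating-point rounding) with label J, matching the first row of the table.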
Now we have:
- A comparator to order between documents for a query
- A classifier for I or J (like what we covered last time)
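As a rough sketch of how the two views connect, a pairwise classifier can be wrapped as a comparator and handed to a sort. The linear form of the classifier and the `WEIGHTS` values below are hypothetical stand-ins for whatever a trained model would provide.

```python
from functools import cmp_to_key

# Hypothetical weights for (cosine, pagerank, loadtime); a trained
# pairwise classifier would supply these. Slower load time counts against a page.
WEIGHTS = (1.0, 1.0, -1.0)


def pair_score(delta):
    """Positive means document I is preferred, negative means document J."""
    return sum(w * d for w, d in zip(WEIGHTS, delta))


def compare(fi, fj):
    """Comparator for sorting: negative if I should rank ahead of J."""
    delta = tuple(a - b for a, b in zip(fi, fj))
    s = pair_score(delta)
    if s > 0:
        return -1
    if s < 0:
        return 1
    return 0


docs = {
    "My Blog": (0.2, 0.1, 0.01),
    "ESPN": (0.3, 0.2, 0.01),
}
ranked = sorted(docs, key=cmp_to_key(lambda a, b: compare(docs[a], docs[b])))
```

A linear classifier always yields a consistent ordering, which is why it is safe to pass to `sorted`; an arbitrary pairwise classifier may produce cyclic preferences, in which case the sort result depends on comparison order.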
Summary
Learning to rank ultimately beats traditional hand-designed ranking functions, since the hand-designed functions can themselves be included as features.