CSCE 470 Lecture 32


Wednesday, November 13, 2013


Learning To Rank

Last time was just "learning relevance"

Machine Learning and Ranked Info Retrieval have been around for a long time... so why didn't the Machine Learning and Ranking communities get together earlier?

  • "Progressive development": we're smarter now than all of the human race was $x$ years ago.
  • They didn't know about each other
  • People didn't have access to much training data
  • Just takes time to be appreciated

What we talked about (relevant or not) is just classic classification: mapping to an unordered set of classes.

Solution:

  • Regression problems: map to a real value
  • Ordinal regression: map to an ordered set of classes (buckets)
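
To make the difference between the two concrete, here is a minimal sketch of turning a real-valued score (the regression output) into an ordered bucket (the ordinal regression output). The category names "Fair" and "Good", the cut points, and the function name to_bucket are all assumptions for illustration; in practice the cut points would be learned rather than fixed by hand.

```python
from bisect import bisect_right

# Ordered relevance categories C_1 < C_2 < ... ("Fair" and "Good" are hypothetical).
CATEGORIES = ["Poor", "Fair", "Good", "Excellent"]

# Hypothetical cut points between adjacent buckets; a real system would learn these.
THRESHOLDS = [0.25, 0.5, 0.75]


def to_bucket(score: float) -> str:
    """Map a real-valued relevance score to one of the ordered categories."""
    # bisect_right counts how many thresholds are <= score,
    # which is exactly the index of the bucket the score falls into.
    return CATEGORIES[bisect_right(THRESHOLDS, score)]


print(to_bucket(0.1))  # lowest bucket
print(to_bucket(0.8))  # highest bucket
```

Because the buckets are ordered, predicting "Fair" for a "Poor" document is a smaller mistake than predicting "Excellent", which plain classification over unordered classes cannot express.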


  • Short Answer
  • Work-out Question
  • Thinking Question
  • Synthesis Question (put things together)

Assume a set $C$ of relevance categories exists, ordered as $C_1 < C_2 < \dots < C_j$.

Assume training data is available, consisting of document-query pairs represented as feature vectors $\psi_i$, each with a relevance ranking $C_i$.

Two ways:

  • point-wise learning
  • pair-wise learning

Pair-Wise Learning

Main Idea: Take pairs of documents and determine which document is better.

Construct a vector of features for each document: $\psi_j = \psi(d_j, q)$

The feature vector for a pair is the difference: $\phi(d_i, d_j, q) = \psi(d_i, q) - \psi(d_j, q)$

Each training example now consists of two documents $d_i$ and $d_j$ for the same query, labeled by which of the two is better.

Example

Title   | query           | cosine | pagerank | loadtime | label
My Blog | johnny football | 0.2    | 0.1      | 0.01     | Poor
ESPN    | johnny football | 0.3    | 0.2      | 0.01     | Excellent

Calculate Differences

Title I | Title J   | query           | Δ cosine | Δ pagerank | Δ loadtime | label
My Blog | ESPN      | johnny football | -0.1     | -0.1       | 0          | J
My Blog | Your Blog | johnny football | 0.0      | 0.1        | 0.01       | I
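
The My Blog / ESPN row can be reproduced with a short sketch. The feature values and labels come from the example table; the numeric ranks given to "Poor" and "Excellent", the rounding, and the function names phi and pair_label are assumptions made for illustration.

```python
# Feature vectors psi(d, q) for the query "johnny football", from the example table.
# Feature order: (cosine, pagerank, loadtime).
PSI = {
    "My Blog": (0.2, 0.1, 0.01),
    "ESPN": (0.3, 0.2, 0.01),
}
LABEL = {"My Blog": "Poor", "ESPN": "Excellent"}

# Assumed numeric ranks for the ordered labels (higher means more relevant).
RANK = {"Poor": 0, "Excellent": 1}


def phi(doc_i, doc_j):
    """Pairwise feature vector: psi(d_i, q) - psi(d_j, q)."""
    # Rounding only hides floating-point noise such as -0.09999999999999998.
    return tuple(round(a - b, 6) for a, b in zip(PSI[doc_i], PSI[doc_j]))


def pair_label(doc_i, doc_j):
    """Return 'I' if doc_i is more relevant than doc_j, otherwise 'J'.

    Ties fall to 'J' here; a real training set would more likely drop tied pairs.
    """
    return "I" if RANK[LABEL[doc_i]] > RANK[LABEL[doc_j]] else "J"


print(phi("My Blog", "ESPN"), pair_label("My Blog", "ESPN"))
```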

Now we have:

  • A comparator to order between documents for a query
  • A classifier for I or J (like what we covered last time)
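
As a rough sketch of how the I/J classifier doubles as a comparator, the code below sorts documents with it. The weights are made up for illustration and stand in for whatever a trained classifier would provide; the names classify and compare are likewise hypothetical.

```python
from functools import cmp_to_key

# Hypothetical weights for (cosine, pagerank, loadtime); not learned from data.
WEIGHTS = (1.0, 1.0, -1.0)

# Feature vectors for the query "johnny football", from the example table.
DOCS = {
    "My Blog": (0.2, 0.1, 0.01),
    "ESPN": (0.3, 0.2, 0.01),
}


def classify(diff):
    """Stand-in I/J classifier: 'I' if the weighted feature difference is positive."""
    score = sum(w * x for w, x in zip(WEIGHTS, diff))
    return "I" if score > 0 else "J"


def compare(doc_i, doc_j):
    """Comparator built on the classifier: -1 if doc_i should rank ahead of doc_j."""
    diff = tuple(a - b for a, b in zip(DOCS[doc_i], DOCS[doc_j]))
    return -1 if classify(diff) == "I" else 1


# Sort all documents for the query, best first, using the pairwise comparator.
ranking = sorted(DOCS, key=cmp_to_key(compare))
print(ranking)
```

A linear scorer like this one always gives a consistent ordering. A more general pairwise classifier can produce cycles (A beats B, B beats C, C beats A), in which case the sorted result depends on the order of comparisons.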

Summary

Learning to rank ultimately beats traditional hand-designed ranking functions (the hand-designed functions can themselves be included as features).