CSCE 470 Lecture 32


« previous | Wednesday, November 13, 2013 | next »


Learning To Rank

Last time was just "learning relevance"

Machine Learning and Ranked Info Retrieval have been around for a long time... so why didn't the Machine Learning and Ranking communities get together earlier?

  • "Progressive development" — We're smarter now than all of the human race was years ago.
  • They didn't know about each other
  • The people didn't have access to much training data
  • Just takes time to be appreciated

What we talked about (relevant or not) is just classic classification: mapping to an unordered set of classes.

Solution:

  • Regression problems: map to a real value
  • Ordinal regression: map to an ordered set of classes (buckets)
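A minimal sketch of the ordinal-regression idea: a real-valued score is cut into ordered buckets by thresholds. The label names and threshold values below are hypothetical, just to illustrate the mapping.

```python
# Ordinal regression viewed as "regression + thresholds" (illustrative only):
# a real-valued relevance score is bucketed into ordered classes.
LABELS = ["Poor", "Fair", "Good", "Excellent"]  # ordered relevance classes
THRESHOLDS = [0.25, 0.5, 0.75]                  # hypothetical cut points

def to_bucket(score):
    """Map a real-valued score in [0, 1] to an ordered relevance class."""
    for label, cut in zip(LABELS, THRESHOLDS):
        if score < cut:
            return label
    return LABELS[-1]

# to_bucket(0.1) -> "Poor"; to_bucket(0.9) -> "Excellent"
```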


  • Short Answer
  • Work-out Question
  • Thinking Question
  • Synthesis Question (put things together)

Assume we have an ordered set of relevance categories c_1, ..., c_J, with c_J ≻ c_{J-1} ≻ ... ≻ c_1 (higher categories are more relevant).

Assume training data is available consisting of document-query pairs, each represented as a feature vector, together with a relevance judgment.

Two ways:

  • point-wise learning
  • pair-wise learning
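Point-wise learning can be sketched as plain regression: fit a function from each document-query feature vector to a numeric relevance score, then rank by predicted score. The feature values and labels below are made up for illustration, and ordinary least squares stands in for whatever regressor is actually used.

```python
import numpy as np

# Hypothetical training data: one feature vector per (document, query) pair.
# Features: [cosine, pagerank, loadtime]; labels are numeric relevance grades.
X = np.array([
    [0.2, 0.1, 0.01],   # e.g. "My Blog"  -> grade 1
    [0.3, 0.2, 0.01],   # e.g. "ESPN"     -> grade 3
    [0.1, 0.05, 0.02],  # a third, irrelevant document -> grade 0
])
y = np.array([1.0, 3.0, 0.0])

# Point-wise learning: regress each feature vector onto its relevance grade
# (here via least squares as a stand-in for any regression model).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Rank documents for the query by predicted score, best first.
scores = X @ w
ranking = np.argsort(-scores)
```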

Pair-Wise Learning

Main Idea: Take pairs of documents and determine which document is better.

Construct a feature vector for each document-query pair.

Now each training instance consists of two documents d_i and d_j for the same query, along with a judgment of which one is more relevant.

Example

| Title   | query           | cosine | pagerank | loadtime | label     |
|---------|-----------------|--------|----------|----------|-----------|
| My Blog | johnny football | 0.2    | 0.1      | 0.01     | Poor      |
| ESPN    | johnny football | 0.3    | 0.2      | 0.01     | Excellent |

Calculate Differences

| Title I | Title J   | query           | Δ cosine | Δ pagerank | Δ loadtime | label |
|---------|-----------|-----------------|----------|------------|------------|-------|
| My Blog | ESPN      | johnny football | -0.1     | -0.1       | 0          | J     |
| My Blog | Your Blog | johnny football | 0.0      | 0.1        | 0.01       | I     |
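Building these difference rows can be sketched as follows: for every ordered pair of differently-labeled documents under the same query, emit the feature difference and a +1/−1 label for whether the first document ("I") or the second ("J") is better. The function name and numeric grades are illustrative.

```python
# Sketch of pair-wise training-example construction (names are illustrative).
# Each instance is the feature difference phi(d_i, q) - phi(d_j, q), labeled
# +1 if d_i is the better document ("I" wins), else -1 ("J" wins).
def make_pairs(docs):
    """docs: list of (features, graded_label) for one query,
    where a higher graded_label means more relevant."""
    pairs = []
    for i in range(len(docs)):
        for j in range(len(docs)):
            if i == j:
                continue
            (fi, li), (fj, lj) = docs[i], docs[j]
            if li == lj:
                continue  # skip ties: neither document is preferred
            delta = [a - b for a, b in zip(fi, fj)]
            pairs.append((delta, 1 if li > lj else -1))
    return pairs

# The table's first row: My Blog (Poor = grade 0) vs ESPN (Excellent = grade 3)
docs = [([0.2, 0.1, 0.01], 0),   # My Blog
        ([0.3, 0.2, 0.01], 3)]   # ESPN
pairs = make_pairs(docs)
# First pair: delta ≈ [-0.1, -0.1, 0.0] with label -1, i.e. "J" (ESPN) wins
```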

Now we have:

  • A comparator to order documents for a query
  • A classifier for I or J (like what we covered last time)
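Once a pair-wise classifier is trained, its decision on a feature difference acts as a comparator, so an ordinary sort produces the ranking. A minimal sketch, assuming a linear classifier with a hypothetical learned weight vector `w` and the example feature values from the tables above:

```python
from functools import cmp_to_key

# Hypothetical learned weights over [cosine, pagerank, loadtime]:
# cosine and pagerank help; load time hurts.
w = [5.0, 3.0, -10.0]

def compare(fi, fj):
    """Pair-wise classifier as a comparator: negative return value
    sorts document i before document j (i.e. i is the better document)."""
    score = sum(wk * (a - b) for wk, a, b in zip(w, fi, fj))
    return -1 if score > 0 else (1 if score < 0 else 0)

docs = {
    "My Blog":   [0.2, 0.1, 0.01],
    "ESPN":      [0.3, 0.2, 0.01],
    "Your Blog": [0.2, 0.0, 0.00],
}
ranked = sorted(docs, key=cmp_to_key(lambda a, b: compare(docs[a], docs[b])))
```

Because the comparator here is a difference of linear scores, it is transitive, so sorting with it is well defined.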

Summary

Learned ranking ultimately beats traditional hand-designed ranking functions (the hand-designed functions themselves can be included as features).