CSCE 470 Lecture 33


Friday, November 15, 2013


Spatial Variation in Search Engine Queries

  • Lars Backstrom (smart dude)
  • Jon Kleinberg (the rebel king)
  • Ravi Kumar (Yahoo!)
  • Jasmine Novak (Yahoo!)

Caverlee says:

  1. Learn the model in the paper


Many topics have a geographic focus

  • sports
  • airlines
  • utility companies
  • attractions

Identify and characterize topics

  • find center of geographic focus
  • determine if a topic is highly concentrated or spread diffusely

Use Yahoo! query logs

Probabilistic Model

Consider some query term $t$.

For each location $x$, a query coming from $x$ has probability $p_x$ of containing $t$.

This assumes that term $t$ has a center "hot-spot" (call it $z$):

  • Probability $p_x$ will be highest at $z$
  • $p_x$ decreases as $x$ gets further away from $z$.

A query coming from $x$ at a distance $d$ from the term's center has probability

  $p_x = C d^{-\alpha}$

  • $C$ and $\alpha$ are unknown parameters.
  • For non-local topics, $\alpha$ is small (slow decay)
  • For extremely local topics, $\alpha$ is large (fast decay)
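
A minimal sketch of the distance-decay model above, assuming the power-law form C * d^(-alpha) with a constant C and an exponent alpha as the two unknown parameters. The function name and the `d_min` clamp (which keeps the value finite at the center itself) are hypothetical choices for illustration, not something stated in the notes.

```python
def term_probability(d, C, alpha, d_min=1.0):
    """Probability that a query issued at distance d from the center contains the term.

    Assumed form: C * d^(-alpha). Distances below d_min are clamped to d_min
    (an assumption) so the value stays finite at d = 0 and never exceeds C.
    """
    d = max(d, d_min)
    return C * d ** (-alpha)


if __name__ == "__main__":
    # Small alpha: interest decays slowly with distance (non-local topic).
    # Large alpha: interest decays quickly with distance (very local topic).
    for alpha in (0.1, 2.0):
        row = [round(term_probability(d, 0.5, alpha), 4) for d in (1, 10, 100)]
        print("alpha =", alpha, "->", row)
```

With the same C, the small exponent keeps the probability close to C even far from the center, while the large exponent drives it toward zero within a short distance.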

Algorithm

The center is not known in advance, so:

  1. split the area into regions
  2. find the region with the best fit
  3. recurse within that region to refine the estimate
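
The three steps above can be sketched as a coarse-to-fine grid search. This is only an illustration under several assumptions that the notes do not spell out: locations are planar (x, y) points with Euclidean distance, "best fit" is scored by a log-likelihood under the power-law decay model, and the two decay parameters are chosen from small fixed candidate lists. All function and parameter names are hypothetical.

```python
import math


def log_likelihood(queries, center, C, alpha, d_min=1.0):
    """Score one candidate center. queries is a list of (x, y, has_term)."""
    total = 0.0
    for x, y, has_term in queries:
        d = max(math.hypot(x - center[0], y - center[1]), d_min)
        p = min(max(C * d ** (-alpha), 1e-9), 1 - 1e-9)  # keep log() finite
        total += math.log(p) if has_term else math.log(1 - p)
    return total


def best_fit(queries, center, Cs=(0.1, 0.3, 0.5, 0.9), alphas=(0.1, 0.5, 1.0, 2.0)):
    """Best (score, C, alpha) for a fixed center over the assumed candidate lists."""
    return max((log_likelihood(queries, center, C, a), C, a) for C in Cs for a in alphas)


def find_center(queries, bounds, grid=5, depth=4):
    """Split bounds into grid x grid cells, keep the best cell, and recurse into it."""
    x0, y0, x1, y1 = bounds
    best = None
    for _ in range(depth):
        w, h = (x1 - x0) / grid, (y1 - y0) / grid
        best = None
        for i in range(grid):
            for j in range(grid):
                c = (x0 + (i + 0.5) * w, y0 + (j + 0.5) * h)  # cell midpoint
                score, C, alpha = best_fit(queries, c)
                if best is None or score > best[0]:
                    best = (score, c, C, alpha)
        # Zoom in: the winning cell becomes the search area for the next level.
        cx, cy = best[1]
        x0, x1 = cx - w / 2, cx + w / 2
        y0, y1 = cy - h / 2, cy + h / 2
    return best[1], best[2], best[3]
```

Calling `find_center(queries, (0, 0, 100, 100))` returns an estimated center together with the best-scoring C and alpha from the candidate lists. Because each level only descends into the single best cell, the search is greedy and can miss the best center if the coarse grid is misleading.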

This brings out some crazy data!

However, this all assumes that each topic has a single center of interest.

Multiple Centers

Basically K-Means clustering:

  • Start with random centers and optimize each with the 1-center algorithm
  • Assign each query to its best center
  • Recompute the centers and repeat
Mer-People use Bing... I knew it!
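
The multiple-center loop described above can be sketched as follows. This is a rough illustration, not the paper's implementation: "best center" is taken to mean the center whose power-law model gives the query the highest probability, the initial centers are drawn from locations where the term occurred, and `centroid_fit` is a deliberately crude stand-in for the single-center fitting step. All names are hypothetical.

```python
import math
import random


def prob(query, model, d_min=1.0):
    """Probability of the term at the query's location under one (center, C, alpha) model."""
    (cx, cy), C, alpha = model
    d = max(math.hypot(query[0] - cx, query[1] - cy), d_min)
    return C * d ** (-alpha)


def centroid_fit(group):
    """Placeholder 1-center fit: centroid of the queries containing the term.

    A real version would run the single-center search and also fit C and alpha;
    here both are fixed (an assumption) to keep the sketch short.
    """
    hits = [(x, y) for x, y, has_term in group if has_term]
    cx = sum(x for x, _ in hits) / len(hits)
    cy = sum(y for _, y in hits) / len(hits)
    return ((cx, cy), 0.5, 1.0)


def multi_center(queries, k, fit_one_center, rounds=10, seed=0):
    """K-means-style loop over queries given as (x, y, has_term) tuples."""
    rng = random.Random(seed)
    hits = [q for q in queries if q[2]]
    # Random start: k distinct locations where the term actually occurred.
    models = [((x, y), 0.5, 1.0) for x, y, _ in rng.sample(hits, k)]
    for _ in range(rounds):
        # Assign every query to the center that explains it best.
        groups = [[] for _ in range(k)]
        for q in queries:
            best = max(range(k), key=lambda i: prob(q, models[i]))
            groups[best].append(q)
        # Refit each center on its own queries; keep the old model if it got no hits.
        new_models = [
            fit_one_center(g) if any(h for _, _, h in g) else m
            for g, m in zip(groups, models)
        ]
        if new_models == models:  # assignments have stabilized
            break
        models = new_models
    return models
```

As with K-means, the result depends on the random starting centers, so in practice one would likely run it from several seeds and keep the best-scoring result.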