CSCE 470 Lecture 33
Friday, November 15, 2013
Spatial Variation in Search Engine Queries
- Lars Backstrom (smart dude)
- Jon Kleinberg (the rebel king)
- Ravi Kumar (Yahoo!)
- Jasmine Novak (Yahoo!)
Caverlee says:
- Learn model in paper
Many topics have a geographic focus
- sports
- airlines
- utility companies
- attractions
Identify and characterize topics
- find center of geographic focus
- determine if a topic is highly concentrated or spread diffusely
Use Yahoo! query logs
Probabilistic Model
Consider some query term $t$.
For each location $x$, a query coming from $x$ has probability $p_x$ of containing $t$.
This assumes that term $t$ has a center "hot-spot" (call it $z$):
- Probability $p_x$ will be highest at $z$
- $p_x$ decreases as $x$ gets further away from $z$.
A query coming from $x$ at a distance $d$ from the term's center has probability $p_x = C d^{-\alpha}$ of containing $t$
- $C$ and $\alpha$ are unknown parameters.
- For non-local topics, $\alpha$ is small (slow decay)
- For extremely local topics, $\alpha$ is large (fast decay)
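To make the decay concrete, here is a minimal Python sketch. It assumes the power-law form from the paper, probability = C * d^(-alpha). The function names, the distance floor `min_dist`, and the cap at 1 are illustrative assumptions, not part of the lecture. The log-likelihood scores a guess of (C, alpha) against observed queries: queries containing the term contribute log p, all others contribute log(1 - p).

```python
import math

def term_probability(d, C, alpha, min_dist=1.0):
    """Probability that a query at distance d from the center contains the term."""
    d = max(d, min_dist)                # assumption: floor d so the power law stays finite
    return min(C * d ** (-alpha), 1.0)  # assumption: cap at 1 so the result is a probability

def log_likelihood(observations, C, alpha):
    """observations: iterable of (distance_from_center, has_term) pairs."""
    eps = 1e-12                         # keeps log() away from 0
    total = 0.0
    for d, has_term in observations:
        p = min(max(term_probability(d, C, alpha), eps), 1 - eps)
        total += math.log(p) if has_term else math.log(1 - p)
    return total
```

For the same C, a larger alpha gives a much smaller probability far from the center, which is the "fast decay" case for extremely local topics.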
Algorithm
The center is not known in advance, so:
- split the area into a grid of regions
- find the region whose center gives the best fit (highest likelihood)
- recurse on that region to refine the estimate
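The search above can be sketched in Python as a coarse-to-fine grid search. This is a rough illustration, not the paper's implementation. It assumes planar (x, y) coordinates instead of latitude and longitude, and it swaps in a small brute-force grid over (C, alpha) for the per-center fit. All function names and parameter grids are hypothetical.

```python
import math

def log_likelihood(center, queries, C, alpha, min_dist=1.0):
    """queries: list of (x, y, has_term). Assumes p = C * d^(-alpha)."""
    total = 0.0
    for x, y, has_term in queries:
        d = max(math.hypot(x - center[0], y - center[1]), min_dist)
        p = min(max(C * d ** (-alpha), 1e-12), 1 - 1e-12)
        total += math.log(p) if has_term else math.log(1 - p)
    return total

def best_fit(center, queries):
    """Brute-force stand-in for the fit: best (log-likelihood, C, alpha) on a coarse grid."""
    best = (-math.inf, None, None)
    for C in (0.05, 0.1, 0.2, 0.4, 0.8):
        for alpha in (0.25, 0.5, 1.0, 1.5, 2.0):
            ll = log_likelihood(center, queries, C, alpha)
            if ll > best[0]:
                best = (ll, C, alpha)
    return best

def find_center(queries, bounds, grid=4, depth=3):
    """Split bounds into grid x grid cells, keep the best cell, and zoom in."""
    x0, y0, x1, y1 = bounds
    best_center = None
    for _ in range(depth):
        w, h = (x1 - x0) / grid, (y1 - y0) / grid
        best_ll = -math.inf
        for i in range(grid):
            for j in range(grid):
                c = (x0 + (i + 0.5) * w, y0 + (j + 0.5) * h)
                ll = best_fit(c, queries)[0]
                if ll > best_ll:
                    best_ll, best_center = ll, c
        # Recurse: the winning cell becomes the new search area.
        cx, cy = best_center
        x0, x1 = cx - w / 2, cx + w / 2
        y0, y1 = cy - h / 2, cy + h / 2
    return best_center
```

Zooming into only the winning cell keeps the sketch short. A more careful version might keep a margin around that cell so a center sitting on a cell boundary is not lost.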
This brings out some crazy data!
However, this all assumes that topics have a single center of interest.
Multiple Centers
Basically K-Means clustering:
- Start with random centers, each optimized with the 1-center algorithm
- Assign each query to the center that fits it best
- Recompute each center from its assigned queries, and repeat
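The loop above can be sketched in Python as follows. This is a hedged illustration under simplifying assumptions: it works only on the locations of queries that contain the term, it uses planar coordinates, and the per-center refit `centroid_fit` is a hypothetical stand-in (the centroid with fixed C and alpha) where the paper would rerun the 1-center algorithm. With identical C and alpha for every center, the assignment step reduces to plain nearest-center K-Means.

```python
import math
import random

def prob(center, point, C, alpha, min_dist=1.0):
    """Assumed power-law probability C * d^(-alpha), with d floored at min_dist."""
    d = max(math.hypot(point[0] - center[0], point[1] - center[1]), min_dist)
    return min(C * d ** (-alpha), 1.0)

def centroid_fit(points):
    """Hypothetical stand-in for the 1-center fit: centroid plus fixed (C, alpha)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return (cx, cy), 0.5, 1.0

def multi_center(points, k, fit_one=centroid_fit, iters=20, seed=0):
    """Return k models (center, C, alpha) found by K-Means-style alternation."""
    rng = random.Random(seed)
    models = [(c, 0.5, 1.0) for c in rng.sample(points, k)]  # random starting centers
    for _ in range(iters):
        # Assignment step: each point goes to the center giving it the highest probability.
        groups = [[] for _ in range(k)]
        for p in points:
            best = max(range(k), key=lambda i: prob(models[i][0], p, models[i][1], models[i][2]))
            groups[best].append(p)
        # Recompute step: refit each center from its points (keep the old model if empty).
        new_models = [fit_one(g) if g else models[i] for i, g in enumerate(groups)]
        if new_models == models:
            break  # converged
        models = new_models
    return models
```

Passing a real 1-center routine as `fit_one` would let each center learn its own C and alpha, so a tightly local center and a diffuse one could coexist for the same term.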
Mer-People use Bing... I knew it!