CSCE 470 Lecture 12

Monday, September 23, 2013


Quiz Review

Would stemming affect precision, recall, both, or neither?
Both: stemming would improve recall but negatively affect precision.
Is tf or idf more important in tweet searching?
idf: tweets are so short that they are unlikely to contain the same word multiple times, so tf is almost always 1.
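
The stemming answer can be illustrated with a small sketch. Everything here is hypothetical: the three toy documents, the relevance labels, and the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's, not an implementation of it). The point is only that matching on stems retrieves more documents, which helps recall and can hurt precision.

```python
# Toy illustration: a crude suffix stripper, NOT a real stemming algorithm.
SUFFIXES = ("ions", "ion", "ed", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search(query, docs, normalize):
    """Return ids of documents containing the (normalized) query term."""
    q = normalize(query)
    return {
        doc_id
        for doc_id, text in docs.items()
        if q in {normalize(w) for w in text.split()}
    }


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus: docs 1 and 2 are about network connections, doc 3 is not.
DOCS = {
    1: "network connections dropped",
    2: "connection timeout on network",
    3: "connected graph components",
}
RELEVANT = {1, 2}

exact = search("connections", DOCS, lambda w: w)   # no stemming
stemmed = search("connections", DOCS, crude_stem)  # match on stems

print(precision_recall(exact, RELEVANT))
print(precision_recall(stemmed, RELEVANT))
```

Without stemming only document 1 matches, so precision is perfect but one relevant document is missed. With stemming all three documents match on the stem "connect", so recall rises to 1 while the irrelevant document 3 lowers precision.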

Topic-Sensitive Pagerank (TSPR)

Suppose we are given a graph of the web and two topics: "miley cyrus" and "other".

Page nodes are labeled with topics, and each page gets one score per topic.

We run our random surfer analysis once for each topic:

  • instead of randomly jumping to any page, the surfer jumps only to pages labeled with the current topic

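Thus, if we have 4 pages A, B, C, and D, where only A and B match our topic, every row of our teleport matrix puts probability 1/2 on A, 1/2 on B, and 0 on C and D (rows and columns ordered A, B, C, D):

$$
T = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 \\
1/2 & 1/2 & 0 & 0 \\
1/2 & 1/2 & 0 & 0 \\
1/2 & 1/2 & 0 & 0
\end{pmatrix}
$$
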
As a result, our Markov chain converges to a PageRank weighted toward our topic.

Note: our "hack" row in the link matrix is identical to the rows in our teleport matrix.
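
The procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the lecture's implementation: the link graph `LINKS`, the teleport probability `alpha = 0.15`, the fixed iteration count, and the function names are all hypothetical. Pages with no outlinks are treated as the "hack" row, meaning the surfer always jumps from them according to the teleport row.

```python
def teleport_row(pages, topic_pages):
    """One row of the teleport matrix: uniform over topic pages, zero elsewhere."""
    return [1.0 / len(topic_pages) if p in topic_pages else 0.0 for p in pages]


def topic_sensitive_pagerank(links, topic_pages, alpha=0.15, iterations=100):
    """Power iteration where every random jump lands on a topic page.

    links: dict mapping each page to the list of pages it links to.
    alpha: assumed probability of teleporting instead of following a link.
    """
    pages = sorted(links)
    index = {p: i for i, p in enumerate(pages)}
    jump = teleport_row(pages, topic_pages)
    rank = [1.0 / len(pages)] * len(pages)  # start from the uniform distribution

    for _ in range(iterations):
        new_rank = [0.0] * len(pages)
        for p in pages:
            mass = rank[index[p]]
            outlinks = links[p]
            if outlinks:
                # Follow a link with probability 1 - alpha.
                for q in outlinks:
                    new_rank[index[q]] += (1 - alpha) * mass / len(outlinks)
                jump_mass = alpha * mass
            else:
                # Dead end: the "hack" row is the teleport row itself.
                jump_mass = mass
            for i, prob in enumerate(jump):
                new_rank[i] += jump_mass * prob
        rank = new_rank

    return dict(zip(pages, rank))


# Hypothetical 4-page graph; only A and B are labeled with the topic.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": []}
scores = topic_sensitive_pagerank(LINKS, topic_pages={"A", "B"})
print(scores)
```

Running the function once per topic, each time with that topic's page set, gives every page one score per topic.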


Hubs and Authorities or Hyperlink Induced Topic Search (HITS)

Developed independently (of PageRank) on the East Coast by Jon Kleinberg at Cornell University.

Authority
A page that directly satisfies my information need
Hub
A page that aggregates links to other pages

Every page has two scores:

  • an authority score $a(p)$, and
  • a hub score $h(p)$.

Good authorities are pointed to by good hubs, and good hubs point to good authorities

Similar to our PageRank algorithm, we start out with authority score $a(p) = 1$ and hub score $h(p) = 1$ for all pages $p$.

Next we proceed to iteratively update all hub and authority scores
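
The iteration can be sketched as follows. This is a minimal sketch under stated assumptions: a page's authority score becomes the sum of the hub scores of the pages linking to it, and its hub score becomes the sum of the authority scores of the pages it links to, which is the usual reading of "good authorities are pointed to by good hubs, and good hubs point to good authorities". The function name `hits`, the fixed iteration count, the choice to rescale both score vectors to unit length each round, and the toy graph are not from the notes.

```python
import math


def hits(links, iterations=50):
    """Iteratively update hub and authority scores.

    links: dict mapping a page to the list of pages it links to.
    Returns (hub, authority) dicts, each scaled to unit Euclidean length.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages pointing to each page.
        new_auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                new_auth[q] += hub[p]

        # Hub update: sum the (freshly updated) authority scores of the
        # pages each page points to.
        new_hub = {p: sum(new_auth[q] for q in links.get(p, [])) for p in pages}

        # Rescale so the scores do not grow without bound (assumed L2 norm).
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth


# Hypothetical graph: two hub-like pages pointing at two content pages.
LINKS = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1"],
    "auth1": [],
    "auth2": [],
}
hub_scores, auth_scores = hits(LINKS)
print(hub_scores)
print(auth_scores)
```

In this toy graph `auth1` is linked from both hubs and ends up with the highest authority score, while `hub1` links to both authorities and ends up with the highest hub score.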