CSCE 470 Lecture 12

Monday, September 23, 2013

Quiz Review

tf-idf / cos vector question
why does tf matter?
why does idf matter?
no link analysis
terms (know definitions, advantages/disadvantages)
- NDCG
- precision
- recall
Zipf's law
Heaps' law
tokenization
normalization
stemming
lemmatization
statistical properties of text

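Would stemming affect precision, recall, both, or neither? Both: it generally improves recall, because different forms of a word now match the same stem, but it tends to hurt precision, because unrelated words that share a stem also match.
Is tf or idf more important when searching tweets? idf: tweets are so short that a term rarely appears more than once, so tf is almost always 0 or 1 and does little to distinguish documents.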
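
The tweet answer above can be checked with a small sketch. The three tweets are made up for illustration, and the weighting used here (raw tf times $\log_{10}(N/df)$) is one common tf-idf variant, assumed rather than taken from the lecture.

```python
from collections import Counter
import math

# Hypothetical tweets, invented for this illustration.
tweets = [
    "miley cyrus is on tv tonight",
    "watching tv with the family",
    "the new miley cyrus song is out",
]

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log10(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log10(n_docs / df[t]) for t in tf})
    return weights

weights = tf_idf(tweets)

# Largest term frequency found in any single tweet.
max_tf = max(max(Counter(t.lower().split()).values()) for t in tweets)
```

In this toy collection no term repeats within a tweet, so every tf is 1 and the differences between weights come entirely from idf: a rare term such as "tonight" outweighs a more common one such as "tv".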

Topic-Sensitive PageRank (TSPR)

Suppose we are given a graph of the web and two topics: "miley cyrus" and "other".

Page nodes are labeled with topics, and with $n$ topics each page gets $n$ scores, one for each topic.

We run the random-surfer analysis once for each topic.

Instead of teleporting to any page at random, the surfer teleports only to pages labeled with the topic being scored.

Thus if we have 4 pages A, B, C, and D, where only A and B match our topic, our teleport matrix would be as follows:

${\begin{bmatrix}{\frac {1}{2}}&{\frac {1}{2}}&0&0\\{\frac {1}{2}}&{\frac {1}{2}}&0&0\\{\frac {1}{2}}&{\frac {1}{2}}&0&0\\{\frac {1}{2}}&{\frac {1}{2}}&0&0\\\end{bmatrix}}$
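
A minimal sketch of one topic's run, built around the teleport matrix above. The link graph, the teleport probability `alpha = 0.15`, and the function names are assumptions made for illustration, not details from the lecture.

```python
def teleport_matrix(n_pages, topic_pages):
    """Every row jumps uniformly to the pages labeled with the topic."""
    share = 1.0 / len(topic_pages)
    row = [share if j in topic_pages else 0.0 for j in range(n_pages)]
    return [row[:] for _ in range(n_pages)]

def topic_sensitive_pagerank(links, topic_pages, alpha=0.15, iters=100):
    """Power iteration for one topic.

    links[i] lists the pages that page i links to.
    alpha is the teleport probability (an assumed value).
    """
    n = len(links)
    T = teleport_matrix(n, topic_pages)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [0.0] * n
        for i in range(n):
            out = links[i]
            for j in range(n):
                if out:
                    # Follow a link with probability 1 - alpha, else teleport.
                    follow = (1.0 / len(out)) if j in out else 0.0
                    p = (1 - alpha) * follow + alpha * T[i][j]
                else:
                    # A page with no out-links always teleports.
                    p = T[i][j]
                new[j] += rank[i] * p
        rank = new
    return rank

# Pages A, B, C, D are indices 0..3; only A and B match the topic.
# Hypothetical link structure: A -> B, C; B -> C; C -> A; D -> C.
links = [[1, 2], [2], [0], [2]]
topic = {0, 1}
T = teleport_matrix(4, topic)
scores = topic_sensitive_pagerank(links, topic)
```

In this example graph D ends up with a score of zero for the topic, since nothing links to it and the surfer never teleports there. C still earns a score even though it is off-topic, because the on-topic pages link to it.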