CSCE 470 Lecture 11

From Notes
Jump to navigation Jump to search

« previous | Wednesday, September 18, 2013 | next »


Pagerank

Random surfer

Transition Probability Matrix

"If I'm on (row), what's the probability that I will go to (col)" Each row must sum to 1, but what about pages without any outlinks?

When surfer gets bored, magically pick a random page and continue walking. This is called the "I'm bored matrix" or the "teleport matrix":

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \begin{bmatrix} \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \end{bmatrix}}

Let's use Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha} to represent our "am I not bored?" coin. So I get bored at each page with probability Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle (1-\alpha)} .

For pages with no outlinks, we'll use a "hack" row from our teleport matrix (since our only way out is to teleport). Thus our "follow link matrix" becomes

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & 0 & 0 & 0 \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \end{bmatrix}}

Now we can apply our "am I bored?" parameter to represent the choices we will make:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha \, \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & 0 & 0 & 0 \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \end{bmatrix} + (1-\alpha) \, \begin{bmatrix} \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \end{bmatrix} = \begin{bmatrix} \frac{1}{16} & \frac{13}{16} & \frac{1}{16} & \frac{1}{16} \\ \frac{1}{16} & \frac{1}{16} & \frac{7}{16} & \frac{7}{16} \\ \frac{13}{16} & \frac{1}{16} & \frac{1}{16} & \frac{1}{16} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \end{bmatrix} }

This shall henceforth be informally known as the "final super mega" matrix.

Markov Property

"I'm forgetful. I forgot where I was. All I know is that I'm here, so what do I do now?"

Formally: Markov chains are stateless; no history

Encode current location as a vector. Suppose the surfer is at B: Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \vec{x}_0 = \left\langle 0, 1, 0, 0 \right\rangle}

Our next position will be Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \vec{x}_1 = \vec{x}_0 \, \hat{P}} , then Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \vec{x}_2 = \vec{x}_1 \, \hat{P} = \vec{x}_0 \, \hat{P}^2}

We continue to infinity, and our position magically converges. This converged vector becomes our Pagerank

Things to take away

  • Now to blow our mind: it doesn't matter where we start, Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \lim_{n \to \infty} x_n}
  • If we start with the converged vector, it converges in one step

Since start position actually doesn't matter; we could start simultaneously at all pages: Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \vec{x}_0 = \left\langle \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4} \right\rangle}

Bottom line:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \vec{x} = \lim_{n \to \infty} \vec{x}_0 \, \hat{P}^n}

Thus the PageRank vector Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \vec{x}} is a left eigenvector of the transition matrix Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{P}} . (i.e. Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \vec{x}} is an eigenvector of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \hat{P}^T} .)

Solving the Markov Chain

Recall the equations we came up with last time:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \begin{align} PR(A) &= \alpha \, PR(C) + \frac{1-\alpha}{n} \\ PR(B) &= \alpha \, PR(A) + \frac{1-\alpha}{n} \\ PR(C) &= \alpha \, PR(B) + \frac{1-\alpha}{n} \\ PR(D) &= \alpha \, \frac{PR(B)}{2} + \frac{1-\alpha}{n} \end{align}}

The Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \frac{1-\alpha}{n}} represents our teleport probability from our matrix above.

Let's add subscripts to reflect the fact that the values change at each iteration.

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \begin{align} PR(A)_t &= \alpha \, PR(C)_{t-1} + \frac{1-\alpha}{n} \\ PR(B)_t &= \alpha \, PR(A)_{t-1} + \frac{1-\alpha}{n} \\ PR(C)_t &= \alpha \, PR(B)_{t-1} + \frac{1-\alpha}{n} \\ PR(D)_t &= \alpha \, \frac{PR(B)_{t-1}}{2} + \frac{1-\alpha}{n} \end{align}}

We start our 0th iteration Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle t=0} with all pagerank values equal to 1.

As Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle t \to \infty} , the values will converge to the same thing as our condition matrix.

Obviously, since the system of equations can be represented as a matrix.