PageRank

From Notes
Jump to navigation Jump to search

Pagerank is a measure of a node's "popularity" within a graph.


Rationale

Random graph for PageRank

Consider the directed graph at right.

If we start on node and choose to follow an edge at random, we will land on or with equal probability . If we choose to follow a link again, we will arrive at , , or with the following probabilities:

The process can be repeated indefinitely, and the probabilities will converge to a stable value. These values are the PageRank values of the nodes.


Mathematical Representation

Positional Probability

We choose to represent the probabilities of arriving at a node as a vector. Using the example above with coordinates for , , , and in that order, we can represent the probabilities as follows:

  • Starting at node :
  • After one step:
  • After two steps:

Graph Traversal (Transformation)

Let be a probability vector as described above. The steps taken in determining the probability resulting from graph edge traversal be described by multiplication by a matrix so that . The cells of this matrix represent the relative probability of following edges to other nodes, hence we shall call it the transformation probability matrix, although it will be shown later that this matrix requires some modification in order to work correctly.

Here we choose to represent the cell as the probability of arriving at node if we were originally on node . Observe that the sum of each column of this matrix is 1. Continuing with our running example, the initial transformation probability matrix for the above example is given by:

Dead-Ends and Cycles

There is a slight problem with the transformation probability matrix representation above. If there are "dead-end" nodes in the graphs (i.e. no out-edges), then no more steps can be taken once those nodes are reached. If we represent this as a self-loop edge on the node, then the probability limit will eventually converge to only these dead-end nodes.

To alleviate this problem, we introduce an additional transformation matrix that represents the probability of jumping from any node to any other node. This is an matrix whose values are all :

This matrix also has the property that all column sums are 1. For the example above, the "teleport" transformation matrix is:


Definition

Taking a weighted linear combination of the two transformation matrices given above with weight parameter produces an adjusted transformation probability matrix. This parameter represents the probability that the position randomly teleports from the current node to any other node in the graph.

Starting at a positional probability vector , we define the process of taking a randomized single step in graph traversal as

Since the PageRank vector is the converged positional probability vector, we can write

However, this limit is usually not feasible to evaluate, so we employ the following property: Assuming is the converged PageRank vector, we have

Hence is a right eigenvector of for eigenvalue


Computation