CSCE 470 Lecture 25

Wednesday, October 23, 2013

Privacy

Root mean squared error (RMSE) is used to evaluate the effectiveness of a model or estimator (in our case, a recommender based on ratings):

${\mbox{RMSE}}={\sqrt {\frac {\sum \left({\mbox{predicted}}-{\mbox{actual}}\right)^{2}}{n}}}$

Given matrix of data:

${\begin{bmatrix}4&&3&\\3&4&2^{*}&\\3&5^{*}&5&\\&&4^{*}&5\\\end{bmatrix}}$

The starred items are not known to the algorithm. Suppose the algorithm gives the following scores:

${\begin{bmatrix}4&&3&\\3&4&3&\\3&2&5&\\&&1&5\\\end{bmatrix}}$

The RMSE is

${\sqrt {\frac {(3-2)^{2}+(2-5)^{2}+(1-4)^{2}}{3}}}\approx 2.516611478$

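The goal is to minimize this value. For example, an RMSE of 1 is what we would get if the recommender were off by exactly one on every hidden rating. Other error patterns can also produce an RMSE of 1, and because the errors are squared, a few large misses are penalized more heavily than many small ones.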
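The calculation above can be checked with a few lines of Python. This is a minimal sketch: the function name `rmse` and the two lists are illustrative choices, with the lists holding only the three starred ratings and the scores the algorithm gave for them.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over paired predictions and true values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hidden (starred) ratings from the example and the algorithm's scores for them.
actual = [2, 5, 4]
predicted = [3, 2, 1]

example_rmse = rmse(predicted, actual)
print(round(example_rmse, 4))  # 2.5166
```

Only the hidden entries enter the sum, so $n=3$ here even though the matrix has more known ratings.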

Does rating really equal enjoyment?

Well, ${\mbox{rating}}={\mbox{enjoyment}}({\mbox{context}})$

To find similarity (as in KNN), we need:

- a representation (usually a vector)
  - ratings
  - watched-or-not (binary)
  - distance from average
- a way to compare
  - Manhattan
  - Euclidean
  - Cosine
  - Jaccard
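The representations and comparison measures listed above can be sketched in plain Python. Everything here is an assumption made for illustration: the function names, encoding a missing rating as 0, and computing "distance from average" as each rating minus that user's mean over the items they rated. The two sample users are the first two rows of the example data matrix.

```python
import math

def manhattan(u, v):
    """Sum of absolute coordinate differences (a distance: lower = closer)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Straight-line distance (lower = closer)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (a similarity: higher = closer)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)

def jaccard(items_u, items_v):
    """Shared items divided by all items either user has (higher = closer)."""
    items_u, items_v = set(items_u), set(items_v)
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def watched_set(ratings):
    """Watched-or-not representation: indices of the rated items."""
    return {i for i, r in enumerate(ratings) if r != 0}

def mean_centered(ratings):
    """Distance-from-average representation; unrated items stay at 0."""
    rated = [r for r in ratings if r != 0]
    if not rated:
        return [0.0] * len(ratings)
    mean = sum(rated) / len(rated)
    return [r - mean if r != 0 else 0.0 for r in ratings]

# First two rows of the example matrix, with blanks encoded as 0 (assumption).
user_a = [4, 0, 3, 0]
user_b = [3, 4, 2, 0]

print(manhattan(user_a, user_b))                                    # 6
print(round(euclidean(user_a, user_b), 4))                          # 4.2426
print(round(cosine_similarity(user_a, user_b), 4))                  # 0.6685
print(round(jaccard(watched_set(user_a), watched_set(user_b)), 4))  # 0.6667
print(round(cosine_similarity(mean_centered(user_a),
                              mean_centered(user_b)), 4))           # 0.5
```

Note that Manhattan and Euclidean are distances (smaller means more similar), while cosine and Jaccard are similarities (larger means more similar), so a KNN implementation has to sort in the right direction for whichever measure it uses.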

Once we've found several similar users, how should we compute a predicted score from their ratings?

Find groups of people who have similar tastes
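For the question of how to turn the neighbors' ratings into a score, one common option is a similarity-weighted average over the k most similar users. The notes do not say which method the lecture used, so treat this as a sketch under assumptions: the function name `predict_rating`, the `(similarity, rating)` input format, and the sample numbers are all hypothetical, and similarities are assumed to be non-negative.

```python
def predict_rating(neighbors, k=2):
    """Similarity-weighted average of the k most similar users' ratings.

    neighbors: list of (similarity, rating) pairs, one per user who rated
    the item. Similarities are assumed to be non-negative.
    Returns None when there is nothing usable to average.
    """
    top = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:k]
    total_weight = sum(sim for sim, _ in top)
    if total_weight <= 0:
        return None
    return sum(sim * rating for sim, rating in top) / total_weight

# Hypothetical neighbors: (similarity to the target user, their rating).
neighbors = [(0.9, 5), (0.6, 3), (0.1, 1)]

prediction = predict_rating(neighbors, k=2)
print(round(prediction, 2))  # 4.2
```

With k=2 the least similar user is ignored, and the closer neighbor's rating of 5 pulls the prediction above the plain average of 4.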