CSCE 470 Lecture 21

Friday, October 11, 2013

Classifications

A classifier finds a "separating hyperplane" between classes of data (think of a Voronoi diagram).

We are still given a set of classes $C$ and a corpus of documents $D$.

$$P(A|B)=\frac{P(B|A)\,P(A)}{P(B)}$$

Think of $A$ as a class and $B$ as a document.

We want to estimate $P(c_{i}|d)$ for each class $c_{i}$ and a document $d$.

With respect to Bayes' theorem, we use a small trick: $P(d)$ is the same for every class, so it does not affect which class scores highest and can be ignored.

Thus $P(c_{i}|d)\propto P(d|c_{i})\,P(c_{i})$.

We want to find the best class $c_{MAP}$, the one that gives us the highest probability:

$$c_{MAP}=\underset{c_{i}\in C}{\mathrm{argmax}}\ P(c_{i}|d)$$

where MAP stands for maximum a posteriori.
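
As a minimal sketch of the MAP decision rule, assume the unnormalized scores $P(d|c_{i})\,P(c_{i})$ have already been computed for one document. The function name `map_class`, the class labels, and the numbers are hypothetical and not taken from the lecture. The sketch also checks that dividing every score by the same $P(d)$ leaves the winning class unchanged.

```python
def map_class(scores):
    """Return the class label with the highest score.

    `scores` maps each class label to P(d|c_i) * P(c_i) (or to any
    quantity proportional to it).
    """
    return max(scores, key=scores.get)


# Hypothetical unnormalized scores P(d|c_i) * P(c_i) for one document.
scores = {"sports": 0.003, "politics": 0.012, "tech": 0.006}
best = map_class(scores)

# Assuming the classes are exhaustive and mutually exclusive,
# P(d) is the sum of the unnormalized scores.
p_d = sum(scores.values())
posteriors = {c: s / p_d for c, s in scores.items()}

# Normalizing by P(d) rescales every score equally, so the argmax is the same.
same_winner = map_class(posteriors) == best
```

Because only the ranking matters, the division by $P(d)$ can be skipped whenever the goal is just to pick a class.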

Suppose we have $n_{i}$ documents in class $c_{i}$, for a corpus of size $N=\sum _{i}n_{i}$.

We can estimate the probability of a class as

$$P(c_{i})=\frac{n_{i}}{N}$$
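
The prior estimate can be sketched in a few lines, assuming the training corpus is available as a list with one class label per document. The helper name `estimate_priors` and the example labels are hypothetical.

```python
from collections import Counter


def estimate_priors(labels):
    """Estimate P(c_i) = n_i / N from one class label per document."""
    counts = Counter(labels)      # n_i for each class
    total = sum(counts.values())  # N = sum of all n_i
    return {c: n / total for c, n in counts.items()}


# Hypothetical corpus of five labeled documents.
labels = ["spam", "ham", "ham", "spam", "ham"]
priors = estimate_priors(labels)
```

Here `priors` holds one probability per class, and the values sum to 1 by construction.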

We can estimate $P(d|c_{i})$ by treating the terms of document $d$ as independent given the class and multiplying the per-term probabilities $P(t|c_{i})$, where $T$ denotes the terms $t$ in $d$:

$$P(d|c_{i})=\prod_{t\in T}P(t|c_{i})$$
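
The product above can be sketched as follows, assuming the per-term probabilities $P(t|c_{i})$ for one class are already available in a dictionary. How those probabilities are estimated is not covered in these notes, so the values below are made up. The names `likelihood` and `log_likelihood` are hypothetical, and the sketch multiplies once per entry of the term list, which is an assumption about how $T$ treats repeated terms. The log-space variant is a common way to avoid floating-point underflow on long documents, not something stated in the lecture.

```python
import math


def likelihood(doc_terms, term_probs):
    """P(d|c_i) as the product of P(t|c_i) over the terms of d."""
    p = 1.0
    for t in doc_terms:
        p *= term_probs[t]
    return p


def log_likelihood(doc_terms, term_probs):
    """log P(d|c_i) as a sum of logs; same ranking, no underflow."""
    return sum(math.log(term_probs[t]) for t in doc_terms)


# Made-up values of P(t|c_i) for a single class c_i.
term_probs = {"ball": 0.5, "game": 0.25, "vote": 0.125}
doc = ["ball", "game"]

p = likelihood(doc, term_probs)
log_p = log_likelihood(doc, term_probs)
```

A term missing from `term_probs` raises a `KeyError` in this sketch; a real classifier would need some policy for unseen terms.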