CSCE 470 Lecture 22


Monday, October 14, 2013


No class Friday.


Naïve Bayes Classifier

Spam or not spam. That is the question.

Last time, we discussed assigning classes using naïve Bayes.
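
As a reminder, the standard naïve Bayes decision rule can be written as follows. The symbols are notation assumed here rather than taken from the lecture: $c$ is a class (spam or not spam) and $w_1, \dots, w_n$ are the words of the document.

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```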

Let's classify a new email. Based on the prior alone (90% of all emails are spam), this message is probably spam even before we look at it.

Let's look at the words in the document and factor their probabilities in.


Suppose a regular "spam" message has about 2000 words and a regular "not spam" message has 1000 words.

term       spam   not spam
"click"    100    10

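The "click" example can be worked through numerically. This is a minimal sketch, assuming the 2000 and 1000 word figures serve as the denominators for estimating P(word | class) and that the 90% figure is the prior P(spam). The names `priors`, `total_words`, `counts`, `score`, and `classify` are illustrative, not from the lecture.

```python
# Assumed inputs taken from the notes: P(spam) = 0.9, and the 2000 / 1000
# word figures are treated as the per-class totals for P(word | class).
priors = {"spam": 0.9, "not spam": 0.1}
total_words = {"spam": 2000, "not spam": 1000}
counts = {"click": {"spam": 100, "not spam": 10}}


def score(words, cls):
    """Unnormalized naive Bayes score: P(cls) * product of P(word | cls)."""
    s = priors[cls]
    for w in words:
        if w in counts:  # words unseen in training are simply ignored here
            s *= counts[w][cls] / total_words[cls]
    return s


def classify(words):
    """Return the class with the highest score."""
    return max(priors, key=lambda cls: score(words, cls))


scores = {cls: score(["click"], cls) for cls in priors}
# spam:     0.9 * 100/2000 = 0.045
# not spam: 0.1 * 10/1000  = 0.001
posterior_spam = scores["spam"] / sum(scores.values())  # about 0.978
print(classify(["click"]), posterior_spam)
```

Under these assumptions, seeing "click" pushes the estimate from the 90% prior to roughly 98% spam.
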
What if a word does not occur in the training data?

  • ignore it
  • give it a small nonzero probability (smoothing)
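
The smoothing option can be sketched with add-one (Laplace) smoothing, which is one common choice and not necessarily the one used in lecture. The vocabulary size `VOCAB_SIZE` and the unseen word "zebra" are hypothetical values chosen only for illustration; the counts reuse the "click" example.

```python
counts = {"click": {"spam": 100, "not spam": 10}}
total_words = {"spam": 2000, "not spam": 1000}
VOCAB_SIZE = 5000  # hypothetical number of distinct words in the training data


def smoothed_prob(word, cls, alpha=1.0):
    """Add-alpha estimate of P(word | cls); never zero, even for unseen words."""
    count = counts.get(word, {}).get(cls, 0)
    return (count + alpha) / (total_words[cls] + alpha * VOCAB_SIZE)


p_seen = smoothed_prob("click", "spam")    # (100 + 1) / (2000 + 5000)
p_unseen = smoothed_prob("zebra", "spam")  # (0 + 1) / (2000 + 5000)
print(p_seen, p_unseen)
```

Without smoothing, a single unseen word would contribute a probability of zero and wipe out the whole product for that class.
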


Confusion Matrix

                      It really is
                      spam    not spam
I predict  spam       50      40
           not spam   3       7

Accuracy: (number of correct predictions out of number of predictions) = 57/100

Precision: (number of correct spam predictions out of number of spam predictions) = 50/90
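
Both metrics can be checked directly from the four cells of the confusion matrix. This is a minimal sketch, treating "spam" as the positive class; the variable names `tp`, `fp`, `fn`, and `tn` are the usual shorthand and are not from the lecture.

```python
# Cells of the confusion matrix above, with "spam" as the positive class.
tp = 50  # predicted spam, really spam
fp = 40  # predicted spam, really not spam
fn = 3   # predicted not spam, really spam
tn = 7   # predicted not spam, really not spam

total = tp + fp + fn + tn        # 100 predictions in all
accuracy = (tp + tn) / total     # correct predictions / all predictions = 57/100
precision = tp / (tp + fp)       # correct spam predictions / spam predictions = 50/90
print(accuracy, precision)
```
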