CSCE 470 Lecture 22
Monday, October 14, 2013
No class Friday.
Naïve Bayes Classifier
Spam or not spam. That is the question.
Last time, we discussed assigning classes using naïve Bayes.
Let's classify a new email. Based on the prior alone (90% of all emails are spam), this message is probably spam even without looking at it.
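For reference, the usual form of the naïve Bayes decision rule can be written as below. The symbols are notation assumed here, not taken from the lecture: $c$ is a class, $w_1, \dots, w_n$ are the words of the message, and $\hat{c}$ is the predicted class.

```latex
\hat{c} = \arg\max_{c \in \{\text{spam},\ \text{not spam}\}} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

$P(c)$ is the prior (how common the class is), and each $P(w_i \mid c)$ is estimated from word counts in the training data for that class.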
Let's look at the words in the document and factor their probabilities in.
Suppose a regular "spam" message has about 2000 words and a regular "not spam" message has 1000 words.
term | spam | not spam |
---|---|---|
"click" | 100 | 10 |
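As a rough sketch of how the prior and the word counts combine, the following scores a message containing only the word "click". It assumes that 2000 and 1000 are the total word counts for each class, that the table entries are occurrences of "click", and that the not-spam prior is 0.1 (the complement of 90%). The variable and function names are illustrative only.

```python
# Assumed reading of the numbers above: 2000 / 1000 are per-class word totals,
# and the table holds the number of occurrences of "click" in each class.
prior = {"spam": 0.9, "not spam": 0.1}  # 0.1 assumed as the complement of 0.9
total_words = {"spam": 2000, "not spam": 1000}
click_count = {"spam": 100, "not spam": 10}


def likelihood(cls):
    """Estimate P("click" | cls) as a relative frequency."""
    return click_count[cls] / total_words[cls]


def unnormalized_score(cls):
    """Prior times likelihood: P(cls) * P("click" | cls)."""
    return prior[cls] * likelihood(cls)


scores = {cls: unnormalized_score(cls) for cls in prior}

# Normalize so the two scores sum to 1 (posterior probabilities).
evidence = sum(scores.values())
posterior = {cls: s / evidence for cls, s in scores.items()}

predicted = max(posterior, key=posterior.get)
print(scores)
print(posterior)
print(predicted)
```

Under these assumptions, "click" is five times as likely in spam (100/2000 = 0.05) as in not spam (10/1000 = 0.01), so seeing it pushes the estimate further toward spam than the prior alone.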
What if a word does not occur in the training data?
- ignore it
- assign it a small nonzero probability (smoothing)
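One common way to do the second option is add-one (Laplace) smoothing. The notes do not say which smoothing method was used in class, so this is only a sketch of that one technique. The vocabulary size of 5000 is a made-up value for illustration, and the word totals reuse the spam figure of 2000 from earlier.

```python
def smoothed_probability(count, total_words, vocab_size, alpha=1.0):
    """Additive smoothing: (count + alpha) / (total_words + alpha * vocab_size).

    With alpha = 1 this is add-one (Laplace) smoothing, so a word that was
    never seen in a class still gets a small nonzero probability.
    """
    return (count + alpha) / (total_words + alpha * vocab_size)


VOCAB_SIZE = 5000  # hypothetical number of distinct words in the training data

# A word that never occurred in spam training data.
p_unseen_spam = smoothed_probability(0, 2000, VOCAB_SIZE)

# "click", which occurred 100 times in spam training data.
p_click_spam = smoothed_probability(100, 2000, VOCAB_SIZE)

print(p_unseen_spam)
print(p_click_spam)
```

Without smoothing, a single unseen word would have probability 0 and would zero out the whole product for that class, no matter what the other words say.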
Confusion Matrix
I predict | really spam | really not spam |
---|---|---|
spam | 50 | 40 |
not spam | 3 | 7 |
Accuracy (number of correct predictions out of number of predictions): (50 + 7)/100 = 57/100
Precision (number of messages that really are spam out of number of spam predictions): 50/90
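The two measures can be checked directly from the confusion matrix counts. In this small sketch, accuracy is the number of correct predictions over all predictions, and precision for the spam class is the number of real spam messages among those predicted as spam. The dictionary layout and variable names are illustrative choices.

```python
# Confusion-matrix counts keyed by (predicted label, actual label).
confusion = {
    ("spam", "spam"): 50,
    ("spam", "not spam"): 40,
    ("not spam", "spam"): 3,
    ("not spam", "not spam"): 7,
}

# Accuracy: correct predictions (diagonal cells) over all predictions.
total = sum(confusion.values())
correct = sum(n for (pred, actual), n in confusion.items() if pred == actual)
accuracy = correct / total

# Precision for spam: real spam among everything predicted as spam.
predicted_spam = sum(n for (pred, _), n in confusion.items() if pred == "spam")
precision_spam = confusion[("spam", "spam")] / predicted_spam

print(f"accuracy = {correct}/{total} = {accuracy:.2f}")
print(f"precision (spam) = {confusion[('spam', 'spam')]}/{predicted_spam} = {precision_spam:.3f}")
```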