CSCE 420 Lecture 28


Friday, April 26, 2013

End Exam 2 content


Honors Lecture

Machine Learning

"Learning from experience" (in true light of "intelligence")

→ Adapt and Improve performance

  • Knowledge Base Repair: correct errors in KB rules
  • Neural Networks: sensor inputs flow through a "network" of nodes to produce an output


Supervised learning

Labeled examples:

  • Labels: drawn from a finite set of classes/categories; usually small, maybe only positive and negative classes
  • Examples: described as feature vectors; the learned rule maps a state of the features to an action

Driving Example

Labels:

  1. c1 = accelerate
  2. c2 = do nothing
  3. c3 = brake

Features:

  1. F1 = light (red, green, yellow)
  2. F2 = pedestrians
  3. F3 = distance to car in front
  4. F4 = surface wet?

Training Examples:
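For instance, each labeled training example pairs a feature vector with its class. A minimal Python sketch (the feature values below are illustrative, not the ones from lecture):

# Hypothetical labeled training examples for the driving domain.
# Each example: ((light, pedestrians, distance, wet), action)
training_examples = [
    (("green",  False, 50.0, False), "accelerate"),  # c1
    (("green",  False, 10.0, True),  "do nothing"),  # c2
    (("yellow", False, 20.0, False), "brake"),       # c3
    (("red",    True,  15.0, False), "brake"),       # c3
]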

Unsupervised learning

Unlabeled examples

Seems much more abstract: with no labels, the learner must find structure in the data (e.g. clusters) on its own.

Reinforcement learning

Indirect feedback: the agent gets only an occasional reward signal instead of a label for every action.

Credit assignment: deciding which earlier actions deserve credit (or blame) for a delayed reward.

Analyzing Learning

Usually plotted on learning curve graphs:

  • X axis → number of examples
  • Y axis → accuracy
  • Usually plateaus before reaching 100% accuracy


Decision Trees

Each node represents a feature, and branches represent that feature's values:

F1
|-- True: F2
|   |-- True:  (+)
|   `-- False: (-)
`-- False: (+)
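Such a tree can be represented directly as nested data. A minimal Python sketch of the tree above (the tuple layout and function name are assumptions of this sketch):

# The tree above as nested tuples: (feature index, subtree if true,
# subtree if false); leaves are class labels.
tree = (0,               # test F1
        (1, "+", "-"),   # F1 true: test F2
        "+")             # F1 false: positive leaf

def classify(tree, features):
    # Walk down the tree until a leaf (a string label) is reached.
    while not isinstance(tree, str):
        index, if_true, if_false = tree
        tree = if_true if features[index] else if_false
    return tree

print(classify(tree, [True, False]))  # prints "-"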

How does one construct a decision tree?

ID3 Algorithm

At each node: which feature gains the most information?

ID3(examples) returns Tree
  if examples are pure (all one class), return a leaf with that class
  else
    for each feature Fi
      divide examples into 2 subsets A and B by the value of Fi
      calculate information gain IG(Fi) = H(examples) - sum(weight[b] * entropy[b], b in {A, B})
    pick the Fi that maximizes IG
    return Tree(Fi, ID3(A), ID3(B))
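A runnable Python sketch of the algorithm for binary features and a binary class (the function names and data layout are assumptions of this sketch, not from lecture):

import math
from collections import Counter

def entropy(labels):
    # H = -sum(p * log2 p) over the class distribution
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(examples, features):
    # examples: list of (feature_dict, label); features: binary feature names
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: (majority) class
    def gain(f):
        a = [ex for ex in examples if ex[0][f]]
        b = [ex for ex in examples if not ex[0][f]]
        h = lambda s: entropy([y for _, y in s]) if s else 0.0
        # IG = H(examples) - weighted entropy of the two branches
        return entropy(labels) - (len(a) * h(a) + len(b) * h(b)) / len(examples)
    best = max(features, key=gain)
    rest = [f for f in features if f != best]
    a = [ex for ex in examples if ex[0][best]]
    b = [ex for ex in examples if not ex[0][best]]
    return (best,
            id3(a, rest) if a else id3(examples, []),
            id3(b, rest) if b else id3(examples, []))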


Entropy minimization

Over a domain of n examples, if we plot entropy (Y axis) against how many of the examples are positive (X axis), we get an inverted-U curve (roughly a parabola) with roots at 0 and n and a maximum at (n/2, 1).
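This curve is the binary entropy function; writing p for the fraction of positive examples:

H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)

so H = 0 for a pure set (p = 0 or 1) and H = 1 bit for a maximally mixed set (p = 1/2).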

Perceptrons: Linear Classifiers

Labeled examples of binary classes (positive or negative)

If these are plotted as points on a graph, how would we find a line (a discriminant function, or hyperplane) dividing the positives from the negatives?

In other words, if our examples have values x = (x1, ..., xn) and our target classes are {+, -}, we can write our learned rule as a weighted linear combination compared to a threshold c:

if x1 w1 + x2 w2 >= c
    return +
else
    return -
end if

We want to find the weight vector w = (w1, ..., wn) such that the rule above classifies the examples correctly, i.e. the mean squared error between predictions and labels is minimized.


Gradient Descent

Delta Training Rule: for each training example (x, y), update each weight:

wi ← wi + η (y − ŷ) xi, where η is a small learning rate (e.g. 0.001)

This eventually causes a randomly initialized weight vector to migrate toward the best position.
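A minimal Python sketch of this training loop (parameter names and defaults are assumptions of this sketch):

# Delta-rule training for a linear classifier. The threshold is folded
# in as a weight on a constant input of 1 (see the summary below).
def train(examples, n_features, eta=0.001, epochs=100):
    w = [0.0] * (n_features + 1)           # last weight acts as the threshold
    for _ in range(epochs):
        for x, y in examples:              # x: feature list, y: +1 or -1
            xb = list(x) + [1.0]           # append the constant term
            y_hat = sum(wi * xi for wi, xi in zip(w, xb))
            for i in range(len(w)):        # w_i += eta * (y - y_hat) * x_i
                w[i] += eta * (y - y_hat) * xb[i]
    return w

def predict(w, x):
    s = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return +1 if s >= 0 else -1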


This has applications in neural networks, where each node is "trained" to learn its threshold for firing.

Summary

Machine learning boils down to improving a function or a model (hypothesis):

  • X is the instance space
  • Y is a discrete output space
  • h : X → Y is a hypothesis from hypothesis space H
    • for ID3, H is the space of all decision trees
    • for Perceptrons, H is R^(n+1) for an instance space of dimension n (don't forget the threshold is a weight on the constant term!)