Neural Networks (Geoffrey Hinton Course)
Some Simple Models of Neurons
Throughout, $y$ denotes the output and $x_i$ the inputs.
Linear Neurons
$y = b + \sum_{i} x_i w_i$
$w_i$ weights, $b$ bias
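As a minimal sketch of this formula (the function name and numbers are illustrative, not from the course):

```python
import numpy as np

def linear_neuron(x, w, b):
    """Linear neuron: y = b + sum_i x_i * w_i."""
    return b + np.dot(x, w)

x = np.array([1.0, 0.5, -2.0])     # inputs
w = np.array([0.2, -0.1, 0.4])     # weights
print(linear_neuron(x, w, b=0.1))  # 0.1 + (0.2 - 0.05 - 0.8) = -0.55
```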
Binary Threshold Neurons
$z = \sum_{i} x_i w_i$
$y = 1$ if $z \geq \theta$, $0$ otherwise.
Or, equivalently (setting $b = -\theta$),
$z = b + \sum_{i} x_i w_i$
$y = 1$ if $z \geq 0$, $0$ otherwise.
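A minimal sketch showing the two formulations agree when $b = -\theta$ (names and numbers are illustrative):

```python
import numpy as np

def threshold_form(x, w, theta):
    """Fire when the weighted sum of inputs reaches the threshold theta."""
    return 1 if np.dot(x, w) >= theta else 0

def bias_form(x, w, b):
    """Equivalent form with the threshold folded into a bias, b = -theta."""
    return 1 if b + np.dot(x, w) >= 0 else 0

x, w, theta = np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.3
assert threshold_form(x, w, theta) == bias_form(x, w, b=-theta)
```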
Rectified Linear Neurons
$z = b + \sum_{i} x_i w_i$
$y = z$ if $z > 0$, $0$ otherwise. (Linear above zero, with a hard decision at zero.)
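A minimal sketch (illustrative name):

```python
import numpy as np

def rectified_linear_neuron(x, w, b):
    """ReLU: pass the weighted input through unchanged above zero, else 0."""
    z = b + np.dot(x, w)
    return z if z > 0 else 0.0
```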
Sigmoid Neurons
Sigmoid neurons give a real-valued output that is a smooth and bounded function of their total input.
$z = b + \sum_{i} x_i w_i$
$y = \frac{1}{1 + e^{-z}}$
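A minimal sketch (illustrative name):

```python
import numpy as np

def sigmoid_neuron(x, w, b):
    """Sigmoid neuron: smooth, bounded output in (0, 1)."""
    z = b + np.dot(x, w)
    return 1.0 / (1.0 + np.exp(-z))
```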
Stochastic Binary Neurons
Same equations as logistic units, but the output is $1$ (a spike) or $0$, chosen randomly: the output of the logistic is treated as the probability of producing a spike in a short time window.
$z = b + \sum_{i} x_i w_i$
$P(s = 1) = \frac{1}{1 + e^{-z}}$
We can do a similar trick for rectified linear units - in this case the output is treated as the Poisson rate for spikes.
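A minimal sketch of both stochastic variants (illustrative names; the fixed seed is only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binary_neuron(x, w, b):
    """Emit a spike (1) with probability given by the logistic of the input."""
    p = 1.0 / (1.0 + np.exp(-(b + np.dot(x, w))))
    return int(rng.random() < p)

def poisson_relu_neuron(x, w, b):
    """Rectified linear variant: the output is the Poisson rate for spikes."""
    rate = max(0.0, b + np.dot(x, w))
    return rng.poisson(rate)  # spike count in one time window
```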
Types of Learning
Supervised Learning
Learn to predict an output when given an input vector.
- Regression: The target output is a real number or a whole vector of real numbers.
- Classification: The target output is a class label.
How Supervised Learning Typically Works
- Start by choosing a model-class: $y = f(x;W)$
- A model-class $f$ is a way of using some numerical parameters $W$ to map each input vector $x$ into a predicted output $y$.
- Learning usually means adjusting the parameters to reduce the discrepancy between the target output, $t$, on each training case and the actual output, $y$, produced by the model.
- For regression $\frac{1}{2}(y-t)^2$ is often a sensible measure of the discrepancy (a minimal sketch of one gradient step under this measure follows the list).
- For classification there are other measures that are generally more sensible (they also work better).
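As a minimal sketch of one learning step for a linear model under the squared-error measure (data, parameters, and learning rate are illustrative):

```python
import numpy as np

x = np.array([1.0, -0.5])   # input vector
t = 2.0                     # target output
W = np.array([0.1, 0.3])    # model parameters
lr = 0.1                    # learning rate

y = W @ x                   # actual output of the model y = f(x; W)
error = y - t               # dE/dy for E = 0.5 * (y - t)**2
W -= lr * error * x         # gradient step: dE/dW = (y - t) * x
print(W)                    # parameters nudged toward the target
```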
Reinforcement Learning
Learn to select an action to maximize payoff.
- The output is an action or sequence of actions and the only supervisory signal is an occasional scalar reward.
- The goal in selecting each action is to maximize the expected sum of the future rewards.
- We usually use a discount factor for delayed rewards so that we don't have to look too far into the future (see the short sketch after this list).
- Reinforcement learning is difficult because:
- The rewards are typically delayed so it's hard to know where we went wrong (or right).
- A scalar reward does not supply much information.
- You typically can't learn millions of parameters using reinforcement learning (you can with supervised or unsupervised learning); usually only dozens or thousands of parameters are learned.
- Will not be covered in this course.
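A short sketch of the discount idea (rewards and discount factor are illustrative): each reward $r_k$ received $k$ steps in the future is weighted by $\gamma^k$, so distant rewards matter less.

```python
# Discounted return: sum_k gamma**k * r_k.
rewards = [0.0, 0.0, 1.0, 0.0, 5.0]
gamma = 0.9

discounted_return = sum(gamma**k * r for k, r in enumerate(rewards))
print(discounted_return)  # 0.9**2 * 1.0 + 0.9**4 * 5.0 = 4.0905
```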
Unsupervised Learning
Discover a good internal representation of the input.
- For about 40 years unsupervised learning was largely ignored by the machine learning community (except for clustering).
- It is hard to say what the aim of unsupervised learning is:
- One major aim is to create an internal representation of the input that is useful for subsequent supervised or reinforcement learning.
- You can compute the distance to a surface by using the disparity between two images. But you don't want to learn to compute disparities by stubbing your toe thousands of times.
- Other goals:
- Providing a compact, low-dimensional representation of the input.
- High-dimensional inputs typically live on or near a low-dimensional manifold (or several such manifolds).
- Principal Component Analysis is a widely used linear method for finding a low-dimensional representation (a minimal sketch follows this list).
- Providing an economical high-dimensional representation of the input in terms of learned features.
- Binary features
- Real-valued features that are nearly all zero
- Finding sensible clusters in the input
- This is an example of a very sparse code in which only one of the features is non-zero.
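As a minimal sketch of the PCA idea (implemented here with numpy's SVD; the synthetic data is illustrative): points that lie near a low-dimensional manifold can be summarized by their coordinates along the top principal components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 points near a 2-D plane embedded in 10-D space.
basis = rng.normal(size=(2, 10))
X = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 10))

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                # number of components to keep
codes = Xc @ Vt[:k].T                # compact, low-dimensional representation
reconstruction = codes @ Vt[:k] + X.mean(axis=0)
print(np.abs(X - reconstruction).max())  # small residual: data is ~2-D
```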