Neural Networks (Geoffrey Hinton Course)
Revision as of 15:34, 5 November 2016

Some Simple Models of Neurons

$y$ output, $x_i$ input.

Linear Neurons

$y = b + \sum_{i} x_i w_i$

$w_i$ weights, $b$ bias
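The linear neuron above can be sketched in a few lines of Python (the function and variable names here are illustrative, not from the course):

```python
# Linear neuron: y = b + sum_i x_i * w_i
def linear_neuron(x, w, b):
    return b + sum(xi * wi for xi, wi in zip(x, w))
```

For example, with inputs `[1.0, 2.0]`, weights `[0.5, -1.0]`, and bias `0.5`, the output is `0.5 + 0.5 - 2.0 = -1.0`.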

Binary Threshold Neurons

$z = \sum_{i} x_i w_i$

$y = 1$ if $z \geq \theta$, $0$ otherwise.

Or, equivalently,

$z = b + \sum_{i} x_i w_i$

$y = 1$ if $z \geq 0$, $0$ otherwise.
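Using the second (equivalent) formulation with a bias, a binary threshold neuron can be sketched as follows (names are my own):

```python
# Binary threshold neuron: z = b + sum_i x_i * w_i, output 1 if z >= 0, else 0.
def binary_threshold_neuron(x, w, b):
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if z >= 0 else 0
```

With weights `[1, 1]` and bias `-2`, this computes logical AND of two binary inputs.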

Rectified Linear Neurons

$z = b + \sum_{i} x_i w_i$

$y = z$ if $z > 0$, $0$ otherwise. (Linear above zero, with a hard decision boundary at zero.)
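A minimal sketch of the rectified linear neuron (illustrative names):

```python
# Rectified linear neuron: z = b + sum_i x_i * w_i, output z if z > 0, else 0.
def rectified_linear_neuron(x, w, b):
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return z if z > 0 else 0.0
```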

Sigmoid Neurons

Give a real-valued output that is a smooth and bounded function of their total input.

$z = b + \sum_{i} x_i w_i$

$y = \frac{1}{1 + e^{-z}}$
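The sigmoid (logistic) neuron translates directly into code (names are illustrative):

```python
import math

# Sigmoid neuron: z = b + sum_i x_i * w_i, output 1 / (1 + e^(-z)).
def sigmoid_neuron(x, w, b):
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1.0 / (1.0 + math.exp(-z))
```

At $z = 0$ the output is exactly $0.5$; the output is always strictly between $0$ and $1$.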

Stochastic Binary Neurons

Same equations as logistic units, but outputs $1$ (=spike) or $0$ randomly based on the probability. They treat the output of the logistic as the probability of producing a spike in a short time window.

$z = b + \sum_{i} x_i w_i$

$P(s = 1) = \frac{1}{1 + e^{-z}}$

We can do a similar trick for rectified linear units - in this case the output is treated as the Poisson rate for spikes.
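A stochastic binary neuron can be sketched by sampling from the logistic probability (the `rng` parameter is my own addition, so the sampling can be seeded):

```python
import math
import random

# Stochastic binary neuron: compute p = 1 / (1 + e^(-z)) as in a logistic
# unit, then emit a spike (1) with probability p, else 0.
def stochastic_binary_neuron(x, w, b, rng=random):
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    p = 1.0 / (1.0 + math.exp(-z))
    return 1 if rng.random() < p else 0
```

With a strongly positive total input the unit spikes almost always; with a strongly negative one, almost never.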

Types of Learning

Supervised Learning

Learn to predict an output when given an input vector.

  • Regression: The target output is a real number or a whole vector of real numbers.
  • Classification: The target output is a class label.

How Supervised Learning Typically Works

  1. Start by choosing a model-class: $y = f(x;W)$
    • A model-class $f$ is a way of using some numerical parameters $W$ to map each input vector $x$ into a predicted output $y$.
  2. Learning usually means adjusting the parameters to reduce the discrepancy between the target output, $t$, on each training case and the actual output, $y$, produced by the model.
    • For regression $\frac{1}{2}(y-t)^2$ is often a sensible measure of the discrepancy.
    • For classification there are other measures that are generally more sensible (they also work better).
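The two steps above can be sketched for the simplest possible model class, $y = wx$, trained by gradient descent on the squared-error discrepancy $\frac{1}{2}(y-t)^2$ (the model, learning rate, and names are illustrative, not from the course):

```python
# One training step for the model class y = w * x on a single case (x, t):
# adjust w to reduce the discrepancy 0.5 * (y - t)^2.
def train_step(w, x, t, lr=0.1):
    y = w * x              # actual output of the model
    grad = (y - t) * x     # d/dw of 0.5 * (y - t)^2
    return w - lr * grad   # gradient descent update
```

Repeating this step on the case $x = 1, t = 2$ drives $w$ toward $2$, where the discrepancy is zero.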

Reinforcement Learning

Learn to select an action to maximize payoff.

  • The output is an action or sequence of actions and the only supervisory signal is an occasional scalar reward.
    • The goal in selecting each action is to maximize the expected sum of the future rewards.
    • We usually use a discount factor for delayed rewards so that we don't have to look too far into the future.
  • Reinforcement learning is difficult because:
    • The rewards are typically delayed so it's hard to know where we went wrong (or right).
    • A scalar reward does not supply much information.
  • You typically can't learn millions of parameters using reinforcement learning (as you can with supervised or unsupervised learning); you usually learn only dozens or perhaps thousands of parameters.
  • Will not be covered in this course.

Unsupervised Learning

Discover a good internal representation of the input.

  • For about 40 years unsupervised learning was largely ignored by the machine learning community (except for clustering).
  • It is hard to say what the aim of unsupervised learning is:
    • One major aim is to create an internal representation of the input that is useful for subsequent supervised or reinforcement learning.
    • You can compute the distance to a surface by using the disparity between two images. But you don't want to learn to compute disparities by stubbing your toe thousands of times.
  • Other goals:
    • Providing a compact, low-dimensional representation of the input.
      • High-dimensional inputs typically live on or near a low-dimensional manifold (or several such manifolds)
      • Principal Component Analysis is a widely used linear method for finding a low-dimensional representation.
    • Providing an economical high-dimensional representation of the input in terms of learned features.
      • Binary features
      • Real-valued features that are nearly all zero
    • Finding sensible clusters in the input
      • This is an example of a very sparse code in which only one of the features is non-zero.
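The PCA bullet above can be illustrated with a short sketch: project centered data onto its top principal directions (the data and function names here are arbitrary choices for illustration):

```python
import numpy as np

# PCA as a linear method for finding a low-dimensional representation:
# project centered data onto the k directions of largest variance.
def pca_project(X, k):
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)             # covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigh: ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # top-k principal directions
    return Xc @ top                            # k-dimensional representation
```

For points lying exactly on a line in 2-D, a single principal component captures all of the variance, illustrating data that lives on a low-dimensional manifold.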

Neural Network Architectures

  1. Feed-forward architecture: information comes into the input units and flows in one direction through the hidden layers until it reaches the output units.
  2. Recurrent neural network: information can flow around in cycles.
  3. Symmetrically connected network: weights are the same in both directions between two units.

Feed-forward Neural Networks

  • The commonest type of neural network
  • The first layer is the input and the last layer is the output
  • Called "deep" neural networks if there is more than one hidden layer.
  • They compute a series of transformations that change the similarities between cases.
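A feed-forward pass through one hidden layer can be sketched as follows, combining the sigmoid unit from earlier with a linear output layer (layer sizes, weights, and names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One feed-forward pass: input -> sigmoid hidden layer -> linear output layer.
# W_hidden and W_out are lists of weight rows, one row per unit in the layer.
def feed_forward(x, W_hidden, b_hidden, W_out, b_out):
    hidden = [sigmoid(b + sum(xi * wi for xi, wi in zip(x, row)))
              for row, b in zip(W_hidden, b_hidden)]
    return [b + sum(hi * wi for hi, wi in zip(hidden, row))
            for row, b in zip(W_out, b_out)]
```

With all weights and biases zero, every hidden unit outputs $\sigma(0) = 0.5$ and the output layer returns $0$; changing the output weights changes how those hidden activities are combined.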