Machine Learning

Types of Machine Learning

  • Supervised Learning
    • Regression Problem: continuous-valued output (e.g. predicting a price).
    • Classification Problem: discrete-valued output (e.g. predicting a label).
  • Unsupervised Learning
    • Clustering

Linear Regression

Advanced Optimization Algorithms

There are advanced algorithms from numerical computing, beyond plain gradient descent, for minimizing the cost function. For each of the following algorithms, all we need to supply is the cost function $J(\theta)$ and its partial derivatives $\frac{\partial}{\partial \theta_i} J(\theta)$ (see the sketch after the lists below).

  1. Gradient descent
  2. Conjugate gradient
  3. BFGS
  4. L-BFGS

Advantages (of conjugate gradient, BFGS, and L-BFGS over gradient descent)

  • No need to manually pick $\alpha$ (the learning rate)
  • Often faster than gradient descent

Disadvantages

  • More complex
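
As referenced above, a minimal sketch in Octave of handing $J(\theta)$ and its partial derivatives to an advanced optimizer: the built-in fminunc (a quasi-Newton method) takes over from there, with no learning rate to pick by hand. The toy data and the costFunction name are hypothetical; costFunction would live in its own file costFunction.m.

  % costFunction.m -- returns the squared-error cost and its gradient.
  function [J, grad] = costFunction(theta, X, y)
    m = length(y);
    h = X * theta;                          % h_theta(x) for all training examples
    J = (1 / (2 * m)) * sum((h - y) .^ 2);  % cost J(theta)
    grad = (1 / m) * (X' * (h - y));        % partial derivatives dJ/dtheta_i
  end

  % Usage: toy data with y = 2x, so theta should come out close to [0; 2].
  X = [ones(5, 1), (1:5)'];                 % design matrix with an intercept column
  y = [2; 4; 6; 8; 10];
  options = optimset('GradObj', 'on', 'MaxIter', 400);  % 'GradObj': we supply the gradient
  [theta, J_min] = fminunc(@(t) costFunction(t, X, y), zeros(2, 1), options);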

Classification Problem

Cocktail Party Problem

  • Algorithm
    • [W, s, v] = svd((repmat(sum(x.*x, 1), size(x, 1), 1).*x)*x');
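
Read inside-out, the one-liner above factors into the following steps (a hypothetical breakdown, assuming x holds the mixed microphone recordings, one signal per row and one sample per column):

  energy = sum(x .* x, 1);                       % 1 x n row vector: per-sample energy across signals
  weighted = repmat(energy, size(x, 1), 1) .* x; % scale every sample of x by its energy
  [W, s, v] = svd(weighted * x');                % SVD of the resulting m x m matrix; W holds the separating directions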

The logistic regression cost term $y \log h_\theta(x) + (1-y) \log (1-h_\theta(x))$ comes from the maximum likelihood method in statistics.
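
To sketch why: model $y \in \{0, 1\}$ as Bernoulli, $P(y \mid x; \theta) = h_\theta(x)^y (1-h_\theta(x))^{1-y}$. Taking logs gives $\log P(y \mid x; \theta) = y \log h_\theta(x) + (1-y) \log (1-h_\theta(x))$, and maximizing the sum of this over the training set (equivalently, minimizing its negative) yields exactly this cost term.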