Difference between revisions of "Machine Learning"
Revision as of 19:48, 21 May 2016
Types of Machine Learning
- Supervised Learning
- Regression Problem: Continuous valued output.
- Classification Problem: Discrete valued output.
- Unsupervised Learning
- Clustering
Linear Regression
Advanced Optimization Algorithms
Besides gradient descent, numerical computing offers more advanced algorithms for minimizing the cost function. Each of the following algorithms needs only two inputs: the cost function $J(\theta)$ and its partial derivatives $\frac{\partial}{\partial \theta_i} J(\theta)$.
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS
Advantages (of conjugate gradient, BFGS, and L-BFGS over gradient descent)
- No need to manually pick $\alpha$ (the learning rate)
- Often faster than gradient descent
Disadvantages
- More complex
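As a minimal sketch of the interface described above, the following Python code runs plain gradient descent given only a cost function and its partial derivatives. The quadratic cost, the function names (`gradient_descent`, `cost`, `cost_gradient`), and the values of `alpha`, `max_iters`, and `tol` are all assumptions chosen for illustration; they do not come from the notes. Note that `alpha` has to be picked by hand here, which is the step the more advanced algorithms avoid.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def gradient_descent(
    cost: Callable[[Sequence[float]], float],
    grad: Callable[[Sequence[float]], Vector],
    theta0: Sequence[float],
    alpha: float = 0.1,      # learning rate, picked manually (assumed value)
    max_iters: int = 1000,   # iteration cap (assumed value)
    tol: float = 1e-12,      # stop when the cost barely changes (assumed value)
) -> Vector:
    """Minimize `cost` using only J(theta) and its partial derivatives."""
    theta = list(theta0)
    prev_cost = cost(theta)
    for _ in range(max_iters):
        g = grad(theta)
        # Simultaneous update: theta_i := theta_i - alpha * dJ/dtheta_i
        theta = [t - alpha * g_i for t, g_i in zip(theta, g)]
        cur_cost = cost(theta)
        if abs(prev_cost - cur_cost) < tol:
            break
        prev_cost = cur_cost
    return theta


# Hypothetical cost function with its minimum at theta = (3, -1).
def cost(theta: Sequence[float]) -> float:
    return (theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2


# Partial derivatives of the hypothetical cost above.
def cost_gradient(theta: Sequence[float]) -> Vector:
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


theta_min = gradient_descent(cost, cost_gradient, [0.0, 0.0])
print(theta_min)
```

Conjugate gradient, BFGS, and L-BFGS would take the same two callables but choose the step size themselves, so no `alpha` argument would be needed.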
Classification Problem
Cocktail Party Problem
- Algorithm
- [W, s, v] = svd((repmat(sum(x.*x, 1), size(x, 1), 1).*x)*x');
The expression $y \log h_\theta(x) + (1-y) \log (1-h_\theta(x))$ in the classification cost function comes from the maximum likelihood method in statistics.
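A short sketch of the maximum likelihood argument, assuming that $h_\theta(x)$ is interpreted as the probability that $y = 1$ and that the $m$ training examples $(x^{(i)}, y^{(i)})$ are independent. The symbols $m$, $x^{(i)}$, $y^{(i)}$, $L$, and $\ell$, and the $-\frac{1}{m}$ scaling in the last line, are standard notation assumed here, not taken from the notes.

$$P(y \mid x; \theta) = h_\theta(x)^{y} \, (1 - h_\theta(x))^{1-y}, \qquad y \in \{0, 1\}$$

$$L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \, (1 - h_\theta(x^{(i)}))^{1-y^{(i)}}$$

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log (1 - h_\theta(x^{(i)})) \right]$$

$$J(\theta) = -\frac{1}{m} \, \ell(\theta)$$

Under these assumptions, maximizing the log-likelihood $\ell(\theta)$ is the same as minimizing the cost $J(\theta)$.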