1 files changed, 75 insertions, 0 deletions
diff --git a/en_GB/Introduction to Machine Learning/introduction_to_machine_learning.md b/en_GB/Introduction to Machine Learning/introduction_to_machine_learning.md
new file mode 100644
index 0000000..0e17258
--- /dev/null
+++ b/en_GB/Introduction to Machine Learning/introduction_to_machine_learning.md
@@ -0,0 +1,75 @@
+# Introduction to Machine Learning
+
+## Collection of formulas
+
+### Quadratic error function
+
+$$ E(\textbf{w})=\frac{1}{2}\sum\limits_{n=1}^{N}(y(x_n, \textbf{w}) - t_n)^2 $$
+
+### Quadratic error function with regularization
+
+$$ E(\textbf{w})=\frac{1}{2}\sum\limits_{n=1}^{N}(y(x_n, \textbf{w}) - t_n)^2 + \frac{\lambda}{2}\left\|\textbf{w}\right\|^2 $$
+$$ \lambda := \text{Penalty factor} $$
+
+- Also known as "ridge regression"
+
+### Gaussian distribution in 1-D
+
+$$ \mathcal{N}(t|\mu,{\sigma}^{2}) = \frac{1}{\sqrt{2 \pi {\sigma}^{2}}} \exp\left(-\frac{(t - \mu)^2}{2 {\sigma}^{2}}\right) $$
+
+### Probabilistic modelling: likelihood in 1-D
+
+$$ p(t | x_0, \textbf{w}, \beta) = \mathcal{N}(t | y(x_0, \textbf{w}), {\beta}^{-1}) $$
+$$ \beta = \frac{1}{{\sigma}^{2}}\ \text{(precision)} $$
+$$ y(x_0, \textbf{w}) := \text{Output of the model at } x_0 \text{ with parameters } \textbf{w} $$
+
+### Probabilistic modelling: multidimensional likelihood
+
+$$ p(\textbf{t} | \textbf{x}_0, \textbf{w}, {\Sigma}^{-1}) = \mathcal{N}(\textbf{t} | y(\textbf{x}_0, \textbf{w}), \Sigma) $$
+$$ \Sigma := \text{Covariance matrix},\quad {\Sigma}^{-1} := \text{Precision matrix} $$
+$$ y(\textbf{x}_0, \textbf{w}) := \text{Output of the model at } \textbf{x}_0 \text{ with parameters } \textbf{w} $$
+
+### Data-likelihood
+
+- Joint distribution over all data together
+- Individual data points are assumed to be independent
+
+$$ L(\textbf{w}) = P(T | X, \textbf{w}, \beta) = \prod\limits_{n=1}^{N} \frac{1}{c} \exp\left(-\frac{(t_n - y(x_n, \textbf{w}))^2}{2 {\sigma}^{2}}\right) $$
+$$ T := \text{Set of all target values (data)} $$
+$$ X := \text{Set of all inputs} $$
+$$ c := \sqrt{2 \pi {\sigma}^{2}}\ \text{(normalization constant)} $$
+$$ N := \text{Number of data points} $$
+
+### Parameter optimization from data-likelihood
+
+$$ \text{maximize}\ L(\textbf{w}) \Leftrightarrow \text{minimize}\ -\log L(\textbf{w}) $$
+
+- $-\log L(\textbf{w})$ consists of the sum-of-squares error scaled by $\frac{\beta}{2}$, plus terms that are constant with respect to $\textbf{w}$
+- It is therefore sufficient to minimize the sum-of-squares error to find $\textbf{w}_{\text{ML}}$
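+
+Why the constants drop out can be sketched from the definitions above, taking $c = \sqrt{2 \pi {\sigma}^{2}}$ from the 1-D Gaussian and $\beta = 1 / {\sigma}^{2}$:
+
+$$ -\log L(\textbf{w}) = \frac{\beta}{2} \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}) - t_n)^2 - \frac{N}{2} \log \beta + \frac{N}{2} \log(2 \pi) $$
+
+Only the first term depends on $\textbf{w}$, and the factor $\beta > 0$ does not move the position of the minimum.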
+
+$$ \textbf{w}_{\text{ML}} = \text{argmax}_{\textbf{w}}(L(\textbf{w})) = \text{argmin}_{\textbf{w}}(\frac{1}{2} \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}) - t_n)^2) $$
+
+$$ \frac{1}{{\beta}_{\text{ML}}} = \frac{1}{N} \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}_{\text{ML}}) - t_n)^2 $$
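+
+The expression for ${\beta}_{\text{ML}}$ can be checked with a short sketch. It assumes the Gaussian likelihood above, and $S$ is a shorthand introduced here (not part of the original notes) for the sum of squared residuals at $\textbf{w}_{\text{ML}}$. The $\beta$-dependent part of $-\log L$ is $\frac{\beta}{2} S - \frac{N}{2} \log \beta$, so setting its derivative to zero gives:
+
+$$ S = \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}_{\text{ML}}) - t_n)^2 $$
+$$ \frac{\partial}{\partial \beta} \left( \frac{\beta}{2} S - \frac{N}{2} \log \beta \right) = \frac{S}{2} - \frac{N}{2 \beta} = 0 \ \Rightarrow\ \frac{1}{{\beta}_{\text{ML}}} = \frac{S}{N} $$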
+
+### Bayesian inference
+
+$$ P(\textbf{w} | D) = \frac{P(D | \textbf{w}) P(\textbf{w})}{P(D)} $$
+$$ P(\textbf{w} | D) := \text{Posterior} $$
+$$ P(D | \textbf{w}) := \text{Likelihood (model as before)} $$
+$$ P(\textbf{w}) := \text{Prior probability of } \textbf{w} \text{ (higher probability for smaller parameters)} $$
+
+### Parameter optimization for Bayesian approach
+
+$$ \text{maximize}\ P(\textbf{w} | D) \Leftrightarrow \text{minimize}\ -\log P(\textbf{w} | D) $$
+$$ \textbf{w}_{\text{MAP}} = \text{argmax}_{\textbf{w}}(P(\textbf{w} | D)) = \text{argmin}_{\textbf{w}}(\frac{\beta}{2} \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}) - t_n)^2 + \frac{\alpha}{2} \textbf{w}^{T} \textbf{w}) $$
+$$ \alpha := \text{Hyperparameter: precision of the prior over } \textbf{w} \text{ (inverse of the initial uncertainty)} $$
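+
+Where the regularizer comes from, as a sketch. It assumes a zero-mean isotropic Gaussian prior with precision $\alpha$, which the notes do not state explicitly (they only say that smaller parameters are more probable), and a Gaussian likelihood with precision $\beta$:
+
+$$ P(\textbf{w}) = \mathcal{N}(\textbf{w} | \textbf{0}, {\alpha}^{-1} \textbf{I}) $$
+$$ -\log P(\textbf{w} | D) = \frac{\beta}{2} \sum\limits_{n=1}^{N} (y(x_n, \textbf{w}) - t_n)^2 + \frac{\alpha}{2} \textbf{w}^{T} \textbf{w} + \text{const} $$
+
+Dividing by $\beta$ shows that, under these assumptions, the MAP estimate coincides with the minimum of the regularized quadratic error function with $\lambda = \alpha / \beta$.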
+
+## Definitions
+
+### Likelihood
+
+A function describing the joint probability of the observed data $\textbf{x}$ as a function of the parameters $\textbf{w}$ of the statistical model.
+
+### Bayesian approach
+
+Places a probabilistic model on the parameters $\textbf{w}$ themselves, not on the actual data.
+\ No newline at end of file