**Neural Network**

An artificial neural network is a predictive model inspired by the human brain and how it works. Our brain consists of a collection of neurons wired together. Each neuron takes the outputs of the neurons that feed into it, performs a calculation, and then either fires or doesn't.

Neural networks can solve a wide variety of problems, such as handwriting recognition and face detection, and they are used heavily in deep learning. Neural networks are “black boxes”: it is hard to inspect their internals and understand how they arrive at their outputs. Furthermore, large neural networks are difficult to train, and training them may require help from other data scientists.

**How do neural networks work in machine learning?**

A neural network has one input layer, one or more hidden layers, and one output layer. Layer $i$ is denoted $L_i$, where $i$ is the index of the layer, and $s_l$ is the number of units in layer $l$, not counting the bias unit.

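Here $x_0 = 1$ is the bias input. The weights play the same role as the parameters in linear or logistic regression; the weights that map layer $j$ to layer $j+1$ form a matrix $\Theta^{(j)}$, whose entries are written $w^{(j)}$ in the equations below. The input values, combined with these weights and the bias, feed into the hidden layer. The hidden layer is so called because its values are never observed directly: the training data contains only inputs and outputs. In the hidden layer, $a_i^{(j)}$ denotes the activation of unit $i$ in layer $j$.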
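As a minimal sketch, assuming NumPy and the network in the equations below (3 inputs, 3 hidden units, 1 output): the weights from one layer to the next form a matrix with $s_{j+1}$ rows and $s_j + 1$ columns, the extra column multiplying the bias input. The helper name `init_weights`, the `epsilon` range and the use of small random starting values are illustrative assumptions.

```python
import numpy as np

def init_weights(layer_sizes, epsilon=0.12, seed=0):
    """Hypothetical helper: one weight matrix per pair of adjacent layers.

    layer_sizes[l] is the number of units in a layer, excluding the bias unit.
    Each matrix has shape (units_out, units_in + 1); column 0 multiplies the bias.
    """
    rng = np.random.default_rng(seed)
    weights = []
    for s_in, s_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Small random values (not zeros) so the hidden units do not all learn the same thing.
        weights.append(rng.uniform(-epsilon, epsilon, size=(s_out, s_in + 1)))
    return weights

weights = init_weights([3, 3, 1])
print([w.shape for w in weights])  # [(3, 4), (1, 4)]
```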

An activation function takes the weighted sum of a unit's inputs plus the bias and turns it into the unit's output, which decides whether the unit fires or not. A hard threshold would output exactly zero or one. There are several types of activation function, but the simplest and most commonly known is the sigmoid function $g(z) = \frac{1}{1 + e^{-z}}$, a smooth version of that threshold which outputs a value between 0 and 1.
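The sigmoid as code, a minimal sketch assuming NumPy. For comparison it also defines `step`, a hypothetical hard 0/1 threshold (not from the source) that shows the literal "fire or not fire" behaviour the sigmoid smooths out.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def step(z):
    """Hard threshold: 1.0 ("fires") if z >= 0, else 0.0 ("does not fire")."""
    return (np.asarray(z) >= 0).astype(float)

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))  # roughly 0.0067, 0.5, 0.9933
print(step(z))     # [0. 1. 1.]
```

Large negative inputs push the sigmoid towards 0 and large positive inputs towards 1, so it behaves like the threshold away from $z = 0$ while remaining differentiable, which back propagation relies on.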

For a network with the bias input $x_0$, three inputs $x_1, x_2, x_3$ and three hidden units, the hidden-layer activations can be written mathematically as:

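$$
\begin{aligned}
a_1^{(2)} &= g\left(w_{10}^{(1)}x_0 + w_{11}^{(1)}x_1 + w_{12}^{(1)}x_2 + w_{13}^{(1)}x_3\right)\\
a_2^{(2)} &= g\left(w_{20}^{(1)}x_0 + w_{21}^{(1)}x_1 + w_{22}^{(1)}x_2 + w_{23}^{(1)}x_3\right)\\
a_3^{(2)} &= g\left(w_{30}^{(1)}x_0 + w_{31}^{(1)}x_1 + w_{32}^{(1)}x_2 + w_{33}^{(1)}x_3\right)
\end{aligned}
$$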

Finally, the output unit takes the weighted sum of the hidden-layer activations, including the bias unit $a_0^{(2)} = 1$, and applies $g$ once more to give the final result $h_\Theta(x)$:

$$h_\Theta(x) = a_1^{(3)} = g\left(w_{10}^{(2)}a_0^{(2)} + w_{11}^{(2)}a_1^{(2)} + w_{12}^{(2)}a_2^{(2)} + w_{13}^{(2)}a_3^{(2)}\right)$$
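A minimal NumPy sketch of this forward pass, assuming the weights are stored as a 3 × 4 matrix `W1` (entry `W1[i-1, j]` is $w_{ij}^{(1)}$) and a 1 × 4 matrix `W2` (entry `W2[0, j]` is $w_{1j}^{(2)}$), with column 0 multiplying the bias. Each row of `W1 @ x` is exactly one of the weighted sums inside $g$ above. The function name `forward_3_3_1` is illustrative.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation g(z), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_3_3_1(x, W1, W2):
    """Forward pass for the 3-input, 3-hidden-unit, 1-output network.

    x  : the three inputs x1..x3 (the bias input x0 = 1 is added here)
    W1 : (3, 4) weights into the hidden layer
    W2 : (1, 4) weights into the output unit
    """
    x = np.concatenate(([1.0], x))      # prepend x0 = 1
    # Row i of W1 @ x is w_i0 x0 + w_i1 x1 + w_i2 x2 + w_i3 x3.
    a2 = sigmoid(W1 @ x)                # a_1^(2), a_2^(2), a_3^(2)
    a2 = np.concatenate(([1.0], a2))    # prepend the bias unit a_0^(2) = 1
    # h(x) = a_1^(3) = g(w_10 a_0 + w_11 a_1 + w_12 a_2 + w_13 a_3)
    return sigmoid(W2 @ a2)[0]

# With all weights zero every weighted sum is 0, so the output is g(0) = 0.5.
print(forward_3_3_1(np.array([1.0, 2.0, 3.0]), np.zeros((3, 4)), np.zeros((1, 4))))  # 0.5
```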

As with other machine learning algorithms, the problem is to find the best-fitting parameters. The number of output units in a neural network is denoted $K$. For a binary output (0 or 1), $K = 1$. For multi-class classification there is one output unit per class, so with 4 classes $K = 4$; in general $K \ge 3$ in the multi-class case, since two classes need only a single output unit.
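For multi-class problems, each label $y$ is typically encoded as a vector of length $K$ containing a single 1, so that output unit $k$ is trained to fire only for class $k$. A minimal sketch assuming NumPy and integer labels numbered from 0; `one_hot` is a hypothetical helper name.

```python
import numpy as np

def one_hot(labels, K):
    """Turn integer class labels 0..K-1 into K-dimensional 0/1 target vectors.

    Row n has a 1 in the column of example n's class and 0 everywhere else.
    """
    labels = np.asarray(labels)
    Y = np.zeros((labels.size, K))
    Y[np.arange(labels.size), labels] = 1.0
    return Y

print(one_hot([0, 3, 1], K=4))
# [[1. 0. 0. 0.]
#  [0. 0. 0. 1.]
#  [0. 1. 0. 0.]]
```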

The cost function of a neural network generalizes the logistic regression cost function. Logistic regression has a single output unit, whereas a neural network has $K$ output units, so the cost sums the logistic loss over all $m$ training examples and, nested inside that sum, over all $K$ output units.
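Written out without any regularization term, the cost is

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + \left(1 - y_k^{(i)}\right)\log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right)\right]$$

A minimal NumPy sketch, assuming the network outputs for all $m$ examples are already stored as the rows of an $m \times K$ array `H` and the targets as a matching 0/1 array `Y`; the names and the clipping constant `eps` are illustrative choices.

```python
import numpy as np

def nn_cost(H, Y, eps=1e-12):
    """Unregularized neural-network cost J(Theta).

    H : (m, K) array, H[i, k] is output unit k's value for example i
    Y : (m, K) array of 0/1 targets
    The double sum over m examples and K output units is a sum over all entries.
    """
    m = H.shape[0]
    H = np.clip(H, eps, 1 - eps)  # keep log() away from 0
    return -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m

H = np.array([[0.9, 0.1], [0.2, 0.8]])
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
print(nn_cost(H, Y))  # about 0.3285
```

With $K = 1$ the inner sum has a single term and the function reduces to the ordinary logistic regression cost.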

**Back Propagation**

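To minimize $J(\Theta)$, for example with gradient descent, we need the partial derivatives $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$, and back propagation is the algorithm that computes them. Otherwise the approach is the same as minimizing the cost function $J(\Theta)$ in linear or logistic regression.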

The first step is to apply forward propagation. Suppose we have only a single training example $(x, y)$; the vectorized forward pass is then:

$$
\begin{aligned}
a^{(1)} &= x \quad \text{(then add the bias unit } a_0^{(1)} = 1\text{)}\\
z^{(2)} &= \Theta^{(1)} a^{(1)}\\
a^{(2)} &= g\left(z^{(2)}\right) \quad \text{(then add the bias unit } a_0^{(2)} = 1\text{)}
\end{aligned}
$$

and so on, layer by layer, until the output layer gives $h_\Theta(x)$.
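A minimal NumPy sketch of this vectorized forward pass for any number of layers, assuming sigmoid activations everywhere and the weight matrices stored in a Python list `thetas`. It keeps every $a^{(l)}$ and $z^{(l)}$, since back propagation needs them. The function name `forward_propagate` and the layer sizes in the usage lines are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation g(z), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Vectorized forward propagation for a single training example x.

    thetas[0] is Theta^(1), thetas[1] is Theta^(2), and so on; Theta^(l)
    has shape (s_{l+1}, s_l + 1). Returns the activations a^(1), ..., a^(L)
    (every layer except the output includes the bias unit a_0 = 1) and the
    weighted inputs z^(2), ..., z^(L).
    """
    a = np.concatenate(([1.0], x))      # a^(1) = x, plus bias unit a_0^(1) = 1
    activations, zs = [a], []
    for i, theta in enumerate(thetas):
        z = theta @ a                   # z^(l+1) = Theta^(l) a^(l)
        a = sigmoid(z)                  # a^(l+1) = g(z^(l+1))
        if i < len(thetas) - 1:         # hidden layers get a bias unit a_0 = 1
            a = np.concatenate(([1.0], a))
        zs.append(z)
        activations.append(a)
    return activations, zs              # activations[-1] is h_Theta(x)

# Illustrative 4-layer network: s_1 = 3, s_2 = s_3 = 5, s_4 = 4 output units.
rng = np.random.default_rng(0)
sizes = [3, 5, 5, 4]
thetas = [rng.uniform(-0.12, 0.12, size=(s_out, s_in + 1))
          for s_in, s_out in zip(sizes[:-1], sizes[1:])]
activations, zs = forward_propagate(np.array([0.5, -1.0, 2.0]), thetas)
print(activations[-1].shape)  # (4,)
```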

Back propagation then computes an error term for every node: $\delta_j^{(l)}$ denotes the "error" of node $j$ in layer $l$.

Suppose layer 4 is the output layer. For each output unit $j$, the error is $\delta_j^{(4)} = a_j^{(4)} - y_j$, the difference between the network's output and the target. For the hidden layers, the error terms are propagated backwards; in vectorized form, covering every unit in the layer at once:

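$$
\begin{aligned}
\delta^{(3)} &= \left(\Theta^{(3)}\right)^T \delta^{(4)} \odot g'\left(z^{(3)}\right)\\
\delta^{(2)} &= \left(\Theta^{(2)}\right)^T \delta^{(3)} \odot g'\left(z^{(2)}\right)
\end{aligned}
$$

where $\odot$ denotes element-wise multiplication.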

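Here $g'$ is the derivative of the activation function; for the sigmoid, $g'(z^{(l)}) = a^{(l)}(1 - a^{(l)})$, computed element-wise. There is no $\delta^{(1)}$ term, because layer 1 is the input layer: the inputs are observed values, so there is no error to assign to them. Finally, the error terms are accumulated, one training example at a time, into $\Delta^{(l)}$, from which the partial derivatives are obtained: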

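$$\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)}\,\delta_i^{(l+1)}$$

Starting from $\Delta_{ij}^{(l)} = 0$ and looping over all $m$ training examples, the unregularized partial derivatives are $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = \frac{1}{m}\Delta_{ij}^{(l)}$.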
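The forward pass, the error terms and the accumulation can be combined into a minimal, unoptimized NumPy sketch of back propagation, assuming sigmoid units everywhere, the unregularized cross-entropy cost and one-hot targets. It loops over the examples one at a time to mirror the formulas; a practical implementation would usually vectorize over the whole batch. The function name `backprop_gradients` and the layer sizes in the usage lines are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation g(z), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(X, Y, thetas):
    """Unregularized gradients of J(Theta) by back propagation (sketch).

    X      : (m, s_1) array, one training example per row
    Y      : (m, K) array of 0/1 targets
    thetas : [Theta^(1), ..., Theta^(L-1)], Theta^(l) of shape (s_{l+1}, s_l + 1)
    Python lists are 0-indexed, so thetas[l] is Theta^(l+1) and acts[l] is a^(l+1).
    Returns [D^(1), ..., D^(L-1)] with D^(l) = Delta^(l) / m.
    """
    m = X.shape[0]
    Deltas = [np.zeros_like(t) for t in thetas]          # every Delta starts at 0
    for x, y in zip(X, Y):
        # Forward propagation, keeping each activation (bias unit included
        # for every layer except the output layer).
        acts = [np.concatenate(([1.0], x))]
        for i, theta in enumerate(thetas):
            a = sigmoid(theta @ acts[-1])
            if i < len(thetas) - 1:
                a = np.concatenate(([1.0], a))
            acts.append(a)
        delta = acts[-1] - y                             # output error a^(L) - y
        # Walk from the last weight matrix back to the first.
        for l in range(len(thetas) - 1, -1, -1):
            # Delta_ij += a_j * delta_i: outer product of the next layer's error and a.
            Deltas[l] += np.outer(delta, acts[l])
            if l > 0:                                    # no error term for the input layer
                a = acts[l]
                # Theta^T delta, times g'(z) = a(1 - a) element-wise; drop the bias entry.
                delta = ((thetas[l].T @ delta) * a * (1.0 - a))[1:]
    return [D / m for D in Deltas]

# Illustrative usage: 10 random examples, layer sizes 3-5-5-4.
rng = np.random.default_rng(0)
sizes = [3, 5, 5, 4]
thetas = [rng.uniform(-0.12, 0.12, size=(s_out, s_in + 1))
          for s_in, s_out in zip(sizes[:-1], sizes[1:])]
X = rng.normal(size=(10, 3))
Y = np.eye(4)[rng.integers(0, 4, size=10)]   # random one-hot targets
grads = backprop_gradients(X, Y, thetas)
print([g.shape for g in grads])  # [(5, 4), (5, 6), (4, 6)]
```

Each gradient matrix has the same shape as the weight matrix it belongs to, so a gradient descent step is simply `theta -= alpha * grad` for each pair, with a learning rate `alpha` of your choosing.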

Back propagation is similar to forward propagation, except that it runs through the network from right to left (from the output layer back towards the input) instead of from left to right.

We will be implementing back propagation in Python.
