Gradient Descent and Cost Functions

In linear regression with a single variable, the hypothesis is

hθ(x) = θ0 + θ1x

where the hypothesis is the best-fitting line for the data. To build the model, we need to determine the θ values that produce this best-fitting line.
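As a quick sketch of what the hypothesis means (the θ values here are made up for illustration), it is just a line evaluated at x:

```python
# Hypothetical theta values for illustration: h(x) = 2 + 3x
theta0, theta1 = 2.0, 3.0

def h(x):
    # hypothesis: the predicted y for input x
    return theta0 + theta1 * x

print(h(0))  # at x = 0 the prediction is just the intercept theta0
print(h(4))
```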

A cost function is how we judge a choice of θ values: it is the quantity we minimize. In linear regression, the cost function is the sum of squared errors over the training set. Gradient descent is an iterative method for finding the minimum of such a function.

Cost Function

The cost function in linear regression is

J(θ) = 1/(2m) Σ (hθ(x_i) − y_i)^2

Here hθ(x_i) − y_i is the difference between the hypothesis and the correct output y_i; these errors are squared and summed over all examples. m is the number of training examples, and dividing by m normalizes the cost.
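To make the formula concrete, here is the cost computed on a tiny made-up training set (the data values are assumptions for illustration):

```python
import numpy as np

# Tiny made-up training set: y is exactly 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

def cost(theta0, theta1, x, y):
    m = len(x)
    predictions = theta0 + theta1 * x       # h_theta(x_i) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

print(cost(0.0, 2.0, x, y))  # perfect fit, so the cost is 0
print(cost(0.0, 1.0, x, y))  # worse fit, so the cost is larger
```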

Gradient Descent

Gradient descent is an iterative method for minimizing the cost function, which in turn finds the best-fitting hypothesis.

Gradient descent starts with initial values of θ0 and θ1 and evaluates the cost function at those values. If it has reached the global optimum of the cost function, it stops; otherwise θ0 and θ1 keep changing until the global optimum is reached.

repeat until convergence: θj = θj − α ∂/∂θj J(θ0, θ1)

where α is the learning rate, which controls the size of each step toward the optimum. If α is too small, gradient descent will be slow; if α is too large, it will keep overshooting the minimum. If θ0 and θ1 are already at the optimum, the derivative is 0 and the values stop changing.
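The effect of the learning rate can be seen on a simple example (not from the original post, just an illustration): minimizing J(θ) = θ², whose gradient is 2θ. A small α converges toward the optimum at 0; an α chosen too large makes each step overshoot further than the last:

```python
def step(theta, alpha):
    # one gradient descent update on J(theta) = theta**2, gradient = 2*theta
    return theta - alpha * 2 * theta

def run(theta, alpha, steps):
    for _ in range(steps):
        theta = step(theta, alpha)
    return theta

print(run(1.0, 0.1, 50))  # small alpha: converges toward the optimum at 0
print(run(1.0, 1.1, 50))  # too-large alpha: overshoots and diverges
```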

import numpy as np
import random
import matplotlib.pyplot as plt

def gradient_descent(x, y, theta, alpha, m, iteration):
    # transposing x so the gradient can be computed as x^T . loss
    # multiplying x by theta gives the hypothesis: if theta = [theta0, theta1]
    # and each row of x is [1, x_i], e.g. x = [[1, 101], [1, 122], [1, 143], [1, 214]],
    # then, theta) computes h(theta) = theta0 + theta1 * x_i for every row
    x_trans = x.transpose()
    for i in range(0, iteration):
        hypothesis =, theta)
        # difference between the prediction and the correct output
        loss = hypothesis - y
        # cost function J(theta)
        cost = np.sum(loss ** 2) / (2 * m)
        # gradient of the cost with respect to theta
        gradient =, loss) / m
        # update step
        theta = theta - alpha * gradient

    return theta

def hypothesis(x, theta):
    # predictions for every row of x with the learned theta
    return, theta)

def generate_data(num_points, bias, variance):
    x = np.zeros(shape=(num_points, 2))
    y = np.zeros(shape=num_points)

    for i in range(0, num_points):
        x[i][0] = 1  # intercept term
        x[i][1] = i

        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = generate_data(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradient_descent(x, y, theta, alpha, m, numIterations)

plt.plot(x[:, 1], y, 'ro')
plt.plot(x[:, 1], hypothesis(x, theta), 'orange')


This is traditional (batch) gradient descent, and it is often too slow when training on a large data set. There are other approaches for reaching the optimum.

Some of them are the normal equation, stochastic gradient descent, momentum, AdaGrad, and the Adam optimizer. In real-world situations the data is often very sparse, and in such settings the Adam optimizer tends to perform best.


Code adapted from the Machine Learning course by Andrew Ng (YouTube).

