Machine Learning: Regression

In statistics, regression is defined as estimating the relationship among variables: it determines how strongly a dependent variable (Y) responds to changes in one or more independent variables (X). It is widely used for financial data, stock market data, and business analysis, because it helps us understand the relationships between the variables in the data we collect.

For example, regression could be used to predict how much time a user will spend on a website based on how much time the user's friends spend on it rather than on alternatives.

Linear Regression Model

Linear regression is the simplest machine learning model to implement as well as to learn, and it is usually the starting point for beginners in machine learning. Despite its simplicity, linear regression models are used successfully on many simple machine learning problems.

The intuition is to predict a value/output (Y) given the features (X).

Linear regression is defined as Y = θ₀ + θ₁x₁ + θ₂x₂ + ⋯ + θₙxₙ

Where šœƒ0 is the bias value and  šœƒ1 .. šœƒn are the learning coefficients/intercepts and x0 are number of features. Gradient descent are used to determine the best value for these coefficients.

Before using linear regression, first make sure there is a linear relationship in the data; if a feature's relationship with the target is not linear, transform it so that it is. Like most statisticians working on regression problems, look for and remove outliers, because they can hurt the model's performance, mainly by distorting the mean. Remove collinearity from the data, as correlated features can result in overfitting. Linear regression will also perform better if the input variables follow a Gaussian distribution. Lastly, rescale your data through normalization for better predictions.
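A minimal sketch of those preparation steps, assuming a made-up dataset with hypothetical column names (ad_spend, visits, revenue) and using pandas and scikit-learn:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ad_spend": rng.uniform(0, 100, 200),        # hypothetical feature
    "visits":   rng.uniform(0, 1000, 200),       # hypothetical feature
    "revenue":  rng.uniform(0, 500, 200),        # hypothetical target
})

# 1. Check for collinearity: drop one of any pair of highly correlated features
print(df[["ad_spend", "visits"]].corr())

# 2. Remove outliers, e.g. rows more than 3 standard deviations from the mean
z_scores = (df - df.mean()) / df.std()
df = df[(z_scores.abs() < 3).all(axis=1)]

# 3. Rescale the features so they share a comparable range before fitting
scaled_features = StandardScaler().fit_transform(df[["ad_spend", "visits"]])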

Multiple/Polynomial Regression

Polynomial regression is considered a special form of linear regression. Its features are raised up to the nth degree, so the fitted line becomes a curve (such as a quadratic or cubic). It is used when the relationship in the data is, for example, quadratic: simple linear regression would underfit such data, so polynomial regression can be used to fit it better. However, increasing the degree of x too far can result in overfitting the data.

Polynomial regression is defined as Y = θ₀ + θ₁xᵢ + θ₂xᵢ² + ⋯ + θₘxᵢᵐ
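As a sketch of this idea on made-up quadratic data (using scikit-learn's PolynomialFeatures and LinearRegression, not code from the post), expanding x into [x, x²] lets an ordinary linear model fit a curve that a straight line would underfit:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Quadratic data that a straight line would underfit
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * x[:, 0] ** 2 + x[:, 0] + 2 + rng.normal(0, 0.2, size=100)

# Expand x into [x, x^2] and fit an ordinary linear model on the expanded features
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)
model = LinearRegression().fit(x_poly, y)

print(model.intercept_, model.coef_)             # roughly 2.0 and [1.0, 0.5]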

Regularization

Regularization is a technique to overcome the overfitting problem. If the hypothesis is overfitting, it means it is capturing too much of the training data, trying too hard to hit every point, and therefore loses accuracy on new data.

As you add more features to make the model more accurate, you may also add more noise to the model, making it more complex and more error-prone when training data is limited.

Regularization is represented by λ. The value of λ should be chosen carefully: regularization reduces variance without losing important properties of the model, but if λ is made too large it can cause underfitting and lose some of those properties.
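One way to see this trade-off, as a sketch, is ridge regression in scikit-learn, where the alpha parameter plays the role of λ; the data and the λ values below are made up for illustration:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Noisy linear data expanded into degree-1..10 polynomial features, prone to overfitting
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(50, 1))
y = x[:, 0] + rng.normal(0, 0.3, size=50)
X = np.hstack([x ** d for d in range(1, 11)])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for lam in (0.01, 1.0, 100.0):                   # small, moderate, and heavy regularization
    model = Ridge(alpha=lam).fit(X_train, y_train)
    print(lam, model.score(X_test, y_test))      # too large a lambda will underfit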

Python Code With Linear Regression

https://github.com/mubeenahmed/machine-learning-journey/tree/master/regression
