Regression in statistics is defined as estimating the relationship among variables. It is defined as to determine the strength of relationship with dependent variable (Y) and change in variables (independent variables (X)). It is to used for financial data, stock market data and business analysis. It helps to understand the relationship between variables(data) generated.
For example, the website user’s friends spending time on the website rather than alternatives.
Linear Regression Model
Linear regression in machine learning is the simplest model to implement as well as to learn. It is the basic and mainly the starting point for the beginner to learn about machine learning. Fortunately, linear regression model are used in some simple machine learning problems successfully.
The intuition is to predict value/output/Y given features/X
Linear regression is defined as Y = 𝜃0 + 𝜃1𝑥1 + 𝜃2𝑥2 + ⋯ 𝜃𝑥𝑛
Where 𝜃0 is the bias value and 𝜃1 .. 𝜃n are the learning coefficients/intercepts and x0 are number of features. Gradient descent are used to determine the best value for these coefficients.
For using linear regression first make sure you have linear relationship among the data if the data has many feature transform the data into linear. As most statistician remove noise from regression problems, they search for outliers because they can effects the performance of model, outliers mainly effect the mean result. Remove col linearity from data as that can result in overfiting. If the linear regression is a Guassian distribution that it will perform well lastly, rescale your data through normalization for better predictions.
Multiple regression is consider as special form of linear regression. It’s features are nth degree, therefore creating a curve (such as Quadratic or cubic). It is used when the relationship between the data is for example quadratic. Since, simple linear regression will underfit the data, therefore to best fit the data, polynomial regression can be used. However, the increasing degree of x can result in overfit data.
Multiple regression is defined as Y = 𝜃0 + 𝜃2𝑥i^2 + … + 𝜃m𝑥i^m
It is the techniques to overcome the overfitting problem. If the hypothesis is overfitting, means that it is capturing too much and it is trying hard to come up with a predict value therefore looses accuracy.
As you tend to add more features to make the model more accurate but this will result in adding more noise to your model thus making model complex and high error prone with low training data.
Regularization is represented as λ. Regularization should be choose carefully because regularization reduces variances without losing properties, however if value of regularization increases then it can cause underfitting and lose some properties.
Python Code With Linear Regression