# Understanding Linear Regression in Machine Learning

## What is Regression?

Regression is a kind of predictive analysis that investigates the relationship between a response (dependent) variable and one or more explanatory (independent) variables.

It relates the change in the dependent variable (plotted on the y-axis) to the change in the explanatory variable (plotted on the x-axis).

Regression mainly answers two questions:

1. Does the set of independent variables do a good job of predicting the dependent variable?
2. Which variables in particular are significant predictors of the outcome variable?

The resulting regression estimates are used to explain the relationship between one dependent variable and one or more independent variables.

## What is Linear Regression?

It is a linear approach to modeling the relationship between a scalar response (dependent variable) [which should be continuous] and one or more explanatory variables (independent variables) [either continuous or categorical]. As the name suggests, the relationship is a linear equation between the dependent and independent variables, and it is a type of predictive analysis and modeling.

If one explanatory variable is present, it is called Simple Linear Regression. If more than one explanatory variable is present, it is called Multiple Linear Regression. Rarely, multiple correlated dependent variables are predicted. Such a regression model is called Multivariate Linear Regression.

Example: predicting the price of a house, the salary of an employee, the price of a car, etc.
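As a quick illustration of the house-price example, here is a minimal sketch using NumPy's `polyfit` to fit a straight line; the sizes and prices are invented for illustration:

```python
# Fitting price ~ size with a degree-1 polynomial (a straight line).
# Sizes (sq ft) and prices ($1000s) below are invustrative, made-up values.
import numpy as np

sizes = np.array([650, 785, 1200, 1500, 1850], dtype=float)
prices = np.array([77, 93, 135, 168, 205], dtype=float)

# polyfit returns the line's coefficients, highest degree first
slope, intercept = np.polyfit(sizes, prices, deg=1)
predicted_1000 = slope * 1000 + intercept
print(f"price(1000 sq ft) ~ {predicted_1000:.1f} thousand dollars")
```

The fitted slope tells us how much the predicted price changes per additional square foot.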

### Model Representation — Simple Linear Regression

We are interested in finding the line that best fits the provided data points. We need to find the slope m and the intercept c in this line equation:

y = mx + c

This line equation describes the relationship between x and y. Note that, while the data set itself is discrete, every x in the regression model has a corresponding y value, because the regressed line is a continuous function.
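Finding m and c can be done in closed form with the least-squares formulas (slope = covariance of x and y divided by variance of x). A minimal sketch in plain Python, using made-up data points:

```python
# Least-squares estimates of m (slope) and c (intercept) for y = mx + c.
# The data points are invented and lie roughly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
# the best-fit line always passes through (mean_x, mean_y)
c = mean_y - m * mean_x

print(f"m = {m:.3f}, c = {c:.3f}")
```

For this data the slope comes out close to 2 and the intercept close to 0, matching the pattern in the points.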

Notations:

m = number of training examples (not to be confused with the slope m in the line equation above)

x’s = “input” variable/feature — the size of the house in our example

y’s = “output” variable/target variable — the price of the house in our example

Prediction is done using a function that takes in the independent variables (in this case, the size of the house) and outputs a predicted value. This function is called the hypothesis. The hypothesis is produced by a learning algorithm from its input, the training set.

How do we represent h?

In this case (Simple Linear Regression), the hypothesis h is the best-fit line equation that we’ve identified via the learning algorithm. As a standard, we represent h as:

hθ(x) = θ0 + θ1x

### Cost Function

The process of finding the hypothesis (done by the learning algorithm) involves identifying the parameters θ0 and θ1. The cost function quantifies the difference between the actual value and predicted value with the selected θ0 and θ1. The idea is to minimize this difference by varying θ0 and θ1 so that our predictions are as close to actual values as possible.
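The standard choice for linear regression is the squared-error cost, J(θ0, θ1) = (1/2m) · Σ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)², where m is the number of training examples. A minimal sketch with invented data:

```python
# Squared-error cost for simple linear regression:
# J(theta0, theta1) = (1 / (2m)) * sum((h(x_i) - y_i)^2),
# where h(x) = theta0 + theta1 * x. Data points are illustrative
# and lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

def cost(theta0, theta1):
    m = len(xs)  # number of training examples
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(1.0, 2.0))  # perfect fit -> 0.0
print(cost(0.0, 2.0))  # wrong intercept -> larger cost
```

Minimizing J over θ0 and θ1 is exactly the "make predictions as close to actual values as possible" goal described above.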

The learning algorithm used for this minimization is Gradient Descent, which will be discussed further in upcoming articles.

## Multiple Linear Regression

Multiple Linear Regression involves multiple explanatory variables and one target variable. Naturally, the hypothesis is just an extension of the simple case, with one parameter per feature:

hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn
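With multiple explanatory variables, the parameters can be found with an ordinary least-squares solve. A sketch using NumPy, with made-up features (house size and number of bedrooms):

```python
# Multiple linear regression via ordinary least squares.
# Features: [size in sq ft, number of bedrooms]; target: price in $1000s.
# All values are invented for illustration.
import numpy as np

X = np.array([[650.0, 1], [785.0, 2], [1200.0, 3], [1500.0, 3], [1850.0, 4]])
y = np.array([77.0, 93.0, 135.0, 168.0, 205.0])

# Prepend a column of ones so the first parameter acts as the intercept (theta0)
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# Solve the least-squares problem min ||X_b @ theta - y||^2
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print("intercept and coefficients:", theta)
```

Here `theta[0]` plays the role of θ0 and the remaining entries are the per-feature coefficients θ1, θ2.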