Saturday, February 15, 2020

Machine Learning - Linear Regression


What is Linear Regression?

Linear regression models the relationship between a dependent variable Y and one or more independent variables X by fitting a linear equation to the observed data.

Linear Equation:

Y = a + bX + e

Y = Dependent Variable

X = Independent Variable

a = y-intercept

b = slope

e = error term / residual

Interpretation of b: a one-unit change in X changes the average (expected) value of Y by b units.

Interpretation of a: the expected value of Y when X = 0. Often the y-intercept has no practical meaning, because X = 0 is beyond the scope of the model.
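As a minimal sketch of how a and b are estimated, here is an ordinary least-squares fit with NumPy. The advertising-vs-sales numbers below are hypothetical, not from this post:

import numpy as np

# Hypothetical data: X = advertising expenditure (in $1,000s), Y = sales (in units)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([5.1, 6.9, 9.2, 10.8, 13.1, 14.8, 17.2, 18.9])

# Fit Y = a + bX by ordinary least squares; polyfit returns [slope, intercept]
b, a = np.polyfit(X, Y, deg=1)

print(f"intercept a = {a:.2f}")   # expected Y at X = 0 (may lie outside the data range)
print(f"slope     b = {b:.2f}")   # expected change in Y for a one-unit change in X

# Residuals e = observed Y minus fitted Y
e = Y - (a + b * X)

Here b would be read as the expected increase in sales per additional $1,000 of advertising.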

R-square: a value of 0.8 means that 80% of the variation in the dependent variable (e.g., sales) can be explained by the independent variable(s), e.g., advertising expenditure.
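Continuing the same hypothetical sketch, R-square can be computed directly as one minus the ratio of the residual sum of squares to the total sum of squares:

import numpy as np

# Same hypothetical advertising (X) and sales (Y) data as in the sketch above
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([5.1, 6.9, 9.2, 10.8, 13.1, 14.8, 17.2, 18.9])
b, a = np.polyfit(X, Y, deg=1)

ss_res = np.sum((Y - (a + b * X)) ** 2)   # residual (unexplained) sum of squares
ss_tot = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(f"R-square = {r_squared:.3f}")      # 0.8 would mean 80% of the variation in Y is explained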

ANOVA (overall F-test): H0: all regression coefficients in the population are zero.
                 Ha: at least one regression coefficient is non-zero.

T-test: H0: an individual regression coefficient in the population is zero.
             Ha: that regression coefficient is non-zero.
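Both tests are reported by standard regression software. A minimal sketch with statsmodels (same hypothetical data; the choice of library is an assumption, not part of the post) showing the overall F-test and the per-coefficient t-tests:

import numpy as np
import statsmodels.api as sm

# Hypothetical advertising (X) and sales (Y) data, as in the earlier sketches
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([5.1, 6.9, 9.2, 10.8, 13.1, 14.8, 17.2, 18.9])

model = sm.OLS(Y, sm.add_constant(X)).fit()

# ANOVA / overall F-test: H0 that all slope coefficients are zero
print(f"F = {model.fvalue:.2f}, p-value = {model.f_pvalue:.4f}")

# t-tests: H0 that an individual coefficient (intercept or slope) is zero
print("t-statistics:", model.tvalues)
print("p-values:    ", model.pvalues)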

Assumptions of linear regression

  • The error term is normally distributed with zero mean and finite variance: Error ~ N(0, σ²).
  • For each fixed value of X, the distribution of Y is normal. The means of all these normal distributions of Y, given X, lie on the fitted regression line (plane).
  • The variance of the error term is constant (homoscedasticity); it does not depend on the values assumed by X.
  • The error terms are uncorrelated. In other words, the observations are drawn independently.
  • The error term is uncorrelated with X.
  • The relationship between Y and X is linear. (Checks for several of these assumptions are sketched after this list.)
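A minimal sketch of checking some of these assumptions from the residuals, using the same hypothetical data. The particular tests (Shapiro-Wilk for normality, Breusch-Pagan for constant variance, Durbin-Watson for uncorrelated errors) are common choices and are an assumption of this sketch, not prescribed by the post:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

# Hypothetical advertising (X) and sales (Y) data, as in the earlier sketches
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([5.1, 6.9, 9.2, 10.8, 13.1, 14.8, 17.2, 18.9])

exog = sm.add_constant(X)
resid = sm.OLS(Y, exog).fit().resid

# Normality of the error term: Shapiro-Wilk test (H0: residuals are normally distributed)
sw_stat, sw_p = stats.shapiro(resid)
print("Shapiro-Wilk p-value:", sw_p)

# Homoscedasticity: Breusch-Pagan test (H0: constant error variance)
bp_lm, bp_p, bp_f, bp_fp = het_breuschpagan(resid, exog)
print("Breusch-Pagan p-value:", bp_p)

# Uncorrelated errors: Durbin-Watson statistic (values near 2 suggest no autocorrelation)
print("Durbin-Watson:", durbin_watson(resid))
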
Unusual and Influential data

A single observation that is substantially different from all other observations can make a large difference in the results of your regression analysis (diagnostics for such points are sketched after the list below).
  • Outliers: In the linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity or may indicate a data entry error or other problem.
  • Leverage: An observation with an extreme value on a predictor variable is called a point with high leverage. Leverage measures how far an observation's predictor value deviates from the mean of that predictor.
  • Influence: An observation is said to be influential if removing the observation substantially changes the estimates of coefficients. Influence can be thought of as the product of leverage and outlierness.
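A minimal sketch of computing these three measures with statsmodels (hypothetical data, with the last observation made deliberately unusual; the data are an assumption of the sketch):

import numpy as np
import statsmodels.api as sm

# Hypothetical advertising (X) and sales (Y) data, with a deliberately unusual last point
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0])
Y = np.array([5.1, 6.9, 9.2, 10.8, 13.1, 14.8, 17.2, 60.0])

model = sm.OLS(Y, sm.add_constant(X)).fit()
influence = model.get_influence()

print("Leverage (hat values):      ", influence.hat_matrix_diag)             # extremeness in X
print("Studentized residuals:      ", influence.resid_studentized_external)  # outlierness in Y
print("Cook's distance (influence):", influence.cooks_distance[0])           # combines both

Observations whose Cook's distance is much larger than the rest (a common rule of thumb is values above 4/n) are worth inspecting individually.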







