What is Linear Regression?
Linear Equation:
Y = a+bX+e
Y = dependent variable
X = independent variable
a = y-intercept
b = slope
e = error term (estimated from the sample by the residuals)
Interpretation of b: a one-unit increase in X changes the average/expected value of Y by b units.
Interpretation of a: the expected value of Y when X = 0. The y-intercept often has no practical meaning because X = 0 lies outside the range of the data the model was fitted on.
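As a minimal sketch of how a and b are estimated by ordinary least squares, the example below fits the line to a small made-up sample. The data values and the variable names are illustrative assumptions, not figures from a real study.

```python
# Ordinary least squares for Y = a + bX on a small, made-up sample.
x = [1, 2, 3, 4, 5]                 # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]      # hypothetical responses

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Slope: sum of cross-deviations divided by the sum of squared X deviations.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
b = s_xy / s_xx

# Intercept: chosen so the fitted line passes through (x_mean, y_mean).
a = y_mean - b * x_mean

print(f"a = {a:.2f}, b = {b:.2f}")
```

For this sample the slope is about 1.99, so each one-unit increase in X is associated with an increase of roughly 1.99 units in the expected value of Y.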
R-square: a value of 0.8 means that 80% of the variation in the dependent variable (e.g. sales) can be explained by the independent variable(s) (e.g. advertising expenditure).
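R-square can be computed as one minus the ratio of residual variation to total variation. The sketch below illustrates this on a made-up sample; the data and variable names are assumptions chosen only for illustration.

```python
# R-square = 1 - (residual sum of squares / total sum of squares).
x = [1, 2, 3, 4, 5]                 # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]      # hypothetical responses

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least squares estimates of slope (b) and intercept (a).
s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = y_mean - b * x_mean

fitted = [a + b * xi for xi in x]

# Total variation of Y around its mean.
ss_total = sum((yi - y_mean) ** 2 for yi in y)
# Variation left unexplained by the fitted line.
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

r_squared = 1 - ss_resid / ss_total
print(f"R-square = {r_squared:.4f}")
```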
ANOVA (F-test): H0: all regression coefficients (slopes, excluding the intercept) in the population are zero
Ha: at least one of the regression coefficients is nonzero
T-test: H0: an individual regression coefficient in the population is zero
Ha: that individual regression coefficient in the population is nonzero
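The two test statistics can be sketched for the one-predictor case as follows. The data are made up, and the sketch stops at the statistics themselves: the Python standard library has no F or t distribution, so p-values would have to come from a statistical table or a statistics package. With a single predictor, the F-statistic equals the square of the slope's t-statistic, which the last line checks.

```python
import math

x = [1, 2, 3, 4, 5]                 # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]      # hypothetical responses

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = y_mean - b * x_mean

ss_total = sum((yi - y_mean) ** 2 for yi in y)
ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
ss_reg = ss_total - ss_resid        # variation explained by the regression

df_reg = 1                          # one slope coefficient
df_resid = n - 2                    # n minus the two estimated parameters
mse = ss_resid / df_resid           # estimate of the error variance

# ANOVA F-statistic: explained mean square over residual mean square.
f_stat = (ss_reg / df_reg) / mse

# t-statistic for the slope: estimate divided by its standard error.
se_b = math.sqrt(mse / s_xx)
t_stat = b / se_b

print(f"F = {f_stat:.1f}, t = {t_stat:.2f}, t^2 = {t_stat ** 2:.1f}")
```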
Assumptions of linear regression
- The error term is normally distributed with zero mean and finite variance: e ~ N(0, σ²).
- For each fixed value of X, the distribution of Y is normal. The means of these normal distributions of Y given X lie on the population regression line (or plane, with several predictors), which the fitted line estimates.
- The variance of the error term is constant (homoscedasticity); it does not depend on the values taken by X.
- Error terms are uncorrelated. In other words, the observations are drawn independently.
- The error term is uncorrelated with X.
- The relationship between Y and X is linear.
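Some of these assumptions can be examined through the residuals of a fitted model. The sketch below, on made-up data, computes two rough diagnostics: the Durbin-Watson statistic (values near 2 suggest no first-order correlation between successive errors) and a crude ratio of residual variation between the upper and lower halves of X (values far from 1 hint at non-constant variance). The data, the split into halves, and the variable names are assumptions for illustration; this is not a formal test procedure. Note that least squares residuals sum to zero and are uncorrelated with X by construction when the model has an intercept, so those two assumptions cannot be checked this way.

```python
x = list(range(1, 11))              # hypothetical predictor, already sorted
y = [2.3, 4.1, 5.8, 8.4, 9.9, 12.2, 13.8, 16.1, 18.3, 19.7]  # hypothetical

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = y_mean - b * x_mean

# Residuals: observed Y minus fitted Y.
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Durbin-Watson statistic: always between 0 and 4, near 2 when
# successive residuals are uncorrelated.
dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / sum(
    e ** 2 for e in resid
)

# Crude homoscedasticity check: compare squared residuals in the
# upper half of X with those in the lower half.
half = n // 2
ss_low = sum(e ** 2 for e in resid[:half])
ss_high = sum(e ** 2 for e in resid[half:])
var_ratio = ss_high / ss_low

print(f"Durbin-Watson = {dw:.2f}, variance ratio = {var_ratio:.2f}")
```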
Unusual and Influential data
A single observation that is substantially different from all other observations can make a large difference in the results of your regression analysis.
- Outliers: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity, a data entry error, or some other problem.
- Leverage: An observation with an extreme value on a predictor variable is called a point with high leverage. Leverage is a measure of how far an observation's predictor value deviates from the mean of that predictor.
- Influence: An observation is said to be influential if removing the observation substantially changes the estimates of coefficients. Influence can be thought of as the product of leverage and outlierness.
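The three ideas above can be sketched numerically for the one-predictor case. The example uses a made-up sample whose last point has an extreme X value, and it uses Cook's distance as one common way to combine residual size and leverage into a single influence measure. The data, the helper fit_line, and the choice of Cook's distance are assumptions for illustration.

```python
def fit_line(x, y):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return y_mean - b * x_mean, b


x = [1, 2, 3, 4, 5, 20]             # hypothetical; last X value is extreme
y = [1.1, 2.0, 2.9, 4.2, 5.1, 8.0]  # hypothetical responses
n, p = len(x), 2                    # p = number of estimated coefficients

a, b = fit_line(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in resid) / (n - p)

# Leverage for simple regression: grows with distance of X from its mean.
x_mean = sum(x) / n
s_xx = sum((xi - x_mean) ** 2 for xi in x)
leverage = [1 / n + (xi - x_mean) ** 2 / s_xx for xi in x]

# Cook's distance: combines squared residual and leverage per observation.
cooks_d = [
    (e ** 2 / (p * mse)) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)
]

# Influence seen directly: refit without the last observation.
a_drop, b_drop = fit_line(x[:-1], y[:-1])

for i in range(n):
    print(f"obs {i}: resid={resid[i]:+.2f}  leverage={leverage[i]:.2f}  "
          f"cook={cooks_d[i]:.2f}")
print(f"slope with all points = {b:.2f}, without last point = {b_drop:.2f}")
```

In this sample the last observation has by far the highest leverage and Cook's distance, and removing it moves the slope from roughly 0.32 to roughly 1.02. Its raw residual is comparatively small because a high-leverage point pulls the fitted line toward itself, which is why residuals alone can miss influential observations.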