Wednesday, February 5, 2020

Data Science, Machine Learning and Artificial Intelligence using Python part-2

The bias-variance tradeoff

In machine learning, the bias-variance tradeoff is the problem of simultaneously minimizing two sources of error that prevent supervised learning algorithms from generalizing beyond their training data.

High bias (underfitting) can cause an algorithm to miss the relevant relations between features and target outputs.

High variance (overfitting) can cause an algorithm to model the random noise in the training data, rather than the intended outputs.

Variance

Variance refers to the amount by which our estimate of "f" would change if we estimated it using a different training data set.

Since the training data are used to fit the statistical learning method, different training data sets will result in a different estimate of "f". Ideally, though, the estimate of "f" should not vary too much between training sets.

However, if a method has high variance, then small changes in the training data can result in large changes in the estimate of "f". In general, more flexible statistical methods have higher variance.
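To make the idea of variance concrete, here is a minimal sketch that fits a rigid model (a straight line) and a flexible model (a degree-9 polynomial) to many independently drawn training sets and measures how much the fitted curve moves around. The function true_f, the noise level, the sample sizes, and the polynomial degrees are all assumptions chosen for illustration; they do not come from any particular data set.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_f(x):
    # Hypothetical "true" relationship, assumed only for this illustration
    return np.sin(2 * np.pi * x)

def prediction_variance(degree, n_sets=200, n_points=30, noise_sd=0.3, seed=0):
    """Average variance of the fitted curve across independent training sets."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.05, 0.95, 50)        # points where we compare the fits
    preds = np.empty((n_sets, x_grid.size))
    for i in range(n_sets):
        # Draw a fresh training set from the same underlying process
        x = rng.uniform(0, 1, n_points)
        y = true_f(x) + rng.normal(0, noise_sd, n_points)
        # Fit a polynomial of the given degree and predict on the grid
        preds[i] = Polynomial.fit(x, y, degree)(x_grid)
    # Variance across training sets at each grid point, averaged over the grid
    return preds.var(axis=0).mean()

var_rigid = prediction_variance(degree=1)
var_flexible = prediction_variance(degree=9)
print(f"degree 1 variance: {var_rigid:.4f}")
print(f"degree 9 variance: {var_flexible:.4f}")
```

Under these assumptions the degree-9 fit should show a noticeably larger variance than the straight line, which matches the statement that more flexible methods tend to have higher variance.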

Bias 

Bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model.

For example, linear regression assumes that there is a linear relationship between Y and X1, X2, ..., Xp. It is unlikely that any real-life problem truly has such a simple linear relationship, so performing linear regression will undoubtedly result in some bias in the estimate of "f".
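The bias of linear regression on a non-linear problem can be sketched in the same way: average the fitted line over many training sets and compare that average to the true function. The sine-shaped true_f below, the noise level, and the degree-5 comparison model are assumptions made purely for the sake of the example.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_f(x):
    # Hypothetical non-linear "true" relationship (an assumption for this sketch)
    return np.sin(2 * np.pi * x)

def squared_bias(degree, n_sets=500, n_points=30, noise_sd=0.3, seed=0):
    """Mean squared gap between the average fitted curve and the true function."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_sets, x_grid.size))
    for i in range(n_sets):
        x = rng.uniform(0, 1, n_points)
        y = true_f(x) + rng.normal(0, noise_sd, n_points)
        preds[i] = Polynomial.fit(x, y, degree)(x_grid)
    # Averaging over training sets removes most of the noise; what is left
    # is the systematic error of the model class, i.e. its bias
    avg_pred = preds.mean(axis=0)
    return np.mean((avg_pred - true_f(x_grid)) ** 2)

bias_linear = squared_bias(degree=1)      # linear regression
bias_flexible = squared_bias(degree=5)    # a more flexible polynomial
print(f"squared bias, linear fit:   {bias_linear:.4f}")
print(f"squared bias, degree-5 fit: {bias_flexible:.4f}")
```

No matter how many training sets are averaged, the straight line cannot follow the curve, so its squared bias stays large, while the more flexible polynomial gets much closer to the true function on average.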

Bias-variance tradeoff

As a general rule, as we use more flexible methods, the variance will increase and the bias will decrease.

The relative rate of change of these two quantities determines whether the test error increases or decreases.
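The reason both quantities matter is the standard decomposition of the expected squared test error at a point. The notation here is introduced only for this note: x_0 is a test input, y_0 its observed output, \hat{f} the fitted estimate of "f", and \varepsilon the noise term.

```latex
\mathbb{E}\big[(y_0 - \hat{f}(x_0))^2\big]
  = \mathrm{Var}\big(\hat{f}(x_0)\big)
  + \big[\mathrm{Bias}\big(\hat{f}(x_0)\big)\big]^2
  + \mathrm{Var}(\varepsilon)
```

The last term is the irreducible error and does not depend on the method, so the test error moves up or down according to whether the variance grows faster than the squared bias shrinks.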

As we increase the flexibility of a class of methods, the bias tends to initially decrease faster than the variance increases. Consequently, the expected test error declines.

However, at some point, increasing flexibility has little impact on the bias but starts to significantly increase the variance. When this happens, the test error increases.
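The U-shaped behaviour of the test error can be reproduced with a small simulation. The sketch below treats polynomial degree as the measure of flexibility and estimates squared bias, variance, and expected test error (squared bias plus variance plus noise variance) for each degree. The true function, noise level, sample size, and range of degrees are assumed values for illustration only.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_f(x):
    # Hypothetical "true" relationship (an assumption for this sketch)
    return np.sin(2 * np.pi * x)

def decompose(degree, n_sets=300, n_points=30, noise_sd=0.3, seed=0):
    """Estimate (squared bias, variance, expected test error) for one degree."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_sets, x_grid.size))
    for i in range(n_sets):
        x = rng.uniform(0, 1, n_points)
        y = true_f(x) + rng.normal(0, noise_sd, n_points)
        preds[i] = Polynomial.fit(x, y, degree)(x_grid)
    bias_sq = np.mean((preds.mean(axis=0) - true_f(x_grid)) ** 2)
    variance = preds.var(axis=0).mean()
    # Expected test error = squared bias + variance + irreducible noise
    test_error = bias_sq + variance + noise_sd ** 2
    return bias_sq, variance, test_error

degrees = list(range(1, 11))                 # flexibility: polynomial degree
results = {d: decompose(d) for d in degrees}
best_degree = min(degrees, key=lambda d: results[d][2])

for d in degrees:
    b, v, e = results[d]
    print(f"degree {d:2d}: bias^2={b:.4f}  variance={v:.4f}  test error={e:.4f}")
print("lowest expected test error at degree", best_degree)
```

With these settings the lowest test error should land at an intermediate degree: the straight line is dominated by bias, the highest degrees are dominated by variance, and the best model sits between the two.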








