Saturday, February 15, 2020

Machine Learning - Hypothesis Testing

Important terminologies -- Hypothesis testing

  • Population = all possible values
  • Sample = a portion of the population
  • Parameter = a characteristic of a population, e.g., the population mean μ
  • Statistic = a value calculated from sample data, e.g., the sample mean x̄
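
The four terms above can be illustrated with a small simulation. This is only a sketch: the population is simulated, and its size (10,000), mean (65), spread (10), and the sample size (25) are assumed values chosen purely for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated values (assumed, not real data).
population = [random.gauss(65, 10) for _ in range(10_000)]

# Parameter: a characteristic computed from the whole population.
mu = statistics.mean(population)

# Sample: a random portion of the population.
sample = random.sample(population, 25)

# Statistic: the same characteristic computed from the sample only.
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

In practice the parameter is usually unknown, and the statistic is what we use to reason about it.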

Hypothesis Testing

Hypothesis testing tests a claim about a population parameter (a characteristic of the population) using evidence from sample data.

Steps of hypothesis testing:

A) State the null and alternative hypotheses
B) Calculate the test statistic
C) Decide the level of significance
D) Compute the p-value and make a decision

Null and Alternative Hypotheses

  • Convert the research question to null and alternative hypotheses
  • The null hypothesis (H0) is a claim of "no difference in the population"
  • The alternative hypothesis (Ha) claims "H0 is false"
  • Collect data and seek evidence against H0 as a way of bolstering Ha (deduction)

Example: "Body Weight"
The problem: In the 1970s, 20-24-year-old men joining the army had an average body weight of 65 kg. The standard deviation of body weight was 10 kg. We test whether the average body weight now differs.

The null hypothesis is H0: μ = 65 (no difference).
The alternative hypothesis can be either Ha: μ > 65 (one-sided test) or Ha: μ ≠ 65 (two-sided test).

Sampling Distributions of Mean




Errors in hypothesis testing




Test Statistic

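This is an example of a one-sample test of a mean when σ is known. Use the z statistic to test the problem: z = (x̄ − μ0) / (σ / √n), where x̄ is the sample mean, μ0 is the mean claimed by H0, σ is the known population standard deviation, and n is the sample size.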
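
A minimal sketch of the one-sample z statistic, z = (x̄ − μ0) / (σ / √n), applied to the body weight example. The values μ0 = 65 and σ = 10 come from the example; the sample mean (66.2) and the sample size (25) are hypothetical, since the post does not give them.

```python
import math


def z_statistic(sample_mean, mu0, sigma, n):
    """One-sample z statistic for a mean when sigma is known."""
    standard_error = sigma / math.sqrt(n)  # standard error of the mean
    return (sample_mean - mu0) / standard_error


# mu0 and sigma are from the body weight example.
# sample_mean and n are hypothetical values, assumed for illustration only.
z = z_statistic(sample_mean=66.2, mu0=65, sigma=10, n=25)
print(round(z, 2))
```

With these assumed inputs the standard error is 10 / √25 = 2, so a sample mean 1.2 kg above 65 gives z of about 0.6.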


P-value
The p-value is the probability of getting a test statistic as extreme as, or more extreme than, the observed value when H0 is true.

One-sided P-value for z statistic of 0.6
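
The p-value for z = 0.6 can be computed from the standard normal tail area. The sketch below uses only the Python standard library; the helper name `upper_tail_p` is made up for this illustration.

```python
import math


def upper_tail_p(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = 0.6
p_one_sided = upper_tail_p(z)           # for Ha: mu > 65
p_two_sided = 2 * upper_tail_p(abs(z))  # for Ha: mu not equal to 65

print(f"one-sided p-value: {p_one_sided:.4f}")  # 0.2743
print(f"two-sided p-value: {p_two_sided:.4f}")  # 0.5485
```

The two-sided p-value is double the one-sided value because deviations in both directions count as evidence against H0.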

Interpretation 

The p-value answers the question: what is the probability of getting a test statistic at least as extreme as the observed one when H0 is true?

Thus, smaller and smaller p-values provide stronger and stronger evidence against H0.

Small p-value => strong evidence against H0 and in favor of Ha

Decision Rule

alpha = probability of rejecting H0 when it is true (a Type I error)

Set the alpha threshold (e.g., 0.01, 0.05, or 0.10)

Reject H0 in favor of Ha when the p-value is less than or equal to alpha; otherwise, fail to reject H0.
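
The decision rule can be written as a small function. This is a sketch only: the function name `decide` and the default alpha of 0.05 are assumptions for illustration, and 0.2743 is the one-sided p-value for a z statistic of 0.6.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p-value <= alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# One-sided p-value for z = 0.6 is about 0.2743.
print(decide(0.2743, alpha=0.05))  # fail to reject H0
print(decide(0.03, alpha=0.05))    # reject H0
```

Note that failing to reject H0 is not the same as proving H0: it only means the sample did not provide enough evidence against it at the chosen alpha.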



