There are 10 men and 10 women. I'd like to investigate whether:
avg(height(men)) - avg(height(women)) >> 0
or not. I observe this difference in my sample, and it is 20 cm. I need to see whether this difference could have happened by chance; that is, if I shuffle the labels of being a man or a woman, will I get the same difference between the two groups? If the shuffled groups often give 20 cm or a number close to it, there is nothing interesting about the distinction between the heights of men and women, right? What if I get 2 cm? Then the height difference between our true groups is remarkably far from what random labeling produces. A single shuffle is not enough, so I should repeat the shuffling many times until I get a probability distribution of the random observable (the height difference). This is the null distribution, i.e. the distribution of the difference under the null hypothesis that the labels do not matter. Now I should look at where my true observation, 20 cm, falls in this distribution and measure the probability of this result or a more extreme one arising by chance, i.e. the p-value. These notions are clearly depicted in the following figure obtained from Wikipedia.
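
The shuffling procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the heights are made-up numbers chosen so that the observed difference is 20 cm, the function name `permutation_test` and its parameters are my own, and adding 1 to both the count and the number of shuffles is a common convention (it keeps the estimated p-value from being exactly zero) that I am assuming here.

```python
import random


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """One-sided permutation test for mean(group_a) - mean(group_b) > 0.

    Returns the observed difference, the list of shuffled differences
    (the null distribution), and the estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)

    pooled = list(group_a) + list(group_b)
    null_diffs = []
    for _ in range(n_shuffles):
        # Shuffling the pooled heights is the same as shuffling the labels.
        rng.shuffle(pooled)
        fake_a, fake_b = pooled[:n_a], pooled[n_a:]
        null_diffs.append(sum(fake_a) / len(fake_a) - sum(fake_b) / len(fake_b))

    # Count shuffles at least as extreme as the real observation.
    n_extreme = sum(d >= observed for d in null_diffs)
    # Assumed convention: +1 in numerator and denominator.
    p_value = (n_extreme + 1) / (n_shuffles + 1)
    return observed, null_diffs, p_value


# Hypothetical heights in cm (invented for illustration only).
men = [178, 182, 175, 190, 185, 180, 176, 188, 183, 179]
women = [160, 165, 158, 170, 162, 159, 166, 163, 155, 158]

observed, null_diffs, p_value = permutation_test(men, women)
print(f"observed difference: {observed:.1f} cm")
print(f"estimated p-value:   {p_value:.5f}")
```

A histogram of `null_diffs` gives the null distribution, and the p-value is the fraction of it lying at or beyond the observed 20 cm.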
