Statistics practice tutorial

Henry S. Thompson
13 Mar 2012

1. Goals

To make the basics of distributions and significance testing a little less foreign, a little more familiar

2. Flipping coins

What's the most obvious thing about the figure below, taken from this week's lecture, of the result of 100,000 trials of 40 coin flips?:

[no description, sorry]

And why does it look the way it does?

What is the binominial distribution

See the relationship?

3. Monte Carlo experiments with 'coins' and the t-test

These instructions are for Open Office Calc: using them for Excel requires that you replace ';' with ',' throughout, and follow the spirit, not the letter, of the instructions wrt the Chart insertion wizard

3.1. Part A: Create some coin-flip data

  1. Enter =randbetween(0;1) in cell A1
  2. Select A1 and drag the selection down to A20 using the series box (the small black square at the lower-right of the selected cell)
    • In the resulting Fill Series popup, select Series Type: Formulas and OK
  3. Drag the whole column selection over to column J by dragging the series box from A20 over to J20
    • In the resulting Fill Series popup, select Direction: Right, Series Type: Formulas and OK

You now have 20 trials of 10 coin flips, one trial per row

If you press Ctrl+Alt+F9 by mistake, you will want to use Ctrl+Alt+F7 to undo that!

If you press Ctrl+Shift+F9, all the coins will be flipped anew

3.2. Stage B: summary statistics

  1. Enter =sum(A1:J1) in cell K1, and use the series box to extend that down to K20, giving the number of, shall we say heads, for each trial
  2. Add the mean (use the average function) and standard deviation (stdev) over the 20 trials in K21 and K22 respectively
  3. Use Ctrl+Shift+F9 to re-run the 'experiment' several times, to get a feel for the distribution

3.3. Stage C: Histogram

  1. Generate indices 0--10 in cells A44:K44 by entering 0 in A44 and dragging the series box to K44 and
    • In the resulting Fill Series popup, select Direction: Right, Series Type: Linear and OK
  2. Generate result frequencies in cells A45:K45 by entering =COUNTIF($K1:$K20;A44) in A45 and dragging the series box to K45 and
    • In the resulting Fill Series popup, select Direction: Right, Series Type: Formulas and OK
  3. Select A44:K45 and then Insert/Chart from the main menu above
    • In the resulting Chart Wizard, go to Step 2 Data Range and select Data series in rows and First row as label
  4. Select Finish

Re-run with Ctrl+Shift+F9 again and check that the frequency counts and histogram update correctly

4. T-testing the results

4.1. Stage D: Compare your results with a neighbour's

  1. Re-run until you get a mean pretty close to 5
  2. Copy just the values of the resulting mean and standard deviation from K21:K22 to L21:L22 using by selecting them, right-clicking in L21, selecting Paste Special from the pop-up and then setting Formulas off and Numbers on
  3. Re-run again, so now you have mean and s.d. for two complete experiements. Let's test the hypothesis that these come from different distributions (i.e., that the built-in random number generator isn't stable)
  4. Put a "T" label in K25 and a "p" label in K26
  5. Put =abs(K21-L21)/(sqrt(0.5*(K22^2+L22^2))*sqrt(2/20)) in L25 -- that's the Student's t-test for comparing two sample sets with equal sizes, assumed equal variance:
    • [no description, sorry], where
    • [no description, sorry]
    • Essentially this is a z-score for the difference between the means of the two distributions, using a pooled standard deviation
  6. Put =tdist(L25;19;2) in L26, to get the probability that the two samples came from distinct distributions

Re-run several times and see if the difference is ever significant

4.2. Phase E: Doctor your coin! (Extra time)

This would be a lot more fun if you could test a 'doctored' coin

  1. Put the label "Bias" in A22
  2. Put a 0 in A23, and use Insert/Names/Define from the main menu to give A23 the name "Bias"
  3. Replace the formula in A1 with =round(rand()+Bias;0)
  4. As before, drag that first down to A20, then across to K20 to change the whole table

What we've done is create the possibility of biasing our coin, to make a heads outcome more likely

By shifting to using the rand function, we get a random number between 0 and 1, and then round that up or down

If the bias is greater than 0, that should move the mean up by that amount, and make heads consequently more likely

It doesn't take much -- try putting a 0.3 in the Bias, and watch the histogram shift

What's the new p value?

Re-run, and experiment with different biases, to find how much bias it takes to get a signficantly detectable different distribution. . .