Inf1 OP: Lab Sheet Week 3 Q4 - Mean and Variance

Overview

In this question, you will create a program called MeanVariance that calculates the mean and variance of a data set given on the command line. These two quantities are commonly used to summarise data quickly. You should be familiar with the notion of the mean (or average) of a data set; the variance tells us how spread out the data is around the mean.

Maths Overview

A common notation for the mean is \( \bar{x} \), which is calculated as:

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{ x_1 + x_2 + ... + x_{n} } {n} \]

More simply, the mean of \( n \) data items is calculated by adding up all the items and dividing the total by \( n \).
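The calculation described above can be sketched in Java as follows. This is only an illustration: the class name MeanSketch and the method name mean are hypothetical, and the sketch assumes the data is already held in a non-empty array of doubles.

```java
// Illustrative sketch only; MeanSketch and mean are hypothetical names.
class MeanSketch {
    // Adds up all the items and divides the total by the number of items.
    // Assumes data is non-empty; an empty array would give 0.0 / 0, i.e. NaN.
    static double mean(double[] data) {
        double sum = 0.0;
        for (double x : data) {
            sum += x;
        }
        return sum / data.length;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 3.0, 1.0, 3.0};
        System.out.println(mean(data)); // prints 2.0
    }
}
```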

Once you have calculated the mean of a data set, you can calculate the variance. The notation for variance is \( \sigma^2 \). As mentioned above, variance gives us an idea about the spread of data around the mean. If a data set has a high variance, then the data is widely spread out around the mean, whereas if it has a low variance, the data is more tightly clustered. The variance of a data set is defined as the average of the squared distances between each item and the mean, or, more mathematically,

\[ \begin{split}\sigma^2 &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \\ &= \frac{ (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2 }{n}\end{split} \]

Warning

If you look up definitions of variance, you may find a slightly different formula in which the denominator is \( n-1 \). Do not use that version for this exercise; divide by \( n \) as in the formula above.
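To make the choice of denominator concrete, here is a minimal sketch of the variance formula with \( n \) as the divisor. The names VarianceSketch, mean and variance are hypothetical, and the sketch again assumes a non-empty array of doubles; it is one possible way to organise the calculation, not a required structure.

```java
// Illustrative sketch only; all names here are hypothetical.
class VarianceSketch {
    // Mean of the data: total divided by the number of items.
    static double mean(double[] data) {
        double sum = 0.0;
        for (double x : data) {
            sum += x;
        }
        return sum / data.length;
    }

    // Average of the squared distances between each item and the mean.
    static double variance(double[] data) {
        double m = mean(data);
        double sumOfSquares = 0.0;
        for (double x : data) {
            double distance = x - m;
            sumOfSquares += distance * distance;
        }
        // Divide by n (data.length), not by n - 1.
        return sumOfSquares / data.length;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 3.0, 1.0, 3.0};
        System.out.println(variance(data)); // prints 1.0
    }
}
```

With the \( n-1 \) denominator, the same data would give 4/3 rather than 1.0, which is why the distinction matters for this exercise.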

Worked Example

For this example, we assume our input data are the numbers 65, 45, 34, 87, so we set \( n = 4 \) to be the number of elements.

The mean is calculated as:

\[ \begin{split}\bar{x} &= \frac{65 + 45 + 34 + 87}{4} \\ &= \frac{231}{4} = 57.75\end{split} \]

And the variance is calculated as:

\[ \begin{split}\sigma^2 &= \frac{ (65-57.75)^2 + (45-57.75)^2 + (34-57.75)^2 + (87-57.75)^2 }{4} \\ &= \frac{ 7.25^2 + (-12.75)^2 + (-23.75)^2 + 29.25^2}{4} \\ &= \frac{1634.75}{4} \\ &= 408.6875\end{split} \]
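
If you want to check the arithmetic of the worked example by machine, a short sketch such as the following reproduces each intermediate value. The class name WorkedExampleCheck and the method steps are hypothetical, and the data is hard-coded here rather than read from the command line.

```java
// Illustrative sketch only; WorkedExampleCheck and steps are hypothetical names.
class WorkedExampleCheck {
    // Returns {total, mean, sum of squared distances, variance} for the data.
    // Assumes data is non-empty.
    static double[] steps(double[] data) {
        int n = data.length;
        double total = 0.0;
        for (double x : data) {
            total += x;
        }
        double mean = total / n;
        double sumOfSquares = 0.0;
        for (double x : data) {
            sumOfSquares += (x - mean) * (x - mean);
        }
        return new double[] {total, mean, sumOfSquares, sumOfSquares / n};
    }

    public static void main(String[] args) {
        double[] r = steps(new double[] {65, 45, 34, 87});
        System.out.println(r[0]); // prints 231.0
        System.out.println(r[1]); // prints 57.75
        System.out.println(r[2]); // prints 1634.75
        System.out.println(r[3]); // prints 408.6875
    }
}
```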

Calculate the Mean and Variance

Write a program MeanVariance that reads a data set in the form of a sequence of command-line arguments, and calculates the mean and variance of the data. Your program should print out the mean and variance on two separate lines, as illustrated below:

: java MeanVariance 1.0 1.0 1.0 1.0
1.0
0.0

: java MeanVariance 1.0 3.0 1.0 3.0
2.0
1.0

: java MeanVariance 1.0 2.0 3.4 2.3 5.6 3.4 2.1
2.8285714
1.8820408
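
In the sample runs above, each number arrives as a string in the args array of main. One way to turn these into numbers, sketched here under the assumption that every argument is a well-formed decimal, is Double.parseDouble. The class name ArgsSketch and the method parse are hypothetical.

```java
// Illustrative sketch only; ArgsSketch and parse are hypothetical names.
class ArgsSketch {
    // Converts each command-line argument to a double.
    // Double.parseDouble throws NumberFormatException if an argument
    // is not a valid number; this sketch does not handle that case.
    static double[] parse(String[] args) {
        double[] data = new double[args.length];
        for (int i = 0; i < args.length; i++) {
            data[i] = Double.parseDouble(args[i]);
        }
        return data;
    }

    public static void main(String[] args) {
        double[] data = parse(args);
        System.out.println(data.length + " values read");
    }
}
```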

It is quite likely that in the third example, your code will produce slightly different floating-point numbers, for instance with more digits than shown here. You are only expected to produce results that agree with the values shown to the first two decimal places.
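
If you want to compare your own output against the expected values, one possible reading of "agree to two decimal places" is an absolute tolerance of 0.005. The sketch below uses that reading; it is an assumption made for illustration, and the automated test may apply a different criterion. The names ToleranceSketch, TOLERANCE and closeEnough are hypothetical.

```java
// Illustrative sketch only; the tolerance value is an assumption,
// not necessarily the criterion used by the automated test.
class ToleranceSketch {
    static final double TOLERANCE = 0.005;

    // True if actual is within TOLERANCE of expected.
    static boolean closeEnough(double actual, double expected) {
        return Math.abs(actual - expected) < TOLERANCE;
    }

    public static void main(String[] args) {
        System.out.println(closeEnough(19.8 / 7, 2.8285714)); // prints true
        System.out.println(closeEnough(2.84, 2.8285714));     // prints false
    }
}
```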

An automated test has been created for this exercise: MeanVarianceTest.java.