In this question, you will create a program called MeanVariance which calculates the mean and variance of a set of data given on the command line. These two quantities are commonly used to summarise data quickly. You should be familiar with the notion of the mean (or average) of a data set; the variance of a data set gives us an idea of how spread out the data is around the mean.
A common notation for mean is \( \bar{x} \), and this can be calculated as:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{ x_1 + x_2 + \ldots + x_n }{n} \]

More simply, the mean of \( n \) data items is calculated by adding up all the items and dividing the total by \( n \).
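To see how this translates into code, the sketch below shows one possibility: a hypothetical mean helper (a fragment to be placed inside a class, not part of any required skeleton) in which the summation becomes a loop.

    // Hypothetical helper: the sum in the formula above becomes a loop.
    static double mean(double[] x) {
        double sum = 0.0;
        for (double xi : x) {
            sum += xi;
        }
        // Divide the total by n, the number of items.
        return sum / x.length;
    }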
Once you have calculated the mean of a data set, you can calculate the variance. The notation for variance is \( \sigma^2 \). As mentioned above, variance gives us an idea about the spread of data around the mean. If a data set has a high variance, the data is widely spread out around the mean, whereas if it has a low variance, the data is more tightly clustered. The variance of a data set is defined as the average of the squared distances between each item and the mean, or more mathematically,
\[ \begin{split}\sigma^2 &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \\ &= \frac{ (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \ldots + (x_n - \bar{x})^2 }{n}\end{split} \]

Warning: If you look up definitions of variance, you may find a slightly different formula in which the denominator is \( n - 1 \). However, this is not what you should use for this exercise.
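Continuing the sketch above, the variance formula can be written as a second pass over the data that reuses the mean computed earlier; this hypothetical variance helper is again only an illustration, not a required structure.

    // Hypothetical helper: average of squared distances from the mean.
    static double variance(double[] x, double mean) {
        double sumSquares = 0.0;
        for (double xi : x) {
            double d = xi - mean;
            sumSquares += d * d;  // squared distance from the mean
        }
        // Divide by n (not n - 1), as the warning above specifies.
        return sumSquares / x.length;
    }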
For this example, assume the input data are the numbers 65, 45, 34 and 87, so the number of elements is \( n = 4 \).
The mean is calculated as:
\[ \begin{split}\bar{x} &= \frac{65 + 45 + 34 + 87}{4} \\ &= \frac{231}{4} = 57.75\end{split} \]

And the variance is calculated as:
\[ \begin{split}\sigma^2 &= \frac{ (65-57.75)^2 + (45-57.75)^2 + (34-57.75)^2 + (87-57.75)^2 }{4} \\ &= \frac{ 7.25^2 + (-12.75)^2 + (-23.75)^2 + 29.25^2}{4} \\ &= \frac{1634.75}{4} \\ &= 408.6875\end{split} \]

Write a program MeanVariance that reads a data set in the form of a sequence of command-line arguments, and calculates the mean and variance of the data. Your program should print the mean and variance on two separate lines, as illustrated below:
    : java MeanVariance 1.0 1.0 1.0 1.0
    1.0
    0.0
    : java MeanVariance 1.0 3.0 1.0 3.0
    2.0
    1.0
    : java MeanVariance 1.0 2.0 3.4 2.3 5.6 3.4 2.1
    2.8285714
    1.8820408
It is quite likely that in the third example your code will produce a slightly different floating-point number. Your result is only expected to match ours to the first two decimal places.
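As a rough guide, one possible shape for the whole program is sketched below. The variable names and overall layout are our own assumptions rather than a prescribed structure, and println may print more digits than the sample output shows, which is acceptable given the tolerance described above.

    public class MeanVariance {
        public static void main(String[] args) {
            // Convert each command-line argument to a double.
            int n = args.length;
            double[] x = new double[n];
            for (int i = 0; i < n; i++) {
                x[i] = Double.parseDouble(args[i]);
            }

            // Mean: sum of the items divided by n.
            double sum = 0.0;
            for (double xi : x) {
                sum += xi;
            }
            double mean = sum / n;

            // Variance: average squared distance from the mean (denominator n).
            double sumSq = 0.0;
            for (double xi : x) {
                sumSq += (xi - mean) * (xi - mean);
            }
            double variance = sumSq / n;

            // Print the mean and variance on two separate lines.
            System.out.println(mean);
            System.out.println(variance);
        }
    }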
An automated test has been created for this exercise: MeanVarianceTest.java.