Algorithms for calculating variance
Sajun.org
In statistics, a formula for calculating the variance of a population of size ''n'' is: :<math>\mathit{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n^2}</math> A formula for calculating the unbiased estimation of the population variance from ''n'' finite samples is: :<math>\mathit{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n(n-1)}</math> The method of calculation may be more easily understood from the table below where the mean is 8.
{| width="90%" border="0"
|-
| '''i ''' || '''xi ''' || '''xi-mean '''
| '''(xi-mean)2'''
|-
| (index) || (datum) || (deviation) || (squared deviation)
|-
| 1 || 5 || -3 || 9
|-
| 2 || 7 || -1 || 1
|-
| 3 || 8 || 0 || 0
|-
| 4 || 10 || 2 || 4
|-
| 5 || 10 || 2 || 4
|-
| n=5 || sum=40 || 0 || 18
|}
- mean = 40/5 = 8
- variance = (5*338 - 402)/(5 * 4) = 4.5
- standard deviation = <math>\sqrt{\mathit{Variance}}</math>= 2.12
40 = [5 + 7 + 8 + 10 + 10] ==Algorithm== Therefore a simple algorithm to calculate variance can be described by the following pseudocode:
double sum; double sum_sqr; double variance; long n = data.length; // the number of elements in the data array (the actual syntax is language-specific) for i = 1 to n sum += data[i]; sum_sqr += ( data[i] * data[i] ); end for variance = ((n * sum_sqr) - (sum * sum))/(n*(n-1));
[편집]
Algorithm
Another algorithm which avoids large numbers in sum_sqr while summing updouble avg = 0; double var = 0; long n = data.length; // number of elements for i = 1 to n avg = (avg*i + data[i]) / (i + 1); var = (var * (i - 1) + (data[i] - avg)*(data[i] - avg)) / i; end for return var; // resulting varianceThe above algorithm does not seem to be correct. The resulting variance does not equal either one of the variances whose formulas are given at the top of the page.
