The attribution of variation in a variable to variations in one or more explanatory variables. The term was introduced by Sir Ronald Fisher in 1918.
A measure of the total variability in a set of data is given by the sum of squared differences of the observations from their overall mean. This is the total sum of squares (TSS). It is often possible to subdivide this quantity into components that are identified with different causes of variation. The full subdivision is usually set out in an analysis of variance table, as suggested by Sir Ronald Fisher in his 1925 book Statistical Methods for Research Workers. Each row of the table is concerned with one or more of the components of the observed variation. The entries on a row usually include the sum of squares (SS), the corresponding number of degrees of freedom (ν), and their ratio, the mean square (=SS/ν).
After the contributions of all the specified sources of variation have been determined, the remainder, often called the residual sum of squares (RSS) or error sum of squares, is attributed to random variation. The mean square corresponding to RSS is often used as the yardstick for assessing the importance of the specified sources of variation. One method involves comparing ratios of mean squares with the critical values of an F-distribution.
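The subdivision described above can be sketched for a two-way layout with one observation per cell. This is a minimal illustration under stated assumptions, not a general implementation: the data are invented, and the function name `two_way_anova` is hypothetical.

```python
# Hypothetical data, invented for illustration only:
# rows are four treatments, columns are three blocks.
yields = [
    [10.0, 12.0, 14.0],
    [11.0, 15.0, 13.0],
    [16.0, 18.0, 17.0],
    [9.0, 11.0, 10.0],
]


def two_way_anova(table):
    """Return SS, df and MS for rows, columns, residual and total,
    assuming exactly one observation in every cell."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    # Total sum of squares: squared deviations from the overall mean.
    tss = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    # Whatever is not attributed to rows or columns is the residual.
    rss = tss - ss_rows - ss_cols

    parts = {
        "rows": (ss_rows, r - 1),
        "columns": (ss_cols, c - 1),
        "residual": (rss, (r - 1) * (c - 1)),
        "total": (tss, r * c - 1),
    }
    return {k: {"SS": ss, "df": df, "MS": ss / df} for k, (ss, df) in parts.items()}


anova = two_way_anova(yields)
# Ratios of mean squares, using the residual mean square as the yardstick.
f_rows = anova["rows"]["MS"] / anova["residual"]["MS"]
f_cols = anova["columns"]["MS"] / anova["residual"]["MS"]

for source, entry in anova.items():
    print(f"{source:9s} SS={entry['SS']:8.3f} df={entry['df']:2d} MS={entry['MS']:8.3f}")
print(f"F(rows)={f_rows:.3f}  F(columns)={f_cols:.3f}")
```

Each ratio would then be compared with a critical value of the F-distribution having the corresponding pair of degrees of freedom; looking up that critical value is left out of the sketch.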
The proportion of variation explained by the model is (TSS − RSS)/TSS, which is sometimes called the coefficient of determination.
In an analysis of variance (ANOVA) each explanatory variable takes one of a small number of values. If, instead, some explanatory variables are continuous in nature, then the resulting models are called ANOCOVA models. ANOVA can also be thought of as multiple regression using only dummy variables.
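The connection with multiple regression can be checked numerically. The following sketch, which uses invented data for a single factor with three levels and a hypothetical helper `solve`, fits a least-squares regression on an intercept and 0/1 dummy variables and compares its residual sum of squares with the within-group sum of squares computed directly.

```python
# Hypothetical responses, invented for illustration only.
groups = {"a": [4.0, 6.0, 5.0], "b": [8.0, 9.0, 10.0], "c": [1.0, 3.0, 2.0]}

levels = sorted(groups)
others = levels[1:]  # the first level serves as the baseline

# Design matrix: an intercept plus one 0/1 dummy per non-baseline level.
X, y = [], []
for level in levels:
    for value in groups[level]:
        X.append([1.0] + [1.0 if level == other else 0.0 for other in others])
        y.append(value)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Normal equations (X'X) beta = X'y.
p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)

fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
rss_regression = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Within-group sum of squares computed directly from the group means.
rss_anova = sum(
    (v - sum(vals) / len(vals)) ** 2 for vals in groups.values() for v in vals
)

print("coefficients:", beta)
print("RSS from regression:", rss_regression)
print("RSS from group means:", rss_anova)
```

The intercept estimates the baseline group mean and each dummy coefficient estimates the difference between its group mean and the baseline, so the two residual sums of squares agree up to rounding.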
As an example, suppose that four varieties of tomatoes are grown in three grow-bags giving the yields (in g) shown below. The explanatory variables are the grow-bags and the varieties. The following ANOVA table results:
Since the mean square for varieties is much greater than that for grow-bags, we can conclude that differences between varieties are more important. However, the residual sum of squares amounts to nearly half the total sum of squares, indicating that there are major unexplained sources of variation.
The proportion of the total sum of squares that is attributed to any particular source of variation is referred to as eta-squared (η²). Thus, for the difference between grow-bags, η² is the grow-bags sum of squares divided by TSS. A related statistic is partial eta-squared, which is the ratio of the sum of squares of interest to the total of that sum of squares and the residual sum of squares. Thus, for grow-bags, the partial eta-squared is SS/(SS + RSS), where SS is the grow-bags sum of squares.
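The two statistics differ only in their denominators, as the following minimal sketch shows. The sums of squares are invented for illustration and are not the values from the tomato example; the function names are hypothetical.

```python
def eta_squared(ss_effect, tss):
    """Share of the total sum of squares attributed to one source."""
    return ss_effect / tss


def partial_eta_squared(ss_effect, rss):
    """Sum of squares of interest relative to itself plus the residual."""
    return ss_effect / (ss_effect + rss)


# Hypothetical sums of squares, invented for illustration only.
ss = {"varieties": 78.0, "grow-bags": 14.0, "residual": 6.0}
tss = sum(ss.values())

eta2_bags = eta_squared(ss["grow-bags"], tss)
partial_eta2_bags = partial_eta_squared(ss["grow-bags"], ss["residual"])

print(f"eta-squared         = {eta2_bags:.4f}")
print(f"partial eta-squared = {partial_eta2_bags:.4f}")
```

Because its denominator excludes the other specified sources of variation, partial eta-squared is never smaller than eta-squared for the same source.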
Often, particular comparisons between the treatments are of interest. These are referred to as contrasts. The set of contrasts consisting of, say, (i) a comparison of treatment 1 with the average effect of treatments 2 to t, (ii) a comparison of treatment 2 with the average effect of treatments 3 to t, etc., is called the set of Helmert contrasts. Both these contrasts and those corresponding to orthogonal polynomials are mutually orthogonal, so that, in a balanced design, their estimates are uncorrelated and the variance–covariance matrix is diagonal. See also experimental design.
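The Helmert contrasts described above can be written out as coefficient vectors and their orthogonality checked directly. This is a sketch in the ordering used in this entry (each treatment compared with the later ones); the function name `helmert_contrasts` is hypothetical, and statistical packages may order or scale their Helmert contrasts differently.

```python
from fractions import Fraction


def helmert_contrasts(t):
    """Return t - 1 coefficient vectors; the i-th compares treatment i
    with the average of all later treatments."""
    contrasts = []
    for i in range(t - 1):
        later = t - i - 1  # number of treatments after treatment i
        row = [Fraction(0)] * i + [Fraction(1)] + [Fraction(-1, later)] * later
        contrasts.append(row)
    return contrasts


def dot(u, v):
    """Inner product of two coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))


C = helmert_contrasts(4)
for row in C:
    print([str(x) for x in row])

# Each contrast has coefficients summing to zero.
sums_to_zero = all(sum(row) == 0 for row in C)
# Every pair of distinct contrasts has zero inner product.
orthogonal = all(
    dot(C[i], C[j]) == 0 for i in range(len(C)) for j in range(i + 1, len(C))
)
print("coefficients sum to zero:", sums_to_zero)
print("pairwise orthogonal:", orthogonal)
```

Exact fractions are used so that the zero inner products are not obscured by floating-point rounding.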