## Quick Reference

Various statistics that give information about the reliability of the estimates of the multiple regression model

E(**Y**)=**X***β*,

where **Y** is an *n*×1 vector of independent and identically distributed response variables, *β* is a *p*×1 vector of unknown parameters, and **X** is an *n*×*p* matrix. If *β* is replaced by its least squares estimate, *β̂*, the estimated column vector of fitted values, **ŷ**, is given by

**ŷ**=**Hy**,

where the *n*×*n* matrix **H**, the hat matrix, is given by

**H**=**X**(**X**′**X**)^{−1}**X**′,

**X**′ is the transpose of **X**, (**X**′**X**)^{−1} is the inverse of the matrix **X**′**X**, and **y** is the column vector of observed values. Denote the element in the *j*th row and *k*th column of **H** by *h*_{jk}. The fitted value, *ŷ*_{j}, for the *j*th observation, *y*_{j}, is given by

*ŷ*_{j} = Σ_{k} *h*_{jk}*y*_{k}.

Thus there is a direct link between the fitted and observed values in the form of *h*_{jj}. This is the leverage: a large value (e.g. >2*p*/*n*) indicates an observation having a large influence on the form of the fitted model.

The most obvious guides to the fit of a model are the residuals, *e*_{1}, *e*_{2},…, where *e*_{j} is given by

*e*_{j} = *y*_{j} − *ŷ*_{j}.

If the random variables have common variance *σ*^{2} and if *s*^{2} is an unbiased estimate of *σ*^{2}, then the standardized residual is sometimes defined by *e*_{j}/*s*. However, an unbiased estimate of the variance of *e*_{j} is not *s*^{2} but *s*^{2}(1 − *h*_{jj}), and a more appropriate residual (having unit variance if the model is correct) is given by *r*_{j}, where

*r*_{j} = *e*_{j} / {*s*√(1 − *h*_{jj})}.

This is sometimes called the standardized residual and sometimes the Studentized residual.

The deletion residual is given by

*d*_{j} = *y*_{j} − *ŷ*_{j,−j},

where *ŷ*_{j,−j} is the fitted value for observation *j* based on the fit of the model to all the observations except the observation *y*_{j}. Dividing the deletion residual by its estimated standard error, we get the Studentized deletion residual, which can be written as

*e*_{j} / {*s*_{−j}√(1 − *h*_{jj})},

where *s*^{2}_{−j} is the unbiased estimate of *σ*^{2} obtained when observation *j* is omitted. Confusingly, this may also be called the Studentized residual. See also Anscombe residual; deviance residual.
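The quantities defined so far (leverage, residual, standardized residual, deletion residual, and Studentized deletion residual) can be sketched numerically as follows. This is a minimal illustration assuming NumPy; the function name `regression_diagnostics` and the data are made up for the example, and the deletion quantities are obtained by refitting without each observation in turn, following the definitions directly rather than any particular library's method.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Leverages and residual-based diagnostics for a least squares fit.

    Assumes X is an n x p design matrix of full column rank with n > p + 1.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix H = X (X'X)^{-1} X'
    h = np.diag(H)                           # leverages h_jj
    e = y - H @ y                            # residuals e_j = y_j - yhat_j
    s2 = e @ e / (n - p)                     # unbiased estimate of sigma^2
    r = e / np.sqrt(s2 * (1.0 - h))          # standardized residuals r_j

    d = np.empty(n)                          # deletion residuals d_j
    t = np.empty(n)                          # Studentized deletion residuals
    for j in range(n):
        keep = np.arange(n) != j             # drop observation j
        beta_j = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        resid_j = y[keep] - X[keep] @ beta_j
        s2_j = resid_j @ resid_j / (n - 1 - p)   # s^2_{-j}
        d[j] = y[j] - X[j] @ beta_j
        t[j] = e[j] / np.sqrt(s2_j * (1.0 - h[j]))
    return {"leverage": h, "residual": e, "standardized": r,
            "deletion": d, "studentized_deletion": t}

# Made-up data: an intercept plus one predictor, with one remote x value.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 12.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

diag = regression_diagnostics(X, y)
high_leverage = diag["leverage"] > 2 * p / n   # the 2p/n rule of thumb
```

In this sketch only the last observation exceeds the 2*p*/*n* threshold. The refitting loop is written for clarity; the deletion residual also satisfies *d*_{j} = *e*_{j}/(1 − *h*_{jj}), so it can be obtained from the full fit alone.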

A related influence statistic is DFFITS, which is an abbreviation for difference in fits. For observation *j*, DFFITS_{j} is given by

DFFITS_{j} = (*ŷ*_{j} − *ŷ*_{j,−j}) / (*s*_{−j}√*h*_{jj}).

The influence statistic DFBETA (difference in beta values) applies the idea embodied in DFFITS to the parameter estimates rather than the fitted values. For *β*_{k}, DFBETA_{k,−j} is given by

DFBETA_{k,−j} = (*β̂*_{k} − *β̂*_{k,−j}) / (*s*_{−j}√*m*_{kk}),

where *β̂*_{k} is the estimate of *β*_{k} from the complete data, *β̂*_{k,−j} is the estimate when observation *j* is omitted, and *m*_{kk} is the corresponding diagonal element of the *p*×*p* matrix (**X**′**X**)^{−1}.
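As a rough numerical sketch of these two statistics, the code below computes DFFITS and DFBETA by refitting the model without each observation in turn. It assumes NumPy; the function name `dffits_dfbeta` and the data are hypothetical, chosen only for illustration.

```python
import numpy as np

def dffits_dfbeta(X, y):
    """Return (dffits, dfbeta) for a least squares fit of y on X.

    dffits has length n; dfbeta[j, k] is DFBETA_{k,-j}.
    Assumes X has full column rank and n > p + 1.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    m = np.diag(XtX_inv)                            # m_kk
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages h_jj
    beta = XtX_inv @ X.T @ y                        # full-data estimate
    dffits = np.empty(n)
    dfbeta = np.empty((n, p))
    for j in range(n):
        keep = np.arange(n) != j                    # drop observation j
        beta_j = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        resid_j = y[keep] - X[keep] @ beta_j
        s_j = np.sqrt(resid_j @ resid_j / (n - 1 - p))   # s_{-j}
        # Change in the fitted value for observation j, scaled.
        dffits[j] = X[j] @ (beta - beta_j) / (s_j * np.sqrt(h[j]))
        # Change in each parameter estimate, scaled.
        dfbeta[j] = (beta - beta_j) / (s_j * np.sqrt(m))
    return dffits, dfbeta

# Made-up data: an intercept plus one predictor, with one remote x value.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 12.0])
X = np.column_stack([np.ones_like(x), x])

dffits, dfbeta = dffits_dfbeta(X, y)
```

Each row of `dfbeta` shows how far every parameter estimate moves when one observation is dropped, so a single large row singles out an observation that shifts the fitted coefficients.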

A statistic that usefully combines information about leverage and influence is Cook's statistic, *D*_{j}, given by

*D*_{j} = *r*_{j}^{2}*h*_{jj} / {*p*(1 − *h*_{jj})}.

This statistic can also be interpreted as measuring the effect on the parameter estimates of omitting the *j*th observation. Large values point to possible outliers.
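A minimal sketch of Cook's statistic, computed from the full fit alone using the leverage and the standardized residual, is given below. It assumes NumPy; the function name `cooks_distance` and the data are made up for the example.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's statistic D_j = r_j^2 h_jj / (p (1 - h_jj)) for each observation.

    Assumes X is an n x p design matrix of full column rank with n > p.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # leverages h_jj
    e = y - H @ y                           # residuals e_j
    s2 = e @ e / (n - p)                    # unbiased estimate of sigma^2
    r2 = e**2 / (s2 * (1.0 - h))            # squared standardized residuals
    return r2 * h / (p * (1.0 - h))

# Made-up data: an intercept plus one predictor, with one remote x value.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 12.0])
X = np.column_stack([np.ones_like(x), x])

D = cooks_distance(X, y)
most_influential = int(np.argmax(D))        # index of the largest D_j
```

With these data the remote observation has by far the largest value, even though its raw residual is small, because its leverage is close to 1.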

*Subjects:*
Probability and Statistics.
