A procedure for identifying an appropriate model in the context of multiple regression. The expectation of the response variable, E(Y), is modelled as a linear combination of many (p, say) explanatory X-variables. A natural question is whether all p of the X-variables are required.
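Written out with regression coefficients β0, β1, …, βp (this notation is assumed here, not given in the entry), the full model is shown below. Asking whether all p X-variables are required amounts to asking which of the coefficients β1, …, βp can be set to zero without materially worsening the fit.

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$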
Forward selection begins by determining which one of the X-variables provides the most information about Y, that is, which single variable gives the largest reduction in the residual sum of squares. This variable is retained in all future models. At the second stage the procedure considers the remaining (p − 1) variables and determines which, in conjunction with the first variable, provides the most additional information about Y. The procedure continues in this way until none of the remaining variables makes a worthwhile extra contribution to the fit of the model. Each contribution is assessed using an F-test: a contribution is worthwhile if the observed F-value exceeds a critical value, often referred to in the jargon of computer packages as the F to enter.
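A minimal Python sketch of forward selection, assuming numpy and ordinary least squares with an intercept. The function names (`rss`, `partial_f`, `forward_selection`) and the default F to enter of 4.0 are illustrative assumptions, not values prescribed by this entry or by any particular package. The F-value for adding one variable is computed as F = (RSS_old − RSS_new) / (RSS_new / (n − k − 1)), where RSS is the residual sum of squares, n the number of observations and k the number of X-variables in the larger model.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares from regressing y on an intercept plus columns `cols` of X."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def partial_f(X, y, smaller, larger):
    """F-value (1 numerator df) for the single extra variable in `larger` over `smaller`."""
    rss_small = rss(X, y, smaller)
    rss_large = rss(X, y, larger)
    df_resid = len(y) - len(larger) - 1  # n minus the parameters of the larger model
    return (rss_small - rss_large) / (rss_large / df_resid)


def forward_selection(X, y, f_to_enter=4.0):
    """Add variables one at a time while the best candidate's F-value exceeds f_to_enter.

    f_to_enter = 4.0 is a hypothetical default chosen for illustration.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining:
        # F-value each remaining variable would achieve if added to the current model
        scores = {j: partial_f(X, y, selected, selected + [j]) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= f_to_enter:
            break  # no remaining variable makes a worthwhile extra contribution
        selected.append(best)  # retained in all future models
        remaining.remove(best)
    return selected
```

The returned list gives column indices of `X` in the order they entered the model. On simulated data where only columns 0 and 3 affect `y` (for example `y = 2 * X[:, 0] - 1.5 * X[:, 3]` plus modest noise), those two columns should be the first to enter. At any one stage all candidate models have the same residual degrees of freedom, so picking the largest F-value is equivalent to picking the smallest residual sum of squares.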
Backward elimination mirrors forward selection: it starts with the model containing all p X-variables and removes ineffective variables one by one. A variable is judged to be ineffective if the F-value for its contribution fails to exceed a critical value, the F to remove.
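The same partial F-test drives backward elimination. The sketch below, again assuming numpy and a hypothetical default F to remove of 4.0, starts from all p columns and repeatedly drops the variable with the smallest F-value until every remaining variable's F-value exceeds the threshold. The helper functions are repeated so that the block runs on its own; all names are illustrative.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares from regressing y on an intercept plus columns `cols` of X."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def partial_f(X, y, smaller, larger):
    """F-value (1 numerator df) for the single extra variable in `larger` over `smaller`."""
    rss_small = rss(X, y, smaller)
    rss_large = rss(X, y, larger)
    df_resid = len(y) - len(larger) - 1  # n minus the parameters of the larger model
    return (rss_small - rss_large) / (rss_large / df_resid)


def backward_elimination(X, y, f_to_remove=4.0):
    """Start from all columns and drop the least useful one while its F-value <= f_to_remove.

    f_to_remove = 4.0 is a hypothetical default chosen for illustration.
    """
    selected = list(range(X.shape[1]))
    while selected:
        # F-value for deleting each variable from the current model
        scores = {
            j: partial_f(X, y, [k for k in selected if k != j], selected)
            for j in selected
        }
        worst = min(scores, key=scores.get)
        if scores[worst] > f_to_remove:
            break  # every remaining variable makes a worthwhile contribution
        selected.remove(worst)  # ineffective: F-value fails to exceed the F to remove
    return selected
```

Forward selection and backward elimination need not arrive at the same subset, particularly when the X-variables are correlated, which is one motivation for combining them.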
Forward selection and backward elimination are often referred to as stepwise selection procedures because they add or remove one variable at a time. A general stepwise procedure would combine elements of the two; after each removal stage there would be a check for possible additions.
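A sketch of one such combined procedure, following the description above: backward elimination in which, after each removal, the excluded variables are checked for re-entry. The thresholds (F to enter 4.0, F to remove 3.9) and the `max_steps` guard are assumptions for illustration. Choosing the F to enter larger than the F to remove is a common safeguard, because the F-value for re-adding a just-removed variable equals the F-value with which it was removed, so it cannot immediately re-enter.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares from regressing y on an intercept plus columns `cols` of X."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def partial_f(X, y, smaller, larger):
    """F-value (1 numerator df) for the single extra variable in `larger` over `smaller`."""
    rss_small = rss(X, y, smaller)
    rss_large = rss(X, y, larger)
    df_resid = len(y) - len(larger) - 1  # n minus the parameters of the larger model
    return (rss_small - rss_large) / (rss_large / df_resid)


def stepwise(X, y, f_to_enter=4.0, f_to_remove=3.9, max_steps=100):
    """Backward elimination with a check for possible additions after each removal.

    All thresholds are hypothetical defaults; max_steps guards against cycling.
    """
    p = X.shape[1]
    selected = list(range(p))
    for _ in range(max_steps):
        if not selected:
            break
        # Removal stage: find the variable whose deletion costs least.
        drop = {
            j: partial_f(X, y, [k for k in selected if k != j], selected)
            for j in selected
        }
        worst = min(drop, key=drop.get)
        if drop[worst] > f_to_remove:
            break  # every remaining variable is worth keeping
        selected.remove(worst)
        # Addition check: could any excluded variable now make a worthwhile contribution?
        excluded = [j for j in range(p) if j not in selected]
        add = {j: partial_f(X, y, selected, selected + [j]) for j in excluded}
        best = max(add, key=add.get)
        if add[best] > f_to_enter:
            selected.append(best)
    return sorted(selected)
```

Many packages instead run the combination in the other direction (forward selection with a check for removals after each addition); the same two helpers support either arrangement.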