Section 46 Model Selection: Summary

46.1 Statistical Model

\[ \large y_{i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} + \epsilon_{i} \]

\[ i = 1, \dots, n; \quad p = \text{number of predictors} \]
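As a minimal sketch of the model above, the following Python snippet simulates data from it and recovers the coefficients by ordinary least squares. The sample size, the coefficient values in `beta`, and the noise scale are made-up values chosen for illustration only; they do not come from any data set discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 200, 3                            # hypothetical sample size and number of predictors
X = rng.normal(size=(n, p))              # column j holds predictor x_{j+1}
beta = np.array([1.0, 2.0, 0.0, -1.5])   # hypothetical beta_0, beta_1, ..., beta_p
epsilon = rng.normal(scale=0.5, size=n)  # error term epsilon_i

# y_i = beta_0 + beta_1 x_{1i} + ... + beta_p x_{pi} + epsilon_i
y = beta[0] + X @ beta[1:] + epsilon

# Prepend a column of ones so the intercept beta_0 is estimated as well.
design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

print(np.round(beta_hat, 2))
```

The estimates in `beta_hat` should land close to the values in `beta`, including a value near zero for the predictor that has no effect in this simulation.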

Variable selection (or model selection) is one part of statistical modelling, not the end of it.
Consider the underlying system (for example, its biological underpinnings) to identify the scope of model exploration.
The aim of model selection is to construct a model that explains the relationships in the data, supports valid interpretation and, where needed, helps with prediction. Do not simply adopt or rely on a given model selection strategy.
Automatic variable selection procedures are not guaranteed to be consistent with these goals; use them as a guide only.
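Criterion-based methods typically search a wider set of candidate models and compare them on a common criterion (for example, AIC or BIC), which is a more principled basis for comparison. However, the search can be computationally intensive: with p predictors there are 2^p possible subsets.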
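A criterion-based search can be sketched in Python as follows, assuming AIC as the criterion and an exhaustive search over all subsets of predictors. The helper names `gaussian_aic` and `best_subset_by_aic` and the simulated data are hypothetical, introduced only for this illustration. The loop visits all 2^p subsets, which is why this approach becomes expensive as p grows.

```python
from itertools import combinations

import numpy as np


def gaussian_aic(y, design):
    """AIC for a Gaussian linear model, up to an additive constant:
    n * log(RSS / n) + 2 * k, where k counts the regression coefficients."""
    n = len(y)
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta_hat) ** 2))
    k = design.shape[1]
    return n * np.log(rss / n) + 2 * k


def best_subset_by_aic(X, y):
    """Fit every subset of the columns of X (always keeping an intercept)
    and return a list of (aic, subset) pairs sorted from lowest AIC up."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    results = []
    for size in range(p + 1):
        for subset in combinations(range(p), size):
            columns = np.array(subset, dtype=int)
            design = np.hstack([intercept, X[:, columns]])
            results.append((gaussian_aic(y, design), subset))
    results.sort(key=lambda pair: pair[0])
    return results


# Simulated example: only predictors 0 and 2 truly affect y.
rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

results = best_subset_by_aic(X, y)
for aic, subset in results[:3]:
    print(f"AIC = {aic:8.2f}  predictors = {subset}")
```

Printing the few lowest-AIC models, rather than only the single best one, makes it easier to see when several candidates fit about equally well. The ranking is a guide for further judgement, not a final answer.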
Several candidate models may fit the data about equally well. In that case, take other practical considerations into account when choosing a final model: for example, the cost of measuring the predictors, model accuracy, model assumptions, and prediction accuracy.