Section 46 Model Selection: Summary


46.1 Statistical Model

\[ \large y_{i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} + \epsilon_{i} \]

\[ i = 1, \dots, n; \quad p = \text{number of predictors} \]
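
As a minimal sketch of this model, the snippet below simulates data from it and recovers the coefficients by ordinary least squares. The sample size, the number of predictors, the coefficient values, the error standard deviation and the random seed are all illustrative assumptions, not values from this section.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) so the run is repeatable
n, p = 200, 3                   # illustrative n observations and p predictors

X = rng.normal(size=(n, p))               # predictors x_1, ..., x_p
beta = np.array([1.0, 2.0, -1.0, 0.5])    # hypothetical beta_0, beta_1, ..., beta_p
eps = rng.normal(scale=0.5, size=n)       # error term epsilon_i

# A leading column of ones carries the intercept beta_0.
design = np.column_stack([np.ones(n), X])
y = design @ beta + eps

# Least-squares estimates of beta_0, ..., beta_p.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(beta_hat, 2))
```

The estimates should land close to the hypothetical coefficients because the simulated data follow the model exactly; real data rarely do, which is why model selection is needed at all.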


46.2 Points to Note

  • Variable selection (or model selection) is one part of statistical modelling, not its end point.

  • Consider the underlying system (for example, its biological underpinnings) to identify the scope of model exploration.

  • The aim of model selection is to construct a model that explains the relationships in the data, supports valid interpretation and, if necessary, helps in prediction. Do not simply adopt or rely on a single model selection strategy.

  • Automatic variable selection procedures are not guaranteed to be consistent with these goals; use them as a guide only.

  • Criterion-based methods typically search more widely and compare candidate models on a common criterion, which is generally preferable. However, the search can be computationally intensive: with p predictors there are 2^p possible subsets.

  • Several candidate models may fit the data equally well. Take other practical considerations into account when choosing a final model, for example the cost of measuring the predictors, model accuracy, model assumptions and prediction accuracy.
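
The points about criterion-based search can be illustrated with a small sketch. It fits every subset of predictors and ranks the models by AIC, computed here as n log(RSS/n) + 2k, which is the Gaussian AIC up to an additive constant. The function names `aic_gaussian` and `all_subsets_aic`, the simulated data and the choice of AIC as the criterion are assumptions made for illustration; this section does not prescribe a particular criterion or implementation.

```python
from itertools import combinations

import numpy as np


def aic_gaussian(y, design):
    """AIC for a least-squares fit, up to a constant: n*log(RSS/n) + 2k."""
    n, k = design.shape  # k counts fitted coefficients, intercept included
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta_hat) ** 2))
    return float(n * np.log(rss / n) + 2 * k)


def all_subsets_aic(X, y):
    """Fit all 2^p predictor subsets; return (AIC, subset) pairs, best first."""
    n, p = X.shape
    results = []
    for size in range(p + 1):
        for subset in combinations(range(p), size):
            cols = X[:, np.array(subset, dtype=int)]
            design = np.column_stack([np.ones(n), cols])  # intercept always kept
            results.append((aic_gaussian(y, design), subset))
    return sorted(results)


# Hypothetical data: only predictors 0 and 2 truly affect the response.
rng = np.random.default_rng(1)
n, p = 150, 4
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

ranked = all_subsets_aic(X, y)
for aic, subset in ranked[:3]:
    print(f"AIC = {aic:8.2f}  predictors = {subset}")
```

Even with four predictors the search fits 16 models, and the count doubles with each extra predictor. The top few models often have very similar AIC values, which is where the practical considerations listed above come in.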