28 Machine Learning Pipeline

  • Many advanced machine learning models can be implemented in R and Python

  • Machine learning tools and pipeline are better integrated in Python environment


28.1 Machine Learning Pipeline

  • Machine learning in its core is the process of pattern recognition to draw inferences from data in an automatic fashion.

  • The core machine learning methods are only one part of the challenge.

  • Data availability, processing and most importantly quality are fundamental components of making good models.

  • The interpretation and understanding of the learnt models enables better use of machine learning and improved development.

  • The typical machine learning pipeline is comprised of a number of steps


Data Reading

  • Acquire the data and provide a method to get the data into your development environment.

Data Cleaning

  • Identifying and correcting errors in dataset, organise and understand your data.

Data Preprocessing

  • Transform the data so that it can be easily parsed, and improve stability and consistency of training.

Model Fitting

  • Model training and hyper-parameter tuning.

Evaluating Model Performance

  • Analyse and visualise the resulting model.