28 Machine Learning Pipeline
-
Many advanced machine learning models can be implemented in R and Python
-
Machine learning tools and pipeline are better integrated in Python environment
28.1 Machine Learning Pipeline
Machine learning in its core is the process of pattern recognition to draw inferences from data in an automatic fashion.
The core machine learning methods are only one part of the challenge.
Data availability, processing and most importantly quality are fundamental components of making good models.
The interpretation and understanding of the learnt models enables better use of machine learning and improved development.
The typical machine learning pipeline is comprised of a number of steps
Data Reading
- Acquire the data and provide a method to get the data into your development environment.
Data Cleaning
- Identifying and correcting errors in dataset, organise and understand your data.
Data Preprocessing
- Transform the data so that it can be easily parsed, and improve stability and consistency of training.
Model Fitting
- Model training and hyper-parameter tuning.
Evaluating Model Performance
- Analyse and visualise the resulting model.