Introduction
Machine Learning Practice
Dr. Ashish Tendulkar
IIT Madras
What is Machine Learning and what are we set to achieve in this course?
- Machine learning tries to learn from data.
- NO DATA NO ML.
- Loss function.
- Optimize the loss via optimization algorithms.
- Patterns or model parameters.
- Make predictions.
- Generalize well.
- Train, validation and test sets.
- Cross validation based performance evaluation.
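The points above can be tied together in a short scikit-learn sketch. The dataset (`load_diabetes`), the model (`LinearRegression`), the 80/20 split and the 5 folds are illustrative assumptions, not choices made by the course.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

# Data: a small regression dataset bundled with scikit-learn (assumed for illustration).
X, y = load_diabetes(return_X_y=True)

# Hold out a test set that is never touched during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fitting minimizes the squared-error loss; the learned coefficients
# are the model parameters.
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions on unseen data estimate how well the model generalizes.
test_mse = mean_squared_error(y_test, model.predict(X_test))

# 5-fold cross validation on the training set (default score for regressors is R^2).
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(model.coef_.shape, test_mse, cv_scores.mean())
```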
Challenges faced by data scientists
- What accuracy can we expect from the model?
- What model would give the best performance for the given task?
- How do we know that the model has learned sensible relationships or parameters?
- How would the model perform in the wild - on unseen data?
- What is the best way to divide the data into training, dev and test sets?
- How do we set the hyper-parameters of the model (hyper-parameter tuning, HPT)?
- What are some of the best practices in data exploration? What visualizations make sense?
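One common way to approach the hyper-parameter and unseen-data questions is a cross-validated grid search followed by a single evaluation on a held-out test set. The sketch below assumes the iris dataset, an `SVC` model and a small hand-picked grid; all of these are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Keep 20% of the data aside as the test set; stratify to preserve class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate hyper-parameter values (an assumed grid, chosen for illustration).
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}

# Each combination is scored with 5-fold cross validation on the training set,
# so the folds play the role of the dev set.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

# The test set is used once, at the end, to estimate performance in the wild.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```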
Important terms in ML
- Model
- Parameters
- Training data
- Training, dev, test division
- Cross validation
- Evaluation metrics
- Loss functions
- Optimization algorithms
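To make the terms concrete, here is a minimal NumPy sketch with a linear model, its two parameters, a mean squared error loss and plain gradient descent as the optimization algorithm. The synthetic data, learning rate and iteration count are assumptions chosen only so the example converges.

```python
import numpy as np

# Synthetic training data: y = 3x + 2 plus a little noise (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=200)


def mse_loss(w, b):
    """Loss function: mean squared error of the model w * x + b."""
    residual = X[:, 0] * w + b - y
    return np.mean(residual ** 2)


# Model parameters, initialized at zero.
w, b = 0.0, 0.0
learning_rate = 0.1
initial_loss = mse_loss(w, b)

# Optimization algorithm: batch gradient descent on the MSE loss.
for _ in range(200):
    residual = X[:, 0] * w + b - y
    grad_w = 2.0 * np.mean(residual * X[:, 0])
    grad_b = 2.0 * np.mean(residual)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse_loss(w, b)
print(w, b, initial_loss, final_loss)
```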
Requirements of an ML Library
- Data loading and manipulation
- Preprocess the data, select and extract features. Also called feature engineering.
- Model selection.
- Cross validation.
- Training model.
- Loss functions.
- Gradient descent variations.
- Closed form solution.
- Evaluation Metrics.
Scikit-learn supports all of these!
Sklearn modules
Function | Module |
---|---|
Dataset loading | sklearn.datasets |
Preprocessing | sklearn.preprocessing |
Feature imputation | sklearn.impute |
Feature extraction | sklearn.feature_extraction |
Feature selection | sklearn.feature_selection |
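A brief sketch touching each module in the table above. The dataset, the artificially removed values, the toy records and the choice of `k=2` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# sklearn.datasets: load a small bundled dataset.
X, y = load_iris(return_X_y=True)

# Blank out some values to simulate missing data (illustrative only).
X_missing = X.copy()
X_missing[::10, 0] = np.nan

# sklearn.impute: fill missing values with the column mean.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_missing)

# sklearn.preprocessing: standardize each feature to zero mean, unit variance.
X_scaled = StandardScaler().fit_transform(X_imputed)

# sklearn.feature_selection: keep the 2 features with the highest ANOVA F-score.
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X_scaled, y)

# sklearn.feature_extraction: turn dict records into a numeric feature matrix.
records = [{"city": "Chennai", "temp": 31.0}, {"city": "Delhi", "temp": 24.0}]
X_records = DictVectorizer(sparse=False).fit_transform(records)

print(X_selected.shape, X_records.shape)
```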
Sklearn modules (model building)
Function | Module | Model Name |
---|---|---|
Model building | sklearn.linear_model | Supervised linear models |
Model building | sklearn.svm | SVM |
Model building | sklearn.tree | Trees |
Model building | sklearn.neural_network | Artificial neural networks |
Model building | sklearn.cluster | Clustering |
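As a rough illustration, models from several of these modules can be trained through the same interface. The dataset, the specific estimator classes and their settings are assumptions; `sklearn.neural_network` is left out only to keep the sketch short.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One supervised model per module, all trained and scored the same way.
models = {
    "linear_model": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "tree": DecisionTreeClassifier(random_state=0),
}
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)

# sklearn.cluster: unsupervised, so the labels y are not used.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(accuracies)
```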
Machine Learning Summary
Estimator Object
- It learns from data.
- It may solve regression, classification or clustering.
Transformers:
- Implement fit(), transform() and fit_transform() methods.
- Data preprocessing objects are transformers.
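A small sketch of the transformer interface, using `StandardScaler` on made-up numbers (the arrays are assumptions for illustration).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 20.0]])

# fit() learns the per-feature mean and standard deviation from the training data.
scaler = StandardScaler()
scaler.fit(X_train)

# transform() applies the learned statistics, including to data not seen in fit().
X_train_scaled = scaler.transform(X_train)
X_new_scaled = scaler.transform(X_new)

# fit_transform() combines both steps on the same data.
X_train_scaled_again = StandardScaler().fit_transform(X_train)

print(scaler.mean_, X_new_scaled)
```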
Predictors:
- Implement fit() together with predict() and/or fit_predict().
- This encompasses classifiers, regressors, clusterers and outlier detectors.
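A minimal sketch of the predictor interface: a regressor using fit() and predict(), and a clusterer using fit_predict(). The toy arrays are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Regressor: fit() on labelled data, then predict() on new inputs.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
regressor = LinearRegression().fit(X, y)
y_pred = regressor.predict(np.array([[4.0]]))

# Clusterer: fit_predict() fits on the data and returns a cluster label per sample.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

print(y_pred, labels)
```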
Meta estimators
- A meta estimator takes other estimators as input
- Examples:
  - Pipeline
  - Ensemble methods
  - Model based feature selection
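The three kinds of meta estimator listed above can be combined in one sketch: a `Pipeline` chaining steps, `SelectFromModel` wrapping a tree for model based feature selection, and a `VotingClassifier` as an ensemble of two classifiers. The dataset, step names and inner estimators are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Model based feature selection: keep features the tree finds important.
selector = SelectFromModel(DecisionTreeClassifier(random_state=0))

# Ensemble method: majority vote over two different classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ]
)

# Pipeline: each step is itself an estimator; the pipeline behaves like one.
pipeline = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("select", selector),
        ("classify", ensemble),
    ]
)

# The whole pipeline is refit inside each cross-validation fold.
scores = cross_val_score(pipeline, X, y, cv=5)
print(scores.mean())
```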
Some resources:
MLP Term 3 Introduction
By Swarnim POD