Introduction

Machine Learning Practice

Dr. Ashish Tendulkar

IIT Madras

What is Machine Learning and what are we set to achieve in this course?

Machine learning tries to learn from data.
NO DATA NO ML.
Loss function.
Optimize the loss via optimization algorithms.
Patterns or model parameters.
Make predictions.
Generalize well.
Train, validation and test sets.
Cross validation based performance evaluation.
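
The points above can be sketched end to end with scikit-learn. This is a minimal illustration, not the course's reference code: the iris dataset, the logistic regression model, the 80/20 split and the 5 folds are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Data comes first: a small built-in dataset (chosen only for illustration).
X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples as a test set that training never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fitting this model minimises a loss (log loss) with an optimization
# algorithm and stores the learned parameters on the object.
model = LogisticRegression(max_iter=1000)

# 5-fold cross validation on the training portion estimates how well
# the model generalizes before the test set is touched.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("Cross-validation accuracy per fold:", cv_scores)

# Train on the full training portion, then predict on unseen data.
model.fit(X_train, y_train)
print("Shape of learned coefficients:", model.coef_.shape)
print("Test accuracy:", model.score(X_test, y_test))
```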

Challenges faced by data scientists

What accuracy can we expect from the model?
What model would give the best performance for the given task?
How do we know that the model has learned sensible relationships or parameters?
How would the model perform in the wild - on unseen data?
What is the best way to divide the data into training, dev and test sets?
How do we set the hyper-parameters of the model (hyper-parameter tuning, HPT)?
What are some of the best practices in data exploration? Which visualizations make sense?
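
One common way to approach the hyper-parameter question is a cross-validated grid search. The sketch below is an assumption-laden illustration: the SVC model, the candidate values in param_grid and the iris data are arbitrary choices, not recommendations from the course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Candidate hyper-parameter values (arbitrary, for illustration only).
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Every combination is scored with 5-fold cross validation on the
# training data; the test set is kept aside for the final check.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best hyper-parameters:", search.best_params_)
print("Best cross-validation accuracy:", search.best_score_)
print("Accuracy on unseen test data:", search.score(X_test, y_test))
```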

Important terms in ML

Model
Parameters
Training data
Training, dev, test division
Cross validation
Evaluation metrics
Loss functions
Optimization algorithms

Requirements of an ML Library

Data loading and manipulation
Data preprocessing, feature selection and feature extraction (also called feature engineering).
Model selection.
- Cross validation.
Model training.
- Loss functions.
- Gradient descent variations.
- Closed form solution.
Evaluation Metrics.
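
To make the training items concrete, the sketch below fits a linear model two ways with plain NumPy: a closed form least-squares solution and batch gradient descent on the mean squared error loss. The synthetic data, learning rate and iteration count are assumptions picked so the example converges quickly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3*x + 2 plus small noise (made up for this sketch).
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so the intercept is learned as a weight.
A = np.hstack([np.ones((200, 1)), X])


def mse_loss(w):
    """Mean squared error of the linear model A @ w against y."""
    residual = A @ w - y
    return float(np.mean(residual ** 2))


# Closed form solution: solve the least-squares problem directly.
w_closed, *_ = np.linalg.lstsq(A, y, rcond=None)

# Batch gradient descent: repeatedly step against the gradient of the loss.
w_gd = np.zeros(2)
learning_rate = 0.1
for _ in range(2000):
    gradient = 2.0 / len(y) * A.T @ (A @ w_gd - y)
    w_gd -= learning_rate * gradient

print("Closed form weights [intercept, slope]:", w_closed)
print("Gradient descent weights:", w_gd)
print("Final loss:", mse_loss(w_gd))
```

Both routes minimise the same loss, so they should agree up to numerical tolerance; gradient descent is the option that remains usable when no closed form exists.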

Scikit-learn supports all of these!

Sklearn modules

Function	Module
Dataset loading	sklearn.datasets
Preprocessing	sklearn.preprocessing
Feature imputation	sklearn.impute
Feature extraction	sklearn.feature_extraction
Feature selection	sklearn.feature_selection
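
A short sketch that touches each module in the table. The artificially removed values, the choice of k=2 and the two toy records passed to DictVectorizer are assumptions made only so that every step has something to do.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Dataset loading (sklearn.datasets).
X, y = load_iris(return_X_y=True)

# Remove a few values artificially so there is something to impute.
X_missing = X.copy()
X_missing[::10, 0] = np.nan

# Feature imputation (sklearn.impute): fill gaps with the column mean.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_missing)

# Preprocessing (sklearn.preprocessing): zero mean, unit variance.
X_scaled = StandardScaler().fit_transform(X_imputed)

# Feature selection (sklearn.feature_selection): keep the 2 best features.
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X_scaled, y)

# Feature extraction (sklearn.feature_extraction): dicts to a numeric matrix.
records = [{"city": "Chennai", "temp": 33.0}, {"city": "Delhi", "temp": 12.0}]
X_dict = DictVectorizer(sparse=False).fit_transform(records)

print(X_imputed.shape, X_scaled.shape, X_selected.shape, X_dict.shape)
```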

Sklearn modules (model building)

Function	Module	Model Name
Model building	sklearn.linear_model	Supervised linear models
Model building	sklearn.svm	SVM
Model building	sklearn.tree	Trees
Model building	sklearn.neural_network	Artificial neural networks
Model building	sklearn.cluster	Clustering
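
The sketch below picks one model from each module in the table and trains them through the same interface. The specific classes, their settings and the iris data are assumptions for illustration; each module offers many other models.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One supervised model per module, keyed by the module it comes from.
supervised_models = {
    "sklearn.linear_model": LogisticRegression(max_iter=1000),
    "sklearn.svm": SVC(),
    "sklearn.tree": DecisionTreeClassifier(random_state=0),
    "sklearn.neural_network": MLPClassifier(max_iter=2000, random_state=0),
}

# The same fit/score calls work for every supervised model.
training_accuracy = {}
for module_name, model in supervised_models.items():
    model.fit(X, y)
    training_accuracy[module_name] = model.score(X, y)
print(training_accuracy)

# Clustering is unsupervised: no labels are passed to the model.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
print("Cluster label of the first sample:", cluster_labels[0])
```

The accuracies here are measured on the training data itself, so they say nothing about generalization; they only show that the interface is uniform.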

Machine Learning Summary

Estimator Object

It learns from data.
It may solve regression, classification or clustering.

Transformers:

Implement fit(), transform() and fit_transform() methods.
Data preprocessing objects are transformers.
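
A minimal sketch of the transformer interface, using StandardScaler as the example. The tiny arrays are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 40.0]])

scaler = StandardScaler()

# fit() learns the statistics (per-column mean and scale) from the data.
scaler.fit(X_train)

# transform() applies the learned statistics without changing them.
X_train_scaled = scaler.transform(X_train)

# fit_transform() does both steps in one call on the same data.
X_train_scaled_again = StandardScaler().fit_transform(X_train)

# New data is transformed with the statistics learned from X_train.
X_new_scaled = scaler.transform(X_new)

print("Learned means:", scaler.mean_)
print("Scaled new sample:", X_new_scaled)
```

Calling transform() on new data, rather than fit_transform(), keeps the preprocessing consistent between training and prediction.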

Predictors:

Implement fit() and predict() methods; some, such as clusterers and outlier detectors, also provide fit_predict().
This encompasses classifiers, regressors, clusterers and outlier detectors.
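
The four kinds of predictor can be sketched side by side. The chosen classes and the use of iris for every task are assumptions for illustration only; the regression target (petal width) is simply the last feature column.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Classifier: fit() on labelled data, predict() returns class labels.
classifier = DecisionTreeClassifier(random_state=0).fit(X, y)
class_predictions = classifier.predict(X[:5])

# Regressor: predict the last column from the first three columns.
regressor = LinearRegression().fit(X[:, :3], X[:, 3])
width_predictions = regressor.predict(X[:5, :3])

# Clusterer: fit_predict() fits and returns a cluster index per sample.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Outlier detector: fit_predict() returns +1 for inliers, -1 for outliers.
outlier_flags = IsolationForest(random_state=0).fit_predict(X)

print(class_predictions, width_predictions.shape)
print(cluster_labels[:5], outlier_flags[:5])
```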

Meta estimators

A meta estimator takes other estimators as input.
Examples:
- Pipeline
- Ensemble methods
- Model based feature selection
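
Each of the three examples can be sketched as an object that wraps other estimators. The wrapped models and their settings below are assumptions for illustration, not the only valid choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pipeline: chains a transformer and a predictor into one estimator.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("classify", LogisticRegression(max_iter=1000)),
    ]
)
pipeline.fit(X, y)

# Ensemble method: trains several copies of a base estimator.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
bagging.fit(X, y)

# Model based feature selection: keeps the features that a fitted
# model considers important (default threshold: mean importance).
selector = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
X_reduced = selector.fit_transform(X, y)

print("Pipeline training accuracy:", pipeline.score(X, y))
print("Bagging training accuracy:", bagging.score(X, y))
print("Features kept by the selector:", X_reduced.shape[1])
```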

Some resources: