K Nearest Neighbours
Dr. Ashish Tendulkar
Machine Learning Practice
IIT Madras
K nearest neighbours is a type of instance-based learning (also called non-generalizing learning). It:
- does not attempt to construct a model
- simply stores instances of the training data
Nearest neighbor classifier
- Classification is computed from a simple majority vote of the nearest neighbors of each point.
- Two different implementations of nearest neighbors classifiers are available.
- KNeighborsClassifier
- RadiusNeighborsClassifier
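The majority vote described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how scikit-learn implements it; the function name knn_predict and the toy data are assumptions made for this example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest stored instances."""
    # Rank the stored training instances by Euclidean distance to the query
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # Count the labels of the k closest instances and return the most common one
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two points of class 0, three points of class 1
train_points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_labels = [0, 0, 1, 1, 1]

pred_a = knn_predict(train_points, train_labels, (0.05, 0.1), k=3)
pred_b = knn_predict(train_points, train_labels, (1.0, 0.9), k=3)
print(pred_a, pred_b)
```

Note that nothing is "trained" here: the stored points are the model, which is what instance-based learning means.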
How are KNeighborsClassifier and RadiusNeighborsClassifier different?
KNeighborsClassifier
- learning based on the k nearest neighbors
- most commonly used technique
- choice of the value k is highly data-dependent
RadiusNeighborsClassifier
- learning based on the number of neighbors within a fixed radius r of each training point
- used in cases where the data is not uniformly sampled
- fixed value of r is specified, such that points in sparser neighborhoods use fewer nearest neighbors for the classification
How do you apply KNeighborsClassifier?
Step 1: Instantiate a KNeighborsClassifier estimator without passing any arguments to it to create a classifier object.
from sklearn.neighbors import KNeighborsClassifier
kneighbor_classifier = KNeighborsClassifier()
Step 2: Call the fit method on the KNeighborsClassifier object with the training feature matrix and label vector as arguments.
# Model training with feature matrix X_train and
# label vector or matrix y_train
kneighbor_classifier.fit(X_train, y_train)
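For a runnable version of the two steps, a minimal sketch follows. The iris dataset, the train/test split, and the predict and score calls are assumptions added for illustration; the slides only show instantiation and fit.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed example data: the iris dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: instantiate the estimator with default arguments
kneighbor_classifier = KNeighborsClassifier()

# Step 2: fit on the training feature matrix and label vector
kneighbor_classifier.fit(X_train, y_train)

# Predict labels for unseen points and measure accuracy on the held-out split
y_pred = kneighbor_classifier.predict(X_test)
accuracy = kneighbor_classifier.score(X_test, y_test)
print(accuracy)
```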
How do you specify the number of nearest neighbors in KNeighborsClassifier?
- Specify the number of nearest neighbors K from the training dataset using the n_neighbors parameter.
- The value should be an int.
kneighbor_classifier = KNeighborsClassifier(n_neighbors = 3)
What is the default value of K?
n_neighbors = 5
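Because a good value of K depends on the data, one common way to pick it is to compare a few candidates with cross-validation. The sketch below assumes the iris dataset, an arbitrary candidate list, and 5-fold cross-validation; none of these choices come from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each candidate K (candidates are an assumption)
mean_scores = {}
for k in (1, 3, 5, 7, 9):
    classifier = KNeighborsClassifier(n_neighbors=k)
    mean_scores[k] = cross_val_score(classifier, X, y, cv=5).mean()

# Pick the candidate with the highest mean accuracy
best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores, best_k)
```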
How do you assign weights to neighborhood in KNeighborsClassifier?
- It is better to weight the neighbors such that nearer neighbors contribute more to the fit.
The weights parameter accepts:
- 'uniform': All points in each neighborhood are weighted equally.
- 'distance': Weight points by the inverse of their distance, so closer neighbors of a query point have a greater influence than neighbors which are further away.
Default:
kneighbor_classifier = KNeighborsClassifier(weights='uniform')
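To see the two weighting schemes disagree, here is a small sketch on hypothetical one-dimensional data. With K = 3 the query has one close neighbor of class 0 and two distant neighbors of class 1, so a plain vote and an inverse-distance vote give different answers.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one class-0 point near the query, two class-1 points further away
X_train = [[0.0], [2.0], [2.1]]
y_train = [0, 1, 1]
query = [[0.5]]

# Uniform vote: class 1 wins two votes to one
uniform_classifier = KNeighborsClassifier(n_neighbors=3, weights='uniform')
uniform_pred = uniform_classifier.fit(X_train, y_train).predict(query)[0]

# Inverse-distance vote: class 0 gets 1/0.5 = 2.0, class 1 gets 1/1.5 + 1/1.6 (about 1.29)
distance_classifier = KNeighborsClassifier(n_neighbors=3, weights='distance')
distance_pred = distance_classifier.fit(X_train, y_train).predict(query)[0]

print(uniform_pred, distance_pred)
```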
Can we define our own weight values for KNeighborsClassifier?
- Yes, it is possible if you have an array of distances.
- The weights parameter also accepts a user-defined function which takes an array of distances as input, and returns an array of the same shape containing the weights.
Example:
def user_weights(distances_array):
    # Placeholder: returns the distances unchanged, so each neighbor's weight equals its distance
    return distances_array
kneighbor_classifier = KNeighborsClassifier(weights=user_weights)
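A more typical user-defined weighting makes the weight shrink as the distance grows. The sketch below uses a Gaussian kernel; the function name gaussian_weights, the bandwidth value, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def gaussian_weights(distances, bandwidth=1.0):
    """Map an array of distances to weights of the same shape; closer means heavier."""
    return np.exp(-(distances ** 2) / (2 * bandwidth ** 2))

# Hypothetical toy data: one class-0 point near the query, two class-1 points further away
X_train = [[0.0], [2.0], [2.1]]
y_train = [0, 1, 1]

kneighbor_classifier = KNeighborsClassifier(n_neighbors=3, weights=gaussian_weights)
kneighbor_classifier.fit(X_train, y_train)

# The single close neighbor outweighs the two distant ones under this kernel
custom_pred = kneighbor_classifier.predict([[0.5]])[0]
print(custom_pred)
```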
Which algorithm is used to compute the nearest neighbors in KNeighborsClassifier?
The algorithm parameter accepts:
- 'ball_tree' will use BallTree
- 'kd_tree' will use KDTree
- 'brute' will use a brute-force search
- 'auto' will attempt to decide the most appropriate algorithm based on the values passed to the fit method.
Default:
kneighbor_classifier = KNeighborsClassifier(algorithm='auto')
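The algorithm setting changes how neighbors are searched for, not which neighbors are found, so predictions should normally agree across settings (exact distance ties aside). The sketch below checks this on synthetic data; the random data and the labeling rule are assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic continuous data (assumed for illustration), so exact distance ties are unlikely
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_query = rng.normal(size=(20, 3))

# Fit one classifier per search algorithm and collect its predictions
predictions = {}
for algorithm in ('ball_tree', 'kd_tree', 'brute', 'auto'):
    classifier = KNeighborsClassifier(algorithm=algorithm)
    classifier.fit(X_train, y_train)
    predictions[algorithm] = classifier.predict(X_query)

all_agree = all(
    np.array_equal(predictions[name], predictions['brute']) for name in predictions
)
print(all_agree)
```

In practice the choice mostly affects fit and query time, which is why 'auto' is a reasonable default.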
What are some additional parameters for the tree algorithms in KNeighborsClassifier?
For 'ball_tree' and 'kd_tree' algorithms, there are some other parameters to be set.
leaf_size
- can affect the speed of the construction and query, as well as the memory required to store the tree
- default = 30
metric
- Distance metric to use for the tree
- It is either a string or a callable function
- some metrics are listed below:
- "euclidean", "manhattan", "chebyshev", "minkowski", "wminkowski", "seuclidean", "mahalanobis"
- default = 'minkowski'
p
- Power parameter for the Minkowski metric.
- default = 2
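The effect of p can be seen on a tiny example where the Euclidean (p = 2) and Manhattan (p = 1) forms of the Minkowski metric pick different nearest neighbors. The two training points below are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical points: from the origin, (3, 4) is nearer in Euclidean distance (5 vs 6),
# while (0, 6) is nearer in Manhattan distance (6 vs 7)
X_train = [[3.0, 4.0], [0.0, 6.0]]
y_train = [0, 1]
query = [[0.0, 0.0]]

euclidean_classifier = KNeighborsClassifier(n_neighbors=1, metric='minkowski', p=2)
euclidean_classifier.fit(X_train, y_train)
manhattan_classifier = KNeighborsClassifier(n_neighbors=1, metric='minkowski', p=1)
manhattan_classifier.fit(X_train, y_train)

# kneighbors returns the distances to, and indices of, the nearest training points
euclidean_dist, euclidean_ind = euclidean_classifier.kneighbors(query)
manhattan_dist, manhattan_ind = manhattan_classifier.kneighbors(query)

euclidean_pred = euclidean_classifier.predict(query)[0]
manhattan_pred = manhattan_classifier.predict(query)[0]
print(euclidean_pred, manhattan_pred)
```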
How do you apply RadiusNeighborsClassifier?
Step 1: Instantiate a RadiusNeighborsClassifier estimator without passing any arguments to it to create a classifier object.
from sklearn.neighbors import RadiusNeighborsClassifier
radius_classifier = RadiusNeighborsClassifier()
Step 2: Call the fit method on the RadiusNeighborsClassifier object with the training feature matrix and label vector as arguments.
# Model training with feature matrix X_train and
# label vector or matrix y_train
radius_classifier.fit(X_train, y_train)
How do you specify the number of neighbors in RadiusNeighborsClassifier?
- The number of neighbors is specified within a fixed radius r of each training point using the radius parameter.
- r is a float value.
radius_classifier = RadiusNeighborsClassifier(radius=1.0)
What is the default value of r ?
r = 1.0
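With a fixed radius, the number of neighbors that vote varies from query to query. The sketch below shows this on hypothetical one-dimensional data with a dense cluster and one isolated point; the data and variable names are assumptions for illustration.

```python
from sklearn.neighbors import RadiusNeighborsClassifier

# Hypothetical toy data: a dense cluster of class 0 and one isolated point of class 1
X_train = [[0.0], [0.2], [0.4], [5.0]]
y_train = [0, 0, 0, 1]

radius_classifier = RadiusNeighborsClassifier(radius=1.0)
radius_classifier.fit(X_train, y_train)

queries = [[0.1], [5.2]]

# radius_neighbors returns, per query, the distances and indices of all
# training points that lie within the radius
neigh_dist, neigh_ind = radius_classifier.radius_neighbors(queries)
neighbor_counts = [len(indices) for indices in neigh_ind]

predictions = radius_classifier.predict(queries).tolist()
print(neighbor_counts, predictions)
```

The query in the sparse region is classified from a single neighbor, while the query in the dense region uses three.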
Parameters for RadiusNeighborsClassifier
weights
- 'uniform'
- 'distance'
- [callable] function
- default = 'uniform'
algorithm
- 'ball_tree'
- 'kd_tree'
- 'brute'
- 'auto'
- default = 'auto'
leaf_size
- default = 30
metric
- default = 'minkowski'
p
- default = 2
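The defaults listed above can be read back from the estimators with get_params, which is a quick way to confirm them for the installed scikit-learn version. The variable names in this sketch are assumptions.

```python
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier

# Parameters that both classifiers share
shared_names = ("weights", "algorithm", "leaf_size", "metric", "p")

radius_params = RadiusNeighborsClassifier().get_params()
knn_params = KNeighborsClassifier().get_params()

# Keep only the shared parameters so the two estimators can be compared directly
radius_defaults = {name: radius_params[name] for name in shared_names}
knn_defaults = {name: knn_params[name] for name in shared_names}

print(radius_defaults)
print(radius_params["radius"], knn_params["n_neighbors"])
```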
MLP Week 7
By Swarnim POD