K Nearest Neighbours

Dr. Ashish Tendulkar

Machine Learning Practice

IIT Madras

  • It is a type of instance-based learning or non-generalizing learning
    • does not attempt to construct a model
    • simply stores instances of the training data

Nearest neighbor classifier

  • Classification is computed from a simple majority vote of the nearest neighbors of each point.
  • Two different implementations of nearest neighbors classifiers are available.
  1. KNeighborsClassifier
  2. RadiusNeighborsClassifier

How are KNeighborsClassifier and RadiusNeighborsClassifier different?

KNeighborsClassifier

  • learning based on the k nearest neighbors
  • most commonly used technique
  • choice of the value k is highly data-dependent

RadiusNeighborsClassifier

  • learning based on the number of neighbors within a fixed radius r of each training point
  • used in cases where the data is not uniformly sampled
  • fixed value of r is specified, such that points in sparser neighborhoods use fewer nearest neighbors for the classification
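A minimal sketch of the difference, assuming a made-up 1-D dataset with a dense cluster and one isolated point. It uses the kneighbors and radius_neighbors methods of the two classifiers to count how many neighbors each query actually gets; the data, labels, and radius value are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier

# Toy data (assumed for illustration): dense cluster near 0, one isolated point at 10.
X = np.array([[0.0], [0.1], [0.2], [0.3], [10.0]])
y = np.array([0, 0, 0, 0, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
rnn = RadiusNeighborsClassifier(radius=0.5).fit(X, y)

# One query inside the dense cluster, one in the sparse region.
queries = np.array([[0.1], [9.0]])

# k-based lookup: always exactly k neighbors, however far away they are.
k_idx = knn.kneighbors(queries, return_distance=False)
k_counts = [len(idx) for idx in k_idx]

# Radius-based lookup: only the training points within distance 0.5.
r_idx = rnn.radius_neighbors(queries, return_distance=False)
r_counts = [len(idx) for idx in r_idx]

print(k_counts)  # [3, 3]
print(r_counts)  # [4, 0]
```

The sparse query gets no neighbors at all within the radius, which is why the choice of r matters as much as the choice of k.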

How do you apply KNeighborsClassifier?

Step 1: Instantiate a KNeighborsClassifier estimator without passing any arguments to it to create a classifier object.

from sklearn.neighbors import KNeighborsClassifier
kneighbor_classifier = KNeighborsClassifier()

Step 2: Call fit method on KNeighbors classifier object with training feature matrix and label vector as arguments.

# Model training with feature matrix X_train and 
# label vector or matrix y_train
kneighbor_classifier.fit(X_train, y_train)
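The two steps can be put together as a runnable sketch. The Iris dataset, the 75/25 split, and the random_state value are assumptions chosen only to make the example self-contained; predict and score are the usual scikit-learn estimator methods.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed example data: the Iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: instantiate the estimator with default arguments.
kneighbor_classifier = KNeighborsClassifier()

# Step 2: fit stores the training instances (no explicit model is built).
kneighbor_classifier.fit(X_train, y_train)

# Predict labels for unseen points and measure accuracy on the held-out set.
y_pred = kneighbor_classifier.predict(X_test)
test_accuracy = kneighbor_classifier.score(X_test, y_test)
print(y_pred[:5], test_accuracy)
```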

How do you specify the number of nearest neighbors in KNeighborsClassifier?

  • Specify the number of nearest neighbors K to use from the training dataset with the n_neighbors parameter.
    • the value should be an int.
kneighbor_classifier = KNeighborsClassifier(n_neighbors = 3)

What is the default value of K?

n_neighbors = 5 
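To see why the choice of K matters, here is a small sketch on an assumed toy dataset where one point of class 1 sits next to several points of class 0. The data and query point are made up for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumed): a single class-1 point at 0.0, four class-0 points to its right.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([1, 0, 0, 0, 0])
query = np.array([[0.2]])

# With K = 1 only the closest point (0.0, class 1) votes.
pred_k1 = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train).predict(query)[0]

# With K = 3 the points at 1.0 and 2.0 outvote it.
pred_k3 = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(query)[0]

# The default is stored on the estimator as the n_neighbors attribute.
default_k = KNeighborsClassifier().n_neighbors

print(pred_k1, pred_k3, default_k)  # 1 0 5
```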

How do you assign weights to neighborhood in KNeighborsClassifier?

  • Under some circumstances, it is better to weight the neighbors such that nearer neighbors contribute more to the fit.

weights 

  • 'uniform' : All points in each neighborhood are weighted equally.
  • 'distance' : Weight points by the inverse of their distance.
    • closer neighbors of a query point will have a greater influence than neighbors which are further away.

Default:

kneighbor_classifier = KNeighborsClassifier(weights='uniform')
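The effect of the two built-in options can be sketched on an assumed toy dataset in which the single closest neighbor disagrees with the two farther ones. The data and query are illustrative only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumed): one class-1 point very close to the query,
# two class-0 points farther away.
X_train = np.array([[0.0], [1.0], [1.2]])
y_train = np.array([1, 0, 0])
query = np.array([[0.1]])

# 'uniform': each of the 3 neighbors has one vote, so class 0 wins 2 to 1.
uniform_clf = KNeighborsClassifier(n_neighbors=3, weights="uniform")
pred_uniform = uniform_clf.fit(X_train, y_train).predict(query)[0]

# 'distance': votes are 1/0.1 = 10 for class 1 versus 1/0.9 + 1/1.1 (about 2.0)
# for class 0, so class 1 wins.
distance_clf = KNeighborsClassifier(n_neighbors=3, weights="distance")
pred_distance = distance_clf.fit(X_train, y_train).predict(query)[0]

print(pred_uniform, pred_distance)  # 0 1
```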

Can we define our own weight values for KNeighborsClassifier?

  • Yes, by passing a function that computes the weights from the neighbor distances.
  • The weights parameter also accepts a user-defined function which takes an array of distances as input, and returns an array of the same shape containing the weights.

Example:

def user_weights(distances):
    # Placeholder: uses each distance as its own weight, so farther
    # neighbors count more. Replace the body with the weighting you need.
    return distances

kneighbor_classifier = KNeighborsClassifier(weights=user_weights)
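As a slightly more realistic sketch of a user-defined weight function, the snippet below uses a Gaussian kernel of the distance. The function name gaussian_weights, the bandwidth SIGMA, and the toy data are all hypothetical choices for illustration, not part of scikit-learn.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

SIGMA = 0.5  # hypothetical kernel bandwidth, chosen for this toy example

def gaussian_weights(distances):
    # Receives an array of neighbor distances and returns an array of the
    # same shape; weights decay smoothly from 1 toward 0 as distance grows.
    return np.exp(-(distances ** 2) / (2 * SIGMA ** 2))

# Toy data (assumed): one close class-1 point, two farther class-0 points.
X_train = np.array([[0.0], [1.0], [1.2]])
y_train = np.array([1, 0, 0])
query = np.array([[0.1]])

clf = KNeighborsClassifier(n_neighbors=3, weights=gaussian_weights)
pred_custom = clf.fit(X_train, y_train).predict(query)[0]

# Class 1 gets weight about 0.98; class 0 gets about 0.20 + 0.09.
print(pred_custom)  # 1
```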

Which algorithm is used to compute the nearest neighbors in KNeighborsClassifier?

algorithm 

  • 'ball_tree' will use BallTree
  • 'kd_tree' will use KDTree
  • 'brute' will use a brute-force search
  • 'auto' will attempt to decide the most appropriate algorithm based on the values passed to the fit method.

Default:

kneighbor_classifier = KNeighborsClassifier(algorithm='auto')
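The algorithm option changes how the neighbors are searched for, not which neighbors are found. A minimal sketch, assuming random synthetic data with a fixed seed, checks that all four settings give the same predictions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (assumed for illustration), seeded for reproducibility.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_query = rng.normal(size=(20, 3))

# Fit one classifier per search algorithm and collect its predictions.
predictions = {}
for algorithm in ["brute", "kd_tree", "ball_tree", "auto"]:
    clf = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm)
    clf.fit(X_train, y_train)
    predictions[algorithm] = clf.predict(X_query)

# Every algorithm should agree with the brute-force result.
all_agree = all(
    np.array_equal(predictions["brute"], pred) for pred in predictions.values()
)
print(all_agree)
```

In practice the options differ in speed and memory use, which is why 'auto' is a reasonable starting point.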

Some additional parameters for tree algorithms in KNeighborsClassifier

For 'ball_tree' and 'kd_tree' algorithms, there are some other parameters to be set. 

leaf_size 

  • can affect the speed of the construction and query, as well as the memory required to store the tree
  • default = 30

metric 

  • Distance metric to use for the tree
  • It is either a string or a callable function
    • some metrics are listed below:
      • "euclidean", "manhattan", "chebyshev", "minkowski", "wminkowski", "seuclidean", "mahalanobis"
  • default = 'minkowski'

p 

  • Power parameter for the Minkowski metric.
    • p = 1 gives the Manhattan distance, p = 2 the Euclidean distance
  • default = 2
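A short sketch of how metric and p change the measured distance, assuming a toy training point at (3, 4) and a query at the origin. The helper nearest_distance is a hypothetical name introduced only for this example; it returns the distance to the single nearest neighbor as reported by kneighbors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumed): the nearest training point to the origin is (3, 4).
X_train = np.array([[3.0, 4.0], [6.0, 8.0]])
y_train = np.array([0, 1])
query = np.array([[0.0, 0.0]])

def nearest_distance(**metric_kwargs):
    # Hypothetical helper: fit a KD-tree based classifier with the given
    # metric settings and return the distance to the closest neighbor.
    clf = KNeighborsClassifier(
        n_neighbors=1, algorithm="kd_tree", leaf_size=30, **metric_kwargs
    ).fit(X_train, y_train)
    distances, _ = clf.kneighbors(query)
    return float(distances[0, 0])

d_euclidean = nearest_distance(metric="minkowski", p=2)  # sqrt(3^2 + 4^2)
d_manhattan = nearest_distance(metric="minkowski", p=1)  # |3| + |4|
d_chebyshev = nearest_distance(metric="chebyshev")       # max(|3|, |4|)

print(d_euclidean, d_manhattan, d_chebyshev)  # 5.0 7.0 4.0
```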

How do you apply RadiusNeighborsClassifier?

Step 1: Instantiate a RadiusNeighborsClassifier estimator without passing any arguments to it to create a classifier object.

from sklearn.neighbors import RadiusNeighborsClassifier
radius_classifier = RadiusNeighborsClassifier()

Step 2: Call fit method on RadiusNeighbors classifier object with training feature matrix and label vector as arguments.

# Model training with feature matrix X_train and 
# label vector or matrix y_train
radius_classifier.fit(X_train, y_train)
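Put together, the two steps look like the sketch below. The two-cluster toy dataset and the query points are assumptions made so that every query has neighbors within the default radius.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

# Toy data (assumed): two well-separated 1-D clusters, one per class.
X_train = np.array([[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Step 1: instantiate with default arguments (radius=1.0).
radius_classifier = RadiusNeighborsClassifier()

# Step 2: fit on the training feature matrix and label vector.
radius_classifier.fit(X_train, y_train)

# Each query is classified by a vote among training points within the radius.
X_query = np.array([[0.4], [5.6]])
radius_pred = radius_classifier.predict(X_query)
print(radius_pred)  # [0 1]
```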

How do you specify the number of neighbors in RadiusNeighborsClassifier?

  • The number of neighbors is not set directly: it is determined by a fixed radius r around each point, specified using the radius parameter.
  • r is a float value.
radius_classifier = RadiusNeighborsClassifier(radius=1.0)

What is the default value of r?

r = 1.0 
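One practical consequence of a fixed radius is that a query point may have no neighbors at all. The sketch below, on an assumed toy dataset, shows that case. It relies on the outlier_label parameter of RadiusNeighborsClassifier, which is not covered above; the label value -1 is an arbitrary choice for illustration, and scikit-learn may emit a warning because -1 is not one of the training classes.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

# Toy data (assumed): all training points lie between 0 and 1.
X_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.array([0, 0, 0])

# The second query is far from every training point.
X_query = np.array([[0.4], [50.0]])

# With the default outlier_label=None, a query without neighbors is an error.
default_clf = RadiusNeighborsClassifier(radius=1.0).fit(X_train, y_train)
try:
    default_clf.predict(X_query)
    raised = False
except ValueError:
    raised = True

# Assigning a label for such points lets prediction proceed.
labelled_clf = RadiusNeighborsClassifier(radius=1.0, outlier_label=-1)
outlier_pred = labelled_clf.fit(X_train, y_train).predict(X_query)

print(raised, outlier_pred)  # True [ 0 -1]
```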

Parameters for RadiusNeighborsClassifier

weights 

  • 'uniform'
  • 'distance'
  • [callable] function
  • default = 'uniform'

algorithm 

  • 'ball_tree'
  • 'kd_tree'
  • 'brute'
  • 'auto'
  • default = 'auto'

leaf_size 

  • default = 30

metric 

  • default = 'minkowski'

p 

  • default = 2
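The defaults listed above can be checked directly on an estimator. This sketch uses get_params, the standard scikit-learn method for reading constructor parameters, and then builds a classifier with every one of these parameters set explicitly; the non-default values are arbitrary choices for illustration.

```python
from sklearn.neighbors import RadiusNeighborsClassifier

# Read the default values of the parameters discussed above.
params = RadiusNeighborsClassifier().get_params()
names = ["radius", "weights", "algorithm", "leaf_size", "metric", "p"]
defaults = {name: params[name] for name in names}
print(defaults)

# The same parameters set explicitly (values chosen only as an example).
radius_classifier = RadiusNeighborsClassifier(
    radius=2.0,
    weights="distance",
    algorithm="ball_tree",
    leaf_size=20,
    metric="minkowski",
    p=1,
)
custom = {name: radius_classifier.get_params()[name] for name in names}
print(custom)
```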

MLP Week 7

By Swarnim POD
