k-nearest neighbors (KNN)
One of several supervised learning classification algorithms used in data mining and machine learning is k-nearest neighbors, commonly known as KNN. It is a classifier in which learning is based on how similar one data point (a vector) is to the others.
How does KNN work?
Suppose we have:
- a dataset A;
- a distance metric, which we will use to measure how far apart observations are (see the sketch after this list);
- an integer K indicating how many of the nearest neighbors should be taken into account when making a prediction.
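As a concrete example, here is a minimal sketch of the Euclidean distance, one common choice of metric. The function name `euclidean_distance` is illustrative, not prescribed by the algorithm:

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```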
To predict the output y for a new observation X, the following steps are taken:
1. Compute the distance between X and every observation in the dataset.
2. Keep the K observations whose distances to X are smallest.
3. From the y values of those K observations:
   a. if the problem is a regression, take the mean of the y values;
   b. if the problem is a classification, take the mode of the y values.
4. The value obtained in step 3 is the final prediction.
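Putting the four steps together, here is a minimal sketch of a KNN predictor in plain Python. The names (`knn_predict`, `dataset`, `regression`) are illustrative, and it assumes the dataset is a list of (vector, y) pairs:

```python
from collections import Counter
from statistics import mean

def knn_predict(dataset, x, k, distance, regression=False):
    """Predict y for observation x from a dataset of (vector, y) pairs."""
    # Step 1: compute the distance from x to every observation.
    distances = [(distance(vec, x), y) for vec, y in dataset]
    # Step 2: keep the k observations closest to x.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    ys = [y for _, y in nearest]
    # Step 3: mean for regression, mode for classification.
    # Step 4: return that value as the prediction.
    if regression:
        return mean(ys)
    return Counter(ys).most_common(1)[0][0]
```

For instance, with a toy dataset of labeled points, `knn_predict([([1.0, 2.0], "a"), ([1.5, 1.8], "a"), ([5.0, 8.0], "b")], [1.2, 1.9], k=2, distance=euclidean_distance)` returns `"a"`, since both of the two nearest neighbors carry that label.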