KNN Algorithm

Tanishq Rawat
3 min read · Jun 28, 2023

The KNN algorithm is also known as the K-Nearest Neighbour classification algorithm.

The K-NN algorithm classifies a new data entry by comparing it to the entries of a provided data set in which every entry already belongs to a class or category. It finds the K entries closest to the new one and assigns the new entry to the class that is most common among those K neighbors.

The steps of the KNN algorithm are as follows:

Step-1: Prepare the data set with various categories or classes

Step-2: Decide the value of K

A common rule of thumb is to take K close to the square root of the total number of entries in the data set. K should not be divisible by the number of categories in the data set, which makes tied votes less likely; with two categories this means K should be odd.

Step-3: Determine the query data

Step-4: Calculate the distance between the query point and each element of the data set and store it in a list

Step-5: Sort the list in ascending order of distance

Step-6: Choose the first K elements of the sorted list (the K nearest neighbors)

Step-7: Determine the mode (the most frequent category) of those K elements, which will be our answer.
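The steps above can be sketched as a small reusable function. This is only a minimal sketch under stated assumptions, not the code used later in this article: the names `choose_k` and `knn_classify` and the toy data are hypothetical, and `math.dist` requires Python 3.8 or newer.

```python
import math
from collections import Counter

def choose_k(n_points, n_classes):
    """Rule-of-thumb K (a heuristic, not a guarantee): start near
    sqrt(n_points), then step up until K is odd and not a multiple
    of the number of classes."""
    k = max(1, int(math.sqrt(n_points)))
    while k % 2 == 0 or (n_classes > 1 and k % n_classes == 0):
        k += 1
    return k

def knn_classify(labelled_points, query, k):
    """labelled_points is a list of (label, point) pairs.
    Returns the most frequent label among the k nearest points."""
    # Step 4: distance from the query to every point in the data set
    distances = [(math.dist(query, point), label)
                 for label, point in labelled_points]
    # Step 5: sort by distance (equal distances keep their input order)
    distances.sort(key=lambda pair: pair[0])
    # Step 6: keep the k nearest entries
    nearest = distances[:k]
    # Step 7: majority vote (the mode of the labels)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups
data = [("left", (0, 0)), ("left", (0, 1)), ("left", (1, 0)),
        ("right", (5, 5)), ("right", (5, 6)), ("right", (6, 5))]
k = choose_k(len(data), n_classes=2)
print(k, knn_classify(data, (1, 1), k))
```

Keeping the choice of K separate from the classification makes it easy to try other values of K on the same data.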

Let us take an example with two groups of points, a Red Group and a Green Group. We have a query point, and we need to determine which group it belongs to.

Green Group

[(2,3), (1,2), (1,4), (2,4), (1,3), (3,2), (3.5,1), (1,3.5), (0.5,1.5)]

Red Group

[(6,3), (5,2), (4,4), (3.5,4.5), (5,4), (6,3), (5,3)]

Query Point

(3.5,3.5)
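The code that follows measures closeness with the straight-line (Euclidean) distance. As a worked example, take the query point q = (3.5, 3.5) and the Red point p = (4, 4); the symbols q and p are introduced here only for illustration.

```latex
d(q, p) = \sqrt{(q_x - p_x)^2 + (q_y - p_y)^2}
        = \sqrt{(3.5 - 4)^2 + (3.5 - 4)^2}
        = \sqrt{0.5} \approx 0.707
```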

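# Sort key: each entry is (group label, distance), so sort by the distance
def myFunc(a):
    return a[1]

# Data set A is the Green Group, data set B is the Red Group
a=[(2,3),(1,2),(1,4),(2,4),(1,3),(3,2),(3.5,1),(1,3.5),(0.5,1.5)]
b=[(6,3),(5,2),(4,4),(3.5,4.5),(5,4),(6,3),(5,3)]

# n plays the role of K: the integer square root of the total number
# of points, increased by one if it is even
n=int((len(a)+len(b))**0.5)
if n%2==0: n+=1
print("N for the following data sets is:",n)

c=(3.5,3.5)
print("Data Set (A):",a)
print("Data Set (B):",b)
print("Query Point (C):",c)

# Euclidean distance from the query point to every point, tagged with its group
frequencyList=[]
for i in a: frequencyList.append(('A',((c[0]-i[0])**2+(c[1]-i[1])**2)**0.5))
for i in b: frequencyList.append(('B',((c[0]-i[0])**2+(c[1]-i[1])**2)**0.5))

# Sort by distance and keep the n nearest entries
frequencyList.sort(key=myFunc)
frequencyList=frequencyList[:n]
print(frequencyList)

# Majority vote among the nearest neighbours
ans=None
aa=0
bb=0
for i in frequencyList:
    if i[0]=='A':aa+=1
    if i[0]=='B':bb+=1
if aa>bb: ans='A'
else: ans='B'
print()
print(f"{c} belongs to {ans} group")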

The output lists the 5 nearest points: 3 belong to the Green (A) group and 2 belong to the Red (B) group, so the answer is A, the Green group.
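It can help to look at the distances just around the cut-off, because several points may sit at exactly the same distance from the query point. The following is a minimal sketch for that check; the helper `euclid`, the variable names, and the "Green"/"Red" labels are choices made for this illustration, while the data and K = 5 come from the example above.

```python
def euclid(p, q):
    # Same arithmetic as the article's code: square root of the summed squares
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

green = [(2, 3), (1, 2), (1, 4), (2, 4), (1, 3), (3, 2), (3.5, 1), (1, 3.5), (0.5, 1.5)]
red = [(6, 3), (5, 2), (4, 4), (3.5, 4.5), (5, 4), (6, 3), (5, 3)]
query = (3.5, 3.5)
k = 5

# Green points are listed first, as in the article's code
ranked = sorted(
    [(euclid(query, p), "Green", p) for p in green]
    + [(euclid(query, p), "Red", p) for p in red],
    key=lambda row: row[0],
)

# Show the k nearest points plus the next two for comparison
for distance, group, point in ranked[:k + 2]:
    print(f"{distance:.3f}  {group:5}  {point}")

# Count how many points share the distance of the k-th neighbor
cutoff = ranked[k - 1][0]
tied = [row for row in ranked if row[0] == cutoff]
print("points at the cut-off distance:", len(tied))
```

With this data, five points share the cut-off distance of about 1.581 (three Green, two Red), but only three of them fit into K = 5. Python's sort is stable, so the Green points, which are listed first, take those places. Listing the Red group first would give Red the majority instead, so the result for this query point depends on how ties are broken.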

Figure: the nearest points joined to the query point by lines in their group's color.
For the query point (4,3.5), the same code prints B, which is the Red group.

Written by Tanishq Rawat

SDE @ Mobileum | Ex-SDE Intern at Cerebry