KNN Algorithm
The KNN algorithm is also known as the K-Nearest Neighbour classification algorithm.
The K-NN algorithm classifies a new data entry by comparing it to the entries of a provided data set that is already divided into classes or categories. It measures how close the new entry is to the existing data points, picks the K nearest ones, and assigns the new entry to the class that is most common among those K neighbours.
The steps of the KNN algorithm are as follows:
Step-1: Prepare the data set with various categories or classes
Step-2: Decide the value of K
A common rule of thumb is to take K as roughly the square root of the total number of entries in the data set. K should preferably be odd and should not be divisible by the number of categories in the data set, so that the vote in Step-7 is less likely to end in a tie.
Step-3: Determine the query data
Step-4: Calculate the distance between the query point and each element of the data set and store it in a list
Step-5: Sort the list in ascending order of distance
Step-6: Choose the first K elements from the sorted list
Step-7: Determine the mode of categories of the top K elements, which will be our answer.
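The steps above can be sketched as a pair of small functions. This is a minimal illustration rather than a definitive implementation: `choose_k` and `knn_classify` are hypothetical names, the points are made-up toy data, and `math.dist` assumes Python 3.8 or later.

```python
import math
from collections import Counter


def choose_k(num_entries, num_classes):
    """Rule-of-thumb K: about sqrt(N), odd, not a multiple of the class count."""
    k = max(1, int(math.sqrt(num_entries)))
    # Step up until K is odd and (for 2+ classes) not divisible by the class count.
    while k % 2 == 0 or (num_classes > 1 and k % num_classes == 0):
        k += 1
    return k


def knn_classify(labelled_points, query, k):
    """Return the most common label among the k points nearest to query."""
    # Step-4: distance from the query to every labelled point.
    distances = [(math.dist(point, query), label) for point, label in labelled_points]
    # Step-5: sort ascending by distance (the sort is stable, so ties keep input order).
    distances.sort(key=lambda pair: pair[0])
    # Step-6: keep the k nearest entries.
    nearest = distances[:k]
    # Step-7: the mode of their labels is the answer.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (assumed for illustration only): two well-separated clusters.
points = [((0, 0), "low"), ((0, 1), "low"), ((1, 0), "low"),
          ((5, 5), "high"), ((5, 6), "high"), ((6, 5), "high")]
k = choose_k(len(points), num_classes=2)
print(k, knn_classify(points, (1, 1), k))  # prints: 3 low
```

Keeping the K heuristic in its own function makes it easy to swap in a different rule without touching the classification logic.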
Let us take an example with two groups of points, a Red Group and a Green Group. We have a query point, and we need to determine which group it belongs to.
Green Group
[(2,3), (1,2), (1,4), (2,4), (1,3), (3,2), (3.5,1), (1,3.5), (0.5,1.5)]
Red Group
[(6,3), (5,2), (4,4), (3.5,4.5), (5,4), (6,3), (5,3)]
Query Point
(3.5,3.5)
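def myFunc(a):
    # Sort key: the distance stored at index 1 of each (group, distance) tuple
    return a[1]
# Data set A is the Green Group, data set B is the Red Group
a=[(2,3),(1,2),(1,4),(2,4),(1,3),(3,2),(3.5,1),(1,3.5),(0.5,1.5)]
b=[(6,3),(5,2),(4,4),(3.5,4.5),(5,4),(6,3),(5,3)]
# K: square root of the total number of points, raised to the next odd number if even
n=int((len(a)+len(b))**0.5)
if n%2==0: n+=1
print("N for the following data sets is:",n)
c=(3.5,3.5)
print("Data Set (A):",a)
print("Data Set (B):",b)
print("Query Point (C):",c)
# Euclidean distance from the query point to every point, tagged with its group
frequencyList=[]
for i in a: frequencyList.append(('A',((c[0]-i[0])**2+(c[1]-i[1])**2)**0.5))
for i in b: frequencyList.append(('B',((c[0]-i[0])**2+(c[1]-i[1])**2)**0.5))
# Sort by distance and keep the n nearest entries
frequencyList.sort(key=myFunc)
frequencyList=frequencyList[:n]
print(frequencyList)
# Majority vote among the n nearest entries
ans=None
aa=0
bb=0
for i in frequencyList:
    if i[0]=='A':aa+=1
    if i[0]=='B':bb+=1
if aa>bb: ans='A'
else: ans='B'
print()
print(f"{c} belongs to {ans} group")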
The output lists the 5 nearest points, of which 3 belong to group A (Green) and 2 belong to group B (Red). The query point is therefore assigned to group A, the Green Group.
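To see which points make up those five neighbours, the ranking can be printed with coordinates. The sketch below is an illustrative check, not part of the original program: the names `squared_distance`, `labelled`, and `ranked` are assumptions, and it ranks by squared distance, which orders points the same way as the distance itself while keeping the arithmetic exact for these coordinates.

```python
green = [(2, 3), (1, 2), (1, 4), (2, 4), (1, 3), (3, 2), (3.5, 1), (1, 3.5), (0.5, 1.5)]
red = [(6, 3), (5, 2), (4, 4), (3.5, 4.5), (5, 4), (6, 3), (5, 3)]
query = (3.5, 3.5)


def squared_distance(p, q):
    # Squared Euclidean distance between two 2-D points.
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


# Tag each point with its group, Green first (the same order as the program above).
labelled = [("Green", p) for p in green] + [("Red", p) for p in red]
# sorted() is stable, so points at equal distance keep their input order.
ranked = sorted(labelled, key=lambda item: squared_distance(item[1], query))

# Show the seven nearest points with their squared distances.
for group, point in ranked[:7]:
    print(group, point, squared_distance(point, query))
```

Five points share the squared distance 2.5, so places three to seven are tied and the stable sort settles them by input order. The Green points are listed first, which is why they take the last three of the five places. Listing the Red Group first would give a Red majority at K = 5, so the result for this query point depends on how ties are broken.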