Kick-starting ML by building a facial recognition system using KNN

Ali akbar Fani
Jun 26, 2020 · 7 min read

I have always been intrigued by the potential of machine learning. We have all grown up watching movies like ‘Terminator’, ‘I, Robot’, and, most recently, my personal favorite, ‘Iron Man’. In these movies we can see aspects of machine learning at work, hinting at what our future might look like.

Machine learning has swiftly entered our daily lives. From Netflix’s recommendation system to Tesla’s self-driving cars, we can see the profound impact of AI (artificial intelligence) in our surroundings.

Image Credit: https://electrek.co/2017/03/13/tesla-vision-autopilot-chip-intel-mobileye/

With these cool applications and the enormous potential this technology has, I decided to dive deep into it as well.

In this post, we will start with some basics of machine learning and build a facial recognition system using one of the simplest yet most powerful algorithms: k-nearest neighbors (KNN).

What is Machine Learning?

Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so.

Machine Learning Categories

Machine learning is generally categorized into three types: Supervised Learning, Unsupervised Learning, and Reinforcement Learning. This post focuses on Supervised Learning.

Supervised machine learning is all about learning a function from existing data that can make predictions about new data. In this post we will investigate the following:

· Framing problems in terms of machine learning

· Basic mathematics behind the standard machine learning setup

· Deciding how to split a data set

· Facial recognition using the KNN algorithm

The Machine Learning Setup

The purpose of machine learning is to make decisions from data. Following the approach of traditional computer science, one might be tempted to write a carefully designed program that follows some rules to make these decisions. Instead of writing a program in this traditional way, however, data scientists use supervised machine learning to create a computer program that is learned from past data.

To learn from data, you must differentiate between what you know, the features (x), and what you would like to infer, the label (y). For example, your features could describe a patient in a hospital (e.g., gender, age, body temperature, various symptoms), and the label could be whether the patient is sick or healthy. You can use data from past medical records to learn a function (h) that is able to determine a future patient’s diagnosis based on their symptoms.

For an incoming patient, when you observe features (x), you can apply the function (h) to predict whether this new patient is sick or healthy (y). The initial stage where you use existing medical records to learn a function is called the training stage, and the latter where you apply the function to a new patient is called the testing stage.

The Hypothesis of a Problem

The function, often referred to as the hypothesis and denoted h, is the program that is learned from the data. You differentiate between the hypothesis class, which is the set of all possible functions that could be learned by the algorithm, and the final hypothesis, which is obtained after learning from our training data. Many possible hypothesis classes exist, and they loosely correspond to different types of learning algorithms. All machine learning algorithms function differently and have different parameters for their hypothesis classes. As we solve more and more problems, we as data scientists have to identify which algorithm is most suitable for a specific problem.

The Features of a Problem

Features are the relevant characteristics or attributes that you believe may contribute to the outcome of an event. You use features to describe the data from which you are trying to make predictions.

Examples of Feature Vectors

Heterogeneous features: Patient data in medical applications typically contain many heterogeneous features.

Bag-of-words features: Text documents are often stored as bag-of-words features. This is a method to convert a text document with any number of words into a feature vector of fixed dimensionality.
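For illustration, here is a minimal sketch of a bag-of-words conversion over a tiny, made-up vocabulary (the vocabulary and the example document are my own, not from the original):

```python
from collections import Counter

vocabulary = ["free", "money", "meeting", "tomorrow"]  # hypothetical fixed vocabulary

def bag_of_words(text, vocab=vocabulary):
    # count how often each vocabulary word appears in the document
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

print(bag_of_words("free money claim your free prize"))  # -> [2, 1, 0, 0]
```

However long the document, the resulting feature vector always has the same dimensionality as the vocabulary.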

Pixel features: Images are typically stored as pixels. These can be represented as a vector by simply “vectorizing” the image into one long chain of numbers. If an image has six megapixels, and each pixel has three numbers (one each for red, green, and blue), this yields an 18-million-dimensional vector. Color intensities are typically stored as values ranging from 0 to 255, so all feature values are non-negative and within that range.
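As a quick sketch of what “vectorizing” means (the image here is just random data for illustration):

```python
import numpy as np

# a hypothetical 100 x 100 RGB image: three intensity values per pixel
image = np.random.randint(0, 256, size=(100, 100, 3))

# "vectorize" it into one long chain of 100 * 100 * 3 = 30,000 numbers
feature_vector = image.reshape(-1)
print(feature_vector.shape)  # (30000,)
```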

The Labels of a Problem

The label (y) is what you want to predict for a given data instance. Labels can come in many different forms.

Binary

There are only two possible label values. For example, with spam email classification, an email is either spam or not spam. Spam could be mapped to “+1” and email not considered spam to “-1.”

Multiclass

There are multiple distinct label values. For example, in a facial recognition application, you would distinguish each individual as a separate class, such as:

· class1 = ”Tom Cruise”

· class2 = ”Elon Musk”

Splitting Data Sets

To avoid overfitting to the training set, you will usually split the data D into three mutually exclusive subsets:

DTR — training data (80%)

DVA — validation data (10%)

DTE — test data (10%)

A common choice is to split the data 80/10/10, respectively. You then choose a function based on the training data, improve it using the validation data, and evaluate it on the test data.
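The article does not show splitting code, but here is a minimal sketch of one way to do an 80/10/10 split with NumPy (the helper name split_data and the shuffling step are my own):

```python
import numpy as np

def split_data(X, Y, seed=0):
    # shuffle once, then carve out 80% / 10% / 10% train / validation / test subsets
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(0.8 * len(X))
    n_va = int(0.1 * len(X))
    train = idx[:n_tr]
    val = idx[n_tr:n_tr + n_va]
    test = idx[n_tr + n_va:]
    return (X[train], Y[train]), (X[val], Y[val]), (X[test], Y[test])
```

Shuffling before splitting matters: if the data are ordered (say, by class), a naive slice would give the training and test sets different distributions.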

k-Nearest Neighbors

The k-Nearest Neighbors algorithm determines the label of a given data point based on the labels of the other data points closest to it (i.e., its nearest neighbors). For example, suppose we have a training data set containing points labeled with two different classes, where positively labeled points are drawn as blue plus signs and negatively labeled points as orange minus signs. We want to use k-NN to infer the label of a test point, using these training points as a reference.

Now let’s add a new, unlabeled test data point and attempt to label it as positive or negative. The k-NN algorithm makes the assumption that similar data points have similar labels. So what we will try to do is find the k most similar data points and infer the label of our test data point from them.

Let’s use 3-NN and find the three most similar data points. Suppose we find two blue (+1) points and one orange (−1) point. Now we take a majority vote, and since blue is more common, we assign the blue label to our test data point. That’s all there is to it: find the k most similar data points, take a majority vote, and assign the most common label to the test point.

Compute Euclidean Distances in Matrix Form

To begin building a model, you need to first determine which distance metric to use. We will use the Euclidean distance. But before we implement the Euclidean distance function, we must first determine how to represent our vectors in a computer program.

In mathematics, a vector is typically represented as a d × 1 matrix (a column vector). However, in computer programs it is often more efficient to store a vector as a row vector. So, for now, we are going to assume all vectors are represented as 1 × d matrices.

Consider two d-dimensional row vectors x and z. The Euclidean distance between these two vectors is

d(x, z) = √((x − z)(x − z)ᵀ)

NumPy is a highly optimized linear algebra library. If we can calculate the Euclidean distance between every pair of points in the training set and the test set using some matrix operations, then we can leverage NumPy’s optimized functions to create an efficient program. To do so, let’s define four matrices:
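With X the n × d matrix of training points and Z the m × d matrix of test points (and xᵢ, zⱼ their rows), the four matrices are likely defined as follows, working backward from the identity below:

· G (n × m), the Gram matrix: Gᵢⱼ = xᵢzⱼᵀ

· S (n × m): Sᵢⱼ = xᵢxᵢᵀ, the squared norm of xᵢ, repeated across row i

· R (n × m): Rᵢⱼ = zⱼzⱼᵀ, the squared norm of zⱼ, repeated down column j

· D (n × m): Dᵢⱼ = d(xᵢ, zⱼ), the Euclidean distance between xᵢ and zⱼ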

Based on the four matrices we’ve defined above, we can express D² (the matrix of squared distances) as a linear combination of G, S, and R, applied entry by entry:

Dᵢⱼ² = Sᵢⱼ + Rᵢⱼ − 2Gᵢⱼ, i.e., D² = S + R − 2G
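A minimal sketch of such a calculatedistance function, implementing the identity above with NumPy broadcasting (the exact signature is an assumption; the article describes it as taking n × d and m × d matrices and returning an n × m matrix):

```python
import numpy as np

def calculatedistance(X, Z):
    # X is n x d, Z is m x d; returns the n x m matrix D where
    # D[i, j] is the Euclidean distance between X[i] and Z[j]
    n = X.shape[0]
    m = Z.shape[0]
    S = np.sum(X ** 2, axis=1).reshape(n, 1)  # S[i, j] = x_i x_i^T
    R = np.sum(Z ** 2, axis=1).reshape(1, m)  # R[i, j] = z_j z_j^T
    G = X @ Z.T                               # G[i, j] = x_i z_j^T
    D2 = S + R - 2 * G                        # squared distances, via broadcasting
    D2 = np.maximum(D2, 0)                    # guard against tiny negatives from round-off
    return np.sqrt(D2)
```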

As you can see in the code above, we have created a function called calculatedistance. It takes two matrices, of shapes n × d and m × d, calculates the distances between their rows, and returns an n × m matrix. Inside the function, we compute S, G, and R, take the square root at the end, and return the D matrix. We will use this function to calculate distances for our KNN algorithm.

Now, for facial recognition, we will use faces.mat to load our data. We can also sanity-check the data by plotting some of the images.
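A minimal sketch of the loading and plotting step, assuming faces.mat stores the vectorized images under the key "X" and the person labels under "Y", with 38 × 31 pixel images (all three are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat

data = loadmat("faces.mat")
X = data["X"]          # n x d matrix of vectorized face images (assumed key)
Y = data["Y"].ravel()  # n labels, one person per class (assumed key)

# sanity-check the data by plotting the first face
plt.imshow(X[0].reshape(38, 31), cmap="gray")  # assumed image resolution
plt.title(f"Person label: {Y[0]}")
plt.show()
```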

Now we will create another function called findknn, which finds the k nearest neighbors using the calculatedistance function defined above.
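A minimal sketch of findknn, matching the description that follows (the signature and the k × m output shapes are assumptions):

```python
def findknn(xTr, xTe, k):
    # xTr is n x d (training), xTe is m x d (test)
    # returns indices and dists, both k x m: column j describes
    # the k nearest training points to the j-th test point
    D = calculatedistance(xTr, xTe)      # n x m matrix of pairwise distances
    indices = np.argsort(D, axis=0)[:k]  # row indices of the k smallest distances per column
    dists = np.sort(D, axis=0)[:k]       # the k smallest distances themselves
    return indices, dists
```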

As you can see in the code above, we call calculatedistance to get the distance matrix, use argsort to find the indices of the nearest neighbors, sort the distance matrix, and return both the indices and the sorted distances. We will use the indices to look up the labels (y) at the end.

We have another function to calculate accuracy, as shown below.
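A minimal sketch of such an accuracy function (the name accuracy is my own; the article does not name it):

```python
def accuracy(truth, preds):
    # fraction of predictions that exactly match the true labels
    truth = np.asarray(truth).ravel()
    preds = np.asarray(preds).ravel()
    return np.mean(truth == preds)
```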

And finally, our KNN classifier, which calls all of the above functions and returns the prediction array.
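A minimal sketch of the classifier, taking a majority vote over the neighbor labels (the name knnclassifier and the signature are assumptions):

```python
def knnclassifier(xTr, yTr, xTe, k):
    # predict a label for every test point by majority vote
    # among its k nearest training neighbors
    yTr = np.asarray(yTr).ravel()
    indices, _ = findknn(xTr, xTe, k)  # k x m indices into the training set
    neighbor_labels = yTr[indices]     # k x m matrix of neighbor labels
    preds = np.empty(neighbor_labels.shape[1], dtype=yTr.dtype)
    for j in range(neighbor_labels.shape[1]):
        labels, counts = np.unique(neighbor_labels[:, j], return_counts=True)
        preds[j] = labels[np.argmax(counts)]  # most common label wins
    return preds
```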

Final result: if we run the classifier above with k = 1, we get the accuracy of our facial recognition system, as sketched below.
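A minimal end-to-end sketch (the 80/20 train/test split and variable names are assumptions, since the exact split used is not shown):

```python
# assumes X, Y have been loaded from faces.mat as above
n = X.shape[0]
split = int(0.8 * n)
xTr, yTr = X[:split], Y[:split]
xTe, yTe = X[split:], Y[split:]

preds = knnclassifier(xTr, yTr, xTe, k=1)
print(f"1-NN accuracy: {accuracy(yTe, preds):.2%}")
```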

References:
https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

https://electrek.co/2017/03/13/tesla-vision-autopilot-chip-intel-mobileye/
