The dataset I used is the Kaggle Facial Keypoints Detection dataset (https://www.kaggle.com/c/facial-keypoints-detection), provided by Dr. Yoshua Bengio of the University of Montreal.

Each predicted keypoint is specified by an (x,y) real-valued pair in the space of pixel indices. There are 15 key points, which represent the different elements of the face. The input image is given in the last field of the data files, and consists of a list of pixels (ordered by row), as integers in (0,255). The images are 96x96 pixels.
Now that we have a good idea about the kind of data we are dealing with, we need to preprocess it so that we can use it as inputs to our model.
Step 1: Data Preprocessing and other shenanigans
The above dataset has two files that we need to concern ourselves with: training.csv and test.csv. The training file has 31 columns: 30 for the key point coordinates, and a last column containing the image data as a string. It contains 7049 samples; however, many of these examples have 'NaN' values for some key points, which makes things tough for us. So we shall only consider the samples without any NaN values. Here's the code that does exactly that (it also normalizes the image and keypoint data, which is a very common preprocessing step):
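The original snippet isn't embedded here, so the following is a minimal sketch of that step. It assumes the training.csv file from the Kaggle dataset; the exact normalization constants are my own assumption, and the project's real code lives in the GitHub repo linked at the end.

```python
import numpy as np
import pandas as pd

# Minimal sketch of the preprocessing step (file name and normalization
# constants are assumptions).
train_df = pd.read_csv('training.csv')

# Drop every sample that has a NaN value in any of the 30 keypoint columns
train_df = train_df.dropna()

# The 'Image' column is a space-separated string of 96*96 pixel values (0-255)
X = np.stack(train_df['Image'].apply(
    lambda s: np.array(s.split(), dtype='float32')).values)
X = X.reshape(-1, 96, 96, 1) / 255.0            # normalize pixels to [0, 1]

# The other 30 columns are the keypoint coordinates (x1, y1, ..., x15, y15)
y = train_df.drop(columns=['Image']).values.astype('float32')
y = y / 96.0                                     # scale coordinates to [0, 1]

print(X.shape, y.shape)                          # roughly (2140, 96, 96, 1) (2140, 30)
```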
Everything well and good? Not really, no. It turns out only 2140 samples were free of NaN values, which is far too few to train a generalized and accurate model. So to create more data, we need to augment our current data.
Data augmentation is a technique used to generate more data from existing data, using transformations like scaling, translation, rotation, etc. In this case, I mirrored each image and its corresponding key points, because transformations like scaling and rotation might have distorted the face images and would have thus screwed up the model. Finally, I combined the original data with the new augmented data to get a total of 4280 samples.
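Only mirroring is described above, so the sketch below just flips each image and its keypoints horizontally. It assumes the keypoints are still in the original 96-pixel coordinate space (i.e. the flip happens before the scaling shown earlier), and the left/right column pairs follow the Kaggle column ordering, so verify them against your own dataframe.

```python
import numpy as np

def mirror_augment(X, y, img_width=96):
    """Horizontally mirror images and keypoints (sketch of the augmentation above)."""
    X_flip = X[:, :, ::-1, :]                             # flip each image left-to-right
    y_flip = y.copy()
    y_flip[:, 0::2] = (img_width - 1) - y_flip[:, 0::2]   # mirror every x coordinate

    # Mirroring turns left keypoints into right ones and vice versa, so the
    # corresponding columns have to be swapped. Each pair below is the x-column
    # index of a (left, right) keypoint pair in the Kaggle column ordering.
    flip_pairs = [(0, 2), (4, 8), (6, 10), (12, 16), (14, 18), (22, 24)]
    for a, b in flip_pairs:
        y_flip[:, [a, a + 1, b, b + 1]] = y_flip[:, [b, b + 1, a, a + 1]]

    # Stack the originals and their mirrors (2140 -> 4280 samples in this project)
    return np.concatenate([X, X_flip]), np.concatenate([y, y_flip])
```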
Step 2: Model architecture and Training
Now let's dive into the Deep Learning section of the project. We aim to predict coordinate values for each key point of an unseen face, hence it's a regression problem. Since we are working with images, a Convolutional Neural Network is a pretty obvious choice for feature extraction. These extracted features are then passed to a fully connected neural network which outputs the coordinates. The final Dense layer needs 30 neurons because we need 30 output values (15 pairs of (x, y) coordinates). A sketch of this kind of architecture follows the list below.
'ReLU' activations are used after each Convolutional and Dense layer, except for the last Dense layer, since its outputs are the coordinate values we require
Dropout Regularization is used to prevent overfitting
Max Pooling is added for Dimensionality Reduction
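The exact layer counts and hyper-parameters aren't spelled out above, so the Keras-style sketch below only illustrates the described structure (Conv + ReLU blocks with Max Pooling, Dropout before the dense head, and a final 30-neuron Dense layer with no activation); the filter counts, dropout rate, and training settings are assumptions.

```python
from tensorflow.keras import layers, models

# Sketch of the kind of architecture described above; the exact number of
# layers, filters, and the dropout rate used in the project may differ.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(96, 96, 1)),
    layers.MaxPooling2D((2, 2)),                  # dimensionality reduction
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),                          # regularization against overfitting
    layers.Dense(30)                              # 15 (x, y) pairs, no activation
])

# Mean squared error is the usual loss for coordinate regression
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
model.fit(X, y, validation_split=0.2, epochs=100, batch_size=64)
```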
The model was able to reach a minimum loss of ~0.0113, and accuracy of ~80%, which I thought was decent enough.
I also needed to check the model's performance on an image from my webcam, because that is what the model would receive during the filter implementation. Here's how the model performed on this image of my beautiful face:

Step 3: Put the model into action
We've got our model working, so all we gotta do now is use OpenCV to do the following (a rough sketch of this loop comes after the list):
Get image frames from the webcam
Detect the face region in each frame, because the other sections of the image are useless to the model (I used the frontal-face Haar cascade to crop out the face region)
Preprocess this cropped region by converting it to grayscale, normalizing, and reshaping it
Pass the preprocessed image as input to the model
Get predictions for the key points and use them to position different filters on the face
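Here's roughly what that loop could look like, assuming the frontal-face Haar cascade that ships with OpenCV, the `model` trained above, and the same 96x96 grayscale normalization used during training; everything filter-related is left out, and the predicted points are simply drawn on the frame.

```python
import cv2
import numpy as np

# Sketch of the webcam loop described above; 'model' is the trained network
# from the previous step, and the de-normalization assumes the keypoints were
# scaled to [0, 1] during preprocessing.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

    for (fx, fy, fw, fh) in faces:
        # Crop the face region and preprocess it like the training data
        face = gray[fy:fy + fh, fx:fx + fw]
        face_input = (cv2.resize(face, (96, 96)) / 255.0).reshape(1, 96, 96, 1)

        # Predict the 15 keypoints and map them back into frame coordinates
        points = model.predict(face_input)[0].reshape(-1, 2)
        points[:, 0] = fx + points[:, 0] * fw
        points[:, 1] = fy + points[:, 1] * fh

        for (px, py) in points:
            cv2.circle(frame, (int(px), int(py)), 2, (0, 255, 0), -1)

    cv2.imshow('keypoints', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```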
I did not have any particular filters in mind when I began testing. I came up with the idea for the project around 22 December 2018, and being a huge Christmas fanboy like any other normal human being, I decided to go with the following filters:

I used particular key points for the scaling and positioning of each of the above filters:
Glasses Filter: The distance between the left-eye-left-keypoint and the right-eye-right-keypoint is used for the scaling. The brow-keypoint and left-eye-left-keypoint are used for the positioning of the glasses
Beard Filter: The distance between the left-lip-keypoint and the right-lip-keypoint is used for the scaling. The top-lip-keypoint and left-lip-keypoint are used for the positioning of the beard
Hat Filter: The width of the face is used for the scaling. The brow-keypoint and left-eye-left-keypoint are used for the positioning of the hat
The code which does all the above is as follows:
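The original snippet isn't reproduced here, so as a stand-in here's a minimal sketch of the pixel-by-pixel alpha overlay mentioned in the limitations section, using the glasses filter as an example; the helper name, file name, scaling factor, and keypoint variable names are all illustrative.

```python
import cv2

def overlay_filter(frame, filt, top_left):
    """Paste a 4-channel (BGRA) filter onto the frame wherever its alpha > 0.

    Pixel-by-pixel overlay as described in the limitations section; no bounds
    checking is done, so the filter must fit entirely inside the frame.
    """
    x0, y0 = top_left
    h, w = filt.shape[:2]
    for i in range(h):
        for j in range(w):
            if filt[i, j, 3] != 0:                    # non-transparent pixel
                frame[y0 + i, x0 + j] = filt[i, j, :3]
    return frame

# Illustrative usage for the glasses filter (the keypoint variables come from
# the model's predictions for the current frame):
#   glasses = cv2.imread('glasses.png', cv2.IMREAD_UNCHANGED)  # keep the alpha channel
#   width = int(abs(right_eye_right[0] - left_eye_left[0]))
#   height = int(glasses.shape[0] * width / glasses.shape[1])  # preserve aspect ratio
#   glasses = cv2.resize(glasses, (width, height))
#   frame = overlay_filter(frame, glasses, (int(left_eye_left[0]), int(brow[1])))
```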

Above, you can see the final output of the project, which contains a real-time video with filters on my face and another real-time video with key points plotted.
Limitations of the project
Although the project works pretty well, I did discover a few shortcomings which make it a little shy of perfect:
Not the most accurate model. Although 80% is pretty decent in my opinion, it still has a lot of room for improvement.
This current implementation works only for the selected set of filters because I had to do some manual tweaking for more accurate positioning and scaling.
The process of applying the filter to the image is pretty computationally inefficient because to overlay the .png filter image onto the webcam image based on the alpha channel, I had to apply the filter pixel-by-pixel wherever the alpha was not equal to 0. This sometimes leads to the program crashing when it detects more than one face in the image.
The complete code for the project is on my Github: https://github.com/agrawal-rohit/Santa-filter-facial-keypoint-regression
If you'd like to improve upon the project, or if you have any suggestions for solving the above issues, be sure to leave a response below and open a pull request on the GitHub repo. Thanks for stopping by, hope you enjoyed the read.
Ciao!