A Master of Science thesis in Electrical Engineering by Ahsan Jalal, entitled "Semi-supervised Clustering of Facial Expressions," submitted in November 2017. Thesis advisor: Dr. Usman Tariq. Soft and hard copies available.
Automated facial expression recognition (FER) is an important area in computer vision and machine learning due to its prominent role in human-machine interaction. FER is key to building intelligent user interfaces, particularly in smart cities, and it also enables social robots to interact naturally with humans. However, FER is not trivial, as expressions may vary significantly across genders, age groups, and situations. The limited availability of labeled datasets for the expression recognition task is another challenge. Therefore, a semi-supervised learning algorithm using a triplet-loss-based deep convolutional neural network is proposed, with the motivation of clustering known and unknown facial expressions in unconstrained environments. Faces are detected and aligned from the image dataset and are then used to train various supervised and unsupervised dimensionality reduction methods. The transformed faces in the new, lower-dimensional space are clustered using K-means and consensus clustering. The dimensionality reduction methods employed include principal component analysis, linear discriminant analysis, and learning embeddings with deep convolutional neural networks (CNNs). The motivation behind using supervised CNNs is their ability to learn non-linear transformations in a highly complex feature space. The best results are obtained using embeddings learned with deep convolutional neural networks combined with the consensus clustering method. The novelty of the proposed work lies in clustering facial expressions that were not present while learning the supervised dimensionality reduction methods. Experimental results on two constrained datasets, the Multi-PIE and MMI face datasets, show that the proposed algorithm not only produces the best clustering results on discrete expressions compared to other linear embeddings, but also clusters expressions with different intensities.
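The pipeline described above, embedding faces into a low-dimensional space and then clustering them, can be illustrated with a minimal sketch. This is not the thesis implementation: it uses plain NumPy on synthetic 10-dimensional "feature" data, PCA in place of the learned CNN embedding, and standard K-means (consensus clustering and the network itself are omitted). The `triplet_loss` function only illustrates the training objective mentioned in the abstract; the margin value 0.2 is an assumption.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Triplet objective: pull same-expression embeddings together,
    # push different-expression embeddings apart by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

def pca_transform(X, n_components):
    # Linear dimensionality reduction: project centered data onto
    # the top principal components (stand-in for a learned embedding).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=50, seed=0):
    # Plain Lloyd's K-means on the low-dimensional embeddings.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy data: two well-separated "expression" groups in a 10-D feature space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (30, 10)),
               rng.normal(5, 0.5, (30, 10))])
Z = pca_transform(X, 2)   # embed into 2 dimensions
labels = kmeans(Z, 2)     # cluster in the embedded space
```

On this toy data the two groups are recovered cleanly; the thesis's point is that a CNN embedding trained with the triplet loss can play the role of `pca_transform` while capturing non-linear structure.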
The proposed algorithm is also applied to a completely unconstrained YouTube dataset, and the resulting clustering of different facial behaviors shows that the proposed work generalizes to non-standard expressions and can learn expression classes from the datasets themselves.