Description
A Master of Science thesis in Computer Engineering by Noora Mohammed Abdulla Al Roken entitled “Arabic Multimodal Emotion Recognition Using Deep Learning,” submitted in May 2022. Thesis advisors are Dr. Gerassimos Barlas and Dr. Osameh Mahmoud Al-Kofahi. A soft copy is available (Thesis, Completion Certificate, Approval Signatures, and AUS Archives Consent Form).
Abstract
Emotions are an essential part of human communication since they shape how information is received. As part of human-computer interaction, researchers are extending Emotion Recognition (ER) to machines. ER can contribute to many fields, such as business, education, psychology, and psychiatry, due to the importance of emotional insights. ER has been an active research area for decades due to the complexity of the problem. Some methods use speech to extract emotions, and others use facial expressions or text. Recently, works have begun to combine multiple inputs, or modalities, to extract valuable and accurate insights, since different emotions might be better expressed in different modalities. Many datasets have been made available, especially for the speech modality. Datasets of different types (acted, elicited, and natural) have been used, and they were built in different languages, including Arabic. However, works utilizing multiple modalities have mainly focused on Western and South Asian countries, and none included Arabic. The classifiers built were all trained on acted datasets containing exaggerated reactions, which are not reliable. Therefore, in this thesis we present our Arabic audio-visual emotion dataset, built on five basic emotions using natural responses. We implement three existing multimodal classifiers and our proposed classifier on our dataset using five-fold cross-validation. Finally, we evaluate the ER performance based on the visual dataset size, joint and disjoint training, and the single and multimodal networks. The proposed classifier achieved the highest average F1-score of 0.504 and an accuracy of 54.88% for natural emotion recognition.
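The evaluation protocol named in the abstract, five-fold cross-validation scored by F1 and accuracy, can be illustrated with a minimal sketch. This is not the thesis code: the function names are hypothetical, the folds are taken contiguously without shuffling or stratification, and the F1-score is assumed to be macro-averaged over classes, which the abstract does not specify.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) pairs.

    Assumption: contiguous folds, no shuffling or stratification.
    """
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


def macro_f1(y_true: Sequence, y_pred: Sequence, labels: Sequence) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

In such a setup, a classifier would be trained on each `train` split, scored on the matching `test` split, and the five per-fold scores averaged to give a single reported figure.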