Description
A Master of Science thesis in Computer Engineering by Soha Galalaldin Khider Ahmed, entitled "Sentiment Mining of Arabic Twitter Data," submitted in January 2014. The thesis advisor is Dr. Michel Pasquier and the co-advisor is Dr. Ghassan Qaddah. Both soft and hard copies of the thesis are available.
Abstract
Social networking services such as Facebook and Twitter, and media-hosting websites such as Flickr and YouTube, have become increasingly popular in recent years. A key factor in their worldwide appeal is that they allow people to express and share their opinions, likes, and dislikes freely and openly. Posted opinions range from criticizing politicians to discussing football matches, citing top news, reviewing movies, and recommending new products and services such as mobile phones, restaurants, and software. This development has fueled a new field, known as sentiment analysis and opinion mining, whose goal is to extract people's sentiment from text in order to help customers make purchase decisions and help vendors enhance their reputation. This emerging field has attracted considerable research interest, but most existing work focuses on English text. In this thesis, we therefore studied sentiment analysis of Arabic text retrieved from a well-known social media site, namely Twitter. Specifically, we studied target-dependent sentiment analysis of Arabic Twitter text, a problem that had not previously been addressed for the Arabic language. We developed a system that acquires Arabic text from Twitter and extracts users' opinions towards different topics and products. The key phases of the system are as follows. In the Data Acquisition phase, we collected tweets related to specific topics. In the Tweet Filtering phase, we reduced the noise in the collected tweets to facilitate the Annotation phase, in which we annotated the collected tweets with respect to the specified topic. In the Data Preprocessing phase, we added tags, normalized the words used in the tweets, and removed spam tweets. In the Feature Identification phase, we extracted stylistic, syntactic, and semantic features, and used feature selection algorithms to select those yielding better results.
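As an illustration of the word-normalization step in the Data Preprocessing phase, the following Python sketch applies rules that are common in Arabic text processing. The abstract does not state the thesis's actual rules, so the function name `normalize_tweet`, the `<URL>` and `<USER>` placeholder tags, and every character mapping below are assumptions for illustration only.

```python
import re

# Hypothetical normalization rules; the thesis's exact rules are not given in the abstract.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
DIACRITICS_RE = re.compile(r"[\u064B-\u0652]")          # tanween, short vowels, shadda, sukun
ALEF_VARIANTS_RE = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda, hamza above, hamza below
ELONGATION_RE = re.compile(r"([\u0621-\u064A])\1{2,}")  # an Arabic letter repeated 3+ times
TATWEEL = "\u0640"                                       # decorative stretching character


def normalize_tweet(text: str) -> str:
    """Return a normalized copy of an Arabic tweet (illustrative rules only)."""
    # Replace URLs and user mentions with assumed placeholder tags.
    text = URL_RE.sub(" <URL> ", text)
    text = MENTION_RE.sub(" <USER> ", text)
    # Strip diacritics and tatweel so spelling variants map to one form.
    text = DIACRITICS_RE.sub("", text)
    text = text.replace(TATWEEL, "")
    # Unify common orthographic variants (one widely used convention).
    text = ALEF_VARIANTS_RE.sub("\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0629", "\u0647")      # ta marbuta -> ha
    text = text.replace("\u0649", "\u064A")      # alef maqsura -> ya
    # Collapse emphatic letter elongation, e.g. a letter typed four times -> once.
    text = ELONGATION_RE.sub(r"\1", text)
    # Collapse runs of whitespace left by the substitutions.
    return " ".join(text.split())
```

Collapsing letter repetitions is useful for tweets because users often elongate words for emphasis, which would otherwise turn one word into several distinct features.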
In the Classification phase, a trained machine-learning algorithm was used to classify each tweet as negative, positive, or neutral towards a specific topic. Results obtained with different feature sets, classifiers, and datasets are reported in terms of classification accuracy, Kappa statistic, and F-measure.
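The three evaluation measures named above can be computed from true and predicted labels as in the following Python sketch. It is not the thesis's evaluation code: it assumes that the Kappa statistic is Cohen's kappa and that the reported F-measure is a support-weighted average of per-class F-measures, and all function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted label matches the annotation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(y_true) | set(y_pred)
    # Expected agreement if predictions were independent of the true labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_e == 1:  # degenerate case: a single class everywhere, so agreement is perfect
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def f_measure(y_true, y_pred, label):
    """Harmonic mean of precision and recall for one class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def weighted_f_measure(y_true, y_pred):
    """Per-class F-measures averaged with weights equal to class frequency."""
    counts, n = Counter(y_true), len(y_true)
    return sum(counts[c] / n * f_measure(y_true, y_pred, c) for c in counts)


y_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "neutral", "negative", "positive"]
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred), weighted_f_measure(y_true, y_pred))
```

Kappa discounts the agreement expected by chance, which is why it is informative alongside raw accuracy when the sentiment classes are unevenly distributed.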