Description
A Master of Science thesis in Computer Engineering by Mohammed Elnawawy entitled “FPGA-Based Network Traffic Classification Using Machine Learning”, submitted in November 2019. The thesis advisor is Dr. Tamer Shanableh and the thesis co-advisor is Dr. Assim Sagahyroon. A soft copy is available (Thesis, Approval Signatures, Completion Certificate, and AUS Archives Consent Form).
Abstract
Traffic classification is the process of associating network traffic with the application or group of applications that generated it. It is an essential part of network management for datacentres and network operators because of its role in traffic shaping, bandwidth allocation, and cybersecurity. Researchers have investigated several techniques for classifying traffic accurately, and methods based on machine learning have achieved encouraging results. In this work, we conduct several experiments using naïve Bayes, support vector machine, k-nearest neighbour, and random forest classifiers on two publicly available traffic datasets. The first dataset was collected in an uncontrolled environment that resembles real network behaviour, whereas the second was captured in a highly controlled environment. In these experiments, we evaluate the classifiers in terms of classification accuracy and F-score. We also assess the suitability of the extracted features using feature selection techniques. Moreover, we determine the optimal percentage of packets within a flow that needs to be considered when extracting flow-level features. We observe that considering a larger number of packets improves classification performance but increases the required processing delay. We therefore argue that using 60% of the packets in a flow is a good compromise that keeps performance high while limiting processing time. Several graphs are generated during each experiment to investigate the effect of varying each parameter on classification performance. The results of our experiments indicate that random forest outperforms all other algorithms, achieving a maximum accuracy of 98.5% and an F-score of 0.932.
Finally, since software-based classifiers are usually slow and hence incapable of coping with the increasing amount of traffic in congested networks, we implement a highly pipelined random forest classifier on a Field-Programmable Gate Array (FPGA). The implementation exploits the parallel architecture of the FPGA to accelerate this time-consuming task. The implemented design achieves an average throughput of 163.24 Gbps, which is more than twice the maximum throughput reported in previous work. This enables datacentres to achieve efficient online traffic classification given the dynamic nature of modern networks.
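As an illustration of the flow-level feature extraction and the evaluation metrics mentioned in the abstract, the following is a minimal Python sketch and not the thesis implementation. The function names (`flow_features`, `accuracy`, `macro_f_score`), the particular features computed, and the use of a macro-averaged F-score are assumptions made for this example; the abstract does not specify the feature set or the averaging method.

```python
from statistics import mean, pstdev


def flow_features(packets, fraction=0.6):
    """Compute simple flow-level features from the first `fraction` of a flow.

    `packets` is a list of (timestamp_seconds, size_bytes) tuples in arrival
    order. The feature set here is hypothetical and chosen for illustration.
    """
    if not packets:
        raise ValueError("flow has no packets")
    # Use only the leading part of the flow, but always at least one packet.
    n = max(1, int(len(packets) * fraction))
    head = packets[:n]
    times = [t for t, _ in head]
    sizes = [s for _, s in head]
    # Inter-arrival gaps between consecutive packets in the prefix.
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "packets_used": n,
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "duration": times[-1] - times[0],
    }


def accuracy(y_true, y_pred):
    """Fraction of flows whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_score(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is assumed)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


if __name__ == "__main__":
    # Synthetic flow: one packet per second with growing sizes.
    flow = [(float(i), 100 * (i + 1)) for i in range(10)]
    print(flow_features(flow, fraction=0.6))

    # Synthetic labels, used only to exercise the metric functions.
    truth = ["web", "web", "video", "video", "voip"]
    guess = ["web", "video", "video", "video", "voip"]
    print(accuracy(truth, guess), macro_f_score(truth, guess))
```

Lowering `fraction` shortens the wait before a flow can be classified, at the cost of computing features from fewer packets, which is the trade-off the abstract describes.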