A Master of Science thesis in Computer Engineering by Tasneem Yousuf entitled, “A Methodology of Rule Discovery from Large-scale Multi-tier Noisy Educational Data”, submitted in May 2018. Thesis advisor is Dr. Imran Ahmed Zualkernan. Soft and hard copy available.
Recent availability of very large amounts of educational data in digital format often leads to data overload where it is difficult to determine important trends and patterns beyond those provided by traditional statistical techniques. Therefore, educational data mining (EDM) has emerged. Association mining is a type of EDM technique which is well-known for discovering relationships from data with high scale and velocity, but low variety and veracity. This analysis can be performed at the micro-level (e.g., for teachers), meso-level (e.g., for cohorts of schools), or at macro-levels (e.g., at region, province, or country level). This thesis proposes a methodology for the application of association mining to multi-tier sparse and error-ridden educational data. The methodology uses rule templates and is organized around the four analytical dimensions of people, process, environment, and outcomes. The methodology defines Extract Transform and Load (ETL) processes for this type of data and shows how data from lower levels is aggregated to baskets at higher levels. The proposed methodology was applied to data collected from a large-scale continuous professional development (CPD) process for 2,613 teachers in a developing country. The methodology was used to mine interesting rules which were evaluated using the objective metrics of Support, Confidence, and Lift to determine the quality of rules. The Confidence for each level was set to be at least 0.85. The results are that micro-level analysis (n = 2613 teachers) yielded little or no rules with a very low mean Support of 0.00345 (sd. = 0.00214) and mean Lift 6.98 (sd. = 4.63). The situation remained somewhat the same at the meso-level (n = 1391 schools) with a mean Support of 0.0059 (sd. = 0.00051) and mean Lift of 5.46 (sd. = 3.23). The results were significantly better at the macro level (n = 59 clusters) with a mean Support of 0.089 (sd. = 0.021) and mean Lift of 5.925 (sd. = 2.5). The mined rules discovered several anomalies and fidelity violations in the CPD process at various levels. The methodology was also useful in identifying small groups of teachers (6-8 teachers), schools (8-10 schools), and clusters (4-7 clusters) with common characteristics that can be further administered to help improve the CPD process.