A Methodology of Rule Discovery from Large-scale Multi-tier Noisy Educational Data

Yousuf, Tasneem

dc.contributor.advisor	Zualkernan, Imran
dc.contributor.author	Yousuf, Tasneem
dc.date.accessioned	2018-06-05T06:52:02Z
dc.date.available	2018-06-05T06:52:02Z
dc.date.issued	2018-05
dc.identifier.other	35.232-2018.12
dc.identifier.uri	http://hdl.handle.net/11073/9357
dc.description	A Master of Science thesis in Computer Engineering by Tasneem Yousuf entitled “A Methodology of Rule Discovery from Large-scale Multi-tier Noisy Educational Data”, submitted in May 2018. The thesis advisor is Dr. Imran Ahmed Zualkernan. Soft and hard copies available.	en_US
dc.description.abstract	The recent availability of very large amounts of educational data in digital format often leads to data overload, where it is difficult to determine important trends and patterns beyond those provided by traditional statistical techniques. Educational data mining (EDM) has emerged in response. Association mining is an EDM technique that is well known for discovering relationships in data with high scale and velocity but low variety and veracity. This analysis can be performed at the micro level (e.g., for teachers), the meso level (e.g., for cohorts of schools), or the macro level (e.g., at region, province, or country level). This thesis proposes a methodology for applying association mining to multi-tier, sparse, and error-ridden educational data. The methodology uses rule templates and is organized around the four analytical dimensions of people, process, environment, and outcomes. It defines Extract, Transform, and Load (ETL) processes for this type of data and shows how data from lower levels is aggregated into baskets at higher levels. The proposed methodology was applied to data collected from a large-scale continuous professional development (CPD) process for 2,613 teachers in a developing country. The methodology was used to mine interesting rules, whose quality was evaluated using the objective metrics of Support, Confidence, and Lift. The minimum Confidence at each level was set to 0.85. The micro-level analysis (n = 2,613 teachers) yielded few or no rules, with a very low mean Support of 0.00345 (SD = 0.00214) and a mean Lift of 6.98 (SD = 4.63). The situation remained much the same at the meso level (n = 1,391 schools), with a mean Support of 0.0059 (SD = 0.00051) and a mean Lift of 5.46 (SD = 3.23). The results were significantly better at the macro level (n = 59 clusters), with a mean Support of 0.089 (SD = 0.021) and a mean Lift of 5.925 (SD = 2.5).
The mined rules revealed several anomalies and fidelity violations in the CPD process at various levels. The methodology was also useful in identifying small groups of teachers (6-8 teachers), schools (8-10 schools), and clusters (4-7 clusters) with common characteristics that can be targeted for follow-up to help improve the CPD process.	en_US
dc.description.sponsorship	College of Engineering	en_US
dc.description.sponsorship	Department of Computer Science and Engineering	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	Master of Science in Computer Engineering (MSCoE)	en_US
dc.subject	educational analytics	en_US
dc.subject	association mining	en_US
dc.subject	rule discovery	en_US
dc.subject	Apriori	en_US
dc.subject	market basket analysis	en_US
dc.subject	developing countries	en_US
dc.title	A Methodology of Rule Discovery from Large-scale Multi-tier Noisy Educational Data	en_US
dc.type	Thesis	en_US
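
The abstract evaluates rules with Support, Confidence, and Lift. As a minimal sketch of how these three metrics are conventionally computed for a single rule over a set of baskets, the Python below may help. It is not taken from the thesis: the function name `rule_metrics`, the item labels, and the toy baskets are all hypothetical, and only the 0.85 Confidence threshold comes from the abstract.

```python
from typing import Iterable, List, Set, Tuple

# Minimum Confidence quoted in the abstract; everything else here is illustrative.
MIN_CONFIDENCE = 0.85


def rule_metrics(
    baskets: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(baskets)
    if n == 0:
        raise ValueError("at least one basket is required")

    # Count baskets containing the antecedent, the consequent, and both.
    count_a = sum(1 for b in baskets if a <= b)
    count_c = sum(1 for b in baskets if c <= b)
    count_ac = sum(1 for b in baskets if (a | c) <= b)

    support = count_ac / n                                  # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0     # P(C | A)
    lift = confidence / (count_c / n) if count_c else 0.0   # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical toy baskets (invented for illustration, not thesis data).
toy_baskets = [
    {"attended_session", "passed_assessment"},
    {"attended_session", "passed_assessment"},
    {"attended_session"},
    {"missed_session"},
]

support, confidence, lift = rule_metrics(
    toy_baskets, ["attended_session"], ["passed_assessment"]
)
keep_rule = confidence >= MIN_CONFIDENCE
```

On this toy data the rule would be discarded, since its Confidence falls below the 0.85 threshold even though its Lift is above 1.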

Files in this item

Name: 35.232-2018.12 Tasneem Yousuf.pdf
Size: 2.132 MB
Format: PDF


This item appears in the following Collection(s)

Masters Theses
