Predicting Student Risks Through Longitudinal Analysis
Date : 2015/04/23
Resource : KDD’14
Author: A. Tamhane, S. Ikbal, B. Sengupta, M. Duggirala….
Advisor : Dr. Jia-Ling Koh
Speaker : Sheng-Chih Chu
Outline
• Introduction
• Data Description & Defining Risk
• Data Processing
• Experiments
• Conclusion
Introduction
Motivation:
• K-12 is the most critical phase of a person's lifelong learning, during which the opportunities for a successful future need to be created and nurtured.
• Poor academic performance in K-12 is often a precursor to unsatisfactory educational outcomes, which are associated with significant personal and social costs.
Introduction
Goal:
• The first goal is to build a predictive model that identifies students at risk of poor performance.
• In addition, early prediction allows teachers to take remedial action along a student's learning path.
Data Description
• GCPS is one of the largest school systems in the US, consisting of 132 schools and serving more than 168,000 students at present.
Defining Risk
• CRCTs (scores range from 650 to 900)
  • 850 and above: exceeding standards
  • 800 to 849: meeting standards
  • below 800: at risk
  • (mathematics, science)
• ITBS (provides a percentile rank, PR)
  • 25% is used as the at-risk threshold in grade 8
  • (reading, written expression, mathematics, science, …)
• CogAT
  • (reasoning ability)
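The thresholds above can be written as a small labeling rule. This is a minimal sketch rather than the authors' implementation: the function and constant names are hypothetical, and treating a percentile rank strictly below 25 as at risk is an assumption, since the slide only names 25% as the threshold.

```python
# Hypothetical helpers illustrating the risk definitions on this slide.
CRCT_MEETS = 800        # scores below this are treated as at risk
CRCT_EXCEEDS = 850      # scores at or above this exceed standards
ITBS_PR_THRESHOLD = 25  # percentile-rank threshold (strict "<" is an assumption)


def crct_level(score):
    """Map a CRCT scale score (650 to 900) to a performance level."""
    if score < CRCT_MEETS:
        return "at risk"
    if score < CRCT_EXCEEDS:
        return "meets standards"
    return "exceeds standards"


def itbs_at_risk(percentile_rank):
    """Return True when an ITBS percentile rank falls below the threshold."""
    return percentile_rank < ITBS_PR_THRESHOLD


print(crct_level(750), "/", crct_level(821), "/", itbs_at_risk(42))
```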
Data Processing and Feature
Data warehouse
19 million
SPSS Modeler
Consider CRCT, ITBS, CogAT
CRCTs for grade 7:
Student | CRCT grade 7
Mike | 750
Jasmine | -
Thomas | -
Alice | 821
Peter | -
Jenny | 812
Longitudinal Feature Data
Student | Grade | CRCT grade 8 | CRCT grade 7 | CRCT grade 6 | CRCT grade 5 | ITBS grade 8 | ITBS grade 5 | ITBS grade 3
Mike | 7 | - | 750 | 680 | 693 | - | 42 | 43
Jasmine | 6 | - | - | 823 | 805 | - | 62 | 58
Thomas | 5 | - | - | - | 725 | - | 45 | 42
Alice | 8 | 832 | 821 | 815 | 811 | 68 | 62 | 59
Peter | 4 | - | - | - | - | - | - | 64
Jenny | 7 | - | 812 | 795 | 822 | - | 60 | 63
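One way to arrive at a longitudinal table like this is to pivot per-test records into one row per student. The sketch below assumes a hypothetical record layout of (student, test, grade, score) and hypothetical feature names such as `CRCT_g7`; the slides do not describe the warehouse schema, and they mention SPSS Modeler rather than Python for this step.

```python
# Hypothetical long-format records: (student, test, grade, score).
records = [
    ("Mike", "CRCT", 7, 750),
    ("Mike", "CRCT", 6, 680),
    ("Mike", "CRCT", 5, 693),
    ("Mike", "ITBS", 5, 42),
    ("Mike", "ITBS", 3, 43),
    ("Peter", "ITBS", 3, 64),
]


def to_longitudinal(rows):
    """Pivot long records into one feature dict per student."""
    table = {}
    for student, test, grade, score in rows:
        features = table.setdefault(student, {})
        features[f"{test}_g{grade}"] = score
    return table


longitudinal = to_longitudinal(records)
print(longitudinal["Mike"])
```

Scores a student has not yet taken are simply absent from that student's dict, which mirrors the empty cells in the table.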
Student Profile
Student | Gender | Ethnicity | Free meal | Gifted | Special education | Absent days | Suspensions | Discipline
Mike | M | B | Y | N | Y | 0 | X | 85
Jasmine | F | W | N | Y | N | 10 | X | 87
Thomas | M | W | N | N | N | 5 | X | 85
Alice | F | W | Y | Y | N | 0 | X | 92
Peter | M | W | N | N | N | 20 | O | 65
Jenny | F | B | N | Y | N | 0 | X | 90
Merged Data Set
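Student | Gender | Ethnicity | Free meal | Gifted | Special education | Absent days | Suspensions | Discipline
Mike | M | B | Y | N | Y | 0 | X | 85
Jasmine | F | W | N | Y | N | 10 | X | 87
Thomas | M | W | N | N | N | 5 | X | 85
Alice | F | W | Y | Y | N | 0 | X | 92
Peter | M | W | N | N | N | 20 | O | 65
Jenny | F | B | N | Y | N | 0 | X | 90

Student | Grade | CRCT grade 8 | CRCT grade 7 | CRCT grade 6 | CRCT grade 5 | ITBS grade 8 | ITBS grade 5 | ITBS grade 3
Mike | 7 | - | 750 | 680 | 693 | - | 42 | 43
Jasmine | 6 | - | - | 823 | 805 | - | 62 | 58
Thomas | 5 | - | - | - | 725 | - | 45 | 42
Alice | 8 | 832 | 821 | 815 | 811 | 68 | 62 | 59
Peter | 4 | - | - | - | - | - | - | 64
Jenny | 7 | - | 812 | 795 | 822 | - | 60 | 63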
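Merging the student profile with the longitudinal scores amounts to a join on the student. The following is a sketch under assumptions: the slides do not name the join key or the join type, so joining on the student name and keeping only students present in both tables are illustrative choices, and the dictionary keys are hypothetical.

```python
# Illustrative rows taken from the two tables above; key names are hypothetical.
profiles = {
    "Mike": {"gender": "M", "free_meal": "Y", "absent_days": 0},
    "Alice": {"gender": "F", "free_meal": "Y", "absent_days": 0},
}
scores = {
    "Mike": {"grade": 7, "CRCT_g7": 750, "ITBS_g5": 42},
    "Alice": {"grade": 8, "CRCT_g8": 832, "CRCT_g7": 821},
}


def merge_by_student(left, right):
    """Inner-join two per-student dicts on the student name."""
    merged = {}
    for name in left.keys() & right.keys():
        merged[name] = {**left[name], **right[name]}
    return merged


merged = merge_by_student(profiles, scores)
print(merged["Alice"])
```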
• Target variable: CRCT grade 8
Creation of Target Variable Dependent Data
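Student | Grade | CRCT grade 8 | CRCT grade 7 | CRCT grade 6 | CRCT grade 5 | ITBS grade 8 | ITBS grade 5 | ITBS grade 3
Mike | 7 | - | 750 | 680 | 693 | - | 42 | 43
Jasmine | 6 | - | - | 823 | 805 | - | 62 | 58
Thomas | 5 | - | - | - | 725 | - | 45 | 42
Alice | 8 | 832 | 821 | 815 | 811 | 68 | 62 | 59
Peter | 4 | - | - | - | - | - | - | 64
Jenny | 7 | - | 812 | 795 | 822 | - | 60 | 63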
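With the grade 8 CRCT score as the target variable, only students who already have that score can serve as labeled examples. The sketch below illustrates one way to build such examples; applying the 800 cutoff from the risk definition and skipping students without a grade 8 score are assumptions, and the names are hypothetical.

```python
AT_RISK_CUTOFF = 800  # CRCT cutoff from the risk definition (assumed to apply here)

rows = {
    "Alice": {"CRCT_g8": 832, "CRCT_g7": 821, "CRCT_g6": 815},
    "Mike": {"CRCT_g7": 750, "CRCT_g6": 680},  # no grade 8 score yet
}


def make_labeled_examples(table, target="CRCT_g8"):
    """Split each row that has a target score into (features, at_risk_label)."""
    examples = {}
    for name, row in table.items():
        if target not in row:
            continue  # a student without the target score cannot be labeled
        features = {k: v for k, v in row.items() if k != target}
        examples[name] = (features, row[target] < AT_RISK_CUTOFF)
    return examples


examples = make_labeled_examples(rows)
print(examples)
```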
Imputation of Missing Features
Columns: Grade, CRCTs for grade 8, CRCTs for grade 7, CRCTs for grade 6, CRCTs for grade 5, ITBS for grade 8, ITBS for grade 5, ITBS for grade 3 (each row lists the grade followed by the scores available for that student)
Mike 7 750 680 693 62 43
Jasmine 6 823 805 58
Thomas 5 725 65 42
Alice 8 832 821 815 811 78 71 59
Peter 4 64
Jenny 7 812 795 822 72 63
Jason 7 790 785 801 58 63
Mao 7 635 697 45
Marry 8 846 777 51 58
Cube 8 545 657 732 39 47
Bill 7 753 745 44 49
Gary 8 897 902 87 91 96
Han 8 801 786 759 70 54
Mean of CRCTs for grade 8: (832+545+897+801)/4 ≈ 769
Mean of CRCTs for grade 7: (750+821+…+753+902)/8 = 788
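The means above suggest that a missing score is filled with the average of the observed scores in the same column. A minimal sketch of that idea follows, assuming plain column-mean imputation; the function name is hypothetical, and the slides do not say whether any further conditioning is applied.

```python
def impute_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# The four observed grade 8 CRCT scores from the slide, plus one missing entry.
crct_g8 = [832, 545, 897, 801, None]
filled = impute_with_mean(crct_g8)
print(filled[-1])  # 768.75, which the slide rounds to 769
```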
Experiments
Risk Prediction
Dataset
• The ITBS data set contains 58,361 samples: 15.3% positive (at-risk) and 84.7% negative (non-risk).
• The CRCT data set contains 43,036 students: 10.7% positive (at-risk) and 89.3% negative (non-risk).
• Used 5-fold cross-validation
• Used SPSS or Weka
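For readers unfamiliar with the procedure, 5-fold cross-validation splits the samples into five parts and uses each part once for testing while training on the other four. The sketch below shows the index bookkeeping in plain Python; it is not how SPSS or Weka implement it, the function name is hypothetical, and the split is not stratified by class, which is a simplification given the imbalance reported above.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# 58361 is the ITBS sample count reported on the slide.
splits = list(k_fold_indices(58361, k=5))
print([len(test) for _, test in splits])
```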
Performance
Early Prediction of the Risk
Conclusion
• The result showed that a student’s risk of poor performance can be predicted with reasonable accuracy.