Predict student behavior to increase retention
Online seminar presented by:
Jing Luan, Ph.D., Cabrillo College
Bob Valencic, SPSS Inc.
August 22, 2002
Seminar agenda
Business issues in higher education
How to predict student behavior and increase retention?
Data mining concepts
Data mining methods
Case studies
Getting started on data mining
Q&A
Higher education business issues
Institutional effectiveness
Student learning outcome assessment
Enrollment management
Achieving optimum attraction, retention and persistence goals
Marketing
Increasing competition for students
Alumni
How can data mining help?
Institutional effectiveness
Which students make greatest use of institutional services?
What courses provide high full-time equivalent students (FTES) and allow better use of space?
What are the patterns in course taking? What courses tend to be taken as a group?
Getting to know your students
Enrollment management
Who are our best students?
Where do our students come from?
Who is most likely to return for another semester?
Who is most likely to fail or drop out?
Helping your students succeed
Marketing
Who is most likely to respond to our new campaign?
Which type of marketing/recruiting works best?
Where should we focus our advertising and recruiting?
Making the best use of tight budgets
Alumni
What are the different types/groups of alumni?
Who is likely to pledge, for how much, and when?
Where and on whom should we focus our fundraising drives?
Continuing the relationship
Our focus today: Predicting student behavior
Acquiring new students
Retaining students
Increasing persistence to and beyond graduation
Data mining defined
“The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories and by using pattern recognition technologies as well as statistical and mathematical techniques.”
The Gartner Group
Another definition
“Simply put, data mining is used to discover patterns and relationships in your data in order to help you make better business decisions.”
Robert Small, Two Crows
CRISP-DM
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Two types of data mining
Supervised
Purpose: For classification and estimation
Algorithms: C5.0, C&RT, Neural Network, etc.
Unsupervised
Purpose: For clustering and association
Algorithms: Kohonen, K-means, TwoStep, GRI, etc.
Algorithm vs. model
Algorithm: A technical term describing a specific mathematically driven data mining function
Model: A set of representative rules, behaviors or characteristics against which data are analyzed to find similarities
Neural networks
Synonymous with Machine Learning
Identifies complex relations
Somewhat difficult to interpret
Long computation times
[Figure: network diagram showing the input layer, hidden layer, and output]
Decision trees
[Figure: example decision tree for credit ranking (1=default). Root node: 323 cases, 52.01% Bad (168) and 47.99% Good (155). Splits on Paid Weekly/Monthly (Chi-square=179.6665, df=1, P-value=0.0000), then on Age Categorical and Social Class.]
Easy to interpret
- income < $40K
  – job > 5 yrs then yes
  – job < 5 yrs then no
- income > $40K
  – high debt then no
  – low debt then yes
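The rules above read directly as nested conditions, which is why decision trees are easy to interpret. A minimal sketch follows. The function name `decide` and its parameter names are hypothetical, and the slide does not say what happens at exactly $40K or exactly 5 years, so the handling of those boundary cases is an assumption.

```python
def decide(income: float, years_on_job: float, high_debt: bool) -> str:
    """Apply the example rules from the slide and return "yes" or "no"."""
    if income < 40_000:
        # Lower-income branch: the decision depends on job tenure.
        # Assumption: exactly 5 years is treated as "no".
        return "yes" if years_on_job > 5 else "no"
    # Higher-income branch (assumption: exactly $40K lands here):
    # the decision depends on debt level.
    return "no" if high_debt else "yes"


print(decide(30_000, 6, high_debt=False))
print(decide(50_000, 1, high_debt=True))
```

Each path from the root to a leaf is one rule, so a reader can trace any single decision by hand.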
Apriori
Discovers events that occur together
Often called ‘market basket’ analysis
Example – What groups of classes do certain students take in the same semester that may impact facilities and course scheduling?
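The course-taking question can be illustrated by counting how often pairs of classes appear in the same schedule. This is only the pair-counting step, not the full Apriori algorithm (which also builds and prunes larger item sets), and the course names, `pair_support` function, and threshold are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(schedules, min_support):
    """Return course pairs whose share of schedules is at least min_support."""
    counts = Counter()
    for courses in schedules:
        # Count each unordered pair once per schedule.
        for pair in combinations(sorted(set(courses)), 2):
            counts[pair] += 1
    total = len(schedules)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


# Hypothetical one-semester schedules, one list per student.
schedules = [
    ["MATH 1", "ENGL 1", "BIO 1"],
    ["MATH 1", "ENGL 1"],
    ["ENGL 1", "BIO 1"],
    ["MATH 1", "ENGL 1", "HIST 1"],
]
frequent = pair_support(schedules, min_support=0.5)
print(frequent)
```

Pairs with high support are candidates for coordinated room and time-slot planning.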
Kohonen network
Seeks to describe dataset in terms of natural clusters of cases
Example – identify similar groups of students
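To show what "natural clusters" means, here is a small sketch that groups students by units taken. It uses plain k-means rather than a Kohonen network, because k-means is shorter to write out; both are unsupervised clustering methods, but they are not the same algorithm. The data and starting centers are invented.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster numbers around the given starting centers (plain k-means)."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical units taken per student in one semester.
units = [3, 4, 6, 12, 15, 16]
centers, groups = kmeans_1d(units, centers=[0, 20])
print(groups)
```

On this toy data the two groups correspond roughly to part-time and full-time course loads, a split nobody labeled in advance.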
Predicting student persistence
Case study using Clementine®
Examining data
Clustering using TwoStep
Building models for persistence in streams
A node is being executed (notice the red arrows denoting the flow of data).
Seeing the work of neural thinking
Graphic display showing an ANN learning the data.
Results of neural node
These are the outputs of the Neural Networks. Overall accuracy and significance of features (left). Predicted number of policies using fresh data vs. known data (above).
Examining C5.0
The control panel of the C5.0 node (Expert)
Results of C5.0 node
View the prediction by individual records (PNXT vs. $C-PNXT).
View the overall prediction accuracy.
Comparing C&RT and C5.0
Use the Analysis node to examine the difference in accuracy for C&RT and C5.0.
Which one is better: C&RT or C5.0?
C5.0 has an accuracy rate of 66.3% and C&RT 63.7%.
They agree 72% of the time.
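The two figures on this slide measure different things: accuracy compares each model with the known outcome, while agreement compares the two models with each other. A minimal sketch on invented data (not the seminar's data set, and not how the Analysis node is implemented):

```python
def accuracy(predicted, actual):
    """Share of records where the prediction matches the known outcome."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def agreement(first, second):
    """Share of records where two models make the same prediction."""
    return sum(x == y for x, y in zip(first, second)) / len(first)


# Hypothetical outcomes and predictions for five students.
actual = ["stay", "stay", "leave", "stay", "leave"]
model_a = ["stay", "leave", "leave", "stay", "stay"]
model_b = ["stay", "stay", "leave", "leave", "stay"]

print(accuracy(model_a, actual), accuracy(model_b, actual), agreement(model_a, model_b))
```

Two models can have similar accuracy and still disagree on many individual records, which is why both numbers are worth reporting.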
Visualizing Results
Scoring new data
Moment of truth. The most powerful feature of data mining is using learned “rules” to predict (score) fresh data for business purposes. Shown here is the switch to a fresh data set that Clementine has not seen before.
Using models to score new data
[Figure: Model Results and Scored Results]
Additional case study
How best to identify future transfer students so the college can groom them?
What can a community college do to increase transfer rates?
Using decision tree models, the top rule for successful transfers was: taking more than 12 units, taking fewer than 5 non-transfer courses, and taking at least one math course.
Predicting the behavior of transfer students
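The top rule can be applied to new student records to flag likely transfers. A minimal sketch follows; the field names and records are hypothetical, and the slide does not say whether the units are counted per term or cumulatively, so that is left to the data.

```python
def matches_top_transfer_rule(units, non_transfer_courses, math_courses):
    """True when a record satisfies all three conditions of the top rule."""
    return units > 12 and non_transfer_courses < 5 and math_courses >= 1


# Hypothetical student records to score.
students = [
    {"id": "A", "units": 15, "non_transfer_courses": 2, "math_courses": 1},
    {"id": "B", "units": 9, "non_transfer_courses": 1, "math_courses": 1},
    {"id": "C", "units": 14, "non_transfer_courses": 6, "math_courses": 2},
    {"id": "D", "units": 13, "non_transfer_courses": 0, "math_courses": 0},
]
flagged = [
    s["id"]
    for s in students
    if matches_top_transfer_rule(s["units"], s["non_transfer_courses"], s["math_courses"])
]
print(flagged)
```

Students who miss only one condition, such as the math course, may be the ones where advising has the most effect.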
Getting started
Company stability and customer feedback
User interface
Scalability
Server/Client
Modeling capacities
Learning curve
Join a listserv, such as CLUG
Cost
Evaluate data mining software
Getting started
Determine business needs
Determine technology infrastructure and management support
Identify mining area and business problems
Determine data source(s)
Invite an expert to jump start
Pilot test mining results
CRISP-DM and Real-time data mining, Knowledge Discovery in Databases (KDD)
Develop a data mining plan for your institution
Want to Learn More?
Full training course descriptions at:
www.spss.com/training
Contact us or one of our other data mining experts by calling 800-543-5815.
Check out the Knowledge Management/Data Mining Discussion Group:
http://www.kdl1.com/kmdm
Obtain the book, “Knowledge Management – Building A Competitive Advantage in Higher Education,” published by Jossey-Bass:
http://josseybass.com/cda/product/0,,0787962910,00.html
Bob Valencic [email protected]
Jing Luan [email protected]
Thank you!
Predict student behavior to increase retention
2nd Annual Public Sector Roadshow
October 15 in Washington, D.C.
www.spss.com/psroadshow