Copyright © 2010 SAS Institute Inc. All rights reserved.
DASI: Analytics in Practice and Academic Analytics Preparation
Mia Stephens – [email protected]
Background
§ TQM Coordinator/Six Sigma MBB
§ Founding Partner, Statistical Trainer and Consultant, North Haven Group
§ Senior Consultant and Trainer, George Group (now Accenture)
§ Adjunct Professor Statistics, University of New Hampshire
§ Academic Ambassador, JMP Academic Team
§ Author
What is Analytics?
“…an encompassing and multidimensional field that uses mathematics, statistics, predictive modeling and machine-learning techniques to find meaningful patterns and knowledge in recorded data.”
§ Descriptive/explanatory statistics. Understand what happened in the past and why.
§ Predictive analytics. Using past data and predictive algorithms to determine what will happen next.
§ Prescriptive analytics. Answering the question of what to do, by providing information on optimal decisions based on the predicted future scenarios.
From SAS Analytics: http://www.sas.com/en_us/insights/analytics/what-is-analytics.html
Analytics Frameworks
§ SEMMA (SAS)
» Sample
» Explore
» Modify
» Model
» Assess
§ CRISP-DM (Cross Industry Standard Process for Data Mining)
» Business Understanding
» Data Understanding
» Data Preparation
» Modeling
» Evaluation
» Deployment
The Business Analytics Process
[Figure: Business Problem and the Business Analytics Process]
§ Define the Problem
§ Prepare for Modeling
§ Modeling
§ Deploy Model
§ Monitor Performance
May loop back at any step.
From Building Better Models with JMP Pro, Grayson, Gardner and Stephens, 2015.
The Business Analytics Process
The focus of most courses?
What do we like to teach?
Modeling: Activities and Tools
Key Activities:
• Choose the appropriate modeling method or methods
• Fit one or more models
• Evaluate the performance of each model using validation statistics (misclassification, RMSE, Rsquare)
• Choose the best model or set of models to address the analytics problem (and ultimately the business problem)
• **Create ensemble models
Key Tools:
• Multiple Regression
• Logistic Regression
• Naïve Bayes
• kNN
• Classification and Regression Trees
• Bootstrap Forests and Boosted Trees
• Neural Networks
• Generalized Linear Models
• Survival Models
• Forecasting/Time Series
• Model Comparison
• Text Mining
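The validation statistics named on this slide (misclassification, RMSE, Rsquare) can be sketched in a few lines. This is a minimal illustration in plain Python, not how any particular package computes them; the function names and the sample values are hypothetical, and each function assumes it receives actual and predicted values for the validation rows only.

```python
import math


def misclassification_rate(actual, predicted):
    """Fraction of rows whose predicted class differs from the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def rmse(actual, predicted):
    """Root mean squared error for a continuous response."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_square(actual, predicted):
    """1 - SSE/SST, where SST is measured around the mean of the actual values."""
    mean_actual = sum(actual) / len(actual)
    ss_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_error / ss_total


# Made-up validation data, for illustration only.
actual_class = ["yes", "no", "yes", "no"]
predicted_class = ["yes", "no", "no", "no"]
actual_value = [3.0, 5.0, 7.0, 9.0]
predicted_value = [2.0, 5.0, 8.0, 9.0]

print(misclassification_rate(actual_class, predicted_class))
print(rmse(actual_value, predicted_value))
print(r_square(actual_value, predicted_value))
```

Computing the same statistics for each candidate model on the same validation rows is what makes the model comparison step meaningful.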
The Business Analytics Process
What is the most time-consuming step?
Data Preparation: Activities and Tools
Key Activities:
• Determine which data are needed
• Compile (or collect new) data
• Explore, examine and understand data
• Assess data quality
• Clean and transform data
• Define features
• Reduce dimensionality
• Create training, validation and test sets
Key Tools:
• SQL, data import
• Data table structuring - join, concatenate, update, stack, summarize,…
• Summary statistics and graphical displays, interactive tools and filtering
• Multivariate procedures (clustering, PCA,…)
• Transformations, creating derived variables, recoding, binning
• Addressing missing data and outliers
• Creating holdout set(s)
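Creating training, validation and test sets can be illustrated with a small sketch using only the Python standard library. The function name `split_rows`, the 60/20/20 proportions and the fixed seed are assumptions chosen for illustration; the slides do not prescribe a split ratio or a tool.

```python
import random


def split_rows(rows, train_frac=0.6, valid_frac=0.2, seed=1):
    """Randomly partition rows into training, validation and test sets.

    The test set receives whatever remains after the first two sets are
    filled. The proportions and seed are illustrative defaults only.
    """
    if not (0 < train_frac < 1 and 0 <= valid_frac < 1):
        raise ValueError("fractions must lie between 0 and 1")
    if train_frac + valid_frac >= 1:
        raise ValueError("fractions must leave room for a test set")

    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible

    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Row identifiers stand in for the rows of a data table.
train, valid, test = split_rows(list(range(100)))
print(len(train), len(valid), len(test))
```

Fixing the seed keeps the holdout rows the same across model fits, so every candidate model is judged on identical data.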
The Business Analytics Process
Most neglected topics – both academically and in practice?
Define the Problem
§ Understand the business problem(s) and objectives
§ ROI
§ Frame the analytics problem and objectives
§ Define project goals
§ Develop a project plan
§ Obtain resources, buy-in, approvals
Deploy Models and Monitor Performance
• Deliver the model and model results to the business or internal customers
• Communicate results (graphs and profilers, summaries, explore “what if” scenarios)
• Assist in applying model insights and implementing ongoing use of the model
• Document the project
• Close out the project
• Evaluate and quantify improvement
• Revise model
• Identify additional opportunities/problems
Best Practices from Top Programs
How to provide context and real-world experience?
§ Focus on application to real business problems
§ Method-specific case studies
§ Capstone projects
§ Work in cross-functional teams
§ Industry partnerships and projects
§ Internships
Best Practices from Top Programs
Analytics is different from statistics
§ Not statistics programs/courses simply rebranded or renamed
§ Heavy in application versus theory, equations and computation
§ Start from the ground up in designing programs/courses
Best Practices from Top Programs
Predictive analytics is different from descriptive or explanatory modeling
§ De-emphasize:
» hand-computation
» p-values
» rigorous adherence to meeting assumptions
» measures of goodness-of-fit
§ Emphasize: model accuracy and predictive ability on holdout sample
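Why holdout accuracy matters more than fit on the training data can be shown with a toy comparison. The sketch below is an assumption-laden illustration, not an example from the presentation: the data are made up, and `fit_line` (least-squares line) and `fit_memorizer` (a 1-nearest-neighbor lookup) are hypothetical helper names.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_line(xs, ys):
    """Ordinary least-squares line; returns a function x -> predicted y."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """1-nearest-neighbor model: returns the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


# Made-up data: roughly y = 2x plus noise.
train_x = [0, 2, 4, 6, 8]
train_y = [1, 3, 9, 11, 17]
holdout_x = [1.5, 3.5, 5.5, 7.5]
holdout_y = [3.5, 6.5, 11.5, 14.5]

results = {}
for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    train_rmse = rmse(train_y, [model(x) for x in train_x])
    holdout_rmse = rmse(holdout_y, [model(x) for x in holdout_x])
    results[name] = (train_rmse, holdout_rmse)
    print(f"{name:10s} training RMSE = {train_rmse:.2f}, holdout RMSE = {holdout_rmse:.2f}")
```

On these made-up numbers the memorizer reproduces the training data exactly (training RMSE of 0) yet has a holdout RMSE of about 1.80, while the line fits the training data less closely (about 0.98) but predicts the holdout rows better (about 0.54).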
Feedback from Industry
Most important skill - data understanding and intuition
§ More important than understanding of theories and equations
Students need to be able to:
§ think with data
§ know which methods to apply and when
§ understand and communicate the story in the data
Feedback from Industry
Learning software/programming language is secondary to learning the concepts and methods
§ The end goal is not to teach statistical software or programming
§ “We can teach the software we use, we can’t teach data intuition”
§ Software should facilitate learning
§ Toolkit – use the tools that best support the learning objectives (exploration of concepts, development of models, deployment, communication,…)