Date posted: 22-Mar-2017
Data Science Project Lifecycle
Jason Geng @Data Application Lab
Miya Du @Data Science Association
Business Requirements
Data Acquisition
Data Preparation
Hypothesis & Modeling
Evaluation & Interpretation
Deployment
Operations
Optimization
Business Requirements
• Data scientists need to work with business stakeholders and domain experts to understand both the data and the business
• Specify the business requirements
• For instance, with healthcare data:
Understand the data:
e.g. ‘DISCWT’: ‘This is the discharge-level weight on the HCUP nationwide data to produce national estimates’
Understand the business:
Goal: Predict readmission rate
Database:
Healthcare: Readmissions Database
Modeling
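To illustrate why understanding a field such as DISCWT matters for the stated goal, here is a minimal sketch of a weighted versus unweighted readmission rate. Only the name `DISCWT` comes from the slide. The `readmitted` flag, the record values, and the function name are assumptions made up for illustration, not the actual database schema.

```python
# Hypothetical sample of discharge records. Only the DISCWT name comes from
# the slide; the `readmitted` flag and all values are illustrative.
discharges = [
    {"DISCWT": 2.0, "readmitted": 1},
    {"DISCWT": 1.0, "readmitted": 0},
    {"DISCWT": 1.0, "readmitted": 0},
]


def weighted_rate(records, weight_key="DISCWT", flag_key="readmitted"):
    """Return the weighted share of records where the flag is set."""
    total_weight = sum(r[weight_key] for r in records)
    if total_weight == 0:
        raise ValueError("total weight is zero")
    flagged_weight = sum(r[weight_key] * r[flag_key] for r in records)
    return flagged_weight / total_weight


# Each record counted once versus each record scaled by its weight.
unweighted = sum(r["readmitted"] for r in discharges) / len(discharges)
weighted = weighted_rate(discharges)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

The two numbers differ whenever the weights are not uniform, which is one reason the business goal cannot be pursued without first understanding the data dictionary.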
Data Collection
• Data from the product line
• Purchase third-party data
• Social media (Facebook, LinkedIn)
• Web crawling
• Open source data (Opendata, U.S. Census Data)
Challenge: data storage and data management
[Diagram labels: Legacy data, OLTP, Web Log, Web Crawler, Open Source, Third Party Data, Social Media Data; formats: XML, CSV, LOG, SQL, …; Product Line, Business Intelligence, Data Science App]
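Part of the data management challenge is that sources arrive in different formats such as CSV and XML. The sketch below, using only the Python standard library, shows one possible way to bring two formats into a common record shape. The payloads, field names (`user_id`, `event`), and function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical payloads standing in for two differently formatted sources.
CSV_TEXT = "user_id,event\n1,login\n2,purchase\n"
XML_TEXT = "<events><event user_id='3'>logout</event></events>"


def records_from_csv(text):
    """Yield one dict per CSV row, tagged with its source format."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"user_id": int(row["user_id"]), "event": row["event"], "source": "csv"}


def records_from_xml(text):
    """Yield one dict per <event> element, tagged with its source format."""
    root = ET.fromstring(text)
    for node in root.iter("event"):
        yield {"user_id": int(node.get("user_id")), "event": node.text, "source": "xml"}


# Both readers emit the same keys, so downstream code sees one schema.
records = list(records_from_csv(CSV_TEXT)) + list(records_from_xml(XML_TEXT))
for record in records:
    print(record)
```

Keeping a `source` tag on every record is a design choice that makes later debugging of integration problems easier.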
Data Preparation (Data Wrangling)
• Cleaning data (semantic errors, missing entries, or inconsistent formatting)
• Challenge: data integration
• 80% of the time in the project workflow
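The three cleaning problems listed above can be sketched in a few lines of standard-library Python. The rows, field names, accepted date formats, and the 0 to 120 age range are all assumptions chosen for illustration. Real rules would come from the business requirements.

```python
from datetime import datetime

# Hypothetical raw rows showing the three problems named above.
raw_rows = [
    {"name": " Alice ", "age": "34", "visit": "2017-02-16"},
    {"name": "BOB", "age": "", "visit": "16/02/2017"},      # missing age, other date format
    {"name": "carol", "age": "-5", "visit": "2017-03-22"},  # semantically invalid age
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats for this sketch


def parse_date(text):
    """Try each known format; return an ISO date string or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_row(row):
    """Normalize one row: trim names, validate age, unify date format."""
    age = int(row["age"]) if row["age"].strip() else None  # missing entry -> None
    if age is not None and not 0 <= age <= 120:             # semantic check
        age = None
    return {
        "name": row["name"].strip().title(),                # consistent formatting
        "age": age,
        "visit": parse_date(row["visit"]),
    }


cleaned = [clean_row(r) for r in raw_rows]
for row in cleaned:
    print(row)
```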
[Diagram: Data Source A, Data Source B → ETL → Data Warehouse]
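The ETL flow into a data warehouse can be sketched as follows. An in-memory SQLite database stands in for the warehouse, and the two sources, their field names, and the `orders` table are hypothetical. This is one simple way to show the extract, transform, load steps, not a production design.

```python
import sqlite3

# Extract: hypothetical rows; the two sources name the same fields differently.
source_a = [{"cust_id": 1, "total": "19.99"}, {"cust_id": 2, "total": "5.00"}]
source_b = [{"customerId": 2, "amount_cents": 1250}]


def transform_a(row):
    """Map a source A row to (customer_id, amount_cents, source)."""
    return (row["cust_id"], round(float(row["total"]) * 100), "A")


def transform_b(row):
    """Map a source B row to (customer_id, amount_cents, source)."""
    return (row["customerId"], row["amount_cents"], "B")


# Load: an in-memory SQLite database stands in for the data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (customer_id INTEGER, amount_cents INTEGER, source TEXT)"
)
rows = [transform_a(r) for r in source_a] + [transform_b(r) for r in source_b]
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
warehouse.commit()

# Once integrated, one query can span both sources.
totals = warehouse.execute(
    "SELECT customer_id, SUM(amount_cents) FROM orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(totals)
```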
Feature Engineering
Select or create features
Research feature relevance
Experiment and validate
Change the feature set
Go back to the feature selection step
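One simple way to research feature relevance is to rank candidate features by the absolute Pearson correlation with the target and keep those above a threshold. The slides do not name a relevance measure, so correlation is swapped in here purely for illustration. The feature names, data values, and the 0.5 threshold are hypothetical.

```python
import math

# Hypothetical dataset: each feature is a column; `target` is what we predict.
features = {
    "length_of_stay": [1, 2, 3, 4, 5, 6],
    "num_prior_visits": [0, 1, 0, 2, 3, 3],
    "random_noise": [5, 1, 4, 1, 5, 2],
}
target = [0, 0, 0, 1, 1, 1]


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Rank features from most to least relevant by absolute correlation.
ranked = sorted(
    ((name, abs(pearson(values, target))) for name, values in features.items()),
    key=lambda item: item[1],
    reverse=True,
)

THRESHOLD = 0.5  # assumed cut-off; revisit after each validation round
selected = [name for name, score in ranked if score >= THRESHOLD]
print(ranked)
print(selected)
```

After validating a model on `selected`, the threshold or the candidate list would be changed and the ranking rerun, which mirrors the loop back to the selection step.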
Modeling
Reference Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/
Deploy to Product Line
Thank you!
https://www.DataAppLab.com
Feb 2017
PPT: Xiaolu Zhao @ Feb 16, 2017