Transitioning from Traditional DW to Spark in OR Predictive Modeling
Ayad Shammout and Denny Lee | October 21, 2015
About Ayad Shammout
• Director of Business Intelligence, Beth Israel Deaconess Medical Center
• Helped build Business Intelligence, highly available / disaster recovery infrastructure for BIDMC
About Denny Lee
• Technology Evangelist, Databricks
• Former Sr. Director of Data Sciences Eng, Concur
• Helped bring Hadoop onto Windows and Azure
We are Databricks, the company behind Spark
• Founded by the creators of Apache Spark in 2013
• 75% share of Spark code contributed by Databricks in 2014
• Created Databricks on top of Spark to make big data simple
Why is Operating Room Scheduling Predictive Modeling Important?
$15-$20 per minute for a basic surgical procedure
Time is an OR's most valuable resource
Lack of OR availability means loss of patients
OR efficiency differs depending on the OR staffing and allocation (8, 10, 13, or 16 h), not the workload (i.e. cases)
“You are not going to get the elephant to shrink or change its size. You need to face the fact that the elephant is 8 OR tall and 11 hr wide”
Steven Shafer, MD
Operating Room: better utilization =
• Better profit margins
• Reduced support and maintenance costs

Medical Staff: better utilization =
• Better profit margins
• Better medical staff efficiencies = better outcomes

Patients:
• Shorter wait times and fewer cancellations
• Better medical staff efficiencies = better outcomes
Develop Predictive Model
• Develop a predictive model that would identify available OR time 15 business days in advance.
• Allow us to confirm wait list cases two weeks in advance, instead of when the blocks normally release four days out.
Forecast OR Schedule
• Case load 15 business days in advance
• Book more cases weeks in advance to prevent under-utilization
• Reduce staff overtime and idle time
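Both goals hinge on business-day arithmetic ("15 business days in advance"). A minimal sketch of that calculation, assuming weekends are the only non-working days (a real scheduler would also skip holidays); `add_business_days` is an illustrative helper, not code from the talk:

```python
from datetime import date, timedelta

def add_business_days(start: date, days: int) -> date:
    """Advance `start` by `days` business days, skipping weekends.
    Holidays are ignored here for simplicity."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return current

# 15 business days from a Monday is exactly three weeks later.
print(add_business_days(date(2015, 10, 19), 15))  # 2015-11-09
```

So a forecast produced today tells the schedulers which date's OR time they can start filling from the wait list.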
Background
• Three surgical groups:
  • GYN, urology, general surgery, colorectal, surgical oncology
  • Eyes, plastics, ENT
  • Orthopedics, podiatry
• Currently built using SQL Server Data Mining
Using Traditional Data Warehousing Techniques
Data Sources → OR DW → SSAS Data Mining → OR Reports

Traditional Data Warehousing & Data Mining OR Predictive Model
• Data inserts into the OR DW every 3 hours
• Mining model processed every 3 hours
• Prediction results written to the OR Prediction DB
Original Design
• Multiple data sources pushing data into SQL Server and SQL Server Analysis Server Data Mining
• Hand built 225 different DM modules (5 days, 15 business days ahead, 3 different groups)
• Pipeline process had to run 225 times / day (3 pools x 75 modules)
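The 225 figure follows directly from the cross product of the three dimensions above; a quick sanity check (the pool and group labels here are placeholders, not the actual names):

```python
from itertools import product

# 5 weekdays x 15 look-ahead business days x 3 surgical groups = 225 models.
weekdays = range(5)
horizons = range(1, 16)                    # 1..15 business days ahead
groups = ["pool_a", "pool_b", "pool_c"]    # placeholder names for the 3 pools

models = list(product(weekdays, horizons, groups))
print(len(models))  # 225

# Equivalently, the daily pipeline ran 3 pools x 75 modules = 225 times.
assert 3 * 75 == len(models)
```

Hand-maintaining that many mining modules is the scaling pain the Spark rewrite removes.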
Regression Calculations
SSAS Data Mining / T-SQL Code
• Intercept / R²
• Mean / Adjusted R²
• Coefficients / Standard Deviation
• Variance / Standard Error
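The slide pairs the statistics SSAS Data Mining emits with the ones recomputed in hand-written T-SQL. As a rough illustration only (not the team's actual T-SQL), a plain-Python sketch that computes the same eight statistics for a one-variable least-squares fit; `simple_ols` is a made-up name:

```python
import math

def simple_ols(xs, ys):
    """Ordinary least squares for one predictor, returning the statistics
    the slide lists: intercept, coefficient, mean, variance, R^2,
    adjusted R^2, standard deviation, and (residual) standard error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(r * r for r in residuals)
    sst = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - sse / sst
    k = 1  # one predictor
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    variance = sst / (n - 1)                 # sample variance of y
    std_dev = math.sqrt(variance)
    std_err = math.sqrt(sse / (n - k - 1))   # residual standard error
    return {"intercept": intercept, "coefficient": slope, "mean": mean_y,
            "variance": variance, "std_dev": std_dev,
            "r2": r2, "adj_r2": adj_r2, "std_err": std_err}

# Perfectly linear toy data y = 2x + 1, so R^2 = 1 and residual error = 0.
stats = simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(stats["coefficient"], stats["intercept"], stats["r2"])  # 2.0 1.0 1.0
```

In Spark the same quantities come from MLlib's model summaries instead of bespoke SQL.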
Taking advantage of Spark’s DW Capabilities and MLlib
Data Sources → OR DW → OR Reports

OR Predictive Model in Spark
• Data inserts every 3 hours
Demo: OR Block Scheduling
Extract history data and run linear regression with SGD over multiple variables
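The demo used MLlib's SGD-based linear regression over multiple variables. As a language-agnostic illustration of that technique (plain Python rather than the Spark MLlib API; `sgd_linear_regression` and the toy data are invented for this sketch):

```python
import random

def sgd_linear_regression(X, y, lr=0.02, epochs=2000, seed=0):
    """Fit y ~ w . x + b by stochastic gradient descent on squared error.
    Plain-Python stand-in for an MLlib LinearRegressionWithSGD-style fit."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)          # visit samples in random order
        for i in order:
            pred = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            err = pred - y[i]
            b -= lr * err           # gradient step on the bias
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
    return w, b

# Two-variable toy data generated from y = 1*x1 + 2*x2.
X = [[1, 1], [2, 1], [1, 2], [2, 2], [3, 1]]
y = [3, 4, 5, 6, 5]
w, b = sgd_linear_regression(X, y)
print([round(wi, 2) for wi in w])  # w should be close to [1.0, 2.0]
```

In the real pipeline the features are OR-history variables (day, group, horizon) rather than toy numbers.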
OR Schedule Report (example)
Why the model is working
• Can coordinate waitlist scheduling logistics with physicians and patients within two weeks of the surgery
• Plan staff scheduling and resources so there are fewer last-minute staffing issues for nursing and anesthesia
• Utilization metrics are showing us where we can maximize our elective surgical schedule and level demand
Key Learnings when Migrating from Traditional DW to Spark
Transitioning to the Cloud
Beth Israel Deaconess Medical Center is increasingly moving to cloud infrastructure services with the hopes of closing its data center when the hospital's lease is up in the next five years. CIO John Halamka says he's decommissioning HP and Dell servers as he moves more of his compute workloads to Amazon Web Services, where he's currently using 30 virtual machines to test and develop new applications. "It is no longer cost effective to deal with server hosting ourselves because our challenge isn't real estate, it's power and cooling," he says.
Transitioning to the Cloud
• Need time for engineers, analysts, and data scientists to learn how to build for the cloud
• Build for security right from start – process heavy, a lot of documentation, audits / reviews
• Differentiating data engineers and engineers (REST APIs, services, elasticity, etc.)
Transitioning to Spark
• No more stored procedures or indexes
  • Good for Spark SQL, services design
• Prototype, prototype, prototype
• Leverage existing languages and skill sets
• Leverage the MOOCs and other Spark training
• Break down the silos of data engineers, engineers, data scientists, and analysts
Transitioning DW to Spark
• Understand partitioning, broadcast joins, and Parquet
• Not all Hive functions are available in Spark (99% of the time that is okay) due to the Hive context
• Don't limit yourself to building star schemas / snowflake schemas
• Expand outside of traditional DW: machine learning, streaming
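Of the concepts listed above, the broadcast join is the one most worth internalizing for DW developers: Spark copies a small dimension table to every executor so the large fact table never has to be shuffled. A concept sketch in plain Python (deliberately not the Spark API; the data and function name are invented):

```python
# Concept sketch of a broadcast (map-side) hash join.
# Spark does the analogue when a small table is broadcast to every
# executor, so each partition of the big table joins locally.

def broadcast_join(fact_partitions, dim_table, fact_key, dim_key):
    """Join each partition of a large table against a small table that
    has been 'broadcast' (copied) to every worker."""
    # Build the hash map once, as each executor would after receiving
    # the broadcast variable.
    lookup = {row[dim_key]: row for row in dim_table}
    joined = []
    for partition in fact_partitions:        # partitions are independent
        for row in partition:
            match = lookup.get(row[fact_key])
            if match is not None:            # inner-join semantics
                joined.append({**row, **match})
    return joined

# Toy data: OR cases (fact) joined to surgical groups (dimension).
cases = [[{"case_id": 1, "group_id": "A"}, {"case_id": 2, "group_id": "B"}],
         [{"case_id": 3, "group_id": "A"}]]
groups = [{"group_id": "A", "group": "ortho"},
          {"group_id": "B", "group": "GYN"}]
rows = broadcast_join(cases, groups, "group_id", "group_id")
print(len(rows))  # 3
```

Because no partition of the fact table ever moves, this avoids the shuffle that a regular join would trigger.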
Thank you.
For more information, please contact [email protected] (BIDMC) or [email protected] (Databricks).