Big data analytics

Post on 23-Dec-2014


Translational Bioinformatics (TBI)

Sushil K. Meher, MCA (NIT, RKL), MBA (Hospital Management), M.Phil (CS), (Ph.D (eHealth))

Computer Facility, All India Institute of Medical Sciences, New Delhi

Big Data Analytics in Health Care

The Cycles of Innovation

Innovation in Health Industry

“Keeping Afloat in a Sea of 'Big Data'” - ITBusinessEdge, 9/6/11

“Why big data is a big deal” - InfoWorld, 9/1/11

“The challenge, and opportunity, of big data” - McKinsey Quarterly, 5/11

“Getting a Handle on Big Data with Hadoop” - Businessweek, 9/7/11

“Ten reasons why Big Data will change the travel industry” - Tnooz, 8/15/11

“The promise of Big Data in Health Care” - Intelligent Utility, 8/28/11

Big Data Buzz

Our Journey To The Cloud/Big Data

OLTP: Online Transaction Processing (DBMSs)
OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & Technology)
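To make the OLTP/OLAP distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `admissions` table, its columns, and the sample rows are hypothetical and not from the slides. RTAP is not shown, since it needs a streaming system rather than a single database.

```python
import sqlite3

# In-memory database; the table and its columns are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admissions (id INTEGER PRIMARY KEY, ward TEXT, los_days INTEGER)"
)

# OLTP style: many small transactions, each touching one or a few rows.
with conn:
    conn.execute("INSERT INTO admissions (ward, los_days) VALUES (?, ?)", ("cardiology", 4))
    conn.execute("INSERT INTO admissions (ward, los_days) VALUES (?, ?)", ("cardiology", 7))
    conn.execute("INSERT INTO admissions (ward, los_days) VALUES (?, ?)", ("oncology", 10))
with conn:
    # Correct the length of stay for a single admission.
    conn.execute("UPDATE admissions SET los_days = ? WHERE id = ?", (5, 1))

# OLAP style: one query that scans and aggregates across all rows.
summary = conn.execute(
    "SELECT ward, COUNT(*), AVG(los_days) FROM admissions GROUP BY ward ORDER BY ward"
).fetchall()

for ward, n_admissions, avg_los in summary:
    print(f"{ward}: {n_admissions} admissions, mean stay {avg_los:.1f} days")
```

In practice the two workloads usually run on separate systems (an operational DBMS and a data warehouse), which is the split the three acronyms describe.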

So What is Big Data?

Big Data refers to datasets that grow so large that they are difficult to capture, store, manage, share, analyze and visualize with typical database software tools.

How much is Big?
It is not a single number but a set of parameters


“Big Data Is Less About Size, And More About Freedom” - TechCrunch

“Findings: ‘Big Data’ Is More Extreme Than Volume” - Gartner

“Big Data! It’s Real, It’s Real-time, and It’s Already Changing Your World” - IDC

“Total data: ‘bigger’ than big data” - 451 Group

THE ERA OF BIG DATA IS HERE

Big Data Analytics

The Path to Advanced Health Care

Big Data in Healthcare

VOLUME, VELOCITY, VARIETY, VERACITY

[Figure: data sources: social, blog, smart meter, health]

• In 2011 alone, 1.8 zettabytes of data were created globally. To put this into perspective, this volume of data equates to 200 billion two-hour HD movies, which one person would need 47 million years to watch in their entirety.

• U.S. health care data alone reached 150 exabytes in 2011. Five exabytes (one exabyte is 10^18 bytes) of data would contain all the words ever spoken by human beings on earth. At this rate, big data for U.S. health care will soon reach zettabyte (10^21 bytes) scale and yottabytes (10^24 bytes) not long after.

“Transforming Health Care Through Big Data: Strategies for leveraging big data in the health care industry” - Institute for Health Technology Transformation

The register.co.uk

Data Measurement Units
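As a rough check on the figures quoted above, the decimal (SI) data units can be written out and the movie comparison recomputed. This is a sketch only: the helper names `to_bytes` and `convert` are made up for illustration, and decimal (power-of-ten) units are assumed rather than binary ones.

```python
# Decimal (SI) data units, expressed in bytes. Binary units (KiB, MiB, ...) differ.
UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "PB": 10**15, "EB": 10**18, "ZB": 10**21, "YB": 10**24,
}

def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes."""
    return value * UNITS[unit]

def convert(value, src, dst):
    """Convert a quantity from one unit to another."""
    return value * UNITS[src] / UNITS[dst]

# Figures taken from the slide: 1.8 ZB created in 2011, compared to 200 billion 2-hour movies.
movies = 200e9
gb_per_movie = to_bytes(1.8, "ZB") / movies / UNITS["GB"]

# Hours of continuous viewing converted to years (365-day years assumed).
years_to_watch = movies * 2 / (24 * 365)

print(f"Implied size per movie: about {gb_per_movie:.0f} GB")
print(f"Continuous viewing time: about {years_to_watch / 1e6:.0f} million years")
print(f"150 EB is {convert(150, 'EB', 'ZB')} ZB")
```

The viewing time comes out in the mid-40-million-year range, the same order of magnitude as the 47 million years quoted; the exact figure depends on the assumptions behind the original estimate.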

The Model Has Changed… The Model of Generating/Consuming Data has Changed

Old Model: a few hospitals are generating data, all others are consuming data

New Model: all of us are generating data, and all of us are consuming data

Healthcare is Positioned to Gain from Big Data

Innovate With Big Data Analytics
Big Data Analytics Accelerate Health Care 2.0 for Evidence-based Care Provider

[Figure: quality of care (low to high) as more data is leveraged]

TRADITIONAL DATA LEVERAGED: Legacy System, Database, BI Reporting. Treatment pathways on summary data.

BIG DATA LEVERAGED: Big Data Analytics, In-Database Analytics. Treatment pathways on all the data. Delivering 10 years of data in seconds.

Data sources: international results, drug interaction predictions, individual patient history, social & economic factors, geographical factors, NIH historical data.

Association Rule Mining and User Clustering Improve Pathways

External Data Sources Enable Personalized Medicine
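The rule mining mentioned above can be illustrated with a minimal sketch that computes support and confidence for item pairs. The visit records, item names, and thresholds are hypothetical, and this brute-force pair scan stands in for a full algorithm such as Apriori; it is not the method used in the slides.

```python
from itertools import combinations

def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for item pairs that pass both thresholds."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for s in sets if itemset <= s) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)))
        if pair_support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((lhs,)))
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical care events recorded per patient visit.
visits = [
    {"metformin", "hba1c_test", "diet_counselling"},
    {"metformin", "hba1c_test"},
    {"metformin", "hba1c_test", "eye_exam"},
    {"hba1c_test", "eye_exam"},
    {"diet_counselling"},
]

rules = association_rules(visits)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Rules with high support and confidence are candidates for steps that tend to occur together in a treatment pathway; whether they are clinically meaningful still needs expert review.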

USE CASE

Big Data Key Drivers

Population Health

Patient Experience

Per Capita Cost

• New Delivery Models

• Meaningful Use

• ICD-10 / SNOMED-CT

• Better Data = Improved Outcomes

• Shift from volume-based care to value-based care

• Fraud Detection

• Cost Savings

What’s driving Big Data

- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time

Who’s Generating Big Data in Health Care

Where Does the Data Come From?

Clinical and HIM
• Structured
─ EHR
─ HIS
• Unstructured
─ Image based: PACS and radiology, EKGs, monitor data
─ Insurance card, patient photo, consent forms, orders
─ Paper-based patient information
• Semi-Structured
─ DNA, RNA, protein, genomics

Administrative
• Human Resources
– HR Management Systems
– Documents such as new hire paperwork, employee records, credentialing, etc.
• Legal
– Documents include contracts and agreements, correspondence, compliance
• Finance
– Statements
• Business Office
– Back Office

Supply Chain and Revenue Cycle
• Supply Chain
– Materials Management
– Documents such as requisitions, purchase orders, invoices, packing slips, receiving paperwork
• Revenue Cycle
– Pre-registration
– Denials Management
– Documents include EOBs, correspondence

Definition of Translational Bioinformatics (TBI)

• Development of storage, analytic, and visualization methods.

Bergman, 2010

Our Aim

Personalizing Health & Care (PHC)

1. Better understanding health, ageing & disease
2. Effective health promotion, prediction, screening and disease prevention
3. Early diagnosis (detection)
4. Innovative treatments & technologies
5. Advancing active & healthy ageing
6. Integrated, sustainable, citizen-centered care
7. Improving health information, data exploitation & knowledge translation

Approach for 4P

Basic Biomedical Research
Clinical Knowledge & Research
Population Health
Personal Health
Public Health
Translational Research
Text mining, BioPatch, CAMA, DzMap, CKD, PWAS, Drug repositioning
Reverse translational research

Data Interaction Model for Translational Bioinformatics Research

Patient Profile: age, sex, allergy, weight, height, blood type, body temperature, etc.
Diagnosis/Problem: current and/or chronic disease (dz), malignancy, pregnancy, etc.
Procedures: surgery, transfusion, endoscopy, angiogram, PTCA, rehabilitation, etc.
Medication: Fluorouracil vs Theophylline, Doxorubicin vs Methotrexate, etc.
Lab/Exam: CBC, D/C, LFT, hCG, PT, APTT, INR, etc.
Gene

Interactions between categories:
e.g. Fluorouracil vs thrombocytopenia
e.g. Warfarin vs colonoscopy
e.g. Tamoxifen vs nausea
e.g. Valproic acid vs pregnancy

YC (Jack) Li et al., 2004
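One way to picture the interaction model is as a pairwise lookup across the categories of a patient record. The sketch below is illustrative only: the pair table contains just the four examples listed above, the record layout and the `flag_interactions` name are assumptions, and nothing here is a clinical knowledge base or the cited authors' implementation.

```python
from itertools import combinations

# Pairs copied from the slide's examples; a real system would load a curated knowledge base.
KNOWN_INTERACTIONS = {
    frozenset({"fluorouracil", "thrombocytopenia"}),
    frozenset({"warfarin", "colonoscopy"}),
    frozenset({"tamoxifen", "nausea"}),
    frozenset({"valproic acid", "pregnancy"}),
}

def flag_interactions(record):
    """Return pairs of (category, entry) found together in the interaction table.

    record maps a category name (hypothetical layout) to a list of entries.
    """
    entries = [
        (category, item.lower())
        for category, items in record.items()
        for item in items
    ]
    flags = []
    # Compare every entry with every other entry, across and within categories.
    for (cat_a, a), (cat_b, b) in combinations(entries, 2):
        if frozenset({a, b}) in KNOWN_INTERACTIONS:
            flags.append(((cat_a, a), (cat_b, b)))
    return flags

# Hypothetical patient record.
patient = {
    "medication": ["Warfarin", "Tamoxifen"],
    "procedures": ["Colonoscopy"],
    "diagnosis": ["Hypertension"],
}

flags = flag_interactions(patient)
for first, second in flags:
    print(f"Check: {first[1]} ({first[0]}) vs {second[1]} ({second[0]})")
```

Storing each pair as a frozenset makes the lookup order-independent, so "warfarin vs colonoscopy" matches regardless of which entry is seen first.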

The Galaxy of Disease Map

Moving Forward

• Formulate new questions and become much more agile
• Make evidence-based decisions
• Democratize your data
• Visualize invisible knowledge

• Big data is here, now
• Data breaches
• Intrusion of privacy
• Unfair use of data

Big Data in Health Care

Big Data Technology

What Technology Do We Have for Big Data?

Hadoop, NoSQL Databases, Analytic Databases

Hadoop
• Low cost, reliable scale-out architecture
• Distributed computing
• Proven success in Fortune 500 companies
• Exploding interest

NoSQL Databases
• Huge horizontal scaling and high availability
• Highly optimized for retrieval and appending
• Types
  • Document stores
  • Key Value stores
  • Graph databases

Analytic RDBMS
• Optimized for bulk-load and fast aggregate query workloads
• Types
  • Column-oriented
  • MPP
  • In-memory
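The distributed computing model most associated with Hadoop is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) in a single Python process; it does not use Hadoop itself, and the record fields and partitions are hypothetical.

```python
from collections import defaultdict

def map_phase(record):
    """Emit one (key, 1) pair per record; here the key is the diagnosis code."""
    yield record["diagnosis"], 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine the values for one key into a single result."""
    return key, sum(values)

def run_job(partitions):
    # On a cluster each partition would be mapped on a different node;
    # this sketch simply loops over them in one process.
    mapped = [
        pair
        for partition in partitions
        for record in partition
        for pair in map_phase(record)
    ]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

# Hypothetical records split across two partitions (standing in for two data blocks).
partitions = [
    [{"patient": 1, "diagnosis": "E11"}, {"patient": 2, "diagnosis": "I10"}],
    [{"patient": 3, "diagnosis": "E11"}, {"patient": 4, "diagnosis": "E11"}],
]

counts = run_job(partitions)
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread across many machines, which is what gives the scale-out behaviour listed above.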

Major Hadoop Utilities

Apache Hive: SQL-like language and metadata repository
Apache Pig: High-level language for expressing data analysis programs
Apache HBase: The Hadoop database. Random, real-time read/write access
Sqoop: Integrating Hadoop with RDBMS
Oozie: Server-based workflow engine for Hadoop activities
Hue: Browser-based desktop interface for interacting with Hadoop
Flume: Distributed service for collecting and aggregating log and event data
Apache Whirr: Library for running Hadoop in the cloud
Apache Zookeeper: Highly reliable distributed coordination service

Thank you for your attention.

Q/A