DataEngConf SF16 - Multi-temporal Data Structures

Date post: 13-Feb-2017
Upload: hakka-labs
Clover is building a Time Machine DataEngConf Apr 7, 2016
Transcript

Clover is building a Time Machine

DataEngConf Apr 7, 2016

What is time?

Time | tīm | noun The indefinite continued progression of existence and events that occur in apparently irreversible succession from the past through the present to the future.

~ The Internet

A software engineer’s worst nightmare

Who are these people talking about time machines?!

< Jasmine

Alyssa >

Clover Health is reinventing the health insurance model by using data

Complex & Rich.

Made with love and COBOL

Footer

Not owning your data

Typical lifecycle of data

App 1459671246251

User clicks on a button 1459671246251

Life event = publish event

Clover’s lifecycle of data

Clover publishes claim data in our DWH and knows a member went to the doctor on Jan 10, 2016, as of Apr 10, 2016

Member goes to doctor (Jan 10, 2016)

Billing enters claims (Apr 1, 2016)

Transaction clearinghouse systems and third-party claims processor

Data entry human at claims processor

Our pipelines

Happy path

What did Clover really know about someone’s health event that happened on Jan 10, 2016 as of Apr 2016, May 2016, vs. Jun 2016? What did the claims processor know?

Oops, processed the claim wrong (restatement Apr 11, 2016)

Oops, there was a data entry error (restatement Jun 11, 2016)

Unreliable path

Oops, the pipeline is broken (breakage Jun 12, 2016)

Operational complexity

Members Providers Financials

Clover Data Platform

Applications, Data warehouse, Data Science models

Trying to figure out what happened

• [drawing - WHY IS THE TRASH CAN ON FIRE]

• ENTAILS: VASE, CRYING CHILD, CAT POOPING ON THE FLOOR

In order to predict what will happen

Not just about trash cans on fire

In the context at Clover Health

How do we make decisions to affect health outcomes?

Handling complexity

Temporal data structures

Current state and friends

Lossy!

Hard to analyze!

Current state batch redux

Upsert?

Replace?

Keeping the event log

Sensible!!! (kinda)

Restating history (event log)

Amnesia!
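
The trade-offs on the last few slides can be sketched in a few lines of Python. This is only an illustration, not Clover's code: the status strings and the `known_as_of` helper are made up, while the publish dates are the ones from the claim example earlier in the deck. It shows an upserted current-state table keeping only the latest value, an append-only event log still answering "what did we know then?", and a log that is restated in place forgetting what was believed before.

```python
from datetime import date

# Hypothetical claim events: (published_on, member_id, status).
events = [
    (date(2016, 4, 10), "m1", "visited doctor"),
    (date(2016, 4, 11), "m1", "claim reprocessed"),
    (date(2016, 6, 11), "m1", "data entry corrected"),
]

# Current-state table: every upsert overwrites the previous value (lossy).
current_state = {}
for published_on, member_id, status in events:
    current_state[member_id] = status

# Event log: append-only, so earlier knowledge stays queryable.
event_log = list(events)


def known_as_of(log, member_id, as_of):
    """Return the latest status published on or before `as_of`, or None."""
    matching = [e for e in log if e[1] == member_id and e[0] <= as_of]
    if not matching:
        return None
    return max(matching, key=lambda e: e[0])[2]


# Restating history by rewriting the log in place: the old beliefs are gone.
restated_log = [(p, m, "data entry corrected") for p, m, _ in event_log]
```

With this data, `known_as_of(event_log, "m1", date(2016, 5, 1))` still reports the April belief, while the same question against `restated_log` can only return the June correction.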

Time: It’s a matter of perspective

Two time dimensions

[Figure: plot with axes "Effective Time" and "Published Time"; date tick labels omitted]

Time as spatial data

Rectangles!
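
One way to read "rectangles" is that each version of a fact covers a range of effective time and a range of published time, so a question about a specific moment becomes a point-in-rectangle test. The sketch below assumes half-open ranges and an open end encoded as `None`; the `Rect` class, its field names, and the sample state are illustrative and not taken from the talk.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """One version of a fact: a rectangle in (effective, published) time.

    Ranges are half-open [begin, end); end=None means unbounded.
    """
    effective_begin: date
    effective_end: Optional[date]
    published_begin: date
    published_end: Optional[date]
    state: str


def _in_range(begin, end, point):
    return begin <= point and (end is None or point < end)


def contains(rect, effective_on, published_on):
    """True if the point (effective_on, published_on) falls inside the rectangle."""
    return (_in_range(rect.effective_begin, rect.effective_end, effective_on)
            and _in_range(rect.published_begin, rect.published_end, published_on))


# The doctor visit from the earlier slides: effective Jan 10, published Apr 10.
visit = Rect(date(2016, 1, 10), date(2016, 1, 11),
             date(2016, 4, 10), None, "visited doctor")
```

Here `contains(visit, date(2016, 1, 10), date(2016, 5, 1))` is true, while asking about the same visit as of March 1 is false because nothing had been published yet.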

How this helps us

• Uniform treatment of event logs and snapshots

• Reproduce event and snapshot views from one structure

• Simpler data access patterns
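
As a rough illustration of reproducing both views from one structure, the sketch below derives a snapshot and an event log from the same list of versions. The `Version` tuple, the helper names, and the sample states are assumptions made for the example; only the dates come from the deck.

```python
from datetime import date
from typing import NamedTuple, Optional


class Version(NamedTuple):
    member_id: str
    effective_begin: date
    effective_end: Optional[date]   # None = open-ended
    published_begin: date
    published_end: Optional[date]   # None = still the published version
    state: str


def in_range(begin, end, point):
    return begin <= point and (end is None or point < end)


def snapshot(versions, published_on):
    """Snapshot view: every version that was published as of `published_on`."""
    return [v for v in versions
            if in_range(v.published_begin, v.published_end, published_on)]


def event_log(versions):
    """Event view: one (published_begin, member_id, state) entry per version."""
    return sorted((v.published_begin, v.member_id, v.state) for v in versions)


# Two versions of the same Jan 10 visit: the original and its Apr 11 restatement.
VERSIONS = [
    Version("m1", date(2016, 1, 10), date(2016, 1, 11),
            date(2016, 4, 10), date(2016, 4, 11), "as first processed"),
    Version("m1", date(2016, 1, 10), date(2016, 1, 11),
            date(2016, 4, 11), None, "restated"),
]
```

A snapshot taken for Apr 10 returns only the first version, a snapshot for May 1 returns only the restated one, and `event_log(VERSIONS)` lists both in publish order.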

Multi-temporal

[Diagram: Member goes to doctor, Clearinghouse, Claims, SFTP, DWH, Clover]

Implementations at Clover

Why we use relational (PostgreSQL)

• Industry standard

• Wide adoption

• Robust

• Approachable

• Not constrained by scale

• Distributed / sharding

• Transactions!

• Global clock!

PostgreSQL

• Not limited to scalar types

• GiST indexes!

• Exclusion constraints!
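
To make the exclusion-constraint bullet concrete, here is a plain-Python sketch of the invariant such a constraint could enforce on a bitemporal table: two rows for the same id may not overlap in both effective and published time. In PostgreSQL the database would check this with a GiST-backed `EXCLUDE` constraint over range columns; the `check_insert` helper and the dict-based row layout here are hypothetical stand-ins.

```python
from datetime import date


def overlaps(a_begin, a_end, b_begin, b_end):
    """True if half-open ranges [a_begin, a_end) and [b_begin, b_end) intersect.

    An end of None means the range is unbounded.
    """
    return ((b_end is None or a_begin < b_end)
            and (a_end is None or b_begin < a_end))


def check_insert(rows, new_row):
    """Append new_row, or raise ValueError if it collides with an existing row.

    Rows are dicts with keys: id, effective=(begin, end), published=(begin, end).
    """
    for row in rows:
        if (row["id"] == new_row["id"]
                and overlaps(*row["effective"], *new_row["effective"])
                and overlaps(*row["published"], *new_row["published"])):
            raise ValueError("overlapping bitemporal rows for id %r" % row["id"])
    rows.append(new_row)


visit = (date(2016, 1, 10), date(2016, 1, 11))
first = {"id": "m1", "effective": visit,
         "published": (date(2016, 4, 10), date(2016, 4, 11))}
clash = {"id": "m1", "effective": visit,
         "published": (date(2016, 4, 10), None)}
restated = {"id": "m1", "effective": visit,
            "published": (date(2016, 4, 11), None)}
```

Inserting `first` and then `restated` succeeds because their published ranges only touch; inserting `clash` after `first` raises, since both ranges overlap.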

An example of bitemporal merge in SQL

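-- Names in backticks are template parameters.
INSERT INTO table (id, publish_tb, publish_tr, effective_tr, state)
-- Close the publish range of the row currently published for this id.
SELECT id,
       LOWER(publish_tr) AS publish_tb,
       TSRANGE(LOWER(publish_tr), `publish_ts`, '[)') AS publish_tr,
       effective_tr,
       state
FROM table
WHERE id = `id` AND publish_tr @> `publish_ts`
UNION ALL
-- Publish the new row, open-ended from `publish_ts`.
SELECT `id`,
       `publish_ts`,
       TSRANGE(`publish_ts`, NULL, '[)'),
       TSRANGE(`effective_tb`, `effective_te`, '[)'),
       `state`
-- The conflict target (id, publish_tb) is an assumption; the slide omits it.
ON CONFLICT (id, publish_tb) DO UPDATE
SET publish_tr = EXCLUDED.publish_tr,
    effective_tr = EXCLUDED.effective_tr,
    state = EXCLUDED.state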
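
The same merge can be followed step by step in plain Python. This is a sketch of the logic as the SQL reads, not Clover's implementation: rows are dicts keyed like the SQL columns, ranges are `(begin, end)` tuples with `None` for an open end, and the `bitemporal_merge` function name and sample states are made up for the example.

```python
from datetime import datetime


def bitemporal_merge(rows, id_, publish_ts, effective_tb, effective_te, state):
    """Return a new row list after publishing `state` for `id_` at `publish_ts`.

    Step 1: the row for this id whose publish range contains publish_ts gets
            its publish range closed at publish_ts.
    Step 2: a new row is added with a publish range open-ended from publish_ts.
    """
    merged = []
    for row in rows:
        begin, end = row["publish_tr"]
        is_current = (row["id"] == id_
                      and begin <= publish_ts
                      and (end is None or publish_ts < end))
        if is_current:
            row = dict(row, publish_tr=(begin, publish_ts))  # copy, then close
        merged.append(row)
    merged.append({
        "id": id_,
        "publish_tr": (publish_ts, None),
        "effective_tr": (effective_tb, effective_te),
        "state": state,
    })
    return merged


visit_begin, visit_end = datetime(2016, 1, 10), datetime(2016, 1, 11)
table = bitemporal_merge([], "m1", datetime(2016, 4, 10),
                         visit_begin, visit_end, "as first processed")
table = bitemporal_merge(table, "m1", datetime(2016, 4, 11),
                         visit_begin, visit_end, "restated")
```

After the second call, the first row's publish range ends on Apr 11 and the restated row is open-ended, so both what was believed on Apr 10 and what is believed now remain available.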

Abstracting that away — SQLAlchemy

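from sqlalchemy.ext import compiler as sa_compiler


# BitemporalMerge is a custom SQL element; its definition is not on the slide.
@sa_compiler.compiles(BitemporalMerge)
def _bitemporal_merge(element, compiler, **kw):
    # Compile each step of the merge and join them into one SQL script.
    return ';\n'.join([
        compiler.process(element.create_stg_working_table),
        compiler.process(element.process_intersecting_set),
        compiler.process(element.publish_new_set),
        compiler.process(element.clean_up_working_tables),
    ])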

Abstracting that away — Airflow

BitemporalMerge operator (template for a task)

Abstracting that away - Alembic

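import sqlalchemy as sa
import sqlalchemy.dialects.postgresql  # ensures sa.dialects.postgresql is loaded
from alembic.operations import MigrateOperation, Operations


@Operations.register_operation('create_bitemporal_table')
class CreateBitemporalTableOp(MigrateOperation):
    """Create a bitemporal src table"""

    # The slide elides the method definition; the lines below are the body of
    # a method that receives identities and additional_exclusions.
        identities = identities or {}  # mapping of identity name to expression
        # Rows with equal identity expressions describe the same entity.
        identity_constraints = [(expr, '=') for identity, expr in identities.items()]
        additional_exclusions = additional_exclusions or []
        exclusion_constraints = identity_constraints + additional_exclusions
        # No two rows for one entity may overlap in both time ranges.
        exclusion = sa.dialects.postgresql.ExcludeConstraint(
            ('published_as_of', '&&'),
            (self.as_on_name, '&&'),
            *exclusion_constraints)
        current_publish_ixes = []
        current_publish_current_as_on_ixes = []
        …..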

Temporality as a concept

import sqlalchemy as sa

import clover_web.models.temporal as temporal


@temporal.add_clock('prop_a', 'prop_b')
class MyModel(temporal.Clocked, SomeBase):
    prop_a = sa.Column(sa.Integer)
    prop_b = sa.Column(sa.Text)


prop_a_hm = temporal.get_history_model(MyModel.prop_a)

Temporality as a concept

effective/valid

published

S3 archives/versions

Using the time machine

What was the member’s status according to the claims processor on Dec 1, 2015?

What was the member’s status according to us on Dec 1, 2015?

What is the member’s current full effective history?

What is our latest understanding of the member’s status according to the claims processor?
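
Two of these questions can be sketched against a small in-memory table with one effective and one published dimension. The rows, statuses, and function names below are invented for illustration. Answering the claims-processor questions would presumably need a further published dimension for the processor's own timeline, which this sketch leaves out.

```python
from datetime import date

# Hypothetical rows:
# (effective_begin, effective_end, published_begin, published_end, status)
# Ranges are half-open; None means open-ended.
ROWS = [
    (date(2015, 1, 1), None, date(2015, 2, 1), date(2016, 1, 15), "enrolled"),
    (date(2015, 1, 1), date(2015, 11, 1), date(2016, 1, 15), None, "enrolled"),
    (date(2015, 11, 1), None, date(2016, 1, 15), None, "disenrolled"),
]


def in_range(begin, end, point):
    return begin <= point and (end is None or point < end)


def status_as_known_on(rows, effective_on, published_on):
    """Status effective on `effective_on`, as it was published on `published_on`."""
    for eb, ee, pb, pe, status in rows:
        if in_range(eb, ee, effective_on) and in_range(pb, pe, published_on):
            return status
    return None


def current_effective_history(rows):
    """Currently published rows (open published range), ordered by effective time."""
    current = [r for r in rows if r[3] is None]
    return [(eb, ee, status)
            for eb, ee, _, _, status in sorted(current, key=lambda r: r[0])]
```

With this data, the member's Dec 1, 2015 status as known on Dec 1, 2015 is "enrolled", while the same effective date viewed from Feb 1, 2016 is "disenrolled" because the January restatement moved the end of enrollment back to Nov 1.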

Figuring out what happened

[drawing with cause and effect enumerated]

Making meaningful decisions about health outcomes

• How do we know if a call queue campaign was successful?

• How do we know how and where to deploy our nurses?

• How do we know what impact a certain data integration will have on understanding the risk profile of our members?

Further resources

• Richard Snodgrass (http://www.cs.arizona.edu/~rts/publications.html)

• Developing Time-Oriented Database Applications in SQL

Questions

