Date post: | 13-Feb-2017 |
Category: | Technology |
Upload: | hakka-labs |
Time | tīm | noun The indefinite continued progression of existence and events that occur in apparently irreversible succession from the past through the present to the future.
Typical lifecycle of data
App: 1459671246251
User clicks on a button: 1459671246251
Life event = publish event
Clover’s lifecycle of data
Clover publishes claim data in our DWH and knows, as of Apr 10, 2016, that a member went to the doctor on Jan 10, 2016
Member goes to doctor (Jan 10, 2016)
Billing enters claims (Apr 1, 2016)
Transaction clearinghouse systems and third-party claims processor
Data entry human at claims processor
Our pipelines
Happy path
What did Clover really know about someone’s health event that happened on Jan 10, 2016, as of Apr 2016, May 2016, and Jun 2016? What did the claims processor know?
Oops, processed the claim wrong (restatement Apr 11, 2016)
Oops, there was a data entry error (restatement Jun 11, 2016)
Unreliable path
Oops, the pipeline is broken (breakage Jun 12, 2016)
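The question "what did we know, and when?" can be sketched with an append-only list of published versions. This is only an illustration of the idea, not Clover's implementation: the dates come from the slides, while the state strings and the `known_as_of` helper are hypothetical.

```python
from datetime import date

# Each tuple is (published_on, claim_state). The dates come from the slides;
# the state strings are hypothetical placeholders.
VERSIONS = [
    (date(2016, 4, 10), "doctor visit on Jan 10, 2016 (as first published)"),
    (date(2016, 4, 11), "restated: claim processing corrected"),
    (date(2016, 6, 11), "restated: data entry error corrected"),
]


def known_as_of(versions, as_of):
    """Return the latest state published on or before `as_of`, else None."""
    known = [(published, state) for published, state in versions if published <= as_of]
    if not known:
        return None
    # Picking the maximum publish date selects the most recent version seen so far.
    return max(known, key=lambda pair: pair[0])[1]
```

Asking the same question with different `as_of` dates returns different answers for the same Jan 10 event, which is the behavior the restatement timeline above describes.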
Operational complexity
Members Providers Financials
Clover Data Platform
Applications, Data warehouse, Data Science models
Trying to figure out what happened
• [drawing - WHY IS THE TRASH CAN ON FIRE]
• ENTAILS: VASE, CRYING CHILD, CAT POOPING ON THE FLOOR
Footer
How this helps us
• Uniform treatment of event logs and snapshots
• Reproduce event and snapshot views from one structure
• Relatively simple data access patterns
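One way to picture "event and snapshot views from one structure" is a list of rows that each carry a publish range. The sketch below is an assumption about the general shape, not the actual schema: `Row`, `snapshot`, `event_log`, and the sample states are all hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional


class Row(NamedTuple):
    key: str
    state: str
    publish_begin: date
    publish_end: Optional[date]  # None means "still current"


def snapshot(rows, as_of):
    """State per key as it was published at `as_of` (half-open [begin, end))."""
    return {
        r.key: r.state
        for r in rows
        if r.publish_begin <= as_of and (r.publish_end is None or as_of < r.publish_end)
    }


def event_log(rows):
    """Every published change, oldest first."""
    ordered = sorted(rows, key=lambda r: r.publish_begin)
    return [(r.publish_begin, r.key, r.state) for r in ordered]


# Hypothetical sample data: one key restated once.
ROWS = [
    Row("member-1", "pending", date(2016, 4, 10), date(2016, 4, 11)),
    Row("member-1", "paid", date(2016, 4, 11), None),
]
```

Both views read the same rows: `snapshot` filters by a point in publish time, while `event_log` simply orders the rows by when they were published.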
Why we use relational (PostgreSQL)
• Industry standard
• Wide adoption
• Robust
• Approachable
• Not constrained by scale
• Distributed / sharding
• Transactions!
• Global clock!
PostgreSQL
• Not limited to scalar types
• GiST indexes!
• Exclusion constraints!
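To illustrate what range types and exclusion constraints buy, here is a small pure-Python sketch of the rule such a constraint would enforce: no two rows with the same identity may have overlapping ranges. It only mimics the half-open `'[)'` overlap semantics; `DateRange` and `violates_exclusion` are hypothetical names, and in PostgreSQL the database itself would perform this check.

```python
from datetime import date
from typing import NamedTuple, Optional


class DateRange(NamedTuple):
    lower: date
    upper: Optional[date]  # None means unbounded above

    def overlaps(self, other):
        """Half-open '[)' overlap test, in the spirit of the && range operator."""
        self_before_other = self.upper is not None and self.upper <= other.lower
        other_before_self = other.upper is not None and other.upper <= self.lower
        return not (self_before_other or other_before_self)


def violates_exclusion(existing, new_id, new_range):
    """True if a row with the same id already has an overlapping range."""
    return any(row_id == new_id and rng.overlaps(new_range) for row_id, rng in existing)
```

Because the ranges are half-open, a range that ends exactly where the next one begins does not count as an overlap, so consecutive versions of a row can sit back to back.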
An example of bitemporal merge in SQL
INSERT table
SELECT id id,
       LOWER(publish_tr) publish_tb,
       TSRANGE(LOWER(publish_tr), `publish_ts`, '[)') publish_tr,
       effective_tr effective_tr,
       state state
FROM table
WHERE id = `id`
  AND publish_tr @> `publish_ts`
UNION ALL
SELECT `id` id,
       `publish_ts` publish_tb,
       TSRANGE(`publish_ts`, NULL, '[)') publish_tr,
       TSRANGE(`effective_tb`, `effective_te`, '[)') effective_tr,
       state state
ON CONFLICT UPDATE
SET publish_tr = publish_tr,
    effective_tr = effective_tr,
    state = state
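Read as pseudocode, the merge does two things: it closes the publish range of the version that is current at `publish_ts`, and it adds a new version whose publish range is open-ended. The in-memory sketch below mirrors that logic under stated assumptions; `Version` and `bitemporal_merge` are hypothetical names, and the real work would happen in the database.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Version(NamedTuple):
    id: int
    publish_begin: datetime
    publish_end: Optional[datetime]  # None means open-ended
    effective_begin: datetime
    effective_end: Optional[datetime]
    state: str


def bitemporal_merge(table, id_, publish_ts, effective_begin, effective_end, state):
    """Return a new list with the current version closed and the new one appended."""
    merged = []
    for row in table:
        is_current = (
            row.id == id_
            and row.publish_begin <= publish_ts
            and (row.publish_end is None or publish_ts < row.publish_end)
        )
        # Close the publish range of the version being superseded.
        merged.append(row._replace(publish_end=publish_ts) if is_current else row)
    # The new version is published from publish_ts onward.
    merged.append(Version(id_, publish_ts, None, effective_begin, effective_end, state))
    return merged
```

Nothing is overwritten: the superseded version stays in the list with a bounded publish range, which is what keeps earlier "as of" questions answerable.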
Abstracting that away: SQLAlchemy
from sqlalchemy.ext import compiler as sa_compiler  # assumed import; not shown on the slide

@sa_compiler.compiles(BitemporalMerge)
def _bitemporal_merge(element, compiler, **kw):
    # Compile each step of the merge and join them into one multi-statement script.
    return (';\n').join([
        compiler.process(element.create_stg_working_table),
        compiler.process(element.process_intersecting_set),
        compiler.process(element.publish_new_set),
        compiler.process(element.clean_up_working_tables),
    ])
Abstracting that away: Alembic
@Operations.register_operation('create_bitemporal_table')
class CreateBitemporalTableOp(MigrateOperation):
    """Create a bitemporal src table"""
        # (enclosing method not shown on the slide)
        identities = identities or []
        identity_constraints = [(expr, '=') for identity, expr in identities.items()]
        additional_exclusions = additional_exclusions or []
        exclusion_contraints = identity_constraints + additional_exclusions
        exclusion = sa.dialects.postgresql.ExcludeConstraint(
            ('published_as_of', '&&'),
            ('{}'.format(self.as_on_name), '&&'),
            *exclusion_contraints)
        current_publish_ixes = []
        current_publish_current_as_on_ixes = []
        …..
Temporality as a concept
import sqlalchemy as sa
import clover_web.models.temporal as temporal

@temporal.add_clock('prop_a', 'prop_b')
class MyModel(temporal.Clocked, SomeBase):
    prop_a = sa.Column(sa.Integer)
    prop_b = sa.Column(sa.Text)

prop_a_hm = temporal.get_history_model(MyModel.prop_a)
Using the time machine
What was member’s status according to the claims processor on Dec 1, 2015?
What was member’s status according to us on Dec 1, 2015?
What is the member’s current full effective history?
What is our latest understanding of the member’s status according to the claims processor?
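These questions differ only in which source and which points on the two time axes they fix. A minimal sketch, assuming each row carries a source label, an effective range, and a publish range; `StatusRow`, `status_at`, `current_history`, the source labels, and the sample data are all hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional


class StatusRow(NamedTuple):
    source: str  # hypothetical labels, e.g. "claims_processor" or "clover"
    status: str
    effective_begin: date
    effective_end: Optional[date]
    publish_begin: date
    publish_end: Optional[date]


def _contains(begin, end, point):
    """Half-open [begin, end) containment; end=None means unbounded."""
    return begin <= point and (end is None or point < end)


def status_at(rows, source, effective_on, published_on):
    """Status valid on `effective_on`, as `source` had published it by `published_on`."""
    for r in rows:
        if (
            r.source == source
            and _contains(r.effective_begin, r.effective_end, effective_on)
            and _contains(r.publish_begin, r.publish_end, published_on)
        ):
            return r.status
    return None


def current_history(rows, source):
    """Effective history built from rows whose publish range is still open."""
    live = [r for r in rows if r.source == source and r.publish_end is None]
    live.sort(key=lambda r: r.effective_begin)
    return [(r.effective_begin, r.effective_end, r.status) for r in live]


# Hypothetical data: a disenrollment effective Nov 1, 2015 that was only published Jan 15, 2016.
ROWS = [
    StatusRow("claims_processor", "enrolled", date(2015, 1, 1), None, date(2015, 1, 5), date(2016, 1, 15)),
    StatusRow("claims_processor", "enrolled", date(2015, 1, 1), date(2015, 11, 1), date(2016, 1, 15), None),
    StatusRow("claims_processor", "disenrolled", date(2015, 11, 1), None, date(2016, 1, 15), None),
]
```

With this data, the status for Dec 1, 2015 is "enrolled" when asked as of Dec 1, 2015 and "disenrolled" when asked as of Feb 1, 2016, because the restatement was published in between.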
Making meaningful decisions about health outcomes
• How do we know if a call queue campaign was successful?
• How do we know how and where to deploy our nurses?
• How do we know what impact a certain data integration will have on understanding the risk profile of our members?
Further resources
• Richard Snodgrass (http://www.cs.arizona.edu/~rts/publications.html)
• Developing Time-Oriented Database Applications in SQL