Confidential Computing - Analysing Data Without Seeing Data

Post on 15-Apr-2017


www.csiro.au

Data Analytics WITHOUT Seeing the Data

Max Ott … with input from the entire N1 Team

max.ott@data61.csiro.au

Future Value of Data

[Figure: value of a data set plotted against time since release]

• Data decays with time!
• Joined with another data set – more value!!
• New analytics techniques – more value!!

Data decay + joining new data + new analytics techniques: uncertain future value, unknown future risk

Challenge

[Diagram: confidential data feeds a computation that produces a result]

Result: learn this!

Confidential data: learn NOTHING

The Problem

How can we learn valuable insights from sensitive data from multiple organisations?

[Diagram: two confidential sets of sensitive data feed a joint analysis that produces insights]

Three Basic Building Blocks

• Private computation
  • Arithmetic on encrypted numbers
• Distributed, confidential analytics
  • Distributed algorithms, computation & protocols
• Private Record Linkage
  • Privacy preserving record level matching

Solution (1): Private computation

3 + 2 = 5

E(3) = 7117593598749643033862322306020184392520…

E(2) = 6553537132809459595364742532851158563479…

E(3) “+” E(2) = 9536474253285115856347911583777971856270…

D(E(3) “+” E(2)) = 5


Solution (2): Distributed analytics

[Diagram: Dept 1 and Dept 2 each keep their data and N1 secure compute inside their own confidentiality boundary; an N1 Coordinator exchanges messages containing encrypted data with them]

Data always remains confidential to the source institution

Solution (3): Private Record Linkage

[Diagram: records from Dataset A and Dataset B, joined by question marks]

Tori Mckone     7/06/1921  F
Tori Mackon     6/07/1921  F   ?
Victoria Mckon  7/06/1921  F   ?

Use Cases

Scoring

[Diagram labels: Model, Own Data, Other Data, Quality ??]

Suspicious Activities

Need to report?

[Diagram label: Model Builder]

Industry using Gov Data

[Diagram labels: Model Builder, Own Data, Gov Data]

Benchmarking

[Diagram labels: Own Data, Model Builder]

Device Analytics

Private Modeling

[Diagram: learn a model of normal behaviour from device readings (OK, OK, NG, OK), then deploy it on new readings (OK, NG, OK)]

Private Computation

Homomorphic encryption

• Partial Homomorphic Encryption: allows either addition or multiplication of encrypted numbers
• Somewhat Homomorphic Encryption: allows evaluation of low order polynomials
• Fully Homomorphic Encryption: allows evaluation of arbitrary functions

Moving from partial towards fully homomorphic encryption is more general; moving from fully towards partial is faster.

Paillier Encryption

Encryption of m:

$$c = g^m r^n \bmod n^2$$

Addition of encrypted numbers:

$$D\big(E(m_1) \cdot E(m_2) \bmod n^2\big) = m_1 + m_2 \bmod n$$

Multiplication of encrypted number by a scalar:

$$D\big(E(m_1)^{m_2} \bmod n^2\big) = m_1 m_2 \bmod n$$

Paillier Encryption

Encryption of m:

$$c = g^m r^n \bmod n^2$$

Addition of encrypted numbers:

$$g^{m_1} \times g^{m_2} = g^{m_1 + m_2}$$

Multiplication of encrypted number by a scalar:

$$\left(g^{m_1}\right)^{m_2} = g^{m_1 m_2}$$
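The two Paillier properties on these slides (multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a scalar multiplies the plaintext) can be tried out with a toy implementation. The sketch below is not the python-paillier API: the function names, the tiny fixed primes 1789 and 1861, and the choice g = n + 1 are all assumptions made for illustration, and keys this small offer no security.

```python
import math
import secrets


def generate_keypair(p=1789, q=1861):
    """Build a toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                                          # assumed simplifying choice of g
    mu = pow(lam, -1, n)                               # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(public_key, m):
    """c = g^m * r^n mod n^2 for a fresh random r coprime to n."""
    n, g = public_key
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(public_key, private_key, c):
    """m = L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    n, _ = public_key
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(public_key, c1, c2):
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    n, _ = public_key
    return (c1 * c2) % (n * n)


def multiply_by_scalar(public_key, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    n, _ = public_key
    return pow(c, k, n * n)


public_key, private_key = generate_keypair()
enc_3 = encrypt(public_key, 3)
enc_2 = encrypt(public_key, 2)
total = decrypt(public_key, private_key, add_encrypted(public_key, enc_3, enc_2))
scaled = decrypt(public_key, private_key, multiply_by_scalar(public_key, enc_3, 7))
print(total, scaled)  # 5 21
```

Whoever performs `add_encrypted` or `multiply_by_scalar` needs only the public key, so the arithmetic can be done by a party that never sees 3, 2 or the results.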

Paillier Implementations

• Python – open source
  • www.github.com/nicta/python-paillier
• Java – open source
  • www.github.com/nicta/javallier
• Javascript – still under closed development

Distributed, Confidential Analytics

Distributed Computing with a Twist

[Diagram: Org 1 and Org 2 each keep their data and N1 secure compute inside their own confidentiality boundary; an N1 Coordinator exchanges messages with them]

Messages containing ONLY encrypted data

Data always remains confidential to the source organisation

Graph Computation Engine

[Diagram: a Coordinator and Workers, grouped into domains, made up of CE nodes with DF data and properties, exchanging messages M]

• M: JSON message
• CE: AKKA actors
• DF: Data frames

N1 Analytics Platform

• Privacy Technologies: partial homomorphic encryption, Private Record Linkage, irreversible aggregation
• Distributed Graph Computation Engine
• Analytics: statistics, regression, clustering
• Machine Learning: learn, evaluate, deploy
• Data, Auth, Network

Logistic Regression

Evaluate the logistic function:

$$p(x;\theta) = \frac{1}{1 + e^{-\theta \cdot x}}$$

Maximise the log likelihood over θ (equivalently, minimise −L(θ)):

$$L(\theta) = \sum_{i=0}^{n} \Big[ y_i \log p(x_i;\theta) + (1 - y_i) \log\big(1 - p(x_i;\theta)\big) \Big]$$

Requires “secure log” and “secure inverse” protocol using Paillier encryption
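As a plaintext reference for the two quantities on this slide, the sketch below evaluates the logistic function and the log likelihood and fits θ by plain gradient steps. It makes no attempt at the private version: on encrypted data the `math.log` call and the division inside `logistic` are exactly the steps the slide says need a “secure log” and “secure inverse” protocol. The function names, learning rate, step count and toy data are assumptions for illustration.

```python
import math


def logistic(theta, x):
    """p(x; theta) = 1 / (1 + exp(-theta . x))."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(theta, xs, ys):
    """L(theta) = sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(theta, x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def gradient(theta, xs, ys):
    """Gradient of L(theta): sum_i (y_i - p_i) * x_i."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        error = y - logistic(theta, x)
        for j, xj in enumerate(x):
            grad[j] += error * xj
    return grad


def fit(xs, ys, learning_rate=0.1, steps=500):
    """Gradient ascent on L(theta), i.e. gradient descent on -L(theta)."""
    theta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = gradient(theta, xs, ys)
        theta = [t + learning_rate * g for t, g in zip(theta, grad)]
    return theta


# Toy data: first component is a constant 1.0 acting as the intercept.
xs = [(1.0, 0.5), (1.0, 1.0), (1.0, 1.5), (1.0, 2.0), (1.0, 2.5), (1.0, 3.0)]
ys = [0, 0, 1, 0, 1, 1]

theta = fit(xs, ys)
print(log_likelihood([0.0, 0.0], xs, ys), log_likelihood(theta, xs, ys))
```

The second printed value is larger than the first, showing that the fitted θ explains the labels better than the all-zero starting point.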

Example Paillier Logistic Regression

Builds on Han et al. 2010 “Privacy Preserving Gradient Descent Methods”

[Diagram: Org A, Org B and N1Analytics, with Coordinator and Worker nodes. One organisation holds features & labels, the other holds features only. Components shown: Logistic Learner, Gradient Descent, Secure Log, Secure Inverse, Private key holder]

• M: JSON message
• CE: AKKA actors
• DF: Data frames

Performance

• Learning
  • Learnt models have the same accuracy as unencrypted calculations
  • “Private learning” is (1000x) slower due to encrypted computations. Learning times are several hours.
• Deployment
  • A score can be generated in real time (<50ms)
  • Customer data that contributes to the score remains private.

Scaling

[Diagram: a Coordinator, Data Provider 1 and Data Provider 2, backed by many Workers]

[Chart: Learning time scaling. x-axis: Cores (0 to 400); y-axis: Minutes (ticks at 5, 10, 50, 100, 500); series: ● 10,000x10 features, ■ 100,000x10 features, ◆ 1,000,000x10 features]

Confidential Record Linkage

Record Linkage Challenge

[Diagram: records from Dataset A and Dataset B, joined by question marks]

Tori Mckone     7/06/1921  F
Tori Mackon     6/07/1921  F   ?
Victoria Mckon  7/06/1921  F   ?

Solution (3): Private Record Linkage

Name        One way hash    Fuzzy Matching    One way hash    Name
Jane Doe    a8bf342                           a8bf242         Janet Doe
Paul Doe    f72630b                           b3894f3         Bob Doe
Jim Clark   14oe54                            14oe54          Jim Clark
Kate Clark  a72bef4                           672bef4         Kat Clark
Shan Bo     7830530                           7830530         Shan Bo
Reg Pal     4bf6021                           80ac364         Joe Smith

Each side applies one way hash functions to its own names; fuzzy matching is then done on the hashes.

Private Record Linkage

[Diagram: Company A and Company B each pass their Personally Identifiable Information through a Hasher that uses a Shared Secret Salt, producing an Anonymous Bloom filter. N1 runs a Fuzzy Matcher over the Bloom filters and produces a Linkage Table.]

PII cannot be recovered from the hashes
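One common way to build such a hasher and fuzzy matcher is to hash the character bigrams of each field into a Bloom filter with a keyed hash, then compare filters with the Dice coefficient. The sketch below assumes that construction; the slides do not specify the actual N1 encoding. The filter size, number of hash functions, HMAC-SHA256 keying, function names and the example salt are all hypothetical, and the sketch says nothing about how resistant such encodings are to attack.

```python
import hashlib
import hmac

FILTER_BITS = 1024   # assumed Bloom filter length
NUM_HASHES = 2       # assumed number of hash functions per bigram


def bigrams(text):
    """Split a padded, lower-cased string into its set of character bigrams."""
    padded = f"_{text.lower().strip()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(text, secret):
    """Return the set of bit positions set in the Bloom filter for `text`.

    Each bigram is hashed with HMAC-SHA256 keyed by the shared secret salt,
    so only parties holding the salt can produce comparable filters.
    """
    bits = set()
    for gram in bigrams(text):
        for k in range(NUM_HASHES):
            message = f"{k}:{gram}".encode()
            digest = hmac.new(secret, message, hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % FILTER_BITS)
    return bits


def dice(bits_a, bits_b):
    """Dice coefficient of two Bloom filters: 1.0 means identical bit sets."""
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


secret = b"shared-secret-salt"  # hypothetical value agreed by both companies
a = bloom_encode("Tori Mckone", secret)
b = bloom_encode("Tori Mackon", secret)
c = bloom_encode("Joe Smith", secret)

similar = dice(a, b)     # near-duplicate names share most bigrams
different = dice(a, c)   # unrelated names share almost no bits
print(similar, different)
```

A matcher would link two rows when the score exceeds a chosen threshold; spelling variants such as "Mckone" and "Mackon" still score highly because most of their bigrams, and therefore most of their bits, coincide.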

Private Record Linkage

[Diagram: Organisation A and Organisation B each run a Hasher with a Shared Secret Salt over their PII data; N1 Analytics runs the Fuzzy Matcher over the resulting hashes and produces the Linkage Table]

A's PII data
Name            DOB       Gender
John Smith      12/01/82  M
Mark Gorgon     1/12/90   M
Hanna Smith     4/02/78   F
…               …         …
Juliet Baker    2/11/72   F

B's PII data
Name            DOB       Gender
Mark Gorgon     1/12/90   M
Juliet Baker    2/11/72   F
Andrew Roberts  4/02/93   M
…               …         …
Hanna Smith     4/02/78   F

A's Cryptographic Hashes
Row     Key
1       10110110...00101010
2       01110110...11010101
3       10011001...10100110
…       …
100000  01101011...00101101

B's Cryptographic Hashes
Row     Key
1       01110110…11010101
2       01101011...00101101
3       01111000…00110011
…       …
100000  10011101...10100111

Linkage Table
Row A   Row B
1       X
2       1
3       100000
…       …
100000  X

Similar in approach to MERLIN - Ranbaduge, Vatsalan, Christen (2015)

Probabilistic Record Linkage

Common categorical features (e.g. postcode, age range, gender)

Record linkage can be a privacy issue

Classification without identity linking

[Diagram labels: Features, Labels, Rados, Features, Shared feature, Labels*, Label Proportions]

Learning from Label Proportions

Patrini, Nock, Caetano, & Rivera, NIPS (2014), (Almost) No label no cry

Classification without identity linking

[Diagram labels: Features, Labels, Rados, Features, Shared feature, Labels*, Encrypted Label Proportions]

Learning from Encrypted Label Proportions

Current Status

Current Capabilities of N1 platform

• Standard data analytics techniques on confidential data:
  • Correlation analysis
  • Classification / prediction
  • Regression
  • Clustering / outlier detection
• Automated private record linkage
• Fine grained authorisation and access control

[Diagram: Dept 1, Org 2 and Comp 3 connected by private record linkage and private analytics (statistics, classifiers, anomaly detection)]

Federated model – no central database. Data is kept local to the source.

Beta program

• Not open sourced (yet!)
• Looking for partners who want to use our system in their applications
• Still some warts, but working in commercial setting

Acknowledgements

Lead: Dr. Stephen Hardy

Engineering: Mr. Brian Thorne, Dr. Mentari Djatmiko, Dr. Guillaume Smith, Dr. Wilko Hanecka, Dr. Hamish Ivey-Law

Research: Dr. Richard Nock, Mr. Giorgio Patrini, Dr. Roksana Borelli, Dr. Arik Friedman, Prof. Hugh Durrant-Whyte

Business: Mr. Warren Bradey, Ms. Shelley Copsey
