Integrity Verification of Outsourced Frequent Itemset Mining with Deterministic Guarantee

Boxiang Dong, Ruilin Liu, Wendy Hui Wang
Department of Computer Science, Stevens Institute of Technology, Hoboken, NJ
December 10, 2013

Data-mining-as-a-service (DMaS)

Data Mining as a Service:

• Weak client
• Computationally powerful service provider (e.g. cloud)
• Result integrity: are the returned mining results the same as if the computation were locally executed?

Integrity Verification of Outsourced Frequent Itemset Mining with Deterministic Guarantee. ICDM 13. Dong, Liu, Wang


Outsourcing Setting

• We focus on the problem of result integrity of outsourced frequent itemset mining.

• The architecture of outsourcing frequent itemset mining (figure not reproduced in this transcript).


Verification Goal

Given a transaction dataset $D$ and its correct frequent itemset mining result $F$, let $F^S$ be the potentially erroneous mining result that the server returns.

• Integrity concerns:
  Completeness: no frequent itemset is missing from $F^S$.
  Correctness: all itemsets in $F^S$ are frequent.

• We propose an efficient approach that catches incorrect or incomplete mining results with 100% certainty.
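The two integrity concerns can be read as set comparisons between $F$ and $F^S$. The following Python sketch illustrates this on a hypothetical toy dataset; the transactions, the `minsup` value, and all function names are assumptions made for illustration and do not come from the slides. It recomputes $F$ by brute force, which is exactly the work a weak client wants to avoid, so it only demonstrates the definitions.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Brute-force F: every non-empty itemset contained in >= minsup transactions."""
    items = sorted(set().union(*transactions))
    result = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            support = sum(1 for t in transactions if set(combo) <= t)
            if support >= minsup:
                result.add(frozenset(combo))
    return result

def is_complete(F, F_S):
    """Completeness: no frequent itemset is missing from the server's result."""
    return F <= F_S

def is_correct(F, F_S):
    """Correctness: every itemset the server returned is frequent."""
    return F_S <= F

# Hypothetical toy data (not from the slides).
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
F = frequent_itemsets(D, minsup=2)

dropped = F - {frozenset({"a", "b"})}      # server omits a frequent itemset
padded = F | {frozenset({"a", "b", "c"})}  # server adds an infrequent itemset
```

Here `dropped` violates completeness only, and `padded` violates correctness only.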


Verification Framework

• The server constructs cryptographic proofs of the mining results.
  • We use the set intersection verification protocol [PTT11] to construct the proofs.

• The client uses the proofs to verify the true support of a frequent/infrequent itemset.


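Set Intersection Verification Protocol

Given a collection of sets $S = \{S_1, \dots, S_m\}$ and an intersection result $Y = \{y_1, \dots, y_\delta\}$, $Y = S_1 \cap S_2 \cap \dots \cap S_m$ is the correct intersection of $S$ if and only if:

• $(Y \subseteq S_1) \wedge \dots \wedge (Y \subseteq S_m)$ (subset condition);
• $(S_1 - Y) \cap \dots \cap (S_m - Y) = \emptyset$ (completeness condition).

[PTT11] The server prepares the proof $\Pi(Y) = \{B, A, W, C\}$, and the client checks each component:

• Coefficients $B = \{b_\delta, b_{\delta-1}, \dots, b_0\}$ of the polynomial $(s + y_1)(s + y_2) \cdots (s + y_\delta)$.
  Client check: the coefficients $B = \{b_0, \dots, b_\delta\}$ are correct.
• Accumulation values $A = \{acc(S_j) \mid \forall S_j \in S\}$, where $acc(S_j) = g^{\prod_{x \in S_j}(s + x)}$.
  Client check: the values in $A$ are correct.
• Subset witness $W = \{W_j \mid \forall S_j \in S\}$, where $W_j = g^{P_j(s)}$ and $P_j(s) = \prod_{x \in S_j - Y}(x + s)$ for $j = 1, \dots, m$.
  Client check: $e\left(\prod_{k=0}^{|Y|} (g^{s^k})^{b_k}, W_j\right) \stackrel{?}{=} e(acc(S_j), g)$.
• Completeness witness $C = \{C_j \mid \forall S_j \in S\}$, where for each set $S_j \in S$, $C_j = g^{q_j(s)}$ such that $q_1(s)P_1(s) + q_2(s)P_2(s) + \dots + q_m(s)P_m(s) = 1$.
  Client check: $\prod_{j=1}^{m} e(W_j, C_j) \stackrel{?}{=} e(g, g)$.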
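As a rough illustration of what the subset and completeness conditions mean, the Python sketch below checks them directly on small sets, and then checks the exponent-level identity that the subset-witness pairing equation encodes, namely $\prod_{y \in Y}(s + y) \cdot P_j(s) = \prod_{x \in S_j}(s + x)$. This is not the [PTT11] protocol: there is no bilinear pairing, the value `s` is a visible integer rather than a secret trapdoor, and `PRIME`, the sample sets, and the function names are assumptions chosen for illustration.

```python
PRIME = 2**61 - 1  # a prime modulus standing in for the group order (assumption)

def char_poly_at(elements, s):
    """Evaluate prod_{x in elements} (s + x) modulo PRIME."""
    value = 1
    for x in elements:
        value = value * (s + x) % PRIME
    return value

def is_correct_intersection(sets, Y):
    """Check the subset and completeness conditions directly on the sets."""
    subset_ok = all(Y <= S_j for S_j in sets)
    leftovers = [S_j - Y for S_j in sets]
    completeness_ok = not set.intersection(*leftovers)
    return subset_ok and completeness_ok

def subset_identity_holds(S_j, Y, s):
    """Exponent form of the subset check: prod(s + y) * P_j(s) == prod_{x in S_j}(s + x)."""
    P_j = char_poly_at(S_j - Y, s)
    return char_poly_at(Y, s) * P_j % PRIME == char_poly_at(S_j, s)

# Hypothetical sets and evaluation point.
sets = [{1, 2, 3, 4}, {2, 3, 5}, {2, 3, 4, 6}]
s = 123456789
```

With these sets, `{2, 3}` passes both conditions, `{2}` fails completeness because 3 remains in every leftover set, and `{2, 3, 4}` fails the subset condition because 4 is missing from the second set.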


Basic Solution

Given a dataset $D$ that contains $n$ unique items, the client does the following:

1 Build the item-based inverted index $E^I$ that consists of $n$ inverted lists $\{L_1, \dots, L_n\}$.

2 Construct the Merkle hash tree $T$ of the inverted index.
  • Leaf $l_j$ is assigned $h_j = hash(acc(L_j)^{(s+j)})$.
  • Internal node $v$ with children $c_1, \dots, c_k$ is assigned $h_v = hash(h_{c_1} || \dots || h_{c_k})$.
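The tree construction in step 2 can be sketched in Python as follows. This is a simplified stand-in, not the authors' implementation: the leaf hashes the index `j` together with the raw inverted list instead of the accumulation value $acc(L_j)^{(s+j)}$, because computing that value requires a bilinear group and the secret $s$. The SHA-256 choice, the default fanout of 2, and the sample lists are likewise assumptions.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(j, inverted_list):
    """Stand-in for hash(acc(L_j)^(s+j)): hashes the index j and the sorted list."""
    payload = f"{j}:" + ",".join(str(t) for t in sorted(inverted_list))
    return sha256(payload.encode())

def merkle_root(leaf_hashes, fanout=2):
    """Hash children together level by level: h_v = hash(h_c1 || ... || h_ck)."""
    level = list(leaf_hashes)
    while len(level) > 1:
        level = [sha256(b"".join(level[i:i + fanout]))
                 for i in range(0, len(level), fanout)]
    return level[0]

# Hypothetical inverted lists: item index -> ids of transactions containing it.
inverted_index = {1: {0, 1, 2}, 2: {0, 1, 3}, 3: {0, 2, 3}}
root = merkle_root([leaf_hash(j, L) for j, L in sorted(inverted_index.items())])

tampered = dict(inverted_index)
tampered[2] = {0, 1}  # a transaction id is dropped from L_2
tampered_root = merkle_root([leaf_hash(j, L) for j, L in sorted(tampered.items())])
```

A client that keeps only `root` can detect the tampering, since `tampered_root` differs from it.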

Mapping to the set intersection verification problem

Verifying whether any itemset $I$ is included in a set of transactions $T^I$ is equivalent to verifying whether $T^I$ is the correct intersection of the inverted lists of all items in $I$.
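A minimal Python sketch of this mapping, assuming a hypothetical four-transaction dataset and hypothetical helper names: the set of transactions that contain an itemset is the intersection of the inverted lists of its items, so the support of the itemset is the size of that intersection.

```python
from collections import defaultdict

def build_inverted_index(transactions):
    """Map each item to the set of ids of the transactions that contain it."""
    index = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            index[item].add(tid)
    return dict(index)

def supporting_transactions(index, itemset):
    """Intersect the inverted lists of all items in a non-empty itemset."""
    lists = [index.get(item, set()) for item in itemset]
    return set.intersection(*lists)

# Hypothetical toy data (not from the slides).
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
index = build_inverted_index(D)
T_ab = supporting_transactions(index, {"a", "b"})
```

The size of `T_ab` matches the support obtained by scanning `D` directly.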


Basic Solution

Drawbacks
• Total number of proofs is $2^n - 1$ (one per non-empty itemset over $n$ items).
• Too much overhead.


Verification Optimization

Maximal frequent itemset (MFI): a subset of $F^S$ such that for each itemset $I \in MFI$, there does not exist any itemset $I' \in F^S$ such that $I \subset I'$.

Minimal infrequent itemset (MII): a set of itemsets that do not appear in $F^S$ such that for each itemset $I \in MII$, there does not exist any itemset $I' \notin F^S$ such that $I' \subset I$.

(In the slide's figure, itemsets in dotted rectangles are maximal frequent itemsets.)

Advantage: $|MFI| + |MII| \ll |F^S| + |\overline{F^S}|$, where $\overline{F^S}$ denotes the itemsets that are not in $F^S$.
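The two definitions can be tried out on a small example. The Python sketch below uses a hypothetical four-item universe and a hypothetical server result `F_S`; the function names are assumptions. It enumerates every itemset, which is exponential in the number of items and therefore only suitable for illustration.

```python
from itertools import combinations

def all_itemsets(items):
    """Every non-empty itemset over the given items."""
    items = sorted(items)
    return {frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)}

def maximal_frequent(F_S):
    """Itemsets in F_S with no proper superset in F_S."""
    return {I for I in F_S if not any(I < J for J in F_S)}

def minimal_infrequent(F_S, items):
    """Itemsets outside F_S with no proper subset outside F_S."""
    outside = all_itemsets(items) - F_S
    return {I for I in outside if not any(J < I for J in outside)}

# Hypothetical example (not from the slides).
items = {"a", "b", "c", "d"}
F_S = {frozenset(x) for x in ["a", "b", "c", "ab", "ac", "bc"]}
MFI = maximal_frequent(F_S)
MII = minimal_infrequent(F_S, items)
```

In this example MFI is {ab, ac, bc} and MII is {d, abc}, so 5 itemsets need proofs instead of all 15.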


Optimized Solution

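Security Analysis: Our optimized solution provides the same security guarantee as the basic solution. Because support is anti-monotone, every subset of a verified frequent itemset in MFI is frequent and every superset of a verified infrequent itemset in MII is infrequent, so checking MFI and MII covers all itemsets.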


Complexity

Proof construction at server side: $O(M \log^3 M + n^{\epsilon} \log n)$

• $M = \sum_{I \in MFI \cup MII} \sum_{i \in I} |L_i|$
• $n$ is the number of unique items of $D$.
• $\epsilon \in (0, 1)$

Verification at client side: $O(N + F)$

• $N = \sum_{I \in MFI \cup MII} |I|$
• $F = \sum_{I \in MFI \cup MII} sup(I)$
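To make the three quantities concrete, the Python sketch below evaluates $M$, $N$, and $F$ for a hypothetical inverted index and a hypothetical collection of itemsets playing the role of $MFI \cup MII$. The data and the function name `cost_parameters` are assumptions for illustration only.

```python
def cost_parameters(itemsets, inverted_index):
    """Return (M, N, F) for a collection of itemsets, e.g. MFI ∪ MII."""
    # M: total length of the inverted lists touched by all itemsets.
    M = sum(len(inverted_index[i]) for I in itemsets for i in I)
    # N: total number of items across all itemsets.
    N = sum(len(I) for I in itemsets)
    # F: total support, where sup(I) is the size of the intersection of I's lists.
    F = sum(len(set.intersection(*(inverted_index[i] for i in I))) for I in itemsets)
    return M, N, F

# Hypothetical inverted index: item -> ids of transactions containing it.
inverted_index = {"a": {0, 1, 2}, "b": {0, 1, 3}, "c": {0, 2, 3}, "d": {4}}
to_verify = [frozenset("ab"), frozenset("ac"), frozenset("bc"),  # MFI
             frozenset("d"), frozenset("abc")]                   # MII
M, N, F = cost_parameters(to_verify, inverted_index)
```

For this data, $M = 6 + 6 + 6 + 1 + 9 = 28$, $N = 2 + 2 + 2 + 1 + 3 = 10$, and $F = 2 + 2 + 2 + 1 + 1 = 8$.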


Experiments

• Environment
  Language: C++
  Testbed: MacBook Pro, 2.4 GHz CPU, 4 GB memory

• Datasets

  Dataset   # of trans.   # of items   Avg. trans. length   minsup   # of freq. itemsets
  S1        10^3          49           10                   250      36
  S2        10^4          49           10                   250      3854
  S3        10^5          49           10                   250      149744
  S4        10^6          49           10                   250      3074610
  R         500           100          2.4                  5        97

• Simulation of malicious actions
  Error ratio r = 1%, 2%, 5%, 10%, 20%
  Incomplete: randomly delete r percent of the mining result.
  Incorrect: randomly insert r percent infrequent itemsets.
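The two malicious actions could be simulated along the following lines. This Python sketch is an assumption about how such a simulation might look, not the authors' C++ code: in particular, it interprets "r percent" for insertions as r percent of the size of the correct result, and the itemsets and function names are hypothetical.

```python
import random

def make_incomplete(frequent, r, rng):
    """Randomly delete r percent of the mining result."""
    k = round(len(frequent) * r / 100)
    removed = set(rng.sample(sorted(frequent, key=sorted), k))
    return frequent - removed

def make_incorrect(frequent, infrequent, r, rng):
    """Randomly insert infrequent itemsets, r percent of the result size (assumption)."""
    k = min(round(len(frequent) * r / 100), len(infrequent))
    added = set(rng.sample(sorted(infrequent, key=sorted), k))
    return frequent | added

# Hypothetical itemsets over integer item ids.
frequent = {frozenset({i}) for i in range(100)}
infrequent = {frozenset({i, i + 1}) for i in range(100, 150)}

rng = random.Random(0)  # fixed seed so the simulation is repeatable
F_incomplete = make_incomplete(frequent, 10, rng)
F_incorrect = make_incorrect(frequent, infrequent, 10, rng)
```

The candidates are sorted before sampling so that a fixed seed gives the same tampered result on every run.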


Proof Optimization Ratio & Verification Time

Optimization Ratio & Verification Time (R dataset)

[Figure: (a) Proof optimization ratio and (b) client verification time in seconds, each plotted against error ratio (1%, 2%, 5%, 10%, 20%) for completeness verification and correctness verification.]


Scalability

Scalability (error ratio = 1%)

[Figure: (a) Construction time of one proof (itemset length = 3), in seconds, and (b) client verification time, in seconds, each plotted against dataset size (10^3, 10^4, 10^5, 10^6).]


References I

[Bab85] László Babai. Trading group theory for randomness. In Proceedings of the seventeenth annual ACM symposium on Theory of computing, pages 421–429. ACM, 1985.

[DLW13] Boxiang Dong, Ruilin Liu, and Hui Wendy Wang. Result integrity verification of outsourced frequent itemset mining. In Data and Applications Security and Privacy XXVII, pages 258–265. Springer, 2013.

[GGP10] Rosario Gennaro, Craig Gentry, and Bryan Parno. Non-interactive verifiable computing: Outsourcing computation to untrusted workers. In Advances in Cryptology–CRYPTO 2010, pages 465–482. Springer, 2010.

[GMR89] Shafi Goldwasser, Silvio Micali, and Charles Rackoff. The knowledge complexity of interactive proof systems. SIAM Journal on Computing, 18(1):186–208, 1989.

[LWM+12] Ruilin Liu, Hui Wendy Wang, Anna Monreale, Dino Pedreschi, Fosca Giannotti, and Wenge Guo. Audio: an integrity auditing framework of outlier-mining-as-a-service systems. In Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases-Volume Part II, pages 1–18. Springer-Verlag, 2012.

[PJRT05] HweeHwa Pang, Arpit Jain, Krithi Ramamritham, and Kian-Lee Tan. Verifying completeness of relational query results in data publishing. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 407–418. ACM, 2005.


References II

[PRV12] Bryan Parno, Mariana Raykova, and Vinod Vaikuntanathan. How to delegate and verify in public: Verifiable computation from attribute-based encryption. In Theory of Cryptography, pages 422–439. Springer, 2012.

[PTT11] Charalampos Papamanthou, Roberto Tamassia, and Nikos Triandopoulos. Optimal verification of operations on dynamic sets. In Advances in Cryptology–CRYPTO 2011, pages 91–110. Springer, 2011.

[RHPH13] Ruilin Liu, Hui (Wendy) Wang, Philippos Mordohai, and Hui Xiong. Integrity verification of k-means clustering outsourced to infrastructure as a service (IaaS) providers. In Proceedings of 2013 SIAM International Conference on Data Mining (SDM), pages 632–640. SIAM, 2013.

[Sio05] Radu Sion. Query execution assurance for outsourced databases. In Proceedings of the 31st international conference on Very large data bases, pages 601–612. VLDB Endowment, 2005.

[WCH+09] Wai Kit Wong, David W Cheung, Edward Hung, Ben Kao, and Nikos Mamoulis. An audit environment for outsourcing of frequent itemset mining. Proceedings of the VLDB Endowment, 2(1):1162–1173, 2009.

[XWYM07] Min Xie, Haixun Wang, Jian Yin, and Xiaofeng Meng. Integrity auditing of outsourced data. In Proceedings of the 33rd international conference on Very large data bases, pages 782–793. VLDB Endowment, 2007.


Q & A

Thank you!

Questions?


Related Work

Verifiable Computation
• [Bab85, GMR89, PRV12, GGP10]: the expensive pre-processing phase is amortized over the future executions.

Integrity Verification of Database-as-a-Service (DaS)
• [PJRT05, Sio05, XWYM07] provide assurance for SQL query results.

Integrity Verification of DMaS
• [WCH+09, DLW13] only provide probabilistic result integrity guarantee.
• [LWM+12, RHPH13] focus on other mining tasks (outlier detection, clustering).


Client versus Server

Comparison on S1 dataset (time measured in seconds)

  minsup   # of freq. itemsets   Client side: verify   Server side: proof prep.   Server side: mining
  402      10                    0.000164              24.72                      0.03707
  203      50                    0.001358              266.985                    0.08984
  157      99                    0.00332               572.591                    0.1355
