
Information Security in Big Data : Privacy and Data Mining

Date post: 15-Apr-2017
Upload: wanani181
INFORMATION SECURITY IN BIG DATA: PRIVACY AND DATA MINING Wafaa Anani (MCDBA, MCSD) Electrical & Computer Engineering – Software Engineering, UWO [email protected]
Page 1: Information Security in Big Data : Privacy and Data Mining


Page 2: Information Security in Big Data : Privacy and Data Mining

INDEX

Introduction

Data Mining Roles: Data Provider, Data Collector, Data Miner, Decision Maker

Game Theory

Non-Technical Solutions

Future Research Areas

Conclusion

References

Page 3: Information Security in Big Data : Privacy and Data Mining

INTRODUCTION

Big Data

A term that describes large volumes of data, both structured and unstructured.

A term used for data sets so large or complex that they are difficult to process using traditional database and software techniques.

Data Mining

Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.

Data mining has been successfully applied to many domains, such as business intelligence, web search, scientific discovery, and digital libraries.

Page 4: Information Security in Big Data : Privacy and Data Mining

MINING PROCESS

Page 5: Information Security in Big Data : Privacy and Data Mining

THE PROCESS OF DATA MINING (KDD)

Data mining is also referred to as “Knowledge Discovery from Data” (KDD).

Useful knowledge is obtained from data in the following steps:

Step 1: Data Preprocessing (data selection, cleaning, and integration)

Step 2: Data Transformation (transform the data into a form appropriate for the mining task)

Step 3: Data Mining (extract data patterns)

Step 4: Pattern Evaluation and Presentation (present the knowledge in an easy-to-understand form)
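The four steps above can be sketched as a small pipeline. This is a minimal illustration under our own assumptions: the records, the field layout, and the choice of frequent-pair counting as the "mining" step are hypothetical and not taken from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: (customer_id, purchased items); None marks a corrupt record.
RAW = [
    ("c1", ["bread", "milk"]),
    ("c2", ["bread", "butter", "milk"]),
    ("c2", ["bread", "butter", "milk"]),  # duplicate record
    ("c3", None),                          # missing data
    ("c4", ["milk", "butter"]),
]


def preprocess(records):
    """Step 1: select, clean, and integrate (here: drop missing values and duplicates)."""
    seen, cleaned = set(), []
    for cid, items in records:
        if items is None:
            continue
        key = (cid, tuple(sorted(items)))
        if key not in seen:
            seen.add(key)
            cleaned.append((cid, items))
    return cleaned


def transform(records):
    """Step 2: turn each record into an item set, the form the mining step expects."""
    return [frozenset(items) for _, items in records]


def mine(transactions):
    """Step 3: extract patterns (here: count co-occurring item pairs)."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts


def evaluate(pair_counts, n_transactions, min_support=0.5):
    """Step 4: keep pairs whose support meets the threshold, sorted for presentation."""
    return sorted(
        (pair, count / n_transactions)
        for pair, count in pair_counts.items()
        if count / n_transactions >= min_support
    )


transactions = transform(preprocess(RAW))
print(evaluate(mine(transactions), len(transactions)))
```

Each step is a separate function so that, in a privacy-preserving setting, the point where data is modified (for example before `mine`) stays explicit.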

Page 6: Information Security in Big Data : Privacy and Data Mining

PRIVACY & DATA MINING

Data mining technologies bring a serious threat to the security of individuals' sensitive information.

The aim is to reduce the privacy risk brought by data mining operations.

We need to modify the data in such a way that data mining algorithms can still be performed effectively without compromising the security of the sensitive information contained in the data.

Page 7: Information Security in Big Data : Privacy and Data Mining

THE PRIVACY AND PPDM

An individual's privacy may be violated due to unauthorized access to personal data. Thus there is a conflict between data mining and privacy security.

Privacy Preserving Data Mining (PPDM) deals with the privacy issues in data mining. The objective of PPDM is to safeguard sensitive information from unsolicited or unsanctioned disclosure while, at the same time, preserving the utility of the data.

PPDM considers two aspects:

1. Sensitive raw data (IDs, phone numbers, etc.) should not be used in data mining.

2. Sensitive mining results whose disclosure would result in a privacy violation should be excluded.

Page 8: Information Security in Big Data : Privacy and Data Mining

TYPES OF USERS IN A TYPICAL DATA MINING PROCESS

[Figure: Data Provider → Data Collector → Data Miner → Decision Maker; labels: Data, Database, Extracted Info., Information Transmitter]

Data Provider: the user who owns some data that are desired by the data mining task.

Data Collector: the user who collects data from Data Providers and then publishes it to the Data Miner.

Data Miner: the user who performs data mining tasks on the data.

Decision Maker: the user who makes decisions based on the data mining results in order to achieve certain goals.

Page 9: Information Security in Big Data : Privacy and Data Mining

USER ROLES

For each user role (Data Provider, Data Collector, Data Miner, Decision Maker), the following slides describe its privacy concerns and its approaches to privacy protection.

Page 10: Information Security in Big Data : Privacy and Data Mining

DATA PROVIDER

The user who owns some data that are desired by the data mining task

Page 11: Information Security in Big Data : Privacy and Data Mining

DATA PROVIDER – CONCERNS

If the Data Provider reveals his data to the Data Collector, his privacy might be compromised due to an unexpected data breach.

The privacy concern of the Data Provider is whether he can control what kind of information, and how much of it, other people can obtain from his data.

The Data Provider should be able to make his sensitive data inaccessible to the Data Collector. However, the Data Provider has to provide some data, and should get enough compensation for the possible loss of privacy.

Page 12: Information Security in Big Data : Privacy and Data Mining

DATA PROVIDER – APPROACHES TO PRIVACY

Limit the Access

Security tools developed for the Internet environment to protect data: anti-tracking extensions (Do Not Track Me, Ghostery, etc.), advertisement and script blockers (AdBlock Plus, NoScript, FlashBlock, etc.), and encryption tools (MailCloak, TorChat, etc.).

Trade Privacy

The Data Provider needs to make a trade-off between the loss of privacy and the benefit brought by participating in data mining.

The Data Provider needs to know how to negotiate with the Data Collector, so that he will get enough compensation for any possible loss of privacy.

The Data Provider may be willing to provide his sensitive data to a Data Collector who promises that his sensitive information will not be revealed.

Provide False Data

Using “sockpuppets” to hide one's true activities.

Using a fake identity to create phony information.

Using security tools to mask one's identity.

Page 13: Information Security in Big Data : Privacy and Data Mining

DATA COLLECTOR

The user who collects data from Data Providers and then publishes it to the Data Miner


Page 15: Information Security in Big Data : Privacy and Data Mining

DATA COLLECTOR – CONCERNS

The original data collected from Data Providers usually contains sensitive information about individuals. If the Data Collector doesn't take sufficient precautions before releasing the data to the public or to data miners, that sensitive information may be disclosed.

It is necessary for the Data Collector to modify the original data before releasing it to others, so that sensitive information about the Data Providers cannot be found.

The modifications should retain sufficient utility of the data.

Page 16: Information Security in Big Data : Privacy and Data Mining

DATA COLLECTOR – APPROACHES TO PRIVACY

1. Basics of PPDP

2. Privacy-Preserving Publishing of Social Media

3. Attack Model

4. Privacy-Preserving Publishing of Trajectory Data

Page 17: Information Security in Big Data : Privacy and Data Mining

DATA COLLECTOR – APPROACHES TO PRIVACY: BASICS OF PPDP

The data modification process adopted by the Data Collector, with the goal of preserving privacy and utility simultaneously, is usually called Privacy-Preserving Data Publishing (PPDP).

Basics of PPDP: the original data is assumed to be a private table consisting of multiple records. Each record contains: Identifier (ID), Quasi-Identifier (QID), Sensitive Attribute (SA), and Non-sensitive Attribute (NSA).

The table should be anonymized before it is published to others: IDs should be removed and QIDs should be modified.

k-anonymity is the most widely used of the various privacy models: a table satisfies k-anonymity if every record is indistinguishable from at least k - 1 other records with respect to the QID.
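A minimal sketch of the two steps above: removing the ID column and checking k-anonymity over the QID. The column names (name, age, zip, disease) and the rows are hypothetical; the check follows the standard definition that every combination of QID values must occur in at least k records.

```python
from collections import Counter

# Hypothetical private table: "name" is the ID, (age, zip) the QID, "disease" the SA.
TABLE = [
    {"name": "Alice", "age": "20-29", "zip": "476**", "disease": "flu"},
    {"name": "Bob", "age": "20-29", "zip": "476**", "disease": "cancer"},
    {"name": "Carol", "age": "30-39", "zip": "479**", "disease": "flu"},
    {"name": "Dave", "age": "30-39", "zip": "479**", "disease": "gastritis"},
]
ID_COLS = ("name",)
QID_COLS = ("age", "zip")


def remove_identifiers(table, id_cols=ID_COLS):
    """Drop explicit identifier columns before publication."""
    return [{k: v for k, v in row.items() if k not in id_cols} for row in table]


def is_k_anonymous(table, k, qid_cols=QID_COLS):
    """True if every QID combination (equivalence class) occurs in at least k records."""
    classes = Counter(tuple(row[c] for c in qid_cols) for row in table)
    return all(size >= k for size in classes.values())


published = remove_identifiers(TABLE)
print(is_k_anonymous(published, k=2))  # each QID group has 2 rows
print(is_k_anonymous(published, k=3))
```

The QID values here are already generalized; raw values would first need operations such as generalization or suppression to reach the required k.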

Page 18: Information Security in Big Data : Privacy and Data Mining

DATA COLLECTOR – APPROACHES TO PRIVACY: BASICS OF PPDP

Anonymization operations:

Generalization: replace some values with a parent value (e.g. an exact age with an age range).

Suppression: replace some values with a special value, e.g. '*'.

Anatomization: de-associate the relationship between the QID and the sensitive attribute.

Permutation: de-associate the relationship between the QID and a numerical sensitive attribute.

Perturbation: replace the original data values with synthetic data values, so that statistics computed from the perturbed data remain close to those computed from the original data.
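Three of the operations above can be sketched on single attributes. The age-range width, the number of ZIP digits kept, and the Gaussian noise scale are arbitrary assumptions for illustration; anatomization and permutation, which restructure whole tables, are not shown.

```python
import random


def generalize_age(age, width=10):
    """Generalization: replace an exact age by its parent interval, e.g. 23 -> '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress_zip(zip_code, keep=3):
    """Suppression: replace the trailing digits with the special value '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def perturb(values, scale=1.0, seed=0):
    """Perturbation: add zero-mean Gaussian noise so aggregates such as the mean stay close."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]


print(generalize_age(23), suppress_zip("47677"))
salaries = [50.0, 52.0, 48.0, 51.0, 49.0]
noisy = perturb(salaries, scale=0.5)
print(sum(salaries) / len(salaries), round(sum(noisy) / len(noisy), 2))
```

Larger ranges, more suppressed digits, or more noise give stronger protection but lose more information, which is the trade-off described next.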

The anonymization operations reduce the utility of the data; there are various metrics for measuring this information loss.

A fundamental problem of PPDP is how to make a trade-off between privacy and utility.

Page 20: Information Security in Big Data : Privacy and Data Mining

DATA COLLECTOR – APPROACHES TO PRIVACY: PRIVACY-PRESERVING PUBLISHING OF SOCIAL MEDIA

A social network is usually modeled as a graph, where a vertex represents an entity and an edge represents the relationship between two entities.

PPDP in the context of social networks mainly deals with anonymizing graph data.

This is more challenging than anonymizing relational data tables.

There are three challenges with social network data:

Modeling an adversary's background knowledge about the network is much harder.

Measuring the information loss in anonymizing social network data is harder than for relational data.

Devising anonymization methods for social network data is much harder than for relational data.

Page 21: Information Security in Big Data : Privacy and Data Mining

DATA COLLECTOR – APPROACHES TO PRIVACY: ATTACK MODEL

Given the anonymized network data, adversaries usually rely on background knowledge to de-anonymize individuals and learn relationships between the de-anonymized individuals.

The goal of the attack model is to find the social relationships between the de-anonymized individuals.

Types of background knowledge: attributes of vertices, vertex degrees, link relationships, neighborhoods, embedded subgraphs, and graph metrics.

One proposed algorithm, called ‘Seed-and-Grow’, identifies users from an anonymized social graph. The algorithm first identifies a seed subgraph, which is either planted by an attacker or divulged by the collusion of a small group of users, and then grows the seed larger based on existing knowledge of the users' social relations.

Examples of attacks: structural attack, mutual friend attack, friendship attack, degree attack.
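The degree attack listed above is the simplest of these to illustrate. In this hypothetical sketch the graph, the names, and the pseudonymization scheme are made up: the publisher only replaces names with random IDs, and an adversary who knows just a target's number of friends counts the anonymized vertices with that degree. A unique match re-identifies the target.

```python
import random
from collections import defaultdict

# Hypothetical friendship graph as an undirected edge list.
EDGES = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat"), ("dan", "eve")]


def naive_anonymize(edges, seed=0):
    """Replace every name with a random integer ID; the graph structure is left untouched."""
    names = sorted({v for e in edges for v in e})
    ids = list(range(len(names)))
    random.Random(seed).shuffle(ids)
    mapping = dict(zip(names, ids))
    return [(mapping[u], mapping[v]) for u, v in edges], mapping


def degree_attack(anon_edges, known_degree):
    """Return the anonymized vertices whose degree matches the adversary's knowledge."""
    degree = defaultdict(int)
    for u, v in anon_edges:
        degree[u] += 1
        degree[v] += 1
    return [v for v, d in degree.items() if d == known_degree]


anon_edges, secret_mapping = naive_anonymize(EDGES)
candidates = degree_attack(anon_edges, known_degree=3)  # adversary knows ann has 3 friends
print(candidates, candidates == [secret_mapping["ann"]])
```

Replacing names alone is therefore not enough; the structure itself has to be modified so that such background knowledge matches several vertices.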

Page 24: Information Security in Big Data : Privacy and Data Mining

DATA COLLECTOR – APPROACHES TO PRIVACY: ATTACK MODEL

Privacy Model: to protect the privacy of relationships from the mutual friend attack, a variant of k-anonymity called k-NMF anonymity has been introduced.

If the network satisfies k-NMF anonymity, then for each edge e there will be at least k - 1 other edges with the same number of mutual friends as e. This guarantees that the probability of an edge being identified is not greater than 1/k.
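A minimal checker for the k-NMF condition as stated above: for every edge, count the mutual friends of its two endpoints, then require each count value to be shared by at least k edges (the edge itself plus k - 1 others). The graph is hypothetical, and the sketch only verifies the property; it does not anonymize a graph that fails it.

```python
from collections import Counter, defaultdict


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def mutual_friend_counts(edges):
    """Map each edge (u, v) to the number of common neighbours of u and v."""
    adj = adjacency(edges)
    return {(u, v): len(adj[u] & adj[v]) for u, v in edges}


def is_k_nmf_anonymous(edges, k):
    """True if every edge shares its mutual-friend count with at least k - 1 other edges."""
    nmf = mutual_friend_counts(edges)
    groups = Counter(nmf.values())
    return all(groups[count] >= k for count in nmf.values())


# Hypothetical graph: a triangle a-b-c plus a pendant edge c-d.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(mutual_friend_counts(EDGES))
print(is_k_nmf_anonymous(EDGES, k=2))
```

Here the edge c-d is the only edge with zero mutual friends, so an adversary who knows that count identifies it with certainty and the graph is not 2-NMF anonymous.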

Page 25: Information Security in Big Data : Privacy and Data Mining

DATA COLLECTOR – APPROACHES TO PRIVACY: ATTACK MODEL

Data Utility: in the context of network data anonymization, data utility refers to whether, and to what extent, the properties of the graph are preserved.

Most existing k-anonymization algorithms for network data publishing perform edge insertion and/or deletion operations, and they try to keep the resulting utility loss small.

Page 26: Information Security in Big Data : Privacy and Data Mining

DATA COLLECTOR – APPROACHES TO PRIVACY: PRIVACY-PRESERVING PUBLISHING OF TRAJECTORY DATA

Location Based Services (LBS) utilize the location information of individuals, e.g. to locate a restaurant or to monitor traffic congestion levels.

The use of private location information may raise privacy issues in LBS, in particular when trajectory data of individuals are published.

k-anonymity has been redefined for trajectories, and (k, δ)-anonymity has been proposed.
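The trajectory setting above can be sketched with a simplified check. We assume here that two trajectories sampled at the same timestamps are co-localized with respect to δ if their points are never more than δ apart, and that a group of at least k pairwise co-localized trajectories forms an anonymity set; this is our reading of (k, δ)-anonymity, and the coordinates are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical trajectories: one (x, y) point per shared timestamp.
TRAJECTORIES = {
    "t1": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "t2": [(0.2, 0.1), (1.1, 0.9), (2.1, 2.2)],
    "t3": [(0.1, 0.3), (0.9, 1.2), (1.8, 2.1)],
    "t4": [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0)],
}


def co_localized(a, b, delta):
    """Equally sampled trajectories stay within distance delta at every timestamp."""
    return all(math.dist(p, q) <= delta for p, q in zip(a, b))


def is_anonymity_set(ids, trajectories, k, delta):
    """Assumed (k, delta)-anonymity set: at least k trajectories, pairwise co-localized."""
    return len(ids) >= k and all(
        co_localized(trajectories[i], trajectories[j], delta)
        for i, j in combinations(ids, 2)
    )


print(is_anonymity_set(["t1", "t2", "t3"], TRAJECTORIES, k=3, delta=0.5))
print(is_anonymity_set(["t1", "t2", "t4"], TRAJECTORIES, k=3, delta=0.5))
```

Within an anonymity set, a published trajectory cannot be narrowed down to fewer than k moving individuals at the spatial precision δ.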

Page 27: Information Security in Big Data : Privacy and Data Mining

DATA MINER

The user who performs data mining tasks on the data.


Page 29: Information Security in Big Data : Privacy and Data Mining

DATA MINER – CONCERNS

Personal information can be directly observed in the data, in which case a data breach happens.

The Data Miner may also be able to find out information underlying the data (sometimes data mining may reveal sensitive information about the data owners).

The Data Miner also faces the privacy-utility trade-off problem.

The main concern of the Data Miner is how to prevent sensitive information from appearing in the mining results.

To perform privacy-preserving data mining, the Data Miner usually needs to modify the data he got from the Data Collector.

Page 30: Information Security in Big Data : Privacy and Data Mining

DATA MINER – APPROACHES TO PRIVACY

Based on the distribution of the data, PPDM approaches can be classified into:

Approaches for centralized data mining

Approaches for distributed data mining, where the data may be horizontally partitioned or vertically partitioned
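The two distributed settings above differ only in how one logical table is split among parties. In this sketch with a hypothetical table, horizontal partitioning gives each party different rows with the same columns, while vertical partitioning gives each party different columns of the same rows; linking the vertical parts through a shared record key is our assumption.

```python
# Hypothetical logical table: each row is (patient_id, age, diagnosis, bill).
TABLE = [
    (1, 34, "flu", 120),
    (2, 51, "diabetes", 800),
    (3, 29, "flu", 90),
    (4, 62, "asthma", 300),
]
COLUMNS = ("patient_id", "age", "diagnosis", "bill")


def horizontal_partition(rows, n_parties):
    """Each party receives a disjoint subset of rows, all with the full schema."""
    return [rows[i::n_parties] for i in range(n_parties)]


def vertical_partition(rows, column_groups):
    """Each party receives some columns of every row; column 0 (the key) is kept by all."""
    parts = []
    for group in column_groups:
        idx = [0] + [COLUMNS.index(c) for c in group]
        parts.append([tuple(row[i] for i in idx) for row in rows])
    return parts


hospital_a, hospital_b = horizontal_partition(TABLE, 2)
clinic, insurer = vertical_partition(TABLE, [("age", "diagnosis"), ("bill",)])
print(hospital_a)  # rows with patient_id 1 and 3
print(insurer)     # (patient_id, bill) for every row
```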

Page 31: Information Security in Big Data : Privacy and Data Mining

DATA MINER – APPROACHES TO PRIVACY

In distributed data mining, Secure Multi-party Computation (SMC) is widely used.

The goal of SMC is to make sure that each participant can get the correct data mining result without revealing his data to others.

Participants P1, P2, P3, ..., Pm hold data X1, X2, X3, ..., Xm respectively.
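As a concrete instance of the SMC idea above, here is a sketch of a secure-sum protocol based on additive secret sharing: each party Pi splits its private value Xi into random shares modulo a public modulus and sends one share to every party, so each party only sees random-looking numbers, yet the published partial sums add up to X1 + ... + Xm. It assumes semi-honest parties that follow the protocol, simulates all parties in one process, and uses `random` only for reproducibility; a real deployment would need a cryptographic RNG such as `secrets` and secure channels.

```python
import random

MODULUS = 2**61 - 1  # public modulus, assumed larger than any possible sum


def make_shares(value, n_parties, rng):
    """Split value into n random shares that sum to value modulo MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values, seed=None):
    """Simulate m parties jointly computing sum(private_values) without revealing inputs."""
    rng = random.Random(seed)  # not cryptographically secure; demo only
    m = len(private_values)
    # shares[i][j] is the share that party i sends to party j.
    shares = [make_shares(x, m, rng) for x in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [sum(shares[i][j] for i in range(m)) % MODULUS for j in range(m)]
    return sum(partial_sums) % MODULUS


# Hypothetical private inputs X1..X4 held by P1..P4 (e.g. local record counts).
print(secure_sum([120, 45, 300, 7], seed=1))
```

Sums like this are a common building block: counts needed for association rules or classifiers on horizontally partitioned data can be aggregated this way.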

Page 32: Information Security in Big Data : Privacy and Data Mining

DATA MINER APPROACHES TO PRIVACY

Privacy-Preserving Association Rule Mining

Privacy-Preserving Classification

Privacy-Preserving Clustering

Page 33: Information Security in Big Data : Privacy and Data Mining

DATA MINER – APPROACHES TO PRIVACY: PRIVACY-PRESERVING ASSOCIATION RULE MINING

Association rule mining finds interesting associations and correlation relationships among large sets of data items (e.g. market basket analysis).

Some of the rules are considered to be sensitive; to hide them, a sanitized data set is generated (rule hiding).

Rule hiding approaches: heuristic distortion approaches, heuristic blocking approaches, probabilistic distortion approaches, reconstruction-based approaches.

Hybrid partial hiding (HPH), inverse frequent set mining (IFM)
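A heuristic distortion approach can be sketched in a few lines: to hide a sensitive itemset, delete one of its items from supporting transactions until its support falls below the mining threshold. The transactions, the threshold, and the simple victim rule (always remove the alphabetically last item) are our assumptions; published algorithms choose victim items and transactions more carefully to limit side effects on non-sensitive rules.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def hide_itemset(sensitive, transactions, min_support):
    """Heuristic distortion: delete a victim item from supporting transactions
    until the sensitive itemset is no longer frequent. Returns a sanitized copy."""
    sanitized = [set(t) for t in transactions]
    victim = sorted(sensitive)[-1]
    for t in sanitized:
        if support(sensitive, sanitized) < min_support:
            break
        if sensitive <= t:
            t.discard(victim)
    return sanitized


# Hypothetical market-basket data; {"beer", "diapers"} is deemed sensitive.
BASKETS = [
    {"beer", "diapers", "milk"},
    {"beer", "diapers"},
    {"beer", "diapers", "bread"},
    {"milk", "bread"},
]
SENSITIVE = frozenset({"beer", "diapers"})

clean = hide_itemset(SENSITIVE, BASKETS, min_support=0.5)
print(support(SENSITIVE, BASKETS), support(SENSITIVE, clean))
```

Any rule derived from the hidden itemset can no longer be mined at that threshold, while unrelated itemsets such as {"milk"} keep their support.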

Page 36: Information Security in Big Data : Privacy and Data Mining

DATA MINER – APPROACHES TO PRIVACY: PRIVACY-PRESERVING CLASSIFICATION

Classification is a form of data analysis that extracts models describing important data classes.

Data classification can be seen as a two-step process:

Step 1: Learning step, in which a classification algorithm is employed to build a classifier (classification model).

Step 2: Classification step, in which the classifier is used to classify data.

Classification models: decision tree, naïve Bayesian classification, support vector machine.
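The two-step view of classification can be illustrated with a tiny categorical naïve Bayes classifier: `learn()` is the learning step that builds the model from labelled records, and `classify()` is the classification step. The training data and the add-one (Laplace) smoothing are illustrative assumptions. No privacy mechanism is applied here; in a privacy-preserving setting, the counts gathered in `learn()` are what would need protecting, for example through SMC or data randomization.

```python
import math
from collections import Counter, defaultdict


def learn(records, labels):
    """Learning step: collect class counts and per-class feature-value counts."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    domains = defaultdict(set)           # feature index -> observed values
    for features, label in zip(records, labels):
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def classify(model, features):
    """Classification step: return the class with the highest log posterior."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(features):
            # Laplace smoothing so an unseen value does not zero out the product.
            score += math.log((value_counts[(i, label)][value] + 1) / (n + len(domains[i])))
        if score > best_score:
            best, best_score = label, score
    return best


# Hypothetical training data: (outlook, windy) -> play?
X = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"), ("overcast", "no"), ("rainy", "no")]
y = ["yes", "no", "no", "yes", "yes"]
model = learn(X, y)
print(classify(model, ("sunny", "no")), classify(model, ("rainy", "yes")))
```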

Page 37: Information Security in Big Data : Privacy and Data Mining

DATA MINER – APPROACHES TO PRIVACY: PRIVACY-PRESERVING CLUSTERING

Clustering partitions the data into groups (clusters) of similar objects.
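One family of privacy-preserving clustering techniques perturbs the data with a distance-preserving transformation before release. A minimal sketch, assuming 2-D numeric records and a secret random rotation angle: because rotation preserves Euclidean distances, a distance-based clustering groups the rotated points exactly as it would group the originals, while the raw attribute values are hidden. The points and the simple threshold clustering are hypothetical, and rotation alone can be vulnerable if an attacker knows some original records.

```python
import math
import random
from itertools import combinations

# Hypothetical 2-D records (e.g. standardized age and income).
POINTS = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (5.0, 5.1), (5.2, 4.9)]


def rotate(points, angle):
    """Apply the same rotation to every point; angle is the data owner's secret."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def threshold_clusters(points, eps):
    """Link points closer than eps and return a cluster label per point (union-find)."""
    labels = list(range(len(points)))

    def find(i):
        while labels[i] != i:
            i = labels[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < eps:
            labels[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


secret_angle = random.Random(42).uniform(0, 2 * math.pi)
released = rotate(POINTS, secret_angle)
print(threshold_clusters(POINTS, eps=1.0) == threshold_clusters(released, eps=1.0))
```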

Page 38: Information Security in Big Data : Privacy and Data Mining

DATA MINER

The Data Miner can modify the original data via randomization, blocking, or reconstruction. The modification often has a negative effect on the utility of the data.

The Data Miner needs to strike a balance between privacy and utility. The implications of privacy and utility vary with the characteristics of the data and the purpose of the mining task.
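Randomization can be illustrated with Warner-style randomized response for one yes/no sensitive attribute: each respondent reports the true answer with probability p and the opposite answer otherwise, so no single reported value can be trusted, yet the miner can still estimate the true proportion π from the observed proportion λ by inverting λ = pπ + (1 - p)(1 - π). The value p = 0.75, the population size, and the true rate are assumptions for the demo; the gap between the estimate and the true rate is the utility cost mentioned above.

```python
import random


def randomize(answers, p, rng):
    """Each respondent keeps the true answer with probability p, otherwise flips it."""
    return [a if rng.random() < p else not a for a in answers]


def estimate_true_rate(reported, p):
    """Invert lambda = p*pi + (1-p)*(1-pi) to estimate the true 'yes' rate pi."""
    if p == 0.5:
        raise ValueError("p = 0.5 makes the responses carry no information")
    lam = sum(reported) / len(reported)
    return (lam - (1 - p)) / (2 * p - 1)


rng = random.Random(7)
true_answers = [rng.random() < 0.3 for _ in range(20_000)]  # hypothetical population, ~30% 'yes'
reported = randomize(true_answers, p=0.75, rng=rng)
print(round(sum(true_answers) / len(true_answers), 3), round(estimate_true_rate(reported, 0.75), 3))
```

Moving p closer to 0.5 strengthens each respondent's deniability but makes the estimate noisier for the same number of records.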

Page 39: Information Security in Big Data : Privacy and Data Mining

DECISION MAKER

The user who makes decisions based on the data mining results in order to achieve certain goals.


Page 41: Information Security in Big Data : Privacy and Data Mining

DECISION MAKER – CONCERNS The privacy concerns of the Decision Maker are:

How to prevent unwanted disclosure of sensitive mining results.

How to evaluate the credibility of the received mining results.

Page 42: Information Security in Big Data : Privacy and Data Mining

DECISION MAKER – APPROACHES TO PRIVACY

First issue (unwanted disclosure): legal measures, e.g. making a contract with the Data Miner that forbids the miner from disclosing the mining results to a third party.

Second issue (credibility): the Decision Maker can utilize methodologies from data provenance, credibility analysis of web information, or other related research fields.

Page 43: Information Security in Big Data : Privacy and Data Mining

DECISION MAKER – APPROACHES TO PRIVACY: DATA PROVENANCE

Data Provenance: the information that helps determine the derivation history of the data, starting from the original source.

Provenance, which describes where the data came from and how the data evolved over time, can help people evaluate the credibility of the data.

Provenance contains two kinds of information: the ancestral data from which the current data evolved, and the transformations applied to the ancestral data that helped produce the current data.

However, in most cases the provenance of data mining results is not available.

The major approach to representing provenance information is adding annotations to data.
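Provenance by annotation can be sketched as a value that carries its own history. In this hypothetical example each derived value records its ancestral data and the transformation that produced it, the two kinds of provenance information listed above, so a Decision Maker can walk the chain back to the original sources. The class name, its fields, and the source labels are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotated:
    """A value annotated with its provenance."""
    value: object
    source: str = ""                                 # set only for original data
    transformation: str = ""                         # how this value was produced
    ancestors: tuple = field(default_factory=tuple)  # the data it was derived from


def derive(transformation, fn, *inputs):
    """Apply fn to the inputs' values and record the provenance of the result."""
    return Annotated(fn(*(x.value for x in inputs)), transformation=transformation, ancestors=inputs)


def lineage(item, depth=0):
    """List the derivation history, from the result back to the original sources."""
    label = item.transformation or f"source: {item.source}"
    lines = ["  " * depth + f"{item.value!r} <- {label}"]
    for parent in item.ancestors:
        lines.extend(lineage(parent, depth + 1))
    return lines


sales_q1 = Annotated([120, 80], source="store_db/2014-Q1")
sales_q2 = Annotated([100, 95], source="store_db/2014-Q2")
merged = derive("concatenate Q1 and Q2", lambda a, b: a + b, sales_q1, sales_q2)
avg = derive("mean of merged sales", lambda xs: sum(xs) / len(xs), merged)
print("\n".join(lineage(avg)))
```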

Page 44: Information Security in Big Data : Privacy and Data Mining

DECISION MAKER – APPROACHES TO PRIVACY: WEB INFORMATION CREDIBILITY

Users can differentiate false information from the truth based on:

Authority: the real author of false information is usually not clear.

Accuracy: false information does not contain accurate data.

Objectivity: false information is often prejudicial.

Currency: for false information, the data about its source, time, and place of origin are incomplete, out of date, or missing.

Coverage: false information usually contains no effective links to other information online.

Page 45: Information Security in Big Data : Privacy and Data Mining

GAME THEORY IN DATA PRIVACY

Page 46: Information Security in Big Data : Privacy and Data Mining

GAME THEORY PRELIMINARIES

Game theory provides a formal approach to model situations where a group of agents have to choose optimum actions considering the mutual effects of other agents' decisions.

The essential elements of a game are: players, actions, payoffs, and information.

Players have actions that they can perform at designated times in the game. As a result of the performed actions, players receive payoffs.

Page 47: Information Security in Big Data : Privacy and Data Mining

GAME-THEORETICAL APPROACHES

Private Data Collection and Publication: in this data collection game, the level of privacy protection has significant influence on each player's action and payoff.

Privacy Preserving Distributed Data Mining: SMC-based privacy preserving distributed data mining; recommender systems; linear regression as a non-cooperative game.

Data Anonymization

Page 48: Information Security in Big Data : Privacy and Data Mining

ASSUMPTIONS OF THE GAME MODEL

Game model:

Define the elements of the game, namely the players, the actions, and the payoffs.

Determine the type of the game: static or dynamic, complete information or incomplete information.

Solve the game to find equilibria.

Analyze the equilibria to obtain some implications for practice.
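The solving step can be sketched for the smallest case: a static game with complete information between two players with two actions each. The scenario (a Data Provider choosing to share or withhold data, a Data Collector choosing to protect or exploit it) and all payoff numbers are hypothetical; the code checks every action profile for a profitable unilateral deviation, which is the definition of a pure-strategy Nash equilibrium.

```python
from itertools import product

# payoffs[(provider_action, collector_action)] = (provider payoff, collector payoff)
# Hypothetical numbers chosen only to illustrate the method.
PROVIDER_ACTIONS = ("share", "withhold")
COLLECTOR_ACTIONS = ("protect", "exploit")
PAYOFFS = {
    ("share", "protect"): (3, 3),
    ("share", "exploit"): (-2, 4),
    ("withhold", "protect"): (0, -1),
    ("withhold", "exploit"): (0, 0),
}


def pure_nash_equilibria(payoffs, row_actions, col_actions):
    """Return the action profiles where neither player gains by deviating alone."""
    equilibria = []
    for r, c in product(row_actions, col_actions):
        row_pay, col_pay = payoffs[(r, c)]
        row_ok = all(payoffs[(r2, c)][0] <= row_pay for r2 in row_actions)
        col_ok = all(payoffs[(r, c2)][1] <= col_pay for c2 in col_actions)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, PROVIDER_ACTIONS, COLLECTOR_ACTIONS))
```

With these numbers the only equilibrium is (withhold, exploit), even though (share, protect) would leave both players better off; analyzing the equilibrium in this way is what motivates designing mechanisms that make sharing attractive.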

Page 50: Information Security in Big Data : Privacy and Data Mining

MECHANISM DESIGN AND PRIVACY PROTECTION

The Data Collector wants Data Providers to participate in the data mining activity, i.e. hand over their private data, but the Data Providers may choose to opt out because of privacy concerns. In order to get useful data mining results, the Data Collector needs to design mechanisms that encourage Data Providers to opt in.

Mechanisms for Truthful Data Sharing: a mechanism requires agents to report their preferences over the outcomes.

Privacy Auctions
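As one simple example of a privacy auction, consider a reverse auction in which each Data Provider bids the compensation it requires for its data, and the Data Collector buys data from the k cheapest bidders, paying each of them the (k+1)-th lowest bid. Because a winner's payment does not depend on its own bid, reporting one's true privacy cost is a dominant strategy, which is the truthfulness property mentioned above. The bids and k are hypothetical, and privacy auctions in the literature are considerably more involved.

```python
def k_plus_one_price_auction(bids, k):
    """Reverse auction: buy from the k lowest bidders, pay each the (k+1)-th lowest bid.

    bids maps provider name -> claimed privacy cost. Requires more than k bidders.
    """
    if len(bids) <= k:
        raise ValueError("need more than k bidders to set the clearing price")
    ranked = sorted(bids.items(), key=lambda item: (item[1], item[0]))
    winners = [name for name, _ in ranked[:k]]
    price = ranked[k][1]
    return winners, price


# Hypothetical privacy costs reported by five Data Providers.
BIDS = {"p1": 4.0, "p2": 1.5, "p3": 3.0, "p4": 6.0, "p5": 2.0}
winners, price = k_plus_one_price_auction(BIDS, k=3)
print(winners, price)
print("utilities:", {w: price - BIDS[w] for w in winners})
```

A provider that inflates its bid either keeps the same payment or loses the sale, and one that understates it risks being paid less than its true cost, so truthful bidding is the safe choice.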

Page 51: Information Security in Big Data : Privacy and Data Mining

NON-TECHNICAL SOLUTIONS

Page 52: Information Security in Big Data : Privacy and Data Mining

NON-TECHNICAL SOLUTIONS TO PRIVACY PROTECTION

Laws and regulations:

USA: Privacy Act of 1974

European Commission: General Data Protection Regulation (proposed in 2012)

Industry conventions: agreements between organizations on how to collect, analyze, and store personal data should help to create a privacy-safe environment.

Enhance education to increase awareness of information security.

Page 53: Information Security in Big Data : Privacy and Data Mining

FUTURE RESEARCH

Page 54: Information Security in Big Data : Privacy and Data Mining

FUTURE RESEARCH DIRECTIONS

Personalized Privacy Preserving: developing practical personalized anonymization methods, and introducing personalized privacy into other types of PPDP/PPDM.

Data Customization: a concept called “Reverse Data Management” (RDM), which is similar to inverse data mining, was introduced for data mining. RDM covers many data problems: inversion mapping, provenance, data generation, view update, constraint-based repair, etc. (We may consider RDM to be a family of data customization methods.)

Provenance for Data Mining: new techniques and mechanisms that can support provenance in the data mining context should receive more attention.

Page 55: Information Security in Big Data : Privacy and Data Mining

CONCLUSION

Page 56: Information Security in Big Data : Privacy and Data Mining

CONCLUSION

Each user role has its own privacy concerns and its own approaches to preserving privacy while maintaining data utility.

Page 57: Information Security in Big Data : Privacy and Data Mining

REFERENCES …

Page 58: Information Security in Big Data : Privacy and Data Mining

REFERENCES

Lei Xu, Chunxiao Jiang, Jian Wang, Jian Yuan, and Yong Ren, “Information Security in Big Data: Privacy and Data Mining,” IEEE Access, 2014.

Page 59: Information Security in Big Data : Privacy and Data Mining

QUESTIONS…

Thank you

