Post on 14-Jan-2017
FAST DETECTION OF TRANSFORMED DATA LEAKS
Submitted by: JOSNA KRISHNA
S7 CSE, Roll No.: 35
CONTENTS:
• INTRODUCTION
• SENSITIVE DATA IN COMPANIES
• DATA LEAKAGE: HOW?
• DANGER
• TOWARDS SECURITY
• EXISTING SYSTEM
• PROPOSED SYSTEM
• INTO THE ALGORITHMS
• CONCLUSION
INTRODUCTION
DATA LEAKAGE: Data leakage is the unauthorized transmission of sensitive data or information from within an organization to an external destination.
SENSITIVE DATA IN COMPANIES INCLUDES:
• Intellectual property
• Financial information
• Patient information
• Personal credit card data
• Other information, depending on the business and the industry
DATA LEAKAGE: HOW?
• In the course of business, data must be handed over to trusted third parties for some operations.
• Sometimes these trusted third parties act as points of data leakage.
• Data leakage mainly happens due to human error.
EXAMPLES:
• A hospital may give patient records to researchers who will devise new treatments.
• A company may have partnerships with other companies that require sharing customer data.
• An enterprise may outsource its data processing, so data must be given to various other companies.
DANGER
• The number of leaked sensitive data records has grown 10 times in recent years.
• Data leakage by accident exceeds the risk posed by vulnerable software.
• Sensitive data leakage is more common where there is no end-to-end encryption (example: PGP, Pretty Good Privacy).
TOWARDS SECURITY
• Prevent clear-text sensitive data from direct access.
• Deploy a screening tool:
  - To scan computer file systems.
  - To scan server storage.
  - To inspect outbound network traffic.
• Data leak detection differs from antivirus (AV) and network intrusion detection systems (NIDS).
DATA LEAK DETECTION HAS:
-> New security requirements
-> Algorithmic challenges
ALGORITHMIC CHALLENGES:
- Data transformation
- Scalability
• Direct usage of automata-based string matching is not possible.
EXISTING SYSTEM:
It is based on set intersection. The operation is performed on two sets of n-grams: one from the content and one from the sensitive data. This method is also used to detect similar documents in:
• The web
• Shared malicious traffic patterns
• Malware
• E-mail spam
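The set-intersection idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: character-level n-grams, and a score defined as the fraction of sensitive n-grams found in the content. The names `ngrams` and `intersection_rate` are hypothetical and are not taken from any of the products mentioned in this presentation.

```python
def ngrams(text: str, n: int) -> set[str]:
    """Return the set of all character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def intersection_rate(sensitive: str, content: str, n: int = 3) -> float:
    """Fraction of the sensitive n-grams that also occur in the content.

    Assumption: this ratio is one simple way to turn the set
    intersection into a score; real tools may normalise differently.
    """
    s = ngrams(sensitive, n)
    c = ngrams(content, n)
    return len(s & c) / len(s) if s else 0.0


full_leak = intersection_rate("secret-key-1234", "xx secret-key-1234 yy")
no_leak = intersection_rate("abcdef", "uvwxyz")
```

Because only set membership is checked, the positions of the shared n-grams play no role in the score.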
EXISTING SYSTEMS ARE:
• Symantec DLP
• Identity Finder
• Global Velocity
• GoCloud DLP, etc.
DISADVANTAGES OF EXISTING SYSTEM:
• Set intersection is orderless (the ordering of shared n-grams is not analyzed).
• It generates false alerts when n is set to a small value.
• It cannot detect partial data leakage.
• It is not an adequate method.
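A small, hypothetical Python experiment makes the orderless weakness concrete. It assumes word-level n-grams, and the helper names `word_ngrams` and `shared_fraction` are illustrative only. The same words in a different order still produce a perfect score when n is small.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams of text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_fraction(sensitive: str, content: str, n: int) -> float:
    """Fraction of sensitive n-grams that also appear in content."""
    s = word_ngrams(sensitive, n)
    return len(s & word_ngrams(content, n)) / len(s) if s else 0.0


sensitive = "transfer 500 to account 1234"
shuffled = "account 500 transfer 1234 to"   # same words, different order

score_small_n = shared_fraction(sensitive, shuffled, 1)   # order is ignored
score_larger_n = shared_fraction(sensitive, shuffled, 2)  # no bigram survives
```

With n = 1 the shuffled text is reported as a full match, which is the kind of false alert listed above. Raising n removes this particular alert, but the order in which the shared n-grams occur is still never compared.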
PROPOSED SYSTEM:
The proposed system uses a sequence alignment algorithm, executed on:
• The sampled sensitive data sequence
• The sampled content being inspected
The alignment produces the amount of sensitive data contained in the content, so higher accuracy is achieved.
FACING THE CHALLENGES:
• The scalability issue is solved by sampling both the sensitive data and the content sequence before aligning.
• A pair of algorithms is used:
  - Comparable Sampling Algorithm
  - Sampling Oblivious Alignment Algorithm
• High detection specificity.
• Tolerance to pervasive and localized modifications.
ABOUT THE ALGORITHMS:
o The Comparable Sampling Algorithm yields constant samples of a sequence wherever the sampling starts and ends.
o The Sampling Oblivious Alignment Algorithm infers the similarity between the original unsampled sequences with sophisticated techniques through dynamic programming.
CONTINUATION:
• In this method, both the sensitive data and the content sequence are sampled.
• The alignment is performed on the sampled sequences.
• Here, a ‘comparable sampling’ property is used.
• Both algorithms run faster on a GPU than on a CPU.
• This promises high-speed security scanning.
INTO THE ALGORITHMS
COMPARABLE SAMPLING ALGORITHM
Requirements:
Definition 1: A substring is a consecutive segment of the original string.
Definition 2: A subsequence does not require its items to be consecutive in the original string.
Definition 3: Given that string x is a substring of y, comparable sampling on x and y yields x’ and y’ such that x’ is similar to a substring of y’.
Definition 4: Given x as a substring of y, a subsequence-preserving sampling on x and y yields two subsequences x’ and y’ such that x’ is a substring of y’.
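The difference between Definition 1 and Definition 2 can be checked with two short Python helpers. They are an illustrative sketch only; the function names are hypothetical and the sequences are plain Python lists.

```python
from typing import Sequence


def is_substring(x: Sequence, y: Sequence) -> bool:
    """True if x occurs in y as one consecutive segment (Definition 1)."""
    n = len(x)
    return any(list(y[i:i + n]) == list(x) for i in range(len(y) - n + 1))


def is_subsequence(x: Sequence, y: Sequence) -> bool:
    """True if the items of x appear in y in order, gaps allowed (Definition 2)."""
    remaining = iter(y)
    # "item in remaining" advances the iterator, so order is enforced.
    return all(item in remaining for item in x)


y = [1, 5, 1, 9, 8, 5]
consecutive = is_substring([1, 9, 8], y)      # True: a consecutive segment
with_gap = is_substring([1, 8], y)            # False: 9 sits between them
ordered_gap = is_subsequence([1, 8], y)       # True: order kept, gap allowed
wrong_order = is_subsequence([8, 1], y)       # False: no 1 after the 8
```

Every substring is also a subsequence, but not the other way round, which is why Definition 4 is the stronger requirement on a sampling method.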
ADVANTAGES:
• It is deterministic and subsequence-preserving.
• The algorithm is unbiased.
• It yields constant samples of a sequence wherever the sampling starts and ends.
COMPARABLE SAMPLING ALGORITHM:
Input: an array S of items, a size |w| for a sliding window w, a selection function f(w, N) that selects the N smallest items from a window w, i.e., f = min(w, N)
Output: a sampled array T
initialize T as an empty array of size |S|
w ← read(S, |w|)
let w.head and w.tail be indices in S corresponding to the higher-indexed end and lower-indexed end of w, respectively
collection mc ← min(w, N)
while w is within the boundary of S do
    mp ← mc
    move w toward high index by 1
    mc ← min(w, N)
    if mc ≠ mp then
        item en ← collectionDiff(mc, mp)
        item eo ← collectionDiff(mp, mc)
        if en < eo then
            write value en to T at w.head’s position
        else
            write value eo to T at w.tail’s position
        end if
    end if
end while
ALGORITHM ANALYSIS:
We set our sampling procedure with a sliding window of size 6 (i.e., |w| = 6) and N = 3. The input sequence is 1,5,1,9,8,5,3,2,4,8. The initial sliding window is w = [1,5,1,9,8,5] and the collection is mc = {1,1,5}.
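The sliding-window procedure can be traced on this input with a short Python sketch. It is a direct, unoptimised reading of the pseudocode and not the presenter's implementation. Two points are assumptions of this sketch: the collections are treated as multisets (via `collections.Counter`), and "w.tail's position" is interpreted as the index that has just slid out of the window, which is where the value eo sits in S.

```python
from collections import Counter
from typing import List, Optional


def smallest(window: List[int], n: int) -> Counter:
    """Selection function min(w, N): the N smallest items as a multiset."""
    return Counter(sorted(window)[:n])


def comparable_sample(s: List[int], w_size: int, n: int) -> List[Optional[int]]:
    """Sample s with a sliding window; unsampled positions stay None."""
    t: List[Optional[int]] = [None] * len(s)
    tail = 0                              # lower-indexed end of the window
    mc = smallest(s[tail:tail + w_size], n)
    while tail + w_size < len(s):         # the window can still slide by one
        mp = mc
        tail += 1
        head = tail + w_size - 1          # higher-indexed end after the move
        mc = smallest(s[tail:head + 1], n)
        if mc != mp:
            e_new = next(iter(mc - mp))   # item that entered the collection
            e_old = next(iter(mp - mc))   # item that left the collection
            if e_new < e_old:
                t[head] = e_new
            else:
                # Assumption: write at the index that just left the window.
                t[tail - 1] = e_old
    return t


sampled = comparable_sample([1, 5, 1, 9, 8, 5, 3, 2, 4, 8], w_size=6, n=3)
# sampled == [1, None, 1, None, None, None, None, 2, None, None]
```

In this sketch only three of the ten items are kept, and each kept item stays at its original index, so the sampled array remains aligned with the input. Re-sorting every window keeps the code short but is slower than the complexity quoted in the COMPLEXITY section of this presentation.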
COMPLEXITY:
The complexity of sampling with the selection function is O(n log|w|), or O(n) when |w| is treated as a constant, where n is the size of the input and |w| is the size of the window.
The factor O(log|w|) comes from maintaining the smallest N items within the window.
SAMPLING OBLIVIOUS ALIGNMENT ALGORITHM
Requirements:
• The algorithm runs on compact sampled sequences L.
• Extra fields for scoring matrix cells in dynamic programming.
• An extra step in the recurrence relation for updating the null region.
• A complex weight function that computes similarities between two null regions.
SECURITY ADVANTAGES :
• Order-aware comparison
• High tolerance to pattern variation
• Capability of detecting partial leaks
• Consistent
SAMPLING OBLIVIOUS ALIGNMENT ALGORITHM
Input: a weight function fw; the visited cells in the H matrix that are adjacent to H(i, j): H(i−1, j−1), H(i, j−1), and H(i−1, j); and the i-th and j-th items La_i and Lb_j in the two sampled sequences La and Lb, respectively.
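The slide lists only the inputs of the recurrence, so the sketch below is a plain Smith-Waterman style local alignment, not the sampling-oblivious algorithm itself. It fills H(i, j) from the same three neighbouring cells, but it has no null-region fields, and the fixed `match`, `mismatch` and `gap` values are a hypothetical stand-in for the weight function fw.

```python
from typing import List, Sequence


def local_alignment_score(la: Sequence[int], lb: Sequence[int],
                          match: int = 2, mismatch: int = -1,
                          gap: int = -1) -> int:
    """Best local alignment score between two sequences (Smith-Waterman style)."""
    rows, cols = len(la) + 1, len(lb) + 1
    h: List[List[int]] = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            weight = match if la[i - 1] == lb[j - 1] else mismatch
            h[i][j] = max(
                0,                          # start a new local alignment
                h[i - 1][j - 1] + weight,   # align the two items
                h[i][j - 1] + gap,          # skip an item of lb
                h[i - 1][j] + gap,          # skip an item of la
            )
            best = max(best, h[i][j])
    return best


embedded = local_alignment_score([1, 1, 2], [9, 1, 1, 2, 9])   # leak inside content
reversed_order = local_alignment_score([1, 2, 3], [3, 2, 1])   # same items, wrong order
```

Unlike set intersection, the score depends on order: the embedded copy scores 6, while the reversed sequence scores only 2 even though it shares every item.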
CONCLUSION:
• Presented here is a content inspection technique for detecting sensitive data leakage.
• The detection approach is based on aligning two sampled sequences for similarity comparison.
• Our alignment method is useful for common data scenarios.