Date post: | 01-Jul-2015 |
Category: |
Presentations & Public Speaking |
Upload: | jitendra-chauhan |
Big Data Security
Top 5 Security Risks and Best Practices
Jitendra Chauhan
Head R&D, iViZ Security
Agenda
• Key Insights of Big Data Architecture
• Top 5 Big Data Security Risks
• Top 5 Best Practices
Key Insights of Big Data Architecture
Distributed Architecture (Hadoop as example)
• Data Partition, Replication and Distribution
• Auto-tiering
• Move the Code
Real Time, Streaming and Continuous
Computation
No SQL Roadshow| 12
Integration Patterns
• Real time
• Variety of Input Sources
• Ad hoc Queries
Parallel & Powerful Programming
Framework
Example:
• 16TB Data
• 128 MB Chunks
• 82000 Maps
Java vs SQL / PL/SQL
Frameworks:
• MapReduce
• Storm Topology
(Spouts & Bolts)
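To make the chunk arithmetic and the MapReduce model concrete, here is a rough sketch. It assumes binary units (1 TB = 1024^4 bytes) and one map task per chunk; the names `map_task_count`, `map_reduce`, `word_mapper` and `count_reducer` are illustrative only and are not part of Hadoop.

```python
from collections import defaultdict

MB = 1024 ** 2
TB = 1024 ** 4

def map_task_count(input_bytes, chunk_bytes=128 * MB):
    """One map task per chunk (assumption: no split tuning, no small files)."""
    return -(-input_bytes // chunk_bytes)  # integer ceiling division

def map_reduce(records, mapper, reducer):
    """Toy single-process map/reduce: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    """Emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1

def count_reducer(_key, values):
    """Sum the counts collected for one word."""
    return sum(values)

if __name__ == "__main__":
    print(map_task_count(16 * TB))
    print(map_task_count(10 * TB))
    print(map_reduce(["big data", "Big security"], word_mapper, count_reducer))
```

Under these assumptions 16 TB in 128 MB chunks gives 131,072 map tasks, while 10 TB gives 81,920, which is close to the 82,000 figure above. The real count on a cluster depends on file layout and split settings.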
Big Data Architecture: No Single Silver Bullet
• Hadoop is already unsuitable for many Big Data problems
• Real-time analytics
o Cloudscale, Storm
• Graph computation
o Giraph and Pregel (examples of graph computation include Shortest Paths, Degrees of Separation, etc.)
• Low latency queries
o Dremel
Top 5 Security Risks
Insecure Computation
• An untrusted computation program runs against sensitive info (e.g. health data)
• Information Leak
• Data Corruption
• DoS
Input Validation and Filtering
• Input Validation
o What kind of data is untrusted?
o What are the untrusted data sources?
• Data Filtering
o Filter rogue or malicious data
• Challenges
o GBs or TBs of continuous data
o Signature-based data filtering has limitations
o How to filter the behavioral aspect of data?
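A minimal sketch of validation and filtering on a continuous stream, assuming records arrive as dictionaries with `source` and `value` fields. The source names, the value range and the function names are hypothetical. It only performs structural checks, so it illustrates the limitation noted above: rules of this kind say nothing about the behavior of the data over time.

```python
# Hypothetical allowlist of data sources considered trusted.
TRUSTED_SOURCES = {"sensor-gateway", "billing-db"}

def is_valid(record):
    """Structural checks only: known source, numeric value in range."""
    if not isinstance(record, dict):
        return False
    source = record.get("source")
    value = record.get("value")
    return (
        isinstance(source, str)
        and source in TRUSTED_SOURCES
        and isinstance(value, (int, float))
        and not isinstance(value, bool)
        and 0 <= value <= 1000  # assumed plausible range
    )

def filter_stream(records, rejected=None):
    """Lazily yield valid records so the stream never has to fit in memory."""
    for record in records:
        if is_valid(record):
            yield record
        elif rejected is not None:
            rejected.append(record)  # keep rogue records for later review
```

Because `filter_stream` is a generator, it can sit in front of a pipeline that consumes gigabytes of input without buffering it.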
Granular Access Controls
• Designed for performance, with almost no security in mind
• Security in Big Data is still ongoing research
• Table-, row- or cell-level access control is missing
• Ad hoc queries pose additional challenges
• Access control is disabled by default
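As an illustration of what cell-level access control means, here is a small sketch that filters a row by the caller's role. The policy table, column names and roles are invented for the example; it is not how any particular Big Data store implements this.

```python
# Hypothetical policy: column name -> roles allowed to read it.
COLUMN_POLICY = {
    "patient_id": {"doctor", "analyst"},
    "diagnosis": {"doctor"},
    "zip_code": {"doctor", "analyst"},
}

def read_row(row, role, policy=COLUMN_POLICY):
    """Return only the cells this role may see; unknown columns are denied."""
    return {
        column: value
        for column, value in row.items()
        if role in policy.get(column, set())
    }
```

The sketch denies by default: a column with no policy entry is never returned, which is the opposite of the "disabled by default" behavior described above.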
Insecure Data Storage
• Data is spread across various nodes; authentication, authorization and encryption are challenging
• Auto-tiering moves cold data to a less secure medium
o What if cold data is sensitive?
• Encryption of real-time data can have performance impacts
• Secure communication among nodes, middleware and end users is disabled by default
Privacy Concerns in Data Mining
and Analytics
• Monetization of Big Data generally involves Data Mining and Analytics
• Sharing of results involves multiple challenges
o Invasion of Privacy
o Invasive Marketing
o Unintentional Disclosure of Information
• Examples
o AOL's release of anonymized search logs: users could easily be identified
o Netflix faced a similar problem
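The re-identification examples above suggest checking a data set before sharing it. A common check is k-anonymity: every combination of quasi-identifier values must be shared by at least k rows. The sketch below is a minimal version of that check; the field names in the usage note are made up, and k-anonymity is offered as one possible technique, not as the approach the slides prescribe.

```python
from collections import Counter

def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifiers."""
    counts = Counter(
        tuple(row[name] for name in quasi_identifiers) for row in rows
    )
    return min(counts.values()) if counts else 0

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination appears at least k times."""
    return smallest_group(rows, quasi_identifiers) >= k
```

For example, a log with the user name removed but with `zip` and `age` kept fails the check for k = 2 as soon as one (zip, age) pair is unique, which means that row can still point to a single person.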
Top 5 Best Practices
• Secure your Computation Code
o Implement access control, code signing and dynamic analysis of computational code
o Have a strategy to protect data in case of untrusted code
• Implement Comprehensive Input Validation and Filtering
o Implement validation and filtering of input data, from internal or external sources
o Evaluate the input validation and filtering of your Big Data solution
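The code-signing practice above can be sketched as a gate in front of job submission. This example uses an HMAC with a shared key from the Python standard library as a stand-in; a real deployment would more likely use public-key signatures, and the function names here are hypothetical.

```python
import hashlib
import hmac

def sign_job(code: bytes, key: bytes) -> str:
    """Produce a hex signature for the job code (HMAC-SHA256, shared key)."""
    return hmac.new(key, code, hashlib.sha256).hexdigest()

def verify_job(code: bytes, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and supplied signatures."""
    return hmac.compare_digest(sign_job(code, key), signature)

def run_if_signed(code: bytes, signature: str, key: bytes, runner):
    """Refuse to hand unsigned or tampered code to the runner."""
    if not verify_job(code, signature, key):
        raise PermissionError("job code failed signature check")
    return runner(code)
```

Here `runner` is whatever callable actually submits the job; the point of the sketch is only that verification happens before any code reaches the cluster.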
Top 5 Best Practices
• Implement Granular Access Control
o Review Role and Privilege Matrix
o Review permission to execute ad hoc queries
o Enable Access Control
• Secure your Data Storage and Computation
o Sensitive data should be segregated
o Enable data encryption for sensitive data
o Audit administrative access on Data Nodes
o API Security
Top 5 Best Practices
• Review and Implement Privacy Preserving Data Mining and Analytics
o Analytics data should not disclose sensitive information
• Get your Big Data audited
Big Data Architecture: Key Insights
• Distributed Architecture & Auto-tiering
• Real Time, Streaming and Continuous Computation
• Ad hoc Queries
• Parallel and Powerful Computation Language
• Move the Code, Not the Data
• Non-Relational Data
• Variety of Input Sources
Top 5 Security Risks
• Insecure Computation
• End Point Input Validation and
Filtering
• Granular Access Control
• Insecure Data Storage and
Communication
• Privacy Preserving Data Mining and
Analytics