CRISC, CGEIT, CISM, CISA
Auditing Big Data for
Privacy, Security and Compliance
Davi Ottenheimer @daviottenheimer
Senior Director of Trust, EMC
In-Depth Seminars – D21
2013 Fall Conference – “Sail to Success”
September 30 – October 2, 2013
Introduction
• Davi Ottenheimer (@daviottenheimer)
– 19th Year InfoSec
– CISM, Platinum ISACA (1997)
– Ex-Big 5 Auditor, Ex-PCI QSA/PA-QSA
– Co-Author “Securing the Virtual Environment”
• @EMCTrustedIT
[Figure: EMC, VMware, Pivotal, RSA; Cloud, Big Data, Trust]
Agenda
• Big Data
• Operations
• Auditing
BIG DATA
The Big Data Dilemma
“We have massive amounts of data. We know
who you are. We know what your history has
been on the airline. We can customize our
offerings.”
- Delta CEO
http://m.apnews.com/ap/db_289563/contentdetail.htm?contentguid=InpMBxrL
“Airlines have yet to find the right balance
between being helpful and being creepy.”
- Associated Press
Big Data Definitions
• Many Long-Standing Examples
– Astronomy (“Billions and billions”)
– Meteorology (Storms)
– Geology (Quakes)
– Anatomy (Disease)
– Economics (Fraud)
– Espionage (Echelon SIGINT, PRISM…)
• Three V’s (Volume, Variety, Velocity)
http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
Structured + Unstructured Data = Big
• Internet of Things: Telemetry, Location-Based, etc.
• Non-Enterprise
• Structured in Relational Databases
• Managed, Unmanaged & Unstructured
Global Flight Analysis
http://www.spatialanalysis.ca/2011/global-connectivity-mapping-out-flight-routes/
http://www.computerweekly.com/news/2240176248/GE-uses-big-data-to-power-machine-services-business
• 60,000 Total Routes
• 1 Tb/day Data from Each Gas Turbine Engine
• 400K gal/yr Fuel Saved by AA Paperless Pilot (-35lb)
What is So Pinterest-ing?
• Lists
– Users you follow
– Boards (and related users) you follow
– Your followers
– People who follow your boards
– Boards you follow
– Boards you unfollowed after following a user
• Followers and unfollowers of each board
http://blog.gopivotal.com/case-studies-2/using-redis-at-pinterest-for-billions-of-relationships, http://engineering.pinterest.com/post/55272557617/building-a-follower-model-from-scratch
Stored and Ready on Login for 70m Users
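The lists above map naturally onto sets keyed by user or board. A minimal in-memory sketch follows, with Python sets standing in for a store such as Redis. The class, method names, and sample IDs are hypothetical; this is not Pinterest's implementation.

```python
from collections import defaultdict


class FollowerGraph:
    """In-memory stand-in for the follower lists on the slide (hypothetical model)."""

    def __init__(self):
        self.user_following = defaultdict(set)     # user -> users they follow
        self.user_followers = defaultdict(set)     # user -> their followers
        self.boards_followed = defaultdict(set)    # user -> boards they follow
        self.board_followers = defaultdict(set)    # board -> users following it
        self.board_unfollowers = defaultdict(set)  # board -> users who unfollowed it

    def follow_user(self, follower, followee):
        self.user_following[follower].add(followee)
        self.user_followers[followee].add(follower)

    def follow_board(self, user, board):
        self.boards_followed[user].add(board)
        self.board_followers[board].add(user)
        self.board_unfollowers[board].discard(user)

    def unfollow_board(self, user, board):
        # The unfollow is itself recorded, not just the absence of a follow.
        self.boards_followed[user].discard(board)
        self.board_followers[board].discard(user)
        self.board_unfollowers[board].add(user)


graph = FollowerGraph()
graph.follow_user("alice", "bob")
graph.follow_board("alice", "bob/recipes")
graph.unfollow_board("alice", "bob/recipes")
```

For an auditor, the point of the sketch is that every relationship change, including an unfollow, leaves a stored record that is ready at login.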
What Makes Data “Ready”?
Where’s the Line for Service and Surveillance?
Should Users Trust?
• Credit Score
• Contacts
• Behavior
• Preferences
• Location / Movement
• Identifier
• Connection, Relationship, Affiliation
• Picture / Video
• Browsing History
• Finances
https://www.unboundid.com/blog/2013/09/05/the-value-of-identity-data-and-identity-etiquette/
Customer Concern
(1) Is it safe?
(2) What will you do with it?
Should Users Trust?
http://money.cnn.com/2013/09/05/pf/acxiom-consumer-data/index.html
(1) Is it safe?
“…a 26-year-old mother of two teenagers…is just
about biologically impossible”
Finding Value in Data
“…we know the estimated numbers of people being
served by each waste water treatment plant, we can
back-calculate daily [drug] loads…” - Dr Kasprzyk-Hordern
Chicago and suburbs = 1.5b gal/day wastewater
• Disease
• Drugs
• Environmental Risk
http://gizmodo.com/5844925/chicagos-stickney-wastewater-treatment-plant-is-the-crappiest-place-on-earth, http://planetearth.nerc.ac.uk/news/story.aspx?id=1185&cookieConsent=A
http://www.treehugger.com/natural-sciences/fish-near-water-treatment-plants-are-harmed-by-human-drugs.html
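The back-calculation in the quote can be sketched as concentration times flow, normalized by the population served. Only the 1.5b gal/day flow comes from the slide. The function names are hypothetical, and any concentration or population passed in is an illustrative input, not a measurement; published studies apply further correction factors that are ignored here.

```python
LITERS_PER_US_GALLON = 3.785411784

# Flow figure from the slide: Chicago and suburbs, 1.5 billion gal/day.
CHICAGO_FLOW_L_PER_DAY = 1.5e9 * LITERS_PER_US_GALLON


def daily_load_grams(concentration_ng_per_l, flow_l_per_day):
    """Mass arriving at the plant per day: ng/L * L/day, converted to grams."""
    return concentration_ng_per_l * flow_l_per_day * 1e-9


def load_per_1000_people_mg(load_g_per_day, population):
    """Normalize a daily load (grams) to mg per day per 1,000 people served."""
    return load_g_per_day * 1000.0 / population * 1000.0
```

The privacy angle for an audit: no individual is sampled, yet a community-level behavioral signal is recovered from aggregate data.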
Finding Errors in Data
http://pewinternet.org/Reports/2013/Anonymity-online.aspx, http://www.connecture.com/the-connecture-difference/
[Percentage shown in figure] of Internet users have taken steps online to remove or mask their digital footprints
OPERATIONS
Transformation of IT
Mobile
Virtual
Cloud
Big Data
New Wave of Big Data Technology
Business Objectives
• Insights: Machine Learning, Behavior Analysis, Sentiment Analysis, Predictive Models, Network Analysis, Visualization, Simulation
• Analytics: SciPy, Mahout, MATLAB, Revolution R, SPSS, AMPL, SAS
• Data: Hadoop, Vertica, MapReduce, Esper, kdb, Pivotal, Netezza, Teradata, ECL, ETL, Hive
(Over?) Emphasis on Performance
• Nodes Distributed
• Data Shared
• Access Controls Open
• Networks Open
• Clients Unauthenticated
• Web Services Open
Hadoop Machine Roles
1. Clients
– Load Data to Cluster
– Submit MapReduce Jobs
– View Results
2. Masters
– Storage Management (Name Node -> HDFS)
– Compute Management (Job Tracker -> MapReduce)
3. Slaves
– Storage (Data Node)
– Compute (Task Tracker)
Hadoop Machine Roles
Clients
Masters
– Job Tracker (MapReduce)
– Name Node, Secondary Name Node (HDFS)
Slaves
– Data Node & Task Tracker (6 shown)
Hadoop Cluster
[Figure: Hadoop cluster layout. Racks 1, 2, 3, 4, and n hold Data Node & Task Tracker machines; the name node, job tracker, secondary name node, and client also appear in the racks; switches connect them.]
Typical Cluster Environment
• Petabytes
• 10,000s Slots Per Cluster
• Shared or Dedicated
• Production and Non-Production
• Mixed Software and Uses
Google MapReduce: 2004
MapReduce Today
[Figure: MapReduce data flow. Input Splits feed Map tasks; Map and Reduce tasks are each paired with data in an HDFS Block; Reduce tasks write Output Files. Other labels: Job Tracker (Task), NameNode (Read Data), JSON RPC.]
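The map, shuffle, and reduce stages in the figure can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not Hadoop code; the word-count job and all function names are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit (key, 1) for every word in one input split."""
    for word in split.split():
        yield word.lower(), 1


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(splits):
    mapped = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


result = run_job(["big data big", "data audit"])
```

In a real cluster each split, map task, and reduce task may run on a different node, which is why the later slides ask where the perimeter sits.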
MapReduce Today
Node0, Node1
HDFS (Name and Data Node): input, tmp, output
MapReduce (Job and Task Tracker): map, reduce
Local File System
dir     owner   perm
name    hdfs    700
data    hdfs    700
dir     owner   perm
HDFS    hdfs    775
MAPRED  mapred  775
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
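To read the octal modes on this slide, the standard-library `stat` module can decode them. The sketch below treats each mode as a directory mode; the helper names are hypothetical.

```python
import stat


def describe(mode):
    """Render an octal directory mode the way `ls -l` would show it."""
    return stat.filemode(stat.S_IFDIR | mode)


def has_sticky_bit(mode):
    """True when only a file's owner may delete or rename it inside the directory."""
    return bool(mode & stat.S_ISVTX)


def world_writable(mode):
    """True when any user on the system may create entries in the directory."""
    return bool(mode & stat.S_IWOTH)


for mode in (0o700, 0o775, 0o1777):
    print(oct(mode), describe(mode), has_sticky_bit(mode), world_writable(mode))
```

Mode 1777 on /tmp is world-writable with the sticky bit set, so any authenticated or unauthenticated cluster user can write there; 700 restricts the name and data directories to the hdfs owner.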
Job Perimeter
Ad-Hoc
• User Accounts
• Ticket at Login
• Tickets Expire
Production
• Batch Accounts
• Tickets from Keytab
• Tickets AutoRenew
Domain Perimeter
[Figure: IT Auth and Hadoop Authentication domains]
• Labels: AD, LDAP, SSH, Name Node, Data Node & Task Tracker (2 shown)
• Principals: hdfs/nn@company, hdfs/nn@hadoop
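The two principals follow the Kerberos `primary/instance@REALM` shape and differ only in realm. A small sketch for comparing them during a review is below; the type and function names are hypothetical, and the slide's lowercase realms are shorthand (real realms are conventionally uppercase).

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # service or user name, e.g. "hdfs"
    instance: Optional[str]  # host or role qualifier, e.g. "nn"
    realm: str               # authentication domain, e.g. "hadoop"


def parse_principal(text):
    """Split 'primary/instance@realm' into its parts."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a principal: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


def same_service_different_realm(a, b):
    """True when two principals name the same service in separate realms."""
    pa, pb = parse_principal(a), parse_principal(b)
    return (pa.primary, pa.instance) == (pb.primary, pb.instance) and pa.realm != pb.realm
```

A match from `same_service_different_realm` is a prompt to ask which realm the cluster actually trusts and whether a cross-realm trust exists.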
Domain Perimeters
[Figure: IT Auth and Hadoop Authentication domains with a Bastion]
• Labels: AD, Bastion, Job Tracker, Name Node, Data Node & Task Tracker (2 shown)
Authentication Perimeters?
[Figure: authentication paths between users and Hadoop services]
• Users (end user): Client, HDFS Client, MR Client
• Service Users (usernames: hdfs, httpfs, mapred): NN, SNN, JT, DN / TT (3 shown), WebHDFS, HTTPFS
• Tasks: Map Task, Reduce Task
Authentication Perimeters?
[Figure: Clients (end user) and Services (service user) across the Hadoop ecosystem]
• Components: Hadoop, Zookeeper, Hue, Oozie, Flume, Impala, Hbase, Hive Metastore, www, WebHdfs, Pig, Crunch, Cascading, Sqoop, Hive, MapRed
• Protocols: RPC, HTTP, Avro, Thrift
Moment of Reflection
AUDITING
Big Data Security Today
Authentication Authorization Encryption
Policy: “Cluster accessible only to trusted personnel”
Caveats
– All or Nothing: Nodes without security cannot communicate with secure nodes
– Rolling upgrades to enable cluster security are impossible
Authentication
• End User to Service
• Service to Service
• Service to Service, for a User
• Job Task to Service, for a User
“Big Data Security Configuration is a PITA. Do only
what you really need.” -- 2013 Hadoop Summit
Authorization
• Data
– HDFS, Hbase, Hive Metastore, Zookeeper
• Jobs
– Hadoop, MR, Pig, Oozie, Hue…
• Queries
– Impala, Drill
https://hadoop.apache.org/docs/stable/Secure_Impersonation.html
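The impersonation page cited above describes per-superuser `hadoop.proxyuser.<name>.hosts` and `hadoop.proxyuser.<name>.groups` settings. The sketch below is a simplified model of that policy for review purposes, not Hadoop's own code; the function name, the sample configuration, and the host and group values are all hypothetical.

```python
def may_impersonate(conf, superuser, end_user_groups, client_host):
    """Simplified check: may `superuser` act for a user in these groups from this host?"""
    hosts = conf.get(f"hadoop.proxyuser.{superuser}.hosts", "")
    groups = conf.get(f"hadoop.proxyuser.{superuser}.groups", "")
    allowed_hosts = {h.strip() for h in hosts.split(",") if h.strip()}
    allowed_groups = {g.strip() for g in groups.split(",") if g.strip()}
    host_ok = "*" in allowed_hosts or client_host in allowed_hosts
    group_ok = "*" in allowed_groups or bool(allowed_groups & set(end_user_groups))
    return host_ok and group_ok


# Hypothetical configuration: the oozie service may act for analysts from two hosts.
example_conf = {
    "hadoop.proxyuser.oozie.hosts": "host1,host2",
    "hadoop.proxyuser.oozie.groups": "analysts",
}
```

Wildcard values in either property are worth flagging, since they let a service user act as anyone or from anywhere.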
Encryption
• Stored Data (Emerging)
– Data Nodes Only
– Keys Stored Separate from Data Node Storage
– Keys for “Secure Process” Not Users
• Transmitted Data
– RPC Supports Only SASL
– HTTPS
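For the transmitted-data bullets on the Encryption slide: Hadoop exposes the RPC protection level through the `hadoop.rpc.protection` property, whose values correspond to SASL quality-of-protection levels. A minimal sketch of checking it follows; the helper names are hypothetical, and the configuration is assumed to be already loaded into a plain dict.

```python
# hadoop.rpc.protection value -> SASL quality of protection (QOP)
SASL_QOP = {
    "authentication": "auth",    # authenticate only, payload in the clear
    "integrity": "auth-int",     # adds integrity checking
    "privacy": "auth-conf",      # adds encryption of the RPC payload
}


def rpc_qop(conf):
    """Return the SASL QOP implied by the configuration (default: authentication)."""
    level = conf.get("hadoop.rpc.protection", "authentication").strip().lower()
    if level not in SASL_QOP:
        raise ValueError(f"unknown hadoop.rpc.protection value: {level!r}")
    return SASL_QOP[level]


def rpc_encrypted(conf):
    """True only when RPC traffic is confidentiality-protected."""
    return rpc_qop(conf) == "auth-conf"
```

An unset property falls back to authentication only, so RPC payloads are not encrypted unless someone chose privacy explicitly.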
Typical “Security” Setup
Access Controls
1. Authentication (Kerberos)
2. Role Based Authorization (ACLs)
3. Identity Management (AD or LDAP)
Strong Authentication?
user    process
hdfs    namenode, datanode, secondary namenode
mapred  jobtracker, tasktracker, child tasks
group   users
hadoop  hdfs, mapred
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Older versions ran all daemons as single user (hadoop)
Trying to Get to Compliance
1. Node Access (Authentication)
2. Node API Authentication
3. Store Data (Disk Encryption)
4. Transmit Data (Net Encryption)
5. RBAC (Job ACLs)
6. Logs
[Arrow label: Ease]
MitM? root? NIC control?
$ sudo -u hdfs hadoop fs -rmr /
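Parts of the checklist above can be tested mechanically against a cluster's configuration. The sketch below compares a few Hadoop properties with expected values. Which property covers which checklist item is an assumption made for illustration, `mapred.acls.enabled` is the MRv1 name, and disk encryption and logs are not reducible to a single property, so they are left out.

```python
# Assumed mapping of checklist items to configuration properties.
EXPECTED = {
    "hadoop.security.authentication": "kerberos",  # items 1-2: authentication
    "hadoop.security.authorization": "true",       # service-level authorization
    "mapred.acls.enabled": "true",                 # item 5: job ACLs (MRv1 name)
}


def audit(conf):
    """Return (property, expected, actual) for every setting that falls short."""
    findings = []
    for prop, expected in EXPECTED.items():
        actual = conf.get(prop, "<unset>")
        if actual.lower() != expected:
            findings.append((prop, expected, actual))
    return findings
```

A clean result says nothing about the questions on the slide: a user with root or sudo on a node, as in the `-rmr /` command, sits outside what these settings control.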
Threat Model Thoughts
• Confidentiality
– Regulated Data
– Intellectual Property
• Integrity
– Analysis
– Output
• Availability
– Data Load
– Jobs, Analysis
Future
Business Objectives
Insights
Analytics
Data
Confidentiality
Integrity
Availability
Regulated Data Example
• Confidentiality
– Stored
– Transmitted
• Integrity
– Backup
– Restore
• Availability
Cost Calculations
• Error
• Load Time
• Drop Time
• Re-Load Time
Regulated Data Example: Sqoop2
Bulk Data Transfer Tool – Import/Export
https://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop
“Sqoop2 Does Not
Support Security”
Future Directions
Building Controls for Risks
Data: Acquire, Store, Query
• MapReduce
• Pig (simple query language)
• Hive (SQL queries)
• Cascading (workflow)
• Mahout (machine learning)
• Hama (scientific compute)
• Drill (ad-hoc query)
• Hbase (column-oriented DB)
• Zookeeper (coordination)
• HDFS
Acquire
• Data Validation
• Aggregation
– DOB
– Age Group (18-24, 25-36, etc.)
• Salt (Noise)
• Imitation (Synthetic)
• Replacement (Swapping)
• Wiping (Suppression)
Data Acquire
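The acquisition-time techniques listed above can be sketched on toy records. Only the two age brackets named on the slide are used; the function names, record fields, and noise scale are hypothetical, and this is an illustration of the ideas rather than a vetted anonymization routine.

```python
import random
from datetime import date

AGE_GROUPS = [(18, 24), (25, 36)]  # the two brackets given on the slide


def age_on(dob, today):
    """Whole years between a date of birth and a reference date."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))


def age_group(age):
    """Aggregation: replace an exact age with its bracket."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"


def add_noise(value, scale, rng):
    """Salt: perturb a numeric value by up to +/- scale."""
    return value + rng.uniform(-scale, scale)


def swap_column(records, field, rng):
    """Replacement: shuffle one field across records, keeping its overall distribution."""
    values = [record[field] for record in records]
    rng.shuffle(values)
    return [dict(record, **{field: value}) for record, value in zip(records, values)]


def suppress(record, fields):
    """Wiping: drop the named fields entirely."""
    return {key: value for key, value in record.items() if key not in fields}
```

Applying these before data lands in the cluster reduces what later store and query controls have to protect.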
Store
• Encryption
• Unique Users (RBAC)
• API Authentication
• “Hardened Clusters”
• Logs
• Forensics
Data Store
Query
• Encryption
• Unique Users (RBAC)
• Load Restrictions
• Add, Remove, Change Jobs
• Job Alerts
• Scheduling
• Secure Code Practices / Review
Data Query
CSA “Securing Big Data”
SQL Security – Old          | NoSQL Security – New
Config Management           | Cell-Level Access Labels
Multi-Factor Authentication | Kerberos-Based Authentication
Data Classification         | ACLs
Data Encryption             |
Consolidated Audit/Report   |
Database Firewall           |
Vulnerability-Scanner       |
Example: Accumulo
• NoSQL (HBase) perf…with security
• Timeline
– 2006 – Google BigTable
– 2008 – NSA
– 2011 – accumulo.apache.org
Cell-Level Access – Distributed Key/Value Store
// specify which visibilities we are allowed to see
// (conn is an existing Connector; Text is org.apache.hadoop.io.Text)
Authorizations auths = new Authorizations("public");
Scanner scan = conn.createScanner("table", auths);
scan.setRange(new Range("alexander", "snowden"));
scan.fetchColumnFamily(new Text("attributes"));
for (Entry<Key,Value> entry : scan) {
    Text row = entry.getKey().getRow();
    Value value = entry.getValue();
}
https://accumulo.apache.org/1.5/accumulo_user_manual.html
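To show what cell-level access means in practice, the sketch below evaluates a visibility label against a set of authorizations. It is a simplified stand-in written for this example, not Accumulo's implementation: it handles `&`, `|`, and parentheses, and assumes mixed operators are always parenthesized. The labels used are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr):
    """Split a visibility expression into labels, operators, and parentheses."""
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad visibility expression: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(expr, auths):
    """True if a cell labelled `expr` may be read with the given authorizations."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # an unlabelled cell is readable by any scanner
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"incomplete expression: {expr!r}")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError(f"missing ')' in {expr!r}")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r} in {expr!r}")
        return token in auths

    def expression():
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            operator = tokens[pos]
            pos += 1
            right = term()
            value = (value and right) if operator == "&" else (value or right)
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens in {expr!r}")
    return result
```

With authorizations `{"public"}`, as in the scanner example above, a cell labelled `admin&audit` stays hidden while one labelled `(admin&audit)|public` is returned.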
Conclusions
• Big Data
– Performance Pressure
– Customer Concern / Backlash
• Operations Risks
– Perimeter Holes
– Business Logic Error
– Weak Policies
• Auditing
– Trusted Personnel
– Perimeters
THANK YOU!