Protegrity Presentation for University of Advanced Technology
Hadoop Meetup
Les McMonagle (CISSP, CISA, ITIL)
Chief Security Strategist
February 3, 2016
Agenda
Introductions
Today’s Threat Landscape
Data Protection Methods
Hadoop Security
Protegrity Hadoop Data Security Platform
Q & A
Today’s Threat Landscape
Anatomy of a Breach
http://eval.symantec.com/mktginfo/enterprise/white_papers/b-anatomy_of_a_data_breach_WP_20049424-1.en-us.pdf
Data types that are potential candidates for protection
SSN
Credit Card PAN
Best Practices
Bank Account Numbers
Employee Personnel Records
R&D
Pending Patents
Health Records
Accounts Payable
Production Planning
Sales Forecasts
Order History
Trade Secrets
Payroll Data
Prescriptions
Accounts Receivable
Customer Lists
Customer Contact Information
Home Addresses
DOB
Income Data
Health History
Passwords
PIN
Salary Data
Location Data
Project Plans
Protect sensitive data while maintaining the freedom to use it
New Focus on Data-Centric Protection
Our Philosophy
The only way to secure sensitive data is to protect the data itself
4472-8302-9115-3562
381 - 58 - 6294
Multiple ways to selectively protect data at rest
• Encryption
Data is converted to binary ciphertext using a mathematical algorithm. Can be one-way
(hash) or reversible (symmetric or asymmetric).
• Tokenization
Real data is replaced with randomly generated characters of the same data type.
• Masking
Stored data is unchanged. It is masked on presentation only (in views or web pages).
• Obfuscation
Data is converted to irreversible values stored as the same data type (for example, when copying Prod to Dev).
• De-Identification or “Anonymization”
Enough data fields are “protected” to sufficiently de-identify or anonymize records.
• Format Preserving Encryption (FPE)
Offers some benefits of both encryption and tokenization, at a significant performance cost.
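The differences between masking, tokenization and one-way hashing can be sketched in a few lines of Python. This is an illustrative sketch only: the function and class names are hypothetical, the token table is a simple in-memory lookup (not the vaultless approach mentioned later in the deck), and nothing here represents Protegrity's implementation.

```python
import hashlib
import secrets


def mask_ssn(ssn: str) -> str:
    """Masking: the stored value is unchanged; only the displayed form hides digits."""
    digits = [c for c in ssn if c.isdigit()]
    return "XXX-XX-" + "".join(digits[-4:])


class TokenTable:
    """Tokenization sketch: swap each digit for a random digit, keep the layout.

    Assumption: a plain in-memory lookup table stands in for a real token store.
    """

    def __init__(self) -> None:
        self._to_token = {}
        self._to_clear = {}

    def tokenize(self, value: str) -> str:
        if not any(c.isdigit() for c in value):
            raise ValueError("this sketch only tokenizes values that contain digits")
        if value in self._to_token:
            return self._to_token[value]  # same input always maps to the same token
        while True:
            token = "".join(
                secrets.choice("0123456789") if c.isdigit() else c for c in value
            )
            # Retry in the unlikely case of a collision or an unchanged value.
            if token != value and token not in self._to_clear:
                break
        self._to_token[value] = token
        self._to_clear[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._to_clear[token]


def one_way_hash(value: str, salt: bytes) -> str:
    """One-way protection: the original value cannot be recovered from the digest."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    ssn = "381-58-6294"
    table = TokenTable()
    print(mask_ssn(ssn))
    print(table.tokenize(ssn))
    print(one_way_hash(ssn, b"example-salt"))
```

Note that the token keeps the length and punctuation of the original value, so downstream systems that expect an SSN-shaped field continue to work, while the hash changes both the length and the character set.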
Data-Centric Protection
!@#$%a^///&*B()..,,,gft_+!@4#$2%p^&*
XXX - XX - 6294
John W. Smith
Mark S. Wilson
John W Smith Owner Detroit MI 248-632-1292
Xxx X Xxxxxx Owner Detroit MI 248-999-9999
Which is most appropriate depends on multiple factors
To Encrypt or Tokenize . . . This is the Question
[Chart: data elements placed on a spectrum between Tokenization and Encryption, with increasing data sensitivity]
Axis labels:
Large - Field Size relative to width of lookup table - Small
More - Structured - Less
Less - Percent of Access Requiring Clear Text - More
More - Logic in portions of the data element - Less
Data elements shown: SSN, CC-PAN, Bank Acct No., PIN, CID, CV2, Password, DOB, Customer ID #, Patient ID #, Healthcare Records, Diagnosis report, HIV-Pos*, X-Ray, Cat Scan
Other label: Very Large
* With Initialization Vector (IV)
Adding an Additional Layer of Security
Encrypt
Tokenize
Comprehensive Data Protection
At Rest
In Transit
In Use
Ideally with a single, centralized enterprise solution
In-database, column-level encryption or tokenization is typically applied to only 1% to 3% of the data in Hadoop or a data warehouse.
80% to 90% of analytics can be performed on data in its “protected” form.
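One reason analytics can run on protected data is that a deterministic protection method maps equal inputs to equal outputs, so grouping, joining and counting still work. The sketch below illustrates this with a keyed HMAC as a stand-in for tokenization. The key, the `protect` helper and the sample rows are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac

# Assumption: a demo key; a real deployment would manage keys centrally.
SECRET_KEY = b"demo-key-not-for-production"


def protect(value: str) -> str:
    """Deterministic stand-in for tokenization: equal inputs give equal outputs."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def totals_by_key(rows):
    """Sum the amount column per key, as a GROUP BY would."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0.0) + amount
    return totals


# Hypothetical order rows: (customer SSN, order amount).
orders = [
    ("381-58-6294", 120.0),
    ("219-09-9999", 35.5),
    ("381-58-6294", 80.0),
]

# Protect only the sensitive column; the amounts stay in the clear.
protected_orders = [(protect(ssn), amount) for ssn, amount in orders]

clear_totals = totals_by_key(orders)
protected_totals = totals_by_key(protected_orders)

if __name__ == "__main__":
    # The per-customer totals are the same; only the grouping key differs.
    print(sorted(clear_totals.values()))
    print(sorted(protected_totals.values()))
```

The aggregate result is identical in both cases, yet the protected data set never exposes an SSN. Queries that need the actual digits (for example, a range scan on the clear value) are the minority that still require clear text.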
About Protegrity - Data Security for the Enterprise
• Fortune 1000 customers
• Global Presence across every industry
• Trusted Partner for Security Innovation
Scalable, data-centric encryption, tokenization and masking to help businesses secure sensitive information while maintaining data usability
We help companies protect sensitive data and maintain the freedom to use it to transform and innovate as market leaders.
What to look for . . .
A single solution that works across all core platforms
Scalable, centralized enterprise class solution
Segregation of duties between DBA and Security Admin
Data layer / Data-Centric solution
Tamper-proof audit trail
As transparent as possible to authorized end users
High availability (HA)
Optional in-database vs ex-database encryption/tokenization
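As an illustration of the audit-trail item in the list above, one common way to make a log tamper-evident is to chain entries with hashes, so that changing any earlier entry invalidates every later hash. This is a minimal sketch of that general technique, not a description of how any particular product stores its audit logs. All names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with this event."""
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "hash": entry_hash(prev, event)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited, removed or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_event(log, {"user": "dba1", "action": "select", "field": "ssn"})
    append_event(log, {"user": "analyst2", "action": "unprotect", "field": "ssn"})
    print(verify(log))
    log[0]["event"]["user"] = "someone_else"  # simulate tampering
    print(verify(log))
```

A hash chain only detects tampering. Preventing it also requires that the log be held somewhere the audited administrators cannot rewrite, which is where the segregation of duties item comes in.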
Granularity of Protecting Sensitive Data
Coarse Grained Protection (File/Volume)
• At the file level: File or Volume
• Encryption
• “All or nothing” approach
• Does NOT secure file contents in use
• Secures data at rest and in transit
Fine Grained Protection (Data/Field)
• At the individual field level
• Fine Grained Protection Methods:
  • Vaultless Tokenization
  • Masking
  • Encryption (Strong, Format Preserving)
• Data is protected wherever it goes
• Business intelligence can be retained
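Fine-grained protection can be pictured as applying a protection function to selected columns only, leaving the rest of each record usable. The sketch below does this for a small CSV sample. The column names, the key and the helper functions are assumptions made for the example, and the keyed hash is a simple stand-in rather than vaultless tokenization.

```python
import csv
import hashlib
import hmac
import io

KEY = b"demo-key-not-for-production"  # hypothetical key for the example


def pseudonymize(value: str) -> str:
    """Replace a value with a keyed, deterministic surrogate."""
    digest = hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:12]


def mask_last4(value: str) -> str:
    """Replace every digit except those in the last four positions with X."""
    keep_from = len(value) - 4
    return "".join(
        c if (i >= keep_from or not c.isdigit()) else "X"
        for i, c in enumerate(value)
    )


# Which protection applies to which column; unlisted columns stay in the clear.
FIELD_PROTECTORS = {"name": pseudonymize, "ssn": mask_last4}


def protect_csv(text: str) -> list:
    """Return the rows of a CSV document with only the listed columns protected."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                field: FIELD_PROTECTORS.get(field, lambda v: v)(value)
                for field, value in row.items()
            }
        )
    return rows


SAMPLE = "name,ssn,city\nJohn W. Smith,381-58-6294,Detroit\n"

if __name__ == "__main__":
    for protected_row in protect_csv(SAMPLE):
        print(protected_row)
```

Because the `city` column is untouched, a report such as "customers per city" still works on the protected file. File or volume encryption would instead require decrypting the whole file before any field could be read.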
Central Management – Policy Deployment
[Diagram: the Security Office / Security Team and the Enterprise Security Administrator, with a Policy deployed to each of the components below. Labels: Policy, Audit Log]
Components shown: Application Protector, File Protector, Database Protector, Big Data Protector, Cloud Gateway, Inline Gateway, Protection Servers, EDW Protector, IBM Mainframe Protectors, File Protector Gateway
Protegrity Confidential
Central Management – Audit Log Collection
[Diagram: the Security Office / Security Team and the Enterprise Security Administrator, with an Audit Log collected from each of the components below]
Components shown: Application Protector, File Protector, Database Protector, Big Data Protector, Cloud Gateway, Inline Gateway, Protection Servers, EDW Protector, IBM Mainframe Protectors, File Protector Gateway
Complete coverage for the enterprise
Relationships with major providers of data processing solutions
Committed to blanket protection in complex, heterogeneous environments
Deeply invested in required certifications
Complete coverage – applications, databases, file servers, mainframes, EDW, Big Data, Cloud
Big Data Protector
For Hadoop Distributions
Protegrity’s Big Data Protector for Hadoop
[Diagram: a Hadoop Cluster made up of a Name Node, Data Nodes and Edge Nodes, next to a single Hadoop Node. Layers shown on the node: Pig, Hive, Other, MapReduce, YARN, HDFS, OS File System. Labels: Policy, Audit]
Protegrity Big Data Protector for Hadoop delivers protection at every node and is delivered with our own cluster management capability.
All nodes are managed by the Enterprise Security Administrator, which delivers policy and accepts audit logs.
Protegrity Data Security Policy contains information about how data is de-identified and who is authorized to have access to that data.
Policy is enforced at different levels of protection in Hadoop.
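The idea that a policy records both how each field is de-identified and who may see it in the clear can be sketched as a small data structure with an enforcement function. Everything below is hypothetical: the rule layout, role names and `read_field` helper are illustrative assumptions, not Protegrity's policy format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    """How one field is protected and which roles may see it in the clear."""

    method: str  # e.g. "tokenize" or "mask" (labels only in this sketch)
    clear_text_roles: frozenset = frozenset()


# Hypothetical policy: field name -> rule.
POLICY = {
    "ssn": FieldRule("tokenize", frozenset({"fraud_analyst"})),
    "name": FieldRule("mask"),
}


def read_field(field_name, clear_value, protected_value, role, audit_log):
    """Return the clear or protected value according to POLICY, and audit the access."""
    rule = POLICY.get(field_name)
    # Fields without a rule are treated as non-sensitive in this sketch.
    allowed = rule is None or role in rule.clear_text_roles
    audit_log.append((role, field_name, "clear" if allowed else "protected"))
    return clear_value if allowed else protected_value


if __name__ == "__main__":
    audit = []
    print(read_field("ssn", "381-58-6294", "XXX-XX-6294", "fraud_analyst", audit))
    print(read_field("ssn", "381-58-6294", "XXX-XX-6294", "marketing", audit))
    print(audit)
```

Keeping the decision in one place means the same rule can be enforced at each layer that reads the data, and every access attempt, whether allowed or not, leaves an audit record.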
Animals in the Zoo that have been requested
HDFS
YARN
MapReduce 1
Pig
Hive
Beeline
Beeswax
Impala
HCatalog
Hue
Cascading
Ranger
Knox
Ambari
ZooKeeper
Oozie
Cloudera Agent
Manager
HttpFS
Talend
Java
Scala
HBase
Phoenix
Accumulo
Lily HBase Indexer
Mahout
Tez
Slider
Storm
Solr
Spark
HiveServer2
Falcon
WebHDFS
NFS
Flume
Flume NG
Sqoop
Sqoop2
Sentry
Whirr
Kafka
Status
• Many have no relation to data-centric data protection
• For many others, data-centric data protection is completely transparent
• Most relevant “animals” are supported in various ways
The End
Q & A
What Our Customers Say About Us
“Protegrity tokenization has yielded world-class results. Our day-to-day use of
Protegrity solution is seamless, and we appreciate the reliable architecture and the
knowledge that Protegrity software ‘just works’ in the background.”
• Darrell Jones, CISO, Herbalife International
“Since 2006, we have trusted Protegrity’s solution to make sure that private
financial and personal information is secured across our enterprise, internally and
externally, and to keep us in compliance with regulations and policies. Protegrity’s
solution is not only effective but it also helps us meet our business needs quickly.”
• Mark Butler, Principal Information Security Analyst,
Pearson Education
“We had strict requirements for our PCI DSS compliance project, including a
limited timetable and minimal modifications. Protegrity Vaultless Tokenization
provided an elegant solution to easily meet them all.”
• Richard Atkinson, Chief Information Officer, JustGiving
“We are consistently looking to bring the highest level of security to our customers.
Protegrity's tokenization technology is a key component of our end-to-end solution
that helps merchants protect sensitive data throughout the transaction lifecycle —
while it is in use, in transit and at rest.”
• Robert McMillon, Vice President of Global Security
Solutions, Elavon