
Federal Big Data and Cognitive Metadata - Goodier

Date post: 12-Feb-2016
Upload: zeroun
Description:
Federal Big Data and Cognitive Metadata - Goodier. Agenda: Federal big data is enhanced by cognitive metadata. Clearly understanding the paradigm shift. Review of Security and Privacy implications for the federal government. Cyber Threat. Cognitive Metadata solution. - PowerPoint PPT Presentation
Transcript
Page 1: Federal Big Data  and Cognitive  Metadata - Goodier

Federal Big Data and Cognitive Metadata

- Goodier

Page 2: Federal Big Data  and Cognitive  Metadata - Goodier
Page 3: Federal Big Data  and Cognitive  Metadata - Goodier

Agenda

Federal big data is enhanced by cognitive metadata

1. Clearly understanding the paradigm shift

2. Review of Security and Privacy implications for the federal government

3. Cyber Threat

4. Cognitive Metadata solution

Page 4: Federal Big Data  and Cognitive  Metadata - Goodier

The Internet was built without a way to know who or what you were connecting to

– Federal internet service providers work around this with a patchwork of identity security controls and NIAP certifications

– No fair blaming the user – no framework, no cues, no control

1. Balancing the Cyber Big Data equation

Page 5: Federal Big Data  and Cognitive  Metadata - Goodier

2. Safeguarding and Sharing Information


• “One of the biggest questions is how to evolve the risk management model. What is secure enough and agile enough to support the mission? Security, agility, and transparency decisions are driven by mission priorities.”

– Major Linus J. Barloon II, Chief, J3 Cyber Operations Division at White House Communications Agency

http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r4.pdf

Page 6: Federal Big Data  and Cognitive  Metadata - Goodier

2. Safeguarding and Sharing Information


“For example, the United States Government Accountability Office (GAO) aggregates data from many agencies.

Recognizing the inherent risks, GAO sets up discrete network enclaves that are distinct from their agency-wide network, for Big Data. It assigns appropriate levels of security to each enclave driven by the sensitivity of the data therein.

• Other agencies note they ensure Big Data is stripped of personally identifiable information (PII) before it leaves the originating agency’s control.

• Data aggregation needs will expand as more elements of the critical infrastructure adopt increased cyber protection and detection capabilities that will drive enhanced data/information sharing.”

- www.meritalk.com, Beacon Report, Balancing the Cyber Big Data Equation

Page 7: Federal Big Data  and Cognitive  Metadata - Goodier

Federal agencies are required by law (e.g., the Privacy Act of 1974) to give notice to individuals, when collecting information from them, of the authority, purpose, and uses of PII when such data will be maintained as agency records that will be retrieved by individual name or other identifier.1

When agencies use a Web site to collect or share data, agencies must post a privacy policy, as required by Section 208 of the E-Gov Act and OMB guidance.2

In all cases, privacy notices must be prominent, salient, clearly labeled, written in plain language, and available at all locations where notice is needed.

2. Synopsis of Security and Privacy for Federal Big Data

Page 8: Federal Big Data  and Cognitive  Metadata - Goodier

Over time, agencies, digital developers, and data users may also create, discover, or propose new and innovative ways to combine, share, or otherwise leverage the power of the digital data and content collected or disseminated by their digital services or programs. If data will be re-combined, used or shared in ways that individuals did not originally contemplate or expect, agencies must consider the need, under applicable law or policy, to provide such individuals with additional or updated notice of their privacy rights and choices.4

In determining precisely when, where, and how to give such notice, agencies, their digital developers, and partners will need to exercise creativity and ingenuity to ensure that required notices are clearly communicated to individuals at the right time and place, and in the right manner, without unduly interfering with the user experience. The timing and format of such notices may need to vary, depending on the digital or mobile platform involved. 5

2. Review: Federal Big Data is different from industry

Page 9: Federal Big Data  and Cognitive  Metadata - Goodier

https://it.ojp.gov/default.aspx?area=privacy&page=1295

Page 10: Federal Big Data  and Cognitive  Metadata - Goodier


2. Federal Big Data today

http://www.google.com/intx/en/enterprise/apps/government/products.html?section=drive

https://explore.data.gov/

http://catalog.data.gov/harvest

Privacy advocates are concerned about the threat to privacy represented by increasing storage and integration of personally identifiable information; expert panels have released various policy recommendations to conform practice to expectations of privacy.[99][100][101]

Cognitive metadata are sets of innovative privacy-enhancing technologies which enable new techniques for data analytics that minimize costs to privacy.

Page 11: Federal Big Data  and Cognitive  Metadata - Goodier


2. FedRAMP certified commercial clouds

[Diagram of cloud service layers. Recoverable labels:]

Data/Compute, Storage/Metadata, Utility/Networking, Content Delivery

Shared physical resources

Physical infrastructure

Software-platform-as-a-service

App-components-as-a-service

Virtual-Infrastructure-as-a-Service

Data Intensive: Amazon Hadoop, Public Data Sets, SimpleDB

Google App Engine

GCDS

Akamai

GOV CLOUD certified government clouds

Page 12: Federal Big Data  and Cognitive  Metadata - Goodier

2. Before clouds swallowed the enterprise, Gov met requirements with defined EA structures

Pattern - Use Case Focus (EA Notional Pattern Schematic)

Internal
• Fine grained access control to data
• Auditing, etc.

Participant
• User to service interaction
• Service to service interaction

Sub-Enterprise
• Share security & infrastructure
• Operations
• Certification and Accreditation inheritance

Super-Enterprise
• Enterprise alignment of sub-enterprises
• Federation

[Schematic labels: Mission Service, Data, Security, Sub-Enterprise, Dept/Agency, Network, Enterprise alignment (trust, credentials), Federation]

Source: H Reed, DoD Multi Service SOA team

Page 13: Federal Big Data  and Cognitive  Metadata - Goodier

2. EA Privacy & Security focused on message exchange – NIEM 3.0 – and dissemination labels

Super-Enterprise

Sub-Enterprise

Participant

Internal

Operational

Management

Programmatic

Federation

According to the Multi-Service SOA community:
-- Focus of DoD/IC Security is primarily at the “participant” and “operational” level.
-- Implication is that most Service Oriented security discussion will be at this level.

GOVERNANCE

SHARED SECURITY

MESSAGE EXCHANGE

Our typical security focus was here

SERVICE CODE

Unfortunately that leaves lots of gray area for data spills!

Page 14: Federal Big Data  and Cognitive  Metadata - Goodier

https://www.niem.gov/training/Pages/train.aspx

2. Example: NIEM and NISS Message Exchanges

Each encounter describes an interaction with a person-of-interest (POI). A POI is one who possesses an identity that is associated with derogatory information residing in a system-of-record (SOR) containing watchlisted individuals. The Encounter specification is designed to convey encounter activity (e.g., who, what, when, where), any watchlist searches performed, and any encounter analysis results for Suspicious Activity Reports (SARs).

Testing PII incident responses at scale

Page 15: Federal Big Data  and Cognitive  Metadata - Goodier

3. PII Incident Federal Use Case at 4V Message scale – what is the worst that can happen?

Page 16: Federal Big Data  and Cognitive  Metadata - Goodier

3. As Federal Big Data apps expand, our data channels grow and our exposure to risk increases

http://www.verizonenterprise.com/DBIR/2013/

Page 17: Federal Big Data  and Cognitive  Metadata - Goodier

3. Federal PII Protections for April 15


• http://www.cnbc.com/id/101496551

Identity thieves are stealing billions of dollars a year through fraudulent tax refunds—and the IRS isn't the only target. The 43 states that collect an income tax are also being flooded with these bogus returns.

Page 18: Federal Big Data  and Cognitive  Metadata - Goodier

3. Risk Exposure goes across Federal Lines of Business

Page 19: Federal Big Data  and Cognitive  Metadata - Goodier

3. Risk Exposure grows as our use of Federal Shared Services grows

[Timeline]

1996: Clinger-Cohen

2001: Quicksilver

2002: E-Government Act

2003: E-Gov Initiatives, Initial 25

2004: Lines of Business, Initial 5 (HR, GM, FM, FHA, CM)

2006: Lines of Business, Round 2 (Geo, BFE, ITI, ISS)

2008: E-Gov Initiatives, Round 2 (DAIP, ITDS, IAD-Loans/Grants)

2009: Payroll Consolidation Completes

2010: Cloud-First

2011: GAO Report: Opportunities to Reduce Potential Duplication

2011: Shared Services

Page 20: Federal Big Data  and Cognitive  Metadata - Goodier

4. Ensuring adherence to Security and Privacy regulations across identities shared in the federal clouds

• To
– retain MEANING (aka, contextual semantics)
– in loosely coupled, highly flexible
– multi-tenant environments

Page 21: Federal Big Data  and Cognitive  Metadata - Goodier

4. Solutions for the Federal Use Case from Industry


Amazon Fire TV review: the set-top that tries to do everything
ASAP: Advanced Streaming and Prediction
http://www.engadget.com/2014/04/09/amazon-fire-tv-review/

Movies or TV shows are buffered for playback before users hit the play button, the company says; those choices are made by analyzing users’ watch lists and recommendations. As users’ viewing habits change, the caching prediction algorithm will adjust accordingly, and personalization capabilities should get better over time.

http://www.ibmbigdatahub.com/blog/caveat-use-internet-things-behavioral-analytics

Page 22: Federal Big Data  and Cognitive  Metadata - Goodier

4. Solutions for the Federal Use Case from Research


Cognitive metadata: Advanced Streaming and Prediction for improved regulatory and incentive performance

Caching prediction algorithms will adjust according to risk exposure, and personal information protection capabilities should get better over time

Page 23: Federal Big Data  and Cognitive  Metadata - Goodier

4. Metadata solutions shared across government at the new scale of IT

• Federal Risk and Authorization Management Program – FedRAMP

1. Align budget and acquisitions with the technology cycle;

2. improve program management;

3. streamline governance and increase accountability;

4. increase engagement with the IT community; and

5. adopt lighter technologies and shared solutions--including the adoption of a "cloud-first" policy.

– www.cio.gov

Page 24: Federal Big Data  and Cognitive  Metadata - Goodier

4. What is the Cognitive Metadata Solution

…cognitive metadata (i.e. metadata coming from our perception, reasoning, or intuition such as preference for a type of content), which is very useful for personalization purposes and conversely for limiting PII incidents.

Personalities and personas

We protect the personal identifying information of people that link to us, and protect what they’re interested in, so we identify and encrypt the following:

What does this person care about?
What are the types of things they’ll respond to?
What’s the value-add our content offers them?
What are their turn-ons and turn-offs?

Initially this is a mostly qualitative process, since we're manually reviewing the data. It's not a perfect science, but it does benefit from information sharing patterns that build the cognitive metadata repository to ultimately improve automated reasoning.

Page 25: Federal Big Data  and Cognitive  Metadata - Goodier

4. Cognitive Metadata tagging landscape

Page 26: Federal Big Data  and Cognitive  Metadata - Goodier

4. Federal Use Case and Cognitive Metadata


• http://en.wikipedia.org/wiki/Sensitivity_and_specificity

Imagine a study evaluating a new test that screens people for a disease. Each person taking the test either has or does not have the disease. The test outcome can be positive (predicting that the person has the disease) or negative (predicting that the person does not have the disease). The test results for each subject may or may not match the subject's actual status. In that setting:

– True positive: Sick people correctly diagnosed as sick
– False positive: Healthy people incorrectly identified as sick
– True negative: Healthy people correctly identified as healthy
– False negative: Sick people incorrectly identified as healthy

In general, Positive = identified and negative = rejected. Therefore:
– True positive = correctly identified
– False positive = incorrectly identified
– True negative = correctly rejected
– False negative = incorrectly rejected

Cognitive metadata identifies PII in the context of this study so individuals involved can be protected
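
The four outcomes above can be counted and turned into sensitivity and specificity. The following is a minimal sketch, assuming a hypothetical PII detector whose flags are compared against hand-labeled ground truth; the names `ConfusionCounts`, `count_outcomes`, `sensitivity`, and `specificity` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    true_pos: int   # PII present and flagged
    false_pos: int  # no PII, but flagged
    true_neg: int   # no PII and not flagged
    false_neg: int  # PII present but missed


def count_outcomes(actual, predicted):
    """Tally the four outcomes from parallel lists of booleans."""
    tp = fp = tn = fn = 0
    for has_pii, flagged in zip(actual, predicted):
        if flagged and has_pii:
            tp += 1
        elif flagged and not has_pii:
            fp += 1
        elif not flagged and not has_pii:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def sensitivity(c):
    """Share of records that truly contain PII and were flagged."""
    denominator = c.true_pos + c.false_neg
    if denominator == 0:
        raise ValueError("no positive cases in the sample")
    return c.true_pos / denominator


def specificity(c):
    """Share of records without PII that were correctly left unflagged."""
    denominator = c.true_neg + c.false_pos
    if denominator == 0:
        raise ValueError("no negative cases in the sample")
    return c.true_neg / denominator


# Hypothetical labels for eight records (True = contains PII / was flagged).
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

counts = count_outcomes(actual, predicted)
print(counts)
print("sensitivity:", sensitivity(counts))
print("specificity:", specificity(counts))
```

In a PII setting a false negative (missed PII) is usually the costlier error, so sensitivity is the number to watch first.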

Page 27: Federal Big Data  and Cognitive  Metadata - Goodier

4. Federal Use Case and Machine Learning


• http://en.wikipedia.org/wiki/AdaBoost
• Problems in machine learning often suffer from the curse of dimensionality: each sample may consist of a huge number of potential … and evaluating every feature can reduce not only the speed of classifier training and execution, but in fact reduce predictive power....

• Unlike neural networks and SVMs, the AdaBoost training process selects only those features known to improve the predictive power of the model, reducing dimensionality and potentially improving execution time as irrelevant features do not need to be computed.
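
To make the feature-selection point concrete, here is a small pure-Python sketch of AdaBoost over decision stumps (one-feature threshold rules). It is an illustration under stated assumptions, not the presenter's implementation: the toy data set, the function names, and the choice of stumps as weak learners are all hypothetical. Each boosting round picks the single feature whose stump has the lowest weighted error, so features that never win a round are never needed at prediction time.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Weak learner: +polarity if the feature exceeds the threshold, else -polarity."""
    return polarity if row[feature] > threshold else -polarity


def train_adaboost(X, y, rounds):
    """X: list of feature vectors; y: labels in {-1, +1}. Returns a list of weighted stumps."""
    n, d = len(X), len(X[0])
    weights = [1.0 / n] * n
    ensemble = []  # entries are (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Exhaustively search every feature, threshold, and polarity.
        for feature in range(d):
            for threshold in sorted({row[feature] for row in X}):
                for polarity in (1, -1):
                    error = sum(
                        w for w, row, label in zip(weights, X, y)
                        if stump_predict(row, feature, threshold, polarity) != label
                    )
                    if best is None or error < best[0]:
                        best = (error, feature, threshold, polarity)
        error, feature, threshold, polarity = best
        # Clamp so a perfect (or perfectly wrong) stump does not divide by zero.
        error = min(max(error, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, feature, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        weights = [
            w * math.exp(-alpha * label * stump_predict(row, feature, threshold, polarity))
            for w, row, label in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1


def selected_features(ensemble):
    """Indices of features actually used by the trained ensemble."""
    return sorted({feature for _, feature, _, _ in ensemble})


# Toy data: feature 0 separates the classes; features 1 and 2 are noise.
X = [
    [0.1, 5, 3], [0.2, 1, 7], [0.3, 4, 2],
    [0.7, 2, 6], [0.8, 5, 1], [0.9, 3, 4],
]
y = [-1, -1, -1, 1, 1, 1]

model = train_adaboost(X, y, rounds=3)
print("features used:", selected_features(model))
print("predictions:", [predict(model, row) for row in X])
```

On this toy set only feature 0 is ever selected, so the two noise features would not need to be computed when scoring new samples.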

Page 28: Federal Big Data  and Cognitive  Metadata - Goodier

4. Current State of Language Technology

Legend: mostly solved; making good progress; still really hard

Spam detection
Let’s go to Agra! ✓
Buy V1AGRA … ✗

Part-of-speech (POS) tagging
Colorless green ideas sleep furiously.
ADJ ADJ NOUN VERB ADV

Named entity recognition (NER)
Einstein met with UN officials in Princeton
PERSON ORG LOC

Sentiment analysis
Best roast chicken in San Francisco!
The waiter ignored us for 20 minutes.

Coreference resolution
Carter told Mubarak he shouldn’t run again.

Word sense disambiguation (WSD)
I need new batteries for my mouse.

Parsing
I can see Alcatraz from the window!

Machine translation (MT)
第13届上海国际电影节开幕…
The 13th Shanghai International Film Festival…

Information extraction (IE)
You’re invited to our dinner party, Friday May 27 at 8:30
Party, May 27, add

Question answering (QA)
Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?

Paraphrase
XYZ acquired ABC yesterday
ABC has been taken over by XYZ

Summarization
The Dow Jones is up
The S&P500 jumped
Housing prices rose
Economy is good

Dialog
Where is Citizen Kane playing in SF?
Castro Theatre at 7:30. Do you want a ticket?

Big Data works well

Page 29: Federal Big Data  and Cognitive  Metadata - Goodier


4. Cognitive metadata employs predictive algorithms from Big Data Machine Learning combined with Natural Language Processing

Cognitive metadata uses a three-step management process that translates Policy documents into formal policy rule sets that computers can understand and evaluate.

1. Policy documents are translated into digital policies, using Natural Language Processing technologies.

2. Policy deconfliction ensures consistency and operational desirability. Automated deconfliction, using Turing methods and Theorem Proving Techniques that work with the constructs defined in XML, delivers active models of the resulting policy via a Policy Based Tool GUI. DPM delivers this new user interface to data stewards and Foreign Disclosure Officers (FDOs), giving them total control over both the design and the approval of the resulting model. Then the human-approved set of deconflicted digital policies is translated into standard QOS policy-labeled services.

3. Digital policies are defined in a computer interpretable language which is also friendly to humans.
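
As a rough illustration of what step 2 has to detect, the sketch below flags pairs of digital policy rules that target the same role, resource, and action but disagree on the effect. This is a deliberately simple pairwise check swapped in for illustration; it is not the theorem-proving approach the slide describes, and the `Rule` fields and rule IDs are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rule:
    rule_id: str
    role: str
    resource: str
    action: str
    effect: str  # "permit" or "deny"


def find_conflicts(rules):
    """Return ID pairs of rules with the same target but opposite effects."""
    conflicts = []
    for a, b in combinations(rules, 2):
        same_target = (a.role, a.resource, a.action) == (b.role, b.resource, b.action)
        if same_target and a.effect != b.effect:
            conflicts.append((a.rule_id, b.rule_id))
    return conflicts


# Hypothetical rules, as if extracted from two different policy documents.
rules = [
    Rule("R1", "analyst", "pii_record", "read", "permit"),
    Rule("R2", "analyst", "pii_record", "read", "deny"),
    Rule("R3", "auditor", "audit_log", "read", "permit"),
]

for first, second in find_conflicts(rules):
    print(f"conflict: {first} vs {second} (needs human review)")
```

Conflicts found this way would go to the data steward or FDO for a decision, in keeping with the human approval step described above.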

Page 30: Federal Big Data  and Cognitive  Metadata - Goodier


4. How cognitive metadata works

• Regular expressions (regex) play a surprisingly large role
– Sophisticated sequences of regular expressions are often the first model for any text processing task
• For many hard tasks, we use machine learning classifiers
– But regular expressions are used as features in the classifiers
– Can be very useful in capturing generalizations
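
A minimal sketch of the idea that regular expressions can serve as classifier features: each pattern below counts one kind of PII-like string, and the resulting counts form a feature dictionary that a downstream classifier could consume. The patterns are simplified assumptions for US-style formats and are not taken from the slides; the sample text is fictitious.

```python
import re

# Simplified, illustrative patterns; real PII detection needs broader coverage.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def regex_features(text):
    """Map each pattern name to the number of matches found in the text."""
    return {name: len(pattern.findall(text)) for name, pattern in PII_PATTERNS.items()}


def contains_pii(text):
    """First-pass rule: any regex hit marks the text for review."""
    return any(count > 0 for count in regex_features(text).values())


sample = "Contact Jane at jane.doe@example.gov or 555-867-5309; SSN 123-45-6789."
print(regex_features(sample))
print(contains_pii("The meeting is at noon."))
```

The `contains_pii` rule is the "first model" from the slide; the feature dictionary is what a machine learning classifier would take as input for the harder cases.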

Page 31: Federal Big Data  and Cognitive  Metadata - Goodier

4. Cognitive Metadata is a result of data science

[Venn diagram. Recoverable labels:]

Hacking Skills

Math & Statistics Knowledge

Substantive Expertise

Machine Learning

Traditional Research

Danger Zone!

Data Science

Convergence

Predictions that enhance machine learning fueled by knowledge at the Intersection of Our Digital Lives

Page 32: Federal Big Data  and Cognitive  Metadata - Goodier


4. What are some applications of Cognitive Metadata

– Machine Learning
– Question Answering: IBM’s Watson
– Paraphrase
– Summarization
– Information Extraction
– Sentiment Analysis
– Machine Translation
– Coreference resolution
– Word sense disambiguation
– Parsing
– Spam detection
– Part-of-speech tagging
– Named entity recognition

Page 33: Federal Big Data  and Cognitive  Metadata - Goodier


4. Cognitive Metadata provides automated reasoners for Federal PII policy adherence at scale

[Architecture diagram; numbered flow steps omitted. Recoverable labels:]

Ozone Widget Framework

Attribute Service (AS)

Certificate Validation Service (CVS)

CERT

Metadata Service: Cognitive Metadata, Smart Data

Policy Decision Service (PDS)

Policy Decision Point (PDP)

Policy Administration Point (PAP)

Policy Information Point (PIP)

Context Handler

Repository

Audit Service

Cloud Gateway Policy Enforcement Point (PEP)

IT Support Team; Data Producer; User Team Member

NPE Cert; Soft Cert

Access Request: ~ X.pdf Secure Map ~ Reason ~ Location (not relevant to data)

Valid: Access

Invalid: No Access
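
The component names in this diagram (PEP, PDP, PIP, PAP, Context Handler) match the roles used in XACML-style access control. The sketch below shows, under heavy simplification, how an enforcement point might ask a decision point for a verdict and record it for audit. The class names, the role and tag attributes, and the permitted pairs are hypothetical; only the "Valid Access" and "Invalid No Access" outcomes and the `X.pdf` resource name come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    resource: str
    reason: str


class PolicyInformationPoint:
    """Supplies attributes; stands in for the attribute and metadata services."""

    def __init__(self, user_roles, resource_tags):
        self._user_roles = user_roles
        self._resource_tags = resource_tags

    def role_of(self, user):
        return self._user_roles.get(user)

    def tag_of(self, resource):
        return self._resource_tags.get(resource)


class PolicyDecisionPoint:
    """Evaluates a request against a set of permitted (role, tag) pairs."""

    def __init__(self, pip, permitted_pairs):
        self._pip = pip
        self._permitted = set(permitted_pairs)

    def decide(self, request):
        role = self._pip.role_of(request.user)
        tag = self._pip.tag_of(request.resource)
        # Default deny: unknown users or untagged resources never match.
        return "Permit" if (role, tag) in self._permitted else "Deny"


class PolicyEnforcementPoint:
    """Gateway that enforces the PDP decision and records it for audit."""

    def __init__(self, pdp, audit_log):
        self._pdp = pdp
        self._audit_log = audit_log

    def handle(self, request):
        decision = self._pdp.decide(request)
        self._audit_log.append((request.user, request.resource, decision))
        return "Valid Access" if decision == "Permit" else "Invalid No Access"


# Hypothetical attributes and policy.
pip = PolicyInformationPoint(
    user_roles={"alice": "team_member", "bob": "it_support"},
    resource_tags={"X.pdf": "secure_map"},
)
pdp = PolicyDecisionPoint(pip, permitted_pairs=[("team_member", "secure_map")])
audit_log = []
pep = PolicyEnforcementPoint(pdp, audit_log)

print(pep.handle(AccessRequest("alice", "X.pdf", "mission planning")))
print(pep.handle(AccessRequest("bob", "X.pdf", "maintenance")))
print(audit_log)
```

Keeping the decision logic in the PDP and the enforcement in the PEP means policy can change without touching the gateway code, which is the usual motivation for this split.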

Page 34: Federal Big Data  and Cognitive  Metadata - Goodier

Because the Federal government has No shortage of policy…

• SCAP does NOT resolve security needs for SA when we are OUTSIDE the NETWORK.

Page 35: Federal Big Data  and Cognitive  Metadata - Goodier

No shortage of governance…

Page 36: Federal Big Data  and Cognitive  Metadata - Goodier

No shortage of standards…

Page 37: Federal Big Data  and Cognitive  Metadata - Goodier

But people drive standards and policy. People do not move at Cyber speed.

People need cognitive metadata and data to support decision-making.

Data-driven situational awareness augments governance.

Codifying federal big data decisions


But knowing this is still a challenge …

Page 38: Federal Big Data  and Cognitive  Metadata - Goodier

Using Cognitive metadata Rules engines

Page 39: Federal Big Data  and Cognitive  Metadata - Goodier

Organize by Mission

We divide and conquer the complexity of regulatory compliance by codifying big data relationships by mission, to maintain situational awareness of all known risk mitigations and waivers.

You can apply data and metadata according to the mission’s specific risk profile and known standards and waivers.

MISSION Area Of Responsibility

Tier 1

Tier 2

Tier 3

Enterprise

Regional

Local

To perform Continuous Monitoring

Page 40: Federal Big Data  and Cognitive  Metadata - Goodier

Why Cognitive Metadata?

• Cognitive metadata provides the answers you need when
– Sorting through millions of data items to pinpoint key PII incidents that may be crucial.
– By including sophisticated semantic analytics, vastly reduces the time and budget that might otherwise be needed for a substantive analysis of the regulatory compliance for any set of records.

Page 41: Federal Big Data  and Cognitive  Metadata - Goodier

Cognitive metadata maps the right Context to the right Policy as an ASAP-style service

Major Categories of Content requiring Unique Identification

Intelligence Category: Strategic or National Intelligence
Focus (Intel Users): Understanding of current and future status and behavior of foreign nations. Estimates of the state of global activities. Indications and warnings of threats. (National policymakers)
Objects of Analysis: Foreign policy; Political posture; National stability; Socioeconomics; Cultural ideologies; Science and technology; Foreign relationships; Military strength, intent
Reporting Cycle: Infrequent (annual, monthly) long-duration estimates and projections (months, years); Long-term analyses (months, years); Frequent status reports (weekly, daily)

Intelligence Category: Military Operational Intelligence
Focus (Intel Users): Understanding of military powers, orders of battle, technology maturity, and future potential. (Military commanders)
Objects of Analysis: Orders of battle; Military doctrine; Science and technology; Command structure; Force strength; Force status, intent
Reporting Cycle: Continually updated status databases (weekly); Indications and warnings (hours and days); Crisis analysis (daily, hourly)

Intelligence Category: Military Tactical Intelligence
Focus (Intel Users): Real-time understanding of military units, force structure, and active behavior (current and future) on the battlefield. (Warfighters)
Objects of Analysis: Military platforms; Military units; Force operations; Courses of action (past, current, potential future)
Reporting Cycle: Weapon support (real-time: seconds to hours); Situation awareness applications (minutes, hours, days)

Page 42: Federal Big Data  and Cognitive  Metadata - Goodier

Cognitive metadata helps support Computer Network Defense (CND) data.

Cognitive metadata supports Executive Order (EO) 13587 for rapid response to Insider Threat.

Cognitive metadata helps support dynamic data for audit event management.

[Architecture diagram. Recoverable labels:]

AVTR

Vulnerability WS; IAVM WS; Asset WS

CND User & Agent

IAVM; NVD

Service Discovery

CND Portal

Geocoding WS; Web Mapping WS

Asset; Event; Vul.; IAVM

PROM

SCAP Standards & CND Schemas Used

Cognitive metadata = PII protection as a service

Page 43: Federal Big Data  and Cognitive  Metadata - Goodier
