Federal Big Data and DoD Ontologies 2014 - Goodier
Agenda
Federal big data and ontology in DoD – The current state of innovation
• How does DoD categorize big data vs. ontology research?
• How much research in big data vs. ontology is there in DoD?
• What is the status of DoD research in big data vs. ontology – open or private?
• Who are the major publishers of big data vs. ontology research in DoD?
• Augmented governance of federal big data is a semantic big data use case.
• Demo of automated compliance toolkit
Presenter
Presentation Notes
All photos and this presentation are free: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This presentation is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Recent DoD Research on Ontology versus Big Data
Ontology Subject Taxonomy | Big Data Subject Taxonomy
medicine and medical research (25) | medicine and medical research (407)
ontology (16) | theses (289)
computer programming and software (25) | computer programming and software (272)
algorithms (11) | algorithms (270)
cybernetics (12) | cybernetics (257)
foreign reports (12) | government and political science (206)
decision making (12) | military forces and organizations (201)
psychology (12) | anatomy and physiology (194)
information science (22) | training (190)
software engineering (11) | logistics, military facilities and supplies (183)
[Chart: Research. Ontology 123; Big Data 4210]
Presenter
Presentation Notes
Research is limited to the DoD collections in the DoD Research and Engineering portal, which contains over 4 million volumes. "Recent" means within the last 12 months. Results are limited to unclassified public research.
Recent DoD Research on Ontology versus Big Data
Ontology Subject Taxonomy | Big Data Subject Taxonomy
medicine and medical research (61) | medicine and medical research (789)
ontology (27) | theses (325)
computer programming and software (34) |
sbir reports (61) | sbir reports (771)
military intelligence (35) | anatomy and physiology (355)
information science (73) | electrical and electronic equipment (311)
computer systems (24) | pe65502f (302)
[Chart: Research. Ontology 250; Big Data 6880]
Ontology research is mostly private: just over half (127 of 250 reports). Big data is a bit more public, with over 60% choosing public publication. SBIR reports are private release.
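The public/private split quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the chart figures on the two slides read as 123 and 4210 reports for the public-only search and 250 and 6880 reports for the public-plus-private search; the function and variable names are illustrative only.

```python
# Report counts as read from the two slides (assumed interpretation of the charts).
PUBLIC_ONLY = {"ontology": 123, "big data": 4210}         # unclassified public research
PUBLIC_AND_PRIVATE = {"ontology": 250, "big data": 6880}  # public plus private research


def release_breakdown(topic: str) -> dict:
    """Return public/private report counts and shares for one topic."""
    total = PUBLIC_AND_PRIVATE[topic]
    public = PUBLIC_ONLY[topic]
    private = total - public
    return {
        "public": public,
        "private": private,
        "public_share": public / total,
        "private_share": private / total,
    }


if __name__ == "__main__":
    for topic in PUBLIC_ONLY:
        b = release_breakdown(topic)
        print(
            f"{topic}: {b['private']} of {PUBLIC_AND_PRIVATE[topic]} private "
            f"({b['private_share']:.1%}), {b['public_share']:.1%} public"
        )
```

Under these assumptions the ontology figure reproduces the 127 private reports quoted on the slide, and the big data public share comes out just above 60%.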
Presenter
Presentation Notes
Research is limited to the DoD collections in the DoD Research and Engineering portal, which contains over 4 million volumes. "Recent" means within the last 12 months. Results are expanded to unclassified public and private research.
Publishers of DoD Ontology research include:
• charles river analytics inc cambridge ma (11)
• surviac (10)
• csiac-bco (8)
• aptima inc woburn ma (7)
• intelligent automation inc rockville md (6)
• army center for environmental health research fort detrick md (5)
• intelligent software solutions colorado springs co (4)
• modus operandi inc melbourne fl (4)
• stottler henke associates inc san mateo ca (4)
• army command and general staff college fort leavenworth ks (3)
Presenter
Presentation Notes
Charles River Analytics - https://www.cra.com/
Modus Operandi - http://www.modusoperandi.com/
Institute for Defense Analyses - https://www.ida.org/
Securboration, Inc - http://www.securboration.com/
SURVIAC - http://iac.dtic.mil/
CSIAC-BCO - https://www.csiac.org/about/policies/terms-of-use
Fort Detrick - http://usacehr.amedd.army.mil/
Fort Leavenworth - http://usacac.army.mil/cac2/cgsc/
Intelligent Software Solutions - http://www.issinc.com/
Aptima - http://www.aptima.com/
Intelligent Automation - http://www.i-a-i.com/
Ubiquiti - http://www.ubiquiti.com/
Decisive Analytics Corporation - http://www.dac.us/
Florida Institute for Human and Machine Cognition - http://www.ihmc.us/
Carnegie Mellon University - http://www.cmu.edu/index.shtml
Air Force Research Lab - http://www.wpafb.af.mil/AFRL/
University of New Mexico - http://www.unm.edu/
Kristine Fallon Associates - http://kfa-inc.com/kfa12/
State University of New York at Buffalo National Center for Ontological Research - http://ontology.buffalo.edu/bcor/
Stottler Henke Associates - http://www.stottlerhenke.com/
Publishers of DoD Big Data research include:
• surviac (403)
• naval postgraduate school monterey ca (185)
• library of congress washington dc congressional research service (112)
• riac (77)
• government accountability office washington dc (71)
• army research lab aberdeen proving ground md weapons and materials research directorate (45)
• army command and general staff college fort leavenworth ks (44)
• army war college carlisle barracks pa strategic studies institute (41)
• carnegie-mellon univ pittsburgh pa software engineering inst (41)
• air univ maxwell afb al air force research inst (24)
DoD Big Data requires semantically AUGMENTED GOVERNANCE for dynamic event management.
• T
• to retain relevance
  – in loosely coupled
  – multi-tenant environments
Presenter
Presentation Notes
So ultimately we created augmented governance to support the requirements for event management that stays relevant in all architectures, including cloud architectures. These architectures can be loosely coupled and chock-full of tenants that come and go based on computationally negotiated contracts.
Why? It is needed to enable the BIG DATA scale of IT.
• 25 Point Plan - Nov. 19, 2010
• Federal Risk and Authorization Management Program – FedRAMP
  – focused on SPEED OF CHANGE by removing the barriers that get in the way of consistent execution
  – www.cio.gov
Presenter
Presentation Notes
An Office of Management and Budget official announced broad reform in government IT on Nov. 19, 2010, at a Northern Virginia Technology Council breakfast in McLean, Va. The new “25 point” strategy aims to improve purchasing, increase workforce productivity, promote the adoption of cloud technologies, and trim weak agency programs and a bloated inventory of data centers. The five structural changes to governmentwide IT strategy are reflected in the Federal Risk and Authorization Management Program (FedRAMP): align budget and acquisitions with the technology cycle; improve program management; streamline governance and increase accountability; increase engagement with the IT community; and adopt lighter technologies and shared solutions, including the adoption of a "cloud-first" policy.
Cloud First Challenge – automated audit assertions for FedRAMP clouds using open, structured, industry-accepted formats
• The Standards Acceleration to Jumpstart Adoption of Cloud Computing (SAJACC) project at the National Institute of Standards and Technology (NIST) generated concrete data about how different kinds of cloud system interfaces support portability, interoperability, and security.
• The SAJACC project facilitates Standards Development Organizations in their efforts to develop high-quality standards that address these important needs.
Presenter
Presentation Notes
For the Cloud First approach, augmented governance supports the Standards Acceleration to Jumpstart Adoption of Cloud Computing (SAJACC) project at the National Institute of Standards and Technology (NIST). All use cases, test codes, and test results are on the openly-accessible NIST Cloud Portal (www.nist.gov/itl/cloud), for use by any interested parties.
• CloudAudit/A6 URI Ontology - The Automated Audit, Assertion, Assessment, and Assurance API - http://cloudaudit.org/
• SCAP - http://scap.nist.gov/
Excluded – private formats:
• CloudTrust
• ISACA's Cloud Computing Management Audit Assurance Program
NIST-based open, structured, industry-accepted formats
• All use cases, test codes, and test results are on the openly-accessible NIST Cloud Portal: www.nist.gov/itl/cloud
• DoD’s big data semantics problem
  – Sorting through millions of daily logs and records and communication exchanges to pinpoint key individuals or groups that may be crucial to a given investigation is ultimately driven by semantics.
  – By including sophisticated semantic analytics, we vastly reduce the time and budget that might otherwise be needed for a substantive analysis of the regulatory compliance for any set of records.
AUTOMATED COMPLIANCE TOOL (ACT) demo
Version 2
Automated Compliance Tool
ACT is an enhanced parser/data extraction design tool as shown in this demonstration.
It enables rapid decision making that supports legal, regulatory, and policy compliance using cognitive metadata.
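The ACT module diagram for this slide relates applicable events, dynamic policies, policy violations, and the ACT database. As a purely illustrative sketch of how such records could be linked, the snippet below models events, policies, and violations as small data classes and flags an event when its text contains a term a policy watches for. The class names, fields, and keyword-matching rule are all hypothetical; the slides do not describe ACT's actual data model or matching logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    source: str  # hypothetical: where the event was collected
    text: str    # hypothetical: raw content of the event


@dataclass
class Policy:
    name: str
    flagged_terms: List[str]  # hypothetical: terms this policy watches for


@dataclass
class Violation:
    policy: Policy
    event: Event
    matched_terms: List[str]  # what in the event triggered the policy


def find_violations(events: List[Event], policies: List[Policy]) -> List[Violation]:
    """Return one Violation per (event, policy) pair with at least one term match."""
    violations = []
    for event in events:
        text = event.text.lower()
        for policy in policies:
            matched = [t for t in policy.flagged_terms if t.lower() in text]
            if matched:
                violations.append(Violation(policy, event, matched))
    return violations
```

Keeping the triggering event and matched terms on each violation record is one simple way to preserve the link between a violation and what caused it.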
[Diagram: Augmented Governance, ACT Module. Labels: policy violations; ACT database; state of the system; ACT use cases; ACT represents violations and the configurations that are causing them; cognitive metadata; dynamic policies related to violations; applicable events]
[Diagram: ACT Module. Label: ACT database]
Start/Stop the XML database
Presenter
Presentation Notes
eXist is an open source database management system built entirely on XML technology, also called a native XML database. Unlike most relational database management systems, eXist uses XQuery, which is a W3C Recommendation, to manipulate its data.

eXist Benefits
eXist allows software developers to persist XML data without writing extensive middleware. eXist follows and extends many W3C XML standards such as XQuery. eXist also supports REST interfaces for interfacing with AJAX-type web forms. Applications such as XForms may save their data by using just a few lines of code. The WebDAV interface to eXist allows users to "drag and drop" XML files directly into the eXist database. Because eXist automatically indexes documents using a keyword indexing system, it is very easy to create high-performance document search systems with eXist.

eXist Standards and Technologies
eXist supports the following standards and technologies:
• XPath - XML Path language
• XQuery - XML Query language
• WebDAV - Web distributed authoring and versioning
• REST - Representational state transfer (URL encoding)
• SOAP - Simple Object Access Protocol
• XACML - eXtensible Access Control Markup Language
• XInclude - server-side include file processing (limited support)
• XML-RPC - a remote procedure call protocol
• XProc - an XML pipeline processing language

eXist History
eXist was created in 2000 by Wolfgang Meier, who is still the lead developer as of 2010. In September 2006, it reached version 1.0 and 1.1 (new numbering scheme). Current maintenance activities are on the 1.4.x versions, and new developments are on the 1.5dev version that will be released as 1.6.0. eXist was nominated in 2006 as the best XML database of the year by InfoWorld.

eXist is used heavily in the XRX web application architecture.
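As an illustration of the REST interface mentioned in these notes, the following minimal sketch builds a query URL for an eXist REST endpoint. The `_query` and `_howmany` request parameters are taken from eXist's REST documentation as best recalled; the server address, the `/db/act/events` collection path, and the XPath expression are hypothetical placeholders, not values from the demo.

```python
from urllib.parse import quote, urlencode


def build_exist_query_url(base_url: str, collection: str, xquery: str,
                          how_many: int = 10) -> str:
    """Build a GET URL that asks an eXist REST endpoint to run a query.

    base_url   -- REST root of the server (assumed, e.g. http://host:8080/exist/rest)
    collection -- database collection path to query (hypothetical in this sketch)
    xquery     -- XPath/XQuery expression to evaluate
    how_many   -- maximum number of results to return
    """
    path = quote(collection.strip("/"))  # keep "/" separators, escape the rest
    params = urlencode({"_query": xquery, "_howmany": how_many})
    return f"{base_url.rstrip('/')}/{path}?{params}"


if __name__ == "__main__":
    # Hypothetical server, collection, and query, for illustration only.
    print(build_exist_query_url(
        "http://localhost:8080/exist/rest",
        "/db/act/events",
        "//event[@severity='high']",
        how_many=5,
    ))
```

The sketch only constructs the URL; sending the request and parsing the XML response are left out so that it runs without a live database.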
Start/Stop JBoss middleware
Run Binding Factory
Presenter
Presentation Notes
JBoss Application Server (or JBoss AS) is a free software/open-source Java EE-based application server. An important distinction for this class of software is that it not only implements a server that runs on Java, but it actually implements the Java EE part of Java. Because it is Java-based, the JBoss application server operates cross-platform: it is usable on any operating system that supports Java. JBoss AS was developed by JBoss, now a division of Red Hat.

Key features:
• Eclipse-based Integrated Development Environment (IDE) is available using JBoss Developer Studio
• Supports Java EE and Web Services standards
• Enterprise Java Beans (EJB)
• Java persistence using Hibernate
• Object request broker (ORB) using JacORB for interoperability with CORBA objects
• JBoss Seam framework, including Java annotations to enhance POJOs, and including JBoss jBPM
• JavaServer Faces (JSF), including RichFaces
• Web application services, including Apache Tomcat for JavaServer Pages (JSP) and Java Servlets
• Caching, clustering, and high availability, including JBoss Cache, and including JNDI, RMI, and EJB types
• Security services, including Java Authentication and Authorization Service (JAAS) and pluggable authentication modules (PAM)
• Web Services and interoperability, including JAX-RPC, JAX-WS, many WS-* standards, and MTOM/XOP
• Integration and messaging services, including J2EE Connector Architecture (JCA), Java Database Connectivity (JDBC), and Java Message Service (JMS)
• Management and Service-Oriented Architecture (SOA) using Java Management Extensions (JMX)
• Additional administration and monitoring features are available using JBoss Operations Network

Key components:
• JBoss Application Server, the framework used to support the development and implementation of applications
• Hibernate, an object/relational mapping and persistence (ORM) framework
• JBoss Seam, a framework for building web applications
• JBoss Web Framework Kit for building Java applications: Google Web Toolkit, RichFaces, Spring Framework, Apache Struts

Licensing and pricing
JBoss itself is open source, but Red Hat charges to provide a support subscription for JBoss Enterprise Middleware.

Related products
These products are also part of the JBoss Enterprise Middleware portfolio of software.
• JBoss Enterprise Web Platform (or JBoss EWP): This is a lighter-weight version of the JBoss Enterprise Application Platform. The key components are essentially the same as the full JBoss Enterprise Application Platform, but it uses a slimmed-down profile of the JBoss Application Server.
• JBoss Enterprise Web Server (or JBoss EWS): This is a platform for lightweight Java applications, but it also handles large-scale websites. JBoss EWS may be deployed as a standard enterprise web server, a simple Java application server, or an enterprise open source application infrastructure. Key components: Apache Tomcat, including Java Servlet and JavaServer Pages; Apache Web Server, including common modules and connectors for authentication, caching, proxying, filtering, and load balancing (mod_jk).
• JBoss Cache (or JBC): This is a cache for frequently accessed Java objects to improve application performance. The cache can be replicated across one or more Java Virtual Machines (JVM) across a network. The cache can be transactional because a JTA-compliant transaction manager can be configured to make any cache interaction transactional. The two types of JBoss Cache are Core and POJO, with the POJO library built on top of the Core library.
[Diagram: ACT Module. Labels: ACT database; applicable events]
Administer ACT event database: Establish character sets for multiple languages
Administer ACT event database: Establish global variables for cognitive metadata
Administer ACT event database: Establish schema privileges for cognitive metadata
Administer ACT event database: Establish session variables for cognitive metadata
Administer ACT event database: Establish constraints for cognitive metadata
Administer ACT event database: Review cognitive metadata table catalog
[Diagram: ACT Module. Labels: ACT database; cognitive metadata; applicable events]
Administer ACT event database: Code smart query - cognitive metadata
Administer ACT event database: Code smart query – set cognitive metadata timer
Administer ACT event database: Establish cognitive metadata collection sites
Administer ACT event database: Establish collection sites - backup
[Diagram: ACT Module. Labels: policy violations; ACT database; state of the system; ACT use cases; ACT represents violations and the configurations that are causing them; cognitive metadata; applicable events]
Administer ACT event database: Establish cognitive metadata collection sites
• Currently these are “test” event collection sites.
  – For each ACT Use Case the event collection site varies
  – For example,
    • Web Data warning Use Case: ACT will collect from “known” problem sites like CRYPTOME, etc.
    • Public Cloud Data Spillage Prevention Use Case: ACT collects from the Public Cloud Storage site
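The per-use-case choice of collection sites described above can be pictured as a simple lookup table. This is a minimal sketch, assuming a plain dictionary keyed by use case name; the two entries come from the slide, while the structure and the `sites_for` helper are hypothetical and not ACT's actual configuration format.

```python
# Use case -> event collection sites, as named on the slide (both are "test" sites).
COLLECTION_SITES = {
    "Web Data Warning": ["CRYPTOME"],
    "Public Cloud Data Spillage Prevention": ["Public Cloud Storage site"],
}


def sites_for(use_case: str) -> list:
    """Return the collection sites configured for a use case.

    Raises ValueError for a use case with no configured sites, so that a
    missing mapping is noticed instead of silently collecting nothing.
    """
    try:
        return COLLECTION_SITES[use_case]
    except KeyError:
        raise ValueError(
            f"No collection sites configured for use case: {use_case!r}"
        ) from None


if __name__ == "__main__":
    for name in COLLECTION_SITES:
        print(name, "->", ", ".join(sites_for(name)))
```

Adding a new use case then amounts to adding one entry to the table.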
CRYPTOME example
NIST Public Clouds
ACT Use Cases are defined
Use Cases expand as Dynamic Policies and Violations change
ACT Use Cases – Uses effective reporting standards set by GAO