© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 1
Pentaho & MongoDB Partner to Solve Government Big Data Challenges
December 2013
Dave Henry SVP Enterprise Solutions, Pentaho
Bob Gourley Publisher, CTOvision.com
Will LaForest Director of Federal, MongoDB
Best Practices for Federal Big Data Projects
Big Data Management
Bob Gourley Publisher, CTOvision.com
Brief Purpose Research & Reports
• A focus on a new discipline of “Big Data Management”
• Intro to top 5 “Best Practices” of Federal Data activities
• Invitation to collaborate and refine approaches
• A perpetual draft - your input is requested
• Contribute your thoughts at CTOvision.com
Update Sources
• Big Data Government Newsletter reader survey: 2,600 readers, 2% response rate, across Federal agencies
• Review of openly published research by Wikibon, TDWI, IDC, Gartner, Forrester and of course our own CTOvision
• Review of best practices and use cases from the best vendors in Enterprise Big Data
• Engagement of the community at events like Strata and Hadoop World
Planning Assumption
• The ability to collect, parse, and analyze machine data in real time, whether on premise or in the cloud, will continue to grow
Big Data Management
• Agencies are thinking through the right changes to concepts and technologies
• Old approaches are still important, but cannot solve emerging problems
• Big Data Management is an evolved discipline which builds on existing data management approaches to leverage new concepts, technologies and best practices to optimize mission support
Solutions That Require Big Data Management
• Open Source Information: analysis and integration
• Situational Awareness across disparate data sets
• Two use cases: “Connect the Dots” and “Needle in Haystack”
• Cyber Security: rapid real time analysis of all relevant data
• Asset catalog across extensive/dynamic enterprises
• Rapid return of geospatial data
• Location based push of data
• Real time return of relevant search
• Real time suggestion of topics
• Bioinformatics:
  • Human Genome
  • Patient location, treatment, outcomes
• Law Enforcement: Predictive Policing
• Data Hub: Unified storage, governance, security, functionality
Best Practices in Big Data Management
VISION Start with a mission-focused vision. This will vary by organization. Support to mission will drive everything else. Consider that analytics and Big Data go together.
STRATEGY Should prioritize and tackle challenges like: Changes to governance processes, right mix of skills for workforce, learning new technology, prioritizing which workload types will be handled by which part of the architecture.
KNOW Know existing infrastructure and processes, with a focus on: understanding of legal/policy dynamics relevant to your agency, understanding of new capabilities available, current and required throughputs/capacities, types of workloads supported by each component in the architecture, available tech choices.
DESIGN Document and continuously improve. Architect to manage data in its original form. Include right mix of traditional and new in your design. Don’t assume any one platform will be a solution. Architect to insulate applications and users from a variety of disparate big data platforms.
EXECUTE Avoid custom coding wherever possible. Don’t let new Big Data Platforms become proprietary silos. ETL remains important. Ensure training for all based on job function. Don’t neglect your own training. Serve the analyst.
Next Steps
• Continue your market surveys and stay aware of what new technologies can do for you.
• Revisit your vision. As you do, ponder this: How can you leverage data to support your mission?
• Continue to study use cases and exchange best practices. Dialog with others in and out of your sector. Great lessons are coming from other industries.
• Continue to engage with the broader community. Sign up for our Government Big Data Weekly.
• Share your lessons learned.
Provide Your Thoughts, Input, Questions
E-mail: [email protected]
Blog: http://ctovision.com
Twitter: http://www.twitter.com/bobgourley
Facebook, LinkedIn, etc: See the blog
The Modern Operational Database for Government
Will LaForest Director of Federal, MongoDB
The Evolution of Databases
[Figure: database timeline. 1990: RDBMS. 2000: RDBMS, OLAP/BI. 2010: RDBMS, NoSQL, OLAP/BI, Hadoop. Categories shown: Operational & Real-time, Datawarehouse; Online, Offline.]
Relational Database Challenges
Variety
• Unstructured data
• Semi-structured data
• Polymorphic data
Volume & Velocity
• Petabytes of data
• Trillions of records
• Millions of queries per second
Agile Development
• Iterative
• Short development cycles
• New workloads
New Architectures
• Horizontal scaling
• Commodity servers
• Cloud computing
MongoDB The Modern Operational Database
Document Oriented
Open-Source
General Purpose
Fully Featured
MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Rich Queries
• Find Paul’s cars
• Find everybody in London with a car built between 1970 and 1980
Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.
Text Search
• Find all the cars described as having leather seats
Aggregation
• Calculate the average value of Paul’s car collection
Native Indexes
• Secondary • Compound • Geospatial • Full Text • Hash • Covering
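To make the example queries concrete, here is a minimal sketch of what each one computes, written in plain Python over in-memory dicts. It does not use MongoDB or its query language. The second owner, the coordinates, and all function names are assumptions made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import mean

# Hypothetical sample data modeled on the slide's document. The coordinates
# and the second owner are invented for illustration only.
people = [
    {"first_name": "Paul", "surname": "Miller", "city": "London",
     "location": [51.5101, -0.1340],
     "cars": [{"model": "Bentley", "year": 1973, "value": 100000},
              {"model": "Rolls Royce", "year": 1965, "value": 330000}]},
    {"first_name": "Ann", "surname": "Lee", "city": "Leeds",
     "location": [53.8008, -1.5491],
     "cars": [{"model": "Mini", "year": 1975, "value": 12000}]},
]

TRAFALGAR_SQ = (51.5080, -0.1281)  # approximate (latitude, longitude)


def cars_of(first_name):
    """Rich query: all cars embedded in documents with this first name."""
    return [car for p in people if p["first_name"] == first_name
            for car in p["cars"]]


def owners_in_city_with_car_built(city, start, end):
    """Rich query: people in a city owning a car built in [start, end]."""
    return [p for p in people if p["city"] == city
            and any(start <= c["year"] <= end for c in p["cars"])]


def km_between(a, b):
    """Great-circle distance in km between two (lat, lon) pairs (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def owners_within(center, radius_km):
    """Geospatial: car owners located within radius_km of center."""
    return [p for p in people if p["cars"]
            and km_between(p["location"], center) <= radius_km]


def average_car_value(first_name):
    """Aggregation: mean value of the cars owned by this person."""
    return mean(car["value"] for car in cars_of(first_name))
```

In a real deployment these scans would be served by the database and its indexes rather than by looping over every document in application code.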
MongoDB and Enterprise IT Stack
[Diagram: enterprise IT stack. Applications: CRM, ERP, Collaboration, Mobile, BI. Data Management: RDBMS, EDW, Hadoop, split into Online Data and Offline Data. Infrastructure: OS & Virtualization, Compute, Storage, Network. Cross-cutting: Management & Monitoring; Security & Auditing.]
Variety – Modern Data
Document Data Model
Relational MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123,47.232],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Dynamic Schema
MongoDB does not require a predefined data schema. Every document can have different fields.
{name: "jeff", eyes: "blue", height: 72, boss: "ben"}
{name: "brendan", aliases: ["el diablo"]}
{name: "ben", hat: "yes"}
{name: "matt", pizza: "DiGiorno", height: 74, boss: "555.555.1212"}
{name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill", "la ciacco"], gender: "???", boss: "ben"}
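As a rough illustration of what a dynamic schema means for application code, the sketch below holds the slide's five documents as plain Python dicts and inspects which fields each one carries. It is an in-memory stand-in, not MongoDB or a driver, and the helper names `field_usage` and `with_field` are hypothetical.

```python
# The slide's example documents as plain Python dicts. The phone-number
# value for "boss" is quoted here so that it is a valid string literal.
people = [
    {"name": "jeff", "eyes": "blue", "height": 72, "boss": "ben"},
    {"name": "brendan", "aliases": ["el diablo"]},
    {"name": "ben", "hat": "yes"},
    {"name": "matt", "pizza": "DiGiorno", "height": 74, "boss": "555.555.1212"},
    {"name": "will", "eyes": "blue", "birthplace": "NY",
     "aliases": ["bill", "la ciacco"], "gender": "???", "boss": "ben"},
]


def field_usage(docs):
    """Count how many documents carry each field name."""
    counts = {}
    for doc in docs:
        for field in doc:
            counts[field] = counts.get(field, 0) + 1
    return counts


def with_field(docs, field):
    """Return the names of the documents that contain the given field."""
    return [doc["name"] for doc in docs if field in doc]


usage = field_usage(people)
print(usage["name"], usage["height"], usage["hat"])
print(with_field(people, "aliases"))
```

Only `name` appears in every document. Code that reads such a collection has to tolerate missing fields, which is the trade-off that comes with not declaring a schema up front.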
Volume, Velocity, and New Architectures
Automatic Sharding
• Increase or decrease capacity as you go
• Automatic balancing
• Optimized for commodity servers and cloud infrastructure
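The general idea of spreading documents across servers by a key can be sketched with simple hash-based partitioning. This is a deliberately simplified illustration and not MongoDB's sharding implementation; the key format, the function names, and the use of MD5 with a modulo are all assumptions chosen to keep the example short.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard each one hashes to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


keys = [f"user-{n}" for n in range(1000)]  # hypothetical shard-key values
three = distribute(keys, 3)
four = distribute(keys, 4)  # "increase capacity": add a fourth shard

print({shard: len(members) for shard, members in three.items()})
print({shard: len(members) for shard, members in four.items()})
```

A plain modulo scheme like this one reassigns most keys whenever the shard count changes. Production systems avoid that by moving data in smaller units, which is what makes automatic balancing practical.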
High Availability
• Automated replication and failover
• Zero downtime during hardware failures and upgrades
• Multi-data center support
• Improved operational simplicity (e.g., HW swaps)
• Data durability and consistency
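Automated failover can be pictured with a toy model: when the primary fails, the surviving members choose a replacement, but only if a majority of the set is still reachable. The sketch below is a simplification under assumed rules (the most up-to-date healthy member wins) and does not reproduce MongoDB's actual election protocol. The `Member` class and `elect_primary` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    last_applied: int      # position of the last write this member applied
    healthy: bool = True


def elect_primary(members):
    """Pick a primary, or None if no majority of members is reachable."""
    healthy = [m for m in members if m.healthy]
    if len(healthy) * 2 <= len(members):
        return None  # no majority, so no primary can be elected
    # Assumed rule: prefer the member that has applied the most writes.
    return max(healthy, key=lambda m: m.last_applied)


replica_set = [Member("a", 120), Member("b", 118), Member("c", 120)]

primary = elect_primary(replica_set)
primary.healthy = False                  # simulate a hardware failure
new_primary = elect_primary(replica_set)

print(primary.name, "->", new_primary.name)
```

The majority check is the part worth noticing: with only one of three members left, the toy set refuses to elect a primary rather than risk two sides accepting writes independently.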
MongoDB Performance*
        | Top 5 Marketing Firm | Government Agency | Top 5 Investment Bank
Data    | Key/value | 10+ fields, arrays, nested documents | 20+ fields, arrays, nested documents
Queries | Key-based; 1 – 100 docs/query; 80/20 read/write | Compound queries; Range queries; MapReduce; 20/80 read/write | Compound queries; Range queries; 50/50 read/write
Servers | ~250 | ~50 | ~40
Ops/sec | 1,200,000 | 500,000 | 30,000
* These figures are provided as examples. Your application governs your performance.
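One way to read these figures is as throughput per server. The short calculation below assumes the three values in each row line up, in order, with the Marketing Firm, Government Agency, and Investment Bank columns, and treats the approximate server counts as exact.

```python
# Server counts and ops/sec taken from the slide; the server counts are
# approximate ("~250"), so the per-server results are rough as well.
deployments = {
    "Top 5 Marketing Firm": {"servers": 250, "ops_per_sec": 1_200_000},
    "Government Agency": {"servers": 50, "ops_per_sec": 500_000},
    "Top 5 Investment Bank": {"servers": 40, "ops_per_sec": 30_000},
}


def ops_per_server(deployment):
    """Average operations per second handled by each server."""
    return deployment["ops_per_sec"] / deployment["servers"]


per_server = {name: ops_per_server(d) for name, d in deployments.items()}
for name, value in per_server.items():
    print(f"{name}: {value:,.0f} ops/sec per server")
```

The spread (roughly 4,800, 10,000, and 750 ops/sec per server) reflects how different the three workloads are, which is the point of the slide's footnote.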
Replication Benefits
Operational and Analytical Workloads
• Application interacts with primaries
• Analytical workloads on secondaries
• Workloads are isolated from one another
• Working set appropriate for each application
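The workload isolation described above can be sketched as a small router that sends operational traffic to the primary and spreads analytical reads over the secondaries. This is a toy model with hypothetical node names and class name. In practice MongoDB drivers expose this kind of routing through read preference settings instead of application code like this.

```python
from itertools import cycle


class ReplicaSetRouter:
    """Toy router: operational work to the primary, analytics to secondaries."""

    def __init__(self, primary, secondaries):
        if not secondaries:
            raise ValueError("at least one secondary is required")
        self.primary = primary
        self._secondaries = cycle(secondaries)  # simple round-robin

    def route(self, workload):
        """Return the node that should serve the given workload type."""
        if workload == "operational":
            return self.primary
        if workload == "analytical":
            return next(self._secondaries)
        raise ValueError(f"unknown workload: {workload}")


router = ReplicaSetRouter("node-1", ["node-2", "node-3"])
print(router.route("operational"))
print(router.route("analytical"), router.route("analytical"))
```

Because analytical scans never touch the primary in this model, a long-running report cannot evict the working set that the operational application depends on.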
Global Data Distribution
[Figure: sites connected by links labeled “Real-time”]
Read Global / Write Local
[Diagram: three primaries (Primary:NYC, Primary:LON, Primary:SYD) and six secondaries (two each of Secondary:NYC, Secondary:LON, Secondary:SYD)]
Solving Big Data Challenges in the Federal Government
Dave Diegtel Head of Federal Sales, Pentaho
Why Pentaho for Federal Government
• Company and Product Maturity: Pentaho has been around for over 9 years, with thousands of paid customers and a 5.0 version release. Pentaho is proven and less risky.
• Business Model and Subscription: Pentaho’s subscription model and server-based pricing allow for lower upfront investment and risk compared to legacy BI vendors, which traditionally cost an average of 4X as much for similar-size deployments.
• Government Certifications: Pentaho has made significant investments in government certifications and compliance, such as 508 and Security.
• Open APIs and an extensible architecture enable ease of integration and reduce the potential for vendor lock-in.
• Existing Government Customers and Cleared Personnel
A Comprehensive Big Data Platform
Dave Henry Senior VP Enterprise Solutions, Pentaho
Pentaho 5.0 Architected for the Future
Simplified analytics experience for all users
ANY Analytics • Reports • Dashboards • Visualizations • Discovery • Predictive Analytics
ANY Environment • Data warehouses • Data marts • Stack vendors • Cloud • Embedded
Existing & New Data Infrastructure & Processes
ANY Data • Relational • Operational • Big Data • Data sources not yet anticipated…
[Diagram labels: Billing, Location, Social Media, Customer, Web, Network]
The New Reality
Simplified analysis for all users
• Simplified Analytics Experience
• Enterprise Big Data Integration
• Blended Big Data
Pentaho & MongoDB Enable Key Use Cases
Customer 360 and Device Data Analytics enable comprehensive insight
[Diagram labels: Pentaho Data Integration; Mission Scope; Pentaho Analytics (Reporting, Dashboards, Visualization, Discovery)]
• MongoDB delivers Scalable, Low-Latency Enterprise Data Store
• Visual ETL development with Pentaho Data Integration (PDI)
• Reporting, Dashboards, Visualization and Discovery with Pentaho Analytics
Enterprise Customer Data Store
Powerful data integration for MongoDB
[Diagram: Web Event Data, POS Data and Customer Master; PDI ETL; mongoDB cluster; $push to data arrays]
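MongoDB's `$push` update operator appends a value to an array field, creating the array if the field does not exist yet. The sketch below imitates that behavior on a plain Python dict to show how event and transaction records could accumulate inside a single customer document. It does not call MongoDB, and the field names, IDs, and sample values are assumptions for illustration.

```python
def push(document, field, value):
    """Append value to the array at document[field], creating it if absent."""
    document.setdefault(field, []).append(value)
    return document


# Hypothetical customer master record.
customer = {"_id": 1001, "name": "Paul Miller"}

# Hypothetical web event and point-of-sale records arriving from ETL.
push(customer, "web_events", {"page": "/pricing", "ts": "2013-12-01T10:15:00Z"})
push(customer, "pos_transactions", {"store": "LON-01", "amount": 42.5})
push(customer, "web_events", {"page": "/checkout", "ts": "2013-12-01T10:20:00Z"})

print(len(customer["web_events"]), len(customer["pos_transactions"]))
```

Keeping related records in arrays on one document means a single read returns the whole customer view, at the cost of documents that grow over time.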
Data Integration Exploits MongoDB’s native APIs and query language
Operational Reports Multi-page, highly formatted reports – real-time, scheduled or burst to email
Operational Dashboards Highly tailored, pixel-perfect dashboards on MongoDB
Analyzer Explore and visualize data
James Dixon Founder and CTO, Pentaho
As CTO at Pentaho, James Dixon is responsible for Pentaho's architecture and technology roadmap. James has over 15 years of professional experience in software architecture, development and systems consulting. Prior to Pentaho, James held key technical roles at AppSource Corporation (acquired by Arbor Software which later merged into Hyperion Solutions) and Keyola (acquired by Lawson Software). Earlier in his career, James was a technology consultant working with large and small firms to deliver the benefits of innovative technology in real-world environments.
Why Pentaho?
• Pentaho is the best platform to connect, integrate, and analyze both traditional sources and MongoDB
• Pentaho embraces and extends the MongoDB environment with rich visualization and exploration of data
• Pentaho’s subscription-based business model lowers upfront investments, enabling faster ROI
• Pentaho has dozens of Federal Government customers and has made significant investments in government certifications and cleared personnel
• Pentaho and MongoDB are established partners: Pentaho carefully engineers its products to use the latest MongoDB APIs to provide the best possible performance
Next Steps and Q&A
• Needs Assessment with Pentaho and MongoDB
  • Dave Diegtel - [email protected]
  • Will LaForest - [email protected]
• Try Pentaho (30-Day Free Trial) -- pentaho.com/download
• Learn More about Big Data and Government Solutions
  • Pentaho
    • Big Data Website: pentahobigdata.com/
    • Government Solutions: pentaho.com/solutions/government
  • MongoDB
    • Government Solutions: mongodb.com/industries/government
    • Big Data: Examples and Guidelines for the Enterprise Decision Maker: mongodb.com/lp/whitepaper/big-data-nosql
    • MongoDB Top 5 Considerations When Evaluating NoSQL Databases: mongodb.com/lp/whitepaper/nosql-considerations
• Sign up for the Big Data Government Newsletter at CTOvision.com & take the reader survey
Thank You