Posted: 10-May-2015 | Category: Technology | Uploaded by: cloudera-inc
Cloudera Search: Embracing Apache Solr into Cloudera’s Platform for Big Data
Eva Andreasson, Sr. Product Manager, Cloudera
Steven Noels, Co-founder and SVP of Products, NGDATA
Who is Cloudera?
What the Enterprise Requires
§ Only 100% open source Hadoop-based platform with both batch and real-time processing engines, enterprise-ready with native high availability
§ Suite of system and data management software
§ Comprehensive support and consulting services
§ Broadest Hadoop training and certification programs
Extensive Partner Ecosystem
§ Over 600 partners across hardware, software and services
The Leader in Big Data Management
§ Deliver a revolutionary data management platform powered by Apache Hadoop
§ World’s leading commercial vendor of Apache Hadoop
§ Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data
Customers & Users Across Industries
§ More production deployments than all other vendors combined
Cloudera Enterprise
[Diagram: INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE on CDH, with CLOUDERA MANAGER, CLOUDERA NAVIGATOR and CLOUDERA SUPPORT]
A revolutionary solution powered by Apache Hadoop
BRINGS STORAGE & COMPUTE TOGETHER
WORKS WITH EVERY TYPE OF DATA
CHANGES THE ECONOMICS OF DATA MANAGEMENT
About NGDATA
NGDATA is the next generation Customer Intelligence company that enables actionable customer insights, personalized product offers and intimate customer experience with a unique combination of interactive Big Data management and machine learning technologies in one integrated solution.
A Next Generation Customer Intelligence Company
Vision & expertise: Business Expertise; Enterprise Architectures; Big Data Technology; Machine Learning, Algorithms, Analytics; Customer Intelligence
Solution (Lily): Customer Database; Enterprise Data; Reference Data; Customer Data; Customer Engagement; Governance and Risk Management; Insights, Trends and Analysis
Agenda
§ Why Search?
§ What is Cloudera Search?
§ Using Cloudera Search
§ Learn more
Why Search?
Cloudera’s Enterprise Strategy
An Integrated Part of the Hadoop System
One pool of data
One security framework
One set of system resources
One management interface
Search Simplifies Interaction
Explore, Navigate, Correlate
Experts know MapReduce. Savvy people know SQL.
Everyone knows Search.
Benefits of Search
Improved Big Data ROI
• An interactive experience without technical knowledge
• Single data set for multiple computing frameworks
Faster time to insight
• Exploratory analysis, especially of unstructured data
• Broad range of indexing options to accommodate needs
Cost efficiency
• Single scalable platform; no incremental investment
• No need for separate systems or storage
Solid foundations and reliability
• Solr in production environments for years
• Hadoop-powered reliability and scalability
What is Cloudera Search?
Cloudera Search
Interactive search for Hadoop
• Full-text and faceted navigation
• Batch, near-real-time, and on-demand indexing
Apache Solr integrated with CDH
• Established, mature search with a vibrant community
• Separate runtime, like MapReduce and Impala
• Incorporated as part of the Hadoop ecosystem
Open Source
• 100% Apache, 100% Solr
• Standard Solr APIs
Scalable and Robust Index Storage
[Architecture diagram: HDFS, Lucene, Extraction/Mapping, Solr, ZooKeeper, SolrCloud, Querying API, Indexing API]
Solr and HDFS
• Scalable, cost-efficient index storage
• Higher availability
• Search and process data in one platform
Near Real Time Indexing at Ingest
Solr and Flume
• Data ingest at scale
• Flexible extraction and mapping
• Indexing at data ingest
• Document-level ACL
[Diagram: a log file and other log files, each read by a Flume agent with an indexer; HDFS]
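The Flume slide pairs indexing at ingest with document-level ACLs. One common way to get document-level security in a search index is to stamp each document with the groups allowed to read it at indexing time, then filter every query by the caller's groups. The sketch below models that idea with an in-memory index; the `IngestIndexer` class, the `acl_groups` field and the event format are hypothetical and are not the Flume or Solr API.

```python
from dataclasses import dataclass, field


@dataclass
class IngestIndexer:
    """Toy stand-in for an indexer attached to a Flume agent (hypothetical)."""
    docs: dict = field(default_factory=dict)

    def on_event(self, event_id: str, body: str, acl_groups: set) -> None:
        # Index each event as it arrives, tagging it with the groups allowed to read it.
        self.docs[event_id] = {"id": event_id, "body": body, "acl_groups": set(acl_groups)}

    def search(self, term: str, user_groups: set) -> list:
        # Match the term in the body, then keep only documents sharing a group with the user.
        hits = []
        for doc in self.docs.values():
            if term.lower() in doc["body"].lower() and doc["acl_groups"] & user_groups:
                hits.append(doc["id"])
        return sorted(hits)


indexer = IngestIndexer()
indexer.on_event("e1", "disk failure on node7", {"ops"})
indexer.on_event("e2", "disk quota warning for finance share", {"finance"})
indexer.on_event("e3", "disk replaced on node7", {"ops", "finance"})

print(indexer.search("disk", {"ops"}))      # ['e1', 'e3']
print(indexer.search("disk", {"finance"}))  # ['e2', 'e3']
```

In a real Solr deployment the group check would typically be expressed as a filter query (`fq`) on such a field; the slide does not say how Cloudera Search implements its ACL support.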
Streamlined Extraction and Mapping
Cloudera Morphlines
• Simple and flexible data transformation
• Reusable across multiple index workloads
• Over time, extend and reuse across platform workloads
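[Diagram: syslog → Flume Agent with Solr sink → Command: readLine → Command: grok → Command: loadSolr → Solr; the Flume event becomes a record that each command transforms, and loadSolr sends the final record to Solr as a document]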
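A morphline chains commands, each consuming records and passing transformed records to the next. The Python sketch below mimics the readLine → grok → loadSolr chain from the slide. It is not the Morphlines API: the regular expression stands in for a grok pattern, `load_solr` appends to a list instead of sending documents to Solr, and every function name is hypothetical.

```python
import re

# Simplified syslog pattern standing in for a grok expression (assumption).
SYSLOG = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<program>[^:\[]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$"
)


def read_line(records):
    # Emit one record per non-empty line of each event body.
    for rec in records:
        for line in rec["body"].splitlines():
            if line.strip():
                yield {"message": line}


def grok(records):
    # Extract named fields; lines that do not match the pattern are dropped.
    for rec in records:
        m = SYSLOG.match(rec["message"])
        if m:
            yield {k: v for k, v in m.groupdict().items() if v is not None}


def make_load_solr(sink):
    # Hypothetical loader: collects documents in a list instead of sending them to Solr.
    def load_solr(records):
        for rec in records:
            sink.append(rec)
            yield rec
    return load_solr


def run_morphline(commands, event):
    # Chain the commands so each one consumes the records produced by the previous one.
    stream = iter([event])
    for cmd in commands:
        stream = cmd(stream)
    for _ in stream:  # pull every record through the chain
        pass


solr_docs = []
event = {"body": "May 10 12:00:01 web1 sshd[42]: Accepted password for eva\n"
                 "not a syslog line\n"
                 "May 10 12:00:05 web1 cron: job started\n"}
run_morphline([read_line, grok, make_load_solr(solr_docs)], event)
print(solr_docs)
```

Using generators keeps the chain streaming: each record flows through all commands before the next one is read, which is the property that makes the same transformation reusable in both ingest-time and batch indexing.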
Scalable Batch Indexing
[Diagram: files in HDFS are processed by indexers into index shards, each served by a Solr server]
Solr and MapReduce
• Flexible, scalable batch indexing
• Start serving new indices with no downtime
• On-demand indexing, cost-efficient re-indexing
Scalable Batch Indexing
• Mappers: parse input into indexable documents (three mappers shown)
• Arbitrary reducing steps of indexing and merging
• End-Reducer (shard 1): index documents into Index shard 1
• End-Reducer (shard 2): index documents into Index shard 2
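The batch-indexing flow on this slide (mappers parse input, documents are routed to shards, end-reducers build one index per shard) can be modeled in a single Python process. This is a sketch, not the MapReduce or Solr API: the `id,text` input format, `NUM_SHARDS`, the CRC32 routing and the dictionary-based inverted index are assumptions, and the intermediate "arbitrary reducing steps" are collapsed into one reducer per shard.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 2  # assumption: two shards, as in the diagram


def mapper(line: str):
    # Parse "id,text" input into an indexable document (hypothetical format).
    doc_id, text = line.split(",", 1)
    yield doc_id, {"id": doc_id, "text": text}


def partition(doc_id: str) -> int:
    # Deterministic routing of each document to exactly one shard.
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def end_reducer(docs):
    # Build an inverted index (term -> sorted doc ids) for one shard.
    index = defaultdict(set)
    for doc in docs:
        for term in doc["text"].lower().split():
            index[term].add(doc["id"])
    return {term: sorted(ids) for term, ids in index.items()}


def batch_index(lines):
    shuffled = defaultdict(list)  # shard number -> documents routed to it
    for line in lines:
        for doc_id, doc in mapper(line):
            shuffled[partition(doc_id)].append(doc)
    return {shard: end_reducer(docs) for shard, docs in shuffled.items()}


def search(shards, term):
    # Query every shard and merge the hits, as a distributed search would.
    hits = set()
    for index in shards.values():
        hits.update(index.get(term.lower(), []))
    return sorted(hits)


shards = batch_index(["d1,Hadoop stores data", "d2,Solr searches data", "d3,Hadoop and Solr"])
print(search(shards, "solr"))  # ['d2', 'd3']
```

Because each shard is built independently, a fresh set of shards can be produced offline and swapped in, which is the idea behind serving new indices without downtime.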
Searchable Real-Time Data Indexing: HBase
[Diagram: interactive load into HBase on HDFS; indexer(s) trigger on updates and feed several Solr servers, which serve search]
HBase (planet-sized tabular data; immediate access & updates) + Search (fast & flexible information discovery) = BIG DATA MANAGEMENT
Searchable Real-Time Data: HBase & Search
HBase SEP (Side-Effect Processor) Triggers & Indexer
• HBase replication mechanism for reliable indexing
• Lightweight, zero impact on write performance
• Easy to set up & integrate
• Flexible, configuration-based mapping & content extraction
Many use cases
• Indexes near-real-time HBase updates into Solr
• Fielded search on HBase columns
• Faceted search
• Query by example
• Datacube
• Secondary indexes
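The HBase indexer reacts to row changes (via HBase replication, according to the slide) and maps configured columns to Solr fields. The sketch below models that mapping step for put and delete events, plus the fielded lookup and facet counts listed as use cases. The `RowIndexer` class, the `COLUMN_MAPPING` dictionary and the field names are hypothetical and do not reproduce the hbase-indexer configuration format.

```python
# Hypothetical mapping from HBase "family:qualifier" columns to Solr field names.
COLUMN_MAPPING = {
    "info:name": "name_s",
    "info:city": "city_s",
}


class RowIndexer:
    def __init__(self):
        self.index = {}  # row key -> Solr-style document

    def on_put(self, row_key: str, cells: dict) -> None:
        # Merge the updated, mapped columns into the row's document.
        doc = self.index.setdefault(row_key, {"id": row_key})
        for column, value in cells.items():
            solr_field = COLUMN_MAPPING.get(column)
            if solr_field is not None:  # unmapped columns are not indexed
                doc[solr_field] = value

    def on_delete(self, row_key: str) -> None:
        # A deleted row disappears from the index.
        self.index.pop(row_key, None)

    def find(self, solr_field: str, value: str) -> list:
        # Fielded lookup on a mapped column, acting as a simple secondary index.
        return sorted(k for k, d in self.index.items() if d.get(solr_field) == value)

    def facet(self, solr_field: str) -> dict:
        # Count documents per distinct value of a field.
        counts = {}
        for doc in self.index.values():
            if solr_field in doc:
                counts[doc[solr_field]] = counts.get(doc[solr_field], 0) + 1
        return counts


idx = RowIndexer()
idx.on_put("row1", {"info:name": "Ada", "info:city": "Ghent", "raw:blob": "0x01"})
idx.on_put("row2", {"info:name": "Bob", "info:city": "Palo Alto"})
idx.on_put("row3", {"info:name": "Cy", "info:city": "Ghent"})
idx.on_put("row2", {"info:city": "Ghent"})  # update moves row2 to Ghent
idx.on_delete("row3")
print(idx.find("city_s", "Ghent"))  # ['row1', 'row2']
print(idx.facet("city_s"))          # {'Ghent': 2}
```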
Simple, Customizable Search Interface
Hue
• Simple UI
• Navigated, faceted drill-down
• Customizable display
• Full-text search, standard Solr API and query language
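Because Hue talks to Solr through the standard Solr API, a faceted drill-down comes down to ordinary request parameters. The sketch below builds such a request URL with Python's standard library. `q`, `fq`, `facet`, `facet.field`, `rows` and `wt` are standard Solr query parameters; the host, the collection name `logs`, the field names and the `select_url` helper are hypothetical.

```python
from urllib.parse import urlencode


def select_url(base, collection, query, filters=(), facet_fields=(), rows=10):
    # Repeated parameters (fq, facet.field) are passed as a list of pairs.
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]  # each drill-down step adds a filter
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    return f"{base}/solr/{collection}/select?{urlencode(params)}"


url = select_url(
    "http://search-host:8983",  # hypothetical host; 8983 is Solr's usual default port
    "logs",
    "message:error",
    filters=["host:web1"],
    facet_fields=["program", "host"],
)
print(url)
```

Clicking a facet value in a UI would typically append another `fq` clause and re-issue the request, while the facet counts returned for each `facet.field` drive the next level of navigation.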
Simplified Management
Cloudera Manager
• Install, configure, and deploy Solr services on the cluster
• Unified management and monitoring
• Resource management
Using Cloudera Search
Skybox
• Advanced parallel image processing on images stored in HDFS
• Before: difficult to interactively evaluate image quality and correlate it with satellite logs
• Now: index images and satellite logs at acquisition and on demand; interactively inspect image quality
Scalable, efficient image search for analysis and process improvement
Explorys Medical
"Hadoop has been Explorys' center of gravity for data management since the company's inception. The addition of Search to Cloudera's platform expands its usability by supporting more workloads and reducing data movement between infrastructure systems. Deploying Cloudera Search supports Explorys' mission to help healthcare providers deliver better, more cost efficient care through fast, flexible data analysis."
-- Michael Onders, SVP & CTO, Explorys
Event, exploration, and data correlation to meet SLAs
Patterns and Predictions
• Identify patterns in social media and perform analytics on term usage to improve suicide predictive capability
• Before: social media data sets too large for traditional enterprise search
• Now: near-real-time correlation of medical records, notes and social media; access for doctors and non-technical staff
Proactive healthcare for returning military veterans
Questions
• Ask on the Q&A tab
• Recording will be available at cloudera.com
• After the webinar, inquire at: [email protected]
• Presenters' contact info: [email protected], [email protected]
Thank you for attending!
Download Cloudera Search: cloudera.com/downloads
Learn more about Cloudera Search, powered by Solr: cloudera.com/search
Learn more about NGDATA and Lily: www.ngdata.com