Date posted: 09-Jan-2017
Category: Technology
Uploaded by: tugdual-grall
© 2015 MapR Technologies
Big Data Journey with Hadoop & MapR
Tug Grall [email protected] @tgrall
David Pilato [email protected] @dadoonet
Import RDBMS data
sqoop import --connect jdbc:mysql://db.foo.com/somedb \
  --table customers \
  --target-dir /incremental_dataset --append
Files, HBase, Hive
Import RDBMS data
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_user => "postgres"
    jdbc_driver_library => "/path/to/postgresql-9.4-1201.jdbc41.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "SELECT * from contacts"
  }
}
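Both imports above boil down to the same idea: run a query over a database connection and land each row somewhere else. A minimal Python sketch of that idea follows. It is not how Sqoop or the JDBC input is implemented; it uses the standard-library sqlite3 module as a stand-in for MySQL/PostgreSQL, and the contacts columns, the sample rows, and the export_table helper are all hypothetical.

```python
import json
import sqlite3


def export_table(conn, statement):
    """Run a SELECT and return each row as one JSON document (one per line)."""
    cursor = conn.execute(statement)
    columns = [col[0] for col in cursor.description]
    return [json.dumps(dict(zip(columns, row))) for row in cursor]


# In-memory SQLite database standing in for the real RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO contacts (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],  # hypothetical sample rows
)

# ORDER BY is added only to make the output order deterministic.
lines = export_table(conn, "SELECT * FROM contacts ORDER BY id")
for line in lines:
    print(line)
```

Each printed line is a self-contained JSON document, which is a convenient shape whether the destination is a file, a table row, or a search index.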
How to store your data?
• Files in a distributed file system
• Rows in NoSQL Table
• Index in Search Engine
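To make the three storage options concrete, here is a small Python sketch that puts the same records into each shape using plain in-memory structures. These are toy stand-ins, not a real file system, NoSQL store, or search engine, and the records and field names are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical sample records.
records = [
    {"id": "c1", "name": "Ada Lovelace", "city": "London"},
    {"id": "c2", "name": "Grace Hopper", "city": "New York"},
]

# 1. File: one JSON document per line, as it could be written to a file.
file_lines = [json.dumps(r, sort_keys=True) for r in records]

# 2. NoSQL table: each row is addressed directly by its row key.
table = {r["id"]: {k: v for k, v in r.items() if k != "id"} for r in records}

# 3. Search index: an inverted index mapping each term to matching ids.
index = defaultdict(set)
for r in records:
    for field in ("name", "city"):
        for term in r[field].lower().split():
            index[term].add(r["id"])

print(file_lines[0])             # sequential scan friendly
print(table["c2"]["city"])       # direct lookup by key
print(sorted(index["london"]))   # lookup by content
```

The trade-off the sketch hints at: files suit full scans, a keyed table suits lookups by id, and an index suits lookups by what the record contains.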
Data Processing
• Transform the data
• Enrich the data
• Examples:
– Store data in multiple formats
– Aggregate data
– Build Recommendations
– …
MapReduce Processing Model
• Define mappers
• Shuffling is automatic
• Define reducers
• For complex work, chain jobs together
– Use a higher level language or DSL that does this for you
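The model above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases, not the Hadoop API; the function names (mapper, shuffle, reducer, run_job) and the word-count task are chosen here for illustration only.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all values by key (the framework does this for you)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run one map -> shuffle -> reduce pass over the input lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


counts = run_job(["big data journey", "big data with hadoop"])
print(counts)
```

Chaining jobs means feeding the output of one run_job-style pass into the mapper of the next, which is the bookkeeping a higher-level language or DSL takes off your hands.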
Apache Spark: Fast Big Data
– Rich APIs in Java, Scala, Python
– Interactive shell
• Fast to Run
– General execution graphs
– In-memory storage
Spark: Unified Platform
• Spark SQL
• Spark Streaming (Streaming)
• MLlib (Machine learning)
• GraphX (Graph computation)
• Spark (General execution engine)
• Mesos, Hadoop YARN
• Distributed File System (HDFS, MapR-FS, S3, …)
Machine Learning
MapR Cluster
HBase / MapR DB
MapR-FS
Add recommendations to movies
Capture Ratings
Movies & Recommendations
Movie Database
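The flow above (capture ratings, compute recommendations, attach them to each movie) can be illustrated with a deliberately simple stand-in. The sketch below uses plain co-occurrence counting ("people who liked A also liked B") in pure Python; it is not the Spark-based approach of the talk, and the ratings data, the build_recommendations helper, and its top_n parameter are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical captured ratings: user -> set of movies rated positively.
ratings = {
    "alice": {"Alien", "Blade Runner", "Dune"},
    "bob": {"Alien", "Blade Runner"},
    "carol": {"Blade Runner", "Dune"},
}


def build_recommendations(ratings, top_n=2):
    """For each movie, list the movies most often liked by the same users."""
    cooccurrence = defaultdict(lambda: defaultdict(int))
    for liked in ratings.values():
        # Count every ordered pair of movies liked by the same user.
        for a, b in permutations(sorted(liked), 2):
            cooccurrence[a][b] += 1

    recommendations = {}
    for movie, others in cooccurrence.items():
        # Highest count first; ties broken alphabetically for stable output.
        ranked = sorted(others.items(), key=lambda kv: (-kv[1], kv[0]))
        recommendations[movie] = [title for title, _ in ranked[:top_n]]
    return recommendations


# "Add recommendations to movies": one list of titles per movie.
movies = build_recommendations(ratings)
print(movies["Alien"])
```

The resulting dictionary has the shape a movie document would need: each title carries its own short list of recommended titles, ready to be stored alongside the movie record.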
Conclusion
• If possible use Streams: Kafka, Logstash
• Advanced Data Processing and Machine Learning : Spark
• Expose your data using SQL for your “BI folks” : Drill
• Aggregation and Full Text Search : Elasticsearch
• Data Visualisation : Kibana