Austin Data Meetup 092014 - Spark

Date posted: 27-Jun-2015
Uploaded by: steve-blackmon
Description:
Summary of recent developments in Apache Spark
Transcript
Page 1: Austin Data Meetup 092014 - Spark

Spark

Spark
- Summit
- News
- Basics
- Advanced
- Subprojects
- Use Cases
- Resources

Summit
- 1,164 participants from over 453 companies attended
- Spark Training sold out at 300 participants
- 31 organizations sponsored the event
- 12 keynotes and 52 community presentations were given

News
- Project
- Databricks

Project
- 1.0.0 release
- Graduated from the Apache Incubator
- Very active community

Very active community
- One of the top three Apache projects
- Most active Big Data project
- More than 50 companies
- More than 250 contributors
- More than 175,000 lines of code

Databricks
- Certification
- Cloud

Certification
- Every certified app will run on every certified distribution
- Distribution Partners
- App Partners

Distribution Partners
- Cloudera
- MapR
- Hortonworks
- Pivotal
- IBM
- Amazon Web Services
- SAP

App Partners
- Alteryx
- DataStax
- 0xdata
- Typesafe
- Zoomdata

Cloud
- Vision: Make Big Data Easy!
- Product: Badass
- Hosted Platform
- Cluster Management
- Interactive Workspace

Interactive Workspace
- Notebooks
- Dashboards
- Jobs

Dashboards
- WYSIWYG Builder
- Interactive plots
- One-click publishing

Spark Basics
- Execution
- RDDs
- Caching
- Broadcast
- Languages

Execution
- Apply functional operators across distributed collections
- Master / worker architecture
- Lazy evaluation: transformations are deferred until an action asks for a result
- Parallelize with threads first
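
The execution model above (functional operators applied lazily to partitioned collections) can be illustrated with a minimal plain-Python sketch. It is not Spark's API: `LazyCollection`, `parallelize` and `square` are hypothetical names chosen to echo RDD transformations and actions. `map` and `filter` only record steps; nothing runs until `collect()`, which processes each partition on a thread pool, loosely like Spark's local mode running tasks on threads.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyCollection:
    """Toy stand-in for an RDD (hypothetical, not Spark's API).

    map/filter only record a step; collect() runs every recorded step,
    one partition per worker thread.
    """

    def __init__(self, partitions, steps=None):
        self.partitions = partitions
        self.steps = list(steps) if steps else []  # pending (kind, fn) pairs

    def map(self, fn):
        # Transformation: returns a new collection with one more pending step.
        return LazyCollection(self.partitions, self.steps + [("map", fn)])

    def filter(self, pred):
        return LazyCollection(self.partitions, self.steps + [("filter", pred)])

    def _run_partition(self, part):
        out = part
        for kind, fn in self.steps:
            out = [fn(x) for x in out] if kind == "map" else [x for x in out if fn(x)]
        return out

    def collect(self):
        # Action: only now is any user function executed.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(self._run_partition, self.partitions))
        return [x for part in results for x in part]


def parallelize(data, num_partitions=2):
    """Split a list into contiguous partitions (hypothetical helper)."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return LazyCollection([data[i:i + size] for i in range(0, len(data), size)])


calls = []


def square(x):
    calls.append(x)  # record when the function actually runs
    return x * x


pipeline = parallelize(list(range(10)), num_partitions=3).filter(lambda x: x % 2 == 0).map(square)
assert calls == []  # building the pipeline ran nothing
result = pipeline.collect()
print(result)  # [0, 4, 16, 36, 64]
```

Keeping contiguous partitions means `collect()` returns elements in their original order, as an RDD built from a local list would.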

RDDs (Resilient Distributed Datasets)
- Interface for a dataset
- Can be backed by almost any data source
- Any Hadoop InputFormat class
- HDFS by default

Caching
- Store intermediate results in memory
- Partition locality
- Significant speed-up for iterative algorithms
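
A rough sketch of why caching pays off for iterative algorithms, in plain Python rather than Spark. The `parse` step, the toy data and the gradient-descent loop are hypothetical; the uncached version re-derives its input on every iteration (as an RDD recomputed from its lineage would), while the cached version keeps the parsed points in memory, which is the role `cache()` or `persist()` plays on an RDD.

```python
counter = {"parse": 0}


def parse(line):
    """Stand-in for an expensive per-record step (hypothetical)."""
    counter["parse"] += 1
    x, y = line.split(",")
    return float(x), float(y)


raw = ["1,2", "2,4", "3,6", "4,8"]  # toy data where y = 2x


def fit_slope(get_points, iterations=50, lr=0.01):
    """Gradient descent for y = w * x; get_points() is called once per iteration."""
    w = 0.0
    for _ in range(iterations):
        points = get_points()
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= lr * grad
    return w


# Not cached: every iteration re-derives the points from the raw input.
counter["parse"] = 0
w_uncached = fit_slope(lambda: [parse(line) for line in raw])
uncached_parses = counter["parse"]

# Cached: derive once, keep the parsed points in memory, reuse them.
counter["parse"] = 0
cached_points = [parse(line) for line in raw]
w_cached = fit_slope(lambda: cached_points)
cached_parses = counter["parse"]

print(uncached_parses, cached_parses)  # 200 4
```

Both runs produce the same slope; only the amount of repeated work differs, and that gap grows with the number of iterations.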

Broadcast
- Send an immutable object to all workers
- Similar to the DistributedCache in MapReduce
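
A common reason to broadcast is a map-side join: ship a small lookup table to every worker once instead of shuffling a large dataset. The sketch below simulates that in plain Python with assumed toy data; in Spark itself the table would be wrapped with `sc.broadcast(...)` and read through `.value` inside tasks. `MappingProxyType` stands in for the immutability requirement, and `enrich_partition` is a hypothetical helper.

```python
from types import MappingProxyType

# Small, read-only lookup table (toy data). MappingProxyType makes it immutable,
# mirroring the rule that a broadcast value must not be modified.
country_names = MappingProxyType({"US": "United States", "FR": "France", "JP": "Japan"})

# A "large" dataset that is already split across workers (toy data).
partitions = [
    [("alice", "US"), ("bob", "FR")],
    [("carol", "JP"), ("dave", "DE")],
]


def enrich_partition(part, lookup):
    # Each task reads the shared table locally, so the big dataset is never shuffled.
    return [(user, lookup.get(code, "unknown")) for user, code in part]


enriched = [row for part in partitions for row in enrich_partition(part, country_names)]
print(enriched)
# [('alice', 'United States'), ('bob', 'France'), ('carol', 'Japan'), ('dave', 'unknown')]
```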

Languages
- Scala
- Python
- Java 7
- Java 8
- R
- Clojure

Advanced
- Partitioning
- Persistence Options
- Checkpointing
- Accumulators
- Optimizations
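
For partitioning, Spark's `HashPartitioner` places a key by taking its hash code modulo the number of partitions, which guarantees that all records with the same key end up together. This plain-Python sketch follows the same idea but swaps in CRC32 as the hash (an assumption made so placement is stable across runs); `partition_for`, `hash_partition` and `sum_by_key` are hypothetical helpers.

```python
import zlib


def partition_for(key, num_partitions):
    # Stable hash (CRC32) so a key lands in the same partition in every process;
    # Python's built-in hash() for str is randomized per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def hash_partition(pairs, num_partitions):
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        buckets[partition_for(key, num_partitions)].append((key, value))
    return buckets


def sum_by_key(bucket):
    totals = {}
    for key, value in bucket:
        totals[key] = totals.get(key, 0) + value
    return totals


pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
buckets = hash_partition(pairs, num_partitions=3)
# Every occurrence of a key is in one bucket, so each bucket can be reduced independently.
per_bucket = [sum_by_key(b) for b in buckets]
merged = {k: v for totals in per_bucket for k, v in totals.items()}
print(merged)
```

Because no key is split across buckets, the per-bucket results can simply be concatenated, which is why a pre-partitioned dataset can skip a shuffle before key-based operations.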

Subprojects
- Spark SQL
- Tachyon
- Spark Streaming
- MLlib
- GraphX
- BlinkDB
- Spark Job Server

Spark SQL
- Replaces Shark
- Core
- Catalyst
- Libraries

Core
- SchemaRDDs
- Query Execution
- Caching

Catalyst
- Relational algebra
- Expressions / UDFs
- Query Planning
- Optimizer
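
Catalyst represents queries and expressions as trees and optimizes them by applying rewrite rules. Here is a minimal plain-Python sketch of one classic rule, constant folding. The `Literal`, `Attribute` and `Add` classes only echo Catalyst's expression node names; they are not its Scala API, and `fold_constants` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str  # a column reference whose value is unknown at planning time


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add]


def fold_constants(expr):
    """Bottom-up rewrite: Add(Literal, Literal) -> Literal; other nodes are rebuilt unchanged."""
    if isinstance(expr, Add):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Literal) and isinstance(right, Literal):
            return Literal(left.value + right.value)
        return Add(left, right)
    return expr


# x + (1 + 2)  ==>  x + 3
folded = fold_constants(Add(Attribute("x"), Add(Literal(1), Literal(2))))
print(folded)  # Add(left=Attribute(name='x'), right=Literal(value=3))
```

Rules like this are small and composable: an optimizer can keep applying a batch of them until the tree stops changing.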

Libraries
- POJOs
- JDBC
- JSON
- Parquet
- Hive

Hive
- Catalog information from the Hive Metastore
- Helps connect BI tools such as MicroStrategy and Tableau
- Wrappers for UDFs, UDAFs, and UDTFs
- Supports TRANSFORM
- Supports SerDes

Tachyon
- In-memory (off-heap) distributed datastore
- Change the URI from hdfs:// to tachyon://
- Share datasets between jobs without going through HDFS
- Helps scaling by off-loading allocation responsibility and GC pauses from executor processes

Spark Streaming
- Real-time streams
- Micro-batching
- Windowed Computations
- Lambda Architecture
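
Windowed computations in Spark Streaming are defined by a window length and a slide interval, both multiples of the batch interval (for example via `reduceByKeyAndWindow`). The plain-Python sketch below mimics that over a list of micro-batches; `windowed_counts` and the sample batches are hypothetical, and the window is measured in batches rather than seconds.

```python
from collections import Counter, deque


def windowed_counts(batches, window_batches, slide_batches):
    """Count words over the last `window_batches` micro-batches,
    emitting a result every `slide_batches` batches."""
    window = deque(maxlen=window_batches)  # the oldest batch drops out automatically
    for i, batch in enumerate(batches, start=1):
        window.append(Counter(batch))  # per-batch counts, computed once per batch
        if i % slide_batches == 0:
            total = Counter()
            for counts in window:
                total.update(counts)  # add counts from every batch in the window
            yield i, dict(total)


batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]
for batch_no, counts in windowed_counts(batches, window_batches=2, slide_batches=2):
    print(batch_no, counts)
```

With a window of two batches sliding by two, each batch is counted in exactly one result; making the slide shorter than the window produces overlapping windows instead.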

MLlib
- Summary statistics
- Regression
- Classification
- Clustering
- Collaborative Filtering
- Optimization
- Dimensionality Reduction

GraphX
- Graph, VertexRDD, and EdgeRDD objects and operations
- Pregel API
- mapReduceTriplets over (vertex, edge, vertex) triplets
- Graph analytics libraries
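
The Pregel API runs a vertex program in supersteps: vertices exchange messages along edges, and only vertices that changed stay active. Below is a plain-Python sketch of that pattern computing connected components; it is not GraphX's API, and the function and sample graph are hypothetical. Like GraphX's `ConnectedComponents`, it labels each component with its lowest vertex ID.

```python
def connected_components(vertices, edges, max_supersteps=100):
    """Pregel-style label propagation: each vertex starts labelled with its own ID
    and repeatedly adopts the smallest label it receives from a neighbour."""
    label = {v: v for v in vertices}
    neighbours = {v: set() for v in vertices}
    for src, dst in edges:  # treat edges as undirected for connectivity
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    active = set(vertices)  # superstep 0: every vertex sends its label
    for _ in range(max_supersteps):
        if not active:
            break  # no messages left: the computation has converged
        # Message phase: active vertices send their label; messages are merged with min.
        inbox = {}
        for v in active:
            for n in neighbours[v]:
                inbox[n] = min(inbox.get(n, label[v]), label[v])
        # Vertex program: adopt a smaller label; only changed vertices stay active.
        active = set()
        for v, msg in inbox.items():
            if msg < label[v]:
                label[v] = msg
                active.add(v)
    return label


components = connected_components(
    vertices=[1, 2, 3, 4, 5, 6],
    edges=[(1, 2), (2, 3), (4, 5)],
)
print(components)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6}
```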

Graph analytics libraries
- ConnectedComponents
- PageRank
- TriangleCount
- ShortestPaths
- SVDPlusPlus
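
PageRank is the best known of these algorithms. Below is a minimal plain-Python power-iteration sketch using the normalized formulation, with damping 0.85 and a dangling node's rank spread evenly over all nodes. GraphX's own implementation may differ in details such as rank normalization, so treat this as an illustration of the technique rather than its code; `pagerank` and the sample graph are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power iteration over a directed edge list; ranks sum to 1."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}  # random-jump share
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly so the total stays 1.
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank


ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")])
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```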

BlinkDB
- Get estimated results
- Time bound
- Error bound
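
BlinkDB answers queries from samples and reports an error bar. The sketch below shows the statistics behind an error-bounded answer, in plain Python: a sample mean with a normal-approximation 95% confidence half-width, plus the sample size needed to meet a target error. The function names and the synthetic population are assumptions; BlinkDB's actual approach (stratified samples chosen per query) is more involved.

```python
import math
import random


def estimate_mean(sample, z=1.96):
    """Return (estimate, half_width) of a normal-approximation ~95% confidence interval."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    return mean, z * math.sqrt(var / n)


def sample_size_for_error(stddev, max_error, z=1.96):
    """Smallest n with z * stddev / sqrt(n) <= max_error."""
    return math.ceil((z * stddev / max_error) ** 2)


# Synthetic "table" of 100,000 values; answer from a 1,000-row sample instead.
rng = random.Random(7)
population = [rng.gauss(100, 15) for _ in range(100_000)]
sample = rng.sample(population, 1_000)
est, err = estimate_mean(sample)
print(f"mean is about {est:.2f} +/- {err:.2f}")
print(sample_size_for_error(stddev=15, max_error=0.5))  # rows needed for +/- 0.5
```

The error shrinks with the square root of the sample size, so halving the error bar costs roughly four times as many rows; a time bound works the other way round, fixing the number of rows that can be read and accepting the resulting error.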

Spark Job Server
- Runs multiple jobs / contexts in the same process
- Allows for RDD caching / sharing between jobs
- Job persistence

Use Cases
- Spotify
- Real-time Auctions: ShareThrough
- Real-time Recommendations: Graphflow
- Cancer Genomics: AMPLab
- Malware Detection: F-Secure
- Media Distribution Analytics: NBC Universal
- Personal Fitness: Jawbone
- Neuroscience: HHMI

Resources
- Code
- Event
- Technology
- Videos

Code
- https://github.com/apache/spark

Event
- spark-summit.org
- http://arjon.es/2014/06/30/spark-summit-2014-day-1/
- https://www.crowdchat.net/chat/c3BvdF9vYmpfODc=
- https://nathanbrixius.wordpress.com/2014/07/02/spark-summit-keynote-notes/
- http://thomaswdinsmore.com/2014/07/03/spark-summit-2014-roundup/

Technology
- Learning Spark (O'Reilly eBook)
- www.spark-stack.org
- ampcamp.berkeley.edu
- https://amplab.cs.berkeley.edu/2013/10/23/got-a-minute-spin-up-a-spark-cluster-on-your-laptop-with-docker/

YouTube
- AMPLab: https://www.youtube.com/channel/UCWudC4d9i-2yxR5tuen-Nuw
- Databricks: https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA
- Apache Spark: https://www.youtube.com/channel/UCRzsq7k4-kT-h3TDUBQ82-w


Recommended