Date posted: 12-Jul-2015
Category: Software
Uploaded by: jeykottalam
The BDAS Open Source Community
UC BERKELEY
Ion Stoica UC Berkeley and Databricks
Growing Beyond AMPLab
As software matures and becomes successful, more and more contributors come from outside AMPLab
New startups have anchored development
» Databricks (Spark Stack)
» Mesosphere (Mesos)
» …
Enables AMPLab to focus more resources on future systems instead of software maintenance
Apache Spark
[Figure: BDAS stack diagram. Components shown: Velox Model Serving, Tachyon, SparkStreaming, SparkSQL, BlinkDB, GraphX, MLlib, MLBase, SparkR, Sample Clean; applications (Cancer Genomics, Energy Debugging, Smart Buildings); Apache Spark (core); Apache Mesos, Yarn; HDFS, S3, …]
Apache Spark
Open Source: end of 2010
Apache Project: 2013
Over time has grown to include key libraries
» SparkStreaming, SparkSQL, MLlib, GraphX
Becoming a platform for Big Data apps
Apache Spark Today
[Figure: two bar charts of activity in past 6 months, "Commits" (axis 0 to 2000) and "Lines of Code Changed" (axis 0 to 350000), comparing MapReduce, YARN, HDFS, Storm, and Spark]
2-3x more activity than: Hadoop, Storm, MongoDB, NumPy, D3, Julia, …
Meetups Around the World
Monthly Contributors
[Figure: monthly contributors (axis 0 to 100), 2011 to 2014, with "Databricks founded" marked]
370+ contributors for last 12 months
Spark Stack (2013)
[Figure: 2013 stack diagram. Components shown: Tachyon, SparkStreaming, Shark, BlinkDB, MLlib, MLBase, Sample Clean; applications (Cancer Genomics, Energy Debugging, Smart Buildings); Apache Spark (core); Apache Mesos, Yarn; HDFS, S3, …]
Last Year Developments
[Figure: stack diagram with the past year's developments. Components shown: Tachyon, SparkStreaming, Shark, SparkSQL, GraphX, MLlib, SparkR, BlinkDB, MLBase, Velox Model Serving, Sample Clean; applications (Cancer Genomics, Energy Debugging, Smart Buildings); Apache Spark (core); Apache Mesos, Yarn; HDFS, S3, …]
Wide Adoption
All major Hadoop distributions include Spark
Beyond Hadoop
Databricks: spurred Spark’s enterprise growth
Apache Mesos
[Figure: BDAS stack diagram]
Apache Mesos
Open Source: 2010
Apache Project: 2012
Used in production at Twitter for the past 2.5 years
» 10,000+ machines
» 500+ engineers using it
Most development moved outside Berkeley starting in 2012
Monthly Contributors
[Figure: monthly contributors over time, with "Mesosphere founded" marked]
65 contributors for last 12 months
BDAS Stack
[Figure: BDAS stack diagram]
Release Growth
Tachyon 0.1 (Dec ’12): 1 contributor
Tachyon 0.2 (Apr ’13): 3 contributors
Tachyon 0.3 (Oct ’13): 15 contributors
Tachyon 0.4 (Feb ’14): 30 contributors
Tachyon 0.5 (16 July ’14): 46 contributors
Fast Growing Community
[Figure: Berkeley contributors vs. non-Berkeley contributors (20+ companies)]
~80% of contributors are already outside AMPLab
Reaching Tipping Point
Research to Real-World Impact
[Figure: BDAS projects placed between "Research" and "Real-world Impact": Apache Spark (core), MLlib, Spark Streaming, Spark SQL, Apache Mesos, GraphX, Tachyon, Velox, Succinct, ADAM, BlinkDB. Legend: AMPLab/Berkeley vs. non-Berkeley committers / commits]
Impact on AMPLab
Created blueprint & ecosystem for other BDAS components to succeed
» MLlib, GraphX, Tachyon, …
Enabled AMPLab to increase focus on new research projects
» Velox, ADAM, Succinct, …