Date post: | 19-Jan-2017 |
Category: |
Data & Analytics |
Upload: | yana-valasatava |
View: | 43 times |
Download: | 1 times |
Apache Spark is a fast and general engine for large-‐scale data processing • In-‐memory processing Successor of Hadoop (MapReduce) • File-‐based processing
hDp://spark.apache.org/
Apache Spark works in parallel on • Mul)core laptop, desktop • Single server • Cluster (need cluster manager)
RDD<String> RDD<String> PairRDD<String,Integer> PairRDD<String,Integer>
Map-‐Reduce Example
one to many one to one