Using Spark at Vungle

Date post: 06-Aug-2015
Upload: alicia-strait
Transcript
Page 1: Using Spark at Vungle

Introduction Old Architecture New Architecture Decoupling Streaming Conclusion

Page 2: Using Spark at Vungle

● Introduction

● Old Architecture

● New Architecture

● Decoupling

● Streaming

● Conclusion

Page 3: Using Spark at Vungle

● Legacy Java Process
  ○ “Crunches” data
  ○ Sends data downstream to our own datastores and to third-party analytics
  ○ Runs every hour

● Growth
  ○ Process can run over an hour
  ○ 12 GB -> 24 GB heap in less than 1 year
  ○ Cron is a horrible job management system
  ○ A failure requires rerunning a job from the beginning

● 2.0
  ○ Horizontally scalable
  ○ Real-time ETL
  ○ Reusable

Page 4: Using Spark at Vungle

ETL @ Vungle

● ~1 Billion Events / Day

● Deduplication

● Calculating $$$

● Outputting data to various destinations
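
As a rough illustration of the deduplication step, here is a minimal single-process sketch in Python. It is not the pipeline from the talk: the `event_id` field name and the `deduplicate` helper are assumptions made for the example, and the first occurrence of each ID is the one kept.

```python
from typing import Iterable, Iterator


def deduplicate(events: Iterable[dict], key: str = "event_id") -> Iterator[dict]:
    """Yield each event once, keeping the first occurrence of every ID.

    `key` is a hypothetical field name; the real event schema is not shown
    in the slides.
    """
    seen = set()
    for event in events:
        event_id = event[key]
        if event_id in seen:
            continue  # duplicate delivery of an event we already emitted
        seen.add(event_id)
        yield event


events = [
    {"event_id": "a1", "type": "request"},
    {"event_id": "b2", "type": "view"},
    {"event_id": "a1", "type": "request"},  # duplicate
]
unique = list(deduplicate(events))
```

An in-memory set like this would not hold up at roughly a billion events per day on one machine; in a distributed job the same idea is usually expressed by grouping or reducing on the event ID so that each key is handled by a single worker.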

Page 5: Using Spark at Vungle

Old Architecture

Page 14: Using Spark at Vungle

New Architecture

Page 22: Using Spark at Vungle

Decoupling

Page 31: Using Spark at Vungle

Set up the connection and Spark streams

Map each log line to a Mongo object and insert it into Mongo

Page 32: Using Spark at Vungle

Setting up the connection and Spark streams

Page 33: Using Spark at Vungle

Mapping to Mongo objects and inserting them
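
A minimal sketch of this mapping step, not the code shown in the talk. It assumes each log line is a JSON object (the deck later mentions moving away from JSON) and that `collection` is any object with an `insert_many(documents)` method, such as a pymongo `Collection`. The helper names `line_to_document` and `insert_lines` and the batch size are hypothetical.

```python
import json
from typing import Iterable, Optional


def line_to_document(line: str) -> Optional[dict]:
    """Parse one JSON log line into a dict ready for insertion.

    Returns None for blank lines, malformed JSON, or JSON that is not an object.
    """
    line = line.strip()
    if not line:
        return None
    try:
        doc = json.loads(line)
    except json.JSONDecodeError:
        return None
    return doc if isinstance(doc, dict) else None


def insert_lines(lines: Iterable[str], collection, batch_size: int = 500) -> int:
    """Insert parsed lines in batches; return how many documents were inserted."""
    batch, inserted = [], 0
    for line in lines:
        doc = line_to_document(line)
        if doc is None:
            continue  # skip lines that cannot be mapped
        batch.append(doc)
        if len(batch) >= batch_size:
            collection.insert_many(batch)
            inserted += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted
```

In a Spark Streaming job, a function like `insert_lines` would typically be called once per partition (for example from `foreachRDD` followed by `foreachPartition`), so that each worker opens its own database connection instead of sharing one created on the driver.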

Page 34: Using Spark at Vungle

Questions

Page 35: Using Spark at Vungle

Streaming

Page 39: Using Spark at Vungle

Ingestion

Page 40: Using Spark at Vungle

Ingestion Table Schema

Event ID | Request | View | Install | ... | Request Added | View Added | Install Added | Value

Page 41: Using Spark at Vungle

Fact Table Schema

... | Date | Time | Deliveries | Views | Installs | Processed Deliveries | Processed Views | Processed Installs

Page 42: Using Spark at Vungle

Ingestion

Page 49: Using Spark at Vungle

Process

Page 55: Using Spark at Vungle

Next Steps

● Switching from JSON to ProtoBuf

● Using YARN to run multiple jobs on one cluster

● Data Science

● Who knows?

Page 56: Using Spark at Vungle

Questions

Page 57: Using Spark at Vungle

Thank you!
