
Qingpeng zhang week5

Date post: 07-Apr-2017
Upload: qingpeng-qp-zhang
Introducing VenmoPlus.com: Explore your Venmo network! Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow
Transcript
Page 1: Qingpeng zhang week5

Introducing VenmoPlus.com: Explore your Venmo network!

Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow

Page 2

Historical transactions

Real-time transactions

Pipeline

Page 3

2013

Biggest Challenge:

● Calculate/Query graph distance in real time

Page 4

● Cache of 2nd-degree friend lists
● Partitioned GraphDB
● Good for LinkedIn (hundreds of millions of users, with higher degree)

● 5 million vertices (users)
● 32 million distinct edges (transactions)
● 88 million total edges (transactions)

Page 5

No cache (precalculation)? No GraphDB?

Page 6

Historical transactions

Real-time transactions

Two Databases

Page 7

Two Databases

Graph vertices (userID -> userName):

420890 Graham Hadley
1630476 Leon Tang
810029 Harminder Toor
1371353 Ephraim Park
562884 Paul Min

Graph edges (userID -> set of userIDs):

420890 set(14935158, 562884)
1630476 set(1371353)
810029 set(190230, 14935158)
1371353 set(810029, 971156)
562884 set(196371, 1371353)
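A minimal sketch of the two key spaces above, with Python dicts standing in for Redis (the names `vertices`, `edges`, and `add_transaction` are hypothetical). It assumes each transaction is stored as an undirected edge, so a payment adds each user to the other's set; the slide's excerpt may store only one direction. Because edge values are sets, repeat payments between the same pair collapse into one edge, in line with the distinction between distinct edges and total transactions.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two Redis key spaces:
# vertices: userID -> userName; edges: userID -> set of userIDs.
vertices: dict[int, str] = {}
edges: defaultdict[int, set[int]] = defaultdict(set)


def add_transaction(payer_id: int, payer_name: str,
                    payee_id: int, payee_name: str) -> None:
    """Record one transaction as an undirected edge (an assumption)."""
    vertices[payer_id] = payer_name
    vertices[payee_id] = payee_name
    # Set insertion is idempotent, so repeat payments between the same
    # pair do not create extra edges.
    edges[payer_id].add(payee_id)
    edges[payee_id].add(payer_id)


add_transaction(1630476, "Leon Tang", 1371353, "Ephraim Park")
add_transaction(1371353, "Ephraim Park", 810029, "Harminder Toor")
add_transaction(1630476, "Leon Tang", 1371353, "Ephraim Park")  # repeat pair

print(sorted(edges[1371353]))  # [810029, 1630476]
print(len(edges[1630476]))     # 1
```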

Page 8

Two Databases

Page 9

Optimizations

● Two databases
● Graph algorithm optimization
● S3 ⇔ Redis and S3 ⇔ Elasticsearch transfers, distributed with Spark
● ...

Page 10

VenmoPlus.com

[Architecture diagram: m4.xlarge, m4.large, and t2.micro instances; $29.11/day]

Page 11

About Me

● Postdoc at Lawrence Berkeley National Lab
● PhD in Computer Science, Michigan State
● BS in Physics, Nanjing U.

Certified Volunteers:

● Software Carpentry
● Data Carpentry
● American Red Cross

Christmas Eve 2014, ice storm, Michigan

Page 12

Algorithm Optimization

Shortest distance -> intersection of sets (friend lists)

● 1st-degree friends of A ∩ 1st-degree friends of B == []? If not, B is within two hops of A.
● 2nd-degree friends of A ∩ 1st-degree friends of B == []? If not, B is within three hops of A.
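The two tests can be sketched as a small function over a dict of friend sets (the name `degree_of_separation` is hypothetical, and the graph is assumed undirected). Only A's 1st-degree friends are expanded, so nothing is precalculated or cached. If the friend sets live in Redis, its SINTER command can compute the 1st-degree intersection server-side, though the slides don't say whether VenmoPlus did this.

```python
from typing import Optional

Graph = dict[int, set[int]]  # userID -> set of friend userIDs


def degree_of_separation(graph: Graph, a: int, b: int) -> Optional[int]:
    """Return 0-3 for the degree between a and b, or None if it is above 3."""
    if a == b:
        return 0
    friends_a = graph.get(a, set())
    friends_b = graph.get(b, set())
    if b in friends_a:
        return 1
    # A shared 1st-degree friend gives a path a - f - b
    if friends_a & friends_b:
        return 2
    # 2nd-degree friends of A: union of the friend sets of A's friends
    second_a = set().union(*(graph.get(f, set()) for f in friends_a))
    # One of them being a friend of B gives a path a - f - g - b
    if second_a & friends_b:
        return 3
    return None


# A path graph 1 - 2 - 3 - 4 - 5
g = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print([degree_of_separation(g, 1, x) for x in (2, 3, 4, 5)])  # [1, 2, 3, None]
```

Checking in increasing order of distance means the first test that succeeds gives the shortest distance.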

Page 13

Algorithm Design (2)

Query the distance between two vertices at a historical moment in a constantly changing graph (because we don't precalculate distances).

● A user's recent transaction is already history: it has changed the graph.
● Query the distance between the two users at that moment
    ○ (i.e., not considering that specific transaction)
    ○ Temporarily remove the influence of that specific transaction, then restore it
        ■ Test whether that transaction is the first between the pair of users.
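One reading of these steps, as a sketch: once a transaction's edge is in the graph, the pre-transaction distance is 1 if the pair had transacted before; otherwise the edge is removed temporarily, the distance is computed, and the edge is restored. The `is_first` flag and the function names are hypothetical (the slides don't say how the first-transaction test was done), and the distance helper repeats the set-intersection test so the block stands alone.

```python
from typing import Optional


def degree(graph: dict[int, set[int]], a: int, b: int) -> Optional[int]:
    """Degree of separation up to 3 via set intersections, else None."""
    fa, fb = graph.get(a, set()), graph.get(b, set())
    if b in fa:
        return 1
    if fa & fb:
        return 2
    if set().union(*(graph.get(f, set()) for f in fa)) & fb:
        return 3
    return None


def distance_before(graph: dict[int, set[int]], payer: int, payee: int,
                    is_first: bool) -> Optional[int]:
    """Distance between payer and payee just before their latest transaction.

    Assumes the transaction's edge has already been added to `graph`.
    """
    if not is_first:
        # They had transacted before, so the edge already existed.
        return 1
    # Temporarily remove the edge this transaction created ...
    graph[payer].discard(payee)
    graph[payee].discard(payer)
    try:
        return degree(graph, payer, payee)
    finally:
        # ... and always put it back, even if the query raises.
        graph[payer].add(payee)
        graph[payee].add(payer)


# Graph after user 1 pays user 3 for the first time
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(distance_before(g, 1, 3, is_first=True))  # 2 (they shared friend 2)
print(g[1] == {2, 3})                           # True: the edge was restored
```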

Page 14

#   Role                Instance    $/hour   $/day
1   Spark               m4.large    0.12     2.88
2   Spark               m4.large    0.12     2.88
3   Redis               m4.xlarge   0.24     5.76
4   Elasticsearch       m4.xlarge   0.24     5.76
5   Elasticsearch       m4.xlarge   0.24     5.76
6   Kafka, producer     m4.large    0.12     2.88
7   Kafka               m4.large    0.12     2.88
8   Web server          t2.micro    0.013    0.312

https://github.com/qingpeng/VenmoPlus for more details!

Total: $29.11/24 hours
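The daily column is the hourly rate times 24, and the total is the sum over all eight nodes. A quick check of the slide's arithmetic, using the slide's own 2017 prices (not current AWS pricing):

```python
# (role, instance type, hourly price in USD) as listed on the slide
nodes = [
    ("Spark", "m4.large", 0.12),
    ("Spark", "m4.large", 0.12),
    ("Redis", "m4.xlarge", 0.24),
    ("Elasticsearch", "m4.xlarge", 0.24),
    ("Elasticsearch", "m4.xlarge", 0.24),
    ("Kafka, producer", "m4.large", 0.12),
    ("Kafka", "m4.large", 0.12),
    ("Web server", "t2.micro", 0.013),
]

HOURS_PER_DAY = 24

# Per-node daily cost
for role, instance, hourly in nodes:
    print(f"{role:<16} {instance:<10} ${hourly * HOURS_PER_DAY:.3f}/day")

total_per_day = sum(hourly for _, _, hourly in nodes) * HOURS_PER_DAY
print(f"Total: ${total_per_day:.2f}/day")  # Total: $29.11/day
```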

Page 15

Algorithms

Distance detection between vertices in a graph (1st-, 2nd-, or 3rd-degree friends?)

Page 16

Pipeline

Page 17

Redis:

● Graph edges: userID -> userID
● Graph vertices: userID -> userName

In-memory DB -> fast graph updates and graph traversal in real time

Elasticsearch:

● Everything about the transactions

Distributed -> data storage and full-text search in real time

Big Challenge:

● Graph distance + common connections in real time
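Common connections are the same set intersection used for the 2nd-degree test, followed by a name lookup in the vertex store. A sketch over the sample rows from the "Two Databases" slide, taken as given, with Python dicts standing in for Redis (the function name is hypothetical):

```python
# Excerpt of the two stores from the "Two Databases" slide
vertices = {
    420890: "Graham Hadley",
    1630476: "Leon Tang",
    810029: "Harminder Toor",
    1371353: "Ephraim Park",
    562884: "Paul Min",
}
edges = {
    420890: {14935158, 562884},
    1630476: {1371353},
    810029: {190230, 14935158},
    1371353: {810029, 971156},
    562884: {196371, 1371353},
}


def common_connections(edges, vertices, a, b):
    """Sorted names of users found in both a's and b's edge sets.

    IDs missing from `vertices` fall back to the ID as a string.
    """
    shared = edges.get(a, set()) & edges.get(b, set())
    return sorted(vertices.get(uid, str(uid)) for uid in shared)


print(common_connections(edges, vertices, 1630476, 562884))  # ['Ephraim Park']
```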

Page 18

Pipeline

Historical transactions

Page 19

This or that? (to build the graph)

Page 20

This or that? (for fast searching)

Page 21

Lessons learned

Page 22

Page 23