Date posted: 07-Apr-2017
Category: Career
Uploaded by: qingpeng-qp-zhang
Introducing VenmoPlus.com: Explore your Venmo network!
Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow
Pipeline diagram: historical transactions and real-time transactions (label: 2013)
Biggest Challenge:
● Calculate/Query graph distance in real time
● Cache of 2nd-degree friend lists
● Partitioned GraphDB
● Good for LinkedIn (hundreds of millions of users, with higher degree)
● 5 million vertices (users)
● 32 million distinct edges (transactions)
● 88 million total edges (transactions)
No cache (precalculation)? No GraphDB?
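The graph-size figures allow a rough sizing estimate. The sketch below assumes each distinct edge is an undirected connection counted once and that degrees are uniform; real payment graphs are skewed, so the 2nd-degree figure is only an order-of-magnitude guess. On that reading, a typical 2nd-degree neighborhood is in the low hundreds of users, which is one plausible reason a precomputed cache or a partitioned GraphDB may not be needed at this scale.

```python
# Back-of-the-envelope sizing of the Venmo graph from the slide's figures.
# Assumption: each distinct edge is an undirected connection counted once.
VERTICES = 5_000_000          # users
DISTINCT_EDGES = 32_000_000   # distinct user pairs with at least one transaction
TOTAL_EDGES = 88_000_000      # all transactions, including repeats within a pair


def average_degree(vertices: int, edges: int) -> float:
    """Each undirected edge adds 1 to the degree of both endpoints."""
    return 2 * edges / vertices


def repeat_ratio(total: int, distinct: int) -> float:
    """Average number of transactions per distinct user pair."""
    return total / distinct


avg_deg = average_degree(VERTICES, DISTINCT_EDGES)
# Rough upper bound on 2nd-degree friends per user if all degrees were equal
# (ignores overlap between friend lists).
second_degree_estimate = avg_deg ** 2

if __name__ == "__main__":
    print(f"average degree: {avg_deg:.1f}")                                          # 12.8
    print(f"transactions per pair: {repeat_ratio(TOTAL_EDGES, DISTINCT_EDGES):.2f}")  # 2.75
    print(f"~2nd-degree friends per user: {second_degree_estimate:.0f}")             # 164
```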
Two Databases
(Diagram: historical transactions and real-time transactions)
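Graph vertices (userID -> userName):
420890 -> Graham Hadley
1630476 -> Leon Tang
810029 -> Harminder Toor
1371353 -> Ephraim Park
562884 -> Paul Min
Graph edges (userID -> set of neighbor userIDs):
420890 -> set(14935158, 562884)
1630476 -> set(1371353)
810029 -> set(190230, 14935158)
1371353 -> set(810029, 971156)
562884 -> set(196371, 1371353)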
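The two key layouts can be mimicked with plain Python dictionaries to show how a lookup combines them. This is a sketch only: the dicts stand in for Redis keys, hold just the sample rows above (so some neighbor IDs have no name entry), and `friend_names` is a hypothetical helper.

```python
from typing import Dict, List, Set

# Plain dicts standing in for the two Redis key layouts (sample rows only).
user_names: Dict[int, str] = {          # vertices: userID -> userName
    420890: "Graham Hadley",
    1630476: "Leon Tang",
    810029: "Harminder Toor",
    1371353: "Ephraim Park",
    562884: "Paul Min",
}

friends: Dict[int, Set[int]] = {        # edges: userID -> set of neighbor userIDs
    420890: {14935158, 562884},
    1630476: {1371353},
    810029: {190230, 14935158},
    1371353: {810029, 971156},
    562884: {196371, 1371353},
}


def friend_names(user_id: int) -> List[str]:
    """Names of a user's direct neighbors, sorted.

    IDs that are not in the sample vertex table are returned as raw ID strings.
    """
    return sorted(user_names.get(f, str(f)) for f in friends.get(user_id, set()))


if __name__ == "__main__":
    print(friend_names(562884))
```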
Optimizations
● Two databases
● Graph algorithm optimization
● S3 ⇔ Redis and S3 ⇔ Elasticsearch transfers, distributed with Spark
● ...
VenmoPlus.com deployment diagram: m4.xlarge, m4.large, and t2.micro EC2 instances, $29.11/day
About Me
● Postdoc at Lawrence Berkeley National Lab
● PhD in Computer Science, Michigan State
● BS in Physics, Nanjing U.
Certified Volunteers:
● Software Carpentry
● Data Carpentry
● American Red Cross
Christmas Eve 2014, ice storm, Michigan
Algorithm Optimization
Shortest distance -> intersection of sets (friend lists)
● 1st-degree friends of A ∩ 1st-degree friends of B == []?
● 2nd-degree friends of A ∩ 1st-degree friends of B == []?
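The two intersection tests can be sketched in Python. This is a minimal illustration, assuming an undirected graph held as in-memory adjacency sets (one set of neighbor IDs per user, like the Redis edge layout); `build_graph` and `degree_of_separation` are hypothetical names, not the project's code.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Undirected adjacency sets: each edge is stored in both directions."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_of_separation(graph: Graph, a: int, b: int) -> Optional[int]:
    """Return 0-3 if b is within three hops of a, else None, using set intersections only."""
    if a == b:
        return 0
    fa = graph.get(a, set())          # 1st-degree friends of A
    fb = graph.get(b, set())          # 1st-degree friends of B
    if b in fa:
        return 1
    if fa & fb:                       # A - C - B for some common friend C
        return 2
    # 2nd-degree friends of A: friends of A's friends, excluding A itself.
    second_a = set().union(*(graph.get(f, set()) for f in fa)) - {a}
    if second_a & fb:                 # A - C - D - B
        return 3
    return None                       # farther than 3 hops, or not connected


if __name__ == "__main__":
    g = build_graph([(1, 2), (2, 3), (3, 4)])
    print([degree_of_separation(g, 1, x) for x in (2, 3, 4, 5)])  # [1, 2, 3, None]
```

Each check touches only a few small sets, so no precomputed distance table is required; the cost grows with the size of A's 2nd-degree neighborhood.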
Algorithm Design (2)
Query the distance between vertices at a historical moment in a constantly changing graph (because we don’t precalculate distances...)
● A user's recent transaction is already in the history and has already changed the graph
● Query the distance between the two users at the moment of that transaction
○ i.e., not counting that specific transaction
○ Temporarily remove that transaction's influence on the graph, query, then restore it
■ Test whether that transaction is the first between the pair of users (only a first transaction adds a new edge)
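One way to implement "remove temporarily and restore" is sketched below, assuming the graph already includes the transaction being displayed and that a hypothetical per-pair transaction counter is kept alongside the adjacency sets. If the pair had transacted before, they were already direct friends; otherwise the edge is removed, the distance is measured, and a `try`/`finally` puts the edge back even if the query raises. All names here are illustrative.

```python
from typing import Dict, Optional, Set, Tuple

Graph = Dict[int, Set[int]]
PairCounts = Dict[Tuple[int, int], int]   # hypothetical: transactions per user pair


def pair_key(a: int, b: int) -> Tuple[int, int]:
    return (a, b) if a < b else (b, a)


def distance(graph: Graph, a: int, b: int) -> Optional[int]:
    """Set-intersection distance check up to 3 hops; None if farther."""
    fa, fb = graph.get(a, set()), graph.get(b, set())
    if b in fa:
        return 1
    if fa & fb:
        return 2
    second = set().union(*(graph.get(f, set()) for f in fa)) - {a}
    return 3 if second & fb else None


def distance_at_transaction(graph: Graph, counts: PairCounts, a: int, b: int) -> Optional[int]:
    """Distance between a and b just before their latest transaction.

    Assumes that transaction is already reflected in `graph` and `counts`.
    """
    n = counts.get(pair_key(a, b), 0)
    if n > 1:
        return 1                      # they had transacted before: already direct friends
    if n == 0:
        return distance(graph, a, b)  # no such transaction recorded: nothing to remove
    # First transaction between the pair: it created the edge, so hide it.
    graph[a].discard(b)
    graph[b].discard(a)
    try:
        return distance(graph, a, b)
    finally:
        graph[a].add(b)               # restore the edge even if the query raises
        graph[b].add(a)
```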
# | Role | Instance | $/hour | $/day
1 | Spark | m4.large | 0.12 | 2.88
2 | Spark | m4.large | 0.12 | 2.88
3 | Redis | m4.xlarge | 0.24 | 5.76
4 | Elasticsearch | m4.xlarge | 0.24 | 5.76
5 | Elasticsearch | m4.xlarge | 0.24 | 5.76
6 | Kafka, producer | m4.large | 0.12 | 2.88
7 | Kafka | m4.large | 0.12 | 2.88
8 | Web server | t2.micro | 0.013 | 0.312
https://github.com/qingpeng/VenmoPlus for more details!
Total: $29.11 per 24 hours
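The daily total follows from the per-instance hourly rates listed above: each instance runs 24 hours, so the day's cost is the sum of rate × 24. A minimal check in Python, using the slide's 2017 rates (not current AWS pricing); the dictionary labels are just names for the table rows.

```python
from typing import Dict

# Hourly rates (USD/hour) for the eight instances, as listed on the slide.
HOURLY_RATES: Dict[str, float] = {
    "Spark 1": 0.12,
    "Spark 2": 0.12,
    "Redis": 0.24,
    "Elasticsearch 1": 0.24,
    "Elasticsearch 2": 0.24,
    "Kafka, producer": 0.12,
    "Kafka": 0.12,
    "Web server": 0.013,
}


def daily_cost(rates: Dict[str, float], hours: float = 24) -> float:
    """Total cost of running every instance for `hours` hours."""
    return sum(rate * hours for rate in rates.values())


if __name__ == "__main__":
    print(f"${daily_cost(HOURLY_RATES):.2f}/day")   # $29.11/day
```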
Algorithms
Distance detection between vertices in a graph (1st-, 2nd-, or 3rd-degree friends?)
Pipeline
Redis:
● Graph edges: userID -> set of neighbor userIDs
● Graph vertices: userID -> userName
In-memory DB -> fast graph updates and graph traversal in real time
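A sketch of how one incoming transaction might update the in-memory graph, assuming plain dicts in place of Redis (in Redis, the two set insertions would correspond to `SADD` commands and the name to a `SET`). The per-pair counter is a hypothetical addition; it records whether a transaction created a new distinct edge, which is the test the historical-distance query relies on.

```python
from typing import Dict, Set, Tuple

# Plain dicts standing in for Redis keys (hypothetical layout):
friends: Dict[int, Set[int]] = {}            # userID -> set of neighbor userIDs
names: Dict[int, str] = {}                   # userID -> userName
pair_counts: Dict[Tuple[int, int], int] = {} # hypothetical: transactions per pair


def apply_transaction(payer: int, payer_name: str, payee: int, payee_name: str) -> bool:
    """Update the graph for one incoming transaction.

    Returns True if this transaction created a new distinct edge.
    Set insertion is idempotent, so repeat payments between the same pair
    leave the edge sets unchanged and only increment the pair counter.
    """
    names[payer] = payer_name
    names[payee] = payee_name
    friends.setdefault(payer, set()).add(payee)
    friends.setdefault(payee, set()).add(payer)   # store the edge in both directions
    key = (min(payer, payee), max(payer, payee))
    pair_counts[key] = pair_counts.get(key, 0) + 1
    return pair_counts[key] == 1
```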
Elasticsearch:
● Full details of every transaction
Distributed -> data storage and full-text search in real time
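A sketch of the kind of full-text query Elasticsearch serves here, written as a Query DSL body in a Python dict. The field names `message`, `actor_id`, and `target_id` are hypothetical, since the slide does not show the index mapping; the body would be sent to the `_search` endpoint through whatever client is in use.

```python
from typing import Any, Dict, Optional


def transaction_search_body(text: str, user_id: Optional[int] = None, size: int = 20) -> Dict[str, Any]:
    """Build a Query DSL body: full-text match on the (hypothetical) message field,
    optionally restricted to transactions where user_id is payer or payee."""
    query: Dict[str, Any] = {"bool": {"must": [{"match": {"message": text}}]}}
    if user_id is not None:
        # Filter context: exact term matches that do not affect scoring.
        query["bool"]["filter"] = [{
            "bool": {
                "should": [
                    {"term": {"actor_id": user_id}},
                    {"term": {"target_id": user_id}},
                ],
                "minimum_should_match": 1,
            }
        }]
    return {"query": query, "size": size}
```

Keeping the user restriction in `filter` rather than `must` means it narrows the results without changing relevance scores for the text match.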
Big Challenge:
● Graph distance + Common connections in real time
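Common connections reduce to the same friend-list intersection used for the distance check. A minimal sketch over in-memory adjacency sets, a hypothetical stand-in for Redis, where `SINTER` performs the same intersection server-side:

```python
from typing import Dict, List, Set


def common_connections(graph: Dict[int, Set[int]], a: int, b: int) -> List[int]:
    """Users directly connected to both a and b, sorted for stable output.

    A non-empty result means a and b are at most 2 hops apart.
    """
    return sorted(graph.get(a, set()) & graph.get(b, set()))


if __name__ == "__main__":
    g = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
    print(common_connections(g, 2, 3))   # [1]
```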
Pipeline
Historical transactions
This, or that? - to build graph
This, or that? - for fast searching
Lesson learned