Qingpeng zhang 0711

Post on 07-Apr-2017

Introducing VenmoPlus.com - Explore your Venmo network!

Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow

Features - VenmoPlus.com

● fuzzy searching of user name, with friend list to help identify users with same name

● labeling the relationship between the payer and receiver
● friend recommendation
● searching transactions in friend circle
● listing friends of the user

Demo: VenmoPlus.com

Challenge:
● Find the distance between nodes in a dynamic graph in real time

Solutions

● Two databases
  ○ Redis and Elasticsearch

● Algorithm design
  ○ BFS -> Bidirectional Search
  ○ Query relationship of a past transaction

● Query/search optimizations

Historical transactions

Real time transactions

A Tale of Two Databases

API

Redis for graph structure

420890 Graham Hadley

1630476 Leon Tang

810029 Harminder Toor

1371353 Ephraim Park

562884 Paul Min

420890 set(14935158, 562884)

1630476 set(1371353)

810029 set(190230,14935158)

1371353 set(810029,971156)

562884 set(196371,1371353)

35 million edges
6 million nodes
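The slide shows each user ID mapped to a set of friend IDs. As a minimal sketch of that layout, the block below uses a plain Python dict of sets as a stand-in for Redis, loaded with the sample IDs from the slide. The helper names `add_friends` and `mutual_friends` are hypothetical; in Redis the corresponding set commands would be `SADD`, `SMEMBERS`, and `SINTER`.

```python
# Stand-in for Redis sets: key = user ID, value = set of friend IDs.
# (Assumption: a real deployment would use SADD / SMEMBERS / SINTER instead.)
friends = {}


def add_friends(user_id, *friend_ids):
    """Add friend IDs to a user's set (the role SADD plays in Redis)."""
    friends.setdefault(user_id, set()).update(friend_ids)


def mutual_friends(a, b):
    """Return the IDs present in both friend sets (the role of SINTER)."""
    return friends.get(a, set()) & friends.get(b, set())


# Sample adjacency data copied from the slide.
add_friends(420890, 14935158, 562884)
add_friends(1630476, 1371353)
add_friends(810029, 190230, 14935158)
add_friends(1371353, 810029, 971156)
add_friends(562884, 196371, 1371353)

print(mutual_friends(420890, 810029))
```

Storing one set per user keeps "list friends" and "friends in common" as single set operations, which is what the distance queries later in the deck rely on.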

Elasticsearch for everything

Redis

Elasticsearch

Redis + Elasticsearch => search transactions in friend circle
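One way the two stores could be combined for friend-circle search: read the friend set from the graph store first, then use it as a filter in the Elasticsearch query body. The sketch below only builds the query dictionary, so it runs without a cluster. The field names `message`, `payer_id`, and `receiver_id` and the function name `friend_circle_query` are assumptions, not the project's actual schema.

```python
def friend_circle_query(user_id, friends_of, text):
    """Build an Elasticsearch query body that matches `text` in transactions
    whose payer or receiver is the user or one of the user's friends.

    `friends_of` is a dict of sets standing in for the Redis friend sets.
    """
    circle = sorted(friends_of.get(user_id, set()) | {user_id})
    return {
        "query": {
            "bool": {
                # Full-text match on the payment note (hypothetical field name).
                "must": [{"match": {"message": text}}],
                # Restrict to the friend circle without affecting scoring.
                "filter": [
                    {
                        "bool": {
                            "should": [
                                {"terms": {"payer_id": circle}},
                                {"terms": {"receiver_id": circle}},
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ],
            }
        }
    }


body = friend_circle_query(1, {1: {2, 3}}, "pizza")
print(body["query"]["bool"]["filter"][0]["bool"]["should"][0])
```

The resulting dictionary could be passed as the request body of a search call. For users with very large friend lists, the length of the `terms` list may need to be capped.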

VenmoPlus.com

m4.xlarge

m4.large

m4.xlarge

m4.large

t2.micro

$29.11/day

Qingpeng “Q.P.” Zhang

● Postdoc
  ○ Lawrence Berkeley National Lab

● PhD in Computer Science
  ○ Michigan State University

What I learned from Insight:

● Thinking as a data engineer
● Open source tools
  ○ Redis, Elasticsearch, Kafka, Spark Streaming, Flask, AngularJS, etc.

Breadth First Search -> Bidirectional Search

Shortest distance -> intersection of sets (friend lists)

● A’s 1st degree friends ∩ B’s 1st degree friends
● A’s 2nd degree friends ∩ B’s 1st degree friends

O(N^2) -> O(2*N)

O(N^3) -> O(N + N^2)
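The set-intersection idea above can be sketched as follows, assuming an undirected friend graph held as a dict of sets (a stand-in for the Redis sets) and a cutoff at distance 3. The function name `distance` and the `None` return for "more than 3" are illustrative choices. With roughly N friends per user, the distance-2 check reads two friend sets (about 2*N IDs) instead of expanding N^2 nodes from one side, and the distance-3 check expands only one side to the second degree (about N + N^2).

```python
def distance(graph, a, b):
    """Return 0, 1, 2, or 3 for the shortest distance between a and b,
    or None when they are more than 3 steps apart.

    `graph` maps each user ID to the set of that user's friend IDs and is
    assumed to be symmetric (if x is in graph[y], y is in graph[x]).
    """
    if a == b:
        return 0
    friends_a = graph.get(a, set())
    friends_b = graph.get(b, set())
    if b in friends_a:
        return 1
    # Distance 2: A's 1st degree friends intersect B's 1st degree friends.
    if friends_a & friends_b:
        return 2
    # Distance 3: A's 2nd degree friends intersect B's 1st degree friends.
    second_degree_a = set()
    for f in friends_a:
        second_degree_a |= graph.get(f, set())
    if second_degree_a & friends_b:
        return 3
    return None


# A simple chain 1 - 2 - 3 - 4 - 5 for illustration.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(distance(chain, 1, 4))
```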

Query relationship of a past transaction

Query the distance between vertices at a past moment in a constantly changing graph (because we don’t pre-calculate the distance…)

● If the two users had transactions before that one: distance = 1
● If the transaction is new: distance > 1
  ○ Remove the influence of that specific transaction temporarily
  ○ Check distance from graph (2, 3, or >3)
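A minimal sketch of the rule above, under stated assumptions: the graph is a symmetric dict of sets, the caller already knows whether the pair transacted earlier (the flag `had_earlier_transaction` is hypothetical, as is the function name), and `None` stands for "more than 3". The direct edge is ignored by working on copies of the two friend sets, so the stored graph is never modified. The sketch reads the graph as it is now, so other edges added since the transaction are not rolled back.

```python
def past_distance(graph, payer, receiver, had_earlier_transaction):
    """Distance between payer and receiver just before a given transaction.

    Returns 1 if the pair had already transacted, otherwise 2, 3, or None
    (more than 3) computed with the payer-receiver edge left out.
    """
    if had_earlier_transaction:
        return 1
    # Temporarily remove the influence of this transaction: drop the direct
    # edge from local copies only, leaving `graph` untouched.
    friends_p = graph.get(payer, set()) - {receiver}
    friends_r = graph.get(receiver, set()) - {payer}
    if friends_p & friends_r:
        return 2
    second_degree_p = set()
    for f in friends_p:
        second_degree_p |= graph.get(f, set())
    if second_degree_p & friends_r:
        return 3
    return None


triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(past_distance(triangle, 1, 2, had_earlier_transaction=False))
```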

Pipeline, raw data, in distributed way

Query/Search Optimizations

1. Remove aggregation for better performance… (trade-off)
2. Friend recommender:
   a. Use a Counter to return only the 5 users with the most friends in common
3. Search message in friend circle
   a. Combine Elasticsearch and Redis queries
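The friend recommender in item 2 can be sketched with `collections.Counter`: count how often each friend-of-a-friend appears, then keep the top five. The graph is assumed to be a dict of sets standing in for the Redis friend sets, and the function name `recommend` is illustrative.

```python
from collections import Counter


def recommend(graph, user, k=5):
    """Return up to k user IDs sharing the most friends with `user`,
    excluding the user and people who are already friends."""
    own_friends = graph.get(user, set())
    common = Counter()
    for friend in own_friends:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in own_friends:
                # One more friend shared between `user` and `candidate`.
                common[candidate] += 1
    return [candidate for candidate, _ in common.most_common(k)]


sample = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 4}, 4: {2, 3}, 5: {2}}
print(recommend(sample, 1))
```

Limiting the result with `most_common(k)` avoids sorting and returning the full candidate list, which is presumably the point of the "only 5 users" note.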

More optimization

● Only store necessary info in Elasticsearch
● Labeling the distance of historical transactions can be done in a batch job, reducing the number of real-time queries
● Adjust AWS instances to reduce cost

Historical transactions

Real time transactions

Pipeline

API