
LinkedIn Graph Presentation

Date post: 11-May-2015
Upload: amy-w-tang
Description:
Chris Conrad (Senior Engineering Manager) and Igor Perisic (Senior Director Engineering) from LinkedIn gave this talk to UC Santa Barbara in 2012.
The Evolution of the Professional Graph at LinkedIn Chris Conrad Senior Engineering Manager, Social Graph Igor Perisic Sr. Director of Engineering, SNA
Transcript
Page 1: LinkedIn Graph Presentation

The Evolution of the Professional Graph at LinkedIn

Chris Conrad Senior Engineering Manager,

Social Graph

Igor Perisic Sr. Director of Engineering, SNA

Page 2: LinkedIn Graph Presentation

LinkedIn

•  The site officially launched on May 5, 2003. At the end of the first month in operation, LinkedIn had a total of 4,500 members in the network.

•  As of January 9, 2013, LinkedIn operates the world’s largest professional network on the Internet with more than 200 million members in over 200 countries and territories.

•  As of September 30, 2012, LinkedIn counts executives from all 2012 Fortune 500 companies as members; its corporate talent solutions are used by 85 of the Fortune 100 companies.

•  As of the school year ending May 2012, there are over 20 million students and recent college graduates on LinkedIn. They are LinkedIn's fastest-growing demographic.

Page 3: LinkedIn Graph Presentation

In the beginning…

Page 4: LinkedIn Graph Presentation

The Cloud

•  Cloud is the original name of our graph engine

•  Responsible for read scaling graph queries (and it used to do search, too)

•  Stored 4 primary sets of data:

Member Data

Group Membership

Network Cache

Connections

Cloud

Page 5: LinkedIn Graph Presentation

What was wrong?

•  Large memory footprint

–  Network cache used simple but inefficient data structures

–  The size and density of the graph were increasing

•  Garbage Collector woes

–  Large JVM heap caused long GC pauses

–  Long GC pauses reduced availability, resulting in site outages

Page 6: LinkedIn Graph Presentation

C++ Graph

•  First project: migrate the network cache to a new data structure to reduce memory usage

•  Second project: implement a C++ JNI library to move the graph data off heap

•  Result: Drastic reduction in JVM heap utilization

Member Data

Group Membership

Network Cache

Java Heap libGraphJNI.so

Connections

Cloud
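
One way to picture the first project is to compare a per-member set of connections with a layout that packs every adjacency list into flat integer arrays (often called compressed sparse row). The sketch below is an illustration only: the slides do not say which data structure was actually chosen, and `CompactGraph` and its methods are hypothetical names. Python's `array` module stores raw machine integers rather than one object per value, which is loosely analogous to trimming per-object overhead on the JVM.

```python
from array import array
from bisect import bisect_left


class CompactGraph:
    """Adjacency lists packed into two flat integer arrays (CSR layout).

    Hypothetical illustration; not the structure described in the talk.
    """

    def __init__(self, num_members, edges):
        # Count each member's degree; connections are undirected.
        degree = [0] * num_members
        for a, b in edges:
            degree[a] += 1
            degree[b] += 1
        # offsets[m] .. offsets[m + 1] delimits member m's slice of targets.
        self.offsets = array("q", [0] * (num_members + 1))
        for m in range(num_members):
            self.offsets[m + 1] = self.offsets[m] + degree[m]
        self.targets = array("q", [0] * self.offsets[num_members])
        cursor = list(self.offsets[:num_members])
        for a, b in edges:
            self.targets[cursor[a]] = b
            cursor[a] += 1
            self.targets[cursor[b]] = a
            cursor[b] += 1
        # Sort each slice so membership tests can use binary search.
        for m in range(num_members):
            lo, hi = self.offsets[m], self.offsets[m + 1]
            self.targets[lo:hi] = array("q", sorted(self.targets[lo:hi]))

    def connections(self, member):
        """Return the member's connections as a sorted array slice."""
        lo, hi = self.offsets[member], self.offsets[member + 1]
        return self.targets[lo:hi]

    def connected(self, a, b):
        """Binary-search a's slice for b."""
        lo, hi = self.offsets[a], self.offsets[a + 1]
        i = bisect_left(self.targets, b, lo, hi)
        return i < hi and self.targets[i] == b
```

For example, `CompactGraph(4, [(0, 1), (0, 2), (2, 3)])` stores six integers of adjacency data plus five offsets, and `connected(3, 2)` resolves with a binary search over a one-element slice.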

Page 7: LinkedIn Graph Presentation

Several million users later

Page 8: LinkedIn Graph Presentation

New Problems

•  Growth

–  The size and density of the graph were increasing

–  We were running out of memory

–  We were running out of CPU cycles

–  Proliferation of services increased the overhead of maintaining the client-side software load balancer

–  As of September 30, 2012, LinkedIn has 3,177 full-time employees located around the world. LinkedIn started off 2012 with about 2,100 full-time employees worldwide, up from around 1,000 at the beginning of 2011 and about 500 at the beginning of 2010.

•  C++ code had a much higher maintenance cost

–  Coredumps are much less friendly than a NullPointerException

–  LinkedIn didn’t have the expertise or infrastructure to support C++ development

Page 9: LinkedIn Graph Presentation

Split cloud

•  cloud-session: Move the load balancing logic into a service we control

•  rgraph: Extract the C++ graph into its own service

Member Data

Group Membership

Network Cache

Java Heap

Cloud

libGraphJNI.so

Connections

rgraph

cloud-session

Page 10: LinkedIn Graph Presentation

New problems, same as the old

•  rgraph instances still had a large memory footprint

–  The density of the graph was increasing

–  We were running out of memory

–  We were running out of CPU cycles

•  cloud-session’s software load balancer implementation was essentially a single point of failure

Page 11: LinkedIn Graph Presentation

Distribute the Graph

•  Introduce Norbert, a new cluster management system

•  Partition the graph data

•  Partition the network cache service

Member Data

Java Heap

Cloud

cloud-session

Connections

Group Membership

dgraph

Network Cache Service
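
Partitioning the graph data can be sketched as assigning each member's adjacency list to one partition and routing lookups to the owner. The example below assumes simple modulo hashing on the member ID; the slides do not describe the real partitioning function or Norbert's routing API, so `partition_for`, `partition_edges`, `connections_of`, and `NUM_PARTITIONS` are all hypothetical.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; the slides give no partition count


def partition_for(member_id, num_partitions=NUM_PARTITIONS):
    """Map a member ID to a partition with simple modulo hashing (assumed)."""
    return member_id % num_partitions


def partition_edges(edges, num_partitions=NUM_PARTITIONS):
    """Store each member's adjacency list on that member's partition."""
    partitions = [defaultdict(set) for _ in range(num_partitions)]
    for a, b in edges:
        # An undirected connection is recorded once on each endpoint's owner.
        partitions[partition_for(a, num_partitions)][a].add(b)
        partitions[partition_for(b, num_partitions)][b].add(a)
    return partitions


def connections_of(partitions, member_id):
    """Route a lookup to the single partition that owns the member."""
    owner = partitions[partition_for(member_id, len(partitions))]
    return owner.get(member_id, set())
```

A first-degree lookup touches one partition, while a query that expands to second-degree connections has to fan out to every partition that owns one of the first-degree members. That fan-out is what a cluster management layer would need to coordinate.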

Page 12: LinkedIn Graph Presentation

Mission Accomplished

Page 13: LinkedIn Graph Presentation

So now what?

Page 14: LinkedIn Graph Presentation

My Connections

Page 15: LinkedIn Graph Presentation

Common Connections

Page 16: LinkedIn Graph Presentation

My Network

Page 17: LinkedIn Graph Presentation

How am I connected?
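
The four screens named on the preceding slides correspond to classic graph operations: listing neighbors, intersecting two neighbor sets, expanding outward by a bounded number of hops, and finding a shortest path. The sketch below shows those operations over a plain dictionary of sets. It is a textbook illustration under that assumption, not the implementation behind the site, and all function names are hypothetical.

```python
from collections import deque


def common_connections(graph, a, b):
    """Members connected to both a and b (set intersection)."""
    return graph.get(a, set()) & graph.get(b, set())


def network(graph, member, max_degree):
    """Members reachable within max_degree hops, mapped to their distance."""
    distance = {member: 0}
    queue = deque([member])
    while queue:
        current = queue.popleft()
        if distance[current] == max_degree:
            continue  # do not expand past the requested degree
        for neighbor in graph.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    del distance[member]  # the member is not part of their own network
    return distance


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbor in graph.get(current, ()):
            if neighbor not in parent:
                parent[neighbor] = current
                queue.append(neighbor)
    return None
```

Here "My Connections" is simply `graph[member]`, "Common Connections" is `common_connections`, "My Network" is `network` with a small hop limit, and "How am I connected?" is `shortest_path`.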

Page 18: LinkedIn Graph Presentation

What is the professional graph?

•  LinkedIn connections

•  Current and past co-workers

•  University colleagues and alumni

•  Group members

•  And what about geography, industry and skill overlap?

Page 19: LinkedIn Graph Presentation

New requirements

•  Members aren’t the only type of node in the professional graph

•  LinkedIn connections aren’t the only type of edge in the professional graph

•  We already supported groups and group membership

Page 20: LinkedIn Graph Presentation

Making changes was hard

•  Code was rigid

–  Data was stored using class hierarchies, so introducing new data types was prohibitively slow

–  Queries were built by combining object instances

•  BDBJE

•  Everything was back in the heap

–  Garbage collection time was starting to go up

–  GC pauses no longer caused outages, but flapping introduced high developer and operational overhead

Page 21: LinkedIn Graph Presentation

Graph as a Service

•  Custom persistence engine

–  Log structured

–  Memory-mapped files keep data out of the Java heap

–  Data described using a DDL-like schema

•  Custom SQL-like query language

–  Query language understands the DDL

–  Text-based language reduces code changes
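
The combination of a log-structured store and memory-mapped files can be sketched as an append-only file of fixed-size records that is read back through a memory map, so the operating system's page cache holds the data instead of the language runtime's heap. The example below is a minimal illustration under that assumption. The record layout (source ID, destination ID, creation timestamp) and the `EdgeLog` class are hypothetical; the slides do not describe the real file format.

```python
import mmap
import os
import struct

# Hypothetical record layout: source id, destination id, created-at timestamp.
RECORD = struct.Struct("<qqq")


class EdgeLog:
    """Append-only log of fixed-size edge records, read back through mmap."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, source, dest, created_at):
        """Writes only ever go to the end of the file (log structured)."""
        with open(self.path, "ab") as f:
            f.write(RECORD.pack(source, dest, created_at))

    def scan(self):
        """Yield every record in write order without copying the whole file."""
        size = os.path.getsize(self.path)
        if size == 0:
            return  # an empty file cannot be memory mapped
        with open(self.path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
                for offset in range(0, size, RECORD.size):
                    yield RECORD.unpack_from(view, offset)
```

Because records are fixed size, the byte offset of record `i` is `i * RECORD.size`, which would let an index point straight into the mapped file.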

Page 22: LinkedIn Graph Presentation

Graph Queries

•  Company(:id)[CompanyFollowers]

•  Member(:id)[MemberToMember{CreatedAt > :t}]

•  Member(:id)[topN(MemberToMember, Score, 10)]
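
The slide lists these queries without spelling out their semantics. Reading them as "start at a node, follow one edge type, then optionally filter or rank the edges", a plausible interpretation can be sketched over an in-memory edge list. Everything below is an assumption made for illustration: the `Edge` fields, the `follow` and `top_n` helpers, and the idea that `topN` sorts by descending score are not confirmed by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record; field names mirror the slide's attributes."""
    edge_type: str
    source: int
    target: int
    created_at: int = 0
    score: float = 0.0


def follow(edges, edge_type, source, predicate=None):
    """Roughly Node(:id)[EdgeType{predicate}]: targets of matching edges."""
    return [
        e.target
        for e in edges
        if e.edge_type == edge_type
        and e.source == source
        and (predicate is None or predicate(e))
    ]


def top_n(edges, edge_type, source, n):
    """Roughly Node(:id)[topN(EdgeType, Score, n)]: highest-scored targets."""
    matching = [e for e in edges if e.edge_type == edge_type and e.source == source]
    matching.sort(key=lambda e: e.score, reverse=True)
    return [e.target for e in matching[:n]]
```

Under this reading, `Member(:id)[MemberToMember{CreatedAt > :t}]` corresponds to `follow(edges, "MemberToMember", member_id, lambda e: e.created_at > t)`.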

Page 23: LinkedIn Graph Presentation

What do we have in common?

Page 24: LinkedIn Graph Presentation

How am I connected?

Page 25: LinkedIn Graph Presentation

What’s next?

•  Online schema migration

•  Automated repartitioning and data migration

•  Automated provisioning

•  Hierarchical data partitioning

•  Monitoring and statistics

•  Query optimization

•  Query fragment caching

•  Result set caching

•  Query parallelization

•  Very large data set handling

•  …

Page 26: LinkedIn Graph Presentation

And we’re still growing

[Chart: LinkedIn Members (Millions) by year, 2004 to 2011. Data labels: 2, 4, 8, 17, 32, 55, 90]

•  200M+

•  25th most visited website worldwide (Comscore 6-12)

•  Company pages: >2.6M

•  63% non-U.S.

•  2/sec

•  85% of Fortune 100 companies use LinkedIn to hire

Page 27: LinkedIn Graph Presentation

We’re Hiring •  http://studentcareers.linkedin.com

•  Or email me at [email protected]

Page 28: LinkedIn Graph Presentation

Q&A

