Neo4j at the heart of a 45-million-member social graph, by Nicolas Tricot

Posted on 06-Dec-2014

description

"Neo4j at the heart of a 45-million-member social graph", or how Viadeo moved from an in-house technology that had become too limited to a graph database full of future prospects for modeling its social graph... http://fr.viadeo.com/fr/profile/nicolas.tricot

transcript

Your network is more powerful than you think

Neo4j at the heart of a 45-million-member social graph

Viadeo Tech Days

20, 21 and 22 November 2012

ABOUT THE VIADEO GROUP

• 1 million new members / month
• 10 million connections / month
• 100 million profiles viewed / month

GRAPHS ARE EVERYWHERE

© Air France / KLM

SOCIAL GRAPH

• Nodes
• Relationships
• Direct contacts
• Level 2 contacts
• Path, distance 3
• Path, distance 4
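The notions on these slides (direct contacts, level 2 contacts, path, distance) can be illustrated with a breadth-first search. This is only a minimal sketch on a made-up five-member network; the `CONTACTS` data and the function names are assumptions for illustration, not Viadeo's code.

```python
from collections import deque

# Hypothetical toy network: member -> set of direct contacts (undirected).
CONTACTS = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}


def distances(contacts, start):
    """Breadth-first search: distance from `start` to every reachable member."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        member = queue.popleft()
        for contact in contacts[member]:
            if contact not in dist:
                dist[contact] = dist[member] + 1
                queue.append(contact)
    return dist


def shortest_path(contacts, start, goal):
    """Return one shortest path from `start` to `goal`, or None if unreachable."""
    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        member = queue.popleft()
        if member == goal:
            path = []
            while member is not None:
                path.append(member)
                member = previous[member]
            return path[::-1]
        for contact in contacts[member]:
            if contact not in previous:
                previous[contact] = member
                queue.append(contact)
    return None


dist = distances(CONTACTS, "A")
direct_contacts = {m for m, d in dist.items() if d == 1}
level2_contacts = {m for m, d in dist.items() if d == 2}
path = shortest_path(CONTACTS, "A", "E")
print(direct_contacts, level2_contacts, path, len(path) - 1)
```

In this toy network the only route from A to E goes through B and D, so the path has distance 3.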

PREHISTORY 2006-2011

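• In-house algorithm
• Network storage in a MySQL database

CREATE TABLE `Network` (
  `memberId` int(11) NOT NULL DEFAULT '0',
  `L1` mediumblob NOT NULL,  -- level 1: presumably the serialized direct contacts
  `L2` mediumblob NOT NULL,  -- level 2: presumably the serialized contacts of contacts
  PRIMARY KEY (`memberId`)
) ENGINE=InnoDB;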

Update the network (old-fashioned style)

Member A and Member B are now contacts:
→ Update A.L1 + B.L1 and A.L2 + B.L2
→ Retrieve A.L1 + B.L1 and update *.L2

Example:
• A has 500 contacts
• B has 150 contacts
→ 500 + 150 + 2 = 652 updates!
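The arithmetic of the example can be checked with a few lines of Python. The sketch assumes one reading of the slide: the rows of A and B are rewritten, and so is the row of every existing contact of A or B (their level 2 list gains a member). The member names and the `rows_to_rewrite` helper are hypothetical.

```python
def rows_to_rewrite(l1, a, b):
    """Rows of the `Network` table touched when a and b become contacts.

    Assumed model: the rows of a and b change (L1 and L2), and every
    existing direct contact of a or b needs its L2 refreshed.
    """
    return {a, b} | l1[a] | l1[b]


# Hypothetical members: A has 500 contacts, B has 150, none in common.
l1 = {
    "A": {f"a{i}" for i in range(500)},
    "B": {f"b{i}" for i in range(150)},
}

touched = rows_to_rewrite(l1, "A", "B")
print(len(touched))  # 500 + 150 + 2
```

If A and B shared contacts, the set union would count each shared row once, so 652 is the worst case for these two contact counts.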

Good performance on:
• Computation of paths
• Computation of distances
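The slides do not describe the in-house algorithm. One plausible reason precomputed L1 and L2 lists make distance queries cheap is that distances up to 4 reduce to membership tests and set intersections. The sketch below is an assumption along those lines, with L1 holding members at distance exactly 1 and L2 members at distance exactly 2; `build_levels` and `distance_up_to_4` are hypothetical names.

```python
def build_levels(contacts):
    """Precompute L1 (distance 1) and L2 (distance exactly 2) for each member."""
    l1 = {member: set(direct) for member, direct in contacts.items()}
    l2 = {}
    for member, direct in contacts.items():
        reach = set()
        for contact in direct:
            reach |= contacts[contact]
        l2[member] = reach - direct - {member}
    return l1, l2


def distance_up_to_4(l1, l2, a, b):
    """Distance between a and b if it is at most 4, else None."""
    if a == b:
        return 0
    if b in l1[a]:
        return 1
    if b in l2[a]:
        return 2
    if l1[a] & l2[b]:  # a direct contact of a is at distance 2 from b
        return 3
    if l2[a] & l2[b]:  # some member is at distance 2 from both
        return 4
    return None


# Hypothetical chain of six members: A - B - C - D - E - F
chain = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
    "D": {"C", "E"}, "E": {"D", "F"}, "F": {"E"},
}
l1, l2 = build_levels(chain)
print(distance_up_to_4(l1, l2, "A", "D"), distance_up_to_4(l1, l2, "A", "E"))
```

Reads are fast in such a scheme precisely because the expensive work was done at write time, which is the cost the update example on the previous slide illustrates.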

LIMITATIONS

1) Significant latency for a complete update
2) Massive bandwidth impact on the internal network
3) 48 hours to restart from scratch

GRAPH DATABASE

[Figure: a property graph of six numbered nodes (1 to 6), built up over four slides. © Ian Robinson]

Nodes
• Name: Graham Greene, Born: 02-10-1904, Died: 02-04-1991
• Name: John le Carré, Born: 19-10-1932
• Title: Our Man in Havana, Published: 1958
• Title: Tinker Tailor Soldier Spy, Published: 1974
• Name: Alan
• Name: Ian

Relationships
• WROTE (×2)
• RECOMMENDED, Date: 05-07-2011
• RECOMMENDED, Date: 09-09-2011
• RECOMMENDED, Date: 03-02-2011

Properties
• The key/value pairs shown above (Name, Born, Died, Title, Published, Date)
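The three building blocks named on these slides (nodes, relationships, properties) can be modeled in a few lines. This is a generic sketch of a property graph, not Neo4j's API. The node numbering and the choice of who recommended which book on which date are assumptions, since the figure itself did not survive in the transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str
    start: int  # id of the source node
    end: int    # id of the target node
    properties: dict = field(default_factory=dict)


# Node ids are assigned arbitrarily for this sketch.
nodes = {
    1: Node(1, {"name": "Graham Greene", "born": "02-10-1904"}),
    2: Node(2, {"title": "Our Man in Havana", "published": 1958}),
    3: Node(3, {"name": "John le Carré"}),
    4: Node(4, {"title": "Tinker Tailor Soldier Spy", "published": 1974}),
    5: Node(5, {"name": "Alan"}),
    6: Node(6, {"name": "Ian"}),
}

relationships = [
    Relationship("WROTE", 1, 2),
    Relationship("WROTE", 3, 4),
    # Hypothetical pairing of recommenders, books and dates.
    Relationship("RECOMMENDED", 5, 2, {"date": "05-07-2011"}),
    Relationship("RECOMMENDED", 5, 4, {"date": "09-09-2011"}),
    Relationship("RECOMMENDED", 6, 4, {"date": "03-02-2011"}),
]


def titles_written_by(author_name):
    """Follow WROTE relationships from the named author to book titles."""
    author_ids = {n.id for n in nodes.values()
                  if n.properties.get("name") == author_name}
    return [nodes[r.end].properties["title"] for r in relationships
            if r.type == "WROTE" and r.start in author_ids]


print(titles_written_by("Graham Greene"))
```

Both nodes and relationships carry their own property dictionaries, which is what distinguishes this model from a plain adjacency list.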

WHY NEO4J

Findings after a POC on 3 other tools:
• Old technology with an add-on for graph management
• No user community
• Bad performance
• “Black box” code

Why Neo4j?
• Open-source project
• Good documentation
• User community
• Excellent performance
• ACID
• Very simple
• (How better to model a social graph than with a graph database?!)

1 node = 1 member
1 relationship = 1 direct contact
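With this mapping, connecting two members means creating a single relationship, and level 2 contacts are derived when they are read rather than stored. The sketch below illustrates that idea with an in-memory class; `SocialGraph` and its methods are hypothetical and do not represent the Neo4j API.

```python
class SocialGraph:
    """Toy in-memory graph: 1 node = 1 member, 1 relationship = 1 direct contact."""

    def __init__(self):
        self.contacts = {}  # member -> set of direct contacts
        self.writes = 0     # number of relationships created

    def connect(self, a, b):
        """Create one relationship; nothing precomputed needs refreshing."""
        self.contacts.setdefault(a, set()).add(b)
        self.contacts.setdefault(b, set()).add(a)
        self.writes += 1

    def level2(self, member):
        """Contacts of contacts, computed by traversal at query time."""
        direct = self.contacts[member]
        reach = set()
        for contact in direct:
            reach |= self.contacts[contact]
        return reach - direct - {member}


graph = SocialGraph()
for i in range(500):
    graph.connect("A", f"a{i}")  # hypothetical contacts of A
for i in range(150):
    graph.connect("B", f"b{i}")  # hypothetical contacts of B

writes_before = graph.writes
graph.connect("A", "B")
print(graph.writes - writes_before, len(graph.level2("A")))
```

The single write replaces the fan-out of row rewrites needed when level 2 lists are stored per member; the trade-off is that level 2 is now computed on each read.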

BENEFITS

• Instantaneous graph updates
• Very easy to integrate (less than 2 months)
• High availability
• Backup / restore

LIMITATION

Doesn’t handle SHARDING! (Splitting one graph across several servers)

“Size doesn’t matter…”, but…

[Figure: the graph split across Server 1 and Server 2]
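A general reason why splitting a graph across servers is hard is that relationships do not respect partition boundaries: any placement of members leaves some relationships with one end on each server, and every traversal over them becomes a network hop. The sketch below counts such relationships for a made-up graph and a made-up placement rule; it says nothing about Neo4j's internals.

```python
def crossing_relationships(edges, shard_of):
    """Relationships whose two ends are stored on different servers."""
    return [(a, b) for a, b in edges if shard_of(a) != shard_of(b)]


def shard_of(member_id):
    """Hypothetical placement rule: even ids on server 0, odd ids on server 1."""
    return member_id % 2


# Hypothetical contacts between four members.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]

crossing = crossing_relationships(edges, shard_of)
print(f"{len(crossing)} of {len(edges)} relationships cross servers")
```

Here 4 of the 5 relationships cross servers; a smarter placement could lower that number, but in a densely connected social graph it rarely reaches zero.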

EXPLORATION MODE

What about the future?
• Store various kinds of objects
• Change the development paradigm

CONCLUSION

Neo4j:
• Has replaced a 5-year-old in-house technology in only 2 months
• Supports the core system of the Viadeo professional social network
• Has been in production for a year and a half
• Deals smoothly with Viadeo’s usage growth

Think about how Neo4j will improve your own business!