
Big Data Usage in LinkedIn

Date posted: 23-Jan-2015
Uploaded by: information-excellence
Description:
Information Excellence presentation, September 2012, by Hari Shankar, LinkedIn Big Data Engineer, on big data usage at LinkedIn.
Transcript
Page 1: Big Data Usage in LinkedIn

Recruiting Solutions

Harvesting Information Excellence

Information Excellence 2012 Sep Session

Page 2: Today's Speakers

Big Data Usage and Implementation in LinkedIn
Hari Shankar, Big Data Engineer, LinkedIn

Thank you for hosting us today.

Information Excellence, informationexcellence.wordpress.com

Page 3: Big data and Hadoop

September 2012

Hari Shankar Menon, Software Engineer, LinkedIn

Page 4: About me

• LinkedIn Engineering, Data Warehouse team
• Previously a software engineer at Clickable, where I worked on building the reporting and analytics platform on Hadoop and HBase
• Hadoop and open-source enthusiast

Page 5: Agenda

• About LinkedIn
• Data infrastructure overview
• Hadoop@LinkedIn
• Challenges

Page 6: Our mission

Connect the world's professionals to make them more productive and successful.

Page 7: LinkedIn by numbers

[Chart: LinkedIn members (millions), 2004 to 2010]

• 175M+ members
• 85% of Fortune 100 companies use LinkedIn to hire
• >2M company pages**
• ~2 new members joining per second
• ~4.2B professional searches in 2011

* As of Nov 4, 2011  ** As of June 30, 2011

Page 9: What is big data?

* Chart from Philip Russom, Research Director, TDWI

Page 10: Infrastructure technologies

• Primary data store (front-end)
• Distributed key-value store
• Document-oriented store
• Distributed pub/sub messaging
• Search technologies
• Database change replication

SenseiDB, Zoie, Bobo

Page 11: Open source

http://data.linkedin.com/opensource

Page 13

• What is Hadoop
• Evolution of Hadoop
• Impact
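Hadoop's processing layer is built around the MapReduce model: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. The sketch below simulates those three phases in a single Python process using a word count. It illustrates the model only, not Hadoop's Java API or LinkedIn code, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    # Reduce each key group; keys are sorted, as Hadoop sorts keys before reducing.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["big data at LinkedIn", "Hadoop and big data"]))
    # {'and': 1, 'at': 1, 'big': 2, 'data': 2, 'hadoop': 1, 'linkedin': 1}
```

On a real cluster the map and reduce calls run in parallel on many machines, and the shuffle moves data across the network; the single-process version only shows how the pieces fit together.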

Page 14: Hadoop@LinkedIn

• Recommendation systems
  – Generating recommendations
  – Modeling
  – A/B testing
  – Grandfathering
• Data warehouse/ETL
  – Raw data storage
  – Aggregations
  – Heavy lifting
• Data sciences
  – Strategic analyses
  – Experimentation sandbox

Page 15: The Recommendations opportunity

Pandora Search for People
Events You May Be Interested In
Groups browse maps

• Relevance/Latency
• Offline computation
• Caching
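The bullets above describe moving expensive recommendation work offline and serving the results from a cache. A minimal sketch of that split, assuming a toy connection graph and using shared-connection counts as a stand-in relevance score (the slides do not say which scoring LinkedIn uses), might look like this. `offline_recommendations` and `RecommendationCache` are hypothetical names, and the plain dict stands in for a key-value store.

```python
from collections import Counter
from typing import Dict, List, Set


def offline_recommendations(graph: Dict[str, Set[str]], top_k: int = 3) -> Dict[str, List[str]]:
    """Batch step: rank non-connected members by number of shared connections."""
    store: Dict[str, List[str]] = {}
    for member, connections in graph.items():
        scores = Counter()
        for friend in connections:
            for candidate in graph.get(friend, set()):
                # Skip the member and anyone they are already connected to.
                if candidate != member and candidate not in connections:
                    scores[candidate] += 1
        # Highest score first; break ties by id so the output is deterministic.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        store[member] = [candidate for candidate, _ in ranked[:top_k]]
    return store


class RecommendationCache:
    """Online step: answer requests from precomputed results with a dict lookup."""

    def __init__(self, store: Dict[str, List[str]]) -> None:
        self._store = store

    def get(self, member: str) -> List[str]:
        # Unknown members get an empty list instead of triggering a live computation.
        return self._store.get(member, [])


if __name__ == "__main__":
    graph = {
        "alice": {"bob", "carol"},
        "bob": {"alice", "dave"},
        "carol": {"alice", "dave", "erin"},
        "dave": {"bob", "carol"},
        "erin": {"carol"},
    }
    cache = RecommendationCache(offline_recommendations(graph))
    print(cache.get("alice"))  # ['dave', 'erin']
```

The batch step can afford a full pass over the graph because it runs off the request path; the online step only does a dictionary lookup, which is where the latency win comes from.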

Page 16: Improving recommendations

• Mathematical modeling
• A/B testing
• Grandfathering
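A/B testing needs each member to see the same variant on every visit. One common approach, assumed here for illustration rather than taken from the slides, is to hash the member id together with an experiment name and map the hash to a bucket from 0 to 99. `assign_variant` and the experiment name are hypothetical.

```python
import hashlib


def assign_variant(member_id: int, experiment: str, treatment_percent: int) -> str:
    """Deterministically place a member in 'treatment' or 'control' for one experiment."""
    if not 0 <= treatment_percent <= 100:
        raise ValueError("treatment_percent must be between 0 and 100")
    # Combine experiment name and member id so each experiment splits members independently.
    key = f"{experiment}:{member_id}".encode("utf-8")
    # Map the hash to a bucket in [0, 100).
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_percent else "control"


if __name__ == "__main__":
    variants = [assign_variant(m, "new_ranker", 20) for m in range(10_000)]
    print(variants.count("treatment"))  # roughly 2,000 of 10,000 members
```

Because the assignment is a pure function of the member id and experiment name, no per-member state has to be stored, and salting with the experiment name keeps different experiments from always splitting members the same way.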

Page 17: Hadoop in the Data warehouse

• Source of truth
• Lower retention
• Ad-hoc analysis

• Longer retention
• Complex transformations
• Algorithmic computations

Page 18: Hadoop in Data Sciences

• Deep dives
• Sandbox
• Hackday projects

Page 19: Data Insights - 1

Job migration after financial collapse

Page 20: Data Insights - 2

Page 21: Data Insights - 3

Page 23: Challenges

1. User adoption of new technologies
2. Real-time processing
3. Graph/network algorithms
4. Making data accessible

Page 24: User adoption

Page 25: Real-time processing

• Challenges
  – Random reads/writes
  – Warm-up time
• Solutions
  – Which parts of the problem can be moved offline?
  – HBase, Voldemort

Page 26: MapReduce-incompatible problems

• Graph problems
• Traditional joins
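Part of why joins fit MapReduce poorly is that the framework only groups records by key, so a join has to be spelled out by hand. The sketch below shows the reduce-side join pattern in plain Python under assumed inputs (hypothetical member and page-view tables): each mapper tags records with their source, the shuffle groups them by member id, and the reducer separates the tags and pairs them up. It illustrates the pattern only and is not Hadoop API code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# A tagged value: (source table tag, payload).
Tagged = Tuple[str, str]


def map_members(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, Tagged]]:
    # Hypothetical "members" table rows: (member_id, name).
    for member_id, name in rows:
        yield member_id, ("member", name)


def map_views(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, Tagged]]:
    # Hypothetical "page views" table rows: (member_id, page).
    for member_id, page in rows:
        yield member_id, ("view", page)


def reduce_join(key: str, values: List[Tagged]) -> Iterator[Tuple[str, str, str]]:
    # Separate the grouped values by source tag, then emit every pairing (inner join).
    names = [payload for tag, payload in values if tag == "member"]
    pages = [payload for tag, payload in values if tag == "view"]
    for name in names:
        for page in pages:
            yield key, name, page


def reduce_side_join(members: Iterable[Tuple[str, str]],
                     views: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    # Shuffle: group the tagged output of both mappers by join key.
    grouped: Dict[str, List[Tagged]] = defaultdict(list)
    for key, tagged in [*map_members(members), *map_views(views)]:
        grouped[key].append(tagged)
    joined: List[Tuple[str, str, str]] = []
    for key in sorted(grouped):
        joined.extend(reduce_join(key, grouped[key]))
    return joined


if __name__ == "__main__":
    members = [("1", "Ann"), ("2", "Raj")]
    views = [("1", "jobs"), ("1", "groups"), ("3", "search")]
    print(reduce_side_join(members, views))
    # [('1', 'Ann', 'jobs'), ('1', 'Ann', 'groups')]
```

In a SQL warehouse the same result is a single JOIN clause; here the reducer also has to hold every value for a key in memory, which becomes a problem for heavily skewed keys. Iterative graph algorithms have a similar cost, since each step over the graph typically becomes another full MapReduce pass.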

Page 27: Making data accessible

• Hadoop
• Tons of data

Page 28: Finally!

• No silver bullet
• Hadoop for offline processing
• Scalability by design

Page 29

www.linkedin.com/in/harisreekumar
www.linkedin.com/company/linkedin/careers

Page 30: About Information Excellence Group

• Community Focused
• Volunteer Driven
• Knowledge Share
• Accelerated Learning
• Collective Excellence
• Distilled Knowledge
• Shared, Non-Conflicting Goals
• Validation/Brainstorm Platform
• Mentor, Guide, Coach
• Satisfied, Empowered Professional
• Richer Industry and Academia

Progress Information Excellence
Towards an Enriched Profession, Business and Society

