Posted on 23-Jan-2015
Recruiting Solutions
Harvesting Information Excellence
Information Excellence 2012 Sep Session
Information Excellence: informationexcellence.wordpress.com
Big Data Usage and Implementation in LinkedIn
Hari Shankar, Big Data Engineer, LinkedIn
Thank you for hosting us today
Today’s Speakers
Big data and Hadoop
September 2012
Hari Shankar Menon, Software Engineer, LinkedIn
About me
• LinkedIn Engineering, Data Warehouse team
• Previously, software engineer at Clickable: worked on building the reporting and analytics platform on Hadoop and HBase
• Hadoop and open-source enthusiast
Agenda
• About LinkedIn
• Data infrastructure overview
• Hadoop@LinkedIn
• Challenges
Our mission: Connect the world’s professionals to make them more productive and successful
LinkedIn by numbers
• 175M+ LinkedIn members
• 85% of Fortune 100 companies use LinkedIn to hire
• >2M company pages**
• ~2 new members joining per second
• ~4.2B professional searches in 2011
Chart: LinkedIn members (millions), 2004 to 2010
*as of Nov 4, 2011   **as of June 30, 2011
What is big data?
(Chart from Philip Russom, Research Director, TDWI)
Infrastructure technologies
• Primary data store (front-end)
• Distributed key-value store
• Document-oriented store
• Distributed pub/sub messaging
• Search technologies: SenseiDB, Zoie, Bobo
• Database change replication
Open source
http://data.linkedin.com/opensource
• What is Hadoop
• Evolution of Hadoop
• Impact
• Recommendation systems
  – Generating recommendations
  – Modeling
  – A/B testing
  – Grandfathering
• Data warehouse/ETL
  – Raw data storage
  – Aggregations
  – Heavy lifting
• Data sciences
  – Strategic analyses
  – Experimentation sandbox
The Recommendations opportunity
Examples: Pandora, Search for People, Events You May Be Interested In, Groups browse maps
• Relevance/Latency
• Offline computation
• Caching
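The split between offline computation and caching can be sketched as two phases. A batch job computes recommendations for every member ahead of time. The online tier then answers a request with a single key lookup instead of walking the graph at request time. This is a minimal sketch under stated assumptions: the toy graph, the "rank by number of common connections" heuristic, and the names `compute_recommendations` and `RecommendationCache` are all hypothetical and are not LinkedIn's actual algorithm. The in-memory dict stands in for a key-value store such as Voldemort, which the deck names on the real-time processing slide.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical toy connection graph: member ID -> set of connected member IDs.
Graph = Dict[str, Set[str]]


def compute_recommendations(graph: Graph, top_k: int = 3) -> Dict[str, List[str]]:
    """Offline batch step (hypothetical heuristic): rank non-connections by common connections."""
    recs: Dict[str, List[str]] = {}
    for member, friends in graph.items():
        counts: Counter = Counter()
        for friend in friends:
            for candidate in graph.get(friend, set()):
                # Skip the member themselves and people they are already connected to.
                if candidate != member and candidate not in friends:
                    counts[candidate] += 1
        # Highest count first; ties broken by ID so the output is deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        recs[member] = [candidate for candidate, _ in ranked[:top_k]]
    return recs


class RecommendationCache:
    """In-memory stand-in for the key-value store the online tier reads from."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}

    def bulk_load(self, recs: Dict[str, List[str]]) -> None:
        # Replace the whole precomputed batch at once.
        self._store = dict(recs)

    def get(self, member: str) -> List[str]:
        # Online path: one lookup, no graph traversal at request time.
        return self._store.get(member, [])


graph: Graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

cache = RecommendationCache()
cache.bulk_load(compute_recommendations(graph))
print(cache.get("alice"))  # ['dave', 'erin']
```

The trade-off here is freshness for latency. Recommendations are only as current as the last batch run, which is why the slide lists relevance/latency alongside offline computation and caching.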
Improving recommendations
• Mathematical modeling
• A/B Testing
• Grandfathering
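A/B testing needs each member to land in the same variant every time, and it needs an offline step that compares outcomes per variant. One common way to get stable assignment is to hash the member ID salted with the experiment name. The sketch below assumes that approach. The hash-based bucketing, the function names, the experiment name and the click log are all hypothetical and do not describe LinkedIn's actual experimentation system. Grandfathering is not modeled.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def assign_variant(member_id: str, experiment: str, treatment_pct: int = 50) -> str:
    """Deterministically bucket a member into 'treatment' or 'control'."""
    # Salting the hash with the experiment name keeps assignments independent
    # across experiments while staying stable for a given member.
    key = f"{experiment}:{member_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100  # bucket in 0..99
    return "treatment" if bucket < treatment_pct else "control"


def click_rate_by_variant(events: Iterable[Tuple[str, bool]], experiment: str) -> Dict[str, float]:
    """Offline step: attach each logged event to its variant and compute click-through rate."""
    shown: Dict[str, int] = defaultdict(int)
    clicked: Dict[str, int] = defaultdict(int)
    for member_id, did_click in events:
        variant = assign_variant(member_id, experiment)
        shown[variant] += 1
        clicked[variant] += int(did_click)
    return {variant: clicked[variant] / shown[variant] for variant in shown}


# Hypothetical log of (member_id, clicked_recommendation) pairs.
events = [(f"member-{i}", i % 3 == 0) for i in range(1000)]
print(click_rate_by_variant(events, "hypothetical-recs-experiment"))
```

No state is needed to remember who saw what, because assignment is recomputed from the ID. The offline analysis job can therefore reconstruct the variants directly from raw logs.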
Hadoop in the Data warehouse
Data warehouse:
• Source of truth
• Lower retention
• Ad-hoc analysis
Hadoop:
• Longer retention
• Complex transformations
• Algorithmic computations
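The aggregations that Hadoop takes over from the warehouse follow the map, shuffle/sort, reduce pattern. The sketch below simulates that pattern in plain Python to count page views per (date, page). It is a minimal local sketch, not a Hadoop job or LinkedIn's pipeline. The tab-separated log format, `RAW_LOG` and the function names are hypothetical, and `sorted()` stands in for the shuffle-and-sort step that Hadoop performs between the two phases.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

Key = Tuple[str, str]  # (date, page)

# Hypothetical raw event log: one tab-separated record per line (date, member ID, page).
RAW_LOG = [
    "2012-09-01\tm1\t/jobs",
    "2012-09-01\tm2\t/jobs",
    "2012-09-01\tm1\t/groups",
    "2012-09-02\tm3\t/jobs",
]


def mapper(lines: Iterable[str]) -> Iterator[Tuple[Key, int]]:
    """Map phase: emit ((date, page), 1) for every raw record."""
    for line in lines:
        date, _member_id, page = line.rstrip("\n").split("\t")
        yield (date, page), 1


def reducer(pairs: Iterable[Tuple[Key, int]]) -> Iterator[Tuple[Key, int]]:
    """Reduce phase: sum the counts for each key."""
    # Hadoop delivers map output to reducers sorted by key; sorted() simulates
    # that shuffle-and-sort step so groupby sees each key as one contiguous run.
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(count for _key, count in group)


def run_job(lines: Iterable[str]) -> Dict[Key, int]:
    return dict(reducer(mapper(lines)))


print(run_job(RAW_LOG))
# {('2012-09-01', '/groups'): 1, ('2012-09-01', '/jobs'): 2, ('2012-09-02', '/jobs'): 1}
```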
Hadoop in Data Sciences
• Deep dives
• Sandbox
• Hackday projects
Data Insights - 1
Job migration after financial collapse
Data Insights - 2
Data Insights - 3
Challenges
1. User adoption of new technologies
2. Real-time processing
3. Graph/network algorithms
4. Making data accessible
User adoption
Real-time processing
• Challenges
  – Random reads/writes
  – Warm-up time
• Solutions
  – Parts of the problem that can be moved offline?
  – HBase, Voldemort
Map-reduce-incompatible problems
• Graph problems
• Traditional joins
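Traditional joins are awkward in map-reduce because a join has to be re-expressed as a tagged shuffle. The standard reduce-side (repartition) join pattern, simulated here in plain Python, shows the cost. Every record from both inputs is tagged with its source and grouped by join key before any matching happens, and each reducer buffers both sides for a key in memory. The sample datasets and function names are hypothetical. The `shuffle` function only imitates what Hadoop's shuffle does across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Tagged = Tuple[str, str]  # (source tag, value)


def map_members(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, Tagged]]:
    # Tag each record with its source so the reducer can tell the two sides apart.
    for member_id, name in rows:
        yield member_id, ("M", name)


def map_positions(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, Tagged]]:
    for member_id, company in rows:
        yield member_id, ("P", company)


def shuffle(*mapped: Iterable[Tuple[str, Tagged]]) -> Dict[str, List[Tagged]]:
    """Group every mapped record by join key, imitating Hadoop's shuffle."""
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for stream in mapped:
        for key, value in stream:
            groups[key].append(value)
    return groups


def reduce_join(groups: Dict[str, List[Tagged]]) -> Iterator[Tuple[str, str, str]]:
    """Inner join: for each key, emit the cross product of the two sides."""
    for member_id in sorted(groups):
        values = groups[member_id]
        # Both sides for this key are buffered in memory; a very frequent key
        # (data skew) can overwhelm a single reducer.
        names = [v for tag, v in values if tag == "M"]
        companies = [v for tag, v in values if tag == "P"]
        for name in names:
            for company in companies:
                yield member_id, name, company


members = [("1", "Ann"), ("2", "Raj"), ("3", "Li")]
positions = [("1", "Acme"), ("1", "Globex"), ("2", "Initech")]
joined = list(reduce_join(shuffle(map_members(members), map_positions(positions))))
print(joined)
# [('1', 'Ann', 'Acme'), ('1', 'Ann', 'Globex'), ('2', 'Raj', 'Initech')]
```

Member "3" has no positions and drops out, as an inner join requires. Joining a third dataset would need another full pass of the same shape, which is one reason multi-way joins and iterative graph algorithms fit map-reduce poorly.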
Making data accessible
• Hadoop: tons of data
Finally!
• No silver bullet
• Hadoop: offline processing
• Scalability by design
www.linkedin.com/in/harisreekumar
www.linkedin.com/company/linkedin/careers
About Information Excellence Group
• Community Focused
• Volunteer Driven
• Knowledge Share
• Accelerated Learning
• Collective Excellence
• Distilled Knowledge
• Shared, Non Conflicting Goals
• Validation / Brainstorm platform
• Mentor, Guide, Coach
• Satisfied, Empowered Professional
• Richer Industry and Academia
Progress Information Excellence
Towards an Enriched Profession, Business and Society