Berkeley Dataproduct Talk

Date post: 09-Jul-2016
Upload: lordger-liu
Data Products Deep Dive. Pete Skomoroch (@peteskomoroch), 3/31/14, Berkeley CS194-16: Intro to Data Science
Transcript
Page 1: Berkeley Dataproduct Talk

Data Products Deep Dive

Pete Skomoroch @peteskomoroch 3/31/14 Berkeley CS194-16: Intro to Data Science

Page 2: Berkeley Dataproduct Talk

Some Background

• Physics/Math BS Undergrad
• Analyst / Software Engineer @ ProfitLogic - 3.5 years
• Biodefense Engineer / ML Student @ MIT - 3.5 years
• Sr. Research Engineer @ AOL Search - 1 year
• Director @ Juice Analytics - 1 year
• Consulting @ Cloudera, Amazon etc - 1 year
• Principal Data Scientist @ LinkedIn - 4 years

Page 3: Berkeley Dataproduct Talk

Four types of data scientist (at least)

Source: "Analyzing the Analyzers", O'Reilly Media

Page 4: Berkeley Dataproduct Talk

Data Scientists create data products

Page 5: Berkeley Dataproduct Talk

The data product process

• Verify you are solving the right problem
• Theory + model design
• Measurement: data collection and cleaning
• Feature engineering & model development
• Error analysis and investigation
• Iterate and improve each step in the process
• Leverage derived data to build new products

Page 6: Berkeley Dataproduct Talk

Data factories & flywheels

Source: http://www.linkedin.com/channels/disrupt2013 Steve Jennings/Getty Images Entertainment

Page 7: Berkeley Dataproduct Talk

Data Product Example: LinkedIn Skills

• Skill Extraction and Standardization Pipeline
• Skill Pages
• Skills Section on Member Profiles
• Suggested Skills Algorithm and Email
• Skill Endorsements

Page 8: Berkeley Dataproduct Talk
Page 9: Berkeley Dataproduct Talk
Page 10: Berkeley Dataproduct Talk

Skill Discovery: Unsupervised Topics from Profile Specialties Section

Extract

Page 11: Berkeley Dataproduct Talk

Topic Clustering & Phrase Sense Disambiguation

Page 12: Berkeley Dataproduct Talk

Deduplication Signals from Mechanical Turk

Page 13: Berkeley Dataproduct Talk

Sample Task for Mechanical Turk Workers

Page 14: Berkeley Dataproduct Talk

Mechanical Turk Standardization

Page 15: Berkeley Dataproduct Talk

Skill Phrase Deduplication

Page 16: Berkeley Dataproduct Talk

Tagging Skill Phrases

• Tagging: Extract potential skill phrases from text
  Example text: "Lead designer and engineer for the implementation of a user-centric, fully-configurable UI for data aggregation and reporting. Developed over 20 SaaS custom applications using Python, Javascript and RoR."
  Tagged phrases: JavaScript, RoR, SaaS, Python
• Standardize unambiguous phrase variants
  Variants: ror, rubyonrails, ruby on rails development, ruby rails, ruby on rail
  Standardized form: Ruby on Rails

Diagram labels: Document (ex: Profile); Tokenization; Skills Tagger; Phrases (up to 6 words); Skills Classifier; Skills (unordered); Skills (ranked by relevance)
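
The tagging and standardization steps on this slide can be sketched roughly as follows. This is a minimal illustration, not LinkedIn's implementation: the variant dictionary, the tokenizer regex, and the function names are all assumptions made for the example. The only details taken from the slide are the 6-word phrase limit and the sample variants of "Ruby on Rails".

```python
import re

# Hypothetical variant dictionary (an assumption for illustration): maps
# lowercase phrase variants to one standardized skill name.
SKILL_VARIANTS = {
    "ror": "Ruby on Rails",
    "rubyonrails": "Ruby on Rails",
    "ruby on rails": "Ruby on Rails",
    "ruby rails": "Ruby on Rails",
    "ruby on rail": "Ruby on Rails",
    "javascript": "JavaScript",
    "python": "Python",
    "saas": "SaaS",
}

MAX_PHRASE_WORDS = 6  # the slide mentions phrases of up to 6 words


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tag_skills(text, variants=SKILL_VARIANTS):
    """Return standardized skills found in text, in order of first appearance."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        matched_len = 0
        # Try the longest candidate phrase first so that "ruby on rails"
        # wins over any shorter variant starting at the same token.
        for n in range(min(MAX_PHRASE_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in variants:
                skill = variants[phrase]
                if skill not in found:
                    found.append(skill)
                matched_len = n
                break
        # Skip past a matched phrase, otherwise advance one token.
        i += matched_len if matched_len else 1
    return found
```

Calling `tag_skills("Developed over 20 SaaS custom applications using Python, Javascript and RoR.")` returns `['SaaS', 'Python', 'JavaScript', 'Ruby on Rails']`. A dictionary lookup only handles unambiguous variants; ambiguous phrases would need the disambiguation and classification stages named in the diagram.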

Page 17: Berkeley Dataproduct Talk
Page 18: Berkeley Dataproduct Talk
Page 19: Berkeley Dataproduct Talk
Page 20: Berkeley Dataproduct Talk
Page 21: Berkeley Dataproduct Talk
Page 22: Berkeley Dataproduct Talk
Page 23: Berkeley Dataproduct Talk
Page 24: Berkeley Dataproduct Talk
Page 25: Berkeley Dataproduct Talk
Page 26: Berkeley Dataproduct Talk
Page 27: Berkeley Dataproduct Talk
Page 28: Berkeley Dataproduct Talk
Page 29: Berkeley Dataproduct Talk
Page 30: Berkeley Dataproduct Talk


Page 31: Berkeley Dataproduct Talk

Skills Related to “Big Data”

Page 32: Berkeley Dataproduct Talk

Skills Correlated with the Job Title “Data Scientist”

Page 33: Berkeley Dataproduct Talk

SkillRank: Algorithm for Top People

Page 34: Berkeley Dataproduct Talk

How do we get more people into the skill graphs?

Page 35: Berkeley Dataproduct Talk

Suggested Skills Inference

• How suggested/inferred skills work:
  – The skill likelihood is a conditional model
  – Probabilities are combined using a Naïve Bayes Classifier
• If you are an engineer at Apple, you probably know about iPhone Development.

Diagram labels: Profile; Extract attributes (Company ID, Title ID, Groups ID, Industry ID, …); Feature Vectors; Skills Classifier; Skills (ranked by likelihood)
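
One way to read the Naïve Bayes step on this slide is sketched below, under stated assumptions. Each skill is scored as P(skill) multiplied by the product of P(attribute | skill) over the profile's attributes, with add-one smoothing, and skills are ranked by that score. The toy profiles, the field names, the smoothing choice, and the function names are invented for illustration; the slide does not specify how the actual model was estimated.

```python
import math
from collections import Counter, defaultdict

# Toy training data (invented for illustration): (attributes, known skills).
TOY_PROFILES = [
    ({"company": "apple", "title": "engineer"}, ["iPhone Development", "Objective-C"]),
    ({"company": "apple", "title": "engineer"}, ["iPhone Development"]),
    ({"company": "apple", "title": "designer"}, ["Photoshop"]),
    ({"company": "acme", "title": "analyst"}, ["Excel"]),
    ({"company": "acme", "title": "engineer"}, ["Java"]),
]


def train(profiles):
    """Count how often each skill and each (field, value) pair co-occur."""
    skill_counts = Counter()
    feature_counts = defaultdict(Counter)  # skill -> Counter of (field, value)
    field_values = defaultdict(set)        # field -> distinct values seen
    for attributes, skills in profiles:
        for field, value in attributes.items():
            field_values[field].add(value)
        for skill in skills:
            skill_counts[skill] += 1
            for feature in attributes.items():
                feature_counts[skill][feature] += 1
    return skill_counts, feature_counts, field_values, len(profiles)


def rank_skills(attributes, model, alpha=1.0):
    """Rank skills by log P(skill) + sum of log P(attribute | skill)."""
    skill_counts, feature_counts, field_values, n_profiles = model
    scores = {}
    for skill, count in skill_counts.items():
        log_score = math.log(count / n_profiles)  # prior P(skill)
        for field, value in attributes.items():
            n_values = len(field_values.get(field, ())) or 1
            # Smoothed conditional estimate of P(field=value | skill).
            numerator = feature_counts[skill][(field, value)] + alpha
            denominator = count + alpha * n_values
            log_score += math.log(numerator / denominator)
        scores[skill] = log_score
    return sorted(scores, key=scores.get, reverse=True)
```

With this toy data, `rank_skills({"company": "apple", "title": "engineer"}, train(TOY_PROFILES))` puts "iPhone Development" first, matching the slide's example of an engineer at Apple. Working in log space avoids underflow when many attributes are combined.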

Page 36: Berkeley Dataproduct Talk
Page 37: Berkeley Dataproduct Talk
Page 38: Berkeley Dataproduct Talk
Page 39: Berkeley Dataproduct Talk
Page 40: Berkeley Dataproduct Talk
Page 41: Berkeley Dataproduct Talk

Skill Recommendations for Your LinkedIn Profile

49% Conversion

4% Conversion

Page 42: Berkeley Dataproduct Talk

Reputation: Build Endorsements Product to Collect More Graph Edges

Page 43: Berkeley Dataproduct Talk

PYMK + Suggested Skills

Page 44: Berkeley Dataproduct Talk


Viral Growth: 1 Billion Endorsements in 5 Months

Page 45: Berkeley Dataproduct Talk

Social Viral Tagging = Lots of Data

Chart labels: Suggested endorsements; Skill recommendations; Skill marketing; Virality only

Page 46: Berkeley Dataproduct Talk

How Did We Gather this Data?

1. Desire + Social Proof
2. Viral Loops + Network Effects
3. Data Foundation + Recommendation Algorithms

Page 47: Berkeley Dataproduct Talk

Recap: Data Product Evolution

• Skill Extraction and Standardization Pipeline
• Skill Pages
• Skills Section on Member Profiles
• Suggested Skills Algorithm and Email: > 20M members
• Skill Endorsements: > 60M members, 3B+ Edges
• Big product wins in engagement, recall, relevance
• SkillRank & Reputation integration…
• Sets stage for next generation of products

Page 48: Berkeley Dataproduct Talk

Questions?

@peteskomoroch
http://datawrangling.com
http://www.linkedin.com/in/peterskomoroch

 

