
Ilian Uzunov (Georgi Georgiev)

Date post: 21-Jan-2018
Upload: semantic-web-company

Scaling Semantic Technology to Increase User Engagement - FT.com

   

September 16th, 2015

 

Ontotext, Scaling Semantic Technology #1 Sept, 2015

Outline

• Introducing Ontotext
• Related Reads: an FT.com use case
• What we managed to achieve
• Hands on FT.com live
• Positive signs across the news and media domain
• Hands on NOW (News on the Web) demo service

Company Brief

Why? Enable better search, analytics and content delivery
What? Data and content management technology: graph database engine + text-mining solutions
How? Semantic analysis of text, linking text to data; NoSQL database with inference
Best for: dealing with heterogeneous dynamic data
Clients: BBC, FT, Bloomberg, DK, AstraZeneca, Wiley, etc.
Facts: 70 staff; HQ in Sofia; sales in London & New York
USP: the best semantic graph database engine; text-mining platform integrated with graph database

Sample RDF Graph: Data and Schema

[Figure: RDF graph combining data and schema. Node and property labels: myData:Maria, myData:Ivan, ptop:Agent, ptop:Person, ptop:Woman, ptop:childOf, ptop:parentOf, owl:relativeOf. Edge labels: rdf:type, rdfs:range, rdfs:subPropertyOf, owl:inverseOf, owl:SymmetricProperty. One edge is marked "inferred".]
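The "inferred" edge in the graph can be illustrated with a minimal sketch of owl:inverseOf reasoning over plain tuples. This is not how GraphDB implements inference. The direction of the ptop:childOf statement between Ivan and Maria is an assumption, since the figure itself did not survive extraction.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
triples = {
    # Schema statement taken from the figure labels.
    ("ptop:childOf", "owl:inverseOf", "ptop:parentOf"),
    # Data statement; which of the two is the child is an assumption.
    ("myData:Ivan", "ptop:childOf", "myData:Maria"),
}


def infer_inverses(graph):
    """Return the graph plus every triple implied by owl:inverseOf."""
    inverse = {}
    for s, p, o in graph:
        if p == "owl:inverseOf":
            inverse[s] = o
            inverse[o] = s  # owl:inverseOf holds in both directions
    inferred = {(o, inverse[p], s) for s, p, o in graph if p in inverse}
    return graph | inferred


closed = infer_inverses(triples)
print(("myData:Maria", "ptop:parentOf", "myData:Ivan") in closed)
```

The explicit childOf statement yields the parentOf statement without anyone asserting it, which is the kind of edge the slide marks as inferred.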

Interlinking Text and Data

Semantic Annotation

[Figure: the publication pmid:17714090 (Ian A Yang, Clinical and experimental pharmacology …) annotated with concepts. Concept labels: COPD, Chronic Obstructive Airway Diseases, Bronchial Diseases, Respiration Disorders, Asthma. Identifiers: umls:C0035204, umls:C0006261, umls:C000496.]
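As a rough illustration of what a semantic annotation is, the sketch below links text spans to concept identifiers with a tiny dictionary lookup. It is a simplified stand-in, not Ontotext's text-mining pipeline. The gazetteer is hypothetical: the Asthma identifier is copied as it appears next to that label on the slide, and the COPD identifier is a placeholder because the slide does not make its pairing recoverable.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the mention begins
    end: int      # character offset just past the mention
    text: str     # surface form found in the document
    concept: str  # identifier of the linked concept


# Hypothetical gazetteer (label -> concept identifier).
GAZETTEER = {
    "asthma": "umls:C000496",  # as printed on the slide
    "copd": "example:COPD",    # placeholder identifier, an assumption
}


def annotate(text, gazetteer=GAZETTEER):
    """Find whole-word, case-insensitive mentions of gazetteer labels."""
    annotations = []
    for label, concept in gazetteer.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(Annotation(m.start(), m.end(), m.group(0), concept))
    return sorted(annotations, key=lambda a: a.start)


doc = "Asthma and COPD are chronic airway diseases."
for a in annotate(doc):
    print(a)
```

Each annotation keeps the character offsets, so the text stays untouched while the link to the graph data is stored alongside it.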

Technology Portfolio

Ontotext and Financial Times

Profile
• Top 3 business media
• Focused both on B2C publishing and B2B services

Goals
• Create a horizontal platform for both data and content based on semantics and serve all functionality through it

Challenges
• Critical part of the entire workflow
• Multiple development projects in parallel with up to 2 months' time between inception and go live

• Horizontal platform with focus on organizations, people, GPEs and relations between them
• Automatic extraction of all these concepts and relationships
• Separate stream of work for a user-behavior-based recommendation of relevant content and data across the entire media

   

 

FT Primary Objective

Serve relevant articles to increase user engagement and improve usability

Subject, Object, Action

Subject: User
Object: Article, Media Asset, Data, …
Action: Read, Preview, Comment, …
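The subject, object, action triple maps naturally onto a small event record. The sketch below is one possible representation; the class names, field names and identifiers are hypothetical and are not taken from the FT system.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    READ = "read"
    PREVIEW = "preview"
    COMMENT = "comment"


@dataclass(frozen=True)
class Event:
    subject: str    # user identifier
    action: Action  # what the user did
    obj: str        # identifier of the article, media asset or data item


def objects_for(events, user, action):
    """List the objects a given user applied a given action to, in log order."""
    return [e.obj for e in events if e.subject == user and e.action is action]


log = [
    Event("user-1", Action.READ, "article-42"),
    Event("user-1", Action.PREVIEW, "article-7"),
    Event("user-2", Action.READ, "article-42"),
]
print(objects_for(log, "user-1", Action.READ))
```

Keeping the action as an explicit field makes it easy to start with reads only and add previews or comments later without changing the record shape.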

         

Contextual Recommendation

[Figure label: Contextual Similarity.]

         

Behavioural Recommendation

[Figure labels: Behavioural Similarity, User Profile.]

         

Contextual and Behavioural in Combination

[Figure labels: Behavioural and Contextual Similarity, Reads, User Profile.]

         

Average News Article Metadata

[Figure labels: Article, NY, promoted (popular), updated, created, image, summary, title, ID, URL, reads, views, votes, comments.]

         

FT Article Metadata

[Figure labels: Summary, Title, body, editorial, img:alt, people, regions, organisations, IPTC, tags.]

         

Metadata Used

[Figure labels: Summary, Title, body, editorial, img:alt, people, regions, organisations, IPTC, tags, concepts, keyphrases.]

         

User Actions

Limited to: User reads Article

         

User Actions: Another Perspective

[Figure: graph of user actions. Labels: perform, comments, votes, posts, preview, read, contains, leads to read, leads to preview, Article, Search Action, Result, Date, FTS Q., TagCat, Tag set, results, cat, taxonomy, Search Log.]

“Content”-based Recommendation

• Relies on the previous choices of an individual user (a user's profile)
• Results on the basis of the similarity of items, defined in terms of their content
• The recommended content is rather homogeneous

Content-based Ranking Mechanisms

Two-fold scoring approach
• Similarity to recently viewed articles (context)
• Relevance to a long-term user profile
  – Weights reflecting the relative importance of the individual terms (static component)
  – Transition likelihoods among any pair of terms (dynamic component)
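A minimal sketch of the two profile components follows, under stated assumptions. Term weights are taken to be relative frequencies in the reading history, and transition likelihoods are estimated from consecutive reads. The slides do not give the actual estimators or how the two parts are blended, so the formulas and the `alpha` parameter are illustrative only.

```python
from collections import Counter, defaultdict


def build_profile(history):
    """history: list of term sets, one per article read, oldest first."""
    # Static component: relative importance of each term across all reads.
    counts = Counter(t for terms in history for t in terms)
    total = sum(counts.values())
    static = {t: c / total for t, c in counts.items()}

    # Dynamic component: how often term b follows term a in consecutive reads.
    pair_counts = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        for a in prev:
            for b in nxt:
                pair_counts[a][b] += 1
    dynamic = {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in pair_counts.items()
    }
    return static, dynamic


def score(candidate, context, static, dynamic, alpha=0.5):
    """Blend long-term term weights with transitions from the current context."""
    static_part = sum(static.get(t, 0.0) for t in candidate)
    dynamic_part = sum(
        dynamic.get(a, {}).get(b, 0.0) for a in context for b in candidate
    )
    return alpha * static_part + (1 - alpha) * dynamic_part


history = [{"banks", "ecb"}, {"ecb", "greece"}, {"greece", "imf"}]
static, dynamic = build_profile(history)
context = history[-1]  # terms of the most recently read article
print(score({"imf"}, context, static, dynamic))
print(score({"banks"}, context, static, dynamic))
```

In this toy history both candidate terms have the same static weight, so the candidate about "imf" outranks the one about "banks" only because of the transition from "greece" in the current context.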

Collaborative Filtering

• Rely on statistics that reflect the past choices of all users
• Results based on user ratings, and the similarity of users or items
• Content-agnostic
• Aware of the quality of content
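One common content-agnostic technique is item-to-item co-visitation counting, sketched below. It uses only article identifiers and the reads of all users. Whether the production system counts co-visits this way is an assumption; the user and article identifiers are made up.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_visitation(reads):
    """reads: dict mapping user -> set of article ids.

    Returns article -> Counter of articles read by the same users.
    """
    matrix = defaultdict(Counter)
    for articles in reads.values():
        for a, b in permutations(articles, 2):
            matrix[a][b] += 1
    return matrix


def recommend(user, reads, top_n=3):
    """Rank unseen articles by how often they co-occur with the user's reads."""
    matrix = co_visitation(reads)
    seen = reads.get(user, set())
    scores = Counter()
    for a in seen:
        for b, n in matrix[a].items():
            if b not in seen:
                scores[b] += n
    return [article for article, _ in scores.most_common(top_n)]


reads = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "d"},
    "u4": {"a"},
}
print(recommend("u4", reads))
```

Because the counts come from what other users chose to read, popular and well-received articles rise without any look at their text.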

Collaborative Ranking Mechanisms

[Figure labels: User to Content Similarity Score, User to User Sim. Score, Content to Content Sim. Score.]

Hybrid Approach

• Combines both approaches to improve the quality of prediction
• Implemented via statistical models
• Takes a wide array of features into consideration
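The simplest way to picture a hybrid is a weighted blend of normalized scores from each scheme, sketched below. The slides say only that statistical models are used, so the linear blend, the min-max normalization and the weights are assumptions chosen for illustration.

```python
def min_max(scores):
    """Rescale a dict of scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(score_sets, weights):
    """Combine per-scheme scores into one ranking, best first.

    score_sets: dict scheme name -> dict item -> raw score
    weights:    dict scheme name -> weight of that scheme
    """
    combined = {}
    for name, scores in score_sets.items():
        if not scores:
            continue
        for item, value in min_max(scores).items():
            combined[item] = combined.get(item, 0.0) + weights[name] * value
    # Sort by descending score; break ties by item id for a stable order.
    return sorted(combined, key=lambda item: (-combined[item], item))


score_sets = {
    "content": {"a": 0.9, "b": 0.2, "c": 0.5},
    "collaborative": {"a": 1, "b": 8, "c": 5},
}
weights = {"content": 0.4, "collaborative": 0.6}
print(hybrid_rank(score_sets, weights))
```

Normalizing first keeps a scheme with large raw numbers, such as co-visit counts, from drowning out one that produces values between 0 and 1.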

Initial Architecture

Final Architecture

[Figure: deployment on AWS instances behind an AWS Elastic LB. Component labels: FT API, Fetch & Annotation, OWLIM Worker, Recommendation API, Varnish Cache, RR (three instances), SOLR 1, SOLR 2, SOLR 3, CS Node 1, CS Node 2, CS Node 3, Replication Group I. Flow labels: Read Article; 1. get related, 2. ask, 3. on cache miss, 4. query; 1. pull content, 2. annotate, 3. index annotate content; click stream, store user profiles, update popularity, update user.]

Main Actions

1. Pull content, annotate/enrich, index
2. Accumulate/update user profile
3. Recommend

Implementation Overview

[Figure labels: Profile Update Request (User ID, Item ID), Profile Update, Profile Storage (Cassandra), Recommendation Request (User ID), Query Generation, Items Index (Solr).]

Profile contents:
• User: context, static component, dynamic component
• Article: co-visitation matrix, popularity

Query generation: boosted sub-queries for all involved ranking schemes: content-based, collaborative, popularity, recency
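To make "boosted sub-queries" concrete, the sketch below assembles a query string in the Lucene/Solr syntax, where `^` attaches a boost to a term or a parenthesized group. It only builds the string and does not contact Solr. The field names `concepts` and `id` and all boost values are hypothetical, since the slides do not describe the FT index schema, and the popularity and recency boosts are left out.

```python
def quote(term):
    """Wrap a term in double quotes, escaping backslashes and quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def terms_clause(field, weighted_terms):
    """Build field:"term"^weight clauses joined with OR."""
    return " OR ".join(f"{field}:{quote(t)}^{w}" for t, w in weighted_terms)


def boosted_query(sub_queries):
    """Combine (query, boost) pairs into one OR-ed query with group boosts."""
    parts = [f"({q})^{boost}" for q, boost in sub_queries if q]
    return " OR ".join(parts)


# Content-based part: terms from the user profile with their weights.
content = terms_clause("concepts", [("ECB", 2.0), ("Greece", 1.5)])
# Collaborative part: articles suggested by co-visitation counts.
collaborative = terms_clause("id", [("article-7", 3.0)])

query = boosted_query([(content, 1.0), (collaborative, 0.5)])
print(query)
```

Expressing every ranking scheme as a sub-query lets the search engine do the final scoring in a single request, which fits the description of each recommendation as a personalized search.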

Wrap up - Concept Extraction Highlights

• 8m named entities and metadata about them
• 20m labels of People and Organisations
• CES cluster which can be scaled horizontally to handle peak loads
• Live dictionary updates coming from GraphDB through the EUF (Entity Update Feed) plugin
• Max throughput: 10 docs/sec on a single c3.2xlarge AWS node; multiply by N to get the throughput of an N-node cluster
• Reliability has been 100%, but the solution hasn't been stressed as much as we've designed it for
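The throughput rule of thumb can be written out as a short calculation. It assumes the linear scaling stated on the slide (10 docs/sec per node, multiplied by the number of nodes); real clusters may scale less than linearly, and the target rates below are made-up examples.

```python
import math

DOCS_PER_SEC_PER_NODE = 10  # single c3.2xlarge node, figure from the slide


def cluster_throughput(nodes):
    """Docs/sec for a cluster, assuming perfectly linear scaling."""
    return nodes * DOCS_PER_SEC_PER_NODE


def nodes_needed(target_docs_per_sec):
    """Smallest node count that reaches the target rate under that assumption."""
    return math.ceil(target_docs_per_sec / DOCS_PER_SEC_PER_NODE)


print(cluster_throughput(4))  # docs/sec for a hypothetical 4-node cluster
print(nodes_needed(35))       # nodes for a hypothetical peak of 35 docs/sec
```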

Wrap up - Recommendation Highlights

• 100% reliability in production for a full year (Ontotext also manages the deployment)
• API handling 1.5m requests a day on average, up to 3m requests a day (1/3 recommendations, 1/3 logging user action, 1/3 checking whether a user has enough history to ask for behavioural recommendations)
• Roughly 200m recommendations served and 200m user actions tracked to date since go live
• 450,873 documents indexed
• No caching, since everything is effectively a personalized search request
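For a sense of scale, the daily request figures convert to per-second rates as follows. The calculation assumes requests are spread evenly over the day, which real traffic is not, so actual peaks within a day would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(requests_per_day):
    """Average requests per second, assuming an even spread over 24 hours."""
    return requests_per_day / SECONDS_PER_DAY


average_day = 1_500_000  # average daily requests, from the slide
busiest_day = 3_000_000  # maximum daily requests, from the slide

print(round(per_second(average_day), 1))      # all request types, average day
print(round(per_second(busiest_day), 1))      # all request types, busiest day
print(round(per_second(average_day / 3), 1))  # recommendation requests only
```

Even on the busiest day the average stays in the tens of requests per second, so the lack of caching matters mainly for the cost of each individual personalized query.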

Wrap up - GraphDB Highlights

• GraphDB had to comply with a set of tests designed by FT and Ontotext (OT): Network lag, Disk Space, Disk Load, Less Memory, CPU Load, etc.
• Comprehensive support for OWL and SPARQL
• Efficient inference through the entire life-cycle of the data
• High-availability cluster architecture
  – Proven and mature for more than 5 years now
  – GraphDB's first HA implementation has been working at the BBC since 2010
  – Unmatched HA tests and transaction load benchmarks
• FTS and NoSQL Connectors for seamless integration

Positive Signs from the News Industry

• Washington Post tests new ‘Knowledge Map’ feature: “Our ultimate goal is to mine big data to surface highly personalized and contextual data for both journalistic and native content.”
• New York Times R&D Lab announced an experimental project, “Editor”: 1) recognize a term that can be categorized, 2) link that entity to existing databases or microservices, 3) make this enriched information accessible to journalists
• BBC Structured Journalist Manifesto. Structured journalism: 1) on the reporter side, automation helps improve a journalist’s reporting and make it less cumbersome; 2) on the audience side, semtech helps scale things that can improve the reader’s experience

Selection of Ontotext Customers

Thanks!

We will be delighted to have a word with you after the session or later today or tomorrow!

• Dr. Georgi Georgiev, Head of Ontotext Text Analysis Unit - [email protected]
• Ilian Uzunov, Sales Director CEMEAA - [email protected]
• Nikolay Krustev, GraphDB Sales Engineer - [email protected]

