Semantic Digital Humanities Workshop 2015 @Oxford


Open, Connected & Smart Heritage: Towards New Cultural Commons

Lora Aroyo

Semantic Digital Humanities 2015, Oxford

massive amount of digital content to explore …


but at some point it all looks the same …


audiences feel disconnected & lost …


We need more of this: SMART, CONNECTED & OPEN

Johan Oomen, Lora Aroyo (2011). Crowdsourcing in the Cultural Heritage Domain: Opportunities and Challenges. http://www.iisi.de/fileadmin/IISI/upload/2011/p138_oomen.pdf

Smart: new technologies for indexing, retrieval & linking

Connected: link to the workflows of creative industries; distribution over various devices & platforms; between collections; to distributed content

Open: to users, to stimulate collaboration & creativity

“For content to be truly accessible, it needs to be where the users are, embedded in their daily networked lives.” (Wabel, 2009)

“Enabling anything like seamless access to the cultural record will require developing tools to navigate among vast catalogs of born-digital and digitized materials […] The return on this investment will be a humanities and social science cyberinfrastructure that will allow new questions to be asked, new patterns and relations to be discerned, and deep structures in language, society, and culture to be exposed and explored.”


… Digital Humanities researchers


… they often don’t find what they were searching for


“an event is the exemplification of a property by a substance at a given time” (Jaegwon Kim, 1966)

“events are changes that physical objects undergo” (Lawrence Lombard, 1981)

“events are properties of spatiotemporal regions” (David Lewis, 1986)

L. Aroyo, C. Welty: Harnessing Disagreement in Crowdsourcing Events. DeRIVE 2011 @ ISWC 2011.


typically collections are described by experts ...

“A planned public or social get together or occasion.”  

“an event is an incident that's very important or monumental”  

“An event is something occurring at a specific time and/or date to celebrate or recognize a particular occurrence.”  

“a location where something like a function is held. you could tell if something is an event if there people gathering for a purpose.”  

“Event can refer to many things such as: An observable occurrence, phenomenon or an extraordinary occurrence.”  

but the crowd talks about things in a different way ...


… and they all search & browse with some implicit relevance in mind


we need … support of multiple perspectives


How to bridge the GAP between Expert & Crowd Semantics?


a novel approach to gather a diversity of perspectives & opinions from the crowd, expand expert vocabularies with these, and gather a new type of gold standard for machines


L. Aroyo, C. Welty: Crowd Truth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. ACM WebSci 2013.

CrowdTruth.org  
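The core idea behind CrowdTruth, keeping the full spread of crowd answers instead of forcing a single "correct" label, can be illustrated with a small sketch. This is not the actual CrowdTruth metric suite; the labels and answers below are hypothetical.

```python
from collections import Counter

def label_scores(annotations):
    """For one unit (e.g. a video fragment or a sentence), return the
    fraction of workers that chose each label. Keeping the whole
    distribution preserves disagreement instead of collapsing it
    into a single majority label."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical crowd answers for one artwork: which event type is depicted?
answers = ["battle", "battle", "celebration", "battle", "procession"]
print(label_scores(answers))
# {'battle': 0.6, 'celebration': 0.2, 'procession': 0.2}
# A low maximum score flags an ambiguous unit rather than a "bad" worker.
```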

Peter Singer

we have … altruism-driven crowds



[Bar chart: “Q: Why did you tag?” survey responses (0–100%), general public vs. MMA. Answer options: to help museums document art work, for fun, to improve search for other users, to learn about art, other (please specify), so that I could find works again later, to connect with others, don't remember.]

“the wise crowd” (James Surowiecki): diversity of opinion, independent, decentralized, aggregated

3 of our Crowdsourcing Use Cases


http://www.prestoprime.org/

Use Case 1: Crowdsourcing Video Tags @ Sound and Vision

@waisda http://waisda.nl

Two Pilots

Results of First Pilot

The first 6 months:
• 44,362 pageviews
• 12,279 visits (3+ min online)
• 555 registered players (thousands of anonymous players!)
• 340,551 tags added to 602 items
• 137,421 matches


First two years (2006–2008):
• 11 participating museums
• 1,782 works of art in the research
• 36,981 tags collected
• 2,017 users who tagged

[Bar chart: “Q: Why did you tag?” survey responses (0–100%), general public vs. MMA. Answer options: to help museums document art work, for fun, to improve search for other users, to learn about art, other (please specify), so that I could find works again later, to connect with others, don't remember.]

Tags by Documentalists
• Tags describe mainly short segments
• Tags are often not very specific
• Tags do not describe programmes as a whole
• User tags were useful & specific → domain dependent

User vocabulary: 8% in professional vocabulary, 23% in Dutch lexicon, 89% found on Google

Tag categories: objects (57%), persons (31%), locations (7%); example tag: “engeland”

Riste Gligorov et al.: On the Role of User-Generated Metadata in A/V Collections. K-CAP Int. Conference on Knowledge Capture 2011.
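The vocabulary-coverage percentages above boil down to simple set overlap. A minimal sketch, with made-up tags standing in for the real Waisda? data and professional vocabulary:

```python
def coverage(tags, vocabulary):
    """Fraction of distinct user tags that also occur in a reference vocabulary."""
    tags = {t.lower() for t in tags}
    vocabulary = {v.lower() for v in vocabulary}
    return len(tags & vocabulary) / len(tags) if tags else 0.0

# Hypothetical data: user tags from the tagging game vs. a professional thesaurus.
user_tags = ["engeland", "fiets", "koningin", "voetbal"]
professional_vocabulary = ["Engeland", "Koningin"]
print(f"{coverage(user_tags, professional_vocabulary):.0%} in professional vocabulary")
# 50% in professional vocabulary (the slide reports 8% for the real data)
```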

Crowd vs. Professionals

Waisda?: Tags vs. Rest

  System                     MAP
  All user tags              0.219
  Consensus user tags only   0.143
  NCRV tags                  0.138
  NCRV catalog               0.077
  Captions                   0.157
  Captions + User tags       0.247
  Captions + NCRV catalog    0.183
  Captions + NCRV tags       0.201
  NCRV tags + User tags      0.263
  NCRV tags + NCRV catalog   0.150
  All – User tags            0.208
  All                        0.276

All tags better than consensus only
• improvement of 53%
• consensus tags have higher precision (0.59 vs. 0.49) but lower recall (0.28 vs. 0.42)

All tags better than the rest
• individually: beat NCRV tags by 69%, beat captions by 39%
• combined: improvement of 5%

All data performs best
• largely due to the contribution of user tags (33%)
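The MAP (mean average precision) scores above compare retrieval runs over the Waisda? material. As a reference for how such a score is computed, here is a minimal sketch; the ranked result lists and relevance judgments are made up for illustration, not the Waisda? evaluation data.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: mean of precision@k over the
    ranks k at which a relevant item is retrieved."""
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """MAP: average of the per-query average precisions."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two hypothetical queries: (ranked results, set of relevant items)
runs = [(["v3", "v1", "v7"], {"v1", "v7"}),
        (["v2", "v5", "v9"], {"v9"})]
print(round(mean_average_precision(runs), 3))  # 0.458
```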

Current Pilot

Accurator: ask the right crowd, enrich your collection

http://annotate.accurator.nl

Use Case 2: Crowdsourcing & Nichesourcing @ Rijksmuseum

The Rijksmuseum Amsterdam collection contains over 1 million artworks; only a small fraction, about 8,000 items, is currently on display. The online collection keeps growing: 125,000 artworks are already available, and another 40,000 are added every year.

The expertise of museum professionals lies in describing & annotating the collection with art-historical information, e.g. when works were created, by whom, etc. Detailed information about depicted objects, e.g. which species an animal or plant belongs to, is in most cases not available: a work may be annotated only with “bird with blue head near branch with red leaf”, while the species of the bird and the plant are missing.

By involving people from outside the museum in the annotation process, we support museum professionals in their annotation task: crowdsourcing is used to get more annotations, and nichesourcing, i.e. niches of people with the right expertise, to add more specific information. Sources like Twitter are used to find experts or groups of experts in certain areas, e.g. bird lovers, ornithologists or people who enjoy bird-watching in their spare time.

The platform lets users enter tags as (1) structured vocabulary terms or (2) free text.

http://annotate.accurator.nl

For tasks that are too difficult: a game in which players can carry out an expert annotation task with some assistance. To evaluate the correctness of annotations: they are reviewed & rated by other experts.

BIRDWATCHING RIJKSMUSEUM
Sunday October 4, 10.00 am - 14.00 pm, Cuypers Library, Rijksmuseum

On World Animal Day, the Rijksmuseum will host a birdwatching day in collaboration with Naturalis Biodiversity Center, Wikimedia Netherlands and the COMMIT/ SEALINCMedia project.

We are looking for bird watchers to join an expedition through the digital collections and help the museums identify bird species in works of art.

dive.beeldengeluid.nl  

Use Case 3: Event-centric Exploration in Digital Hermeneutics
Sound & Vision and Royal Library

dive.beeldengeluid.nl  

3rd Prize at the Semantic Web Challenge 2014

OPENIMAGES.EU
• 3,000 videos
• NL Institute for Sound & Vision
• mostly news broadcasts

DELPHER.NL
• 1.5 million scans of radio bulletins (hand annotated)
• 1937–1984

Simple Event Model (SEM), Open Annotation (OA) & SKOS

DIVE:MEDIA OBJECT, SEM:EVENT, SEM:PLACE, SEM:TIME, SEM:ACTOR, SKOS:CONCEPT, OA:ANNOTATION
• links to Europeana (multilingual)
• links to DBpedia
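As an illustration of how these building blocks fit together, here is a sketch only: the dive: namespace, resource identifiers and example values are hypothetical, while the SEM, Open Annotation, SKOS and DBpedia URIs are the published ones. A media object is tied via an annotation to a SEM event with place, actor and time, linked out to DBpedia.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS

SEM = Namespace("http://semanticweb.cs.vu.nl/2009/11/sem/")   # Simple Event Model
OA = Namespace("http://www.w3.org/ns/oa#")                    # Open Annotation
DIVE = Namespace("http://example.org/dive/")                  # hypothetical namespace

g = Graph()
for prefix, ns in [("sem", SEM), ("oa", OA), ("skos", SKOS), ("dive", DIVE)]:
    g.bind(prefix, ns)

media = DIVE["media/openbeelden-12345"]   # hypothetical media object (e.g. a newsreel)
event = DIVE["event/flood-1953"]          # hypothetical event resource
annotation = DIVE["annotation/1"]

g.add((media, RDF.type, DIVE.MediaObject))

g.add((event, RDF.type, SEM.Event))
g.add((event, SEM.hasPlace, URIRef("http://dbpedia.org/resource/Zeeland")))
g.add((event, SEM.hasActor, DIVE["actor/red-cross"]))
g.add((event, SEM.hasTimeStamp, Literal("1953-02-01")))
g.add((event, SKOS.related, URIRef("http://dbpedia.org/resource/Flood")))

# Open Annotation ties the event (body) to the media object (target)
g.add((annotation, RDF.type, OA.Annotation))
g.add((annotation, OA.hasBody, event))
g.add((annotation, OA.hasTarget, media))

print(g.serialize(format="turtle"))
```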

Digital Submarine UI

Infinity of Exploration

Events Linking Objects

Crowd Bringing the Human Perspectives

Linked (Open) Data

Entity & Event Extraction with CrowdTruth.org

• Entity extraction
• Events crowdsourcing and linking to concepts through CrowdTruth.org
• Segmentation & keyframes
• Linking events and concepts to keyframes

Erp, M. van; Oomen, J.; Segers, R.; Akker, C. van de; Aroyo, L.; Jacobs, G.; Legêne, S.; Meij, L. van der; Ossenbruggen, J.R. van; Schreiber, G.: Automatic Heritage Metadata Enrichment with Historic Events. Museums and the Web 2011. http://www.museumsandtheweb.com/mw2011/papers/automatic_heritage_metadata_enrichment_with_hi
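Read as a processing pipeline, the steps above could be sketched roughly as follows; the function names, inputs and return shapes are hypothetical placeholders, not the DIVE implementation.

```python
def extract_entities(text):
    """Named-entity extraction over textual metadata (placeholder)."""
    return [{"surface": "Zeeland", "type": "Place"}]

def crowdsource_events(entities):
    """Ask the crowd (e.g. via CrowdTruth-style tasks) which events the
    entities describe and link them to concepts (placeholder)."""
    return [{"event": "flood", "concepts": ["http://dbpedia.org/resource/Flood"]}]

def segment_keyframes(video_path):
    """Segment the video and pick representative keyframes (placeholder)."""
    return [{"start": 0.0, "end": 12.5, "keyframe": "frame_0001.jpg"}]

def link_to_keyframes(events, segments):
    """Attach events and concepts to the keyframes of the segments they occur in."""
    return [{"segment": s, "events": events} for s in segments]

def enrich(video_path, metadata_text):
    entities = extract_entities(metadata_text)
    events = crowdsource_events(entities)
    segments = segment_keyframes(video_path)
    return link_to_keyframes(events, segments)

print(enrich("openbeelden_12345.mp4", "Overstroming in Zeeland, 1953"))
```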


engaging users through event narratives

DIVE implements Digital Hermeneutics:
• a theory of interpretation of information
• bringing people and technology together to explore:
  – how to model and represent information
  – how to provide engaging interaction
  – how to support interpretation

“Digital Hermeneutics: Agora and the online understanding of cultural heritage”. In Proceedings of the Web Science Conference (ACM: New York, 2011).


Chiel van den Akker, Marieke van Erp, Lora Aroyo, Ardjan van Nuland, Lourens van der Meij, Susan Legêne, and Guus Schreiber (2013). Evaluating Cultural Heritage Access on the Web: From Information Delivery to Interpretation Support (WebSci'13).

Information: Museums & Archives as Inventories of the World

André  Malraux,  The  Imaginary  Museum  of  World  Sculpture,  1953    


Interpretation: Museums & Archives as a Place to Engage with the World

Acknowledgements  


PrestoPrime Team: Lora Aroyo, Riste Gligorov, Lotte Belice Baltussen, Maarten Brinkerink, Johan Oomen, Jacco van Ossenbruggen, Michiel Hildebrand

http://prestoprime.eu

SealincMedia Team: Alessandro Bozzon, Geert-Jan Houben, Lora Aroyo, Lizzy Jongma, Guus Schreiber, Chris Dijkshoorn, Jasper Oosterman, Jacco van Ossenbruggen, Archana Nottamkandath, Myriam Traub

http://sealinc.ops.few.vu.nl/invenit/

DIVE  Team:  Victor  de  Boer,  Oana  Inel,  Lora  Aroyo,  Johan  Oomen,  Elco  Van  Staveren,  Werner  Helmich  &  Dennis  De  Beurs  

dive.beeldengeluid.nl

Agora Team: Lora Aroyo, Guus Schreiber, Lourens van der Meij, Marieke van Erp, Chiel van den Akker, Susan Legêne, Geertje Jacobs, Johan Oomen

agora.cs.vu.nl

CrowdTruth Team: Lora Aroyo, Chris Welty, Robert-Jan Sips, Carlos Martinez Ortiz, Anca Dumitrache, Oana Inel, Benjamin Timmermans, Susanna van de Ven, Merel van Empel, Jelle v.d. Ploeg, Tatiana Cristea, Khalid Khamkham, Harriëtte Smook, Rens van Honschooten, Arne Rutjes

CrowdTruth.org github.com/CrowdTruth

Links  

On the Web
• http://waisda.nl
• http://prestoprime.org
• http://agora.cs.vu.nl
• http://sealincmedia.wordpress.com
• http://dive.beeldengeluid.nl
• http://crowdtruth.org
• http://game.crowdtruth.org
• http://wm.cs.vu.nl

 

On Twitter: @waisda @agora-project @sealincmedia @prestocenter @vistatv #CrowdTruth

 


THANK YOU!

http://lora-aroyo.org | http://slideshare.net/laroyo | @laroyo