Bringing Europeana and CLARIN together: Dissemination and exploitation of cultural heritage data in a research infrastructure. Twan Goosen (1) (CLARIN ERIC), Nuno Freire (2), Clemens Neudecker (3), Maria Eskevich (1). (1) CLARIN ERIC; (2) Europeana / INESC-ID; (3) Berlin State Library / Europeana Newspapers. Digital Infrastructures for research (DI4R) 2017, Brussels, BE, 30 November 2017
Transcript
Page 1

Bringing Europeana and CLARIN together:
Dissemination and exploitation of cultural heritage data in a research infrastructure

Twan Goosen (1) (CLARIN ERIC), Nuno Freire (2), Clemens Neudecker (3), Maria Eskevich (1)

(1) CLARIN ERIC; (2) Europeana / INESC-ID; (3) Berlin State Library / Europeana Newspapers

Digital Infrastructures for research (DI4R) 2017

Brussels, BE

30 November 2017

Page 2

Europeana in six bullets

• Europeana is the European digital platform for cultural heritage that
• seeks to enable users to search and access knowledge in all the languages of Europe, either directly via its web portals, or indirectly via third-party applications leveraging its data service
• Europeana enables people to explore the digital resources of Europe's galleries, museums, libraries, archives and audiovisual collections
• working with partners and allies to develop frameworks, standards, strategy and policy relevant to digital cultural heritage, and to raise funds
• providing digital expertise and platforms for bringing cultural heritage to wider audiences
• championing the use of digitised cultural heritage in education, research and the creative industries through partnerships and international engagement campaigns

Page 3

CLARIN in seven bullets

• CLARIN is the Common Language Resources and Technology Infrastructure
• ESFRI ERIC status since 2012, Landmark since 2016
• that provides easy and sustainable access for scholars in the humanities and social sciences and beyond
• to digital language data (in written, spoken, video or multimodal form)
• and advanced tools to discover, explore, exploit, annotate, analyse or combine them, wherever they are located
• through a single sign-on online environment
• and that serves as an ecosystem for knowledge sharing

Page 4

CLARIN ERIC in members and centres

A consortium of:
• 19 members: AT, BG, CZ, DE, DK, DLU, EE, FI, GR, HU, IT, LT, LV, NL, NO, PL, PT, SE, SI
• 2 observers: FR, UK
• >40 centres

Page 5

CLARIN & Europeana partnership in context of DSI

Digital Service Infrastructure (DSI): Creation of a complete, cohesive and integrated Digital Service Infrastructure

• DSI (01.2015 – 06.2016):
  – European Research Distribution Plan
  – Assessment of relevant datasets available from The European Library (TEL)
• DSI-2 (07.2016 – 08.2017):
  – Improvement of data quality and implementation of quality frameworks to improve metadata quality
  – Integration of Europeana data into CLARIN infrastructure
• DSI-3 (09.2017 – 08.2018):
  – Fostering content supply by optimising Europeana data and aggregation infrastructure
  – Improving (meta-)data and content quality
  – Fostering reuse of digital cultural heritage resources by improving content distribution mechanisms
  – Maintaining an international interoperable licensing framework

Page 6

Steps towards CLARIN & Europeana interoperability

1) Incorporate Europeana metadata in the VLO

2) Opening up full-text resources, such as those from Europeana Newspapers, through CLARIN's federated content search mechanism (FCS)

3) Exploiting CLARIN's communication channels to increase the awareness of Europeana within the community

4) Measure impact of the dissemination of Europeana data

Page 7

Metadata: access to cultural heritage

Europeana:
• Aggregation and exploitation of (meta)data about digitised objects from very different contexts.
• Europeana Data Model (EDM) as its model for interoperability of metadata, in line with the vision of linked open vocabularies

CLARIN:
• Aggregation of metadata from resource providers (CLARIN centres and selected "external" parties)
• Virtual Language Observatory (VLO) provides a uniform experience and consistent workflow.
• Language Resource Switchboard (LRS) allows researchers to invoke tools with the selected resources directly from its user interface.

Challenge: CLARIN and Europeana do not share a common metadata model

Page 8

The CLARIN data architecture: repositories

[Diagram: a repository at a CLARIN centre holds language data and language tools, together with the metadata that describes them.]

• Language data: single text or recording, corpus, lexicon, wordnet, grammar, …
• Language tools: web application, web service, web service pipeline, stand-alone application, …

Page 9

The CLARIN data architecture: harvesting

[Diagram: the metadata held by each repository, alongside its language data and language tools, is copied into a shared pool of harvested metadata.]

Page 10

The CLARIN data architecture: processing

Page 11

The CLARIN data architecture: content search

[Diagram: a (Federated) Content Search component sits between the user and the repositories, each of which holds language data, metadata and language tools.]

(1) enter query
(2) perform local search
(3) retrieve results
(4) show aggregated results
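
The four numbered steps of the content search diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CLARIN's FCS implementation: the `Endpoint` class, its in-memory `documents` and the naive substring match are hypothetical stand-ins for the search endpoints of remote centres.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical stand-in for the search endpoint of one centre."""
    name: str
    documents: list[str]

    def search(self, query: str) -> list[str]:
        # (2) perform local search: naive case-insensitive substring match
        q = query.lower()
        return [doc for doc in self.documents if q in doc.lower()]


def federated_search(query: str, endpoints: list[Endpoint]) -> list[tuple[str, str]]:
    """Send one query to every endpoint and merge the hits."""
    # (1) the entered query is fanned out to all endpoints in parallel
    with ThreadPoolExecutor() as pool:
        per_endpoint = list(pool.map(lambda ep: ep.search(query), endpoints))
    # (3) retrieve the results and (4) aggregate them, tagged with their origin
    return [(ep.name, hit) for ep, hits in zip(endpoints, per_endpoint) for hit in hits]


endpoints = [
    Endpoint("centre-a", ["Die Zeitung von gestern", "Ein Brief"]),
    Endpoint("centre-b", ["Zeitung und Politik", "A newspaper archive"]),
]
results = federated_search("zeitung", endpoints)
```

The point of the design is that the data never leaves the centres: only the query travels out and only the hits travel back.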

Page 12

The CLARIN data architecture: workflows

[Diagram: Web Service Pipelines operate on the language data, metadata and language tools held by the repositories.]

(1) select input data
(2) construct pipeline
(3) execute
(4) use/analyse output data
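
The workflow steps (select input data, construct pipeline, execute, use the output) map naturally onto function composition. The sketch below assumes that each web service can be modelled as a function from text to text; the three step functions are hypothetical placeholders for remote language-processing services, not actual CLARIN tools.

```python
from functools import reduce
from typing import Callable

Step = Callable[[str], str]


def construct_pipeline(*steps: Step) -> Step:
    """(2) chain the steps so that each one consumes the previous output."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


# Hypothetical stand-ins for remote language-processing web services
def normalise_whitespace(text: str) -> str:
    return " ".join(text.split())


def lowercase(text: str) -> str:
    return text.lower()


def tokenise(text: str) -> str:
    # one token per line, a simple plain-text exchange format
    return "\n".join(text.split(" "))


input_data = "  Bringing   Europeana and CLARIN together "  # (1) select input data
pipeline = construct_pipeline(normalise_whitespace, lowercase, tokenise)
output = pipeline(input_data)  # (3) execute
tokens = output.split("\n")    # (4) use/analyse the output data
```

Chaining only works if every step accepts what the previous one emits, which is why agreed exchange formats matter for such pipelines.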

Page 13

Interoperability is key

• to the exchange of metadata
• to the exchange formats for the output of analytic tools
• to the options for supporting comparative research

Page 14

CLARIN & Europeana interoperability highlights

• CLARIN's ingestion pipeline (Open Archives Initiative Protocol for Metadata Harvesting, OAI-PMH) was extended to retrieve a set of selected collections from Europeana and apply the conversion in the process.
• Several infrastructure components had to be adapted to accommodate the significant increase in the amount of data to be handled and stored.
  – Current status:
    • 775 Europeana datasets (e.g. Newspapers) now found in the VLO
    • 10K are technically suitable for processing with the LRS
  – Goal:
    • More records in the foreseeable future
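
For readers unfamiliar with OAI-PMH, the sketch below shows the basic harvesting loop the protocol defines: a `ListRecords` request, followed by further requests that carry only the `resumptionToken` returned with the previous page. The base URL, the `edm` metadata prefix and the `newspapers` set name are assumptions made for illustration, and the response is a hand-written sample so that the code runs without network access; it does not reproduce CLARIN's harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, set_spec=None, token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if token:
        # a resumption token is an exclusive argument: nothing else besides the verb
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hand-written sample response (hypothetical identifiers and token)
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://example.org/oai", "edm", "newspapers")
ids, token = parse_page(sample)
next_url = list_records_url("https://example.org/oai", token=token)
```

A harvester repeats the last two lines until a page comes back with an empty or missing token, which marks the end of the list.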


Page 15

Metadata retrieval and conversion: OAI-PMH protocol

• Europeana:
  – EDM-structured Europeana metadata as RDF/XML documents
• CLARIN:
  – The harvester performs the conversion by applying an XSLT stylesheet that turns the RDF/XML metadata documents into Component Metadata (CMD)
  – Creation of a CMD profile for EDM in the CMDI Component Registry
  – Implementation of an XSLT stylesheet that produces instances of the corresponding schema on the basis of the EDM records
  – Properties are defined as CMD elements in the order in which they appear in the EDM specification, while object order is based on relevance
  – Concept links are assigned to most components and elements
  – Implemented conversion stylesheet: the header information and resource proxies (entities representing external documents) in the resulting record are produced on the basis of a list of static XPaths in the original document
  – The record's payload is produced mostly by means of a straightforward crosswalk in which the properties in the document are mapped to CMD components or elements of an equivalent name
• Test harvest of 11 selected metadata sets:
  – A total of 3.2 million schema-valid records successfully retrieved and converted
  – A full harvest and import of a sample of this size takes roughly 48 hours
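
The "crosswalk to elements of an equivalent name" can be illustrated as follows. The slides describe an XSLT stylesheet; this sketch swaps in plain Python for readability and makes several assumptions: the sample record is hand-written, and the output tree (`Components` with one child per entity) is a simplified stand-in for the actual CMD profile, whose structure is not given in the slides.

```python
import xml.etree.ElementTree as ET

# Hand-written sample record using real EDM/Dublin Core namespaces
sample_edm = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ore="http://www.openarchives.org/ore/terms/">
  <ore:Proxy rdf:about="http://example.org/proxy/1">
    <dc:title>Berliner Tageblatt</dc:title>
    <dc:language>de</dc:language>
  </ore:Proxy>
</rdf:RDF>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def crosswalk(rdf_xml):
    """Map every property of every entity to an element of the same local name."""
    source = ET.fromstring(rdf_xml)
    components = ET.Element("Components")  # simplified, not the real CMD schema
    for entity in source:
        component = ET.SubElement(components, local_name(entity.tag))
        for prop in entity:
            # 'equivalent name' mapping: dc:title -> title, dc:language -> language
            ET.SubElement(component, local_name(prop.tag)).text = prop.text
    return components


cmd = crosswalk(sample_edm)
converted = ET.tostring(cmd, encoding="unicode")
```

Because the mapping is name-based, it needs no per-property rules; only the header and resource proxies require the explicit XPath handling mentioned in the list.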


Page 16

Processing pipeline issues

• General lack of technical information available in the provided EDM (e.g. the media type for linked resources)
• Direct links to machine-processable resources are commonly missing
• Limited functionality provided by the tools that are connected to the LRS (e.g. language variability, resource types, accessibility)
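
Taken together, these issues act as a filter on which records can be handed to a tool at all. The sketch below makes that filter explicit; the record fields (`resource_url`, `media_type`) and the set of processable media types are hypothetical and chosen only for illustration, not taken from the VLO or the LRS.

```python
# Hypothetical set of media types that the connected tools can process
PROCESSABLE_TYPES = {"text/plain", "text/xml", "application/pdf"}


def unsuitable_reason(record):
    """Return why a record cannot be passed to a tool, or None if it can."""
    if not record.get("resource_url"):
        return "no direct link to a machine-processable resource"
    media_type = record.get("media_type")
    if media_type is None:
        return "media type of the linked resource is unknown"
    if media_type not in PROCESSABLE_TYPES:
        return f"no tool accepts {media_type}"
    return None


records = [
    {"id": "r1", "resource_url": "https://example.org/r1.txt", "media_type": "text/plain"},
    {"id": "r2", "resource_url": None, "media_type": "text/plain"},
    {"id": "r3", "resource_url": "https://example.org/r3", "media_type": None},
    {"id": "r4", "resource_url": "https://example.org/r4.jpg", "media_type": "image/jpeg"},
]
report = {r["id"]: unsuitable_reason(r) for r in records}
suitable = [rid for rid, reason in report.items() if reason is None]
```

Reporting a reason per record, rather than a bare yes/no, shows whether the gap lies in the source metadata or in the tool coverage.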


Page 17

Get in touch

[email protected]

https://www.europeana.eu
https://pro.europeana.eu
https://pro.europeana.eu/project/europeana-dsi-3

