Metadata in Research Information Introduction: “Tour de table” Ed Simons, President of euroCRIS

Date post: 12-Jan-2016
Page 1: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

Metadata in Research Information

Introduction: “Tour de table”

Ed Simons, President of euroCRIS

Page 2

Structure of the Presentation

• Introduction of the Speaker

• “Tour de Table”: scope of this introduction.

• Nature and importance of research metadata.

• Some challenges to meet regarding the realization of optimal solutions for Research Information Metadata.

• Conclusions.

Page 3

Introduction of the Speaker

• Working at Radboud University, NL: Central “Concern Information Management” Department.

• Former Head of this Department and for some years now working as “International IT-project Manager” (especially IT projects in developing countries).

• Initiator and project leader of the Dutch Research Information System (CRIS) “METIS”.

• Development of the Dutch CRIS started as early as 1992, with an interuniversity task group defining a data model for research information called “Combi-format”, a kind of CERIF “avant-la-lettre”.

• First version of METIS implemented in 1993: one of the first fully-fledged CRIS systems in Europe.

Page 4

Radboud University, Nijmegen, NL.

Page 5

Radboud University, Nijmegen, NL

Nijmegen

• Oldest city in the Netherlands: celebrated its 2000th anniversary in 2005.

• Regiment city of the Roman Empire, located in the East of the country on the banks of the Rhine (which marked the Northern border of the Roman Empire), where the river enters the Netherlands from Germany.

• Home of the biggest walking event, and since 2012 officially even the biggest sports event, in the world: the “4-days Marches” (held every year in July), with some 45.000 participants from all over the world.

Page 6

Radboud University, Nijmegen, NL

• Founded in 1923 as “Catholic University Nijmegen”, name changed some 10 years ago.

• About 19.000 students and 10.000 staff (5.000 f.t.e.), of whom about half are academic.

• Middle-sized university according to Dutch standards: 5th or 6th out of 13 universities.

• All faculties, including an academic hospital, except engineering.

• Strong in research:

  • Brain research.

  • Physics: home of a “High Field Magnet”, one of the most powerful in the world, which serves as a national research equipment facility.

  • (Psycho)-linguistics: the Max Planck Institute for Psycholinguistics, a prominent player in the RDA-community, is located on campus.

Page 7

Radboud University

Andre Geim and Konstantin Novoselov (University of Manchester)

Both worked at Radboud University for some years, where they started the research that led to the development of the material “graphene”, for which they received the Nobel Prize. Novoselov got his Ph.D. from Radboud and Geim is still a Visiting Professor.

Page 8

“Tour de Table”: Scope of the introduction

Presenting a “Tour de Table” involves talking about:

• What is on the table.

• Who is sitting around the table.

Applied to our subject of “Research Metadata”:

• What issues are at stake concerning Research Metadata.

• Who are the key players involved.

Focus will be on the first aspect, key players involved will come up “automatically” while dealing with these core issues.

Page 9

“Tour de Table”

• A “Tour de Table” by its nature can only be global or general. To use the food metaphor: it deals with the dishes on the table, but does not go into a detailed treatment of the ingredients of the various dishes or the cooking technique used to produce the food.

• So the presentation will be one from a “Bird’s eye” view, and more particularly of course the view of a “euroCRIS bird”.

• Focus not so much on technology, but more on non-technical aspects and issues that are of importance when it comes to the realization of optimal solutions concerning Research Metadata.

• (Obviously) focus on “domain-related” metadata, i.e. metadata about the subject matter (research information) itself, as distinguished from domain-agnostic, formal metadata (administrative, technical and rights metadata, etc.).

• Guiding or underlying question: what major challenges exist in the Research Information Domain, that should be dealt with in order to be able to create optimal, sustainable solutions regarding research metadata?

Page 10

Why is Research Metadata important?

As subject for this euroCRIS Seminar?

Research Metadata is

“The Bread we Bake”

The development of a research metadata model (CERIF) is a core activity of euroCRIS and at the heart of our Mission.

Page 11

Why is Research Metadata important?

As such for (the field of) Research Information?

“Carefully crafted metadata results in the best information management —and the best end-user access— in both the short and

the long term.”

“Quality metadata creation is just as important as the care, preservation, display, and dissemination of collections; adequate planning and resources must be devoted to this ongoing, mission-critical activity”

From: Tony Gill et al., Introduction to Metadata, Paul Getty Institute: http://www.getty.edu/research/publications/electronic_publications/intrometadata/index.html

Page 12

What are Metadata?

“Data about data”

In a way an appealing definition:

• Short and simple to remember.

• Basically says “it all” about what metadata is.

• Can be read in both directions.

• Complies with the (popular) triple structure.

But it already requires a certain familiarity with the matter to really grasp the full meaning.

Page 13

What are Metadata?

Various definitions of (domain-related) metadata exist, such as:

“Metadata is structured data which describes the characteristics of a resource.”
An Introduction to Metadata by Chris Taylor, University of Queensland

“Metadata are structured, encoded data that describe characteristics of information bearing entities to aid in the identification, discovery, assessment, and management of the described entities”
Ma, J. Managing metadata for digital projects

“Metadata are data identifying, describing or characterizing information objects or resources, and their relations, aimed at supporting discovery, access to and use of these information objects/resources”
Ed Simons

Page 14

What are Metadata?

From a more pragmatic point of view:

• Research metadata: “Data about research”, more specifically:

• “Data about objects or resources, and their relations, in the research information domain”

Research Information objects/resources:

• Individual actors (persons): researchers, managers, policy makers, auditors…

• Organizational units: research institutions, funding organisations, publishers, …

• Activities: projects ...

• Input: f.t.e., money

• Output: publications, patents, other products

• Equipment used

• Services used / produced

• Datasets used / produced

• Metrics and indicators

• ….

So research metadata are data about these information objects/resources (data that identify, describe and characterize them) and about the relations between the objects/resources.

Page 15

What are Metadata for?

“metadata results in the best information management —and the best end-user access”

“… aid in the identification, discovery, assessment, and management…”

“... aimed at supporting discovery, access to and use...”

Page 16

What are Metadata for?

So metadata are of help or necessary for:

• Finding and (the ability to) access or obtain research information. This includes the aspects:

  • Discovery of research information objects/resources.

  • Provenance, administrative and technical data (e.g. format or type) of the information object.

  • Conditions of (re)-use.

  • User rights and security.

  • ...

• The management and use of research information (use cases). This regards the aspects:

  • Research policy (formulation, execution and evaluation, on various levels)

  • Planning of research

  • Management of research (steering, monitoring)

  • Performance measurement

  • Impact measurement

  • Presentation and communication of (information on) research (to/by various stakeholders)

  • ...

“Carefully crafted metadata” are needed for an optimal implementation or execution of these aspects.

Page 17

Metadata requirements.

In order to optimally support the activities mentioned, metadata must be:

• Complete

• Correct

• Up to date

• Accurate

• Unambiguous

• Detailed (enough)

• Reliable

• Secure

• Sustainable
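
Some of these requirements can be checked mechanically. The following is a minimal sketch, not an actual CRIS feature: it tests only “complete” and “up to date”, and the field names, the `last_checked` date and the 365-day threshold are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical required fields for a publication record (illustration only).
REQUIRED_FIELDS = ("title", "authors", "year", "last_checked")


def check_record(record, today, max_age_days=365):
    """Return a list of human-readable problems found in one metadata record."""
    problems = []
    # Complete: every required field is present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing field: {field}")
    # Up to date: the record was verified recently enough.
    last_checked = record.get("last_checked")
    if last_checked and today - last_checked > timedelta(days=max_age_days):
        problems.append("record not verified within the allowed period")
    return problems


record = {
    "title": "An example publication",
    "authors": [],  # an empty list counts as missing
    "year": 2013,
    "last_checked": date(2011, 1, 1),
}
issues = check_record(record, today=date(2013, 9, 1))
print(issues)
```

Requirements such as “correct”, “unambiguous” or “reliable” cannot be verified this simply; they depend on agreed definitions and on the registration process itself.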

Page 18

Types (typologies) of Metadata.

Various typologies or classifications of metadata exist, e.g. the distinction:

• Descriptive or content-related metadata: describing/characterizing the intellectual content.

• Administrative, technical metadata: e.g. file formats.

• Rights metadata: regulating authorization, permissions.

• Provenance metadata: creation, subsequent versioning or treatment.

• Structural metadata: e.g. internal structure of items, page order…

• Context metadata: relating information objects to their “environment” or context, e.g. projects to funders and institutions; publications to authors, publishers, reviewers, etc.

• Usage metadata: information about the use of the information object (number of downloads, requests, ...).

Another one (by Keith Jeffery):

• Schema metadata: controlling the integrity of the described data.

• Navigational metadata: the access path to the data.

• Associative metadata:

  • Descriptive: content-related, context

  • Restrictive: rights, authorization

  • Supportive: conditions of use, constraints, …

Page 19

Metadata models and formats

In order to be put to use in information systems or applications, metadata are described and organized in metadata models. A working definition of a metadata model could be:

A structured set of concepts that define the information objects in a given business domain, their identification, properties and relations as well as their meaning within the context of the model itself, including possible constraints that may exist regarding (values, use of) elements of the model.

So a metadata model concerns the information objects themselves as well as their metadata as such. E.g. the concept “Person” (information object) is part of a metadata model, but in itself is no metadata.

It further defines the information objects in terms of the model, e.g.:

• “A Person is an individual actor within the research domain”.

• “Person is a first level entity” in the CERIF model.

Page 20

Metadata models and formats

A genuine metadata model is structured, meaning it has a certain architecture involving the relations between the elements of the model (analogous, e.g., to a “domain model” in software development).

A metadata model may be implemented in a system (e.g. a relational database) and expressed in (one or more) formats (e.g. CERIF-XML).

Examples: CERIF, DCMES, MODS, MARC21, VIVO-ontology, etc…

Page 21

The CERIF Metadata Model

Page 22

The CERIF Metadata Model

• Broad coverage (aspects) of research information: metadata on researchers, projects, organizations, output (publications and other products), input (f.t.e, money), funding, equipment, services, metrics, impact, dataset metadata (as from version 1.6) and the (inter)relations between these elements.

• Detailed: highly normalized.

• Well thought-out architecture based on an optimal use of the relational model, with as its basic principle: properties and semantics of the information objects and their relations are expressed by means of time-stamped links (linking entities) instead of as attributes of the entities. This makes the model extremely flexible and scalable, since any number of links can exist between information objects in the model.
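
The linking principle can be illustrated with a small sketch. This is not the CERIF schema itself: the class `Link`, its field names and the identifiers are hypothetical simplifications, chosen only to show how a role is carried by a time-stamped link rather than by an attribute of the person or the project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Link:
    """A time-stamped link between two entities; the role lives on the link."""
    source: str   # e.g. a person identifier
    target: str   # e.g. a project or organisational unit identifier
    role: str     # semantics of the relation
    start: date
    end: date


# Any number of links may exist between the same two entities.
links = [
    Link("person:1", "project:A", "coordinator", date(2010, 1, 1), date(2012, 12, 31)),
    Link("person:1", "project:A", "investigator", date(2010, 1, 1), date(2013, 12, 31)),
    Link("person:1", "orgunit:X", "researcher", date(2008, 9, 1), date(2013, 12, 31)),
]


def roles_on(links, source, target, day):
    """Roles that `source` holds towards `target` on a given day."""
    return sorted(
        link.role
        for link in links
        if link.source == source and link.target == target
        and link.start <= day <= link.end
    )


roles_2011 = roles_on(links, "person:1", "project:A", date(2011, 6, 1))
roles_2013 = roles_on(links, "person:1", "project:A", date(2013, 6, 1))
print(roles_2011, roles_2013)
```

Adding a new role or a new period means adding a row to the link collection; the entities themselves do not change.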

Page 23

The CERIF Metadata Model

Some examples:

• A researcher can have various roles at the same time in a project, or various affiliations to an organisational unit. Even various roles from various typologies or in various languages.

• Any number of classifications can be used for the same publication (various controlled vocabularies).

• The same principle of linking entities can be used to map controlled vocabularies to one another.

• And: various levels of granularity can be expressed and registered for the same kind of metadata. E.g. the role of a researcher concerning a publication can be expressed both according to the coarse-grained DC (creator, contributor) and by means of a more fine-grained classification (1st author, author, editor, editor-in-chief, reviewer, executer of the experiment, etc.).
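
As a sketch of the last two examples, the snippet below keeps fine-grained roles and derives the coarse DC roles from them through a vocabulary mapping. The mapping itself (which fine-grained role counts as creator or contributor) and the tuple layout are assumptions for illustration, not part of CERIF or Dublin Core.

```python
# Hypothetical mapping from a fine-grained role vocabulary to the two
# coarse Dublin Core roles mentioned above (creator, contributor).
FINE_TO_DC = {
    "1st author": "creator",
    "author": "creator",
    "editor": "contributor",
    "editor-in-chief": "contributor",
    "reviewer": "contributor",
}


def add_dc_roles(links):
    """Return the links plus one derived DC-level link per mapped fine-grained link.

    Each link is a tuple: (person, publication, vocabulary, role).
    """
    result = list(links)
    for person, publication, vocabulary, role in links:
        if vocabulary == "fine" and role in FINE_TO_DC:
            derived = (person, publication, "dc", FINE_TO_DC[role])
            if derived not in result:
                result.append(derived)
    return result


fine_links = [
    ("person:1", "publication:1", "fine", "1st author"),
    ("person:2", "publication:1", "fine", "reviewer"),
]
all_links = add_dc_roles(fine_links)
for link in all_links:
    print(link)
```

Both levels of granularity end up registered side by side for the same person and publication, so a consumer can pick the vocabulary it understands.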

Page 24

The CERIF Metadata Model

[Figure: CERIF base entities (Person, OrganisationUnit, Project, ResultPublication), the link entities between them (Person_ResultPublication, Person_Project, OrganisationUnit_ResultPublication, Project_ResultPublication, Project_OrganisationUnit, Person_OrganisationUnit) and example role values carried by the links: author, author1, reviewer; coordinator, manager; CEO, researcher, project-manager; deliverable1.2, journal article, public report; author1-institute, editor; funder, investigator, member.]

Page 25

The CERIF Metadata Model

The broad coverage and appropriate architecture just mentioned make CERIF-CRIS a powerful interoperability instrument or “engine” and a perfect candidate as “one stop registration store” of research metadata.

Theoretically one could say that “all you need” is registration of your research metadata in a CERIF-CRIS.

But practice is different: CERIF is not alone in the world and still is unknown to a lot of stakeholders in the international research information domain.

And various other valuable applications and developments exist that “will be there to stay” and with which the CERIF-CRIS community has to live together.

This brings us to the challenges ahead of us on our way to more optimal solutions concerning research information and its metadata.

Page 26

Challenges to reach metadata “Nirvana”

• The “Tower of Babel” syndrome

• A multi-cultural world

  • A tale of two ecosystems

  • Transatlantic differences

  • The “Success of the Web Paradox”

• The human factor

  • The imperfect being

  • It’s all about time and money

  • The imperfect organization

• The (big) data deluge

Page 28

The Tower of Babel Syndrome

The Tower of Babel.

• A Biblical story (Genesis) about the survivors of the Great Flood who wanted to build a Tower that would (by means of which one could) reach heaven.

• This angered God, who confounded their languages so that they could not understand and communicate with each other anymore. As a result their project collapsed and the tower was left unfinished.

Two “all time lessons” to be drawn from this:

• It’s difficult to reach optimal solutions if you “speak” in different languages, formats, models, etc.

• Communication and cooperation between the various parties or stakeholders involved is needed for an optimal result.

Page 29

The Tower of Babel Syndrome

The Tower of Babel metaphor may well be applicable to the field of research information metadata.

Strolling around for a while in the research metadata domain, you may encounter the following kind of experience.

Page 30

The Tower of Babel Syndrome

[Figure: cloud of metadata models, formats and vocabularies: DCMES, MODS, CERIF, PRISM, DA|RA, MARC21, TEI, CKAN, DCAT, EGMS, DDI, PREMIS, VIVO, CIF, NEXUS, MAGE, GBIF, GILS, Darwin Core, journalpublishing3, ROADS/IAFA, SOIF, FGDC, IAD, ISAD(G), SPECTRUM, ITIS, HIVE, LCNAF, LCS, MESH, NBII, TGN, UBio, INSPIRE.]

A plethora of metadata models and formats exists within the research information domain, both concerning “generic” aspects (i.e. metadata applicable to all disciplines) as well as “discipline- or subject-specific” metadata (controlled vocabularies that hold content- or aspect-specific classifications related to a given scientific discipline or research subject e.g. the MeSH-classification for Medical Sciences).

Page 31

The Tower of Babel Syndrome

According to the Tower of Babel Syndrome it seems that we are doomed and will never reach research information “Nirvana”.

However, things have changed in the 3.000 years since the Tower of Babel was built and humanity has made some progress. The “Babel Builders” had neither the Web nor automatic translation facilities. We do today. In other words: we now have tools to realize interoperability between the various models and formats in order to solve our language problem.

So there’s a first challenge for euroCRIS: to create crosswalks between CERIF and other metadata models or formats existing within the research information domain.
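
At its simplest, a crosswalk is a table of equivalences between the elements of two models. The sketch below assumes hypothetical, simplified source field names (they are not actual CERIF attribute names); the targets are Dublin Core element names. Real crosswalks also have to deal with structure, cardinality and differences in meaning, which this sketch ignores.

```python
# Hypothetical, simplified source field names mapped to
# Dublin Core element names (title, creator, date, identifier).
CROSSWALK = {
    "publication_title": "title",
    "author_name": "creator",
    "publication_date": "date",
    "doi": "identifier",
}


def to_dublin_core(record):
    """Translate a source record into Dublin Core style elements.

    Fields without a mapping are returned separately, so that what
    the crosswalk cannot express stays visible instead of vanishing.
    """
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


source_record = {
    "publication_title": "An example publication",
    "author_name": "A. Researcher",
    "publication_date": "2013-09-01",
    "doi": "10.1234/example",        # placeholder, not a real DOI
    "author_role": "1st author",     # no counterpart in this crosswalk
}
mapped, unmapped = to_dublin_core(source_record)
print(mapped)
print(unmapped)
```

The `unmapped` part shows the typical loss when going from a richer, structured model to a flatter one: the fine-grained role has nowhere to go.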

Page 32

The Tower of Babel Syndrome

euroCRIS has taken up this challenge:

• Agreement on and creation of a CERIF-OpenAire interoperability solution, based on CERIF-XML, in cooperation with the OpenAire community.

• Realization of a CERIF-VIVO mapping, again a joint project between the two organisations. The first version of this mapping is now ready for endorsement by the Boards of both organizations.

• Within the EU-financed project “ENGAGE”, aimed at making public governmental data available on the web and in which euroCRIS is a partner, a crosswalk has been created between (metadata elements of) CERIF on the one hand and CKAN and DCAT on the other.

• Within the C4D (CERIF for Data) project, a first mapping has been done between CERIF and INSPIRE (EU project: Infrastructure for Spatial Information in the European Community).

• A project has been started up in cooperation with Elsevier to “CERIF-y” the Snowball Metrics metadata (a project of UK universities and Elsevier to develop a set of benchmark metrics for institutions’ research performance).

Page 33

The Tower of Babel Syndrome

• The Tower of Babel metaphor not only points to the necessity of creating interoperability solutions (translations of concepts) but also to the more Business-related aspect of the need for international standard definitions of key Business Objects and aspects in the research information domain. It is not of much use to match terms if the content of the terms does not match.

• In this respect euroCRIS is working closely together with CASRAI, the Canada-based “Consortia Advancing Standards in Research Administration Information”, which is (among other things) developing a standard dictionary of research information concepts.

Page 34

The Tower of Babel Syndrome

An aspect that should not remain unmentioned here is that the plethora of different formats, often related to disciplinary differences or boundaries between scientific disciplines, not only causes problems for those active in the research information domain, but may well hamper research itself, especially at a time when research is becoming more and more multi-disciplinary:

“The proliferation of discipline-specific metadata schemes contributes to artificial barriers that can impede interdisciplinary and transdisciplinary research. … These barriers, frequently associated with metadata semantics and data structures, interfere with scientific progress along multidisciplinary, interdisciplinary, and trans-disciplinary lines. On the whole, the barriers can interfere with progress supporting our contemporary understanding of science.”

(Willis, Greenberg, White, Analysis and Synthesis of Metadata Goals for Scientific Data, preprint)

Page 36

A tale of two ecosystems

Within the research information domain, two major “ecosystems” exist with their own culture, tradition, visions on and approaches towards research information and research information metadata.

These are:

• (Research) administrative ecosystem.

• Library ecosystem.

Both ecosystems up to now often (still) behave like “silos” without much communication, sharing of visions or cooperation and this on all levels, whether local (within an institution), national or international.

The table on the next slide shows a comparison of both ecosystems on some significant aspects:

Page 37

A tale of two ecosystems

| Aspects | Administrative ecosystem | Library ecosystem |
|---|---|---|
| Dealing with research information metadata since | Fairly recent, from the 1980s on. | For a long time already (cf. the library catalogue card as the “iconic” metadata record). |
| Major tasks related to research information | Research policy and planning, management, assessment, performance measurement. | Making research results (publications) available, archiving of research information. |
| Scope of metadata | Quite broad: covering almost all aspects of research. | Specifically focused on output (publications) metadata. |
| Major type of metadata | Contextual. | Discovery and descriptive. |
| Major technical solution | Relational database systems, based on a structured metadata model (CERIF, or local). | Repositories (Web technologies: OAI-PMH, RDF) based on a (rather) flat metadata model (DC, MODS, …). |
| Proximity to the research(ers) community | Distant. Often even a reluctant if not averse attitude by the research community: considered to be superfluous and merely bureaucratic. | Close. Open-minded attitude by the research community: the library is seen as a valuable and obvious partner in education and research. |

Page 38

A tale of two ecosystems

Major challenge: to integrate or harmonize the two communities in the search for optimal solutions in research information.

This can be promoted by:

• Adopting an open mind towards each other’s expertise, experience and solutions.

• Being present at each other’s events and organizing joint events.

• Starting up joint projects.

• Formalizing the exchange of information (newsletters, announcements).

• Organizational integration of both communities (departments) on an institutional (university), national (e.g. SURF in NL) and international level (joint, coordinating structure: to be created).

Page 39

A tale of two ecosystems

euroCRIS has picked up this challenge, e.g. by:

• As early as 2004 (euroCRIS Conference in Antwerp), inviting the library/repository community to euroCRIS events.

• Cooperating within the framework of the CERIF-OpenAire interoperability format.

• Inviting (very recently) a well-known expert from the repository community to join the euroCRIS Board (which he, luckily, has accepted as from this month).

But the progress made in this respect is a two-way street, e.g.:

• The CERIF-OpenAire project clearly was a joint initiative.

• The Italian Consortium of Universities (CINECA), in cooperation with the University of Hong Kong, has worked out CERIF-compatible metadata extensions for the repository software package DSpace.

So it is good to see that the two communities are more and more aware of each other’s existence and value, and are growing towards and learning from each other.

Page 40

A tale of two ecosystems

Page 41

A tale of two ecosystems

Some challenges that remain:

CERIF, and the added value it holds, could still be better promoted and made known in the library ecosystem.

The distant and reluctant attitude of the research community towards the “administrative ecosystem” still hampers, in a way, the acceptance and introduction of CERIF outside this ecosystem.

(Aside: in this respect, and with a bit of exaggeration, one could say that maybe the best model (CERIF), sadly enough, has been developed in the wrong ecosystem.)

So, some work needs to be done to correct the negative image of CRIS that exists within the research community. For this it is good to have an understanding of what causes the researchers’ discontent.

Page 42

A tale of two ecosystems

In my view, the following 3 reasons are the main causes for the distance and discontent of the researchers towards the administrative ecosystem and its systems:

• Administration and its organizational structures are considered to be of a “lower status” and a “necessary evil”. Researchers want to do research and not be bothered by administrators.

• Registration of research information (metadata) in an administrative information system (CRIS) is boring, and creates a lot of overhead at the expense of valuable research time.

• Various organizations ask for the same information in different formats, with different definitions and at different moments.

Challenge: to remove these “problems”. The first one is psychological and will only be removed if (at least) the other two have been solved.

Page 43

A tale of two ecosystems

Registration of research information (metadata) in an administrative information system (CRIS) is boring, and creates a lot of overhead at the expense of valuable research time.

• Already to a large extent dealt with by the CRIS developers: most modern CRIS include automated harvesting of metadata that already exist in other resources (local: HRM and project management systems; international: WoS, Scopus, MedLine, etc.). This dramatically reduces the time researchers have to invest. Challenge: optimally inform the researchers that these services exist as part of a CRIS.

Various organizations ask for the same information in different (metadata) formats, with different definitions for the same concepts and at different moments.

• This requires agreements and streamlining of information request processes between the organizations involved on a supra-local (national) level. Countries differ in how far they have come in this respect. Challenge for euroCRIS (and others): to advocate the need for coordination and agreement; to knock on the doors of the organizations concerned, to give advice and to show best practices.

Page 45

Transatlantic differences

• A variation on the previous theme. Developments in IT and Web technology are traditionally dominated by US-based organizations and enterprises, as a result of which IT applications in a given domain to a certain extent reflect the US organization and culture of the domain in question.

• Applied to research information: US universities and research institutions have different ways of e.g. financing, organizing and evaluating research than their European counterparts, and thus have different views on the applications that support these processes and on the metadata involved.

• This may also hamper the implementation of global, standard solutions in the research information domain, or lead to the implementation of “domain-incongruous” applications on both sides of the Atlantic.

• In this respect, I can mention a concrete example from within our university, in the student information domain.

Page 47: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

The “Success of the Web” paradox.

The success of the World Wide Web may in some ways hamper rather than promote the implementation of optimal solutions in the research information domain.

• The (no-doubt justified) enthusiasm among important stakeholders in the research information domain (e.g. research community, politicians) about possibilities of the Web and its technologies as promoted by Web Science and W3C communities may hold the danger of an uncritical identification of the “Web technology solution” as “the best possible solution”.

• This “overwhelming belief in the Web” may cause a certain blindness to (the integration of) other technologies, and hence may lead to sub-optimal solutions. Significant and illustrative in this respect is the following remark: “However, in the open data process very little attention is paid to metadata.” (Jeffery, Zuiderwijk, Janssen, The potential of metadata for linked open data and its value for users and publishers)

Challenge: continue to advocate and demonstrate the (added) value of RDBMS-technology and its implementation in the CERIF model and CERIF-CRIS.

Page 49: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

The human factor: the imperfect being

• Humans are not perfect: they forget, make errors, lack discipline or even just lie!

• This may result in substantial problems regarding metadata: whenever “bio-curation” is applied (metadata registration by humans, as is still the case in the majority of cases), the above aspects come into play and may lead to incomplete, late, outdated, incorrect or unreliable metadata.

• Example from the Dutch situation (but probably a universal phenomenon): the metadata that universities must supply for the government’s yearly research evaluation process are often delivered only at the very last moment.

• Challenge: to develop and perfect automated metadata creation and registration.

Page 51: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

The human factor: it’s all about time and money

• Metadata creation and registration can be very time-consuming (bio-curation, see previous)

• And (therefore) may cost a lot of money.

(Estimates have been made that up to 60% of time invested in typical workflows for data deposition is devoted to metadata creation).

Both resources (time and money) may not always be sufficiently available, or available in time, and may be cut in times of crisis. This carries the danger of metadata that are incomplete or unevenly covered from one period to another.

Page 53: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

The human factor: the imperfect organization

Humans involved in the creation and supply of metadata work within the framework of organizations. The extent to which the “imperfections” of these individuals, mentioned before, manifest themselves depends heavily on whether the organization has a policy, and resulting regulations, on how to deal with research metadata, and on the quality of that policy. Unfortunately, such a coherent policy is often non-existent. Nevertheless:

“Quality metadata creation is just as important as the care, preservation, display, and dissemination of collections; adequate planning and resources must be devoted to this ongoing, mission-critical activity”

Challenge: to make institutions aware of the importance of a sound metadata policy and its implementation.

Page 54: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

The human factor: the imperfect organization

• Metadata creation is one of the core activities of organizations dealing with research information (research institutions, universities, libraries, …); the formulation of a metadata policy and strategy should therefore be a priority within these organizations.

• A high-level understanding of the importance of metadata by upper management is essential for the successful implementation of a metadata strategy.

• Metadata rules and processes must be enforced in all appropriate units of an institution.

• Adequate, carefully thought-out staffing levels including appropriate skill sets are essential for the successful implementation of a cohesive, comprehensive metadata strategy.

• Institutions must streamline metadata production and replace manual methods of metadata creation with “industrial” (automated) production methods wherever possible and appropriate.

Page 56: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

The (big) data deluge

• Nowadays, and certainly in the future, data may stream in so fast or in such large volumes that metadata creation processes cannot keep up, even when metadata generation is automated.

• Related to this: how to appropriately capture and deal with the scientific communication among researchers taking place through the “social media”.

• Metadata will always be necessary to e.g. describe datasets and their re-use conditions or publications published “traditionally”. But what about “new realities” in the research information domain, such as (the discovery and description of) dynamic virtual networks of researchers and the associated science production processes? Can these also be managed by “traditional” metadata solutions or will they be better discovered and described by “big data” solutions?

Page 57: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

The (big) data deluge

A core question here is:

Do we still need pre-structuring of research information data (by means of metadata schemes) in (relational) databases, or can the emerging “big data” technologies (e.g. the so-called NoSQL databases, like Cassandra and HBase, combined with powerful distributed data processing techniques, like Google’s MapReduce and the open source implementation of this technology, “Apache Hadoop”) do the job properly in the future?

Or does the ideal solution lie in a combination of the “best of both worlds”: the bottom-up crunching of massive data volumes on the one side with the top-down approach of a pre-structured set of metadata with a corresponding controlled volume of instances?

Challenge: to find adequate answers to these questions and implement them in workable technology models and applications.
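
The “best of both worlds” idea can be made concrete with a small sketch. It is an illustration only: plain Python stands in for a distributed MapReduce job, and the records, field names and the `REQUIRED_FIELDS` scheme are hypothetical, not taken from CERIF or any real system. The bottom-up part counts whatever records arrive; the top-down part optionally admits only records that conform to a pre-structured scheme.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from harvesting.
RAW_RECORDS = [
    {"title": "Paper A", "year": "2012", "org": "University X"},
    {"title": "Paper B", "year": "2013"},  # no organisation recorded
    {"title": "Paper C", "year": "2013", "org": "University X"},
]

# Top-down: a hypothetical pre-structured scheme with required fields.
REQUIRED_FIELDS = ("title", "year", "org")


def conforms(record):
    """Return True if the record supplies every field the scheme requires."""
    return all(field in record for field in REQUIRED_FIELDS)


def map_phase(records):
    """Bottom-up 'map' step: emit one (year, 1) pair per record."""
    for record in records:
        yield record["year"], 1


def reduce_phase(pairs):
    """'Reduce' step: sum the emitted counts per key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_per_year(records, enforce_scheme=False):
    """Count records per year, optionally keeping only scheme-conformant ones."""
    if enforce_scheme:
        records = [r for r in records if conforms(r)]
    return reduce_phase(map_phase(records))
```

Calling `count_per_year(RAW_RECORDS)` counts all three records, while `count_per_year(RAW_RECORDS, enforce_scheme=True)` drops the record without an organisation. The difference between the two results is the trade-off the question above points at: volume and speed on one side, controlled quality on the other.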

Page 58: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

Conclusions: general

• Research metadata take a core position within the research information business domain. They are the keys to successful discovery, access, and (re-)use of research information and data. Substantial challenges remain in realizing optimal solutions for the creation, management and use of research metadata. Several of these challenges have to do with optimal communication, cooperation and harmonization of visions and approaches between key stakeholder communities in the research information domain, as well as the interoperability of their models, applications and definitions.

• To meet these challenges is not only a duty, but more so a responsibility: notably the responsibility that we as key players in the research information domain have to use our expertise, knowledge and experience to, on an international level, create the best possible conditions for research.

Page 59: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

Conclusions: practical

• Technological:

• Further development and bringing to perfection of applications for automated metadata creation and registration.

• Creation of interoperability solutions (crosswalks) between the various metadata models and formats in the research information domain and in this respect the advocacy of CERIF-CRIS as the “universal” interoperability instrument.

• Integration/combination of CRIS-based technologies with web technologies (open data, semantic web) and big data technologies.

• Organizational/business-related:

• Standardization of definitions of the business (metadata) concepts in the RI domain, as well as standardization of metadata profiles for various use cases.

• Formalizing of relations and communication between key stakeholder organizations on an international level and starting up/participating in joint projects.

• Awareness-raising and promotion of insights concerning the importance and value of research metadata and information, the existing technologies to deal with metadata, as well as the strategies and procedures for optimal registration and maintenance of metadata, to the relevant decision-makers on a local, national and international level.
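
A crosswalk, as mentioned under the technological conclusions, is in essence a mapping from the field names of one metadata format to those of another. The sketch below shows the idea under stated assumptions: the source and target field names are hypothetical examples and do not represent CERIF or any other real format, and a real crosswalk would also have to convert values and structures, not just names.

```python
# Hypothetical crosswalk: source field name -> target field name.
CROSSWALK = {
    "titel": "title",
    "auteur": "author",
    "jaar": "year",
}


def apply_crosswalk(record, crosswalk):
    """Rename a record's fields via the crosswalk.

    Returns the translated record plus the list of source fields that
    have no mapping, so that gaps in the crosswalk stay visible.
    """
    translated = {}
    unmapped = []
    for field, value in record.items():
        if field in crosswalk:
            translated[crosswalk[field]] = value
        else:
            unmapped.append(field)
    return translated, unmapped
```

Reporting the unmapped fields instead of silently dropping them is a deliberate choice: differing definitions between organizations tend to show up exactly as fields that have no counterpart in the other format.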

Page 60: Metadata in Research Information  Introduction: “Tour de table” Ed Simons, President of euroCRIS

Conclusions

I wish you an interesting and inspiring seminar.

Thank you for your attention!

