D4.1 – Requirements methodology
31 AUG 2015 OpenMinTeD
Open Mining INfrastructure for TExt and Data
Deliverable Code: D4.1
Version: 1 – Final
Dissemination level: PUBLIC
This deliverable presents the plan of design and development of the OpenMinTeD infrastructure services for the next 12 months. At months 12 and 24 an update of this deliverable will be produced. It is intended mainly for technical partners, to locate their software release duties in the wider context of the project software release, but also for the general reader, to get an overall picture of and insight into the technical activities.
H2020-EINFRA-2014-2015/H2020-EINFRA-2014-2
Topic: EINFRA-1-2014
Managing, preserving and computing with big research data
Research & Innovation action Grant Agreement 654021
Ref. Ares(2015)3782098 - 13/09/2015
Document Description
D4.1 – Requirements methodology
WP4 - Community Driven Requirements and Evaluation
WP participating organizations: ARC, UNIMAN, UKP-TUDA, INRA, EMBL-EBI, AK, LIBER, UvA, OU, EPFL, CNIO, USFD, GESIS, GRNET, Frontiers
Contractual Delivery Date: 07/2015 Actual Delivery Date: 08/2015
Nature: Report Version: 1 Final
Public Deliverable
Preparation Slip
Name Organisation Date
From Nikolaos Marianos, Charalampos Thanopoulos, Vassilis Protonotarios, Effie Tsiflidou, Thodoris Kontogiannis AK 24/07/2015
Edited by Nikolaos Marianos AK 24/07/2015
Reviewed by Sophie Aubin (INRA), Richard Eckart de Castilho (UKP-TUDA), Peter Mutschke (GESIS) 19/08/2015
Approved by Mike Chatzopoulos ARC 11/09/2015
For delivery Mike Chatzopoulos ARC 11/09/2015
Document Change Record
Issue Item Reason for Change Authors Organization
V0.1 Draft version ToC, Tools and guidelines for partners Charalampos Thanopoulos, Vassilis Protonotarios AK
V0.5 Draft version Description of methodology Nikolaos Marianos, Charalampos Thanopoulos, Vassilis Protonotarios, Effie Tsiflidou, Thodoris Kontogiannis AK
V0.9 First Delivery First full version for review Nikolaos Marianos, Charalampos Thanopoulos AK
V1.0 Final version Final version addressing comments from peer-review Nikolaos Marianos, Vassilis Protonotarios AK
Table of Contents
1| CONTEXT
1.1 BACKGROUND AND CONTEXT
1.2 USE CASES TO BE STUDIED BY THE PROJECT
1.2.1 SCHOLARLY COMMUNICATION
1.2.2 LIFE SCIENCES
1.2.3 AGRICULTURE / BIODIVERSITY
1.2.4 SOCIAL SCIENCES
1.2.5 SUMMARY OF INITIAL USE CASES ANALYSIS
1.3 OVERVIEW OF PROJECT STAKEHOLDERS
1.4 KEY MILESTONES OF REQUIREMENTS ANALYSIS
2| DESIRED OUTCOMES
2.1 PROFILING USER PERSONAS
2.2 CONTENT ANALYSIS
2.3 INFORMATION-RELATED CHALLENGES AND PROBLEMS
2.4 DESCRIPTION OF INFORMATION SERVICES & SYSTEMS
2.5 DESCRIPTION OF CURRENT USAGE SCENARIOS OF INFORMATION SERVICES & SYSTEMS
2.6 DESIGN INTERFACE WIREFRAMES
2.7 DESCRIPTION OF USER-ENVISAGED USAGE WORKFLOWS
2.8 LIST RESULTING REQUIREMENTS FOR TDM-POWERED OUTCOMES SERVICES
3| METHODOLOGY
3.1 INITIAL PROFILING OF TARGETED PERSONAS
3.1.1 USING ONLINE QUESTIONNAIRES
3.1.2 INTERVIEWS
3.1.3 USING THE PERSONA GRAPH
3.2 VALIDATION OF USERS’ REQUIREMENTS
3.3 DESIGNING FINAL WIREFRAMES WITH REQUIRED FEATURES FOR THE INFORMATION SERVICES OF THE USE CASES
3.4 TRANSLATING FEATURES INTO REQUIREMENTS FOR TEXT MINING SERVICES AND E-INFRA
4| TOOLS AND TEMPLATES
4.1 GENERIC QUESTIONNAIRE FOR USER & CONTENT PROFILING
4.2 EXAMPLE OF CUSTOMIZED QUESTIONNAIRE FOR AGRICULTURAL PERSONAS
4.3 GUIDELINES FOR THE ORGANIZATION OF REQUIREMENTS' WORKSHOPS PER USER PARTNER (FOR PERSONA VALIDATION & BRAINSTORMING)
4.3.1 PRACTICALITIES
4.3.2 INTERACTIVE SESSIONS
4.3.3 MATERIALS REQUIRED FOR THE IMPLEMENTATION OF THE EVENT
4.4 FINAL INTERFACE WIREFRAMES TO REVISIT & REFINE PER USER PARTNER AND INFORMATION SERVICE
5| SCHEDULE OF NEXT STEPS
6| CONCLUSIONS
7| REFERENCES
8| ANNEX A: PERSONA GRAPH TEMPLATE
9| ANNEX B: GENERIC ONLINE QUESTIONNAIRE
10| ANNEX C: ONLINE QUESTIONNAIRE EXAMPLE (AGRIS USE CASE)
11| ANNEX D: INVITATION TO THE WORKSHOP
12| ANNEX E: TEMPLATE FOR THE AGENDA OF A WORKSHOP
13| ANNEX F: SCRIPT FOR RUNNING A WORKSHOP
14| ANNEX G: METHODOLOGY OF THE WORKSHOP PPT TEMPLATE
15| ANNEX H: INTERACTIVE SESSION I PPT TEMPLATE
16| ANNEX I: INTERACTIVE SESSION II PPT TEMPLATE
17| ANNEX J: INTERACTIVE SESSION III PPT TEMPLATE
18| ANNEX K: REPORTING FORM FOR A WORKSHOP
Table of Figures
Figure 1: The envisaged use cases per thematic areas, according to the DoA
Figure 2: Empty Persona Graph with some initial questions to be answered
Figure 3: A Persona Graph produced using markers and a flipchart sheet during the OpenMinTeD kick-off meeting
Figure 4: Example of a completed Persona Graph for the Agriculture / Wheat use case, from the OpenMinTeD kick-off meeting
Figure 5: Identification of content / data requirements for the Wheat Researcher persona, through the Persona Graph
Figure 6: Identification of information challenges for the Wheat Researcher persona, through the Persona Graph
Figure 7: Identification of Solution Features for the Wheat Researcher persona, through the Persona Graph
Figure 8: Wireframe produced for the Agriculture / Wheat use case using a software tool, during the OpenMinTeD kick-off meeting
Figure 9: Process workflow for the Agriculture/Wheat use case, drawn during the OpenMinTeD kick-off meeting
Figure 10: Workflow & Wireframes for the Agriculture / Wheat use case, from the kick-off meeting
Figure 11: Example Drupal-based form for collecting responses: The AGRIS use case
Figure 12: Partial view of the OpenMinTeD generic questionnaire
Figure 13: Information about a persona to be transferred to a Persona Graph
Figure 14: Part of the generic online questionnaire with comments under each question
Figure 15: Part of the AGRIS use case online questionnaire
Figure 16: Requirements Elicitation Gantt
Figure 17: Example of a map guiding participants to the meeting place
List of tables
Table 1: Summary of the requirements extracted from the OpenMinTeD envisaged use cases
Table 2: Key milestones and outcomes timeframe
Table 3: Content sources of interest per thematic area
Table 4: Initial content requirements as Identified During the OpenMinTeD Kick-Off Meeting
Table 5: Information-related Challenges as Identified During the OpenMinTeD Kick-Off Meeting
Table 6: Solution features as Identified During the OpenMinTeD Kick-Off Meeting
Table 7: Important dates
Disclaimer
This document contains a description of the OpenMinTeD project findings, work and products. Certain parts of it might be under partner Intellectual Property Right (IPR) rules, so prior to using its content please contact the consortium head for approval.
In case you believe that this document harms in any way IPR held by you as a person or as a representative of an entity, please do notify us immediately.
The authors of this document have taken any available measure in order for its content to be accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document hold any sort of responsibility that might occur as a result of using its content.
This publication has been produced with the assistance of the European Union. The content of this publication is the sole responsibility of the OpenMinTeD consortium and can in no way be taken to reflect the views of the European Union.
The European Union is established in accordance with the Treaty on European Union (Maastricht). There are currently 28 Member States of the Union. It is based on the European Communities and the member states cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs. The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice and the Court of Auditors. (http://europa.eu.int/)
OpenMinTeD is a project funded by the European Union (Grant Agreement No 654021).
Acronyms
DoA Description of Action
TDM Text and Data Mining
UIMA Unstructured Information Management Architecture
Publishable Summary
This document is the first part of the project’s Work Package 4, titled “Community Driven Requirements and Evaluation”, and aims to provide the methodology for collecting requirements relevant to TDM from research communities that have been identified as potential end users of the OpenMinTeD project services.
The first step in the process of the OpenMinTeD project developing its text- and data-mining powered services, as well as its services to end users and e-infrastructure, is the identification of the actual needs of the communities that are currently in need of these. This requires a series of steps that will ensure the identification and description of the potential end users and communities of interest of the project, the extraction, collection and analysis of requirements related to the expected outcomes of the project, and the validation of all requirements collected before they are transformed into project outcomes.
In this context, the OpenMinTeD project aims to investigate, record and analyze the actual needs of a wide variety of stakeholders that will drive the requirement analysis and come up with the interoperability specifications and the design and development of the overarching platform, as well as the community services and support structures that will surround this platform. The definition of the functional specifications of OpenMinTeD and all subsequent research and development will be driven by this comprehensive requirement analysis. Its first priority is to bring different communities together and engage them in the process of requirements elicitation in a methodological manner.
As described in the following sections, the requirements will be elicited through an online survey, interviews and events such as workshops that will engage researchers and users from the participating organizations, which bring together text mining researchers as well as various end user types and communities. The proposed methodology revolves around user profiling and analysis, that is, the identification of the characteristics of different user types. This will be carried out through the persona methodology, which refers to the development of user profiles based on common characteristics regarding educational and research background, behavior patterns, goals, skills, etc. The analysis of each of the four user communities will assist in prioritizing the identified personas and in using the appropriate ones to conduct problem validation interviews.
An important aspect of this task is also the validation of the collected requirements, which will take place mostly during workshops that ensure the participation of various types of stakeholders. Through interactive sessions, user feedback on content sources, content-related issues, proposed TDM-powered services and related workflows will be extracted, and initial designs of these services and workflows will be drawn. As a next step, this feedback will be validated through online surveys and questionnaires, face-to-face or Skype/phone interviews, etc.
The methodology described in this document aims to provide the means to successfully perform the following activities:
□ identify typical use cases and applications;
□ record and chart the profiles of end users/researchers who are involved in or use TDM;
□ analyze the requirements of the distinct use cases and synthesize them to produce the OpenMinTeD functional specifications;
□ transform the generic use cases used in the requirements process into concrete and detailed scenarios that will drive domain-specific TDM applications to solve researchers’ problems;
□ define a validation framework through a set of indicators to validate all aspects of the OpenMinTeD infrastructure.
The means through which the methodology described in this document will be applied are the envisaged use cases, which aim to engage different groups of stakeholders in the elicitation of requirements and, at a later stage, in their validation. These use cases are described in the following sections.
1| CONTEXT
1.1 Background and context
The aim of the OpenMinTeD project is to enable the development of an infrastructure that fosters and facilitates the use of text and data mining technologies in the field of scientific publications but not limited to it, by two main user categories: application domain users and text-mining experts. OpenMinTeD aims to take advantage of existing tools and text mining platforms, facilitating access to them through the appropriate registries, and enabling or enhancing their interoperability through an interoperability layer based on existing standards. OpenMinTeD supports awareness of the benefits and training of text mining users and developers alike and demonstrates the merits of the approach through a number of use cases identified by scholars and experts from different scientific areas, ranging from life sciences (bioinformatics, biochemistry, etc.) to food and agriculture and social sciences and humanities related literature. It brings together different types of stakeholders, including content providers and scientific communities, text mining and infrastructure builders, legal experts, data and computing centres, industrial players and SMEs. Through its infrastructural foresight activities, OpenMinTeD’s vision is to make operational a virtuous cycle in which
a) primary content is accessible through standardized programmatic interfaces and access rules,
b) by well-documented and easily discoverable text mining services and workflows which process, analyze and annotate text to
c) identify patterns and extract new meaningful actionable knowledge, which will be used for
d) structuring, indexing and searching content, and, in tandem,
e) act as a new knowledge resource useful for drawing new relations between content items and firing a new mining cycle.
OpenMinTeD aims to provide a service-oriented infrastructure that will enable search, retrieval, selection and access, remote or local, to all the necessary content, TDM services, appropriately documented with formal metadata, combinable into executable workflows, usable in a range of use cases. The project takes a science-oriented, researcher-centred approach throughout its design and implementation phases, which aim at ensuring that researcher communities’ needs are perfectly addressed, thus maximizing the acceptance and uptake of the infrastructure. To achieve these aims, OpenMinTeD involves researchers from a number of scientific communities ranging from scholarly communication (OpenAIRE, UK/CORE, LIBER, Frontiers), to biochemistry (EMBL-EBI) and neuroinformatics (Human Brain Project), to agriculture (INRA, Agro-know/FAO) and social sciences (GESIS) domains to
(i) gather requirements and chart the respective fields as to TDM usage and practices as well as tools, resources and standards used,
(ii) define prototype applications that serve the corresponding scientific communities via the OpenMinTeD infrastructure, and
(iii) evaluate these applications in relation to the infrastructure.
It is worth noting that, instead of developing yet another infrastructure and set of TDM-powered services, the OpenMinTeD project builds upon existing efforts from text mining partners who have developed their own text-mining systems and platforms that are in production and serve diverse scientific communities (GATE, GATE Cloud and AnnoMarket, Argo and U-Compare, Alvis, DKPro, META-SHARE and its language processing service providing node QT21), abide by architectural standards (UIMA, GATE) or implement their own architectures, align with different types of European infrastructures (OpenAIRE, EGI federated cloud, emerging AAI infrastructure), and provide support to community initiatives (e.g. BioCreative, BioNLP, FOSTER). The project will closely collaborate with the winning proposal of the GARRI (Governance for the Advancement of Responsible Research and Innovation) Call at critical phases to receive input regarding legal and policy aspects, as well as to plan for joint efforts in awareness and community engagement activities.
In order for the project to be able to engage user communities and test the application of these existing TDM-powered services, tools and e-infrastructures, it needs to define use cases that involve the user communities of interest, their content-related challenges and sources and identify potential solutions addressing these challenges. In this context, OpenMinTeD has defined the thematic areas of interest as well as a number of use cases involving different types of users, different content sources and different content-related challenges; in this way, the project will be able to test and propose a variety of solutions adapted to meet the needs of each individual user community, while the aim is to provide an appropriate solution to each type of user identified. These use cases are presented in the next section.
1.2 Use cases to be studied by the project
The OpenMinTeD project has identified and selected a number of use cases that will be studied throughout the lifetime of the project and will provide the means for developing the tools to serve the corresponding research communities. These use cases refer to communities and software platforms that could benefit from the outcomes of the OpenMinTeD project, which will offer innovative applications of text and data mining approaches. These use cases fall under four (4) thematic areas of interest to the OpenMinTeD project:
1. Scholarly Communication;
2. Life Sciences;
3. Agriculture / Biodiversity;
4. Social Sciences.
Each thematic area is expected to include a number of use cases that will be used for the elicitation of user requirements and the development of services aiming to meet the specific needs of each user community. An indicative list of use cases per thematic area is presented in the following figure.
FIGURE 1: THE ENVISAGED USE CASES PER THEMATIC AREAS, ACCORDING TO THE DOA
Each one of these use cases provides preliminary information on requirements that will be studied for the development of the project’s outcomes, in terms of content and data sources, TDM service providers, target users and communities. Brief information about these use cases per thematic area is provided in the following section, based on the project’s DoA. It should be noted that these use cases are subject to revisions, according to the initial outcomes of the methodology described in this document.
1.2.1 Scholarly communication
DESCRIPTION
In the context of Scholarly Communication, there are three (3) envisaged use cases that aim to focus on slightly different aspects:
1. Semantic search and discovery of open scientific outcomes
2. Map of academia – scholarly communication network
3. Research monitoring and analytics
The first use case aims to identify and address issues related to the traditional information retrieval technologies that can usually meet only the basic information retrieval needs of the end users but fail to provide refined search and retrieval options. Using content that has been semantically enriched and well-described, the project aims to provide content and service providers with enhanced semantic metadata extraction mechanisms and at the same time provide semantic search mechanisms.
The second use case aims to create a comprehensive dynamic map of relations in academia between people (such as open citation index, co-authorship network), institutions, publications, funding sources, patents and data/software citations. This is expected to be addressed by using scholarly communication infrastructures like OpenAIRE¹ and CORE² to discover the appropriate disambiguation and entity resolution services, configure them according to their needs (e.g., based on language) and apply them to repository content with the appropriate policies.
¹ https://www.openaire.eu/
² http://core.ac.uk/
The third use case is focused on publishing (registered and discovered) innovative services for topic modelling (such as the ARC Communication Research Network³/OpenAIRE), potentially used by a wider range of stakeholders. These innovative services are expected to be enriched through OpenMinTeD with language detection and NLP mechanisms.
POTENTIAL END-USERS
Among the potential end users of this set of use cases are content (publishers and repositories) and service (OpenAIRE, CORE, Europe PMC) providers that will be engaged in using text-mining services to incorporate semantic metadata extraction mechanisms and provide semantic search mechanisms.
1.2.2 Life Sciences
DESCRIPTION
In the context of Life Sciences, there are two (2) envisaged use cases that aim to focus on slightly different aspects:
1. Text mining assisted curation of the EMBL-EBI chemical databases
2. Text mining assisted curation of the neuroscience resources KnowledgeSpace and NeuroLex
The former aims at implementing a workflow-based application oriented towards curation, demonstrating how it can accelerate extraction and curation of information about small molecules of biological interest (such as natural products, drugs, chemical compounds, etc.) from the available open literature. It will demonstrate how large-scale analysis of the literature (EPMC) can be used to automatically extract chemical structures, chemical properties, biological roles, bioactivities, biological targets and reactions, by leveraging text mining tools already developed by the community and shared tasks (BioNLP, BioCreative and U-Compare).
The latter aims to use the same approach as the former but in a different field. More specifically, it aims to implement a workflow-based application oriented towards curation, demonstrating how it can accelerate extraction and curation of information about neurons, brain regions, anatomical entities and diseases from the open literature. It will demonstrate how large-scale analysis of the literature (EPMC) can be used to automatically extract such information, leveraging text mining tools already developed by the community and shared tasks and demonstrating how new solutions can be customized, based on the OpenMinTeD infrastructure and its annotation platform using crowdsourcing.
POTENTIAL END-USERS
The potential end users and other stakeholders of these use cases include but are not limited to:
□ Text mining and NLP developers working on the processing of unstructured data in the domains of life sciences, biomedicine or chemistry. Also developers of building block technologies such as information retrieval, text categorization, named entity recognition, named entity grounding, information extraction, relation mining, development of interactive text mining systems or visualization of text mining results.
□ Database curators carrying out literature curation, including model organism databases (MOD) like TAIR, MGI, RGD, WormBase, FlyBase, MaizeGDB; functional genome annotation databases (e.g. GOA), proteomics databases (BioGRID, IntAct, MINT), comparative toxicogenomics (CTD).
□ Experimental biomedical and basic science researchers: (1) to improve the interpretation and design of experimental research by improving the access to previously published information on the studied bio-entities; (2) using literature mining and knowledge discovery software for the generation of new hypotheses that will be experimentally validated.
□ Clinicians: improve the information access for evidence-based clinical practice using text mining technologies and biomedical semantic search engines.
□ Chemists: systematic access to chemical information (structure associated chemical entities) described in the literature and patents.
□ Biocurators and Bioinformaticians: text-mining assisted curation results are useful as Gold Standard validation sets for predictive bioinformatics results.
□ Pharma Industry: drug discovery and target selection, identifying adverse drug effects, competitive intelligence and knowledge management.
□ Publishers: semantic annotation of online publications, structured digital abstracts.
□ Scientific paper authors: author-derived annotations (assisted completion of structured digital abstracts).
□ Patients: improved search engines, especially important for the detection of similar rare disease cases and for personalized medicine (automatic detection of mutation descriptions published in the literature).
□ Computer scientists: useful training data provided by BioCreative to improve the performance of cutting edge statistical machine learning algorithms and feature selection/exploration.
³ https://researchdata.ands.org.au/arc-communications-research-network/64707
1.2.3 Agriculture / Biodiversity
DESCRIPTION
In the context of Agriculture and Biodiversity, there are four (4) envisaged use cases that aim to focus on slightly different aspects:
1. Enrich agricultural databases to assist food- and water-borne disease outbreak alerts and product recalls
2. Image, figure and dataset discovery in the AGRIS FAO online service (8M bibliographic resources)
3. Aggregation of various types of data for serving the Wheat researchers’ community
4. Indexing and search of biodiversity information through the GBIF portal
The first use case aims to provide solutions to the community consisting of food safety officers and agencies, water quality managers and other stakeholders of human health through nutrition, such as the Global Food Safety Partnership (GFSP). OpenMinTeD aims to provide semantic-based querying, including relations among microorganisms and their locations, and graphical display of the parsing results, through text mining-powered discovery and aggregation of relevant content (e.g., microorganisms, kind of food, food processing, and water origin) from diverse trusted sources and their normalization by a shared reference vocabulary.
The second use case plans to support researchers using the AGRIS bibliographic database⁴ by providing them with value-added services based on text-mining mechanisms that will, after each search, preview related images/datasets that are located or mentioned inside the publications. End users are expected to be able to click the preview of the related figure (image, diagram, dataset name) and be redirected to the specific location in the publication to study more details.
The third use case will focus on how text and data mining services may enhance and support further online publication and data search applications. The community of researchers and breeders working on Wheat and other plants is expected to benefit from the project’s outcomes through the automatic linking of different pieces of information from databases and literature. The targeted application will focus on information relating to genetics, regulation and phenotypes with information retrieval and data integration perspectives. Whenever possible, it will rely on Linked Open Data standards and resources and will conform to the RDA Wheat Data Interoperability Working Group⁵ recommendations.
The fourth use case plans to support the indexing and search of biodiversity database information through the GBIF (Global Biodiversity Information Facility) portal. The aim is to improve the search features, currently based on IDs, countries and numerical data (e.g. geolocalisation), by allowing the end user to query the portal with general habitat terms, such as 'aquatic environment', and retrieve all occurrences of species known to live in aquatic environments (e.g. river, lake). Structured habitat information is critical for ecology and genetics studies. Beyond GBIF, the workflow could interest many other stakeholders (e.g. those working on biodiversity, genetics, etc.). The information source for the use case could be the free text of the species occurrence records, especially the habitat slot, and external sources.
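The habitat-based query described for this use case can be pictured as a term expansion step followed by a match against the free-text habitat slot. The following is only a minimal sketch under stated assumptions: the vocabulary, the sample records, the field names (`species`, `habitat`) and the function names are all hypothetical and do not reflect the GBIF data model or any GBIF API.

```python
# Hypothetical habitat vocabulary: a general term mapped to narrower terms.
HABITAT_VOCABULARY = {
    "aquatic environment": {"river", "lake", "pond", "stream"},
}

# Hypothetical species occurrence records with a free-text habitat slot.
OCCURRENCES = [
    {"species": "Salmo trutta", "habitat": "Fast-flowing river with gravel bed"},
    {"species": "Quercus robur", "habitat": "Lowland deciduous forest"},
    {"species": "Nymphaea alba", "habitat": "Shallow lake margin"},
]


def expand_terms(query):
    """Return the query term plus any narrower terms known to the vocabulary."""
    normalized = query.lower()
    return {normalized} | HABITAT_VOCABULARY.get(normalized, set())


def search_by_habitat(query, occurrences):
    """Keep the records whose habitat text mentions any of the expanded terms."""
    terms = expand_terms(query)
    return [
        record
        for record in occurrences
        if any(term in record["habitat"].lower() for term in terms)
    ]


matches = search_by_habitat("aquatic environment", OCCURRENCES)
print([record["species"] for record in matches])
```

Plain substring matching is used here purely for brevity; it would also match unrelated words that happen to contain a term, which is one reason a text mining workflow with proper habitat annotation would be preferable in practice.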
POTENTIAL END-USERS
□ Food safety officers and agencies, water quality managers
□ Information managers
□ Researchers
□ Breeders
□ Bioinformatics application providers
□ Veterinarians, epidemiologists
⁴ http://agris.fao.org
⁵ https://rd-alliance.org/groups/wheat-data-interoperability-wg.html
1.2.4 Social Sciences
DESCRIPTION
In the context of Social Sciences there are two (2) envisaged use cases that aim to focus on slightly different aspects:
1. Automatic detection, disambiguation and linking of (named) entities in Social Science text corpora to enhance indexing and searching
2. Automatic coding of unstructured answers in surveys
The first use case aims to provide solutions to the problem that social scientists spend considerable time searching for relevant publications and research data. Thus, there is considerable interest in automatic methods for entity recognition and linking to ensure a reliable and context-sensitive retrieval of relevant entities (such as publications, data, persons, institutions, places, references, citations, scientific concepts). This includes reliable recognition and disambiguation of relevant entities (such as persons and vague data citations in scientific publications) as well as linking recognized entities within and across documents. This may also enhance information extraction (such as detection of definitions and hot topics) and the generation of knowledge maps on the basis of linked entities. A further important issue of the use case is to make annotations comprehensively re-usable for the scientific community. This is also relevant for the analysis of communication documented in texts. Tracing scientific discourses over a long period of time may serve as an example for this.
The second use case addresses a well-known problem in survey research. Research data collected by surveys play a central role in Social Science research. Surveys usually contain elements that provide structured answers by participants but also unstructured elements where participants are encouraged to give answers in free-text form. While structured answers can easily be maintained, a recurring problem is to recognize and code unstructured answers according to the code schemas to which the survey under study has to be mapped. Text mining may help to solve this problem.
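To illustrate what coding unstructured answers against a code schema involves, the sketch below assigns codes to free-text answers through simple keyword matching. The schema, the codes, the keywords and the function name are invented for this example; a TDM-powered service would more likely rely on trained classifiers than on keyword lists.

```python
import re

# Hypothetical code schema: each code is triggered by a set of keywords.
CODE_SCHEMA = {
    "EMPLOYMENT": {"job", "unemployment", "work", "wages"},
    "HEALTH": {"hospital", "doctor", "healthcare"},
    "EDUCATION": {"school", "university", "teachers"},
}

# Fallback code for answers that match nothing in the schema.
UNCODED = "OTHER"


def code_answer(answer):
    """Return the sorted list of schema codes whose keywords occur in the answer."""
    tokens = set(re.findall(r"[a-z]+", answer.lower()))
    codes = sorted(
        code for code, keywords in CODE_SCHEMA.items() if tokens & keywords
    )
    return codes or [UNCODED]


answers = [
    "I worry about unemployment and the cost of seeing a doctor",
    "Nothing in particular",
]
for answer in answers:
    print(answer, "->", code_answer(answer))
```

Answers that match no keyword fall back to the `OTHER` code, which mirrors the manual step in which a human coder would review whatever could not be coded automatically.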
POTENTIAL END-USERS
□ Social scientists seeking information relevant to their research
□ Content and service providers
□ Social science researchers doing surveys.
1.2.5 Summary of initial use cases analysis
The following table provides an overview of information that is of interest to the context of the requirements elicitation processes discussed in this document, namely the type of end users, their content-related challenges where the OpenMinTeD project could provide solutions as well as an indicative list of content sources of interest to the users per thematic area.
TABLE 1: SUMMARY OF THE REQUIREMENTS EXTRACTED FROM THE OPENMINTED ENVISAGED USE CASES
Thematic Area: Scholarly Communication
Potential end-users:
□ content providers, such as publishers and repositories, and service providers such as OpenAIRE, CORE and Europe PMC
□ commercial and non-commercial services’ developers, text-miners, scientometricians, researchers and the general public
□ funders and government organisations needing to discover scientific trends and evaluate research impact
□ authors of scientific publications, publishers, institutional managers
Content-related aspects:
□ OpenMinTeD will allow content and service providers to use text-mining services to incorporate semantic metadata extraction mechanisms and provide semantic search mechanisms
□ Information extraction from full-texts of research papers
□ Extraction of citations from full-texts of research papers
□ Content recommendation of related research papers
□ Mining the licence of research papers
□ Supporting scientific knowledge discovery
□ Text-categorisation of papers
Content sources (indicative list):
□ Europe PMC content with attached licensing info, of which some is restrictive
□ Publisher content with restricted licenses, open publisher content with CC licenses (PLOS, Frontiers, Copernicus, etc.)
□ OpenAIRE and CORE repositories content with no specific licences
□ Publishers willing to provide content abiding to the legal specs (Frontiers, PLOS, etc.)
□ Researcher networks (Mendeley, Frontiers)
□ European Patent Office data (www.epo.org) to derive links to scientific publications or supporting data

Thematic Area: Life Sciences
Potential end-users:
□ Biocurators
□ Researchers: Clinicians, Chemists, Bioinformaticians, Computer scientists
□ SMEs
□ Pharma Industries
Content-related aspects:
□ Improving performance of TDM methodologies
□ Promote the development of accessible real world TDM applications
□ Explore strategies to integrate TDM results with knowledge base
□ Define formal and practical solution of TDM interoperability
Content sources (indicative list):
□ Europe PMC content with attached licensing
□ Publisher content with restricted licenses, open publisher content with CC licenses
□ Frontiers publications and researcher network

Thematic Area: Agriculture / Biodiversity
Potential end-users:
□ food safety officers and agencies
□ water quality managers
□ information managers, bioinformaticians
□ researchers, veterinarians, epidemiologists, breeders
Content-related aspects:
□ facilitate the discovery and aggregation of relevant content from diverse trusted sources
□ support added services based on text-mining mechanisms that will preview, after each search, related images/datasets that are located/mentioned inside the publications
□ automatic linking of different pieces of information obtained through TDM services
Content sources (indicative list):
□ AGRIS bibliographic database
□ agriculture-specific content repositories
□ Foodborne Outbreak Databases
□ ProMED-mail
□ Europe PMC content with attached licensing info
□ GBIF portal
□ Encyclopedia of Life

Thematic Area: Social Sciences
Potential end-users:
□ Social scientists seeking information relevant to their research
□ Content and service providers in the Social Sciences
□ Social science researchers doing surveys
Content-related aspects:
□ disambiguation, recognition, linking and context-sensitive retrieval of key entities
□ information extraction from full texts
□ automatic coding of unstructured answers in surveys
□ making annotations comprehensively re-usable for the scientific community
Content sources (indicative list):
□ The Social Sciences Open Access Repository (SSOAR)
□ Proceedings of the Annual Meetings of the German Society for Sociology (DGS)
□ Social media content relevant to Science research
This information can be used as a basis for the next steps of the methodology of eliciting requirements and more specifically for the initial profiling of the personas of interest to the project. Based on this information, persons with characteristics that fit into one of the aforementioned categories will be engaged in the user elicitation activities of the project, as described in the following sections.
1.3 Overview of project stakeholders
The project consortium consists of partners with long experience in the area of text and data mining that will contribute their expertise in activities related to the elicitation of requirements from the potential end users of the project’s outcomes. The OpenMinTeD project partners can fall under six (6) major categories:
1. Research scientific communities, including EMBL-EBI, EPFL, INRA, CNIO, GESIS, AK and LIBER. These
communities are characterized as the text-mining service consumers and the corresponding partners
are expected to play a key role during the elicitation of infrastructure requirements. These
organizations may also be considered as User partners, who adapt the solutions provided by the
TDM experts in order to work on TDM-powered services for specific user communities. In addition, they
have close connections with the communities that they serve through their added-value services.
2. Text & Data Mining (TDM) experts, including ARC, UNIMAN, UKP-TUDA, USFD, INRA and CNIO,
leaders in the text-mining domain, building and maintaining systems that actually serve customers.
They are the organizations that use the infrastructure available for their TDM-powered solutions and
related infrastructure.
3. e-Infrastructure providers, with expertise in building and supporting e-infrastructures that can be used
(and are actually used) for the services to be developed by the project. Project partners ARC, USFD,
UKP-TUDA, AK and GRNET are included in this category.
4. Content providers that are providing access to Open Access content. ARC, AK, GESIS and Frontiers
are included in this category.
5. Legal experts that share their expertise on legal aspects such as international copyright and
intellectual property laws. UvA and ARC are the partners included in this category.
6. Industry: represented by AK which provides the business perspective contributing to the sustainability
and exploitation of the project results, defining appropriate business models for SMEs using the
OpenMinTeD infrastructure. AK is also considered a User partner.
This combination of different partner types allows the OpenMinTeD project to provide a complete set of TDM-powered solutions that will meet the needs of a wide variety of stakeholders.
In addition, the project has already identified a number of potential stakeholders that will benefit from the outcomes of the project. These stakeholder groups can be summarized as follows:
Small publishers, scholarly societies and repositories that would benefit from content that is more
visible and therefore more widely used by the researchers.
Text mining and language researchers that would benefit from their interaction with and contribution
to the development of text mining services and various types of related software resources.
Reference tools and social networks used by researchers that will make use of the TDM-powered
services of the project.
SMEs that will use the OpenMinTeD specifications and services to find services or compile them into more
advanced or innovative products.
Funding bodies that will be able to raise awareness of the need for TDM infrastructure services and
policy harmonization.
Research Communities, such as ESFRI projects6, that will be able to integrate TDM and open science
awareness into their training material.
The definition of the expected user communities for the project will allow the engagement of targeted users of interest for the project and its expected outcomes.
Of specific interest to the activities described in this document are the partners who are actively involved in research scientific communities, i.e. the text-mining service consumers; these partners are expected to play a key role during the elicitation of infrastructure requirements. These project-induced communities will be the ones providing the requirements for designing the TDM-powered solutions of the project, as well as adapting these solutions at a later stage. The project partners that fall into this category are the following:
6 https://ec.europa.eu/research/infrastructures/index_en.cfm?pg=esfri
EMBL-EBI: EMBL-EBI’s Cheminformatics and metabolism team provides the biomedical community with information on metabolism through the development and maintenance of MetaboLights, a metabolomics database and archive and ChEBI, EMBL-EBI’s database and ontology of chemical entities of biological interest.
École Polytechnique Fédérale de Lausanne (EPFL): The Neuroinformatics team at EPFL, associated with the Human Brain Project, the International Neuroinformatics Coordinating Facility and Neuroscience Information Framework, is developing KnowledgeSpace, a community-based semantic wiki for living review articles providing semantically linked data and employing curated and community contributed vocabularies and ontologies.
Institut National de la Recherche Agronomique (INRA): INRA carries out mission-oriented research for high-quality and healthy foods, competitive and sustainable agriculture and a preserved and valorised environment. It produces and enables access to knowledge to the international community of researchers and practitioners in agriculture but also towards policy makers and society.
Agro-Know (AK): Agro-Know is an SME that provides meaningful services on top of open data in the agrifood sector. Agro-Know has long experience in data and metadata management and is strongly involved in various agrifood research communities worldwide. Through a strategic partnership with the Food & Agriculture Organization (FAO) of the United Nations, the Chinese Academy of Agricultural Sciences, and the ARIADNE Foundation, Agro-Know hosts the Data Processing Unit of the traditional AGRIS service, a global information system that collects bibliography on agricultural science and technology and provides access to more than 8M bibliographic records.
Spanish National Cancer Research Centre (CNIO): CNIO is one of the main organizers and initial founding members of the international BioCreative7 challenge initiative, the main initiative to evaluate and promote the implementation of text mining systems applied to life sciences.
Gesellschaft Sozialwissenschaftlicher Infrastruktureinrichtungen (GESIS): GESIS, the Leibniz-Institute for the Social Sciences, is the largest infrastructure institution for the Social Sciences in Germany, offering empirical social researchers services for the various phases of the research data cycle.
Ligue des Bibliothèques Européennes de Recherche – Association of European Research Libraries (LIBER): LIBER is the principal association of the major research libraries of Europe and, along with ARC (through OpenAIRE), represents the scholarly communication domain.
1.4 Key milestones of requirements analysis
In order to work out a concise requirements elicitation plan, the scheduling of the different project activities needs to be taken into account. The following table presents the requirements-related tasks, together with the related milestones and deliverables that are due during the project and their respective deadlines. A detailed plan is given in Section 5.
7 www.biocreative.org
TABLE 2: KEY MILESTONES AND OUTCOMES TIMEFRAME
Key Milestones Leader Delivery Date
T4.1 Requirements elicitation (M1-14) AK 31/07/2016
D4.1 Requirements methodology AK 31/07/2015
MS14 Interim Requirements Report AK 30/11/2015
T4.2 Requirements analysis and harmonization (M3-16) AK 30/09/2016
MS19 Interim Requirements Analysis Report AK 29/02/2016
D4.2 Community Requirement Analysis Report AK 31/05/2016
D4.3 OpenMinTeD functional specifications ARC 31/07/2016
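The relation between project month numbers and calendar deadlines in Table 2 can be illustrated with a short sketch. It assumes that project month 1 (M1) corresponds to June 2015; this start month is an assumption that is consistent with the end dates given for T4.1 (M14) and T4.2 (M16) but is not stated in this section. The function name is hypothetical.

```python
import calendar
from datetime import date

# Assumption: project month 1 (M1) is June 2015 (inferred from Table 2, not stated).
PROJECT_START = (2015, 6)


def end_of_project_month(month_number: int) -> date:
    """Return the last calendar day of the given project month (M1, M2, ...)."""
    if month_number < 1:
        raise ValueError("project months are numbered from 1")
    start_year, start_month = PROJECT_START
    # Zero-based count of calendar months since January of the start year.
    offset = (start_month - 1) + (month_number - 1)
    year = start_year + offset // 12
    month = offset % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


# T4.1 ends at M14 and T4.2 ends at M16 according to Table 2.
for label, month_number in [("T4.1", 14), ("T4.2", 16)]:
    print(label, end_of_project_month(month_number).strftime("%d/%m/%Y"))
```

Under this assumption the computed dates match the delivery dates listed for the two tasks.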
2| DESIRED OUTCOMES
The methodology described in this document aims to provide the means for extracting information from potential stakeholders of the project, as they have been already identified. The requirements extracted from these stakeholders will be used for shaping the expected outcomes of the project in the form of e-infrastructure, text and data mining services as well as TDM-powered services for the end users. This chapter aims to provide a description of what each one of these expected outcomes should look like, facilitating the work of the project partners responsible for this task.
2.1 Profiling user personas
The first step in the proposed methodology is the identification of the different user profiles (personas) of interest to the OpenMinTeD project. The project has already identified a number of user groups that are expected to benefit from the envisaged services of the project. These groups are the following:
Researchers: people actively involved in (i.e. conducting) scientific research.
Curators: usually the content specialists who are responsible for an institution's collections (digital
and analogue).
Text Miners: scientists working on deriving high-quality information from text.
Service Developers: software developers working on the development of services and the integration
of the text- and data- mining services.
Publishers: Entities (persons or organizations) that produce and distribute publications such as
journals and books in printed or digital form.
People involved in Libraries and Databases, such as librarians, information scientists, knowledge and
information managers, content and repository managers etc.
Research Communities that engage researchers working on topics of common interest.
SMEs, referring to companies with a limited number of employees that are working on topics of
interest to the OpenMinTeD project.
Each one of these groups consists of one or more user types, such as those described in the previous Chapter. A persona is a fictional character created to represent a user type that may use a service; more precisely, it is a representation of the behavior of a hypothesized but validated group of users. The profile of a persona is compiled from information extracted through interviews with real users; this information is collected, evaluated and validated before it is merged with related information from interviews with other users with similar characteristics. The compiled information is used for developing the profile of a persona, which includes behavior patterns, skills, attitudes, and other attributes. The use of personas is common in workflows related to product development and validation, as it allows the design of a product based on the needs assigned to a specific persona.
In the case of OpenMinTeD, the following attributes are used for the development of a persona:
Persona description and demographics;
Data types of interest to each persona as well as data types already being used (own data or data
from external sources) by the specific persona;
Top challenges (data-related issues) faced by each persona regarding improving access to research
data;
Features of the envisaged solutions for the specific community.
These different types of information are analysed in the following sections.
For the recording of these requirements, a template called Persona Graph is used. This allows the uniform description of different personas by different project partners involved in this process. The following figure presents an example of an empty Persona Graph with some initial questions to be answered.
FIGURE 2: EMPTY PERSONA GRAPH WITH SOME INITIAL QUESTIONS TO BE ANSWERED
The Persona Graph is based on an adapted version of the Lean Canvas, as it was presented by Maurya (2012). The Lean Canvas is a template that aims to include information related to the development of a new product and its business model. It focuses on addressing broad customer problems and solutions and delivering them to customer segments through a unique value proposition. The version used in the requirements’ elicitation methodology is adapted in a way
that allows the collection of information focused on the project’s expected developments, without the need of collecting additional information or going into too many details that would not be substantial for the project.
A Persona Graph contains information that is collected through the use of the online questionnaire, interviews or through the focused workshops, as described in the next sections. The first version of the Persona Graph is usually hand-drawn, using markers and paper, as shown in the following figure.
FIGURE 3: A PERSONA GRAPH PRODUCED USING MARKERS AND A FLIPCHART SHEET DURING THE OPENMINTED KICK-OFF MEETING
The prioritization of the information recorded in each Persona Graph, based on the importance of each statement according to the users, is of major importance. To this end, in the boxes related to the Data Requirements, Information Challenges and Solution Features, the statements that are most important (according to the users) should be recorded at the top of the list and the least important at the bottom.
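For partners who wish to keep Persona Graphs in electronic form, the prioritized boxes could be represented as ordered lists. The following is a minimal sketch under that assumption; the class, field and method names are hypothetical and are not part of the Persona Graph template, and the ranking shown is purely illustrative (the two statements are taken from the Agriculture / Biodiversity challenges recorded at the kick-off meeting).

```python
from dataclasses import dataclass, field

# Boxes of the Persona Graph that hold prioritized statements.
RANKED_BOXES = ("data_requirements", "information_challenges", "solution_features")


@dataclass
class PersonaGraph:
    """Hypothetical container mirroring the boxes of a Persona Graph."""
    name: str
    description: str
    # Each list is kept in priority order: index 0 holds the statement that
    # is most important according to the users, the last index the least.
    data_requirements: list[str] = field(default_factory=list)
    information_challenges: list[str] = field(default_factory=list)
    solution_features: list[str] = field(default_factory=list)

    def add_ranked(self, box: str, statement: str, rank: int) -> None:
        """Insert a statement into a box at a 1-based priority rank."""
        if box not in RANKED_BOXES:
            raise ValueError(f"unknown box: {box}")
        entries = getattr(self, box)
        # Clamp the rank so out-of-range values land at the top or bottom.
        position = max(0, min(rank - 1, len(entries)))
        entries.insert(position, statement)


wheat = PersonaGraph(name="Wheat researcher", description="Agriculture / Wheat use case")
wheat.add_ranked(
    "information_challenges",
    "No user-friendly tools to search, visualize and check/correct accuracy of info",
    rank=1,
)
# Illustrative ranking only: this statement is placed above the previous one.
wheat.add_ranked("information_challenges", "Raw data & publications are always separate", rank=1)
print(wheat.information_challenges[0])
```

Keeping the order inside the list itself, rather than in a separate score, mirrors the hand-drawn graph, where position in the box is the only indication of importance.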
The Persona Graph template is provided in the Annex of this document. The following figure shows how the information collected during the workshop was transferred from the handwritten form to the template.
FIGURE 4: EXAMPLE OF A COMPLETED PERSONA GRAPH FOR THE AGRICULTURE / WHEAT USE CASE, FROM THE OPENMINTED KICK-OFF MEETING
2.2 Content analysis
Content analysis is the second step of the proposed workflow. It refers to the process that aims to identify, record and analyze the characteristics of the content of interest as well as the content already in use for the personas identified and studied in this task. The following parameters are included in the content analysis:
Content type (e.g. publications, datasets, presentations, maps etc.);
Content format (e.g. PDF/DOC files, PPTs, ZIP files etc.);
Content volume (number of resources or size in MB/GB);
Content source (database, repository, website or web portal etc.) and interoperability options (e.g.
OAI-PMH, API, RSS, SPARQL endpoint etc.);
Content Language (referring to the language of the content and its associated metadata);
Intellectual Property Rights (IPR) and licensing information, referring to the rights to use, reuse, adapt and
redistribute, among others.
The detailed analysis of the content identified through this process will allow the optimization of the project’s services so that they will meet the specific requirements of specific communities making use of specific content.
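The content-analysis parameters listed above could be recorded in a uniform structure, so that gaps in the description of a content source become visible. The sketch below is one possible shape for such a record; the class, its field names and the helper function are hypothetical. The AGRIS values are limited to what this document states (a bibliographic database of more than 8M records), and all other parameters are deliberately left empty.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ContentSource:
    """Hypothetical record covering the content-analysis parameters."""
    name: str
    content_types: tuple[str, ...]      # e.g. publications, datasets, maps
    formats: tuple[str, ...]            # e.g. PDF, DOC, PPT, ZIP
    volume_records: Optional[int]       # number of resources, if known
    source_kind: str                    # database, repository, web portal, ...
    interoperability: tuple[str, ...]   # e.g. OAI-PMH, API, RSS, SPARQL
    languages: tuple[str, ...]          # languages of content and metadata
    licence: Optional[str]              # None when no licence is recorded


def missing_parameters(source: ContentSource) -> list[str]:
    """List the parameters that still need to be collected for a source."""
    missing = []
    for f in fields(source):
        value = getattr(source, f.name)
        if value is None or value == ():
            missing.append(f.name)
    return missing


agris = ContentSource(
    name="AGRIS",
    content_types=("bibliographic records",),
    formats=(),
    volume_records=8_000_000,  # lower bound: "more than 8M" records
    source_kind="database",
    interoperability=(),
    languages=(),
    licence=None,
)
print(missing_parameters(agris))
```

The list of missing parameters can then serve as a checklist for the interviews and workshops in which the remaining information is collected.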
FIGURE 5: IDENTIFICATION OF CONTENT / DATA REQUIREMENTS FOR THE WHEAT RESEARCHER PERSONA, THROUGH THE PERSONA GRAPH
The results of the preliminary analysis so far, through the description of the envisaged use cases, show that the vast majority of content of interest for the purposes of the project is indeed in the form of text. The following table provides information on the content sources for each one of the thematic areas of interest to the OpenMinTeD project.
TABLE 3: CONTENT SOURCES OF INTEREST PER THEMATIC AREA
Thematic area Content sources (indicative list)
Scholarly Communication
Europe PMC content with attached licensing info, of which some is restrictive.
Publisher content with restricted licenses, open publisher content with CC licenses (PLOS, Frontiers, Copernicus, etc.).
OpenAIRE and CORE repositories content with no specific licences.
Publishers willing to provide content abiding to the legal specs (Frontiers, PLOS, etc.).
Researcher networks (Mendeley, Frontiers).
European Patent Office data (www.epo.org) to derive links to scientific publications or supporting data
Life Sciences
Europe PMC content with attached licensing
Publisher content with restricted licenses
Open publisher content with CC licenses
Frontiers publications and researcher network
Agriculture/Biodiversity
Foodborne Outbreak Databases.
ProMED-mail
Europe PMC content with attached licensing info
GBIF database
FAO AGRIS
Social Sciences
The Social Sciences Open Access Repository (SSOAR)
LeibnizOpen: OA publications of Leibniz institutions researchers
Proceedings of the Annual Meetings of the German Society for Sociology (DGS)
Additional content sources and types will be identified through the profiling of the user personas using the methodology described in this document.
During the OpenMinTeD kick-off meeting, the outcomes of three of the envisaged use cases in terms of content / data requirements were collected and presented. These requirements are presented in the following table.
TABLE 4: INITIAL CONTENT REQUIREMENTS AS IDENTIFIED DURING THE OPENMINTED KICK-OFF MEETING
Use Case Content Requirements
Biocuration
Access full text biomedical literature (also pre-publication)
Linked information (interdisciplinary)
Exploration tools to navigate & discover content from different sources
Access to other domain experts
Community sharing/ Social tools
Scholarly Communication
Primary sources
Body
Articles
Dissertations
Multimedia
Metadata
Licenses
Agriculture / Wheat
Needs to link raw data with publications
Purpose: find similar/complementary info in both types of sources
Needs knowledge about previous work on wheat phenotype and environmental info, including
Traits of plants (e.g. disease resistance)
Varieties
Culture conditions (cultivation)
Genetic info
Needs complementary info/ surrounding info on related topics that affect wheat yields beyond her own expertise
Social Sciences N/A
It should be noted that during the OpenMinTeD kick-off meeting, the use cases of Social Sciences were not introduced.
2.3 Information-related challenges and problems
One of the aspects of the definition of each persona and the identification of its attributes is the identification of information- and content-related challenges and problems that a specific persona is facing. These problems may have to do with identifying, retrieving, accessing and managing the content of interest, among others.
The identification of these content-related issues will allow the project partners to work on services that will address these issues and provide meaningful, TDM-powered solutions. These solutions may also be proposed by participants of the project’s user requirements’ workshops or through interviews.
When recording the information in the corresponding box of the Persona Graph, it is important to indicate the importance of each entry by prioritizing the entries in the list. To this end, the most important statement (according to the users) should be listed at the top of the list and the least important one at the bottom.
FIGURE 6: IDENTIFICATION OF INFORMATION CHALLENGES FOR THE WHEAT RESEARCHER PERSONA, THROUGH THE PERSONA GRAPH
TABLE 5: INFORMATION-RELATED CHALLENGES AS IDENTIFIED DURING THE OPENMINTED KICK-OFF MEETING
Thematic area Information-related challenges (indicative list)
Scholarly Communication
Assigning & curating metadata
How to describe license
How to define the link between text and metadata
Figure out licensing
Technology updates
Exposing metadata to many aggregators
Too many standards
Create relationship to publishers
Visibility
Life Sciences
Literature is copyrighted; needs clear indicators on the copyrights of papers
Information scattered throughout different papers & different sources
Extracting different dimensions of information
Knowledge hidden in tables, figures and supplementary data
Filter different types of mentions of common entities to create new use cases
Agriculture / Biodiversity
Raw data & publications are always separate
Info expressed in many ways – dispersed in silos
Challenge is to identify, to normalize and integrate it in existing knowledge
No user-friendly tools to search, visualize and check/ correct accuracy of info
Has the necessary subscriptions but no way/ not allowed to download and share the publications. No software/ legal barriers
Bioinformaticians are not text mining people; are not interested
Proper use of TDM would save time and money
Needs to filter non-relevant journals/ wheat varieties that are commercialized in her country
Social Sciences N/A
It should be noted that during the OpenMinTeD kick-off meeting, the use cases of Social Sciences were not introduced.
2.4 Description of information services & systems
The OpenMinTeD project has compiled a list of available information services and systems that will serve as use cases and will be enhanced through the integration of the OpenMinTeD services. Additional ones are expected to be identified during the user requirements elicitation. Both existing and new information services and systems need to be described in detail by the expected end users of the project’s outcomes, through the interactive sessions of the related workshops to be organized by the project as well as through interviews and other means of extracting requirements. These services and systems are expected to be enhanced with TDM-powered functionalities that will improve the experience of their end users.
An example of such a platform is AGRIS8, a large bibliographic database of more than 8M bibliographic records. These records are already linked to various external sources through the use of the AGROVOC thesaurus as the backbone of its linked data approach, allowing users to retrieve related resources from these external sources. The AGRIS use case aims to allow users of the platform to retrieve, through semantically enriched, TDM-powered services, components of research publications, such as images, charts and datasets, which are currently embedded in the document.
8 http://agris.fao.org
FIGURE 7: IDENTIFICATION OF SOLUTION FEATURES FOR THE WHEAT RESEARCHER PERSONA, THROUGH THE PERSONA GRAPH
TABLE 6: SOLUTION FEATURES AS IDENTIFIED DURING THE OPENMINTED KICK-OFF MEETING
Thematic area Solution features (indicative list)
Scholarly Communication
Service to provide & curate metadata
Service to automatically extract the funding source & the license type
Automatic linking to knowledge bases (entities), datasets, software code
Interfacing to intelligent services, e.g. semantic search, elsewhere
Training activities as part of dissemination
Identifying related content from everywhere
Automatically disambiguating authors’ names
Life Sciences
Repository of copyrighted material, use of Open Data, integrated OMICS
Dynamic & Interactive interface with Text Mining Workflows integrated to Biocuration Workflows
Community Curating Tool
Crowdsourcing of info & community involvement
Dynamic filtering, relevance detection, provenance info
Agriculture / Biodiversity
Service that connects data & publications presented in a user-friendly way
Possibility to query in a specific way or browsing through knowledge base without using exact term
Measurement of quality/ citation impact
Possibility to run bioinformatics tools with the data extracted from text and databases
Should have alternative means to check validity
Interface to check and correct extracted data going back to the text
Data is accessible to everyone
Social Sciences N/A
It should be noted that during the OpenMinTeD kick-off meeting, the use cases of Social Sciences were not introduced.
2.5 Description of current usage scenarios of information services & systems
A set of requirements of high importance to the project relates to the interaction between the user and the information services and systems. The way that users interact with such services needs to be described in detail and depicted as a set of steps constituting a workflow. This information will allow a better understanding of the activities of a user when using a service or a system and therefore a better identification of his/her requirements and of the steps that could be improved or enhanced with the use of the OpenMinTeD services.
2.6 Design interface wireframes
During the interactive sessions of the workshops to be organized by the project for the elicitation of user requirements, participants will be asked to share their ideas on the integration of the new, envisaged service or feature powered by text mining in a new or existing website or repository. This design refers to the placement of the new service box or button in a way that the user will feel comfortable with, e.g. not interfering with the current workflow of the user. These requirements will drive the User Interface (UI) design that should be considered and implemented by the corresponding project partners. The wireframes that will result from the workshops will be in the form of drawings on flip chart paper. The final wireframes will be more polished and will be created by the partners using specific software tools (see example in Figure 8).
FIGURE 8: WIREFRAME PRODUCED FOR THE AGRICULTURE / WHEAT USE CASE USING A SOFTWARE TOOL, DURING THE OPENMINTED KICK-OFF MEETING
2.7 Description of user-envisaged usage workflows
After describing the content-related issues and proposing potential solutions that would address these issues, usage workflows related to the use of these envisaged services need to be described. More specifically, potential users of the OpenMinTeD services (e.g. interviewees or participants of the project’s workshops) will be asked to describe the interaction between the users and these services in the form of a usage workflow.
A usage workflow refers to the description of all the activities that take place for a user completing a specified task; more precisely, a usage workflow is defined as a use case drawn out into a step-by-step procedure, sometimes accompanied by a flowchart. For example, if a user wants to extract a figure from a research publication, he/she would have to go through a number of steps, which could be more or less the following:
The user visits a content repository;
He/she uses a search term for retrieving publications related to his/her work;
He/she retrieves a number of results;
He/she filters the more relevant results through the use of filters;
He/she selects a specific publication;
He/she goes through the publication and identifies an interesting reference;
He/she clicks on a button allowing him/her to retrieve only the figures of this publication.
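The steps above can also be recorded as an ordered structure, which makes it easier to compare the workflows described by different users. The following is a minimal sketch of such a representation; the class and function names are hypothetical, and the step texts paraphrase the example given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One activity in a usage workflow, with its position in the sequence."""
    order: int
    actor: str
    action: str


def build_workflow(actions: list[str], actor: str = "user") -> list[Step]:
    """Number the actions in the order given, starting from 1."""
    return [Step(index, actor, action) for index, action in enumerate(actions, start=1)]


figure_extraction = build_workflow([
    "visits a content repository",
    "uses a search term to retrieve publications related to his/her work",
    "retrieves a number of results",
    "narrows down the results through the use of filters",
    "selects a specific publication",
    "goes through the publication and identifies an interesting reference",
    "clicks on a button to retrieve only the figures of this publication",
])

for step in figure_extraction:
    print(f"{step.order}. The {step.actor} {step.action}")
```

Storing the order explicitly keeps the logical sequence intact even if steps are later filtered, for instance to show only the steps that an OpenMinTeD service would enhance.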
FIGURE 9: PROCESS WORKFLOW FOR THE AGRICULTURE/WHEAT USE CASE, DRAWN DURING THE OPENMINTED KICK-OFF MEETING
All these steps need to be recorded by the users in detail and in a logical order. This will allow the OpenMinTeD project partners to provide users with a set of functionalities that will meet their expectations, taking them into consideration as User Experience (UX) design requirements.
FIGURE 10: WORKFLOW & WIREFRAMES FOR THE AGRICULTURE / WHEAT USE CASE, FROM THE KICK-OFF MEETING
2.8 List resulting requirements for TDM-powered outcomes
services
The feedback and requirements gathered through the interviews, workshops and other means organized for this purpose will have to be compiled, organized and validated before they are used for the development of the corresponding OpenMinTeD TDM-powered outcomes, such as e-infrastructure, text mining and final online services for end users. All additional related information collected through the aforementioned means will also have to be transformed into the corresponding requirements. The resulting requirements will be classified according to their topic, in the following categories:
Requirements for the definition and optimal description of the user persona (referring to the demographics of the persona);
Requirements related to the content sources already used or of interest to each persona;
Requirements related to the content-related issues of each persona;
Requirements related to the expected TDM-powered services of each persona;
Requirements related to the interaction of a user with the new service(s);
Requirements related to the integration of each envisaged service in an existing or new website, portal and other content source.
The requirements collected and validated through the processes described in the following chapters will drive the development of the project’s TDM-powered services both at functionality and at user interface level.
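The classification above can be sketched as a small data model. This is a minimal illustration, assuming requirements are kept as simple records; the identifiers (`Category`, `Requirement`, `group_validated`) and the sample personas and statements are invented for the example and are not project data.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # The six topics listed above; the identifiers are illustrative.
    PERSONA_PROFILE = "definition and description of the user persona"
    CONTENT_SOURCES = "content sources used by or of interest to the persona"
    CONTENT_ISSUES = "content-related issues of the persona"
    EXPECTED_SERVICES = "expected TDM-powered services"
    USER_INTERACTION = "interaction of a user with the new service(s)"
    SERVICE_INTEGRATION = "integration of the service in a website, portal or other source"


@dataclass
class Requirement:
    persona: str
    category: Category
    text: str
    validated: bool = False  # set to True once confirmed, e.g. in a workshop


def group_validated(requirements):
    """Return only the validated requirements, grouped by category."""
    grouped = defaultdict(list)
    for req in requirements:
        if req.validated:
            grouped[req.category].append(req)
    return dict(grouped)


# Invented sample records, for illustration only.
sample = [
    Requirement("Researcher", Category.CONTENT_ISSUES,
                "Difficulty to locate figures in publications", validated=True),
    Requirement("Researcher", Category.EXPECTED_SERVICES,
                "I would like to retrieve only the figures of a publication", validated=True),
    Requirement("Librarian", Category.CONTENT_SOURCES,
                "Uses institutional repositories"),
]
by_category = group_validated(sample)
```

Keeping the validation flag on each record means that unvalidated feedback stays in the data set but does not reach the technical partners until it has been confirmed.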
3| METHODOLOGY
Project partners involved in tasks related to the elicitation of requirements will have to organize and implement a number of interviews with specific stakeholders that fall into the user groups identified by the project, as described earlier in this document. The initial profiling of the personas may take place in the form of customized online questionnaires or interviews with users featuring the characteristics of interest to the OpenMinTeD project. In both cases, the same set of predefined questions should be used; in the former, the responses are provided by the anonymous user himself/herself while in the latter the responses are recorded by the facilitator (project partner).
The validation of this feedback and elaboration on the responses collected will be achieved during dedicated, focused workshops that aim to engage different types of users (personas) and allow them to work closely on the description of solutions to the issues that each persona is facing. During these workshops, participants will be asked to describe each persona in detail, providing information on its demographics, content-related sources and content-related issues that need to be addressed (per persona). In addition, during the workshop, they will be asked to sketch how an initial design would look in the form of wireframes, using paper and markers.
The next step of the process has to do with the validation of this initial feedback (referring to both the persona and the wireframes) with feedback acquired through online questionnaires, interviews & workshops. Only after this information has been validated can the project partners involved in this task confirm the suggested personas and design the wireframes using an appropriate software tool.
The proposed methodology consists of the following steps:
1. Initial profiling of a user persona
a. A generic online questionnaire is developed for the collection of initial requirements and the
initial profiling of a persona.
b. The generic questionnaire is adapted by the project partners responsible, in order for it to
meet the specific needs of each user community.
i. E.g. by translating the survey questionnaire (all results, i.e. the report and the interviewees’ answers, should be in English)
c. The adapted questionnaires are used for the initial profiling of each persona for each use
case described in this document.
i. one adapted questionnaire for each use case
ii. partners distribute the survey questionnaire, using their own means
d. Feedback is collected from the online questionnaires by the corresponding project partners.
e. Profiling can also take place through interviews (face to face, through Skype or phone call)
2. Validation of the collected requirements
a. Workshops are organized for the validation of the feedback collected regarding the
personas.
b. New information regarding the personas is acquired.
c. Feedback from each workshop is collected through reporting
d. Validation can also take place through questionnaires
3. Brainstorming on new TDM-powered services, in terms of functionalities, user interface (wireframes)
and interaction between the user and the services (usage workflows)
4. Transformation of validated information and user requirements to requirements that will allow
technical partners of the project to work on the TDM-powered outcomes to serve the corresponding
user communities and personas.
Reports from all workshops are used for the analysis of requirements that will drive the development of services by the project partners
3.1 Initial profiling of targeted personas
The initial profiling of personas, including content sources of interest and relevance to them as well as information on content-related challenges and problems that these personas face is probably the most important aspect of this work. The information extracted from these personas will have to be transformed to requirements that will drive the development of the project’s services.
3.1.1 Using online questionnaires
For the initial profiling of targeted personas, the use of online questionnaires is proposed, as they provide a free, efficient and quick means of collecting requirements, with only limited effort required for setting up the initial questionnaire. While any online survey tool can be used for acquiring requirements, Google Forms9 is suggested for this purpose: it is a free, widely used tool that allows automatic export of responses to a spreadsheet and easy processing in this form. In addition, a Google Form can be easily and collaboratively revised by team members working on a specific use case, and adapted and reused with modifications for serving different needs, requiring only a valid Gmail address. However, alternative tools such as LimeSurvey10 and SurveyMonkey11 may also be considered for this purpose. For example, Agro-Know plans to use the integrated Drupal form functionality of its Agro-Know Stem platform12.
FIGURE 11: EXAMPLE DRUPAL-BASED FORM FOR COLLECTING RESPONSES: THE AGRIS USE CASE
9 https://www.google.com/forms/about
10 https://www.limesurvey.org
11 https://www.surveymonkey.com
12 for example see http://www.akstem.com/agris
What is important is that the responses collected through any online questionnaire are exported and stored as a spreadsheet file, which will allow them to be processed and analyzed for the extraction of requirements.
STRUCTURE OF THE ONLINE QUESTIONNAIRE
An initial, generic online questionnaire in the form of a Google Form is available online13 and accessible upon request. Partners working on specific use cases for the project are encouraged to use this questionnaire and adapt it in order to meet the specific needs of the communities involved in the corresponding use cases. In this context, online copies of this questionnaire may be provided to partners working on this task, so that it can be freely adapted.
FIGURE 12: PARTIAL VIEW OF THE OPENMINTED GENERIC QUESTIONNAIRE
The structure of this generic questionnaire is the following:
13 https://docs.google.com/forms/d/1_OBM6cAl28MdhMSMU20xo2TePwses_JGhpnNoZitG2k/viewform?usp=send_form
Questions 1-7 refer to demographics, aiming to provide a generic profile of the persona;
Questions 8-10 refer to the content / data of relevance and interest to the specific persona;
Question 11 refers to content-specific challenges faced by the persona. Each statement under this question should start with “Difficulty to...” or “Lack of...”. This question also aims to extract the importance of each one of the challenges reported by the person completing the survey, using a scale with 5 values ranging from Very important to Not important at all.
Question 12 refers to the suggestion of potential solutions for addressing the content-specific challenges and enhancing existing content portals and websites. Each statement under this question should start with “I would like to...”. This question also aims to extract the importance of each one of the solutions suggested by the person completing the survey, using a scale with 5 values ranging from Very important to Not important at all.
Although each partner is encouraged to adapt the questionnaire so that it better reflects the requirements of their specific communities, it is important that the structure of the questionnaire (i.e. the four different sections) is maintained.
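The structure described above can be expressed as a short sketch, which may help when checking an adapted questionnaire or ranking the answers to Questions 11 and 12. The section names, function names and the three middle labels of the importance scale are assumptions made for this example; the text only fixes the two end points of the scale.

```python
REQUIRED_SECTIONS = ["demographics", "content", "challenges", "solutions"]

# Only "Very important" and "Not important at all" come from the text;
# the three middle labels are assumed for this sketch.
IMPORTANCE_SCALE = {
    "Very important": 5,
    "Important": 4,
    "Moderately important": 3,
    "Slightly important": 2,
    "Not important at all": 1,
}


def section_of(question_number):
    """Map a question number of the generic questionnaire to its section."""
    if 1 <= question_number <= 7:
        return "demographics"
    if 8 <= question_number <= 10:
        return "content"
    if question_number == 11:
        return "challenges"
    if question_number == 12:
        return "solutions"
    raise ValueError(f"no question {question_number} in the generic questionnaire")


def keeps_structure(sections_in_order):
    """Check that an adapted questionnaire still has the four sections in order.

    sections_in_order holds one section name per question, in questionnaire order.
    """
    seen = []
    for name in sections_in_order:
        if not seen or seen[-1] != name:
            seen.append(name)
    return seen == REQUIRED_SECTIONS


def rank_by_importance(ratings):
    """Sort rated statements from most to least important."""
    return sorted(ratings, key=lambda item: IMPORTANCE_SCALE[ratings[item]], reverse=True)
```

For example, an adapted questionnaire may add or drop demographic questions and `keeps_structure` will still accept it, as long as the four sections appear once each and in the original order.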
CONSIDERATIONS REGARDING THE ONLINE QUESTIONNAIRE
A number of additional aspects should be taken into consideration in order to get the most out of the questionnaire.
For each use case, a minimum of fifteen (15) responses is required. This number refers to the total number of responses collected by any means (e.g. including face-to-face or Skype/call interviews).
The questionnaire can be used either as an online survey by sharing the public URL of the survey (not
the internal one that allows full access to the form) or as a script for direct interviews, i.e. through
Skype, phone calls or even face to face.
The questionnaire can also be used for collecting preliminary feedback and requirements from
registered participants of any of the events planned for collecting requirements (see next section).
Then, participants may elaborate on their initial feedback through the interactive sessions of these
events, which will also allow the validation of the personas and the feedback acquired through them.
The online questionnaire may remain open / active for the collection of as many responses as
possible. It can also be used for the collection of feedback on solutions and features at a later stage,
when the scenarios for each use case will run.
3.1.2 Interviews
Another means for the elicitation of requirements is through interviews. Interviews may be conducted with potential stakeholders, using the same set of questions included in the online questionnaire, in any of the following ways:
face to face
through Skype
through a phone call
Responses collected through the interviews should be recorded either using the online questionnaire (so that the same spreadsheet will contain all the responses) or in a spreadsheet
that contains the same fields as the one provided by the online form, in order to ensure the homogeneity of the templates and facilitate the aggregation of responses from various means.
A minimum of fifteen (15) interviews per use case is expected to provide sufficient feedback for the purposes of this task.
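The aggregation of responses from different means can be sketched as follows. This is a minimal example under stated assumptions: the spreadsheets are exported as CSV, every export uses the same column headers, and the column names `use_case` and `role` are hypothetical. The minimum of 15 is applied to the total number of responses per use case across all means, as stated in the considerations regarding the online questionnaire.

```python
import csv
import io
from collections import Counter

MIN_RESPONSES = 15  # minimum per use case stated in the text


def load_responses(csv_text, channel):
    """Parse one exported spreadsheet (CSV text) and tag each row with its channel."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["channel"] = channel
    return rows


def aggregate(*batches):
    """Merge batches of rows, refusing spreadsheets whose columns differ."""
    merged = []
    expected = None
    for batch in batches:
        for row in batch:
            fields = set(row) - {"channel"}
            if expected is None:
                expected = fields
            elif fields != expected:
                raise ValueError(f"columns differ: {sorted(fields ^ expected)}")
            merged.append(row)
    return merged


def use_cases_below_minimum(rows, minimum=MIN_RESPONSES):
    """Return {use_case: count} for use cases that have too few responses."""
    counts = Counter(row["use_case"] for row in rows)
    return {use_case: n for use_case, n in counts.items() if n < minimum}


# Invented sample data: 12 online responses and 4 interviews for one use case.
online = "use_case,role\n" + "AGRIS,researcher\n" * 12
interviews = "use_case,role\n" + "AGRIS,librarian\n" * 4
rows = aggregate(load_responses(online, "online"),
                 load_responses(interviews, "interview"))
missing = use_cases_below_minimum(rows)
```

Rejecting exports with different columns is one simple way to enforce the homogeneity of the templates before responses are merged.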
3.1.3 Using the Persona Graph
The Persona Graph consists of four (4) different boxes that need to be completed in detail:
1. Persona characteristics (demographics): Includes name, role in team, affiliation, research interests, field
of expertise etc.
2. Data/content related requirements: What data/content types are of interest to the specific persona
and which one is this persona currently using?
3. Key information challenges: Does this persona face any challenges related to identifying, accessing,
retrieving and managing data/content of interest?
4. Features of the solution: What is the expected solution that would help the persona address the
aforementioned challenges?
The following figure provides an example of the information collected from a specific user for building the persona profile.
FIGURE 13: INFORMATION ABOUT A PERSONA TO BE TRANSFERRED TO A PERSONA GRAPH
It should be noted that during the collection of information for each persona, the importance of each need and challenge should also be mentioned and the corresponding input should be prioritized accordingly in the persona profile table. More specifically, the most important statement for each category should be at the top of the list while the least important one at the bottom of the same list.
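The four boxes and the prioritization rule can be sketched as a simple data structure. The class and field names below are hypothetical, and representing importance as an integer (higher means more important) is an assumption of this example rather than part of the Persona Graph template.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    text: str
    importance: int  # higher = more important (assumed numeric scale)


@dataclass
class PersonaGraph:
    characteristics: dict                                      # box 1: demographics
    content_requirements: list = field(default_factory=list)   # box 2
    challenges: list = field(default_factory=list)             # box 3
    solution_features: list = field(default_factory=list)      # box 4

    def prioritized(self):
        """Return a copy in which each box lists the most important statement first."""
        def order(statements):
            return sorted(statements, key=lambda s: s.importance, reverse=True)

        return PersonaGraph(
            characteristics=dict(self.characteristics),
            content_requirements=order(self.content_requirements),
            challenges=order(self.challenges),
            solution_features=order(self.solution_features),
        )


# Invented sample persona, for illustration only.
graph = PersonaGraph(
    characteristics={"role in team": "researcher", "field of expertise": "agronomy"},
    challenges=[
        Statement("Lack of links between publications and datasets", 3),
        Statement("Difficulty to locate figures in publications", 5),
    ],
)
top_challenge = graph.prioritized().challenges[0].text
```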
3.2 Validation of users’ requirements
After the targeted personas have been identified and their initial profiles have been created (including demographics, content-related issues and proposed solutions), they need to be validated in order to ensure their accuracy and eligibility for being used as the basis for the project’s envisaged services. This validation may take place in the form of smaller- or larger-scale events.
Small-scale events include interviews, which can take place either face to face, by phone or online (e.g. Skype calls), as described earlier.
Large-scale events include meetings with teams of stakeholders and even workshops. They can accommodate a large number of participants and should include sessions for teamwork. Such events need to be based on a well-defined agenda that will include at least the following sessions:
1. Introduction to the event, including the methodology to be followed throughout the event;
2. Introduction to the OpenMinTeD project;
3. Presentation of the specific user community and related content-related issues;
4. Interactive session for the profiling of a persona (Persona graph);
5. Interactive session for the design of initial wireframes of the envisaged TDM-powered services (hand-drawn wireframes on paper);
a. It will be useful to include text mining partners in the discussions, in order to guide the participants.
6. Interactive sessions for the definition and design of user-related workflows that will highlight the interaction of the users with the aforementioned services (hand-drawn workflows on paper).
a. It will be useful to include text mining partners in the discussions, in order to guide the participants.
The outcomes of each user requirements’ event should include the following:
1. input per persona in the Persona Graphs provided by the facilitators. It should be noted that in the
case of such events, there should be parallel interactive sessions running per persona. The outcome
should look like the one in Figure 4 of Section 2.5.
2. a visualization of the envisaged services to be developed by OpenMinTeD (per persona) (hand-drawn, using markers and paper)
3. a proposed workflow per persona (hand-drawn, using markers and paper), explaining the use of such a service and the role of different types of users.
Chapter 4 of this document provides detailed information on the preparation, implementation and reporting of events like workshops for the elicitation of user requirements.
3.3 Designing final wireframes with required features for the information services of the use cases
Through the aforementioned means of acquiring feedback from the specific user types, a number of services are expected to be identified, described and analyzed for each persona within each one of the use cases studied by the project. Such brainstorming sessions should take
place in the context of workshops, where different types of potential end users are expected to participate.
As regards the design of wireframes, the following steps should be followed:
1. Initial wireframes will be drawn by the participants of the events (one wireframe per persona),
highlighting the way that these services will be integrated in existing websites, finders and other
sources of content.
2. Additional requirements and feedback collected through interviews or online questionnaire to be
taken into consideration for improving or revising the initial wireframes
3. Refined wireframes to be designed by the OpenMinTeD project partners who are involved in this task,
using software such as Balsamiq14, Pencil Project15 and Mockingbird16. The use of Balsamiq is
proposed but partners are allowed to use any related software they feel more comfortable with, also
taking into consideration the software’s business model.
4. Follow-up interviews to take place for the validation of the proposed wireframes, with selected
representatives of the targeted users’ community (solution validation)
Agro-Know will provide training on the aforementioned steps, focusing on the design of the wireframes and workflows using Balsamiq and other tools, where applicable.
3.4 Translating features into requirements for text mining services and e-infra
The definition and description of the expected TDM-powered services will provide a number of requirements that are expected to shape and drive the development of the OpenMinTeD services. The features of these services will need to be transformed into the corresponding requirements so that the partners of the project will work on and provide the services described by the users.
These features will be collected from the aforementioned means of requirements’ elicitation, including the interviews, the online questionnaires and the workshops to be organized by the project partners for this purpose, and will have to be transformed into requirements to be used by the project’s technical partners for the development of text and data mining services.
14 Desktop app or web app with 30-day trial https://balsamiq.com
15 Open source desktop app http://pencil.evolus.vn
16 Web app https://gomockingbird.com
4| TOOLS AND TEMPLATES
The implementation of the methodology described in this document requires the use of tools for the extraction, collection and organization of the user requirements. In order to facilitate the process, a number of tools and templates have already been developed and shared with the project partners working on the user requirements. These will need to be adapted by the project partners in order to include the specific information required for meeting the requirements of specific user communities.
The following sections provide information on these templates, the full version of which can be found as an Annex to this document.
4.1 Generic questionnaire for user & content profiling
The initial means for the elicitation of requirements is the online questionnaire that will be set up and shared with potential users of the TDM-powered services. In order to facilitate the process, a generic questionnaire was developed, consisting of generic questions that can be revised by each responsible partner in order to meet the requirements of each community.
The generic questionnaire was developed as a Google Form and is available at http://goo.gl/forms/FopxFlNOao. It consists of twelve (12) questions in four (4) groups, as mentioned earlier in this document. The generic questionnaire contains questions based on the AGRIS use case (to be used as examples) as well as comments in each question that facilitate its adaptation; these comments should be removed before publishing and sharing the questionnaire with the users.
FIGURE 14: PART OF THE GENERIC ONLINE QUESTIONNAIRE WITH COMMENTS UNDER EACH QUESTION
The generic questionnaire can also be found in Annex B of this document.
4.2 Example of customized questionnaire for agricultural personas
The first step in the process regarding the questionnaire is the adaptation of the generic questionnaire to each one of the use cases studied by the project. In this context, the first customized questionnaire that was developed for profiling members of a specific community is the one for the AGRIS use case. In this case, the number and structure of the questions were kept intact while the questions were adapted in order to be appropriate for the identification of requirements from the agricultural research and information management community.
FIGURE 15: PART OF THE AGRIS USE CASE ONLINE QUESTIONNAIRE
The specific questionnaire is addressed to the users of the AGRIS platform, who are mostly researchers aiming to retrieve research publications related to their work. In this context, all pre-defined responses have been adapted in order to be among the most prominent for the specific user community.
The questionnaire is currently available at http://goo.gl/forms/c0fHKLNP7d and is ready to be circulated among the potential stakeholders and targeted users of the use case. It may also be found in Annex C of this document.
4.3 Guidelines for the organization of requirements' workshops per user partner (for persona validation & brainstorming)
The methodology proposed in this section is based on the organization and implementation of face-to-face workshops, which will allow the direct communication between the facilitator(s) and
the stakeholders. These workshops can vary in scale, from meetings with fewer than 5 people to larger-scale events where participants work in groups. The planning, organization and implementation of such events, which will engage the stakeholders defined by the project and allow the collection of requirements that will drive the outcomes of the project, is an important task that is analyzed in the following sections.
The basic steps for the extraction and validation of user feedback during a workshop are the following:
1. Sessions during which users will provide their requirements related to the envisaged services to be
provided by the OpenMinTeD project;
2. Design of the proposed envisaged services by users drawing on paper (flip chart paper).
In practice, during a workshop the project partners will only need to collect and record the challenges/issues for each persona, and ask participants to draw potential new services using markers and paper as well as to show how the user (each persona identified) interacts with the service (workflow design).
4.3.1 Practicalities
Basic aspects that should be taken care of for the planning and implementation of a workshop include:
Identification of the target group and selection of people to be invited. This will allow the
adaptation of the material provided in this section and the better organization of the event. The list of
participants for each workshop should take into consideration the personas identified for the specific
use case.
Invitation: After selecting the target group, potential participants should be formally invited through
an invitation letter (probably in the form of an e-mail).
Pre-Workshop Checklist: Before the beginning of the workshop you should make sure that the
following have been developed and are ready for use:
o List of attendees
o Agenda
o Presentations
o Reporting Template
Running the Workshop: A number of practical issues should be examined before the implementation
of the Workshop, in order to ensure its success:
o Internet connection
o PC or laptop with PowerPoint installed for the presentations
o Projector (if needed) for delivering the presentations
o An internet browser, such as Mozilla Firefox, Google Chrome, Opera or Safari
o Flipchart sheets and/or whiteboard for drawing
Reporting: After the end of the workshop, facilitators are expected to provide a small report
including a description of the workshop, the feedback gathered and photos of the event using a pre-
defined template.
4.3.2 Interactive Sessions
There are three envisaged interactive sessions for each workshop:
Interactive Session 1: Identification and definition of the OpenMinTeD personas. For each persona, the following information is required:
1. Persona Demographics
2. Data Types of interest for each persona
3. Top challenges related to accessing research information for the specific persona
4. Proposed solutions for the aforementioned challenges for the specific persona
Interactive Session 2: Design of the envisaged OpenMinTeD-related solution for each persona.
Interactive Session 3: Design of the workflows for each persona, depicting the interaction between the user and the new service.
You can find more information about each one of these sessions in the PPT templates provided for each session in the Annex of this document.
4.3.3 Materials required for the implementation of the event
The organization of an event for eliciting user requirements requires the preparation of a number of materials in the form of agenda, presentations, script for facilitating the implementation of the event etc. The following sections aim to provide the necessary templates of various types of materials that are required for the implementation of an event. It should be noted that these templates are provided as generic material that provides a basic functionality; they can (and should) be adapted in order to meet the specific requirements of each event.
INVITATION TO THE WORKSHOP
A template for the invitation text is available through the project’s Redmine installation17 and also included in the Annex D of this document. This template can be freely adjusted in order to meet the style of the organizer as well as the specific user community to which it will be addressed.
AGENDA OF THE WORKSHOP (TEMPLATE)
The agenda of each event should be developed before the event, in order to ensure that all necessary sessions will be included and that the time available for the event will be allocated in the best possible way. In order to facilitate the development of an agenda, a template is provided. This template can be adapted in each case, in order to meet the specific needs of a specific event, based on the type of stakeholders, number of participants and other factors.
The agenda of the event is expected to include at least the following sessions:
an introduction to the Workshop;
a presentation of the project;
a presentation of the specific use case of interest to the workshop;
a presentation of the methodology to be followed during the workshop;
17 http://redmine.openminted.eu/attachments/download/21/1.%20OpenMinTeD_Invitation_to_workshop.docx
an interactive session for the extraction of requirements for each persona identified;
an interactive session for the visualization (design/drawing) of expected user interfaces for the services
identified in the previous session;
an interactive session for the visualization of the user workflows, i.e. how the user interacts with the new
service;
a number of breaks, depending on the total duration of the event.
The minimum duration of a large-scale event is expected to be around 4.5 hours, which may be extended depending on the number of participants and the availability of experts delivering additional presentations on the topics of text and data mining, among others.
An indicative agenda of a Workshop is available through the project’s Redmine installation18 and also included in Annex E of this document.
SCRIPT, FACILITATION & NOTES
In order to ensure the uninterrupted flow of an event, a script indicating the roles and expected activities of a team of facilitators should be prepared. The script refers to a list of activities that take place before, during and after the event, their expected duration and the people assigned for each activity.
Three major teams are expected to be involved in the preparation and implementation of such an event:
1. Facilitators: They will be responsible for delivering the presentations (at least the ones related to the
project), introducing external speakers and the general overview of the event. They are also
responsible for reporting back on the event.
2. Assistants: A number of people assisting with practicalities will be required. They are the ones
responsible for the documentation of the event (note keeping, taking photos and videos), practical
arrangements (e.g. setting up the venue with laptop, projector, speakers, whiteboards and flipcharts)
as well as for ensuring that participants will have access to whatever needed during the event (e.g.
sheets of paper, markers etc.).
3. TDM experts: A number of experts involved in each use case are welcome to take part and guide the
participants (this is not required). If no TDM experts from the project partners participate in an
event, they will need to help the use case partners develop the final wireframes at the next stage.
An indicative script of a Workshop is available through the project’s Redmine installation19, as well as in the Annex F of this document.
PRESENTATIONS (TEMPLATES)
There are three presentations needed for the basic implementation of an event:
1. A presentation of the project, focusing on the project's aims & objectives of relevance to the audience
of the event;
2. A presentation of the specific use case / user community of interest to the workshop;
18 http://redmine.openminted.eu/attachments/download/3/1.%20OpenMinTeD_Event_agenda.docx
19 http://redmine.openminted.eu/attachments/download/16/2.%20OpenMinTeD_Event_script.xlsx
3. A presentation that will provide information on the methodology to be followed during the event, as
well as the key concepts. A template/example of this use case is available through the project’s
Redmine installation20, as well as in the Annex G of this document.
All three presentations should be revised accordingly in order to be appropriate for the specific audience of the event.
Additional slides will be required for introducing the participants to the concept of the three interactive sessions that are expected to be included in the agenda of the event.
1. Interactive Session I: Identification of requirements. A template/example case is available through the
project’s Redmine installation21, as well as in the Annex H of this document.
2. Interactive Session II: Visualization of solutions. A template/example case is available through the
project’s Redmine installation22, as well as in the Annex I of this document.
3. Interactive Session III: Design of user workflows. A template/example case is available through the
project’s Redmine installation23, as well as in the Annex J of this document.
REPORTING FORM (TEMPLATE)
All partners organizing a workshop are kindly requested to provide a short report about it so that we can promote their work and make the most out of the feedback received. The following should be included per persona for each use case, as a part of the report:
1. Persona Graph: A Persona Graph for each different persona represented in the workshop should be
prepared and collected. An example from the Kick-off meeting for the Agro-Wheat Use Case AS3 is
available online24.
2. Hand-drawn wireframe on paper: An example from the Kick-off meeting for the Agro-Wheat Use
Case AS3 is available online25.
3. Hand-drawn Workflow on paper, depicting the interaction between the user and the new service
described above. An example from the Kick-off meeting for the Agro-Wheat Use Case AS3 is
available online26.
This short report should also include the name of the event during which the workshop was held, some information regarding the event and information about the participants (number, background, main area of work and expertise). Then a short description of the workshop should follow presenting its structure and focusing on the feedback received. Also any other interesting comments and/or ideas from participants should be included, so that feedback received as a result of face-to-face interaction can be collected. Also, presentations made as well as any photos taken should be included in the report.
A template for the reporting form is available through the project’s Redmine installation (footnote 27) and in Annex K of this document.
20 http://redmine.openminted.eu/attachments/download/22/4.%20OpenMinTeD_Workshop_Methodology_template.pptx
21 http://redmine.openminted.eu/attachments/download/17/Interactive%20Session%20I.pptx
22 http://redmine.openminted.eu/attachments/download/19/Interactive%20Session%20II.pptx
23 http://redmine.openminted.eu/attachments/download/18/Interactive%20Session%20III.pptx
24 http://redmine.openminted.eu/attachments/download/12/UseCase3_PersonaGraph.jpg
25 http://redmine.openminted.eu/attachments/download/13/UseCase3_Wireframe_Workflow.jpg
26 http://redmine.openminted.eu/attachments/download/13/UseCase3_Wireframe_Workflow.jpg
27 http://redmine.openminted.eu/attachments/download/24/5.%20OpenMinTeD_Reporting_Template.docx
4.4 Final interface wireframes to revisit & refine per user partner and information service
Each partner responsible for collecting feedback for a specific use case should ensure the collection of requirements and feedback from all available activities (workshops, online questionnaire and/or interviews) and follow the steps described below:
1. Drawing of the proposed wireframes using Balsamiq or any other related software;
2. Drawing of the proposed workflows using PowerPoint.
In both cases, Agro-Know can help by providing training on how these can be implemented.
As a next step, the project partners responsible for this task should validate the outcomes related to the wireframes and the workflows. This can be done by presenting them to stakeholders and potential end users either online (e.g. through Skype) or through focused meetings with a small number of participants (e.g. 5-10 people). During this step, project partners should be able to showcase the envisaged new services and explain their main functionalities. The aim of this step is to understand whether these new services are of interest and use to the end users.
Project partners are kindly requested to provide the Agro-Know team with a list of people that have contributed to the validation of these services (including name, affiliation and email) so that they can be contacted for further information and clarifications, if needed.
5| SCHEDULE OF NEXT STEPS
This chapter presents a plan showing when each requirements analysis step is scheduled to take place and when its results are to be gathered and analysed. Figure 16 presents the Gantt chart of the various steps and deliverables, providing an overview of the process, while Table 7 presents all steps and their results/deliverables, the partners responsible and the respective deadlines.
FIGURE 16: REQUIREMENTS ELICITATION GANTT
[Gantt chart covering August 2015 (M3) to July 2016 (M14). Rows (tasks and deliverables): Initial profiling of targeted personas; Online Questionnaire Customisation and Setup; Online Questionnaire Launch and Data Collection; Validation of users’ requirements; Schedule Interactive Sessions; Run Interactive Sessions; Interim Requirements Report; Design High Fidelity Wireframes; Translate features into requirements and specs for text mining services and e-infra; D4.2 Community Requirement Analysis Report; D4.3 OpenMinTeD functional specifications. Legend: Deliverable, Milestone.]
TABLE 7: IMPORTANT DATES
Tasks / Deliverables | Responsible | Deadline
Generic online questionnaire | AK | 31/07/2015
Online questionnaire customisation and setup | User partners | 31/08/2015
Submission of results from online questionnaire to AK | User partners | 30/09/2015
Schedule 1st round of interactive sessions | User partners | 15/09/2015
Run 1st round of interactive sessions and submit updated personas matrix, initial wireframes and workflows to AK | User partners | 31/10/2015
Interim Requirements Report | AK | 30/11/2015
Schedule 2nd round of interactive sessions | User partners | 15/12/2015
Run 2nd round of interactive sessions and submit updated personas matrix, initial wireframes and workflows to AK | User partners | 15/01/2016
Interim Requirements Analysis Report | AK | 29/02/2016
Design High Fidelity Wireframes | User partners | 29/02/2016
Translate features into requirements for text mining services & e-infra | AK | 30/04/2016
D4.2 Community Requirement Analysis Report | AK | 31/05/2016
D4.3 OpenMinTeD functional specifications | ARC | 31/07/2016
6| CONCLUSIONS
The methodology described in this document is intended to be used for the elicitation of requirements from the targeted end-users. The related work undertaken so far includes the identification of the user communities of interest for the project, as well as the definition of a number of use cases that are expected to be studied by the project. Through these use cases, personas of interest to the project will be engaged in activities related to the elicitation of requirements that will eventually drive the TDM-powered outcomes of the project.
The methodology consists of a number of well-defined steps that should be completed for the successful acquisition of requirements. This series of steps allows the building of the targeted persona as well as the extraction of requirements in the following way:
1. Elicitation of initial feedback using an online questionnaire or interviews:
   a. Initial profiling of a target persona through the online questionnaire, in terms of demographics;
   b. Collection of content-related requirements, such as content-related challenges and sources;
   c. Identification of potential or expected solutions to the aforementioned challenges.
2. Acquisition of feedback regarding:
   a. Functionalities of the services envisaged by the users;
   b. Expected user interface of the new service(s), in the form of hand-drawn wireframes;
   c. Interaction between the users and the new service(s), in the form of hand-drawn workflows / flowcharts.
3. Validation of feedback collected:
   a. During focused workshops, to be attended by various personas included in each use case;
   b. Through interviews or focus groups.
4. Transformation of feedback collected into requirements that can be used for the development of the project’s TDM-powered outcomes.
The results acquired through this process are expected to exhibit significant diversity in terms of terminology, level of familiarity with the proposed technologies and community maturity; at the same time, they are expected to converge in several areas of expectations and needs.
The next steps of this process include the analysis and harmonization of the requirements collected during this process and the subsequent synthesis of the results into a single report that allows:
• categorization of users (typology) and prioritization of the requirements;
• better understanding of end user requirements from the text mining research community;
• better use of available resources at the stage of the applications design and implementation, by reducing the number of systems that can handle user requirements and targeting more sustainable / reusable technology.
These activities are expected to take place in the context of Task T4.2 “Requirements Analysis and Harmonization”, which will start as soon as the elicitation of requirements has been successfully completed.
7| REFERENCES
[1] Bradbeer, J. (1999). “Evaluation”, FDTL Technical Report no. 8, University of Portsmouth.
[2] Farbey, B., Land, F. and Targett, D. (1993). “How to Assess your IT investment: A Study of Methods and Practices”, Oxford: Management Today and Butterworth-Heinemann.
[3] Manouselis, N., Psochios, Y., Marianos, N., Nikoulina, V. and Bosca, A. (2012). Deliverable D7.1 “Evaluation and Validation Plan”, Organic.Lingua ICT-PSP project.
[4] Maurya, A. (2012). Running Lean: Iterate from Plan A to a Plan that Works. O'Reilly Media; 2nd edition (March 9, 2012). ISBN-10: 1449305172.
[5] Ries, E. (2011). The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. Crown Business; 1st Edition (September 13, 2011). ISBN-13: 9780307887894.
[6] Swinkels, F.G. (1997). “Managing the life cycle of Information and Communication Technology investments for added value.” In European Conference on the Evaluation of Information Technology. Delft: Delft University Press.
[7] Willcocks, L. and S. Lester. (1996). “The evaluation and management of information systems investments: from feasibility to routine operations.” In Investing in information systems: evaluation and management, ed. L. Willcocks. London: Chapman & Hall.
8| ANNEX A: PERSONA GRAPH TEMPLATE
Also available as a separate file at the project’s common space.
9| ANNEX B: GENERIC ONLINE QUESTIONNAIRE
Available at:
https://docs.google.com/forms/d/1_OBM6cAl28MdhMSMU20xo2TePwses_JGhpnNoZitG2k
10| ANNEX C: ONLINE QUESTIONNAIRE EXAMPLE (AGRIS USE CASE)
Available at:
https://docs.google.com/forms/d/1iCNCkp5l9k8oBNQbT0ngZaeqyRp2lEl8tLVwPjK_60Q/viewform
11| ANNEX D: INVITATION TO THE WORKSHOP
Also available as a separate file at the project’s common space.
Invitation to attend a workshop on (title of the workshop here)
Dear colleague,
By this email I would like to invite you to participate in the workshop that we will be holding on xx/xx/xx at the (place here) focusing on the identification of user requirements .................:
During this workshop participants will have the chance to get introduced to the OpenMinTeD project which is a European initiative (www.openminted.eu) that aims to enable the creation of an infrastructure that fosters and facilitates the use of text mining technologies in the scientific publications world, builds on existing text mining tools and platforms, and renders them discoverable and interoperable through appropriate registries and a standards-based interoperability layer, respectively.
During the workshop you will also be asked to participate in two interactive sessions aiming to collect requirements regarding the use of text and data mining technologies in the field of [name of field here].
The workshop will also allow participants to express their ideas on the design of the new services and applications that they have in mind, by drawing the user interface of these and sharing ideas on their integration in existing Web interfaces, like Web portals.
We want to better understand the current situation, as well as the needs and requirements of all people involved in research in the fields of [name of field here]. Your opinion is of great value to us. Therefore we would kindly like to ask you for your participation in our workshop.
For more information about the workshop and to let us know whether you are interested in participating please contact (your email here).
On behalf of the organizers,
Kind regards,
[Name of the project partner sending the invitation]
12| ANNEX E: TEMPLATE FOR THE AGENDA OF A WORKSHOP
Also available as a separate file and online at http://redmine.openminted.eu/attachments/download/3/1.%20OpenMinTeD_Event_agenda.docx
OpenMinTeD:
Open Mining INfrastructure for TExt and Data
User Requirements’
Workshop/Meeting Agenda
XX/XX/2015, City, Country
This project is funded under the Horizon 2020 programme, H2020-EINFRA-2014-2 | RIA - Research and Innovation action.
Location & duration
The meeting will be held on Friday, 24th of July 2015, from 14.00 to 18.40 at the [location].
Agenda
Time | Session | Description
14:00-14:10 | Welcome | Setting up & introduction to the event (10 mins)
14:10-14:30 | Introduction to OpenMinTeD | Introduction to the OpenMinTeD project (general, aims & objectives) (20 mins)
14:30-14:50 | About the event | Aims & objectives, and methodology to be followed in the event (20 mins)
14:50-15:10 | Presentation of the specific community | Presentation of the community represented at the Workshop: user types, data sources and data-related issues
15:10-15:20 | Short break |
15:20-16:20 | Interactive Session I | Identification of requirements (persona characteristics, data requirements, challenges, proposed solution)
16:20-17:00 | Interactive Session II | Drawing the wireframes
17:00-17:20 | Coffee break |
17:20-18:00 | Interactive Session III | Design of user workflows
18:00-18:20 | Wrap-up | Overview of the workshop, discussions on the outcomes
Approximate duration of the event: 4.5 hours (duration may vary)
Useful information
Reaching the meeting premises
Please provide short and useful information on how the participants will be able to reach the meeting premises using public transportation or their own means. A map (screenshot or a URL to a Google map) would also be useful in some cases.
FIGURE 17: EXAMPLE OF A MAP GUIDING PARTICIPANTS TO THE MEETING PLACE
Necessary material
Will the participants need something specific with them during the meeting? Will they need e.g. their own laptops or to work on their own ideas before the workshop? If so, please mention that here.
List of Participants
This section will be useful for keeping track of the participants of the event. Alternatively, an online registration form can be used; however, the people that actually participated need to be highlighted.
Name | Organization | Role | Email
13| ANNEX F: SCRIPT FOR RUNNING A WORKSHOP
Also included as a separate file at the project’s common space.
Slot | Time | Basic event activities | Coordinated by (duration) | Time | Support activities | People assigned | Notes
13.30-14.00 | | Event organizers go to the venue to make sure that everything is in place. More specifically: | | | | |
 | 13.40-14.00 | Make sure all the material needed is available | Person 1 (30') | | | Assistant 1 | Refers to all necessary materials (e.g. post-its, whiteboard, flipchart sheets, markers, laptop & projector etc.)
 | 13.30-14.00 | Discuss with local hosts to make sure everything is OK | Person 2 (30') | | | Assistant 2 | Ensure wireless connectivity to the internet, availability of PPTs in laptop, working power outlets, set up projector etc.
 | 13.40-14.00 | Short meeting to go over the entire meeting | Facilitators (20') | | | | Allocate presentations
Event starts | | | | | | |
14.00-14.10 | 14.00-14.10 | Setup, settle down, introduction | Person 1 (10') | 14.00-14.10 | Documentation with photos | Assistant 1 |
 | | | Person 2 (10') | 14.00-14.10 | Helping people to set up | Assistant 2 |
14.10-15.10 (Presentations) | 14.10-14.50 | Introduction to OpenMinTeD & About the event | Facilitators (40') | 14.10-15.10 | Help facilitators switch between PPTs | Assistant 1 |
 | 14.50-15.10 | Facilitators lead the group into a group activity (to be decided) | Facilitators (20') | 14.10-15.10 | Documentation with photos, videos | Assistant 2 |
15.10-15.20 (Short break) | 15.10-15.20 | Short break to stretch feet and get up - not a long coffee break | Person 1 (10') | 15.10 | Announcement of short break | Assistant 1 | Help participants if needed (e.g. smoking area)
15.20-16.20 (Interactive Session I) | 15.20-15.35 | Facilitators explain the concept of the interactive session to the participants | Facilitators (15') | 15.20-16.20 | Organize participants in groups if needed, explain the concept | |
 | 15.35-16.10 | Participants work on the identification & recording of requirements using the templates provided | Participants (60') | 15.20-15.30 | Ensure availability of paper and markers | Assistant 1 (10') |
 | 16.10-16.30 | Announce end of Session I, participants to finalize their work | Facilitators (15') | 16.10-16.20 | Ensure that all groups will finalize their work & documentation of session with photos | Assistant 2 (10') |
 | 16.20 | Upcoming coffee break | Person 1 (30') | 16.00-16.20 | Arrange practical details for coffee breaks coming up | Assistant 1 (20') | Ensure coffee availability, snacks etc.
16.20-17.00 (Interactive Session II) | 16.20-16.30 | Facilitators explain the concept of the interactive session to the participants | Facilitators (10') | 16.20-16.35 | Documentation with photos | Assistant 2 (15') |
 | 16.30-17.00 | Participants work on the drawing/design of wireframes | Participants (30') | 16.20-16.30 | Ensure availability of paper and markers | Assistant 1 (10') |
 | 16.50-17.00 | Announce end of Session II, participants to finalize their work | Facilitators (10') | 16.50-17.00 | Ensure that all groups will finalize their work & documentation of session with photos | Assistant 2 (10') |
 | 16.55 | Assistant 1 makes the announcement for the coffee break | Assistant 1 (5') | | | |
17.00-17.20 (Coffee break) | 17.00-17.15 | Facilitators meet to briefly discuss upcoming phase | Facilitators (15') | | | |
 | 17.15 | Assistant 1 makes the final announcement for people to come back in the rooms | Assistant 1 (5') | | | |
17.20-18.00 (Interactive Session III) | 17.20-17.30 | Facilitators explain the concept of the interactive session to the participants | Facilitators (10') | 17.20-17.40 | Documentation with photos | Assistant 2 (20') |
 | 17.30-18.00 | Participants work on the drawing/design of workflows | Participants (30') | 17.20-17.30 | Ensure availability of paper and markers | Assistant 1 (10') |
 | 17.50-18.00 | Announce end of Session III, participants to finalize their work | Facilitators (10') | 17.50-18.00 | Ensure that all groups will finalize their work & documentation of session with photos | Assistant 2 (10') |
18.00-18.20 (Conclusions / Wrap-up) | 18.00-18.10 | Overview of the workshop, discussions on the outcomes | Facilitators (10') | 18.00-18.20 | Documentation with photos | Assistant 1 (20') |
 | 18.10-18.20 | Participants to provide their feedback on the workshop | Participants (10') | 18.20-18.30 | | |
After the event | | Adapt online feedback questionnaire to meet the needs of the event | Facilitators (30') | | | |
 | | Share questionnaire with participants through email | Assistant 1 (10') | | | |
 | | Evaluate outcomes of the event and prepare report | Facilitators (3h) | | | |
 | | Organize and share photos & videos from the event | Assistant 1 (60') | | | |
14| ANNEX G: METHODOLOGY OF THE WORKSHOP PPT TEMPLATE
Also included as a separate file at the project’s common space.
15| ANNEX H: INTERACTIVE SESSION I PPT TEMPLATE
Also included as a separate file at the project’s common space.
16| ANNEX I: INTERACTIVE SESSION II PPT TEMPLATE
Also included as a separate file at the project’s common space.
17| ANNEX J: INTERACTIVE SESSION III PPT TEMPLATE
Also included as a separate file at the project’s common space.
18| ANNEX K: REPORTING FORM FOR A WORKSHOP
Also available as a separate file at the project’s common space.
Reporting Template for Organizers of the OpenMinTeD Workshops
For User Requirements
Formal Data:
Name of Institution organizing the event:
Date of event:
Location of event:
Number of Participants:
Type of participants:
Names of facilitators:
Any other supporting person? If yes, please list the name and function of this person:
Summary of the Workshop:
Content related Data:
List issues/questions raised during the session “Presentation of the OpenMinTeD project”
List issues/questions raised during the session “Presentation of the Methodology”
List issues/questions raised during the Interactive Session I
List issues/questions raised during the Interactive Session II
List issues/questions raised during the session “Closing - Discussion”
Other comments?