A Case Study on the Content Curation for the Improving Effectiveness of Research Report. 2018.12.04. 20th GreyNet International Conference. Seokjong Lim, Content Curation Center, KISTI.
Page 1: A Case Study on the Content Curation for the Improving ...greyguide.isti.cnr.it/attachments/category/33/GL20_Lim.pdf · A content curation model that reflects the KISTI’s missionsis

A Case Study on the Content Curation for the Improving Effectiveness of Research Report

2018.12.04

20th GreyNet International Conference

Seokjong Lim

Content Curation Center, KISTI


CONTENTS

I. Content Curation?

II. The KISTI Curation Model

III. The KISTI Curation Cases


I. Content Curation


1.1 Definition of Content Curation (1/3)

Digital Curation?

A set of activities to systematically store data sets for current and future users, to encourage the reuse of content, and to valorize produced content.

- Examples: research results, research data, public (government) data, and cultural heritage data.

[Source] Digital Curation Centre. What is digital curation?

Data Curation?

A set of active and ongoing data management actions to render the data lifecycle useful for science and education.

Data discovery and search, quality management, valorization, and reuse.

Related areas: authentication, archiving, management, preservation, retrieval, and representation.

[Source] CLIR. What is data curation? https://www.clir.org/initiatives-partnerships/data-curation/


1.1 Definition of Content Curation (2/3)

• A set of activities that systematically collect science and technology content according to a standardized protocol and establish a database accordingly to promote reuse. It valorizes results to strengthen the research impact of Korean researchers.


1.1 Content Curation (3/3)

Content curation gets started with a brand-new concept, such as a specific subject or interest.

It makes a new category according to the defined concept.

The result is constructive content that is something different.

[Figure: Content Curation. Labels: User, Discovery, Search, Reuse, Quality Maintenance, Give Value, Sustainable data management]


1.2. The Needs for Curation

General need for curation

The amount of data is increasing exponentially due to heated research competition, the increased number of researchers, and the rapid advancement of information technologies.

The range, forms, and amount of data content vary widely. Thus, it is necessary to develop new methods to scientifically and systematically collect, manage, and store content with its future reuse in mind.
- Digital Curation Centre. What is digital curation? http://www.dcc.ac.uk/digital-curation/what-digital-curation

“Providing optimized content for users in the age of information overload.”
- Duyeong Heo. Contents Curation, 2016.

It is necessary to establish a long-term plan for data file extension, system operation, and media conversion to ensure the continued use of data despite the rapid advancement of technologies.
- Ross Harvey. Digital curation: A how-to-do-it manual, 2010.

It is necessary to prepare for the use and reuse of data by current and future users.


1.2. The Needs for Curation

KISTI’s need for curation

A content curation model that reflects KISTI’s missions is required to research, develop, and establish a service framework for the knowledge information infrastructure of science and technology.

A content management policy for the age of big data is necessary to properly collect, analyze, and provide science and technology information that meets the needs of researchers.

It is important to continue to develop relevant technologies and policies for science and technology information, as well as to standardize its management and distribution, in order to support the development of science and technology and key industries of Korea.

A content curation model is required to establish a national high-value-added information infrastructure.


II. The KISTI Curation Model


2.1 Methods

Literature analysis

Literature review on curation models and analysis of related internal KISTI manuals and documents.

In-depth interviews with KISTI staff

One-on-one in-depth interviews with field staff to identify the current agenda of KISTI content establishment and the switch to the content curation system.

Benchmark analysis

Benchmarking outstanding curation lifecycle models in other countries, including DCC, DCC&U, and UC3.

KISTI Curation Model testing

Testing the KISTI Curation Lifecycle Model in consultation with the Digital Curation Centre (a globally renowned British research institute specialized in digital curation) and Korean digital curation experts.


2.2 Benchmark



2.3 The Range of Content Curation

[Figure: The range of content curation. Labels: Creation, Acquisition, Database, Service; Content Curation Center]


2.4 The KISTI Curation Lifecycle Model (Draft)

* This tentative model is currently under development and will be finalized in consultation with the British Digital Curation Centre and Korean experts (estimated to be completed by November 2018).

The model emphasizes task-level performance by focusing on the job roles of the department(s) responsible for curation, based on the hierarchical actions of the organization.

A model demonstrating the main curation actions performed by KISTI


2.5 Korean Paper Collection (K-Paper) (4/9)

Sequential Actions: SA

*SA: Semantic Description

Lifecycle stages shown: Create/Receive, Appraise/Select, Semantic Description, Ingest, Conceptualize, Dispose

Content description (semantic description) actions:

1. Individual identification
2. Organization identification
3. Term identification
4. Subject sorting
5. Reference item identification
6. Meta extraction
7. Non-textual table and figure extraction
8. Original content conversion (PDF -> XML)
9. Original content conversion (PDF -> PDF/A)
10. Research and data connection (DLI)
11. Individual name recognition
12. Personal information removal
13. DOI registration
14. Funding information connection
15. Similarity check

Diagram annotations: connecting ISNI, ORCID, and KRI (scientist and technician registration number); automatic identification; connecting papers with NTIS; automatic sorting; automatic extraction.

System and status labels: KnoBaS (~2017); developed in 2018; S&T (~2017, food area); NRMS (~2017); NRMS (purchased); KDCR, PMS, NRMS (2017~).


2.6 Korean Paper Collection (K-Paper) (5/9)

Sequential Actions: SA

*SA: Semantic Description

[Figure: semantic description elements. Labels: author identification, organization identification, literature identification, funding information, subject classification code, citations, tables and figures, paper registration information, research publication information, original URL, statistics, license information, abstract, DOI, DLI. Other labels: SCOPUS, KISTI, CrossRef]


2.7 Korean Paper Collection (K-Paper) (6/9)

Occasional Actions: OA

*OA: Dispose

Lifecycle stages shown: Create/Receive, Appraise/Select, Semantic Description, Ingest, Conceptualize, Dispose

Semantic description actions:

1. Individual identification
2. Organization identification
3. Term identification
4. Subject classification
5. Reference item identification
6. Meta-extraction
7. Non-text (tables and figures) extraction
8. Original document conversion (PDF -> XML)
9. Original document conversion (PDF -> PDF/A)
10. Research data connection (DLI)
11. Entity name recognition
12. Erasing personal information
13. DOI registration
14. Funding information
15. Similarity check

Diagram annotations: linking of ISNI, ORCID, and KRI; reference identification; connecting paper-NTIS; subject classification; metadata extraction.

System and status labels: KnoBaS (~2017); developed in 2018; S&T (~2017, food area); NRMS (~2017); NRMS (purchased); KDCR, PMS, NRMS (2017~).


2.8 Korean Paper Collection (K-Paper) (7/9)

Full Lifecycle Actions: FA

FA: Enhancement of content curation rates

Full lifecycle actions: improvement

Results planning

◦ Setting a target: KISTI paper content curation (detection rate) of 30%

Improving results

◦ Monitoring and managing the paper content curation (detection rate) target

Measuring results

◦ Measuring and checking the content curation (detection rate) target
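The planning, monitoring, and measuring steps of the full lifecycle actions come down to simple arithmetic against the 30% target. The sketch below is only an illustration: the slides do not define how the detection rate is computed, so the definition used here (curated papers divided by all collected papers), the function names, and the sample counts are assumptions.

```python
TARGET_RATE = 0.30  # 30% target quoted on the slide


def detection_rate(curated: int, total: int) -> float:
    """Share of collected papers that were curated (assumed definition)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return curated / total


def meets_target(curated: int, total: int, target: float = TARGET_RATE) -> bool:
    """Return True when the measured rate reaches the target."""
    return detection_rate(curated, total) >= target


if __name__ == "__main__":
    # Hypothetical counts, used only to show the calculation.
    print(f"{detection_rate(450, 2000):.1%}", meets_target(450, 2000))
```

Keeping the target as a named constant makes it easy to re-run the same check when the target is revised.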


III. The KISTI Curation Case


3.1 Development of Elementary Technology for Curation (1/6)

Automatic metadata extraction



3.2 Development of Elementary Technology for Curation (2/6)

Automatic metadata extraction

Applying rule-based and machine-learning-based automatic metadata extraction technology.

[Figure: metadata extraction pipeline. Labels: paper in PDF, PDF structure analysis, rule-based metadata extraction, neural network input data conversion, neural network input (CONLL), metadata extraction through neural network, output results conversion, metadata tagging, XML for inspection, metadata (JATS-XML), metadata DB]
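The pipeline converts extracted text into CONLL-style input for the neural network. As a rough illustration, the sketch below uses one common CONLL-style layout: one token per line with a tab-separated label, and a blank line between sequences. The column layout KISTI actually uses is not stated in the slides, and the label names (`B-TITLE`, `I-AUTHOR`, and so on) and function names are hypothetical.

```python
from typing import List, Tuple

Sequence = List[Tuple[str, str]]  # (token, label) pairs


def to_conll(sequences: List[Sequence]) -> str:
    """Serialize labeled token sequences as CONLL-style text."""
    blocks = []
    for seq in sequences:
        lines = [f"{token}\t{label}" for token, label in seq]
        blocks.append("\n".join(lines))
    # A blank line separates consecutive sequences.
    return "\n\n".join(blocks) + "\n"


def from_conll(text: str) -> List[Sequence]:
    """Parse CONLL-style text back into labeled token sequences."""
    sequences = []
    for block in text.strip().split("\n\n"):
        seq = [tuple(line.split("\t")) for line in block.splitlines() if line]
        sequences.append(seq)
    return sequences


if __name__ == "__main__":
    sample = [
        [("Content", "B-TITLE"), ("Curation", "I-TITLE")],
        [("Seokjong", "B-AUTHOR"), ("Lim", "I-AUTHOR")],
    ]
    print(to_conll(sample))
```

The round trip through `from_conll` is handy for checking that converted input and converted output agree before results are tagged.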


3.3 Development of Elementary Technology for Curation (3/6)

Automatic subject classification

Purpose: to develop a subject classification technology for academic papers and apply it to the curation service model.

Subject classification method: metadata (title, abstract, and keywords) is used as input. Noun clusters are created and keywords are classified.

Stages of subject classification: PDF -> noun extraction -> noun vectorization -> application of the cluster model -> application of deep learning -> classification

[Figure: subject classification flow. Labels: PDF paper, text extraction, noun extraction, embedding vector, grouping, subject selection, CONLL formatting, deep learning multi-encoder (CNN+RNN) based subject classification model, results of subject, metadata mapping]
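The stages from noun vectorization to classification can be illustrated with a much simpler stand-in. The sketch below is not the CNN+RNN model on the slide: it swaps in bag-of-words count vectors and cosine similarity against a reference noun list per subject. Noun extraction is assumed to have happened already, and the subject names, noun lists, and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def vectorize(nouns: List[str]) -> Counter:
    """Turn a list of nouns into a bag-of-words count vector."""
    return Counter(n.lower() for n in nouns)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(nouns: List[str], subjects: Dict[str, List[str]]) -> str:
    """Pick the subject whose reference nouns are most similar to the input."""
    vec = vectorize(nouns)
    return max(subjects, key=lambda s: cosine(vec, vectorize(subjects[s])))


if __name__ == "__main__":
    # Hypothetical subjects and reference nouns.
    subjects = {
        "biology": ["cell", "stem", "gene", "protein"],
        "environment": ["particulate", "matter", "air", "emission"],
    }
    print(classify(["stem", "cell", "therapy"], subjects))
```

A learned embedding and a trained classifier would replace `vectorize` and `classify` in a real system; the data flow between the stages stays the same.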


3.4 Development of Elementary Technology for Curation (4/6)

Automatic reference identification

[Figure: automatic reference identification flow. Labels: reference information, CONLL formatting, Bi-RNN+CRF based automatic reference identifier, information of predicted results, extraction results]
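The last step of the flow turns per-token predictions into extraction results. The sketch below shows one way this could work, assuming the identifier emits BIO-style labels (`B-` starts a field, `I-` continues it, `O` is outside any field). The slides do not specify the labeling scheme, the field names are hypothetical, and the Bi-RNN+CRF model itself is not reproduced.

```python
from typing import Dict, List, Optional, Tuple


def decode_bio(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group BIO-labeled tokens into field values, keyed by field name."""
    fields: Dict[str, List[str]] = {}
    current_field: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the span collected so far, if any.
        if current_field is not None and current_tokens:
            fields.setdefault(current_field, []).append(" ".join(current_tokens))

    for token, label in tagged:
        prefix, _, name = label.partition("-")
        if prefix == "B":
            flush()
            current_field, current_tokens = name, [token]
        elif prefix == "I" and name == current_field:
            current_tokens.append(token)
        else:  # "O" or an inconsistent "I-" label ends the current span
            flush()
            current_field, current_tokens = None, []
    flush()
    return fields


if __name__ == "__main__":
    predicted = [
        ("Harvey", "B-AUTHOR"), ("R.", "I-AUTHOR"),
        ("Digital", "B-TITLE"), ("curation", "I-TITLE"),
        (".", "O"), ("2010", "B-YEAR"),
    ]
    print(decode_bio(predicted))
```

Each field maps to a list so that a reference with several authors keeps all of them.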


3.5 Development of Elementary Technology for Curation (5/6)

Generating author/organization identification data



3.6 Development of Elementary Technology for Curation (6/6)

Personal information detector

Detects personal information in electronic documents and removes only the parts containing personal information.

Initial screen of the automatic personal information detector (client version)

Select electronic document for detection

Personal information detection and view detection results

Removing personal information

Personal information detection and removal report (in Excel)
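Detecting personal information and removing only the matching parts can be sketched with regular expressions. This is a minimal illustration, not the KISTI detector: the slides do not say which kinds of personal information are covered or how they are found, so the two patterns (email address and hyphenated phone number), the mask text, and the function names are assumptions.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only; a production detector would need many more.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b"),
}


def detect(text: str) -> List[Tuple[str, str]]:
    """Return (kind, matched_text) pairs for every hit, for reporting."""
    hits = []
    for kind, pattern in PATTERNS.items():
        hits.extend((kind, m.group()) for m in pattern.finditer(text))
    return hits


def remove(text: str, mask: str = "[REMOVED]") -> str:
    """Replace only the matched spans, leaving the rest of the text intact."""
    for pattern in PATTERNS.values():
        text = pattern.sub(mask, text)
    return text


if __name__ == "__main__":
    sample = "Contact: hong@example.org, 010-1234-5678."
    print(detect(sample))
    print(remove(sample))
```

The list returned by `detect` could feed a detection and removal report, while `remove` produces the cleaned document.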


3.7 National R&D Reports (1/8)

Specialized search: searches for required information and provides results from the full text of original reports.

Non-textual search: searches entire reports and provides non-textual results (such as tables and images) from reports.

Reference search: searches the complete references of an original report.

Keyword summary graph: analyzes the keywords of the target digital report and shows the entire content of the original report in the form of a graph.

Automatic personal information detector: detects and removes personal information contained in digital reports.


3.7 National R&D Reports (2/8)

Advanced search service for original reports [Search > general search, advanced search]

Advanced search result (keyword: stem cell)

Search sections (title, abstract, background, introduction, discussion, and conclusion)

Search keywords

Downloading individual search result reports

A comprehensive report of the search results

Detailed results of advanced search
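Restricting a keyword search to selected report sections can be sketched as follows. This is an illustration under stated assumptions, not the NDSL or NTIS implementation: reports are modeled as plain dictionaries from section name to text, matching is a case-insensitive substring test, and the report IDs and function name are hypothetical.

```python
from typing import Dict, Iterable, List

Report = Dict[str, str]  # section name -> section text


def search_sections(
    reports: Dict[str, Report], keyword: str, sections: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each matching report ID to the sections that contain the keyword."""
    needle = keyword.lower()
    wanted = list(sections)
    results: Dict[str, List[str]] = {}
    for report_id, report in reports.items():
        matched = [s for s in wanted if needle in report.get(s, "").lower()]
        if matched:
            results[report_id] = matched
    return results


if __name__ == "__main__":
    # Hypothetical reports, used only to show the call.
    reports = {
        "R1": {"title": "Stem cell therapy", "abstract": "A study of stem cell lines."},
        "R2": {"title": "Particulate matter", "conclusion": "No stem cell work."},
    }
    print(search_sections(reports, "stem cell", ["title", "abstract"]))
```

Returning the matched sections per report makes it possible to show where in each report the keyword was found.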


3.7 National R&D Reports (3/8)

Non-textual search in the original text of reports [Search > Non-textual search]

Non-textual search result (keyword: particulate matter)

Viewing the original report containing images

Integrated download function for chosen images


3.7 National R&D Reports (4/8)

References in original reports [Search > Reference search]

Reference search result (keyword: particulate matter)

Selecting reference types

A report registration number is required to search individual reports


3.7 National R&D Reports (5/8)

Report content analysis and keyword summary [Search > Advanced search]

How to use the keyword summary graph


3.7 National R&D Reports (6/8)

① Run the automatic personal information detector and click ‘detect personal information’.

② Click ‘search’, then select and upload an electronic document.

③ Click ‘detect personal information’ and ‘remove personal information’ to remove personal information.

④ Download the personal information removal file and check the removal result.

Screen labels: detect personal information; remove personal information; search; personal information removal report; download individual personal information removal files.

Screenshot of an electronic document with personal information removed


3.7 National R&D Reports (7/8)

Digitalized original reports are available to the general public on NDSL.


Digitalized reports available on NDSL


3.7 National R&D Reports (8/8)

Applied functions available, including report content search, text search, and image search in connection with project information.


Digitalized reports available on NTIS

Original content available through the connection to NDSL

Non-textual search function

Advanced search function
