ReDefine - Research articles summarization

Post on 23-Jan-2017

ReDefine – Research Articles Summarization

Presentation by Dig Vijay Kumar Yarlagadda

dy5kc@mail.umkc.edu

Motivation

• Publications / research articles are often complex and difficult to understand for a variety of reasons, including:

• New terms are coined in many papers, and their definitions are often buried deep in the content of the paper.

• Papers are written to be concise due to length restrictions

• Obscure abbreviations

• Journal articles are written in much more accessible language but are still incomprehensible to the general public.

“When everyday life is understood in terms of spatialization, temporalization and embodiment, ubiquitous computing offers a unique opportunity to evaluate the ‘relational’ as flows, intensities and transductions that mobilize sociotechnical assemblages.” - Galloway, A. (2004). Imitations of Everyday Life. Cultural Studies 18(2/3), 384 – 408.  

Objectives

● A graphical representation of the contents of the research paper including the important key terms and the relations between them. This would help understand the contents of the paper and the topic of discussion.

● Categorize a research article/publication into one of the subfields of Computer Science.

● Generate a text summary.

Approach

• Preparing dataset

• Convert publications in PDF format to text format using the IBM Watson Document Conversion service

• Extract metadata of PDF files using Apache PDFBox

• Extract n-ary relations from text using Allen AI Open IE 4.1

• Sentence: The U.S. president Barack Obama gave his speech on Tuesday to thousands of people.

• Extracted relations:

• (Barack Obama, is the president of, the U.S.)

• (Barack Obama, gave, [his speech, on Tuesday, to thousands of people])

• Allen AI Open IE 4.1 performs much better than earlier Open IE systems, including TextRunner, ReVerb and OLLIE

Approach (Cont.)

• Perform NLP

• Lemmatization

• Stopword removal (Update stopword list)

• TF-IDF

• Train a Naïve Bayes model on 10 categories

• Extract topics using LDA
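
The text-processing steps above can be illustrated with a minimal, standard-library sketch. It is not the presentation's implementation: the stopword list, the four training sentences, and the two category names are hypothetical; lemmatization and LDA are left out; and the Naïve Bayes model is trained on raw term counts, with TF-IDF computed separately, to keep the example short.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, tiny stopword list; a real pipeline would maintain a larger one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "is", "are", "we", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (t.strip(".,;:()[]\"'").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.vocab = {t for doc in docs for t in doc}
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / total_docs)  # log prior
            for t in doc:
                if t in self.vocab:  # ignore terms never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data with two of the categories.
train = [
    ("We train a convolutional neural network for image classification.", "machine learning"),
    ("Gradient descent optimizes the neural network loss.", "machine learning"),
    ("The query optimizer selects an index for each database join.", "databases"),
    ("Transactions in a relational database use locking.", "databases"),
]
docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]

weights = tf_idf(docs)
model = NaiveBayes().fit(docs, labels)
prediction = model.predict(tokenize("A neural network for image recognition"))
print(prediction)
```

In the TF-IDF output, a term that occurs in only one document (such as "train") gets a higher weight than one shared across documents (such as "neural"), which is why TF-IDF is useful for surfacing key terms.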

Workflow

Results

• Open IE 4.1 Relation Extraction:

• N-ary relations are represented in JSON format
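
As a rough illustration of what such a JSON representation could look like, the sketch below serializes the two relations from the Barack Obama example. The field names (`arg1`, `rel`, `args`) are assumptions made for this example and are not taken from Open IE 4.1's actual output schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Extraction:
    """One n-ary relation: a subject, a relation phrase, and one or more arguments.

    Field names are hypothetical, chosen only for illustration.
    """
    arg1: str
    rel: str
    args: list = field(default_factory=list)


extractions = [
    Extraction("Barack Obama", "is the president of", ["the U.S."]),
    Extraction("Barack Obama", "gave", ["his speech", "on Tuesday", "to thousands of people"]),
]

# Serialize the list of extractions to a JSON array of objects.
payload = json.dumps([asdict(e) for e in extractions], indent=2)
print(payload)
```

Keeping the arguments in a list lets a binary relation and an n-ary relation share one structure.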

Results

• Key terms and relations expressed in a graph
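
One simple way to turn extracted relations into such a graph is to treat each (subject, relation, object) triple as a labeled directed edge. The sketch below assumes that representation and emits Graphviz DOT text; the function names `build_graph` and `to_dot` are hypothetical, and the slides do not state which graph tool was actually used.

```python
from collections import defaultdict

# Triples taken from the Barack Obama example; n-ary arguments are reduced
# to the first argument here to keep the edges binary.
triples = [
    ("Barack Obama", "is the president of", "the U.S."),
    ("Barack Obama", "gave", "his speech"),
]


def build_graph(triples):
    """Map each subject to a list of (relation, object) outgoing edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


def to_dot(graph):
    """Render the graph as Graphviz DOT text with relation phrases as edge labels."""
    lines = ["digraph relations {"]
    for subject, edges in graph.items():
        for relation, obj in edges:
            lines.append(f'  "{subject}" -> "{obj}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


graph = build_graph(triples)
dot = to_dot(graph)
print(dot)
```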

Results

• Ontology

Results

• Classification of terms into sub-fields of Computer Science