ReDefine – Research Articles Summarization
Presentation by Dig Vijay Kumar Yarlagadda
[email protected]@mail.umkc.edu
Motivation
• Publications / research articles are often complex and difficult to understand for a variety of reasons, including:
• Many papers coin new terms, and the definitions of those terms are often buried deep in the content of the paper.
• Papers are written to be concise due to length restrictions
• Obscure abbreviations
• Journal articles are written in much more accessible language but are still incomprehensible to the general public.
“When everyday life is understood in terms of spatialization, temporalization and embodiment, ubiquitous computing offers a unique opportunity to evaluate the ‘relational’ as flows, intensities and transductions that mobilize sociotechnical assemblages.” - Galloway, A. (2004). Intimations of Everyday Life. Cultural Studies 18(2/3), 384–408.
Objectives
● A graphical representation of the contents of the research paper, including the key terms and the relations between them. This would help readers understand the contents of the paper and the topic of discussion.
● Categorize a research article/publication into one of the subfields of Computer Science.
● Generate a text summary.
Approach
• Preparing dataset
• Convert publications in PDF format to text format using the IBM Watson Document Conversion service
• Extract meta-data of PDF files using Apache PDFBox
• Extract n-ary relations from the text using Allen AI Open IE 4.1
• Sentence: The U.S. president Barack Obama gave his speech on Tuesday to thousands of people.
• Extracted relations:
• (Barack Obama, is the president of, the U.S.)
• (Barack Obama, gave, [his speech, on Tuesday, to thousands of people])
• Allen AI Open IE 4.1 performs much better than earlier Open IE systems, including TextRunner, ReVerb and OLLIE
Approach (Cont.)
• Perform NLP
• Lemmatization
• Stopword removal (Update stopword list)
• TF-IDF
• Train Naïve-Bayes Model on 10 categories:
• Extract topics using LDA
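The stopword removal, TF-IDF and Naïve Bayes steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in the project: the stopword list, the two category names and the four training sentences are invented for the example, lemmatization and LDA are omitted, and the classifier is trained on raw term counts because the slides do not say how the TF-IDF weights feed into the model.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption); the slides mention a custom, updated list.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "to", "with", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords (lemmatization omitted)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def tfidf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing on raw term counts."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for term in doc:
                if term in self.vocab:  # ignore terms never seen in training
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training set with two hypothetical categories (the project uses 10).
train = [
    ("training of deep neural networks with gradient descent", "machine learning"),
    ("a classifier for labeled training data", "machine learning"),
    ("query optimization in relational databases", "databases"),
    ("indexing and transactions for database storage", "databases"),
]
docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]

weights = tfidf(docs)  # per-document term weights, e.g. for ranking key terms
model = NaiveBayes().fit(docs, labels)
prediction = model.predict(tokenize("gradient descent for neural networks"))
print(prediction)
```

In this sketch, terms that occur in fewer documents receive higher TF-IDF weights, which is one plausible way to pick candidate key terms before classification.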
Workflow
Results
• Open IE 4.1 Relation Extraction:
• N-ary relations are represented in JSON format
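As a rough illustration of what a JSON encoding of n-ary relations might look like, the sketch below serializes the two example relations from the Approach slide. The field names `arg1`, `rel` and `arg2s` are a hypothetical schema chosen for this example; the actual layout produced in the project is not shown in the slides.

```python
import json

# Hypothetical schema: field names are illustrative, not the actual Open IE 4.1 output.
relations = [
    {"arg1": "Barack Obama", "rel": "is the president of", "arg2s": ["the U.S."]},
    {
        "arg1": "Barack Obama",
        "rel": "gave",
        # An n-ary relation keeps all of its arguments in one list.
        "arg2s": ["his speech", "on Tuesday", "to thousands of people"],
    },
]

encoded = json.dumps(relations, indent=2)  # text form, suitable for writing to a file
decoded = json.loads(encoded)              # round-trip back into Python objects
print(encoded)
```

Keeping the arguments in a list lets a single record hold a relation with any number of arguments, rather than splitting it into several binary records.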
Results
• Key terms and relations expressed in a graph
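One simple way to turn extracted relations into a graph is an adjacency list keyed by subject, with each edge labeled by its relation phrase. The sketch below assumes the relations have already been reduced to (subject, relation, object) triples; the triples are typed in by hand from the example sentence, and `build_graph` and `degree` are hypothetical helper names. Using node degree as a hint of term importance is an assumption made for this example, not a method stated in the slides.

```python
from collections import defaultdict

# Hand-entered triples based on the example sentence in the slides.
triples = [
    ("Barack Obama", "is the president of", "the U.S."),
    ("Barack Obama", "gave", "his speech"),
]


def build_graph(triples):
    """Map each subject to a list of (relation, object) outgoing edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


def degree(graph):
    """Count edges touching each node, as a rough signal of how central a term is."""
    counts = defaultdict(int)
    for subject, edges in graph.items():
        for _, obj in edges:
            counts[subject] += 1
            counts[obj] += 1
    return dict(counts)


graph = build_graph(triples)
node_degree = degree(graph)
for subject, edges in graph.items():
    for relation, obj in edges:
        print(f"{subject} --[{relation}]--> {obj}")
```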
Results
• Ontology
Results
• Classification of terms into sub-fields of Computer Science