TextRank: Bringing Order into Texts
Rada Mihalcea and Paul Tarau
Presented by :
Sharath T.S
Shubhangi Tandon
The TextRank Algorithm
1. Identify text units that best define the task at hand, and add them as vertices in the graph.
2. Identify relations that connect such text units, and use these relations to draw edges between vertices in the graph. Edges can be directed or undirected, weighted or unweighted.
3. Iterate the graph-based ranking algorithm until convergence.
4. Sort vertices based on their final score. Use the values attached to each vertex for ranking/selection decisions.
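The iterative ranking in step 3 can be sketched as follows. This is a minimal, illustrative implementation assuming an unweighted, undirected graph stored as an adjacency dict; the function name and convergence tolerance are our own choices, not from the paper.

```python
def rank_vertices(graph, d=0.85, tol=1e-4, max_iter=100):
    """Iteratively score vertices PageRank-style until convergence.

    graph: dict mapping each vertex to the set of its neighbours.
    Update rule: S(Vi) = (1 - d) + d * sum over neighbours Vj of S(Vj) / degree(Vj).
    """
    scores = {v: 1.0 for v in graph}
    for _ in range(max_iter):
        prev = dict(scores)
        for v in graph:
            scores[v] = (1 - d) + d * sum(
                prev[u] / len(graph[u]) for u in graph[v]
            )
        # converged when no score moved more than tol between iterations
        if max(abs(scores[v] - prev[v]) for v in graph) < tol:
            break
    return scores

# toy graph: a triangle of three mutually connected vertices
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
scores = rank_vertices(graph)
ranking = sorted(scores, key=scores.get, reverse=True)  # step 4: sort by score
```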
The TextRank Model
G = (V, E)
V = Set of vertices , E = Set of Edges
In(Vi) = set of vertices pointing to Vi (incoming edges)
Out(Vi) = set of vertices that Vi points to (outgoing edges)
d = damping factor (typically set to 0.85)
In addition, W = set of edge weights
Note: for undirected graphs, In(Vi) = Out(Vi)
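With these definitions, and writing In(Vi) for the incoming vertex set and Out(Vj) for the outgoing one, the weighted vertex score from the TextRank paper is:

```latex
WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)
```

Setting all weights to 1 recovers the original PageRank update.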
Convergence
Convergence curves for four kinds of graphs, covering directed vs. undirected and weighted vs. unweighted.
Keyword Extraction
How is the graph built?
Each word (lexical unit) is a node.
Edges come from a co-occurrence relation: two vertices are connected if their corresponding lexical units co-occur within a window of at most N words, where N can be set anywhere from 2 to 10.
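The co-occurrence graph construction can be sketched as below. This assumes the text has already been tokenized and POS-filtered (the paper keeps mainly nouns and adjectives); the function name is illustrative.

```python
def build_cooccurrence_graph(tokens, window=2):
    """Build an undirected co-occurrence graph: word -> set of co-occurring words.

    Two words are connected if they appear within `window` tokens of each other.
    """
    graph = {}
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            a, b = tokens[i], tokens[j]
            if a == b:
                continue  # no self-loops
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

# illustrative tokens (filtered content words from a sentence)
tokens = ["compatibility", "systems", "linear", "constraints", "natural", "numbers"]
g = build_cooccurrence_graph(tokens, window=2)  # window=2 connects adjacent words
```

The resulting adjacency dict can be fed directly to a PageRank-style ranking loop, with the top-scoring vertices selected as keywords.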
Example
Results for Keyword Extraction
Sentence Extraction
The goal is to rank entire sentences, so each vertex is a sentence. Co-occurrence cannot be used. Why? Whole sentences almost never repeat within a small window, so we need a new relation for the edges: similarity, measured as the content overlap between two sentences (nodes).
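The content-overlap similarity used in the paper counts the words two sentences share, normalized by the logarithm of each sentence's length so long sentences are not unfairly favoured. A minimal sketch:

```python
import math

def sentence_similarity(s1, s2):
    """Normalized word overlap between two tokenized sentences.

    Similarity(Si, Sj) = |Si ∩ Sj| / (log|Si| + log|Sj|)
    """
    overlap = len(set(s1) & set(s2))
    if len(s1) <= 1 or len(s2) <= 1:
        return 0.0  # avoid log(1) = 0 in the denominator
    return overlap / (math.log(len(s1)) + math.log(len(s2)))
```

These pairwise similarities become edge weights in a fully connected sentence graph, which is then ranked with the weighted score formula.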
Evaluation: Single-Document Summarisation
Data: DUC 2002, 567 news articles
Evaluation metric: ROUGE
Compared against 15 systems, including the baseline provided by DUC
Results
Highly dense graph
Output compared to human summaries
Comparison - TextRank and Opinosis
Both are unsupervised graph-based algorithms.
Both try to identify the most-traversed regions of the graph (nodes/paths), i.e. the topics or content described most.
TextRank uses node importance (of words and sentences) for keyword extraction and summarization, whereas Opinosis uses path weights across word nodes to generate fine-grained summaries.
Observations
1. Common pattern: use of text-unit co-occurrence as a feature in unsupervised topic/keyword algorithms (LDA, BTM, TextRank).
2. Future work: http://web.fi.uba.ar/~fbarrios/tprofesional/articulo-en.pdf
3. Industry adoption: included as a module in gensim.