Updating methods and relations among concepts in DOE
Research Students: Chakravarthi S Velvadapu, Govind R Maddi, Ratnakar R Krishnama
Faculty Advisors: Dr. James Gil De Lamadrid, Dr. Sadanand Srivastava
CADIP’02 Conference
Sponsored by US Department of Defense
OVERVIEW
1. The system takes text documents as its input
2. Performs semantic analysis on these documents
3. Generates useful ontology
4. Represents it graphically
GOAL OF THE PROJECT
To build an Ontology utilizing
• Statistical methods
• A small amount of user feedback
• Automation
Architecture of DOE
[Diagram: Text Document, Pre-processing, Normalization, Latent Semantic Indexing (SVD), Document Ontology, Graph Construction, GUI, Updating Methods]
Pre-processing
• Read in text file
• Extract meaningful terms
• Count their frequencies
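The pre-processing steps above can be sketched roughly as follows. The slides do not say how "meaningful" terms are selected, so the stop-word list and the minimum term length here are assumptions made only for illustration, and `extract_term_frequencies` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not define "meaningful" terms.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}

def extract_term_frequencies(text):
    """Return a Counter mapping each retained term to its frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Assumed filter: drop stop words and very short tokens.
    terms = [t for t in tokens if t not in STOP_WORDS and len(t) > 2]
    return Counter(terms)

counts = extract_term_frequencies(
    "The ontology maps terms to concepts; terms link concepts."
)
```

For a file on disk, the text would first be read with `open(path).read()` and then passed to the same function.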
Normalization
Calculate weight of each term using
$w_{i,k} = \dfrac{\mathrm{frequency}_{i,k}}{\sum_{j=1}^{n_k} \mathrm{frequency}_{j,k}}$
Normalization (contd)
Calculate normalized weight using
$W_{i,k} = \dfrac{w_{i,k}}{\sqrt{\sum_{j=1}^{n_k} w_{j,k}^{2}}}$
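As a small worked sketch of the two normalization formulas, the functions below compute the weight and the normalized weight for the terms of a single document. The function names are hypothetical, and the sketch assumes a non-empty frequency table.

```python
import math

def term_weights(frequencies):
    """w[i] = frequency[i] / sum_j frequency[j], for one document."""
    total = sum(frequencies.values())
    return {term: f / total for term, f in frequencies.items()}

def normalize_weights(weights):
    """W[i] = w[i] / sqrt(sum_j w[j]^2), so the document vector has unit length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

freqs = {"ontology": 3, "graph": 1}
w = term_weights(freqs)      # {"ontology": 0.75, "graph": 0.25}
W = normalize_weights(w)     # squares of the values sum to 1
```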
Build Term-Doc Matrix
Rows of the Term-Doc matrix contain the weights of each term in different concepts
Columns of the Term-Doc matrix contain the weights of different terms in each concept
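A minimal sketch of assembling such a matrix is given here, assuming one column per input document and one row per term, with a weight of zero where a term does not occur. The helper name `build_term_doc_matrix` and the sample weights are hypothetical.

```python
def build_term_doc_matrix(doc_weights):
    """doc_weights: list of {term: weight} dicts, one per document.

    Returns (terms, matrix) where matrix[i][k] is the weight of terms[i]
    in document k, and 0.0 if the term is absent from that document.
    """
    terms = sorted({t for weights in doc_weights for t in weights})
    matrix = [[weights.get(t, 0.0) for weights in doc_weights] for t in terms]
    return terms, matrix

terms, A = build_term_doc_matrix([
    {"graph": 0.6, "ontology": 0.8},
    {"ontology": 1.0},
])
```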
Latent Semantic Indexing (LSI)
Statistical method representing documents by statistically independent concepts
Based on Singular Value Decomposition (SVD), a technique that decomposes a given matrix A into three components: U, S and V.
SVD
A is formed from LSI as follows:
$A = U_s S_s V_s^T$
$U_s$: derived from U by removing all but the s corresponding columns
$S_s$: derived from S by removing all but the largest s singular values
$V_s^T$: derived from $V^T$ by removing all but the s corresponding rows
SVD (contd)
[Diagram: dimensions of the factors, A: m x n, U: m x n, S: n x n, V^T: n x n; the truncated factors U_s, S_s and V_s^T are marked]
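The truncation described above can be sketched with NumPy's `numpy.linalg.svd`, which returns the singular values sorted from largest to smallest. This is only an illustration on a made-up 4 x 3 matrix, not the DOE implementation; `truncated_svd` is a hypothetical helper name.

```python
import numpy as np

def truncated_svd(A, s):
    """Return (U_s, S_s, V_s^T): the s largest singular triplets of A."""
    # Reduced SVD; singular values come back sorted, largest first.
    U, sing, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :s], np.diag(sing[:s]), Vt[:s, :]

# Made-up term-document matrix (4 terms, 3 documents).
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])

U_s, S_s, Vt_s = truncated_svd(A, 2)
A_s = U_s @ S_s @ Vt_s   # rank-2 approximation of A
```

The approximation error in the spectral norm equals the largest singular value that was dropped.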
Document Ontology
Build Concept Nodes and Term Nodes using columns and rows of the term matrix (U).
Graph Construction
• A bipartite graph is constructed with concept nodes and term nodes
• A concept node is connected to all term nodes that belong to it
• A term node is connected to all concept nodes to which it belongs
Graph Construction (contd)
[Diagram: example bipartite graph with concept nodes Concept 1 and Concept 2 and term nodes Term 1 to Term 5]
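One way to sketch the bipartite graph is as two adjacency maps, one from concepts to terms and one from terms to concepts. The slides do not state the rule for when a term "belongs" to a concept, so the rule used here (absolute weight in the term matrix at or above a threshold) is an assumption, and the names and sample weights are hypothetical.

```python
def build_bipartite_graph(U, terms, threshold):
    """U[i][c] is the weight of term i in concept c (rows: terms, columns: concepts)."""
    concept_to_terms = {c: set() for c in range(len(U[0]))}
    term_to_concepts = {t: set() for t in terms}
    for i, term in enumerate(terms):
        for c, weight in enumerate(U[i]):
            if abs(weight) >= threshold:   # assumed membership rule
                concept_to_terms[c].add(term)
                term_to_concepts[term].add(c)
    return concept_to_terms, term_to_concepts

U = [[0.9, 0.1],
     [0.7, 0.6],
     [0.0, 0.8]]
c2t, t2c = build_bipartite_graph(U, ["graph", "node", "svd"], threshold=0.5)
```

Here "node" ends up connected to both concepts, while "graph" and "svd" each connect to one.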
Graphical User Interface (GUI)
GUI (contd)
GUI consists of
• Concepts list
• Terms list
• Display for bipartite graph
• Display for relations among concepts
• Display for list of files in ontology
GUI
To view terms related to a concept, user selects that concept from concepts list
To view concepts related to a term, user selects that term from terms list
GUI – File Operations
• New
• Open
• Save
• saveAs
• Close
• Exit
GUI – Ontology Updates
• Add
• Delete
• ChangeSVDThreshold
• changeConcThreshold
• ChangeDuplicateThreshold
• foldInDoc
• SVDUpdate
• defaultBuild
GUI – Ontology Modifications
• Rename: renames a selected concept
• DelTerm: deletes a selected term
• Undo: ignores last modification and returns to the previous state
Updating Ontology
Adding new documents
Investigated less expensive methods for adding new documents:
• Fold-In
• SVD update
Fold-In
• A method to add new document(s) to an existing ontology
• Uses the existing data in document addition process
• Less expensive process than the regular build method
Fold-In (contd)
Two step method
Step 1: Fold-In document vector
• Compute new document vector (V) using
$\hat{d} = d^T U_k S_k^{-1}$
where d is the document vector to be added
• Append $\hat{d}$ to the columns of $V_k^T$
Folding-In document vector
[Diagram: A_k (m x (n+p)) = U_k (m x k) * S_k (k x k) * V_k^T (k x (n+p))]
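The document fold-in formula can be sketched in NumPy as below. The toy matrix is random and purely illustrative, and `fold_in_document` is a hypothetical helper name. As a sanity check, folding in a document that is already a column of the original matrix reproduces its existing row of $V_k$.

```python
import numpy as np

def fold_in_document(d, U_k, S_k):
    """d_hat = d^T U_k S_k^{-1}; d is a length-m vector of term weights."""
    return d @ U_k @ np.linalg.inv(S_k)

rng = np.random.default_rng(0)
A = rng.random((5, 4))                        # toy 5-term x 4-document matrix
U, sing, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, S_k, V_k = U[:, :k], np.diag(sing[:k]), Vt[:k, :].T

d_hat = fold_in_document(A[:, 0], U_k, S_k)   # fold in a document already in A
V_new = np.vstack([V_k, d_hat])               # d_hat becomes a new row of V_k
```

Note that the existing factors are left untouched, which is what makes fold-in cheap.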
Fold-In (contd)
Step 2: Fold-In term vector
• Compute new term vector (U) using
$\hat{t} = t V_k S_k^{-1}$
where t is the term vector to be added
• Append $\hat{t}$ to the rows of $U_k$
Folding-In term vector
[Diagram: A_k ((m+q) x n) = U_k ((m+q) x k) * S_k (k x k) * V_k^T (k x n)]
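The term fold-in step mirrors the document step, with $V_k$ taking the place of $U_k$. The sketch below uses a random toy matrix and a made-up new term vector; `fold_in_term` is a hypothetical helper name.

```python
import numpy as np

def fold_in_term(t, V_k, S_k):
    """t_hat = t V_k S_k^{-1}; t holds the term's weight in each of the n documents."""
    return t @ V_k @ np.linalg.inv(S_k)

rng = np.random.default_rng(1)
A = rng.random((5, 4))                 # toy 5-term x 4-document matrix
U, sing, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, S_k, V_k = U[:, :k], np.diag(sing[:k]), Vt[:k, :].T

t_new = rng.random(4)                  # hypothetical weights of a new term
t_hat = fold_in_term(t_new, V_k, S_k)
U_new = np.vstack([U_k, t_hat])        # (m+q) x k with q = 1
```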
Fold-In (contd)
Using new document vector ($V_k$) and new term vector ($U_k$)
• Rebuild concept nodes and term nodes
• Reconstruct bipartite graph
• Update GUI
SVD Update
• A method to add new document(s) to an existing ontology
• Uses the existing data in document addition process
• Less expensive process than the regular build method
SVD Update (contd)
Three step method.
Step 1: SVD Updating Documents
Let $D = [\, A_k \mid D_p \,]$
where $A_k$ is the original term-document matrix
and $D_p$ is the new document vector to be added.
$\mathrm{SVD}(D) = U_D S_D V_D^T$
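SVD Update (contd)
SVD of D can also be computed as
$U_D = U_k U_{UD}$ and
$V_D = \begin{bmatrix} V_k & 0 \\ 0 & I_p \end{bmatrix} V_{UD}$
where the auxiliary matrix $UD = [\, S_k \mid U_k^T D_p \,]$ (not to be confused with $U_D$), and $U_{UD}$, $V_{UD}$ are the left and right singular vectors of $UD$.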
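A minimal NumPy sketch of this document-update step follows, under the assumption that the new documents are supplied as the columns of an m x p matrix. The function name and the random toy data are hypothetical. Because only a small k x (k+p) matrix is decomposed, the result represents $[\, A_k \mid U_k U_k^T D_p \,]$, that is, the new documents projected onto the existing term space.

```python
import numpy as np

def svd_update_documents(U_k, S_k, V_k, D_p):
    """Update rank-k factors with p new document columns D_p (m x p)."""
    n, k = V_k.shape
    p = D_p.shape[1]
    aux = np.hstack([S_k, U_k.T @ D_p])              # k x (k+p), "UD" on the slide
    U_aux, s_aux, Vt_aux = np.linalg.svd(aux, full_matrices=False)
    block = np.block([[V_k, np.zeros((n, p))],
                      [np.zeros((p, k)), np.eye(p)]])  # [[V_k, 0], [0, I_p]]
    return U_k @ U_aux, np.diag(s_aux), block @ Vt_aux.T

rng = np.random.default_rng(0)
A = rng.random((6, 4))                               # toy 6-term x 4-document matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 3
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k].T
D_p = rng.random((6, 2))                             # two hypothetical new documents
U_D, S_D, V_D = svd_update_documents(U_k, S_k, V_k, D_p)
```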
SVD Update (contd)
Step 2: SVD Updating Terms
Let $T = \begin{bmatrix} A_k \\ T_q \end{bmatrix}$
where $A_k$ is the original term-document matrix
and $T_q$ is the new term vector to be added.
$\mathrm{SVD}(T) = U_T S_T V_T^T$
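SVD Update (contd)
SVD(T) can also be computed as
$U_T = \begin{bmatrix} U_k & 0 \\ 0 & I_q \end{bmatrix} U_{UT}$
and $V_T = V_k V_{UT}$
where the auxiliary matrix $UT = \begin{bmatrix} S_k \\ T_q V_k \end{bmatrix}$ (not to be confused with $U_T$), and $U_{UT}$, $V_{UT}$ are the left and right singular vectors of $UT$.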
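The term-update step can be sketched in the same style, assuming the new terms are supplied as the rows of a q x n matrix. The function name and toy data are hypothetical. The updated factors represent the original rank-k matrix stacked on top of $T_q V_k V_k^T$, the new term rows projected onto the existing document space.

```python
import numpy as np

def svd_update_terms(U_k, S_k, V_k, T_q):
    """Update rank-k factors with q new term rows T_q (q x n)."""
    m, k = U_k.shape
    q = T_q.shape[0]
    aux = np.vstack([S_k, T_q @ V_k])                  # (k+q) x k, "UT" on the slide
    U_aux, s_aux, Vt_aux = np.linalg.svd(aux, full_matrices=False)
    block = np.block([[U_k, np.zeros((m, q))],
                      [np.zeros((q, k)), np.eye(q)]])  # [[U_k, 0], [0, I_q]]
    return block @ U_aux, np.diag(s_aux), V_k @ Vt_aux.T

rng = np.random.default_rng(0)
A = rng.random((5, 4))                                 # toy 5-term x 4-document matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k].T
T_q = rng.random((1, 4))                               # one hypothetical new term
U_T, S_T, V_T = svd_update_terms(U_k, S_k, V_k, T_q)
```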
SVD Update (contd)
Step 3: Correction of term weights
Let $W = A_k + X_i Y_i^T$
where $X_i$ is an m x i matrix comprised of rows of zeros or rows of the i-th order identity matrix, $I_i$. $Y_i$ is an n x i matrix representing the differences between old and new weights for each of the i terms.
$\mathrm{SVD}(W) = U_W S_W V_W^T$
SVD Update (contd)
SVD(W) can also be computed as
$U_W = U_k U_Q$
and $V_W = V_k V_Q$
where $Q = [\, S_k + U_k^T X_i Y_i^T V_k \,]$.
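A short NumPy sketch of the weight-correction step is given below. The selection matrix and the weight differences are made-up values, and the function name is hypothetical. Only the small k x k matrix Q is decomposed, so the updated factors represent $A_k$ plus the correction projected onto the existing term and document spaces.

```python
import numpy as np

def svd_update_weights(U_k, S_k, V_k, X_i, Y_i):
    """Correct term weights: X_i (m x i) selects terms, Y_i (n x i) holds weight changes."""
    Q = S_k + U_k.T @ X_i @ Y_i.T @ V_k     # k x k
    U_Q, s_Q, Vt_Q = np.linalg.svd(Q)
    return U_k @ U_Q, np.diag(s_Q), V_k @ Vt_Q.T

rng = np.random.default_rng(0)
A = rng.random((5, 4))                      # toy 5-term x 4-document matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k].T

X_i = np.zeros((5, 2))
X_i[1, 0] = 1.0                             # first corrected term is term 1
X_i[3, 1] = 1.0                             # second corrected term is term 3
Y_i = 0.1 * rng.random((4, 2))              # hypothetical weight differences
U_W, S_W, V_W = svd_update_weights(U_k, S_k, V_k, X_i, Y_i)
```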
SVD Update (contd)
Using new document vector ($V_W$) and new term vector ($U_W$)
• Rebuild concept nodes and term nodes
• Reconstruct bipartite graph
• Update GUI
Time Complexity
Time complexities for different update methods, in descending order:
1. Recomposing SVD (default build)
2. SVD Update
3. Fold-In
Relations among concepts
Significance of V:
• Rows of V represent documents
• Columns of V represent concepts
[Figure: Concept vector (V), a grid with doc1 to doc4 along one axis and concept1 to concept4 along the other]
Types of relations
Sub concepts
• Sub-super concepts
• Disjoint concepts
• Overlapping concepts
Parallel concepts
• Parallel concepts
• Antagonistic concepts
Sub concepts
If % of overlap is
• < threshold value: Disjoint
• > (100 - threshold value): Sub-super
• otherwise: Overlapping
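The threshold rule above can be sketched as a small classifier. The slides do not define how the percentage of overlap is measured, so this sketch assumes it is the number of shared terms as a percentage of the smaller concept's term set; both function names are hypothetical, and the concepts are assumed to be non-empty.

```python
def overlap_percent(terms_a, terms_b):
    """Assumed measure: shared terms as a percentage of the smaller concept's terms."""
    smaller = min(len(terms_a), len(terms_b))
    return 100.0 * len(terms_a & terms_b) / smaller

def classify_sub_relation(terms_a, terms_b, threshold):
    """Apply the threshold rule: disjoint, sub-super, or overlapping."""
    pct = overlap_percent(terms_a, terms_b)
    if pct < threshold:
        return "disjoint"
    if pct > 100 - threshold:
        return "sub-super"
    return "overlapping"

r1 = classify_sub_relation({"x", "y"}, {"x", "y", "z"}, threshold=10)
r2 = classify_sub_relation({"x", "y"}, {"z"}, threshold=10)
r3 = classify_sub_relation({"x", "y"}, {"y", "z"}, threshold=10)
```

With a threshold of 10, the three calls fall into the sub-super, disjoint, and overlapping cases respectively.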
Parallel concepts
• If two concepts describe the same document: parallel concepts
• Otherwise: antagonistic concepts
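A sketch of this test on the matrix V (rows for documents, columns for concepts) is shown below. The slides do not say when a concept "describes" a document, so the rule used here (absolute entry of V at or above a threshold) is an assumption, and the function name and sample values are hypothetical.

```python
def classify_parallel_relation(V, concept_a, concept_b, threshold):
    """V[d][c]: weight of concept c in document d (rows: documents, columns: concepts)."""
    for row in V:
        # Assumed rule: a concept describes a document if |weight| >= threshold.
        if abs(row[concept_a]) >= threshold and abs(row[concept_b]) >= threshold:
            return "parallel"
    return "antagonistic"

V = [[0.8, 0.7, 0.0],
     [0.1, 0.0, 0.9]]
rel_01 = classify_parallel_relation(V, 0, 1, threshold=0.5)
rel_02 = classify_parallel_relation(V, 0, 2, threshold=0.5)
```

Concepts 0 and 1 share the first document and come out as parallel; concepts 0 and 2 share none and come out as antagonistic.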
Relations among concepts for updating methods
In the same way we can generate the relations between concepts for the updating methods, Fold-in and SVD-Update
Future work
Build a domain specific ontology, and test the system.
Replace concept and term nodes with XML nodes.
Acknowledgements
I express my appreciation to
• Faculty advisors: Dr. Gil de Lamadrid and Dr. Sadanand Srivastava
• Dr. Charles Nicholas, University of Maryland, Baltimore County
• Sponsor: US Department of Defense