Text Analytics (Text Mining) LSI (uses SVD), Visualization CSE 6242 / CX 4242 Apr 3, 2014 Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le Song

Text Analytics (Text Mining) LSI (uses SVD), Visualization

CSE 6242 / CX 4242 Apr 3, 2014

Duen Horng (Polo) Chau, Georgia Tech

Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le Song

Singular Value Decomposition (SVD): Motivation

Problem #1: Text - LSI uses SVD to find “concepts”

Problem #2: Compression / dimensionality reduction

SVD - Motivation

Problem #1: text - LSI: find “concepts”

SVD - Motivation

Customer-product, for recommendation system:

[Figure: customer-product matrix. Customer groups: vegetarians, meat eaters. Products: bread, lettuce, tomatoes, beef, chicken]

SVD - Motivation

• problem #2: compress / reduce dimensionality

Problem - Specification

• ~10^6 rows; ~10^3 columns; no updates
• Random access to any cell(s); small error: OK


SVD - Definition

(reminder: matrix multiplication)

[3 x 2] x [2 x 1] = [3 x 1]

SVD - Definition

A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T

• A: n x m matrix (e.g., n documents, m terms)
• U: n x r matrix (e.g., n documents, r concepts)
• Λ: r x r diagonal matrix (r: rank of the matrix; diagonal entries: strength of each ‘concept’)
• V: m x r matrix (e.g., m terms, r concepts)
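As a minimal sketch of the shapes in this definition, the NumPy snippet below factors a small document-to-term matrix. The counts are hypothetical (they are not taken from the slides), and the variable names `U`, `Lam`, and `V` are chosen only to mirror the symbols above.

```python
import numpy as np

# Hypothetical document-to-term counts (assumed for illustration).
# Rows: 4 CS documents, then 3 MD documents.
# Columns: data, inf., retrieval, brain, lung.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

# Thin SVD: singular values come back as a 1-D array, V as its transpose.
U_full, s, Vt_full = np.linalg.svd(A, full_matrices=False)

# Keep only the r non-zero singular values (r = rank of A).
r = int(np.sum(s > 1e-10))
U = U_full[:, :r]        # n x r: documents to concepts
Lam = np.diag(s[:r])     # r x r: concept strengths on the diagonal
V = Vt_full[:r, :].T     # m x r: terms to concepts

print(A.shape, U.shape, Lam.shape, V.shape)
print(np.allclose(A, U @ Lam @ V.T))  # the product reconstructs A
```

For this toy matrix the rank is 2, so there is one concept per block of documents.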

SVD - Properties

THEOREM [Press+92]: it is always possible to decompose matrix A into A = U Λ V^T, where

• U, Λ, V: unique, most of the time
• U, V: column orthonormal, i.e., columns are unit vectors, orthogonal to each other
  – U^T U = I
  – V^T V = I
  (I: identity matrix)
• Λ: diagonal matrix with non-negative diagonal entries, sorted in decreasing order
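These properties can be checked numerically. The sketch below uses a random matrix (an arbitrary choice, seeded only for repeatability) and NumPy's `np.linalg.svd`; it illustrates the theorem and does not prove it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))            # arbitrary 6 x 4 matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Column orthonormality: U^T U = I and V^T V = I.
u_orthonormal = np.allclose(U.T @ U, np.eye(U.shape[1]))
v_orthonormal = np.allclose(V.T @ V, np.eye(V.shape[1]))

# Singular values: non-negative and sorted in decreasing order.
non_negative = bool(np.all(s >= 0))
decreasing = bool(np.all(np.diff(s) <= 0))

# The product gives back A.
reconstructs = np.allclose(U @ np.diag(s) @ Vt, A)

print(u_orthonormal, v_orthonormal, non_negative, decreasing, reconstructs)
```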

SVD - Example

• A = U Λ V^T - example:

[Figure: document-to-term matrix A (rows: CS and MD documents; columns: data, inf., retrieval, brain, lung) written as the product U x Λ x V^T. Annotations: U is the doc-to-concept similarity matrix, with a CS-concept and an MD-concept; a diagonal entry of Λ is the ‘strength’ of the CS-concept; V is the term-to-concept similarity matrix, with the CS-concept highlighted]

SVD - Interpretation #1

‘documents’, ‘terms’ and ‘concepts’:

• U: document-to-concept similarity matrix
• V: term-to-concept similarity matrix
• Λ: its diagonal elements give the ‘strength’ of each concept

SVD - Interpretation #1

‘documents’, ‘terms’ and ‘concepts’:

Q: if A is the document-to-term matrix, what is A^T A?
A: term-to-term ([m x m]) similarity matrix

Q: A A^T?
A: document-to-document ([n x n]) similarity matrix
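A small sketch of the two products, using hypothetical counts (not from the slides): entry (i, j) of A^T A is the inner product of term columns i and j, and entry (i, j) of A A^T is the inner product of document rows i and j.

```python
import numpy as np

# Hypothetical document-to-term counts (assumed for illustration).
# Columns: data, inf., retrieval, brain, lung.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

term_term = A.T @ A   # m x m: how strongly two terms co-occur across documents
doc_doc = A @ A.T     # n x n: how strongly two documents share terms

print(term_term.shape, doc_doc.shape)
print(term_term[0, 1])  # 'data' vs 'inf.': they co-occur in the CS documents
print(term_term[0, 3])  # 'data' vs 'brain': they never co-occur
print(doc_doc[0, 3])    # two CS documents
print(doc_doc[0, 4])    # a CS document vs an MD document
```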

SVD properties

• The columns of V are the eigenvectors of A^T A (the covariance matrix, up to scaling, when the columns of A are mean-centered)
• The columns of U are the eigenvectors of the Gram (inner-product) matrix A A^T

Thus, SVD is closely related to PCA, and can be numerically more stable. For more info, see:

• http://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca
• Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002.
• Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.
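The eigenvector relationship can be sketched numerically. The snippet below assumes a random matrix with distinct singular values and compares `np.linalg.eigh` on A^T A against `np.linalg.svd` on A; eigenvectors are only defined up to sign, so the comparison uses absolute values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending order,
# so reverse both outputs to match the decreasing order of singular values.
evals, evecs = np.linalg.eigh(A.T @ A)
evals, evecs = evals[::-1], evecs[:, ::-1]

# Eigenvalues of A^T A are the squared singular values of A.
same_values = np.allclose(evals, s ** 2)

# Each eigenvector matches a column of V up to sign.
same_vectors = np.allclose(np.abs(evecs.T @ Vt.T), np.eye(4), atol=1e-6)

print(same_values, same_vectors)
```

For PCA proper, A would first have its column means subtracted; that step is left out here to keep the comparison with the slide direct.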

SVD - Interpretation #2

best axis to project on (‘best’ = min sum of squares of projection errors)

[Figure: data points with the first singular vector v1 drawn as the axis that gives the minimum RMS projection error]

SVD - Interpretation #2

• A = U Λ V^T - example:
  – U Λ gives the coordinates of the points on the projection axes

[Figure: the factorization A = U x Λ x V^T, annotated with the variance (‘spread’) on the v1 axis]

SVD - Interpretation #2

• More details
• Q: how exactly is dim. reduction done?
• A: set the smallest singular values to zero:

[Figure: the factorization with the smallest singular values zeroed and the matching columns of U and V dropped, so that A ~ U x Λ x V^T holds only approximately]
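The truncation step can be sketched as follows. The matrix is a hypothetical example (not from the slides) and `truncate_svd` is an illustrative helper name, not a library function. The Frobenius-norm error of the rank-k approximation equals the square root of the sum of the squared singular values that were zeroed.

```python
import numpy as np

def truncate_svd(A, k):
    """Return the rank-k approximation of A and all singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_k = s.copy()
    s_k[k:] = 0.0                 # set the smallest singular values to zero
    return (U * s_k) @ Vt, s      # U * s_k scales column j of U by s_k[j]

# Hypothetical document-to-term counts (assumed for illustration).
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

A_1, s = truncate_svd(A, k=1)
err = np.linalg.norm(A - A_1)     # Frobenius norm of the approximation error

print(np.round(A_1, 2))
print(err, np.sqrt(np.sum(s[1:] ** 2)))  # the two numbers agree
```

With k = 1 only the strongest concept survives, so the rows belonging to the weaker block are approximated by zeros.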

SVD - Interpretation #3

• finds non-zero ‘blobs’ in a data matrix
• = ‘communities’ (bi-partite cores, here)

[Figure: a data matrix with non-zero blocks and the matching bipartite graph; node labels: Row 1, Row 4, Row 5, Row 7, Col 1, Col 3, Col 4]


SVD algorithm

• Numerical Recipes in C (free)

SVD - Interpretation #3

• Drill: find the SVD, ‘by inspection’!
• Q: rank = ??
• A: rank = 2 (2 linearly independent rows/cols)
• column vectors of the first guess are orthogonal, but not unit vectors; after normalizing, U is:

1/sqrt(3)  0
1/sqrt(3)  0
1/sqrt(3)  0
0          1/sqrt(2)
0          1/sqrt(2)

• and the singular values are: [shown on the slide figure]
• Q: How to check we are correct?

SVD - Interpretation #3

• A: SVD properties:
  – matrix product should give back matrix A
  – matrix U should be column-orthonormal, i.e., columns should be unit vectors, orthogonal to each other
  – ditto for matrix V
  – matrix Λ should be diagonal, with non-negative values
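The checklist can be run mechanically. The drill's matrix does not survive in this transcript, so the sketch below assumes a 5 x 5 matrix with a 3 x 3 block and a 2 x 2 block of ones, which is consistent with the U columns shown; the values of Λ and V used here follow from that assumption and are not taken from the slide.

```python
import numpy as np

# Assumed drill matrix: two non-zero blocks of ones (an assumption, see above).
A = np.zeros((5, 5))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

# U as on the slide: normalized indicator vectors of the two row blocks.
U = np.array([
    [1 / np.sqrt(3), 0],
    [1 / np.sqrt(3), 0],
    [1 / np.sqrt(3), 0],
    [0, 1 / np.sqrt(2)],
    [0, 1 / np.sqrt(2)],
])
V = U.copy()                 # assumed: the column blocks mirror the row blocks
Lam = np.diag([3.0, 2.0])    # assumed singular values for this matrix

checks = {
    "product gives back A": np.allclose(U @ Lam @ V.T, A),
    "U column-orthonormal": np.allclose(U.T @ U, np.eye(2)),
    "V column-orthonormal": np.allclose(V.T @ V, np.eye(2)),
    "Lam diagonal": np.allclose(Lam, np.diag(np.diag(Lam))),
    "Lam non-negative": bool(np.all(np.diag(Lam) >= 0)),
}
for name, ok in checks.items():
    print(name, ok)
```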

SVD - Complexity

• O(n*m*m) or O(n*n*m) (whichever is less)
• Faster version, if we just want the singular values, or if we want the first k singular vectors, or if the matrix is sparse [Berry]
• No need to write your own! Available in most linear algebra packages (LINPACK, matlab, Splus/R, mathematica ...)
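As one illustration of using a library routine, the sketch below calls NumPy's `np.linalg.svd`. NumPy is not among the packages named on the slide, and the matrix is random. It shows the singular-values-only call and a thin decomposition from which the first k singular vectors are sliced; for large sparse matrices a dedicated truncated solver would normally be preferred, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((200, 20))    # tall matrix: n = 200 rows, m = 20 columns

# Only the singular values: skips computing U and V altogether.
s_only = np.linalg.svd(A, compute_uv=False)

# Thin SVD: U is n x m instead of n x n, which matters when n >> m.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# First k singular triplets, sliced from the thin decomposition.
k = 3
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

print(s_only.shape, U.shape, U_k.shape, Vt_k.shape)
```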

References

• Berry, Michael: http://www.cs.utk.edu/~lsi/
• Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press.
• Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press.

Case study - LSI

Q1: How to do queries with LSI?
Q2: multi-lingual IR (English query, on Spanish text?)

Case study - LSI

Q1: How to do queries with LSI?

Problem: e.g., find documents with ‘data’

[Figure: the document-to-term matrix (terms: data, inf., retrieval, brain, lung; CS and MD documents) and its SVD]

A: map query vectors into ‘concept space’ – how?

[Figure: the query vector q over the terms, plotted in term space (axes term1, term2) together with the concept vectors v1 and v2]

A: inner product (cosine similarity) with each ‘concept’ vector vi; e.g., q o v1 is the inner product of q with v1

Case study - LSI

compactly, we have: q V = q_concept

[Figure: e.g., the query vector q over the terms (data, inf., retrieval, brain, lung), multiplied by the term-to-concept similarities, gives q_concept; the CS-concept entry is highlighted]
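A minimal sketch of q V = q_concept, assuming a hypothetical document-to-term matrix (the slide's actual numbers are not in this transcript) and a query that contains only the term ‘data’. The sign of each concept axis returned by a numerical SVD is arbitrary, so only magnitudes are meaningful here.

```python
import numpy as np

terms = ["data", "inf.", "retrieval", "brain", "lung"]

# Hypothetical document-to-term counts (assumed for illustration).
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2, :].T                  # m x 2: keep the two non-trivial concepts

# Query vector over the terms: 1 where the term appears in the query.
q = np.array([1.0 if t == "data" else 0.0 for t in terms])

q_concept = q @ V                # inner product of q with each concept vector
print(np.round(np.abs(q_concept), 2))
```

The query loads only on the first (CS) concept, with magnitude 1/sqrt(3), and has no component along the second (MD) concept.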

Case study - LSI

Drill: how would the document (‘information’, ‘retrieval’) be handled by LSI?

A: SAME: d_concept = d V

[Figure: e.g., the document vector d over the terms (data, inf., retrieval, brain, lung), multiplied by the term-to-concept similarities, gives d_concept; the CS-concept entry is highlighted]

Case study - LSI

Observation: document (‘information’, ‘retrieval’) will be retrieved by query (‘data’), although it does not contain ‘data’!

[Figure: d and q over the terms (data, inf., retrieval, brain, lung); both map onto the CS-concept]
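This observation can be reproduced on a toy example. The sketch below assumes the same kind of hypothetical document-to-term matrix as a stand-in for the slide's data; `cosine` is a small helper defined here, not a library call.

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity between two non-zero vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Hypothetical document-to-term counts (assumed for illustration).
# Columns: data, inf., retrieval, brain, lung.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

_, _, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2, :].T                              # terms to the two concepts

q = np.array([1.0, 0.0, 0.0, 0.0, 0.0])      # query: 'data'
d = np.array([0.0, 1.0, 1.0, 0.0, 0.0])      # document: 'information', 'retrieval'

term_space_sim = cosine(q, d)                # no shared terms
concept_space_sim = cosine(q @ V, d @ V)     # both load on the CS concept

print(term_space_sim, concept_space_sim)
```

In term space the two vectors share no terms, so their similarity is zero; in concept space both point along the CS concept, so the document is retrieved.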

Case study - LSI

Q1: How to do queries with LSI?
Q2: multi-lingual IR (English query, on Spanish text?)

Case study - LSI

• Problem:
  – given many documents, translated to both languages (e.g., English and Spanish)
  – answer queries across languages

Case study - LSI

• Solution: ~ LSI

[Figure: document-to-term matrix for CS and MD documents whose columns hold the English terms (data, inf., retrieval, brain, lung) followed by the Spanish terms (datos, informacion)]
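One way to read this solution is sketched below. It assumes each training document is represented by its English and Spanish term counts side by side, so that SVD places terms from both languages in the same concept space. All counts are hypothetical, and the layout is an interpretation of the figure rather than a confirmed description of the method.

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity between two non-zero vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

terms = ["data", "inf.", "retrieval", "brain", "lung", "datos", "informacion"]

# Hypothetical counts: each row is one document with its English terms
# followed by the terms of its Spanish translation (assumed layout).
A = np.array([
    [1, 1, 1, 0, 0, 1, 1],
    [2, 2, 2, 0, 0, 2, 2],
    [1, 1, 1, 0, 0, 1, 1],
    [5, 5, 5, 0, 0, 5, 5],
    [0, 0, 0, 2, 2, 0, 0],
    [0, 0, 0, 3, 3, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
], dtype=float)

_, _, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2, :].T                                   # terms (both languages) to concepts

# English-only query and Spanish-only document over the joint vocabulary.
q = np.array([1.0 if t == "data" else 0.0 for t in terms])
d = np.array([1.0 if t in ("datos", "informacion") else 0.0 for t in terms])

term_space_sim = cosine(q, d)                     # no shared terms across languages
concept_space_sim = cosine(q @ V, d @ V)          # same concept, different language

print(term_space_sim, concept_space_sim)
```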

Switch Gear to Text Visualization

What comes to your mind?

What visualization have you seen before?

Word/Tag Cloud (still popular?)

http://www.wordle.net

Word Counts (words as bubbles)

http://www.infocaptor.com/bubble-my-page

Word Tree

http://www.jasondavies.com/wordtree/

Phrase Net

http://www-958.ibm.com/software/data/cognos/manyeyes/page/Phrase_Net.html

Visualize pairs of words that satisfy a particular pattern, e.g., X and Y

