Mining and Querying Multimedia Data

Fan Guo
Sep 19, 2011

Committee Members:
Christos Faloutsos, Chair
Eric P. Xing
William W. Cohen
Ambuj K. Singh, University of California at Santa Barbara
9/20/2011

What is this talk about?

Going Multimedia

Beyond Text and Images

Thesis Outline

Mining
• M1: MultiAspectForensics
• M2: QMAS

Querying
• Q1: Click Models
• Q2: C-DEM
• Q3: BEFH

Mining Multimedia Data (1)

• Labeling Satellite Imagery

[Figure panels: Input, Output]

Mining Multimedia Data (2)

• Network Traffic Log Analysis

Mining Multimedia Data (3)

• Web Knowledge Base

Mining Multimedia Data

• Data-driven problem solving over multiple modes at a non-trivial scale.

Querying Multimedia Data (1)

• A querying system provides an interface to retrieve records that best match users’ information need.

Querying Multimedia Data (1)

• Here is another example: https://www.facebook.com/pages/browser.php

Querying Multimedia Data (1)

• May be transformed into a graph search problem

Querying Multimedia Data (2)

• Calibrate ranking from user feedback

Data

• Large-Scale Heterogeneous Networks

[Figure: example network traffic graph over three modes (IP-source, IP-destination, Port). Node labels: 198.129.1.2, 131.243.2.10, 131.243.2.5, 128.3.10.40, 128.3.1.50. Port labels: 80 (HTTP), 80 (HTTP), 993 (IMAP).]

Goal

• How can we automatically detect and visualize patterns within a local community of nodes?

Preliminary

• Tensor for high-order data representation
▫ 3 data modes: source IP, destination IP, port #
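As a rough illustration of the tensor representation, the sketch below counts packets in a 3-mode array indexed by source IP, destination IP, and port. The flow records are hypothetical: the addresses and ports echo the labels on the data slide, but which source talks to which destination is an assumption made for the example.

```python
import numpy as np

# Hypothetical flow records: (source IP, destination IP, port).
# The pairing of sources and destinations is made up for illustration.
flows = [
    ("198.129.1.2", "128.3.10.40", 80),
    ("131.243.2.10", "128.3.10.40", 80),
    ("131.243.2.5", "128.3.1.50", 993),
    ("198.129.1.2", "128.3.10.40", 80),
]


def index_of(values):
    """Map each distinct value to an index, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


src_idx = index_of(f[0] for f in flows)
dst_idx = index_of(f[1] for f in flows)
port_idx = index_of(f[2] for f in flows)

# One axis per data mode; each cell counts the matching packets.
tensor = np.zeros((len(src_idx), len(dst_idx), len(port_idx)))
for src, dst, port in flows:
    tensor[src_idx[src], dst_idx[dst], port_idx[port]] += 1
```

A dense array is only practical at toy scale; real traffic logs would presumably need a sparse representation.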

Approach

Data Decomposition

• The canonical polyadic (CP) decomposition factors a tensor into a sum of rank-1 tensors.
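In symbols, for a 3-mode tensor this can be written as follows. The notation is an assumption for illustration and does not come from the slides: R is the number of components, the vectors a_r, b_r, c_r are the per-mode factors, lambda_r are the component weights, and the circle denotes the outer product.

```latex
\mathcal{X} \;\approx\; \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r ,
\qquad
x_{ijk} \;\approx\; \sum_{r=1}^{R} \lambda_r \, a_{ir} \, b_{jr} \, c_{kr}
```

For the network data, i, j, and k would index source IP, destination IP, and port respectively.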

Data Decomposition

• A special case is the singular value decomposition (SVD), which applies when the data has only two modes (a matrix).
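The two-mode case can be checked numerically. This minimal sketch, using a made-up count matrix, rebuilds the matrix as a sum of rank-1 terms taken from NumPy's SVD.

```python
import numpy as np

# A small made-up source-by-destination count matrix (a 2-mode "tensor").
M = np.array([[3.0, 0.0, 1.0],
              [2.0, 0.0, 1.0],
              [0.0, 4.0, 0.0]])

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Each term s[r] * outer(U[:, r], Vt[r, :]) is a rank-1 matrix;
# adding them all up recovers M.
rank1_terms = [s[r] * np.outer(U[:, r], Vt[r, :]) for r in range(len(s))]
reconstruction = sum(rank1_terms)
```

Keeping only the terms with the largest singular values gives a low-rank approximation, which is the matrix analogue of choosing a small number of CP components.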

Attribute Plot

How to compute?

Spike Detection

• Iteratively search for spikes in the histogram plot along each data mode.
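One way to picture spike detection is sketched below, assuming each data mode supplies a vector of per-entity scores (for example, one factor vector from the decomposition). The function name `find_spikes` and its thresholds are hypothetical and are not taken from the thesis; this is a simple heuristic, not the author's procedure.

```python
import numpy as np


def find_spikes(scores, bins=20, min_count=5, ratio=3.0):
    """Return (low, high, count) for histogram bins that stand out.

    A bin is flagged when it holds at least `min_count` entries and at
    least `ratio` times the median count of the non-empty bins. Both
    thresholds are illustrative assumptions.
    """
    counts, edges = np.histogram(scores, bins=bins)
    baseline = np.median(counts[counts > 0])
    spikes = []
    for i, c in enumerate(counts):
        if c >= min_count and c >= ratio * baseline:
            spikes.append((edges[i], edges[i + 1], int(c)))
    return spikes


# Hypothetical scores for one mode: an even spread plus 30 entities
# that share nearly the same score.
scores = np.concatenate([np.linspace(0.0, 1.0, 40), np.full(30, 0.52)])
spikes = find_spikes(scores)
```

Running the same function once per mode would give the candidate spikes along source IP, destination IP, and port.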

Substructure Discovery

• Focus on part of the data within the spike

• Categorize into a few subgraph patterns

Pattern 1: Generalized Star (1)

IP-srcs sending packets to the same IP-dst & the same port

Typical client/server system

A ‘bar’ in a carefully reordered tensor

Pattern 1: Generalized Star (2)

Extending along “Port-Number”

Port scanning or P2P

Port numbers used in packets from the same IP-src to the same IP-dst

Pattern 2: Generalized Bipartite-Core (1)

A ‘plane’ in a carefully reordered tensor

IP-srcs sending packets to the same IP-dsts & the same port

Clients talking to a shared server pool

Pattern 2: Generalized Bipartite-Core (2)

A ‘plane’ in a carefully reordered tensor

IP-srcs sending packets over multiple ports to one IP-dst

A multi-purpose Windows server
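The two pattern families can be told apart by how many modes vary within a group of records: one varying mode gives a 'bar' (generalized star), two give a 'plane' (generalized bipartite core). The sketch below applies that rule to hypothetical records. The function name `categorize` is an assumption, and the rule is a simplification that ignores how densely the plane is filled.

```python
def categorize(records):
    """Label a group of (src, dst, port) records by how many modes vary.

    One varying mode  -> a 'bar' in the tensor (generalized star).
    Two varying modes -> a 'plane' (generalized bipartite core).
    """
    modes = ("IP-src", "IP-dst", "port")
    varying = [m for i, m in enumerate(modes)
               if len({r[i] for r in records}) > 1]
    if len(varying) == 1:
        return "generalized star", varying
    if len(varying) == 2:
        return "generalized bipartite core", varying
    return "other", varying


# Hypothetical records: three clients hitting one server on port 80.
star = [("a", "s", 80), ("b", "s", 80), ("c", "s", 80)]
# Two clients each talking to two servers on port 80.
core = [("a", "s1", 80), ("a", "s2", 80), ("b", "s1", 80), ("b", "s2", 80)]

star_label = categorize(star)
core_label = categorize(core)
```

The list of varying modes distinguishes the sub-cases on the slides, such as a star extending along port number versus one extending along IP-src.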

M1: MultiAspectForensics

• Automatically detects novel patterns in heterogeneous networks

QMAS: Mining Satellite Imagery (1)

• Low-labor labeling

[Figure panels: Input, Output]

QMAS: Mining Satellite Imagery (2)

• Low-labor labeling
• Identification of Representatives and Outliers

QMAS: Mining Satellite Imagery (3)

• Low-labor labeling
• Identification of Representatives and Outliers
• Linear in time & space

Web Search

User Clicks as Quality Feedback

[Figure: # of total clicks]

Motivation

• Leverage the signal from click data to improve search ranking.

Click Through Rate (CTR)

• CTR = # of Clicks / # of Impressions
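A minimal worked example of the CTR formula, using a hypothetical impression log (the document IDs and click outcomes are made up):

```python
from collections import Counter

# Hypothetical log: one (document ID, was it clicked?) entry per result shown.
log = [
    ("doc_a", True), ("doc_a", False), ("doc_a", True),
    ("doc_b", False), ("doc_b", False),
]

impressions = Counter()
clicks = Counter()
for doc, clicked in log:
    impressions[doc] += 1
    clicks[doc] += int(clicked)

# CTR = # of clicks / # of impressions, computed per document.
ctr = {doc: clicks[doc] / impressions[doc] for doc in impressions}
```

Here `doc_a` has 2 clicks over 3 impressions and `doc_b` has none over 2. Note that this raw ratio ignores where on the page each result was shown.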

Position Bias

Relevance of Web Document

• Relevance = CTR @ Position 1 = (# Clicks @ Position 1) / (# Impressions @ Position 1)

Problem Definition

• Estimate the relevance of web documents given clicks and their positions.

Design Goals / Constraints

• Scalable: single-pass, easy to parallelize.

• Incremental: real-time updates possible.

• Accurate: consistent with past and future observations.

Approach

User Behavior Model

Last Clicked Position

Empirical Results

• Click data after pre-processing
▫ 110K distinct queries, 8.8M query sessions

• Training time: <6 mins

• Online update:
▫ Bump impression and click counters
▫ No data retention required
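The counter-based online update might look like the sketch below. The class name `ClickCounter` and the examination rule are assumptions for illustration: a document is counted as an impression only if it sits at or above the last clicked position, and a session with no clicks counts every displayed document. The thesis model may define examination differently.

```python
from collections import defaultdict


class ClickCounter:
    """Keeps only two counters per (query, document) pair."""

    def __init__(self):
        self.impressions = defaultdict(int)
        self.clicks = defaultdict(int)

    def update(self, query, ranked_docs, clicked_positions):
        """Fold one session into the counters; positions are 1-based."""
        clicked = set(clicked_positions)
        # Assumption: results below the last click are treated as unseen.
        last = max(clicked) if clicked else len(ranked_docs)
        for pos, doc in enumerate(ranked_docs[:last], start=1):
            key = (query, doc)
            self.impressions[key] += 1
            if pos in clicked:
                self.clicks[key] += 1

    def relevance(self, query, doc):
        """Clicks over impressions, or None if the pair was never seen."""
        n = self.impressions.get((query, doc), 0)
        return self.clicks.get((query, doc), 0) / n if n else None


counter = ClickCounter()
counter.update("q", ["d1", "d2", "d3"], [2])  # click on position 2 only
counter.update("q", ["d1", "d2", "d3"], [])   # no clicks in this session
```

Because each session only increments counters, the session itself can be discarded afterwards, and counters from separate workers can be merged by addition.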

Empirical Results

• Higher log-likelihood indicates better quality.
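For reference, an average log-likelihood over click predictions can be computed as in this sketch. It assumes each prediction is a click probability for one displayed result and each observation is a 0/1 click outcome; the function name and the clipping constant `eps` are illustrative choices.

```python
import math


def average_log_likelihood(predicted, observed, eps=1e-12):
    """Mean Bernoulli log-likelihood of observed clicks under predictions."""
    total = 0.0
    for p, clicked in zip(predicted, observed):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += math.log(p) if clicked else math.log(1 - p)
    return total / len(predicted)


observed = [1, 0]
uninformed = average_log_likelihood([0.5, 0.5], observed)
sharper = average_log_likelihood([0.9, 0.1], observed)
```

The values are always at most zero, and a model whose probabilities sit closer to the observed outcomes scores closer to zero.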

27% accuracy in prediction

2% improvement over ICM, the baseline model

Empirical Results

• Position-bias visualized

[Figure: position bias, Ground Truth vs. DCM]

Scaling to Terabytes

• 265 TB of data, 1.15B document relevance results, wall-clock running time of about 3 hours

Q1: Click Models

• A statistical, position-bias-aware approach to leveraging click data for better ranking.

• They are incremental, more accurate than the baseline, and scale to almost petabyte-scale data.

Q2: C-DEM

• A flexible query interface for 3-mode data: images, genes, annotation terms.

[Figure: the three data modes: Images, Terms, Genes]

Q2: C-DEM

• Solution: random walk with restart on graphs.
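A minimal sketch of random walk with restart on a small graph, assuming images, annotation terms, and genes are all nodes and edges link related items. The five-node graph, the node names, and the restart probability of 0.15 are hypothetical; this is the generic technique, not C-DEM's implementation.

```python
import numpy as np


def random_walk_with_restart(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk from `seed`.

    At each step the walker follows a random edge with probability
    1 - restart, or jumps back to the seed node with probability restart.
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0   # isolated nodes keep an all-zero column
    W = adj / col_sums              # column-normalized transition matrix
    e = np.zeros(adj.shape[0])
    e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical nodes: 0 image_a, 1 image_b, 2 term_x, 3 gene_g, 4 gene_h.
adj = np.zeros((5, 5))
for u, v in [(0, 2), (0, 3), (1, 2), (1, 4)]:
    adj[u, v] = adj[v, u] = 1.0

scores = random_walk_with_restart(adj, seed=0)  # query: image_a
```

Sorting the scores within one node type answers a query such as "which genes are closest to image_a"; here gene_g, which is linked directly to image_a, outranks gene_h.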

Q3: BEFH (1)

• Bayesian exponential family harmonium

• Deriving topical representations for multimedia corpora (e.g., video snapshots and captions)

[Figure panels: Input, Model]

Q3: BEFH (2)

[Figure panels: Validation on Synthetic Data, Validation on TRECVID Data; each annotated "Better Quality"]

Conclusion

• Data-driven research under the theme of pattern mining and similarity querying.

• An array of practical tasks addressed:
▫ Internet traffic surveillance (M1)
▫ Satellite image analysis (M2)
▫ Web search (Q1)
▫ …

Thank You!

• http://www.cs.cmu.edu/~fanguo/dissertation/
