WWW2008 Workshop on Social Web Search and Mining, April 22nd, 2008, Beijing
Time Based Context Cluster Analysis for Automatic Blog Generation
Luca Costabello and Laurent-Walter Goix, Telecom Italia, Italy
Context as Blog Content
User context is gaining importance
Location info
Nearby buddies
The surrounding environment in general
We mine context data to detect daily user actions
User actions are converted into natural text
Blog posts describing users' days enable the detection of communities of users with similar behavioral patterns.
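The slides do not say how behavioral similarity between users is measured. As a hypothetical illustration only: if each user's day were reduced to a set of (hour, place) actions, Jaccard similarity (shared actions divided by all distinct actions) could flag pairs of users with similar days as candidate community links. The user names, actions and threshold below are invented for the example.

```python
from itertools import combinations

# Hypothetical encoding: a user's day is a set of (hour, place) actions.
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two action sets (1.0 means identical days)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def similar_pairs(days: dict[str, set], threshold: float) -> list[tuple[str, str, float]]:
    """Return every user pair whose daily patterns reach the threshold."""
    pairs = []
    for u, v in combinations(sorted(days), 2):
        score = jaccard(days[u], days[v])
        if score >= threshold:
            pairs.append((u, v, score))
    return pairs

days = {
    "alice": {(8, "home"), (9, "office"), (13, "canteen")},
    "bob": {(8, "home"), (9, "office"), (18, "gym")},
    "carol": {(10, "university"), (20, "cinema")},
}
print(similar_pairs(days, threshold=0.5))  # [('alice', 'bob', 0.5)]
```

Pairwise comparison is quadratic in the number of users; it keeps the idea visible but would need indexing or clustering at scale.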
Context-Based Blog Generation
1) Raw data gathering
2) Offline cluster analysis, detecting daily actions
3) Blog post generation
System Architecture
Cluster Analysis: Detecting User Actions
Timestamp            Cell ID                Cell ID Virtual Place
2007-10-03 11:02:33  222-1-61101-72162201   office,tilab
2007-10-03 10:59:09  222-1-61101-72162201   office,tilab
2007-10-03 10:55:46  222-1-61101-72162201   office,tilab
2007-10-03 10:52:41  222-1-61101-64530928   n/a,n/a
2007-10-03 10:48:59  222-1-61101-72162201   office,tilab
2007-10-03 10:45:34  222-1-61101-72162201   office,tilab
2007-10-03 10:42:11  222-1-61101-64530928   n/a,n/a
2007-10-03 10:38:47  222-1-61101-72162201   office,tilab
2007-10-03 10:37:47  222-1-61101-72162201   office,tilab
2007-10-03 09:27:01  222-1-61101-72157899   office,tilab
2007-10-03 08:58:11  222-1-61104-72386176   n/a,n/a
2007-10-03 08:56:28  222-1-24650-121        n/a,n/a
2007-10-03 08:56:05  222-1-24650-122        n/a,n/a
2007-10-03 08:54:20  222-1-54650-923        n/a,n/a
2007-10-03 08:51:31  222-1-61104-72395762   n/a,n/a
2007-10-03 08:49:16  222-1-61104-72384437   n/a,n/a
2007-10-03 08:48:47  222-1-61104-72395762   n/a,n/a
2007-10-03 08:48:18  222-1-61104-72384437   n/a,n/a
2007-10-03 08:47:50  222-1-61104-72395762   n/a,n/a
2007-10-03 08:47:21  222-1-61104-72395762   n/a,n/a
2007-10-03 08:46:51  222-1-61104-72384437   n/a,n/a
2007-10-03 08:46:20  222-1-61104-72376116   n/a,n/a
2007-10-03 08:45:15  222-1-61104-72395763   n/a,n/a
2007-10-03 08:44:02  222-1-61104-72400263   n/a,n/a
2007-10-03 08:42:33  222-1-61104-72395770   n/a,n/a
2007-10-03 08:42:02  222-1-61104-72400262   n/a,n/a
2007-10-03 08:40:08  222-1-24650-1281       residence,home
2007-10-03 08:36:26  222-1-24650-1281       residence,home
2007-10-03 08:33:02  222-1-24650-1281       residence,home
Cluster 1 (Static): start 08:58, end 11:02; CGI 222-1-61101-72162201; VP (virtual place) CGI: Office, TILab; VP Bth: not available
Cluster 2 (Movement): start 08:42, end 08:56; CGI from 222-1-24650-1281 to 222-1-24650-121; VP CGI from Residence, home to Office, TILab; VP Bth: not available
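Before any clustering, the raw log has to be parsed and put back into chronological order, since it is listed newest first. A minimal sketch, assuming each line has exactly the four whitespace-separated fields shown in the log above (date, time, cell ID, label) and that "n/a,n/a" means no user-defined label; the `ContextRecord` type and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class ContextRecord:
    timestamp: datetime
    cgi: str              # cell ID as logged, e.g. "222-1-61101-72162201"
    place: Optional[str]  # user-defined label; None when logged as "n/a,n/a"

def parse_record(line: str) -> ContextRecord:
    """Parse one 'date time cell-id label' line of the context history."""
    date, clock, cgi, label = line.split()
    stamp = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
    return ContextRecord(stamp, cgi, None if label == "n/a,n/a" else label)

def parse_history(text: str) -> list[ContextRecord]:
    """Parse a newest-first log and return it oldest first, since the
    clustering step must respect chronological order."""
    records = [parse_record(line) for line in text.splitlines() if line.strip()]
    return sorted(records, key=lambda r: r.timestamp)

log = """
2007-10-03 08:58:11 222-1-61104-72386176 n/a,n/a
2007-10-03 08:56:28 222-1-24650-121 n/a,n/a
2007-10-03 08:40:08 222-1-24650-1281 residence,home
2007-10-03 08:36:26 222-1-24650-1281 residence,home
"""
history = parse_history(log)
print(history[0].timestamp.time(), history[0].place)  # 08:36:26 residence,home
```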
Clustering Algorithms Dimensions
Location
GSM/UMTS Cell IDs
User-defined Cell ID Labels
Time
Chronological order of actions must be respected
Categorical attributes
Euclidean distance not available
Time must be evaluated according to “temporal distance”
Ad-hoc algorithms had to be designed
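Because cell IDs are categorical, a distance has to be defined by hand. One hypothetical choice exploits the usual GSM/UMTS Cell Global Identity layout MCC-MNC-LAC-CI (222 is Italy's mobile country code) and scores how much of the hierarchy two cells share, while time is measured as an absolute gap in seconds. This is a sketch of the idea, not the authors' metric; sharing a location area is only a coarse proxy for physical proximity.

```python
from datetime import datetime

def cgi_distance(a: str, b: str) -> int:
    """Hypothetical categorical distance between two MCC-MNC-LAC-CI strings:
    0 = same cell, 1 = same location area, 2 = same operator but different
    location area, 3 = different operator."""
    mcc_a, mnc_a, lac_a, ci_a = a.split("-")
    mcc_b, mnc_b, lac_b, ci_b = b.split("-")
    if (mcc_a, mnc_a) != (mcc_b, mnc_b):
        return 3
    if lac_a != lac_b:
        return 2
    return 0 if ci_a == ci_b else 1

def temporal_distance(t1: datetime, t2: datetime) -> float:
    """Absolute time gap in seconds between two context updates."""
    return abs((t2 - t1).total_seconds())

office, flicker, home = "222-1-61101-72162201", "222-1-61101-64530928", "222-1-24650-1281"
print(cgi_distance(office, office), cgi_distance(office, flicker), cgi_distance(office, home))  # 0 1 2
t_start = datetime(2007, 10, 3, 8, 42, 2)
t_end = datetime(2007, 10, 3, 8, 56, 28)
print(temporal_distance(t_end, t_start))  # 866.0
```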
Cell-Based Location Data Issues
Context updates occur with variable frequency
Detecting static situations VS detecting movement
Base station concentration affects context data patterns
Frequent cell handovers during static actions
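Both issues are visible in the office part of the example log: the gap between updates jumps from about 3 minutes to over an hour, and the phone briefly hops to cell 64530928 twice while the user stays at the office. A minimal sketch that measures the gaps and flags A, B, A "ping-pong" samples; the function names and the one-sample ping-pong rule are hypothetical.

```python
from datetime import datetime

def update_gaps(times: list[str]) -> list[int]:
    """Seconds between consecutive context updates (chronological HH:MM:SS)."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]

def ping_pong(cells: list[str]) -> list[int]:
    """Indices i where the sequence reads A, B, A: a one-sample jump to a
    neighbouring cell, which may be a handover while the user stays put."""
    return [i for i in range(1, len(cells) - 1)
            if cells[i - 1] == cells[i + 1] != cells[i]]

# Office segment of the example log, oldest first.
office_log = [
    ("09:27:01", "222-1-61101-72157899"),
    ("10:37:47", "222-1-61101-72162201"),
    ("10:38:47", "222-1-61101-72162201"),
    ("10:42:11", "222-1-61101-64530928"),
    ("10:45:34", "222-1-61101-72162201"),
    ("10:48:59", "222-1-61101-72162201"),
    ("10:52:41", "222-1-61101-64530928"),
    ("10:55:46", "222-1-61101-72162201"),
    ("10:59:09", "222-1-61101-72162201"),
    ("11:02:33", "222-1-61101-72162201"),
]
times, cells = zip(*office_log)
print(update_gaps(list(times)))  # [4246, 60, 204, 203, 205, 222, 185, 203, 204]
print(ping_pong(list(cells)))    # [3, 6]
```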
Compare&Merge Algorithm
[Diagram: the context history (the same log records as above, from 08:48:18 to 11:02:33) goes through a Preliminary Context Scan that yields a Long Temporary Cluster and Short Temporary Clusters; a Temporary Clusters Merge step then produces the final clusters, labelled Static Cluster, Movement Cluster and Static Cluster.]
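The diagram names the stages but not the rules. One possible reading, sketched under stated assumptions: the preliminary scan groups consecutive records with the same key (the user label if present, else the cell ID) into temporary clusters; the merge step compares each cluster with the one two positions back and, if they share a key and the cluster between them is short, absorbs it as a handover glitch; finally long clusters become static and runs of short ones collapse into one movement cluster. Times are simplified to minutes, and the `min_static` threshold, names and data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cluster:
    start: int  # minutes (simplified time unit)
    end: int
    key: str    # virtual-place label, or the cell ID when no label exists

    @property
    def duration(self) -> int:
        return self.end - self.start

def preliminary_scan(records: list[tuple[int, str]]) -> list[Cluster]:
    """Group consecutive records that share a key into temporary clusters."""
    temp: list[Cluster] = []
    for t, key in records:
        if temp and temp[-1].key == key:
            temp[-1].end = t
        else:
            temp.append(Cluster(t, t, key))
    return temp

def merge_temporary(temp: list[Cluster], min_static: int) -> list[Cluster]:
    """Absorb a short cluster sandwiched between two clusters with the same
    key (e.g. a brief handover A, B, A) and merge the surrounding pair."""
    merged: list[Cluster] = []
    for c in temp:
        if (len(merged) >= 2 and merged[-2].key == c.key
                and merged[-1].duration < min_static):
            merged.pop()
            merged[-1].end = c.end
        else:
            merged.append(Cluster(c.start, c.end, c.key))
    return merged

def classify(clusters: list[Cluster], min_static: int) -> list[tuple[str, int, int, Optional[str]]]:
    """Long clusters become static; consecutive short ones form one movement."""
    out: list[tuple[str, int, int, Optional[str]]] = []
    for c in clusters:
        if c.duration >= min_static:
            out.append(("static", c.start, c.end, c.key))
        elif out and out[-1][0] == "movement":
            out[-1] = ("movement", out[-1][1], c.end, None)
        else:
            out.append(("movement", c.start, c.end, None))
    return out

records = [(0, "home"), (4, "home"), (7, "home"),
           (9, "cellA"), (12, "cellB"), (15, "cellC"),
           (20, "office"), (40, "office"), (44, "cellX"),
           (48, "office"), (70, "office")]
clusters = merge_temporary(preliminary_scan(records), min_static=5)
print(classify(clusters, min_static=5))
# [('static', 0, 7, 'home'), ('movement', 9, 15, None), ('static', 20, 70, 'office')]
```

This reading also shows why frequent handovers are critical for this approach: if the user's phone alternates among several cells, no single key stays long enough to form a static cluster.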
MultiLevel Sliding Window Algorithm
For each window iteration:
2. Check if any user-defined label is available.
3. Detect user movement
4. Detect the most frequent position
5. Merge window data with previous window iteration (if detected position is the same)
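The listed steps can be sketched as a single-level sliding window; the "multilevel" aspect is not described on the slide and is left out. Assumptions, all hypothetical: times are in minutes, each record's position is its user label if available (step 2) and otherwise its cell ID, a window with more than `max_positions` distinct positions counts as movement (step 3), otherwise the most frequent position wins (step 4), and equal consecutive results are merged (step 5). Window boundaries quantize detected times, which matches the slide's point that a 30-minute window yields an error under 30 minutes.

```python
from collections import Counter
from typing import Optional

Record = tuple[int, str, Optional[str]]  # (minute, cell ID, user label or None)

def sliding_window(records: list[Record], window: int, max_positions: int):
    """Return (start, end, kind, position) tuples for chronological records.
    Each window rescans all records, which keeps the sketch simple."""
    if not records:
        return []
    results: list[list] = []
    start, last = records[0][0], records[-1][0]
    while start <= last:
        end = start + window
        in_window = [r for r in records if start <= r[0] < end]
        if in_window:
            # Step 2: prefer the user-defined label, fall back to the cell ID.
            positions = [label if label is not None else cell
                         for _, cell, label in in_window]
            # Step 3: too many distinct positions in one window -> movement.
            if len(set(positions)) > max_positions:
                kind, position = "movement", None
            else:
                # Step 4: the most frequent position wins.
                kind, position = "static", Counter(positions).most_common(1)[0][0]
            # Step 5: extend the previous result if kind and position match.
            if results and results[-1][2:] == [kind, position]:
                results[-1][1] = end
            else:
                results.append([start, end, kind, position])
        start = end
    return [tuple(r) for r in results]

records = [(0, "c1", "home"), (5, "c1", "home"), (10, "c1", "home"),
           (32, "m1", None), (35, "m2", None), (38, "m3", None),
           (41, "m4", None), (44, "m5", None),
           (62, "o1", "office"), (70, "x9", None), (75, "o1", "office"),
           (95, "o1", "office"), (110, "o1", "office")]
print(sliding_window(records, window=30, max_positions=3))
# [(0, 30, 'static', 'home'), (30, 60, 'movement', None), (60, 120, 'static', 'office')]
```

The stray cell "x9" at minute 70 is outvoted by the office label, which illustrates why the window approach tolerates frequent cell handovers.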
Algorithms Comparison
Criterion | MultiLevel Sliding Window | Compare&Merge
Precision | Lower than Compare&Merge (a 30-minute window leads to an error of less than 30 minutes) | Very high in optimal situations (error below 2-5 minutes)
Optimal usage | Non-labeled areas; frequent cell handovers | Good user labeling; cells with few handover issues
Critical situations | None | Frequent cell handovers
Cluster Analysis Accuracy VS User Perception
From Clusters To Blog Post
[Diagram: Context Clusters, Action Detector, User Preferences, NLG (Natural Text Generation)]
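The slide names an action detector, user preferences and an NLG module without further detail. A minimal template-based sketch, assuming each detected action carries a kind, start and end times and places, and modelling user preferences as a hypothetical set of places the user keeps private; the `Action` class, templates and phrasing are illustrative assumptions. The sample actions reuse the two clusters from the cluster analysis example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    kind: str                          # "static" or "movement"
    start: str                         # "HH:MM"
    end: str
    place: Optional[str] = None        # static actions only
    origin: Optional[str] = None       # movements only
    destination: Optional[str] = None

def sentence(a: Action) -> str:
    """Fill a fixed first-person template for one detected action."""
    if a.kind == "static":
        return f"From {a.start} to {a.end} I was at {a.place or 'an unlabeled place'}."
    if a.origin and a.destination:
        return f"Between {a.start} and {a.end} I moved from {a.origin} to {a.destination}."
    return f"Between {a.start} and {a.end} I was on the move."

def blog_post(actions: list[Action], hidden: frozenset[str] = frozenset()) -> str:
    """One sentence per action, skipping actions that mention a place the
    user asked to keep private (a stand-in for user preferences)."""
    kept = [a for a in actions if not {a.place, a.origin, a.destination} & hidden]
    return " ".join(sentence(a) for a in kept)

actions = [
    Action("movement", "08:42", "08:56", origin="home", destination="the office (TILab)"),
    Action("static", "08:58", "11:02", place="the office (TILab)"),
]
print(blog_post(actions))
print(blog_post(actions, hidden=frozenset({"home"})))
```

Fixed templates keep the output predictable; richer NLG would vary wording and aggregate similar actions.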
Results
Mining context history leads to user pattern discovery
Sharing of daily actions
Detection of user communities according to daily behaviors
Clustering accuracy VS personal memories perception
Movement detection
Location-labeling importance
Any Questions? Thank You!