Predicting Download Directories for Web Resources
George Valkanas, Dimitrios Gunopulos
4th International Conference on Web Intelligence, Mining and Semantics, June 3, 2014
Dept. of Informatics & Telecommunications, University of Athens, Greece
Online User Activities
Activity              ABS Survey  StatCan Survey  Infoplease Survey
Emailing              91%         93%             92%
General Web Browsing  87%         > 70%           83%
Online Purchases      45%         > 50%           62%
Download Content      37%         ~30%            42%
Facilitating Downloads
Save Link In Folder
Problems:
• Predefined directories
• Blunt approach / no learning
• UI clutter
• Tedious user management
A principled solution
Associate navigation through the directory hierarchy with a cost function
One possible cost function: Hierarchical Navigation Cost (HNC), i.e., the number of clicks
HNC(imgs/, docs/) = 2
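A minimal sketch of HNC, assuming each move to a parent or into a child costs one click, so the cost is the tree distance between the two directories. The function name `hnc` and the `home/` parent are illustrative and do not come from the talk; the example treats imgs/ and docs/ as siblings.

```python
from pathlib import PurePosixPath


def hnc(source: str, target: str) -> int:
    """Tree distance between two directories, counted as clicks.

    Assumption: moving to a parent or into a child costs one click each.
    """
    src = PurePosixPath(source).parts
    dst = PurePosixPath(target).parts
    # Length of the shared prefix, i.e. depth of the lowest common ancestor.
    common = 0
    for a, b in zip(src, dst):
        if a != b:
            break
        common += 1
    # Clicks up from the source to the ancestor, then down to the target.
    return (len(src) - common) + (len(dst) - common)


# One click up to the hypothetical parent home/, one click down into docs/.
print(hnc("home/imgs/", "home/docs/"))
```

Under this reading, suggesting the target itself costs 0 and every other suggestion costs at least 1.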
Problem Definition
Given
• The hierarchical structure
• A target directory T, where the resource will be saved
Goal
• Suggest a directory S that minimizes the cost function cf(S, T)
• But if I know T, why not suggest T directly? (0 cost)
In this setting, we don’t know T until it’s too late!
Casting to a classification framework
• Directories are potential class values
• T is the true target class
• S is the output of a classification process
• Web resource properties → classification features
Recommend S that best matches T
• Use directories from past saves as candidate classes
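The framing above can be sketched as a nearest-neighbor lookup over past saves. This is an illustration under assumptions, not the authors' plugin code: the `Resource` schema, the `recommend` function, and the toy `domain_mismatch` distance are all hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Resource = Dict[str, str]        # feature name -> value (hypothetical schema)
PastSave = Tuple[Resource, str]  # (features of a saved resource, directory used)


def recommend(history: List[PastSave],
              resource: Resource,
              distance: Callable[[Resource, Resource], float]) -> str:
    """Return the directory of the past save closest to `resource` (1-NN)."""
    if not history:
        raise ValueError("no past saves to learn from")
    _, directory = min(history, key=lambda save: distance(save[0], resource))
    return directory


def domain_mismatch(a: Resource, b: Resource) -> float:
    # Toy distance for the example: 0 if the domains match, 1 otherwise.
    return 0.0 if a["domain"] == b["domain"] else 1.0


history = [
    ({"domain": "arxiv.org"}, "docs/papers/"),
    ({"domain": "flickr.com"}, "imgs/"),
]
print(recommend(history, {"domain": "flickr.com"}, domain_mismatch))
```

Only directories that appear in `history` can be returned, which matches the idea of using past save locations as the candidate classes.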
Features & Distances
Feature                                    Distance
Timestamp                                  Exponential decay
Domain (current / referrer)                Equality
Path, filename (current / referrer page)   Tokenize & Jaccard
Title                                      Tokenize & Jaccard
Filename                                   Tokenize & Jaccard
Extension                                  Covariance Matrix
Keywords                                   Jaccard
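A hedged sketch of three of the distances listed above. The tokenizer rule, the function names, and the one-day `half_life` default are assumptions, since the slides do not give these details. The covariance-based extension distance is left out for the same reason.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on runs of non-alphanumeric characters."""
    return {tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def equality_distance(a: str, b: str) -> float:
    """0 when the values match (e.g. same domain), 1 otherwise."""
    return 0.0 if a == b else 1.0


def time_decay_distance(t1: float, t2: float, half_life: float = 86400.0) -> float:
    """Distance in [0, 1) that grows as two timestamps (in seconds) drift apart.

    Assumption: similarity halves every `half_life` seconds.
    """
    similarity = 0.5 ** (abs(t1 - t2) / half_life)
    return 1.0 - similarity


# Two filenames sharing three of five distinct tokens.
print(jaccard_distance(tokenize("WIMS 2014 slides.pdf"),
                       tokenize("wims-2014-paper.pdf")))
```

A per-feature design like this lets each distance stay in the range 0 to 1, so the values can later be summed or weighted into a single distance between two resources.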
Experimental Setup
• Classifier implemented as a Firefox plugin
  • DiDoCtor approach
  • JavaScript
  • 1-NN classifier
• 6 participants
  • 4-month minimum use period
• Baseline
  • Last-by-domain (LBD), the current browser approach
  • Simulated, based on submitted results
• Metrics
  • Click distance: HNC, Breadcrumbs
  • Classification accuracy
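The baseline and the accuracy metric can be illustrated as follows. This is a sketch under assumptions: the fallback for unseen domains, the predict-then-update evaluation order, and the names `LastByDomain` and `running_accuracy` are not stated in the talk.

```python
from typing import Dict, List, Optional, Tuple


class LastByDomain:
    """Suggest the directory last used for the same domain.

    Assumption: for an unseen domain, fall back to the most recent directory
    overall (None before the first save).
    """

    def __init__(self) -> None:
        self._by_domain: Dict[str, str] = {}
        self._last: Optional[str] = None

    def predict(self, domain: str) -> Optional[str]:
        return self._by_domain.get(domain, self._last)

    def update(self, domain: str, directory: str) -> None:
        self._by_domain[domain] = directory
        self._last = directory


def running_accuracy(saves: List[Tuple[str, str]]) -> List[float]:
    """Predict before each save, then learn from it; return accuracy so far."""
    model = LastByDomain()
    correct = 0
    curve = []
    for i, (domain, directory) in enumerate(saves, start=1):
        if model.predict(domain) == directory:
            correct += 1
        model.update(domain, directory)
        curve.append(correct / i)
    return curve


# Hypothetical log of (domain, directory chosen by the user) pairs.
saves = [("a.org", "docs/"), ("b.com", "imgs/"),
         ("a.org", "docs/"), ("b.com", "imgs/")]
print(running_accuracy(saves))
```

Replaying a recorded log this way is one plausible reading of "simulated": the baseline never has to run in the browser, only over the saves that participants submitted.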
Preliminary Result Analysis
Take Home Messages
1. Users exhibit different saving behaviors
2. Users have high variability in their accesses to each directory
Click Distance - HNC
Take Home Message
Significant reduction in the number of clicks to reach the target directory!
The click distance gain is even higher when considering a breadcrumbs UI!
Running Accuracy
Take Home Message
DiDoCtor is much more accurate in predicting the download directory
Basic Model Extensions
Feature reweighting: RELIEF-F
Suggesting k directories
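One way to suggest k directories is to rank past saves by distance and keep the first k distinct directories. This is a sketch of that idea, not necessarily the extension used in the talk; `suggest_k`, the `Resource` schema, and the `ext_mismatch` distance are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Resource = Dict[str, str]        # feature name -> value (hypothetical schema)
PastSave = Tuple[Resource, str]  # (features of a saved resource, directory used)


def suggest_k(history: List[PastSave],
              resource: Resource,
              distance: Callable[[Resource, Resource], float],
              k: int = 3) -> List[str]:
    """Return up to k distinct directories, nearest past saves first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # sorted() is stable, so ties keep their original (chronological) order.
    ranked = sorted(history, key=lambda save: distance(save[0], resource))
    suggestions: List[str] = []
    for _, directory in ranked:
        if directory not in suggestions:
            suggestions.append(directory)
        if len(suggestions) == k:
            break
    return suggestions


def ext_mismatch(a: Resource, b: Resource) -> float:
    # Toy distance for the example: 0 if the extensions match, 1 otherwise.
    return 0.0 if a["ext"] == b["ext"] else 1.0


history = [
    ({"ext": "pdf"}, "docs/"),
    ({"ext": "jpg"}, "imgs/"),
    ({"ext": "pdf"}, "docs/papers/"),
    ({"ext": "pdf"}, "docs/"),
]
print(suggest_k(history, {"ext": "pdf"}, ext_mismatch, k=2))
```

With k = 1 this reduces to the single nearest-neighbor suggestion, so the extension only adds alternatives below the top choice.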
Alternative classifiers
Take Home Messages
• Classifiers can help!
• DiDoCtor generally performs the best
• Accuracy is affected by user behavior!
Conclusions & Future work
• Approach for facilitating downloads: optimization problem & classification framework
• Experimentation with real users
• Basic model extensions
• Further exploit the temporal dimension
• More informative features (e.g., entities)
• Automatic generation of directories
Thank you!
Questions?
Acknowledgements
• To the evaluators of our plugin
• Heraclitus II fellowship, THALIS-GeoComp, THALIS-DISFER, Aristeia-MMD, EU project INSIGHT