2002-04-24 CHI Web Behavior Patterns 1
Separating the Swarm: Categorization Methods for User Sessions on the Web
Jeffrey Heer, Ed H. Chi
Palo Alto Research Center
2002.04.24 – CHI Web Behavior Patterns
Web Analytics: What can you measure?
Marketing
– content
– page traffic
Infrastructure
– load testing
Site Design
– user intent
– usability
– user experience
Want to improve site design, content, and performance
The Change in Web Sites: What should you measure?
[Figure: page-based websites vs. activity-based websites. Axes: Time, Site Complexity. Labels: Products; Management Team; “I’d like information on used cars.”; “Search for a car dealer in my neighborhood.”; TRAFFIC; USER EXPERIENCE.]
Motivation
What are users’ information goals?
Understanding the composition of web user traffic.
Strategy: Use all available data to discover user goals (Content, Usage, Topology).
Outline: System Description, Evaluation, Implications, Conclusion
System Description
Generate a user profile for each user session.
– How: Use access logs and site content to build a multi-featured model of user activity (multi-modal clustering).
Group user profiles into common activities like “product browsing” and “job seeking”.
– How: Apply clustering algorithms to user profiles.
System Description
[Diagram: Web Crawl and Access Logs → Document Model and User Sessions → User Profiles → Clustered Profiles]
Steps:
1. Process Access Logs
2. Crawl Web Site
3. Build Document Model
4. Extract User Sessions
5. Build User Profiles
6. Cluster Profiles
Document Model
Site is crawled
– Pay special attention to pages in logs.
Documents described by feature vectors:
– Content: TF.IDF weighted keyword vector
– URL: Tokenized and TF.IDF weighted
– Inlinks: Column vectors in topology matrix
– Outlinks: Row vectors in topology matrix
Vectors are concatenated to form a single multi-modal vector P_d for each document.
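A minimal sketch of how the four modalities might be assembled, assuming a plain tf × log(N/df) form of TF.IDF and a 0/1 topology matrix. The helper names (`tokenize`, `tfidf_vectors`, `document_model`, `concatenate`) and the tiny two-page site are hypothetical and not taken from the talk.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def tfidf_vectors(token_lists):
    """Return (vocabulary, one TF.IDF vector per token list).

    Assumed weighting: term count * log(N / document frequency).
    """
    vocab = sorted({t for tokens in token_lists for t in tokens})
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, vectors

def document_model(pages, links):
    """pages: {url: text}; links: set of (source_url, target_url) pairs."""
    urls = sorted(pages)
    _, content = tfidf_vectors([tokenize(pages[u]) for u in urls])
    _, url_vecs = tfidf_vectors([tokenize(u) for u in urls])
    # Topology matrix: topo[i][j] = 1 if page i links to page j.
    topo = [[1.0 if (src, dst) in links else 0.0 for dst in urls] for src in urls]
    model = {}
    for i, u in enumerate(urls):
        model[u] = {
            "content": content[i],
            "url": url_vecs[i],
            "inlinks": [row[i] for row in topo],  # column i of the matrix
            "outlinks": topo[i],                  # row i of the matrix
        }
    return urls, model

def concatenate(modalities):
    """Join the per-modality vectors into one multi-modal vector P_d."""
    return (modalities["content"] + modalities["url"]
            + modalities["inlinks"] + modalities["outlinks"])

urls, model = document_model(
    {"/a": "printer ink", "/b": "printer jobs"},  # hypothetical pages
    {("/a", "/b")},                               # hypothetical link /a -> /b
)
p_a = concatenate(model["/a"])
```

Keeping the modalities separate until the last step makes it easy to weight or drop a modality later.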
User Sessions
Sessions extracted and represented by a vector s:
– For path i = A → B → D, s_i = <1,1,0,1,0>
  (For site with 5 documents <A,B,C,D,E>)
Different weightings can be employed in creating the session vector s:
– Frequency: number of times each page is accessed. A → B → D, s = <1,1,0,1,0>
– TF.IDF: hits / # paths including page
– Position: Use order of pages within surfing path. A → B → D, s = <1,2,0,3,0>
– View Time: Use time spent viewing pages. A (10s) → B (20s) → D (15s), s = <10,20,0,15,0>
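The four weightings can be sketched as small functions, shown here as an illustration rather than the authors' implementation. The function names are hypothetical, and two details are assumptions: a revisited page keeps its latest position, and repeated views add up their times.

```python
from collections import Counter

def frequency_vector(path, pages):
    """Number of times each page is accessed in the path."""
    counts = Counter(path)
    return [counts[p] for p in pages]

def tfidf_vector(path, pages, all_paths):
    """Hits in this path divided by the number of paths that include the page."""
    counts = Counter(path)
    paths_with = Counter(p for other in all_paths for p in set(other))
    return [counts[p] / paths_with[p] if paths_with[p] else 0.0 for p in pages]

def position_vector(path, pages):
    """1-based order of each page within the surfing path (0 if not visited)."""
    position = {}
    for order, page in enumerate(path, start=1):
        position[page] = order  # assumption: a revisit overwrites the earlier position
    return [position.get(p, 0) for p in pages]

def view_time_vector(timed_path, pages):
    """Seconds spent on each page; timed_path is a list of (page, seconds)."""
    total = Counter()
    for page, seconds in timed_path:
        total[page] += seconds  # assumption: repeated views are summed
    return [total[p] for p in pages]

pages = ["A", "B", "C", "D", "E"]
path = ["A", "B", "D"]
print(frequency_vector(path, pages))
print(position_vector(path, pages))
print(view_time_vector([("A", 10), ("B", 20), ("D", 15)], pages))
```

For the path A → B → D these reproduce the three example vectors given on the slide.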
User Profiles
User profiles are a linear combination of the viewed pages.
– “You are what you see.”
$UP_i = \sum_{d=1}^{N} s_{id} P_d$
where UP_i is the user profile for session i, s_id are the session weights, and P_d are the document vectors.
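As a rough illustration, a profile is the sum of the document vectors P_d, each scaled by its session weight s_id. The function name `user_profile` and the toy numbers are made up for this sketch.

```python
def user_profile(session_weights, document_vectors):
    """Compute UP_i = sum over d of s_id * P_d.

    session_weights: one weight per document (the session vector s_i).
    document_vectors: one equal-length vector P_d per document.
    """
    if len(session_weights) != len(document_vectors):
        raise ValueError("need exactly one weight per document")
    profile = [0.0] * len(document_vectors[0])
    for weight, doc in zip(session_weights, document_vectors):
        for k, value in enumerate(doc):
            profile[k] += weight * value
    return profile

# Toy site with 5 documents and 2 features per document (hypothetical values).
docs = [[1, 0], [0, 1], [5, 5], [1, 1], [9, 9]]
print(user_profile([1, 1, 0, 1, 0], docs))  # only A, B and D contribute
```

The same function can be applied to one modality at a time (content, URL, inlinks, outlinks) when profiles need to be compared per modality.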
Clustering
Clustering is a form of statistical analysis that organizes data into groups (clusters).
– Groupings are determined by a shared similarity.
– Similarity is defined by a computable similarity metric.
Clustering proceeds by recursive bisection, using K-Means to perform the bisections [Zhao01].
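A simplified sketch of recursive bisection with K-Means (K=2) under cosine similarity. This is not the CLUTO implementation: the deterministic seeding and the rule of always splitting the largest cluster are assumptions chosen to keep the example short, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def two_means(vectors, iterations=20):
    """Split vectors into two groups; returns a 0/1 label per vector."""
    # Assumed seeding: the first vector and the vector least similar to it.
    seed_a = vectors[0]
    seed_b = min(vectors, key=lambda v: cosine(seed_a, v))
    centers = [seed_a, seed_b]
    labels = None
    for _ in range(iterations):
        new_labels = [0 if cosine(v, centers[0]) >= cosine(v, centers[1]) else 1
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = centroid(members)
    return labels

def bisecting_kmeans(vectors, n_clusters):
    """Return clusters as lists of indices into vectors."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < n_clusters:
        clusters.sort(key=len, reverse=True)
        target = clusters.pop(0)  # assumption: bisect the largest cluster
        if len(target) < 2:
            clusters.append(target)
            break
        labels = two_means([vectors[i] for i in target])
        left = [i for i, lab in zip(target, labels) if lab == 0]
        right = [i for i, lab in zip(target, labels) if lab == 1]
        if not left or not right:
            clusters.append(target)  # could not split any further
            break
        clusters.extend([left, right])
    return clusters

profiles = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]  # hypothetical user profiles
print(bisecting_kmeans(profiles, 2))
```

Each pass splits one cluster in two, so asking for k clusters takes at most k - 1 bisections.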
$d(UP_i, UP_j) = \sum_{m \in \text{Modalities}} w_m \cos(UP_i^m, UP_j^m)$
The weights w_m specify the contribution of each modality.
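The similarity between two profiles is a weighted sum of per-modality cosines, d(UP_i, UP_j) = Σ_m w_m · cos(UP_i^m, UP_j^m). A minimal sketch follows, assuming each profile is stored as a dictionary keyed by modality name; that layout and the example weights are illustrative only.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def multimodal_similarity(profile_i, profile_j, weights):
    """Sum over modalities m of w_m * cos(UP_i^m, UP_j^m).

    profile_i, profile_j: {modality name: vector}
    weights: {modality name: w_m}; a modality not listed contributes nothing.
    """
    return sum(w * cosine(profile_i[m], profile_j[m]) for m, w in weights.items())

up_i = {"content": [1, 0], "url": [1, 1]}  # hypothetical profiles
up_j = {"content": [0, 1], "url": [2, 2]}
print(multimodal_similarity(up_i, up_j, {"content": 0.5, "url": 0.5}))
```

Setting a single weight to 1 and the rest to 0 gives the unimodal schemes; spreading the weights gives the multi-modal ones.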
[Figure callouts: user population breakdown; detailed stats; keywords describing user groups; frequent documents accessed by group]
Clustering Results
Users reached end of tutorial, had nowhere to go.
http://www.diamondreview.com
System Evaluation
Does the system correctly infer user intentions?
[Diagram: Logs → System → User Intent Groupings, compared against User Intent]
User Study
Asked users to carry out specific tasks on www.xerox.com
– captured actions using the WebQuilt proxy logger [Hong01]
– done at their leisure.
15 unique tasks:
– Tasks developed after exploring xerox.com and reading user e-mail feedback
– 5 task groups with 3 tasks per group.
– Products, TechSupport, Supplies, Company Info, and Jobs
Participation:
– 21 users signed up, 18 went through, 104 usable sessions.
Results: 340 combinations of clustering schemes
Outlink-based schemes performed poorly (omitted).
Analysis: Modalities
[Chart: Analysis of Modalities in Unimodal Cases. Y-axis: % correctly clustered (0% to 100%); x-axis: Path Weighting Schemes; series: RAW PATH, CONTENT, URL, INLINK, OUTLINK.]
Linear contrast shows Content is significantly different:
– (unimodal) F(1,105)=32.51, MSE=.005361, p<0.0001
– (multimodal) F(1,35)=33.36, MSE=.007332, p<0.0001
Content is King! Mean=0.96, StdDev=0.07
Analysis: Path Weighting
Paired t-test between View Time (V.T.) based and non-View-Time based weightings: n=60, t(59)=4.85, p=4.68e-6
V.T. mean=89.5%, s.d.=12.7%; non-V.T. mean=83.2%, s.d.=12.0%
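For readers unfamiliar with the statistic, a paired t-test works on the per-pair differences: t = mean(d) / (sd(d) / √n), with n − 1 degrees of freedom. The sketch below uses made-up numbers purely to show the arithmetic; they are not data from the study, and `paired_t` is a hypothetical helper.

```python
import math
from statistics import mean, stdev

def paired_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for paired samples."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample s.d. of the differences
    return mean(diffs) / standard_error, n - 1

# Illustrative values only: accuracy of three schemes with and without view time.
with_time = [2.0, 4.0, 6.0]
without_time = [1.0, 2.0, 3.0]
t, df = paired_t(with_time, without_time)
print(f"t({df}) = {t:.2f}")
```

Pairing matters here because each time-based scheme is compared with its own non-time counterpart, so variation between schemes does not inflate the error term.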
[Chart: Analysis of Path Weighting. Y-axis: % correctly clustered (0% to 100%); x-axis: Modalities; series: uniform; tfidf; time; position; tfidf, time; tfidf, pos; time, pos; time, pos, tfidf.]
View Time is best!
Observation: Multi-Modal vs. Unimodal
In practice, Multi-Modal should be more robust
– Some pages don’t have much content
  » Images, Audio, Video
  » PDF, PS (if you don’t have necessary software)
– URL Tokens: All pages have URLs.
– Inlinks: don’t depend on any features of a page!
In our experience, Content-based Multi-Modal Clustering retains accuracy.
Linear Contrast shows no significant difference between multi-modal and uni-modal schemes:
F(1,77)=1.63, MSE=.004407, p=.21
Findings
Incorporating View Time improves clustering accuracy.
Though it involves extra work, extracting Content can provide very high accuracy.
Adding other modalities makes clustering more robust.
Modalities should be chosen carefully, and tailored for each specific site.
Implications for Designers
Good design means understanding your users.
It’s possible to understand trends of user activities accurately.
– Requires well-defined user tasks doable on the site.
Now you can design and tailor user experience.
– Address discovered usability issues.
– Update design to facilitate common tasks.
Summary: “You are what you see.”
[Diagram: User Information Goals; Web site (Page Content, Topology); Observed Usage; InfoScent Clustering]
Users follow the best Information Scent to accomplish their goals.
Future Work
Determining # of clusters
– Currently done semi-manually
Model unstructured tasks more directly
Directly recommend design changes
Integrate with
– Clustering Visualization
– User Path Visualization
Lots of Commercial Interest, Licensing
Conclusion
Performed first known user study to characterize the analytic space of session clustering techniques.
Found that session clustering can be highly accurate with respect to user intentions.
Demonstrated our method is scalable and useful in real-world scenarios.
This should prove to be a useful tool for web designers and researchers!
Acknowledgements
Peter Pirolli, Stu Card, Adam Rosien, Pam Schraedley and the UIR and Bloodhound Team at PARC.
George Karypis for CLUTO software
Participants in our user study
Office of Naval Research
Contact:
Jeff Heer ([email protected])
Ed H. Chi ([email protected])