Memex: A Browsing Assistant for Collaborative Archiving and Mining of Surf Trails
Soumen Chakrabarti, Sandeep Srivastava, Mallela Subramanyam, Mitul Tiwari
Indian Institute of Technology Bombay
IITB 2000
Sources of Web information
Sources already exploited
• Text on pages (keyword search)
• Links between pages (popularity rating)
• Topic taxonomies (query expansion)
Sources not exploited enough yet
• Public surfing history
• Public bookmarks
Collaboration is central to hypertext
Lack of trust limits collaboration on the Web
Our goals
Infrastructure to support spontaneous formation of topic-based collaborative Web communities
• Browsing assistant client
• Community server
Mining algorithms for personal and community-level topic management and collaborative resource discovery
Extensible API for plugging in additional hypertext analysis tools
1: Create a Memex account (password sent by email)
2: Install the Memex applet signing certificate and visit the applet page
3: Allow the Memex client to attach to your Web browser
4: Log on to the Memex server
Memex client applet attaches to browser
Privacy choice
Function tabs
Preparing to import initial bookmarks
Bookmarks imported
For Memex to suggest an initial topic organization, select all bookmarks…
…and send them to the clustering tab
Switch to the clustering tab
URLs to be clustered appear here
Submit the URLs to the server-side Memex clustering demon
Check later if the server has completed the clustering task
Two top-level clusters about software and music
Expanding the software cluster to study it in more detail
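The slides do not say which algorithm the clustering demon uses. As a rough illustration of how bookmarks could fall into top-level clusters such as software and music, here is a minimal sketch that assumes bag-of-words cosine similarity and greedy agglomerative merging. The function names, URLs, and page texts are all hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case word counts for a page title or page text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(pages, k):
    """Merge the two most similar clusters until only k remain.

    pages maps a URL to its text; each cluster is (urls, summed word counts).
    This is an assumed technique, not the Memex demon's actual algorithm.
    """
    clusters = [([url], bag_of_words(text)) for url, text in pages.items()]
    while len(clusters) > k:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cosine(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return [sorted(urls) for urls, _ in clusters]

# Hypothetical bookmarks; the URLs and texts are made up for illustration.
bookmarks = {
    "http://example.org/gcc": "gnu compiler software download",
    "http://example.org/linux": "linux kernel software source",
    "http://example.org/jazz": "jazz music recordings",
    "http://example.org/sitar": "sitar music concerts",
}
top_level = cluster(bookmarks, k=2)
```

Running `cluster` again on the pages of one cluster with a larger `k` would be one way to expand that cluster into finer sub-topics.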
User can freely reorganize URL placement using cut-and-paste
Moving an entire folder from the cluster tab…
…to the folder tab together with example URLs
Folder names can be edited as per taste; this also gives Memex additional clues about the folder’s contents
New folders can be created to hold clusters found in the cluster tab
A topic hierarchy which is too detailed for the user can be flattened
Groups of closely related URLs can be moved back to folders in the folder tab
Memex helps the user derive a starting topic hierarchy from unstructured bookmarks
The user then continues browsing in multiple sessions. Relevant pages found by other members of the community and made public are available for collaborative surfing
If permission is granted, the Memex applet monitors the trail that the surfer follows and uploads it to the server for further analysis and mining
Such surf trails together with page contents are valuable inputs to the Memex server-side hypertext mining and resource discovery demons
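To make the permission-gated trail upload concrete, here is a minimal sketch. It assumes a trail is a sequence of (previous URL, current URL) steps and that the upload is a plain callable. The class name, method names, and URLs are hypothetical and are not taken from the Memex applet.

```python
class TrailRecorder:
    """Collects (previous URL, URL) steps and uploads them only with permission."""

    def __init__(self, upload, allowed=False):
        self.upload = upload      # callable standing in for the server upload (assumption)
        self.allowed = allowed    # the user's privacy choice
        self.last_url = None

    def page_visited(self, url):
        """Called on each page visit; returns True if the step was uploaded."""
        step = (self.last_url, url)
        self.last_url = url
        if not self.allowed:
            return False          # nothing leaves the client without permission
        self.upload(step)
        return True

# Hypothetical usage: a list stands in for the server.
received = []
recorder = TrailRecorder(upload=received.append, allowed=True)
recorder.page_visited("http://example.org/a")
recorder.page_visited("http://example.org/b")
```

Keeping the previous URL alongside the current one preserves the order of the trail, so the server side can see which page led to which.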
In the background, the Memex classifier finds the most suitable folder to assign to each history item. History is never deleted (disk is cheap). When the user refreshes the view, surf history from others and from herself appears categorized into the user’s familiar topic tree.
‘?’ indicates that Memex is not sure about the folder assignment. Users can easily correct mistakes, and these corrections form additional valuable training data.
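The slides do not name the classifier. The sketch below assumes a small multinomial naive Bayes model with Laplace smoothing, flags an assignment as unsure (the ‘?’ mark) when the best folder's posterior falls below a hypothetical threshold, and treats a user correction as one more training example. All names, texts, and the threshold value are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict

class FolderClassifier:
    """Tiny multinomial naive Bayes over folders (an assumed technique)."""

    def __init__(self, unsure_below=0.7):
        self.unsure_below = unsure_below          # hypothetical confidence threshold
        self.word_counts = defaultdict(Counter)   # folder -> word counts
        self.doc_counts = Counter()               # folder -> number of examples

    @staticmethod
    def words(text):
        return re.findall(r"[a-z]+", text.lower())

    def train(self, folder, text):
        """Add one labelled page; bookmarks and user corrections both land here."""
        self.word_counts[folder].update(self.words(text))
        self.doc_counts[folder] += 1

    def classify(self, text):
        """Return (folder, unsure); unsure=True corresponds to the '?' mark."""
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for folder, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[folder] / total_docs)
            for w in self.words(text):
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[w] + 1) / (total_words + len(vocab)))
            log_scores[folder] = score
        best = max(log_scores, key=log_scores.get)
        # Normalise the log scores into a posterior for the best folder.
        peak = log_scores[best]
        posterior = 1.0 / sum(math.exp(s - peak) for s in log_scores.values())
        return best, posterior < self.unsure_below

# Hypothetical training data drawn from two bookmark folders.
clf = FolderClassifier()
clf.train("Software", "linux kernel compiler software")
clf.train("Music", "jazz sitar music concert")
confident = clf.classify("new jazz concert")
before_fix = clf.classify("tabla recital")      # no known words, so unsure
clf.train("Music", "tabla recital")             # the user files the page under Music
after_fix = clf.classify("tabla recital")
```

In this sketch a page with no known words gets equal scores for every folder, so it is flagged as unsure; once the user files it, the same page is classified without the flag.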
Automatic collaborative classification also lets users return to a topic-restricted surfing context quickly, and replay the last few surfing actions within that topic of interest.
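A topic-restricted context can be pictured as history kept per folder rather than as one linear list. The sketch below is only an illustration under that assumption; the class, its methods, and the URLs are hypothetical.

```python
from collections import defaultdict

class TopicHistory:
    """Surf history kept per folder instead of as one linear list."""

    def __init__(self):
        self.by_folder = defaultdict(list)   # folder -> URLs in visit order

    def visit(self, folder, url):
        """Record a page visit under the folder the classifier assigned."""
        self.by_folder[folder].append(url)

    def replay(self, folder, last_n=3):
        """Return the most recent visits within one topic, oldest first."""
        return self.by_folder[folder][-last_n:]

# Hypothetical interleaved browsing across two topics.
history = TopicHistory()
history.visit("Software", "http://example.org/gcc")
history.visit("Music", "http://example.org/jazz")
history.visit("Software", "http://example.org/linux")
history.visit("Music", "http://example.org/sitar")
music_context = history.replay("Music", last_n=2)
```

Here `music_context` holds only the music pages, even though software pages were visited in between.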
Personalized topic-based history management is far superior to the one-dimensional history list provided by popular browsers
Users can switch topics with a single click, and browsing is not limited by the linear “back and forward” paradigm supported by browsers.
A flexible interactive search lets the user locate any page ever visited from anywhere using this account, combining content with popularity, site selections and timeliness
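How Memex combines these signals is not stated in the slides. One simple possibility, shown here purely as an assumption, is a weighted sum of a content match, a popularity term, and a recency term, with an optional site filter. The field names, weights, and sample data are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Visit:
    url: str
    text: str
    visitors: int      # how many community members visited the page
    age_days: float    # days since the most recent visit

def score(visit, query, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of content match, popularity, and timeliness (assumed form)."""
    w_content, w_popularity, w_time = weights
    terms = query.lower().split()
    page_words = visit.text.lower().split()
    content = sum(page_words.count(t) for t in terms) / max(len(page_words), 1)
    popularity = math.log1p(visit.visitors)
    timeliness = 1.0 / (1.0 + visit.age_days)
    return w_content * content + w_popularity * popularity + w_time * timeliness

def search(visits, query, site=None):
    """Rank visits that match the query, optionally restricted to one site."""
    terms = query.lower().split()
    hits = [v for v in visits
            if (site is None or site in v.url)      # crude substring site filter
            and any(t in v.text.lower().split() for t in terms)]
    return [v.url for v in sorted(hits, key=lambda v: score(v, query), reverse=True)]

# Hypothetical archive of visited pages.
visits = [
    Visit("http://example.org/old-jazz", "jazz history", 1, 300.0),
    Visit("http://example.org/new-jazz", "jazz concerts", 5, 1.0),
    Visit("http://example.com/linux", "linux kernel", 9, 2.0),
]
ranked = search(visits, "jazz")
```

With these sample values the recently and widely visited jazz page ranks ahead of the older one, although both match the query equally well on content.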
Close integration of the Memex client with the browser is non-trivial to implement but adds greatly to comfort and ease of use
Memex system diagram
[Figure: Memex system diagram. Browser side: browser; running client applet; client JAR; arrows labelled Visit, Download, Attach. Server side: Memex server; event-handler servlets; Search, Folder, Context, Archive; Memex client-server protocol and workload sharing negotiations; relational metadata; text index; topic models; mining demons (taxonomy synthesis, resource discovery, recommendation, classification, clustering).]
Document workflow
[Figure: document workflow. The browser's Memex client logs page visit and bookmarking events. Components: demon registry; per-document version queue; NODE table; crawler; search indexer; classifier service; clustering service; garbage collector. Actions: push new version; pop and discard old version.]
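The diagram labels suggest that new document versions are pushed onto a per-document queue, that registered demons work through them, and that a garbage collector pops and discards old versions. The sketch below is one reading of that workflow, not a description of the actual Memex server. In particular, the rule that a version may be discarded only after every registered demon has processed it is an assumption, and all names are hypothetical.

```python
from collections import deque

class DocumentQueue:
    """Versions of one document, oldest first, shared by all registered demons."""

    def __init__(self, demons):
        self.versions = deque()                      # (version number, content)
        self.next_version = 1
        self.seen = {name: 0 for name in demons}     # demon -> last version processed

    def push(self, content):
        """Push a new version, e.g. after a page visit is logged."""
        self.versions.append((self.next_version, content))
        self.next_version += 1

    def process(self, demon):
        """Let one demon catch up; returns the version numbers it handled."""
        todo = [number for number, _ in self.versions if number > self.seen[demon]]
        if todo:
            self.seen[demon] = todo[-1]
        return todo

    def collect_garbage(self):
        """Pop and discard old versions that every demon has processed,
        always keeping the newest version (assumed policy)."""
        done = min(self.seen.values())
        discarded = []
        while len(self.versions) > 1 and self.versions[0][0] <= done:
            discarded.append(self.versions.popleft()[0])
        return discarded

# Hypothetical run with two demons taken from the diagram labels.
queue = DocumentQueue(["search indexer", "classifier service"])
queue.push("first crawl of the page")
queue.push("second crawl of the page")
indexed = queue.process("search indexer")
too_early = queue.collect_garbage()       # the classifier has not caught up yet
classified = queue.process("classifier service")
discarded = queue.collect_garbage()
```

Tracking the last version each demon has seen lets slow and fast demons share one queue without blocking each other.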
Autonomous topic organization
Bookmarks often collected into topics
Surfers use personal topic organization
One-size-fits-all taxonomy inadequate
• Many topics over-developed for most of us
• http://dmoz.org/Sports/Hockey/Underwater_Hockey/
• But deeper interests often underdeveloped
• Structure reorganization also desirable
Best taxonomy depends on community behavior as well as page content
Autonomy and collaboration
Personalization picking Yahoo nodes
Complex relations between topics
Need “simplest common ground”
• Coalesce similar topics where possible…
• …without sacrificing individual taste
[Figure: topic trees for Yahoo, User1, User2, and User3, built from the nodes Sports, Hiking, Cycling, Biz, Shops, and Bike shops, annotated with the labels Subsumption and Tree ‘inversion’.]
Taxonomy synthesis example
Generating themes makes map simpler
But distorts contents of original folders
Joint optimization gives best themes
[Figure: taxonomy synthesis example. Folders: Entertainment, Studios, Broadcasting, Media. Sites: kpfa.org, bbc.co.uk, kron.com, channel4.com, kcbs.com, foxmovies.com, miramax.com, lucasfilms.com. Labels: Share document, Share folder, Share terms. Themes: ‘Radio’, ‘Television’, ‘Movies’.]
Summary and project status
Collaborative resource discovery and topic management system
Testbed for hypertext mining research
Signed Java2 client
• Netscape 4.5+ available
• IE5+ planned
Server for Unix and Windows
• IBM UDB, Berkeley DB, servlets
• Non-trivial to install and manage
• Simple-to-use RPMs being planned
http://www.cse.iitb.ernet.in/~soumen