Date post: | 14-Jul-2015 |
Upload: | amelie-marian |
Remembrance of Data Past
Using Context in Personal Information Search
Amélie Marian, Rutgers University
Thu D. Nguyen, Rutgers University
Daniela Vianna, Rutgers University
Luan Nguyen, Rutgers University
What was the name of that restaurant?
• I went there with Julia
• We had dinner
• It was pouring rain
Some sources of helpful data
“With Julia”: Calendar, email, text
“Restaurant”: Check-ins, cell phone GPS logs
“Restaurant”: Credit Card statements
“Pouring rain”: Historical Weather reports
Amélie Marian - Rutgers University
The Web
• hypertext
• universal library of text and multimedia
• personal/private data
• social data
Personal data is fragmented
We remember our data based on context clues
• “Serge sent me this file while we were on a conference call with Alkis”
Skype, Google Hangouts, email, calendar, filesystem
• “I found this shopping web site while talking to Tova on Skype. She was wearing a blue dress.”
Skype (+ snapshot), calendar, browser history
• “Are my insurance reimbursements up to date?”
Calendar, insurance account, bank account
We also remember data from our social network
• “Mohan posted this interesting article on CS education on Facebook, or maybe on Twitter, or maybe it was Moshe Vardi who posted it”
Facebook, Twitter, browser history
• “What are the books my friends recommended?”
Facebook (and comments), Twitter, emails
• “What are the places in Maui that my friends enjoyed?”
Facebook, Twitter, emails, Foursquare
Data dimensions
• Follow natural interrogative words:
• what? (content)
• who? (with whom, from whom, to whom,...)
• where? (physical or logical, in the real-world and in the system)
• when? (time and date, but also what was happening concurrently, before and after)
• why? (sequence of data/events that are connected)
• how? (application, author, environment)
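
The six dimensions above could be sketched as a record type with a naive clue matcher. This is only an illustration: the class name `PersonalItem`, the helper `matches`, and the sample values (including the date) are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class PersonalItem:
    """One piece of personal data, described along the six dimensions."""
    what: str                                      # content
    who: List[str] = field(default_factory=list)   # people involved
    where: Optional[str] = None                    # physical or logical location
    when: Optional[datetime] = None                # time and date
    why: List[str] = field(default_factory=list)   # connected data/events
    how: Optional[str] = None                      # application, author, environment


def dimension_values(item: PersonalItem, dimension: str) -> List[str]:
    """Return the values of one dimension as a list of strings."""
    value = getattr(item, dimension)
    if value is None:
        return []
    if isinstance(value, list):
        return [str(v) for v in value]
    if isinstance(value, datetime):
        return [value.date().isoformat()]
    return [str(value)]


def matches(item: PersonalItem, **clues: str) -> bool:
    """True if every clue is a case-insensitive substring of its dimension."""
    return all(
        any(clue.lower() in v.lower() for v in dimension_values(item, dim))
        for dim, clue in clues.items()
    )


# Hypothetical item: all values are made up for illustration.
dinner = PersonalItem(
    what="credit card charge: restaurant dinner",
    who=["Julia"],
    where="restaurant",
    when=datetime(2015, 3, 14, 19, 30),
    how="bank statement",
)

print(matches(dinner, who="julia", what="dinner"))  # True
print(matches(dinner, who="serge"))                 # False
```

Exact substring matching is used here only to keep the sketch short; the talk itself argues for approximate matching on contextual clues.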
What is an answer?
• Content
• File
• Link
• List of objects (insurance reimbursements)
• But also part of the context
• Location
• Meeting participants
• Time
Personal Data Context
• Explicit
  • Metadata information stored by the file system or application, e.g., timestamp, GPS location, tags, directory structure
• Implicit
  • Identified through application-based semantic information, e.g., email recipients, calendar meeting participants, check-in location
• Inferred
  • Knowledge about the environment of the data collection
  • System environment (Which applications/documents were opened concurrently with a given document)
  • Social environment (Which Facebook members had access to an event)
  • Real-world environment (Who was physically in the room, e.g., via RFID tags or Skype; weather)
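
As a rough illustration of inferred system-environment context, concurrently opened applications or documents can be found by interval overlap. The log format (`OpenInterval`), the function name, and the sample entries are assumptions made for this sketch, not part of the system described in the talk.

```python
from datetime import datetime
from typing import List, NamedTuple


class OpenInterval(NamedTuple):
    """Hypothetical log entry: when an application or document was open."""
    name: str
    opened: datetime
    closed: datetime


def concurrent_with(target: OpenInterval, log: List[OpenInterval]) -> List[str]:
    """Names of log entries whose open interval overlaps the target's."""
    return [
        entry.name
        for entry in log
        if entry.name != target.name
        and entry.opened < target.closed
        and target.opened < entry.closed
    ]


# Made-up sample log for a single morning.
day = lambda h, m: datetime(2015, 7, 14, h, m)
report = OpenInterval("report.pdf", day(10, 0), day(10, 30))
log = [
    report,
    OpenInterval("Skype call with Alkis", day(9, 50), day(10, 20)),
    OpenInterval("browser", day(11, 0), day(11, 30)),
]

print(concurrent_with(report, log))  # ['Skype call with Alkis']
```

Storing such intervals as they are produced is what makes this kind of context recoverable later; it cannot be reconstructed from the file alone.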
Challenges
• Indexing content and context
  • Semantic analysis for extracted context
  • Data integration
• Identify inferred context
  • Store and index as it is produced (system environment)
  • Use API calls on demand or copy information (social and real-world environment)
• Unified data model
  • Content and structure
  • Data in context
  • Navigation
Challenges (2)
• Powerful data tools
  • Access and query (possibly remote) sources
• Search based on content and contextual clues
  • Approximate matching
  • Explore data to get relevant information
• Discover new relevant information
  • “It’s been six months, you need to make a dentist appointment!”
  • “You forgot to pay the home insurance bill!”
  • “Last time you bought toothpaste was a month ago, you are probably running out.”
Previous results: Unified Structure, Content, and Metadata Search
• Data and query models that unify content and structure along one dimension
  • System metadata seen as a separate dimension
• A unified multi-dimensional scoring mechanism
  • IDF-based scores for each dimension
  • Individual dimension scores easily combined
  • TF scores to break ties
• Query processing algorithms and index structures to score and rank answers efficiently
EDBT’08, ICDE’08 (demo), EDBT’11, TKDE’12, with Wei Wang, Chris Peery, and Thu D. Nguyen
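
The multi-dimensional scoring idea can be sketched as follows. The slide does not say how the per-dimension IDF scores are combined, so this sketch assumes a plain sum; the dimension names, the function `rank`, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Each candidate file maps to (per-dimension IDF scores, TF score).
Candidate = Tuple[Dict[str, float], float]


def rank(candidates: Dict[str, Candidate]) -> List[str]:
    """Order files by summed dimension scores; TF only breaks ties."""
    def key(name: str) -> Tuple[float, float]:
        dimension_scores, tf = candidates[name]
        # Assumption: dimensions are combined by summation.
        return (sum(dimension_scores.values()), tf)

    return sorted(candidates, key=key, reverse=True)


# Made-up scores: "a" and "b" tie on the combined score, so TF decides.
candidates = {
    "a": ({"content_structure": 0.75, "metadata": 0.5}, 3.0),
    "b": ({"content_structure": 0.75, "metadata": 0.5}, 7.0),
    "c": ({"content_structure": 0.5, "metadata": 0.25}, 9.0),
}

print(rank(candidates))  # ['b', 'a', 'c']
```

Note that "c" has the highest TF but still ranks last, because TF is consulted only when the combined dimension scores are equal.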
Unified Structure and Content
Target file: Halloween party pictures taken at home where someone wears a witch costume
[Figure: unified data tree with a root, a “Home” directory node, the file boundary, and content nodes “Halloween” and “witch”]
Path query: //Home[.//“Halloween” and .//“witch”]
Unified IDF Score
For a unified data tree T, a path query PQ, and a file F, we define:
• IDF Score
$$score_{idf}(PQ) = \frac{\log\left( N / |matches(T, PQ)| \right)}{\log N}$$
where N is the total number of files, and matches(T, PQ) is the set of files that match PQ in T.
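
A small worked example may help, reading the definition above as the normalized score log(N / |matches(T, PQ)|) / log N. The function name `idf_score` and the file counts are illustrative assumptions.

```python
import math


def idf_score(total_files: int, matching_files: int) -> float:
    """Normalized IDF: log(N / |matches|) / log(N), a value in [0, 1]."""
    if total_files <= 1:
        raise ValueError("need more than one file for log(N) to be non-zero")
    if not 0 < matching_files <= total_files:
        raise ValueError("matching_files must be between 1 and total_files")
    return math.log(total_files / matching_files) / math.log(total_files)


# With N = 1000 files (made-up numbers):
print(round(idf_score(1000, 1), 4))     # 1.0    (only one file matches)
print(round(idf_score(1000, 10), 4))    # 0.6667 (log 100 / log 1000)
print(round(idf_score(1000, 1000), 4))  # 0.0    (every file matches)
```

The more selective the path query, the closer its score is to 1; a query matched by every file carries no information and scores 0.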
Case Study
Target file: Electronic version of the novel Sea Wolf by Jack London
• Date: 26 Feb 07
• File extension: .txt
• Directory: Personal/Ebook/Novel/JackLondon
Content and filtering query
• Keywords: sea, wolf, jack, london
• Directory: /JackLondon/Ebooks
• Result: target file does not appear in the result
Approximate query
• Keywords: sea, wolf, jack, london
• Directory: /JackLondon/Ebooks
• Result: target file at rank 3
Content and filtering query
• Keywords: sea, wolf, jack, london
• Date: 19 Feb 07; type: pdf
• Directory: /JackLondon/Ebooks
• Result: target file does not appear in the result
Approximate query
• Keywords: sea, wolf, jack, london
• Date: 19 Feb 07; type: pdf
• Directory: /JackLondon/Ebooks
• Result: target file at rank 2
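
To see why a strict directory filter drops the target file while an approximate query can still rank it, consider the toy comparison below. The component-overlap score is a deliberately simple stand-in chosen for this sketch; it is not the relaxation or scoring technique used in the cited work, and both function names are hypothetical.

```python
from typing import List


def components(path: str) -> List[str]:
    """Split a directory path into its non-empty components."""
    return [part for part in path.split("/") if part]


def exact_match(query_dir: str, file_dir: str) -> bool:
    """Strict filter: the query path must occur contiguously in the file path."""
    q, f = components(query_dir), components(file_dir)
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))


def relaxed_score(query_dir: str, file_dir: str) -> float:
    """Toy relaxation: fraction of query components found anywhere in the path."""
    q, f = components(query_dir), set(components(file_dir))
    if not q:
        return 1.0
    return sum(1 for part in q if part in f) / len(q)


query_dir = "/JackLondon/Ebooks"
target_dir = "Personal/Ebook/Novel/JackLondon"

print(exact_match(query_dir, target_dir))    # False: the filter drops the file
print(relaxed_score(query_dir, target_dir))  # 0.5: "JackLondon" still matches
```

With the strict filter the misremembered directory eliminates the file outright; with a relaxed score the file keeps partial credit and can be ranked using the other query conditions.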
Conclusions
• First step towards an automated Personal Data Assistant
  • Looks at data and its context
• Gathers personal data from remote sources
  • Cloud applications, social networks, emails, phone logs, financial accounts, friends’ public data,…
• Integrates data in a unified data model
  • Based on natural questions
• Provides search and discovery capabilities
  • Beyond keyword search
  • Context-aware
Funded by a Google Research Award
Ushi Wakamaru! (that’s the restaurant)