Post on 22-Jan-2016
Full-Text Support in a Database Semantic File System
Kristen LeFevre & Kevin Roundy
Computer Sciences 736
Leveraging DBs in File Systems
What do databases have to offer?
• Transactions
• Concurrency control
• Crash recovery
• Query power (metadata)
• Extensibility – add new objects/modules
• Efficient Search!
Re-thinking Directories
• Current state of directories:
  • User remembers what, not where
Our System:
• Search tools for grouping related files
• Semantically meaningful directories [Semantic FS]
• Files are stored in tables
• Directories are just for looks
LAME!
Related Work
• Semantic Filesystems
• Use a DB [Inversion Filesystem]
• NFS Meets Databases [Halverson]
  • NFS for portability, transparency, existing code support, familiar semantics
  • Server-side caching for performance
Bringing ideas together:
• Use [Halverson]'s infrastructure to implement semantic filesystem ideas
Roadmap
• Overview of System Design and Implementation
• Virtual Directories and Full-Text Queries
• Live Demonstration
• Conclusions & Future Work
System Architecture
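[Figure: standard NFS clients connect to the NFS server, which consists of an NFS front end and a custom backend. The backend sits on an object-relational database that provides storage; the database is drawn with attached modules labeled "M" and "TS2".]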
Postgres Capabilities
An object-relational DB such as Postgres lets you define and add modules.
Case in point: Tsearch2
New type: tsvector
Related function: to_tsvector
to_tsvector('a b a c') returns 'a':1,3 'b':2 'c':4
Related index: idxFTI
Set triggers to do updates
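The to_tsvector example above maps each word to the positions where it occurs. As a minimal illustration of that mapping only (not the Tsearch2 implementation, which also stems words and removes stopwords), the following Python sketch reproduces the slide's output. The function names `to_tsvector_sketch` and `format_tsvector` are hypothetical.

```python
from collections import defaultdict


def to_tsvector_sketch(text):
    """Map each whitespace-separated token to its 1-based positions.

    Assumption: no stemming and no stopword removal, unlike Tsearch2.
    """
    positions = defaultdict(list)
    for pos, token in enumerate(text.split(), start=1):
        positions[token.lower()].append(pos)
    return dict(positions)


def format_tsvector(vec):
    """Render the mapping in the 'word':pos,pos style used on the slide."""
    return " ".join(
        "'{}':{}".format(word, ",".join(str(p) for p in vec[word]))
        for word in sorted(vec)
    )


print(format_tsvector(to_tsvector_sketch("a b a c")))
```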
Mapping FS data to DB Schema
Filesystem Data             Database Tables
Metadata                    fileatt
Directory Structure         naming
Non-indexed File Content    allfiles
Indexed File Content        allfiles_txt
[Halverson] Schema
fileatt:  inode, uid, gid, mode, nlinks, size, ctime, mtime, atime
naming:   inode, name, parent
allfiles: inode, chunk_id, data
[Diagram: the tables are connected by three relationships, each labeled 1 to N.]
Database Schema
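fileatt:  inode, uid, gid, mode, nlinks, size, ctime, mtime, atime, istext
naming:   inode, name, parent
allfiles: inode, chunk_id, data
[Diagram: the tables are connected by three relationships, each labeled 1 to N. The new istext column is annotated with strstr(a, ".txt").]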
Database Schema
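fileatt:      inode, uid, gid, mode, nlinks, size, ctime, mtime, atime, istext
naming:       inode, name, parent
allfiles:     inode, chunk_id, data
allfiles_txt: inode, fulltext, tsvector
[Diagram: the original three tables are connected by three relationships labeled 1 to N; allfiles_txt is attached by a relationship labeled 1 to 1 and is marked "tsearch2 index". The istext column is annotated with strstr(a, ".txt").]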
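To make the schema above concrete, here is a rough sketch that builds the four tables in an in-memory SQLite database and routes file content by the istext flag. Several things are assumptions for illustration: SQLite stands in for Postgres, the column types are guesses, the tsvector column is a plain placeholder, and the `create_file` helper and its routing rule (text content to allfiles_txt, everything else to allfiles) are hypothetical readings of the mapping slide.

```python
import sqlite3

SCHEMA = """
CREATE TABLE fileatt (
    inode INTEGER PRIMARY KEY,
    uid INTEGER, gid INTEGER, mode INTEGER, nlinks INTEGER, size INTEGER,
    ctime INTEGER, mtime INTEGER, atime INTEGER,
    istext INTEGER               -- 1 when the file name contains ".txt"
);
CREATE TABLE naming (
    inode  INTEGER REFERENCES fileatt(inode),
    name   TEXT,
    parent INTEGER
);
CREATE TABLE allfiles (
    inode    INTEGER REFERENCES fileatt(inode),
    chunk_id INTEGER,
    data     BLOB
);
CREATE TABLE allfiles_txt (
    inode    INTEGER PRIMARY KEY REFERENCES fileatt(inode),
    fulltext TEXT,
    tsvector TEXT                -- placeholder; Postgres would use the tsvector type
);
"""


def create_file(conn, inode, parent, name, content):
    """Insert one file; mirrors the strstr(a, ".txt") test for istext."""
    istext = 1 if ".txt" in name else 0
    conn.execute(
        "INSERT INTO fileatt (inode, uid, gid, mode, nlinks, size,"
        " ctime, mtime, atime, istext) VALUES (?, 0, 0, ?, 1, ?, 0, 0, 0, ?)",
        (inode, 0o644, len(content), istext),
    )
    conn.execute(
        "INSERT INTO naming (inode, name, parent) VALUES (?, ?, ?)",
        (inode, name, parent),
    )
    if istext:
        # Indexed file content goes to allfiles_txt.
        conn.execute(
            "INSERT INTO allfiles_txt (inode, fulltext, tsvector)"
            " VALUES (?, ?, NULL)",
            (inode, content),
        )
    else:
        # Non-indexed file content goes to allfiles as a single chunk.
        conn.execute(
            "INSERT INTO allfiles (inode, chunk_id, data) VALUES (?, 0, ?)",
            (inode, content.encode()),
        )
    return istext


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
create_file(conn, 2, 1, "notes.txt", "nfs file system")
create_file(conn, 3, 1, "image.png", "binary")
```

In the real system the tsvector column would be filled by a trigger calling to_tsvector, which this sketch leaves out.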
Virtual Directories and Text Search
• Want to handle 2 types of text queries
  • Boolean keyword queries
    • e.g. ('Kristen' | 'Kevin' | 'Remzi') & 'file' & 'system'
  • IR rank queries
    • e.g. Rank files with respect to ('computer' & 'architecture')
• More powerful than grep!
• Virtual directories proposed for Semantic File systems
  • Incorporate full-text queries without "breaking" NFS interface for existing applications
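The Boolean keyword query from the slide can be evaluated with set operations over an inverted index. The sketch below shows the idea on toy data; the nested-tuple query encoding, the `evaluate` function, and the sample index are all hypothetical, and the real system delegates this work to the Tsearch2 index.

```python
def evaluate(query, index):
    """Return the set of file ids matching a nested boolean query.

    A query is either a keyword string or a tuple of the form
    ("and" | "or", subquery, subquery, ...).
    """
    if isinstance(query, str):
        return set(index.get(query.lower(), set()))
    op, *operands = query
    results = [evaluate(sub, index) for sub in operands]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    raise ValueError("unknown operator: {}".format(op))


# Toy inverted index: keyword -> ids of the files that contain it.
index = {
    "kristen": {1},
    "kevin": {2},
    "file": {1, 2, 3},
    "system": {1, 3},
}

# ('Kristen' | 'Kevin' | 'Remzi') & 'file' & 'system'
query = ("and", ("or", "Kristen", "Kevin", "Remzi"), "file", "system")
matches = evaluate(query, index)
print(sorted(matches))
```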
DBMS Full-Text Support
• Keyword Search
  • Text indices support search over keywords
  • Words extracted from document, stemmed, "stopwords" removed
• Rank
  • Used existing rank() function as a black-box
  • rank() counts number of times each word appears in document, and whether search terms are near one another
  • Optionally, normalize by document length
  • Other notions of IR rank could easily be substituted
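As a rough picture of the ranking behaviour described above, the sketch below scores a document by how often the search terms occur, with optional normalization by document length. It is a simplified stand-in and not the Tsearch2 rank() function: it ignores how close the search terms are to one another, and the name `rank_sketch` and the sample documents are hypothetical.

```python
def rank_sketch(document, terms, normalize=False):
    """Score a document by the number of occurrences of the search terms.

    Assumption: plain whitespace tokenization, no stemming, no proximity.
    """
    words = document.lower().split()
    score = float(sum(words.count(term.lower()) for term in terms))
    if normalize and words:
        score /= len(words)
    return score


docs = {
    "a.txt": "computer architecture and computer systems",
    "b.txt": "file system architecture",
}
terms = ["computer", "architecture"]

# Order file names from highest to lowest score.
ranked = sorted(docs, key=lambda name: rank_sketch(docs[name], terms), reverse=True)
print(ranked)
```

Swapping in a different notion of IR rank would only require replacing the scoring function.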
Semantics of Virtual Directories
• Encountered some tradeoffs
• What we did:
  • Static virtual directories (search once on mkdir)
  • Directory contents as a snapshot at one point in time
  • Hard links
[Figure: directory tree rooted at /CS736 containing project, papers, reading questions, and a virtual directory %nfs%. Files shown in the tree: writeup, NFS talk outline, NFS vs AFS, Thread ideas.]
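The static semantics chosen here (search once on mkdir, contents as a snapshot, hard links) can be sketched with a toy in-memory file system. Everything in this sketch is an assumption for illustration: the `SketchFS` class, the convention that a directory named %keyword% encodes a one-word query, and the simple whitespace matching.

```python
class SketchFS:
    """Toy file system with static virtual directories."""

    def __init__(self):
        self.files = {}  # inode -> text content
        self.names = {}  # inode -> file name
        self.dirs = {}   # directory name -> {entry name: inode}

    def create(self, inode, name, content):
        self.files[inode] = content
        self.names[inode] = name

    def mkdir(self, name):
        entries = {}
        if len(name) > 2 and name.startswith("%") and name.endswith("%"):
            keyword = name[1:-1].lower()
            # Run the search exactly once, at mkdir time. Each entry refers
            # to the existing inode, which models a hard link.
            for inode, text in self.files.items():
                if keyword in text.lower().split():
                    entries[self.names[inode]] = inode
        self.dirs[name] = entries

    def readdir(self, name):
        # No query is run here; the listing is the snapshot taken at mkdir.
        return sorted(self.dirs[name])


fs = SketchFS()
fs.create(10, "writeup", "nfs proxy writeup")
fs.create(11, "thread ideas", "notes on threads")
fs.mkdir("%nfs%")
fs.create(12, "nfs vs afs", "nfs compared with afs")  # created after the snapshot
print(fs.readdir("%nfs%"))
```

The file created after mkdir matches the query but does not appear in the listing, which is the snapshot behaviour the slide describes.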
Semantics of Virtual Directories
• Encountered some tradeoffs
• Alternatives (all also valid):
  • Static virtual directory creation with symbolic links
    • leads to dangling (broken) links
  • Process query lazily on readdir command
    • Semantics used in Semantic File System paper
  • Dynamically update contents of virtual directories on file creation, deletion, or write
    • Can be implemented using database triggers
    • More expensive, heavier back-end load
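To contrast the lazy alternative with a static snapshot, the sketch below re-runs the query on every readdir call. The `search` helper, the `LazyVirtualDirectory` class, and the sample files are hypothetical, and the matching is plain whitespace tokenization rather than a real full-text index.

```python
def search(files, keyword):
    """Return the sorted names of files whose text contains the keyword."""
    return sorted(
        name for name, text in files.items() if keyword in text.lower().split()
    )


class LazyVirtualDirectory:
    """Virtual directory whose contents are computed on each readdir."""

    def __init__(self, files, keyword):
        self.files = files      # shared reference to the live file table
        self.keyword = keyword

    def readdir(self):
        return search(self.files, self.keyword)


files = {"writeup": "nfs proxy design", "thread ideas": "threads and locks"}
lazy = LazyVirtualDirectory(files, "nfs")
static_snapshot = search(files, "nfs")  # what a static directory would hold

files["nfs vs afs"] = "nfs compared with afs"  # file created afterwards
print(static_snapshot)
print(lazy.readdir())
```

The lazy directory picks up the new file at the cost of running the query on every listing, while the snapshot stays fixed.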
Conclusions
• Benefits of our proxy architecture:
  • Standard NFS clients
  • Postgres as black box
  • Simple to expose functionality of DB
  • Use & add DB objects at will
Future Work
• Performance evaluation to understand the overhead of new functionality
  • Dynamic index maintenance (file creation & modification)
  • Virtual directory creation and text querying
• Block-level text writes and caching
• Query support for other file types
  • Mechanisms for extracting and indexing meta-data from additional file types (e.g., image files)
• Performance Monitoring, Adaptive Indexing and storage format within the NFS Proxy
Thanks! Questions?
Special Thanks:
Remzi Arpaci-Dusseau
Alan Halverson
David DeWitt