
Computer Science

EMFS: Email-based Personal Cloud Storage

NAS 2011

Jagan Srinivasan, Wei Wei, Xiaosong Ma, Ting Yu


Agenda

o Introduction
o Data Organization and Access
o Email-based File System Design
o Performance Evaluation
o Related Work
o Conclusion

Motivation

Existing personal cloud storage services
o Tie storage to internal data formats and processing applications
o General-purpose storage is not free and not widely used

Existing email services
o The capacity of a single email account has increased dramatically
o Provided by many reliable and reputable online service providers

Leveraging existing email services
o Benefits service providers, as it extends their access to valuable customer data

EMFS Overview

Target Workload and Assumptions
o Typical personal workload
  • Reading, editing, and backing up documents such as Word and PDF files
  • Targets file sizes ranging from several KB to tens of MB
o Users do not share storage with others or allow concurrent access to their data

Design Goals
o Usability (generic file system interface)
o Scalability (extensible personal storage space)
o Reliability (access despite single email failure)

EMFS System Architecture

[Figure: EMFS system architecture. Component labels: Email File System Interface through FUSE; Email Mapping Service; Email Cloud Storage Interface; Memory Cache; Local Cache. The email accounts are annotated with striping and replication.]


Data Organization and Access

File Organization
o Metadata
o File data stored as attachments or in the body of emails

Data Organization and Access cont’d

Metadata and Data Access
o Client cache management
o Metadata update
o Data access operations

Consistency and Failure Recovery
o Adopt a mechanism to ensure the atomicity of updates

[Figure: failure cases. (a) Lost metadata update; (b) Lost part of data update.]


Email Protocol Selection

Simple Mail Transfer Protocol (SMTP)
o Only used for transferring emails to the server
o Restriction on the number of messages sent through SMTP

Internet Message Access Protocol (IMAP)
o Supports both sending and retrieving messages
o Allows users to “append” a message to their own mailbox
o Not limited by traffic restrictions

Post Office Protocol (POP)
o Primarily used for retrieving emails
o Supports a simple download-and-delete access pattern
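
The IMAP “append” path described above can be sketched with Python's standard `imaplib` module. This is a minimal illustration rather than the EMFS implementation: the helper names `build_message` and `append_to_mailbox`, the placeholder addresses, and the mailbox name are assumptions made for the example. The connection object is passed in so that the sketch can be exercised without a real server.

```python
import imaplib
import time
from email.message import EmailMessage


def build_message(subject: str, body: str) -> bytes:
    """Build a small RFC 5322 message (the addresses are placeholders)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "emfs@localhost"
    msg["To"] = "emfs@localhost"
    msg.set_content(body)
    return msg.as_bytes()


def append_to_mailbox(conn, mailbox: str, raw_message: bytes) -> bool:
    """Store a message directly in the user's own mailbox via IMAP APPEND.

    `conn` is expected to behave like imaplib.IMAP4: append() returns a
    (status, data) tuple whose status is "OK" on success.
    """
    date_time = imaplib.Time2Internaldate(time.time())
    status, _ = conn.append(mailbox, None, date_time, raw_message)
    return status == "OK"
```

With a real account, `conn` would typically be an `imaplib.IMAP4_SSL` instance on which `login()` has already been called. No SMTP submission is involved, which matches the slide's point that appending avoids SMTP sending limits.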

Email Protocol Selection cont’d

Email sending and appending performance
o IMAP is faster than SMTP in almost all cases, by 5.5% on average and up to 42.64%

Data Placement Within Emails

Multiple places used to store data in an email
o Headers
o Subject line
o Body
o Attachment

In EMFS
o Metadata is stored in the body section
o The unique identifiers are stored in the subject line
o Data can be stored either as attachments or in the body
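
As a rough illustration of this placement, the sketch below packs an identifier into the subject line, metadata into the body, and file data into an attachment, then reads all three back. The JSON encoding of the metadata, the function names, and the choice to combine metadata and data in one message are assumptions made for brevity. The slides do not specify the on-wire format, and EMFS may keep metadata emails separate from data emails.

```python
import json
from email import message_from_bytes, policy
from email.message import EmailMessage


def make_block_email(block_id: str, metadata: dict, payload: bytes) -> bytes:
    """Pack an identifier, metadata, and data into one email (hypothetical layout)."""
    msg = EmailMessage()
    msg["Subject"] = block_id                 # unique identifier in the subject line
    msg.set_content(json.dumps(metadata))     # metadata in the body (JSON is an assumption)
    msg.add_attachment(                       # file data as a binary attachment
        payload,
        maintype="application",
        subtype="octet-stream",
        filename=block_id + ".bin",
    )
    return msg.as_bytes()


def parse_block_email(raw: bytes):
    """Recover (identifier, metadata, payload) from a message built above."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    metadata = json.loads(body.get_content())
    attachment = next(msg.iter_attachments())
    return str(msg["Subject"]), metadata, attachment.get_content()
```

Keeping the identifier in the subject line means a block can be located with a subject search on the server without downloading message bodies.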

Data Placement Within Emails cont’d

Single email sending/retrieving performance
o Similar performance regardless of whether the payload is placed in the body or the attachment
o Attachment payload slightly outperforms the body payload with Gmail

Block Size and File Striping

Organize email accounts as a RAID
o Each account is identified by a “RAID Index” from 0 to n-1
o Data blocks are striped across email accounts
o Blocks are stored on randomly chosen disks instead of having a fixed array of email disks and striping data in a round-robin manner
o Metadata emails are usually small, so they are not striped

EMFS uses 512KB as its default block size and 8 as the default stripe width
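
The striping scheme can be sketched as follows. The block size and stripe width are the defaults stated on the slide. The placement rule is an assumption: each stripe draws distinct accounts at random, so the blocks of one stripe can be transferred in parallel. The slides only say that disks are chosen randomly rather than round-robin, and the function names are hypothetical.

```python
import random

BLOCK_SIZE = 512 * 1024   # default block size stated on the slide
STRIPE_WIDTH = 8          # default stripe width stated on the slide


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut file data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, num_accounts: int,
                 stripe_width: int = STRIPE_WIDTH, rng=None):
    """Return one RAID index (0..n-1) per block.

    Assumed rule: within each stripe, blocks go to distinct accounts that
    are sampled at random instead of being assigned round-robin.
    """
    rng = rng or random.Random()
    width = min(stripe_width, num_accounts)
    placement = []
    for start in range(0, num_blocks, width):
        stripe_len = min(width, num_blocks - start)
        placement.extend(rng.sample(range(num_accounts), stripe_len))
    return placement
```

With these defaults, a 4MB file becomes eight 512KB blocks, which fit exactly in one stripe when at least eight accounts are available.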

Block Size and File Striping cont’d

Figure 5 measures a 4MB file’s read/write latency
o File access latency steadily decreases as the file block (attachment) size increases, for both Gmail and Gawab Mail

Block Size and File Striping cont’d

Figures 6 and 7 show the effect of striping with different block sizes
o Striping provides a significant performance improvement
o Increasing the stripe width beyond 8 or the block size beyond 1MB does not help performance
o Block sizes smaller than 256KB degrade performance in almost all cases

Data Replication

Replication group
o Consists of two or more disks mirroring the same data
o Updates written to one of the email disks within the group
o Email disks (accounts) can be added to or removed from a group

Replication Strategies
o Read-one and Write-one
  • All reads and writes from EMFS go to the same email account
o Read-fast and Write-fast
  • Reads and writes go to different accounts based on their uploading and downloading performance
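
The difference between the two strategies can be illustrated with a small selection routine. This is only a sketch under assumptions: the `Account` record, its throughput fields, and the rule of picking the highest measured throughput are hypothetical, since the slides do not say how EMFS measures or ranks account performance.

```python
from dataclasses import dataclass


@dataclass
class Account:
    raid_index: int
    upload_kbps: float     # hypothetical measured upload throughput
    download_kbps: float   # hypothetical measured download throughput


def pick_one(group):
    """Read-one / write-one: always use the same account (here, the first)."""
    return group[0], group[0]          # (read_account, write_account)


def pick_fast(group):
    """Read-fast / write-fast: choose accounts by measured performance."""
    reader = max(group, key=lambda a: a.download_kbps)
    writer = max(group, key=lambda a: a.upload_kbps)
    return reader, writer
```

For a group where one provider downloads quickly and another uploads quickly, `pick_fast` returns two different accounts, whereas `pick_one` returns the same account for both roles.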


EMFS Evaluation

System Implementation
o Prototype is based on FUSE
o Implemented in around 3000 lines of Python code
o Two replication strategies implemented for comparison

What we do
o Compare EMFS with three existing distributed file systems
o Use Postmark, IOZone, and a synthetic file access benchmark

Experiment Setup
o Dual-core desktop (2.66 GHz) with 3 GB of RAM running Ubuntu 8.10
o Both NFS and AFS servers were configured on dedicated machines inside the campus network
o Jungle Disk was configured such that background or asynchronous transfers were disabled
o EMFS was configured using accounts from Gmail and Gawab Mail

Performance Results – Postmark

Postmark measures performance for network-based systems by simulating access to short-lived small files. Different workloads (equal bias, read heavy, append heavy, and create heavy) are generated by varying the operation bias.

Settings
o 200 files
o File sizes range from 4KB to 16MB
o 200 transactions

Results
o AFS and NFS perform better than EMFS and Jungle Disk
o EMFS offers performance comparable to Jungle Disk
o EMFS-Fast offers better performance than EMFS-One

Performance Results – IOZone

Unlike Postmark, IOZone mainly focuses on file data access.

Settings
o 16 MB file
o Request sizes range from 128 KB to 4 MB

Results
o AFS and Jungle Disk achieve a transfer rate of 25 to 50 MB/s for sequential reads
o EMFS reports very high transfer rates
o Jungle Disk reports very low throughput (about 550-600 KB/s) for random reads

Performance Results – IOZone cont’d

Settings
o 16 MB file
o Request sizes range from 128 KB to 4 MB

Results
o EMFS is slightly better than Jungle Disk in terms of write throughput
o NFS and AFS are faster due to their high file transfer performance and low overhead

Performance Results – Editing Workload

A synthetic benchmark that simulates a document editing task.

Settings
o 100 files, 14 directories (with a maximum depth of 3)
o File sizes range from 8KB to 4MB

Results
o Lookup operations for AFS are lightning fast
o EMFS-Prefetch helps reduce the total lookup time by 17.4%
o All systems perform nearly the same for editing operations
o EMFS-Fast brings an improvement of 31% for the file save operation, which is quite close to Jungle Disk


Related Work

Email-based file systems
GmailFS [http://sr71.net/projects/gmailfs/], YaFS [Lu, et al., IPDPS 2009], and free email accounts for data backup [Traeger, et al., StorageSS 2006]
o EMFS systematically examines email-based file system design issues

Other existing client-server systems
LftpFS [http://lftpfs.sourceforge.net/] and ExpanDrive [http://en.wikipedia.org/wiki/ExpanDrive]
o EMFS enables users to take advantage of widely available and increasingly powerful web-based email services

Distributed file systems
NFS [Pawlowski, et al., USENIX 1994], AFS [Howard, et al., ACM Trans 1988], LBFS [Muthitacharoen, et al., SOSP 2001], GFS [Ghemawat, et al., SOSP 2003], and Ceph [Weil, et al., OSDI 2006]
o EMFS complements existing studies on distributed file/storage systems

Conclusion

To the best of our knowledge, our work is the first to systematically examine email-based file system design issues, and thoroughly …

Contributions
o Provides a personal cloud storage solution on top of multiple free web-based email accounts
o Implements a prototype based on FUSE
o Evaluates the effectiveness of features such as multi-account space aggregation, file striping, and data replication

Thank you

Questions?