Page 1

CONTENTS

AN INFORMAL PUBLICATION FROM ACADEMIA’S PREMIERE STORAGE SYSTEMS RESEARCH CENTER DEVOTED TO ADVANCING THE STATE OF THE ART IN STORAGE SYSTEMS AND INFORMATION INFRASTRUCTURES.

PARALLEL DATA LABORATORY
CARNEGIE MELLON UNIVERSITY

THE NEWSLETTER ON PDL ACTIVITIES AND EVENTS • FALL 2002

THE PDL PACKET

http://www.pdl.cmu.edu/

Over the past 10 years, Carnegie Mellon’s Parallel Data Lab (PDL) has established itself as academia’s premiere storage systems research center, consistently pushing the state of the art with new storage system architectures, technologies, and design methodologies. Today, the PDL consists of over 40 active researchers and has an annual budget of over $2.5 million. As we prepare for the 10th Annual PDL Retreat, it is fun to look back at how we got here.

Dr. Garth Gibson founded the PDL in 1993. It started with Gibson and 7 students from CMU’s CS and ECE Departments. Having recently finished his Ph.D. research, which defined the industry standard RAID terminology for redundant disk arrays, Gibson guided the PDL researchers in advanced disk array research. The name “Parallel Data Lab” comes from this initial focus on parallelism in storage systems. In the PDL’s formative years, its researchers developed technologies for improving failure recovery performance (parity declustering) and maximizing performance in small-write intensive workloads (parity logging). They also developed an aggressive prefetching technology (transparent informed prefetching, or TIP) for converting serial access patterns into highly parallel workloads capable of exploiting large disk arrays.

The first PDL Retreat was held in October of 1993, and was attended by 20 CMU participants and 11 industry visitors. As is still the case, the first Retreat was highly interactive, allowing the sponsors to hear about and give feedback on PDL research and offering the students a chance to develop relationships with future colleagues and potential employers.

Bill Courtright, a PDL student at the time, recalls everyone wondering if they would have enough solid content to keep the industry attendees’ attention throughout the 3-day retreat — but of course it was not a problem. Every year since then, the difficult problem has been what to leave out, as the PDL researchers generate more cool ideas than will fit into

Greg Ganger & Joan Digney

… continued on pg. 12

PDL Celebrates its 10th Year!

PDL’s 10th Year................................. 1

Director’s Message ............................ 2

New Faculty ...................................... 3

Year in Review .................................. 4

Recent Publications ........................... 5

Database Data Structures................... 8

Awards & Other News..................... 10

Survivable Storage........................... 13

Proposals & Defenses...................... 15

PDL Spring Open House ................. 15

Comings & Goings .......................... 16

Storage Education............................ 20

EMC Corporation

Hewlett-Packard Labs

Hitachi, Ltd.

IBM Corporation

Intel Corporation

Microsoft Corporation

Network Appliance

PANASAS, Inc.

Seagate Technology

Sun Microsystems

Veritas Software Corporation

CONSORTIUM MEMBERS

Garth displays the Scotch I and II storage hardware used in early TIP and RAID research, two of the first research projects undertaken by the PDL as a group. (1994)

Page 2


THE PDL PACKET

The Parallel Data Laboratory
School of Computer Science
Department of ECE
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3891

VOICE 412•268•6716
FAX 412•268•3010

PUBLISHER
Greg Ganger

EDITOR
Joan Digney

The PDL Packet is published once per year and provided to members of the PDL Consortium. Copies are given to other researchers in industry and academia as well. A pdf version resides in the Publications section of the PDL Web pages and may be freely distributed. Contributions are welcome.

COVER ILLUSTRATION

Skibo Castle and the lands that comprise its estate are located in the Kyle of Sutherland in the northeastern part of Scotland. Both ‘Skibo’ and ‘Sutherland’ are names whose roots are from Old Norse, the language spoken by the Vikings who began washing ashore regularly in the late ninth century. The word ‘Skibo’ fascinates etymologists, who are unable to agree on its original meaning. All agree that ‘bo’ is the Old Norse for ‘land’ or ‘place.’ But they argue whether ‘ski’ means ‘ships’ or ‘peace’ or ‘fairy hill.’

Although the earliest version of Skibo seems to be lost in the mists of time, it was most likely some kind of fortified building erected by the Norsemen. The present-day castle was built by a bishop of the Roman Catholic Church. Andrew Carnegie, after making his fortune, bought it in 1898 to serve as his summer home. In 1980, his daughter, Margaret, donated Skibo to a trust that later sold the estate. It is presently being run as a luxury hotel.

Hello from fabulous Pittsburgh!

2002 has been a fun year of building on the growth and solidity achieved last year, which brought us over $5 million in new government funding, 3 new faculty in key growth areas, a new storage systems class, a new storage systems conference (FAST), and several new students and staff. The result has been progress on existing projects combined with cool new research directions. This year is also noteworthy for its historical significance, as it is PDL’s 10th year and includes the 10th PDL Retreat.

The PDL continues to pursue a broad array of storage systems research, ranging from the underlying devices to the applications that rely on storage. The past year saw excellent research progress, new Fellowships for PDL students (one from IBM, one from Intel, and one from Microsoft), numerous students spending summers with PDL Consortium companies, and Best Student Paper Awards at two top-tier conferences. Allow me to highlight a few things.

The self-securing devices project has made great strides. Highlighted in April 2002 by several news organizations, this project adapts medieval warfare notions to the defense of networked computing infrastructures. In a nutshell, devices are augmented with relevant security functionality and made intrusion-independent from client OSes and other devices. This architecture makes systems more intrusion-tolerant and more manageable when under attack. The self-securing devices vision has brought with it many interesting challenges and a healthy source of funding. In the past year, we have developed network interface software for containing compromised client systems, expanded on self-securing storage, and come up with a new way of detecting intruders: storage-based intrusion detection. Storage devices are uniquely positioned to spot some common intruder actions (such as scrubbing audit logs and inserting backdoors), making this an exciting new concept.

Also exciting has been the continuing growth in database systems research along several parallel tracks. One project is developing new data structures that simultaneously maximize CPU cache and disk performance. A second project complements the first by extending the storage-specific knowledge in database storage managers, allowing them to match their access patterns to device characteristics automatically. Another project applies data mining techniques to I/O traces in order to characterize their spatial and temporal features; Mengzhi Wang won the Best Student Paper Award at Performance 2002 for this work. Other projects are creating tools for automatically partitioning large database tables and database architectures for superior memory performance.

Building on our previous work, we have initiated several interrelated projects in automated storage management. At the lowest level, we are exploring the use of freeblock scheduling for continuous reorganization of data within storage devices. At the level of small collections of storage servers, layered clustering balances load among servers without requiring changes to clients or the client-server protocol. For large systems, at the data center and beyond, we are culling lessons from human organizations to gain traction on dynamic management of self-configuring, self-organizing components. Not to be outdone by those inventing buzzwords for the goal of automated storage management, we collectively refer to these projects by the meta-buzzword “Self-* Storage.”

… continued on pg. 3

FROM THE DIRECTOR’S CHAIR

Greg Ganger

Page 3

CONTACTING US

WEB PAGES

PDL Home: http://www.pdl.cmu.edu/

Please see our web pages at http://www.pdl.cmu.edu/PEOPLE/ for further contact information.

FACULTY

Greg Ganger (director)
412•268•1297
[email protected]

Anastassia Ailamaki
[email protected]

Christos Faloutsos
[email protected]

Garth Gibson
[email protected]

Seth Goldstein
[email protected]

Mor Harchol-Balter
[email protected]

Chris Long
[email protected]

Todd Mowry
[email protected]

Adrian Perrig
[email protected]

Mike Reiter
[email protected]

Srinivasan Seshan
[email protected]

Dawn Song
[email protected]

Chenxi Wang
[email protected]

Hui Zhang
[email protected]

STAFF MEMBERS

Karen Lindenfelser (pdl business administrator)
412•268•6716
[email protected]

Stan Bielski
Mike Bigrigg
John Bucy
Joan Digney
Gregg Economou
Ken Tew
Linda Whipkey

STUDENTS


PARALLEL DATA LABORATORY

Mukesh Agrawal
Aditya Akella
Shimin Chen
Chris Costa
Garth Goodson
John Linwood Griffin
Stavros Harizopoulos
James Hendricks
Andrew Klosterman
Vinod Das Krishnan
Chris Lumb
Amit Manjhi
Michael Mesnier
Spiros Papadimitriou
Stratos Papadomanolakis
Adam Pennington
David Petrou
Brandon Salmon
Asad Samar
Jiri Schindler
Steve Schlosser
Bianca Schroeder
Minglong Shao
Craig Soules
John Strunk
Eno Thereska
Monica Ullagaddi
Mengzhi Wang
Ted Wong
Jay Wylie
Shuheng Zhou

Other ongoing PDL projects are also producing exciting results. For example, the DIXtrac disk characterization tool has been used to explore the use of disk-specific knowledge in file systems, resulting in the Best Student Paper at the first File and Storage Technologies (FAST) conference. The PASIS project continues to develop tools and methodologies for exploring the complex trade-off space of survivable storage systems. The CHIPS research center continues to develop hardware and process technologies to realize MEMS-based storage devices, while PDL researchers are looking at reliability issues and system-level performance issues for MEMS-based storage. For the latter, we developed a timing-accurate storage emulator that looks to systems like a real MEMS-based storage device. We have built a working freeblock scheduler inside a FreeBSD device driver and are building demonstration applications to show off in a future code release. This newsletter and the PDL website offer more details and additional research highlights.

On the education front: this spring, for the second time, we offered our new storage systems course to undergraduates and masters students at Carnegie Mellon. Topics spanned the design, implementation, and use of storage systems, from the characteristics and operation of individual storage devices to the OS, database, and networking techniques involved in tying them together to make them useful. The base lectures were complemented by real-world expertise generously shared by 8 guest speakers from industry, including 2 CTOs and 4 of the 8 members of the SNIA Technical Council. We continue to work on the storage systems textbook, and two other schools (Johns Hopkins and NYU) have already picked up and started teaching similar storage systems courses. We view providing storage systems education as critical to the field’s future, so stay tuned.

I’m always overwhelmed by the accomplishments of the PDL students and staff, and it’s a pleasure to work with them. As always, their accomplishments point at great things to come.

… continued from pg. 2

FROM THE DIRECTOR’S CHAIR

… continued on pg. 4

NEW PDL FACULTY

Dawn Song

Dr. Dawn Xiaodong Song joined the PDL and the Departments of ECE and CS this fall as an Assistant Professor. She received her Ph.D. in Computer Science from UC Berkeley in 2002, following the defense of her dissertation titled “Automatic Tools for Building Secure Systems.” Her main research interests are in computer security and applied cryptography, including security in systems, networking, databases, and electronic commerce. She has worked on a wide range of research projects in the areas of systems and networking security, creating new cryptographic protocols, and designing and developing automatic tools for building secure systems.

A. Chris Long

Dr. A. Chris Long joined the PDL in August as a Post-Doctoral Fellow in ECE. He is working with Greg Ganger on user interfaces to allow system administrators to monitor and manage self-securing network interfaces and storage devices. He is also interested in the areas where human-computer

Page 4


October 2002
❖ Tenth annual PDL Retreat & Workshop

September 2002
❖ Mengzhi Wang awarded Best Student Paper at Perf’2002 in Rome
❖ Dawn Song and Adrian Perrig join the PDL

August 2002
❖ Srinivasan Seshan, Assistant Professor of CS and ECE, helped organize and served as the Tutorials Chair at ACM’s SIGCOMM 2002 Conference in Pittsburgh
❖ Christos Faloutsos tutorials: at SIGCOMM 2002 on “Data Mining the Internet,” and at VLDB ‘02 on “Sensor Data Mining: Similarity Search and Pattern Analysis”
❖ Chris Long joins the PDL

July 2002
❖ Stavros Harizopoulos spent part of the summer working with Natassa at the Technical University of Crete in Chania, Greece

June 2002
❖ SDI speaker: Michael Kozuch, of Intel, on “Internet Suspend/Resume”
❖ Thesis Proposal (ECE): Jiri Schindler on “Matching Access Patterns to Storage Device Characteristics”

May 2002
❖ John Strunk, Jay Wylie, Chris Lumb and Steve Schlosser interned at HP Labs in Palo Alto
❖ Garth Goodson spent the summer interning with IBM at Almaden
❖ Thesis Proposal (ECE): David Petrou on “A System for Matching Application Resource Supply and Demand”
❖ SDI speaker: Winfried W. Wilcke, of IBM Almaden, on “The IceCube Project”

April 2002
❖ Fourth annual PDL Open House

March 2002
❖ Mengzhi Wang spoke on “Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic” at the 18th ICDE in San Jose

January 2002
❖ Jiri Schindler, John Griffin & Chris Lumb awarded Best Student Paper at FAST 2002 for “Track-Aligned Extents: Matching Access Patterns to Disk Drive Characteristics.” Jiri gave the conference talk.
❖ Chris Lumb spoke on “Freeblock Scheduling Outside Disk Firmware” at FAST 2002.
❖ John Griffin spoke on “Track-Aligned Extents: Matching Access Patterns to Disk Drive Characteristics” at FAST 2002.
❖ Over the spring term, 8 visitors from industry were guest lecturers for the new storage course, including: Steve Kleiman, Network Appliance; Wayne Rickard, Gadzoox; Dave Anderson, Seagate; Ric Wheeler, EMC; Jim Hughes, StorageTek; Harald Skardal, Network Appliance; John Wilkes, HP; and Roger Cummings, Veritas.

December 2001
❖ SDI speaker: PDL alumnus Tammo Spalink, grad student at Princeton, on “Building a Robust Software-Based Router Using Network Processors”

November 2001
❖ Ninth Annual PDL Retreat & Workshop

YEAR IN REVIEW

NEW PDL FACULTY

interaction and security intersect, such as developing interfaces to help ordinary users manage their electronic security and privacy more easily and effectively.

Chris received his Ph.D. in Computer Science from UC Berkeley in 2001. His dissertation focused on a tool for helping designers of pen-based user interfaces create and improve gestures for their interfaces. He has also worked on interfaces for editing digital video, speech interfaces, multimodal interfaces, and virtual reality.

Adrian Perrig

Dr. Adrian Perrig joined the PDL as an Assistant Professor in ECE and Engineering and Public Policy at Carnegie Mellon University. He earned his Ph.D. in Computer Science from Carnegie Mellon University, and spent three years during his Ph.D. with Doug Tygar as his advisor at UC Berkeley, writing his thesis on “Security Protocols for Broadcast Networks.” He received his Bachelor’s degree in Computer Science from the Swiss Federal Institute of Technology in Lausanne (EPFL). Adrian’s research interests revolve around building secure systems and include network security and security for sensor networks and mobile applications.

… continued from pg. 3

Bruce Worthington (Microsoft) and Greg discuss research during a Retreat poster session.

Page 5


Timing-Accurate Storage Emulation

Griffin, Schindler, Schlosser & Ganger

Conference on File and Storage Technologies (FAST), January 28-30, 2002. Monterey, CA.

Timing-accurate storage emulation fills an important hole in the set of common performance evaluation techniques for proposed storage designs: it allows a researcher to experiment with not-yet-existing storage components in the context of real systems executing real applications. As its name suggests, a timing-accurate storage emulator appears to the system to be a real storage component with service times matching a simulation model of that component. This paper promotes timing-accurate storage emulation by describing its unique features, demonstrating its feasibility, and illustrating its value. A prototype, called the Memulator, is described and shown to produce service times within 2% of those computed by its component simulator for over 99% of requests. Two sets of measurements enabled by the Memulator illustrate its power: (1) application performance on a modern Linux system equipped with a MEMS-based storage device (no such device exists at this time), and (2) application performance on a modern Linux system equipped with a disk whose firmware has been modified (we have no access to firmware source code).

A system with (a) real storage or (b) emulated storage. The emulator transparently replaces storage devices in a real system. By reporting request completions at the correct times, the performance of different devices can be mimicked, enabling full system-level evaluations of proposed storage subsystem modifications.

Track-Aligned Extents: Matching Access Patterns to Disk Drive Characteristics

Schindler, Griffin, Lumb & Ganger

Conference on File and Storage Technologies (FAST), January 28-30, 2002. Monterey, CA.

Track-aligned extents (traxtents) utilize disk-specific knowledge to match access patterns to the strengths of modern disks. By allocating and accessing related data on disk track boundaries, a system can avoid most rotational latency and track crossing overheads. Avoiding these overheads can increase disk access efficiency by up to 50% for mid-sized requests (100-500 KB). This paper describes traxtents, algorithms for detecting track boundaries, and the use of traxtents in file systems and video servers. For large file workloads, a modified version of FreeBSD’s FFS implementation reduces application run times by 20% compared to the original version. A video server using traxtent-based requests can support 56% more concurrent streams at the same startup latency and buffer space. For LFS, 44% lower overall write cost for track-sized segments can be achieved.
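To make the allocation idea concrete, here is a small Python sketch (not the paper’s traxtent-detection algorithms) that splits a block range on known track boundaries so no piece crosses a track; the boundary list is assumed to come from a disk-characterization step such as DIXtrac.

```python
import bisect

def split_on_track_boundaries(start_lbn, nblocks, boundaries):
    """Split [start_lbn, start_lbn + nblocks) into extents that never cross a
    track boundary.  `boundaries` is a sorted list of the first LBN of each
    track (assumed to come from a disk-characterization step)."""
    extents, lbn, end = [], start_lbn, start_lbn + nblocks
    while lbn < end:
        i = bisect.bisect_right(boundaries, lbn)   # first boundary beyond lbn
        next_boundary = boundaries[i] if i < len(boundaries) else end
        piece_end = min(end, next_boundary)
        extents.append((lbn, piece_end - lbn))
        lbn = piece_end
    return extents

# Hypothetical geometry: tracks start at LBNs 0, 1626, 1670, 1706.
print(split_on_track_boundaries(1660, 30, [0, 1626, 1670, 1706]))
# -> [(1660, 10), (1670, 20)]
```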

Capturing the Spatio-Temporal Behavior of Real Traffic Data

Wang, Ailamaki & Faloutsos

Performance 2002 (IFIP Int. Symp. on Computer Performance Modeling, Measurement and Evaluation), Rome, Italy, Sept. 2002.

Traffic, like disk and memory accesses, typically exhibits burstiness, temporal locality and spatial locality. There is much recent ground-breaking work on temporal modeling (self-similarity etc.), on disk and web traffic, with several statistical models that generate realistic series of time-stamps. However, no work generates realistic traces for both time and location (e.g., block-id). In fact, except for qualitative speculations, it is not even known whether/how the time-stamps are correlated with the locations, nor how to measure this correlation, let alone how to reproduce it realistically.

These are exactly the problems we solve here: (a) We propose the ‘entropy plots’ to quantify the spatial/temporal correlation (or lack of it), and (b) we propose a new model, the ‘PQRS’ model, that captures all the characteristics of real spatio-temporal traffic. Our model can generate traffic that is bursty (or uniform) on time; bursty or uniform on space; and it can mimic the correlation

RECENT PUBLICATIONS

… continued on pg. 6

http://www.pdl.cmu.edu/Publications/

Mapping system-level blocks to disk sectors. Physical block 101 maps directly to disk sectors 1626-1641. Block 103 is an excluded block because it spans the disk track boundary between LBNs 1669-1670. (The figure shows file offsets, physical blocks, and disk sectors/LBNs, with the track boundary marked.)

Page 6


kernel. This freeblock scheduler can give 15% of a disk’s potential bandwidth (over 3.1 MB/s) to a background disk scanning task with almost no impact (less than 2%) on the foreground request response times. This increases disk bandwidth utilization by over 6x.
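A toy sketch of the scheduling decision just described (with made-up timing numbers, not the real service-time predictor): whenever the next foreground request would leave the head idle during rotational latency, pick the largest pending background transfer that fits entirely inside that window.

```python
def pick_freeblock_request(rotational_latency_ms, background_pool,
                           transfer_time_ms):
    """Choose a background (freeblock) request that fits entirely inside the
    rotational-latency window of the next foreground request.

    rotational_latency_ms -- predicted idle time before the foreground access
    background_pool       -- iterable of (request_id, nblocks) candidates
    transfer_time_ms      -- callable: nblocks -> estimated media transfer time

    Returns the candidate that uses the window best, or None.  This is a toy
    stand-in for the service-time prediction machinery in the real scheduler.
    """
    best, best_blocks = None, 0
    for req_id, nblocks in background_pool:
        if transfer_time_ms(nblocks) <= rotational_latency_ms and nblocks > best_blocks:
            best, best_blocks = req_id, nblocks
        # Requests that do not fit are simply skipped; the foreground request
        # is never delayed.
    return best

# Illustrative numbers: 4 ms of predicted rotational latency, 0.05 ms per block.
pool = [("scan-A", 40), ("scan-B", 120), ("scan-C", 75)]
print(pick_freeblock_request(4.0, pool, lambda n: 0.05 * n))   # -> 'scan-C'
```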

My Cache or Yours? Making Storage More Exclusive

Wong & Wilkes

USENIX Annual Technical Conference (USENIX 2002), June 10-15, 2002, Monterey, CA.

Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies which duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are inclusive. Thus, the aggregate cache behaves as if it was only as big as the larger of the client and array caches, instead of as large as the sum of the two. Inclusiveness is wasteful: cache RAM is expensive.

We explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both. Exclusiveness helps to create the effect of a single, large unified cache. We introduce a DEMOTE operation to transfer data ejected from the client to the array, and explore its effectiveness with simulation studies. We quantify the benefits and overheads of demotions across both synthetic and real-life workloads. The results show that we can obtain useful, sometimes substantial, speedups.

During our investigation, we also developed some new cache-insertion algorithms that show promise for multi-client systems, and report on some of their properties.
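The DEMOTE idea can be sketched in a few lines of Python (an illustration, not the paper’s simulator): the client hands evicted blocks to the array instead of dropping them, and the array gives up blocks it passes to the client, so the two tiers stay disjoint.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity, self.data = capacity, OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        return None

    def put(self, key, value):
        """Insert a block, returning whatever was evicted (or None)."""
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            return self.data.popitem(last=False)   # (evicted_key, evicted_value)
        return None

class ExclusiveClientCache:
    """Client cache that DEMOTEs evicted blocks to the array instead of
    dropping them, so the two tiers hold disjoint sets of blocks."""
    def __init__(self, capacity, array_cache, read_from_disk):
        self.cache = LRUCache(capacity)
        self.array = array_cache            # another LRUCache, at the array
        self.read_from_disk = read_from_disk

    def read(self, block):
        data = self.cache.get(block)
        if data is None:
            data = self.array.data.pop(block, None)    # array gives the block up
            if data is None:
                data = self.read_from_disk(block)
            evicted = self.cache.put(block, data)
            if evicted is not None:
                self.array.put(*evicted)               # DEMOTE on eviction
        return data

array = LRUCache(2)
client = ExclusiveClientCache(2, array, read_from_disk=lambda b: f"data-{b}")
for b in [1, 2, 3, 1]:
    client.read(b)
print(sorted(client.cache.data), sorted(array.data))   # disjoint: [1, 3] [2]
```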

Analysis of Methods for Scheduling Low Priority Disk Drive Tasks

Schindler & Bachmat

Proceedings of the SIGMETRICS 2002 Conference, June 15-19, 2002, Marina Del Rey, CA.

This paper analyzes various algorithms for scheduling low priority disk drive tasks. The derived closed form solution is applicable to a class of greedy algorithms that includes a variety of background disk scanning applications. By paying close attention to many characteristics of modern disk drives, the analytical solutions achieve very high accuracy – the difference between the predicted response times and the measurements on two different disks is only 3% for all but one examined workload. This paper also proves a theorem which shows that background tasks implemented by greedy algorithms can be accomplished with very little seek penalty. Using a greedy algorithm gives a 10% shorter

Operation of read and demoted ghost caches in conjunction with the array cache. The array inserts the metadata of incoming read (demoted) blocks into the corresponding ghost, and the data into the cache. The cache is divided into segments of either uniform or exponentially-growing size. The array selects the segment into which to insert the incoming read (demoted) block based on the hit count in the corresponding ghost.

RECENT PUBLICATIONS

… continued from pg. 5

… continued on pg. 7

between space and time, whenever such correlation exists. Moreover, it requires very few parameters (p, q, r, and the grand total of disk/memory accesses), and it has linear scalability in computing these parameters. Experiments with multiple real data sets (disk traces from HP Labs, TPC-C memory traces) show that our model can mimic real traces very well, while the only obvious alternative, the independence assumption, leads to more than 60x worse error.
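One plausible reading of the entropy-plot tool, shown as a toy sketch below (our paraphrase; the paper’s exact definition may differ in detail): bin the trace at successively finer granularities, compute the entropy of the access-count distribution at each level, and compare its growth against the slope of 1 that uniform traffic would give.

```python
import math
from collections import Counter

def entropy_plot(timestamps, max_level):
    """Entropy of the binned access-count distribution at aggregation levels
    1..max_level (2**k bins per level).  Illustrative reading of the 'entropy
    plot' idea; a smaller-than-1 slope suggests burstiness."""
    t0, t1 = min(timestamps), max(timestamps)
    span = (t1 - t0) or 1.0
    points = []
    for k in range(1, max_level + 1):
        nbins = 2 ** k
        counts = Counter(min(int((t - t0) / span * nbins), nbins - 1)
                         for t in timestamps)
        total = len(timestamps)
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        points.append((k, h))
    return points

# Bursty toy trace: most accesses crowd into a short interval.
trace = [0.01 * i for i in range(20)] + [5.0, 9.9]
for k, h in entropy_plot(trace, 5):
    print(f"level {k}: entropy {h:.2f} bits (uniform traffic would give about {k})")
```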

Freeblock Scheduling Outside of Disk Firmware

Lumb, Schindler & Ganger

Conference on File and Storage Technologies (FAST), January 28-30, 2002. Monterey, CA.

Freeblock scheduling replaces a disk drive’s rotational latency delays with useful background media transfers, potentially allowing background disk I/O to occur with no impact on foreground service times. To do so, a freeblock scheduler must be able to very accurately predict the service time components of any given disk request – the necessary accuracy was not previously considered achievable outside of disk firmware. This paper describes the design and implementation of a working external freeblock scheduler running either as a user-level application atop Linux or inside the FreeBSD

Freeblock scheduling inside a device driver. (The figure shows foreground and freeblock schedulers selecting from their respective request pools and feeding a shared dispatch queue in the device driver, which issues requests to the disk.)

Page 7


response time for the foreground application requests and up to a 20% decrease in total background task run time compared to results from previously published techniques.

Intrusion Detection, Diagnosis, and Recovery with Self-Securing Storage

Strunk, Goodson, Pennington, Soules & Ganger

Carnegie Mellon University Technical Report CMU-CS-02-140, May 2002.

Self-securing storage turns storage devices into active parts of an intrusion survival strategy. From behind a thin storage interface (e.g., SCSI or CIFS), a self-securing storage server can watch storage requests, keep a record of all storage activity, and prevent compromised clients from destroying stored data. This paper describes three ways self-securing storage enhances an administrator’s ability to detect, diagnose, and recover from client system intrusions. First, storage-based intrusion detection offers a new observation point for noticing suspect activity. Second, post-hoc intrusion diagnosis starts

The self-securing storage interface provides a thin perimeter behind which a storage server can observe requests and safeguard data. Note that this same picture works for both block protocols, such as SCSI or IDE/ATA, and distributed file system protocols, such as NFS or CIFS. Thus, self-securing storage could be realized within many storage servers, including file servers, disk array controllers, and even disk drives. (The figure shows applications on a client system issuing system calls to the operating system, whose file system and RPC layer or device driver send storage requests to an S4 file server or block store, with the storage interface treated as the protection boundary.)

with a plethora of normally-unavailable information. Finally, post-intrusion recovery is reduced to restarting the system with a pre-intrusion storage image retained by the server. Combined, these features can improve an organization’s ability to survive successful digital intrusions.
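A minimal illustration of the keep-everything-for-a-window idea (a toy in-memory version store, not the actual S4 server): every write creates a new version, and recovering a pre-intrusion image is just reading as of an earlier time.

```python
import bisect
import time

class VersioningStore:
    """Toy self-securing block store: writes never destroy old data inside the
    detection window, so an administrator can roll back to an earlier image."""

    def __init__(self):
        self.versions = {}          # block -> list of (timestamp, data)

    def write(self, block, data, now=None):
        ts = now if now is not None else time.time()
        self.versions.setdefault(block, []).append((ts, data))

    def read(self, block, as_of=None):
        history = self.versions.get(block, [])
        if not history:
            return None
        if as_of is None:
            return history[-1][1]                       # current version
        i = bisect.bisect_right([ts for ts, _ in history], as_of)
        return history[i - 1][1] if i else None         # version at time as_of

store = VersioningStore()
store.write("etc/passwd", b"root:x:0:0", now=100)
store.write("etc/passwd", b"root::0:0:backdoor", now=200)   # intruder's change
print(store.read("etc/passwd"))                # sees the tampered version
print(store.read("etc/passwd", as_of=150))     # pre-intrusion image survives
```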

The Set-Check-Use Methodology for Detecting Error Propagation Failures in I/O Routines

Bigrigg & Vos

Workshop on Dependability Benchmarking, in conjunction with the International Conference on Dependable Systems and Networks, DSN-2002; June 23-26, 2002, Washington, D.C.

A methodology is presented that will detect robustness failures in source code where I/O errors could occur and where there is no mechanism in place to handle the error. The details of the methodology are described, showing how traditional compiler data flow analysis can be augmented to find, structurally within the application, code that can be used to perform error checking. In addition, we describe how this code can be used to ensure the correctness of the I/O error checking.
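As a small illustration of the pattern the methodology looks for, the sketch below uses Python’s os.write, which can report short writes: the error indicator is set by the call, checked against the expectation, and used to decide what to do next. Code like careless_write, which never checks or uses the indicator, is the kind of code such an analysis would flag.

```python
import os

def careful_write(fd, buf):
    """Set-check-use in miniature: os.write may write fewer bytes than asked,
    and its return value is the only way to find out."""
    remaining = memoryview(buf)
    while remaining:
        n = os.write(fd, remaining)      # SET: the call produces an indicator
        if n <= 0:                       # CHECK: compare against expectation
            raise IOError("short write with no progress")
        remaining = remaining[n:]        # USE: act on the indicator

def careless_write(fd, buf):
    os.write(fd, buf)                    # indicator set but never checked or
                                         # used; this is what the analysis flags

r, w = os.pipe()
careful_write(w, b"hello")
print(os.read(r, 5))
os.close(r)
os.close(w)
```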

Verifiable Secret Redistribution for Threshold Sharing Schemes

Wong, Wang & Wing

Carnegie Mellon University Technical Report CMU-CS-02-114, February 2002.

We present a new protocol for verifiably redistributing secrets from an (m,n) threshold sharing scheme to an (m',n') scheme. Our protocol guards against dynamic adversaries. We observe that existing protocols either cannot be readily extended to allow redistribution between different threshold schemes, or have vulnerabilities that allow faulty old shareholders to distribute invalid shares to new shareholders. Our primary contribution is that in our protocol, new shareholders can verify the validity of their shares after redistribution between different threshold schemes.
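For readers unfamiliar with (m, n) threshold sharing, here is a sketch of plain Shamir sharing over a prime field, the building block that the protocol redistributes; it deliberately omits verifiability and the redistribution step itself.

```python
import random

P = 2**61 - 1   # a Mersenne prime; any prime larger than the secret works

def share(secret, m, n):
    """Split `secret` into n shares, any m of which reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(m - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 using any m shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = share(123456789, m=3, n=5)
print(reconstruct(shares[:3]) == 123456789)   # any 3 of the 5 shares suffice
print(reconstruct(shares[1:4]) == 123456789)
```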

Fractal Prefetching B+-Trees: Optimizing Both Cache and Disk Performance

Chen, Gibbons, Mowry & Valentin

SIGMOD 2002, June 2002, Madison, Wisconsin.

B+-Trees have been traditionally optimized for I/O performance with disk pages as tree nodes. Recently, researchers have proposed new types of B+-Trees optimized for CPU cache performance in main memory environments, where the tree node sizes are one or a few cache lines. Unfortunately, due primarily to this large discrepancy in optimal node sizes, existing disk-optimized B+-Trees suffer from poor cache performance while cache-optimized B+-Trees exhibit poor disk performance. In this paper, we propose fractal prefetching B+-Trees (fpB+-Trees), which embed “cache-optimized” trees within “disk-optimized” trees, in order to optimize both cache and I/O performance. We design and evaluate two approaches to breaking disk pages into cache-optimized nodes: disk-first and cache-first. These approaches are somewhat biased in favor of maximizing disk and cache performance, respectively, as demonstrated by our results. Both implementations of fpB+-Trees achieve dramatically better cache performance than disk-optimized

RECENT PUBLICATIONS

… continued from pg. 6

… continued on pg. 17

Page 8


Fractal Prefetching B+-Trees (fpB+-Trees) are a type of B+-Tree that optimizes both cache and I/O performance by embedding “cache-optimized” trees within “disk-optimized” trees. This improves CPU cache performance in traditional B+-Trees for indexing disk-resident data, and I/O performance in B+-Trees optimized for cache. At a coarse granularity, an fpB+-Tree contains disk-optimized nodes that are roughly the size of a disk page; at a fine granularity, it contains cache-optimized nodes that are roughly the size of a cache line. The fpB+-Tree is referred to as “fractal” because of its self-similar “tree within a tree” structure, as illustrated in Figure 1.

Optimizing I/O Performance

One goal of fpB+-Trees is to effectively exploit I/O parallelism by explicitly prefetching disk pages even when the access patterns are not sequential. Prefetching B+-Trees (pB+-Trees) are a proven technique for enhancing CPU cache performance for index searches and range scans on memory-resident data, but can they be applied to improving I/O performance for disk-resident data?

All nodes within a pB+-Tree are multiple cache lines wide. To accelerate search performance, the pB+-Tree prefetches all cache lines within a node before accessing it. Thus, multiple cache misses may be serviced in parallel, resulting in small overall miss penalties. The net result is that searches become faster because nodes are larger and trees are shallower. To apply this principle to disk-resident data, all pages of a node are prefetched when accessing it. By placing the pages that make up a node on different disks, multiple page requests can be serviced in parallel. While the I/O latency is likely to improve for a single search, I/O throughput may suffer because of the extra seeks for a node. Hence the target node size for optimizing disk performance of fpB+-Trees is a single disk page.

Range scans are performed by searching for the starting key of a range, then reading consecutive leaf nodes in the tree until the end key for the range is encountered. To enhance range scan cache performance, a jump-pointer array scheme, containing the leaf node addresses of the tree used in range scans, is employed for prefetching the leaf nodes. By issuing a prefetch for each leaf node sufficiently in advance of when the range scan needs the node, the cache misses of these leaves are overlapped. The same technique can improve range scan I/O performance at page granularity, overlapping leaf page misses. It is particularly helpful in non-clustered indexes and when leaf pages are not sequential on disk. To prevent prefetching past the end key, fpB+-Trees begin by searching for both the start key and the end key, remembering the range end page. This approach is applicable for improving the I/O performance of standard B+-Trees, not just fractal trees, and can lead to a fivefold or more speedup for large range scans.
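A schematic version of the jump-pointer array trick, with a hypothetical prefetch_page() standing in for an asynchronous page read: the scan walks the leaf pages in order while issuing prefetches a fixed distance ahead, and never prefetches past the page that holds the end key.

```python
def range_scan(jump_pointers, start_idx, end_idx, read_page, prefetch_page,
               distance=4):
    """Scan leaf pages jump_pointers[start_idx..end_idx], prefetching
    `distance` pages ahead so page misses overlap.  prefetch_page is assumed
    to start an asynchronous read and return immediately."""
    results = []
    # Warm up: issue the first `distance` prefetches before consuming anything.
    for i in range(start_idx, min(start_idx + distance, end_idx + 1)):
        prefetch_page(jump_pointers[i])
    for i in range(start_idx, end_idx + 1):
        ahead = i + distance
        if ahead <= end_idx:          # never prefetch past the end key's page
            prefetch_page(jump_pointers[ahead])
        results.extend(read_page(jump_pointers[i]))
    return results

# Toy backing "pages": page id -> its keys.
pages = {pid: list(range(pid * 10, pid * 10 + 10)) for pid in range(8)}
out = range_scan(jump_pointers=list(range(8)), start_idx=2, end_idx=5,
                 read_page=lambda pid: pages[pid],
                 prefetch_page=lambda pid: None)
print(out[0], out[-1])   # 20 59
```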

Optimizing Cache Performance

fpB+-Trees can be implemented as disk-first or cache-first. The disk-first approach begins with a disk-optimized B+-Tree, and organizes the keys and pointers within each page-sized node as a small tree. To pack more keys and pointers into the in-page tree, short in-page offsets rather than full pointers are used in all but the leaf nodes of the tree. The cache-first approach begins with a cache-optimized prefetching B+-Tree and, ignoring disk page boundaries, seeks to place parent and child nodes on the same page. Adjacent leaf nodes are also placed on the same page. Ideally, both the disk-first and the cache-first approaches would achieve identical data layouts, and hence equivalent cache and I/O performance. In practice, however, a mismatch almost always occurs between the size of a cache-optimized subtree and the size of a disk page, causing the two approaches to be slightly biased in favor of disk and cache performance, respectively. Despite these slight disparities, both implementations of fpB+-Trees achieve better cache performance than disk-optimized B+-Trees.

Disk-First fpB+-Trees start with a disk-optimized B+-Tree, where page-sized nodes containing keys and pointers are organized into a cache-optimized tree called an in-page tree. Each node in an in-page tree is aligned on cache line boundaries and is several cache lines wide. When accessed in a search, all the cache lines comprising the node are prefetched. Disk-first fpB+-Trees have both leaf and nonleaf in-page nodes. The nonleaf nodes contain pointers to other in-page nodes within the same page, and in-page leaf nodes contain pointers to nodes external to their in-page tree.

If considering cache performance only, there is an optimal in-page node size, calculated based on the relationships between the number of levels in the in-page tree, the number of cache lines of the nonleaf nodes, and the number of cache lines of the leaf nodes. Ideally, in-page trees based on this optimal size would fit

FRACTAL PREFETCHING B+-TREES

Shimin Chen, Todd Mowry & Joan Digney

… continued on pg. 9

Figure 1: Self-similar “tree within a tree” structure.

Page 9


tightly within a page. However, since optimal page size is determined by I/O parameters and disk and memory prices, there is likely a mismatch between the two sizes, and it is recognized that in most cases, trees with cache-optimal node sizes are not possible. To combat overflow, the root node size can be reduced (or its fan-out restricted). Similarly, to combat underflow, the root node may be extended so that it can have more children.

Cache-First fpB+-Trees begin with a cache-optimized B+-Tree, and, ignoring page boundaries, try to intelligently place the cache-optimized nodes into disk pages. The tree node has the common structure of a cache-optimized B+-Tree node: a leaf node contains an array of keys and tuple IDs, while a nonleaf node contains an array of keys and pointers. However, the pointers in nonleaf nodes are different. Since the nodes are to be put into disk pages, a pointer is a combination of a page ID and an offset in the page, which allows us to follow the page ID to retrieve a disk page and then visit a node in the page by its offset.
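The pointer representation just described is easy to picture as a small record (a sketch of the idea, not the paper’s actual layout): following a pointer means one page fetch by page ID, then an in-page lookup by offset.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodePointer:
    """Cache-first fpB+-Tree child pointer: which disk page, and where in it."""
    page_id: int
    offset: int      # byte (or slot) offset of the node inside the page

def follow(ptr, buffer_pool, read_page):
    """Resolve a pointer: one page fetch, then an in-page lookup by offset."""
    page = buffer_pool.get(ptr.page_id)
    if page is None:                      # not cached: one disk operation
        page = read_page(ptr.page_id)
        buffer_pool[ptr.page_id] = page
    return page[ptr.offset]               # the cache-optimized node itself

# Toy page: a dict of offset -> node contents.
pool = {}
node = follow(NodePointer(page_id=7, offset=128), pool,
              read_page=lambda pid: {128: ("keys", "children")})
print(node)
```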

There are two goals in node placement within a disk page to minimize the structure’s impact on I/O performance: (1) group sibling leaf nodes together into the same page so that range scans incur fewer disk operations, and (2) group a parent node and its children together into the same page so that searches only need one disk operation for a parent and its child. To satisfy the first goal, certain pages are designated as leaf pages, and contain only leaf nodes. Leaf nodes in the same leaf page are siblings of one another, ensuring good range scan I/O performance. The second goal cannot be satisfied for all nodes, because only a limited number of nodes fit within a page. Moreover, the node size mismatch problem means that placing a parent and its children in a page almost always results in either an overflow or an underflow for that page. Large underflow situations can be transformed by placing the grandchildren, the great-grandchildren, and so on in the same page, until either a modest underflow or an overflow is incurred. There are two approaches for dealing with the overflow: an overflowed child can be placed into its own page to become the top-level node in that page and have its children placed in the same page, or it can be stored in a special overflow page.

There are several fundamental trade-offs between the disk-first and the cache-first implementations of fpB+-Trees. While the performance of each of these implementations remains slightly biased toward its original goal, both versions of fpB+-Trees improve upon the cache performance of disk-optimized B+-Trees (without significantly degrading I/O performance) as follows: (i) a factor of 1.1-1.8 improvement for search; (ii) up to a factor of 4.2 improvement for range scans; and (iii) up to a 20-fold improvement for updates. fpB+-Trees can also be used to accelerate I/O performance. In particular, an over twofold to fivefold improvement for index range scans was demonstrated in an industrial-strength commercial DBMS (IBM’s DB2). More information on the experimental procedures used to arrive at our conclusions, on the algorithms used in the creation of the fpB+-Tree indexes, and on performance in typical operations of the tree, such as bulkload, search, insertion and deletion, in both disk-first and cache-first implementations, is available elsewhere [1].

Conclusions

Previous studies on improving index performance have focused either on optimizing the cache performance of memory-resident databases, or on optimizing the I/O performance of disk-resident databases. What has been lacking prior to this study is an index structure that achieves good performance for both of these important levels of the memory hierarchy. Our experimental results demonstrate that Fractal Prefetching B+-Trees, a novel index structure that optimizes both cache and disk performance simultaneously, are such a solution. They achieve large gains in cache performance compared with disk-optimized B+-Trees for searches, range scans, and updates on modern systems. Moreover, they provide up to a fivefold improvement in the I/O performance of range scans on a commercial DBMS (DB2).

References

[1] Chen, S., Gibbons, P.B., Mowry, T.C. and Valentin, G. Fractal Prefetching B+-Trees: Optimizing Both Cache and Disk Performance. In Proceedings of SIGMOD 2002, June 2002, Madison, Wisconsin.

[2] Chen, S., Gibbons, P.B. and Mowry, T.C. Improving Index Performance through Prefetching. In Proceedings of SIGMOD 2001, May 2001, pp. 235-246.

FRACTAL PREFETCHING B+-TREES

… continued from pg. 8

PDL students and faculty alike visit Mark Twain for inspiration

Page 10


September 2002
Mengzhi Wang Receives Best Student Paper Award at Performance 2002

Mengzhi Wang’s paper, “Capturing the Spatio-Temporal Behavior of Real Traffic Data,” co-authored with Anastassia Ailamaki and Christos Faloutsos, received the Best Student Paper Award at the 22nd IFIP WG 7.3 Int’l Symposium on Computer Modeling, Measurement and Evaluation, held in Rome, Italy, Sept. 23-27.

July 2002
Congratulations Natassa and Babak!

Congratulations to Natassa Ailamaki and Babak Falsafi (Assistant Professor of ECE and CS), who were married on July 7 in Chania, Greece. The rest of the members of the PDL wish them all the best for many years to come!

July 2002
Greg Promoted

We are pleased to congratulate Greg on his promotion to Associate Professor this year.

June 2002
Welcome to the Newest Member of the Ganger Family!

Greg, Jenny and Tim Ganger are thrilled to announce the arrival of William David Ganger, born at 11:46 am on June 30, 2002 (a few days earlier than planned). William weighed in at 7 lbs. 5 oz. and at birth was 20.5 inches.

June 2002†
New Center for Computer and Communications Security

Carnegie Mellon researchers have formed a Center for Computer and Communications Security (C3S) to tackle the challenges and problems related to Internet security, data storage and privacy issues stemming from America's ongoing war against terrorism.

The center is multidisciplinary, with faculty coming from Electrical and Computer Engineering, the CERT/CC, Engineering and Public Policy, the School of Computer Science, the Statistics Department, and the Heinz School of Public Policy.

Pradeep Khosla, ECE Department Head, Philip and Marsha Dowd Professor of ECE and Robotics, and director of the C3S, said that although security technology is advancing, the Internet is still susceptible to viruses, computer intrusions and cyberterrorism. The new center will focus on cutting-edge technologies related to security in distributed systems and wireless and optical networks, as well as new technologies to guarantee the privacy of information.

April 2002*
PDL Makes the Headlines

On April 10, C|net News published a press release outlining the use of medieval castle architecture by Greg Ganger and the PDL as the inspiration for an innovative approach to computer security. This approach has self-securing devices erecting their own security perimeters and defending their own critical resources, just the way individual parts of medieval castles formed distinct protective barriers, such as moats, inner sanctums, and strategically placed guard towers. The Pittsburgh Post-Gazette and WPXI also visited Greg and the PDL to talk about computer security innovations.

April 2002
PDL Student Receives IBM Ph.D. Fellowship

Stavros Harizopoulos (CS, advised by Anastassia Ailamaki) has been awarded a prestigious IBM Ph.D. Fellowship for 2002/2003. These competitive awards recognize “outstanding research and technical excellence in areas of interest to IBM” and provide a stipend, tuition and fees, in addition to an opportunity to pursue technical careers in IBM’s Research Division or development laboratories.

March 2002*
Ailamaki and Harchol-Balter Receive NSF CAREER Awards

Anastassia Ailamaki and Mor Harchol-Balter have each been awarded a National Science Foundation CAREER Award. This prestigious program recognizes and supports the early career development of “young

AWARDS & OTHER PDL NEWS

… continued on pg. 11

Will Ganger - 6 days old!

Page 11


faculty members...most likely to become the academic leaders of the 21st century.” Selection is made on the basis of creative, integrative, and effective research and education career development plans that build a firm foundation for a lifetime of integrated contributions to research and education. Anastassia’s research focuses on “Bridging Databases and Computer Architecture: Optimizing DBMS for Deep Memory Hierarchies,” while Mor explores “The Impact of Resource Scheduling on Improving Server Performance.”

February 2002
PDL Paper Named Best Student Paper at FAST 2002

The program committee of the USENIX Conference on File and Storage Technologies (FAST ’02) presented the Best Student Paper Award to PDL researchers Jiri Schindler, John Linwood Griffin, Christopher R. Lumb, and Gregory R. Ganger for their paper “Track-Aligned Extents: Matching Access Patterns to Disk Drive Characteristics.” The conference included 21 papers (three of which were PDL submissions) chosen from a pool of 110 submissions. Jiri, John and Chris also each gave talks on their research at the conference.

January 2002**
PDL Graduate Student Awarded Microsoft Research Fellowship

Microsoft Corporation has chosen Shimin Chen, a CS Ph.D. student, to receive a Microsoft Research Fellowship. Awarded to 13 of 52 applicants, the fellowship offers financial support for two years, including 100 percent of CMU tuition and fees; a stipend for living expenses of up to $20,000; a conference and travel allowance; a laptop computer complete with Microsoft software; and a $1,000 donation to the student’s advisor, Todd Mowry, Associate Professor of CS and ECE. Chen also has the opportunity to participate in a 12-week paid internship, allowing him to interact with Microsoft researchers and work in areas relevant to his own research.

January 2002
Intel Equipment Grant

The PDL would like to thank Intel Corporation for its generous donation of equipment in support of our research. The PDL received 3 fully equipped 1.7 GHz WS530 Xeon DP Workstations and 100 PIII 850 MHz boxed processors. Included with the donation is three years of product support.

December 2001*
PDL Student Receives Honorable Mention in CRA Outstanding Undergraduate Awards

Cory Williams, a CS/Math Sciences senior and PDL member, received Honorable Mention when the Computing Research Association selected the recipients of their Outstanding Undergraduate Awards for 2002. Nominees were from universities across North America, and it is a significant honor for Cory to have been selected for honorable mention from this group.

Cory’s work focuses on computer forensics and intrusion detection, and the benefit achieved if system logs continued to be accurately recorded after a system compromise. Specifically, he is working on how to use these accurately recorded system logs and what should be recorded if accurate logging is expected.

November 2001
Congratulations to CMU’s ACM Programming Contest Winners

A CMU team consisting of Cory Williams (PDL), Tom Murphy and Eric Heutchy received 4th place in the East Central North American Region in the 2001 ACM programming contest. In regional competition, they competed at Ashland University, where they placed first.

November 2001*
Goldstein Participates in ICCAD 2001 Nanotechnology Panel

Seth Goldstein, Assistant Professor of Computer Science and ECE, was one of six panelists to address the question “Will Nanotechnology Change the Way We Design and Verify Systems?” at the International Conference on Computer-Aided Design panel session on November 7. The panel was part of a conference for EE CAD professionals, held in San Jose, CA.

Goldstein predicted that nanotechnology systems would be reprogrammable and that designers would use nanotechnology chips’ reconfigurability to detect and avoid defects.

*SCS Today
**ECE News
†CMU 8 1/2 x 11 News

AWARDS & OTHER PDL NEWS

… continued from pg. 10

Andy concentrating hard on his research.

Page 12


The PDL in 1996.

the available time. The first Retreat was held at the Hidden Valley Resort in Pennsylvania; for the past 6 years, we have gathered at the beautiful Nemacolin Woodlands Resort in Farmington, Pennsylvania.

PDL’s initial seed funding came from CMU’s Data Storage Systems Center (DSSC), then directed by Mark Kryder, and from DARPA (from which most PDL funding has come over the years). Additional funding came from the member companies of the PDL Consortium, whose initial members were AT&T Global Information Systems, Data General, IBM, Hewlett-Packard, Seagate and Storage Technology. Today, PDL funding comes from DARPA, NSF, the Air Force Office of Sponsored Research, and the PDL Consortium members (listed on the front page).

In 1995, Gibson and Dr. David Nagle (then a new ECE faculty member) launched a new PDL project called Network-Attached Secure Disks (NASD). NASD was a new network-attached storage architecture for achieving cost-effective scalable bandwidth. In addition to their fundamental research advances, Gibson founded and chaired an industry working group within the National Storage Industry Consortium (NSIC) to transfer and move towards standardization of the NASD architecture. In ‘99, working group members produced a concrete proposal to launch an ANSI standards effort around object-based storage devices (with essentially the NASD architecture). Since its start, the NASD project has stimulated much derivative research and development in academia and industry.

In 1999, Nagle took over as PDL Director when Gibson went on leave. In 2000, Greg Ganger, who joined the ECE faculty and the PDL in 1997, jointly directed the PDL with Nagle, then became Director in 2001 when Nagle went on leave. In its ten-year lifetime, many faculty, staff, and students from both CS and ECE have been active members of the PDL. PDL’s first Ph.D. graduate was Dr. Mark Holland (1994), who wrote his dissertation on “On-Line Data Reconstruction in Redundant Disk Arrays.” Since then, 11 PDL students have graduated with Ph.D.s, 20 others with Masters degrees, and more than a dozen with undergraduate degrees, many of whom have moved on to employment with PDL Consortium companies. There are currently 30 PDL students.

From the beginning, the PDL logo has included Skibo Castle, Andrew Carnegie’s summer home. In the past, it has represented “a fortress of storage” (like a redundant disk array). More recently, it represents a “fortress of security” (à la self-securing devices). Perhaps, though, it is simply our vision of the ideal PDL Retreat venue.

PDL CELEBRATES 10TH ANNIVERSARY

… continued from pg. 1

Many lasting friendships have been formed.

Eleven Ph.D.s, twenty Masters and over a dozen undergraduate degrees have been granted to PDL members in the PDL’s first nine years.

Page 13


Digital information is a critical resource. We need storage systems to which users can entrust critical information, ensuring that the data persists, is accessible, cannot be destroyed, and is kept confidential. A survivable storage system provides these guarantees despite failures and malicious compromises of storage nodes, client systems, and user accounts. This article overviews two PDL projects that address this need: the PASIS project focuses on surviving attacks on storage servers, and self-securing storage focuses on surviving intrusions into client systems.

PASIS:

Survivable systems operate from the fundamental design thesis that no individual service, node, or user can be fully trusted; having some compromised entities is viewed as a common case rather than an exception. Survivable storage systems must therefore encode and distribute data across independent storage nodes, entrusting data persistence to sets of nodes rather than to individual nodes. Further, if confidentiality is required, unencoded data should not be stored directly on individual storage nodes; otherwise, compromising a single storage node would let an attacker bypass access-control policies. With well-chosen encoding and distribution schemes, significant increases in availability, confidentiality, and integrity are possible.

Many research groups now explore the design and implementation of such survivable storage systems. These systems build on mature technologies from decentralized storage systems and also share a common high-level architecture (Figure 1). In fact, development of survivable storage with the same basic architecture was pursued over 15 years ago. As it was then, the challenge now is to achieve acceptable levels of performance and manageability. Moreover, a means to evaluate survivable storage systems is needed.

One key to maximizing survivable storage performance is mindful selection of the data distribution scheme. A data distribution scheme consists of a specific algorithm for data encoding & partitioning and a set of values for its parameters. There are many algorithms applicable to survivable storage, including encryption, replication, striping, erasure-resilient coding, secret sharing, and various combinations. Each algorithm has one or more tunable parameters, such as the number of fragments generated during a write and the subset needed for a read. The result is a large toolbox of possible schemes, each offering different levels of performance (throughput), availability (probability that data can be accessed), and security (effort required to compromise the confidentiality or integrity of stored data). For example, replication provides availability at a high cost in network bandwidth and storage space, whereas short secret sharing provides availability and security at lower storage and bandwidth cost but higher CPU utilization. Likewise, selecting the number of shares required to reconstruct a secret-shared value involves a trade-off between availability and confidentiality: if more machines must be compromised to steal a secret, then more must be operational to provide it legitimately.
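To make the encoding step concrete, the sketch below shows the simplest threshold scheme in this toolbox: an n-of-n XOR split, in which all n shares are needed to reconstruct the data and any n-1 shares reveal nothing. This is only an illustration, not PASIS code, and rand() stands in for a cryptographically strong random source.

```c
/* Illustrative n-of-n XOR secret splitting (not PASIS code).
 * All n shares are required to reconstruct; any n-1 reveal nothing.
 * rand() is a stand-in for a cryptographically strong random source. */
#include <stdlib.h>
#include <string.h>

/* Split 'secret' (len bytes) into n pre-allocated shares. */
void xor_split(const unsigned char *secret, size_t len,
               unsigned char **shares, int n)
{
    memset(shares[n - 1], 0, len);
    for (int i = 0; i < n - 1; i++) {
        for (size_t j = 0; j < len; j++) {
            shares[i][j] = (unsigned char)rand();  /* random share          */
            shares[n - 1][j] ^= shares[i][j];      /* accumulate into last  */
        }
    }
    for (size_t j = 0; j < len; j++)
        shares[n - 1][j] ^= secret[j];             /* finish the last share */
}

/* Reconstruct the secret by XOR-ing all n shares back together. */
void xor_combine(unsigned char *const *shares, int n,
                 unsigned char *secret, size_t len)
{
    memset(secret, 0, len);
    for (int i = 0; i < n; i++)
        for (size_t j = 0; j < len; j++)
            secret[j] ^= shares[i][j];
}
```

Replication, striping, erasure coding, and m-of-n threshold sharing all slot into the same encode/decode step of Figure 1, just with different bandwidth, storage, and CPU costs.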

No single data distribution scheme is right for all systems. Instead, the right choice for any particular system depends on an array of factors, including expected workload, system component characteristics, and desired levels of availability and security. Unfortunately, most system designs appear to involve an ad hoc choice, often resulting in a substantial performance loss due to missed opportunities and over-engineering.

The PASIS project is developing a better approach to selecting the data distribution scheme. At a high level, this new approach consists of three steps: enumerating possible data distribution schemes (<algorithm, parameters> pairs), modeling the consequences of each scheme, and identifying the best-performing scheme for any given set of availability and security requirements. The surface shown in Figure 2 illustrates one result of the approach. Generating such a surface requires codifying each dimension of the trade-off space such that all data distribution schemes fall into a total order. The surface serves two functions: (1) it enables informed trade-offs among security, availability, and performance; and (2) it identifies the best-performing scheme for each point in the trade-off space. Specifically, the surface shown represents the performance of the best-performing scheme that provides at least the corresponding levels of availability and security. Many schemes are not best at any of the points in the space and, as such, are not visible on the surface.
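A minimal sketch of the selection step is shown below. The struct fields and function names are hypothetical, and the real PASIS models are far richer, but the idea is the same: discard candidate schemes that miss the availability or security requirements, then keep the fastest survivor.

```c
/* Hypothetical sketch of scheme selection: keep the fastest candidate
 * that meets the required availability and security levels. */
#include <stddef.h>

struct scheme {
    const char *algorithm;    /* e.g., "replication", "short secret sharing" */
    int n, m;                 /* shares written, shares needed for a read    */
    double availability;      /* modeled probability that data is accessible */
    double security;          /* modeled effort to compromise the data       */
    double throughput;        /* modeled performance                         */
};

const struct scheme *pick_scheme(const struct scheme *cand, int count,
                                 double min_avail, double min_sec)
{
    const struct scheme *best = NULL;
    for (int i = 0; i < count; i++) {
        if (cand[i].availability < min_avail || cand[i].security < min_sec)
            continue;                                 /* requirement missed */
        if (best == NULL || cand[i].throughput > best->throughput)
            best = &cand[i];                          /* fastest so far     */
    }
    return best;   /* NULL if no scheme satisfies the requirements */
}
```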

S U R V I V A B L E S T O R A G E S Y S T E M S

Greg Ganger & Joan Digney


Figure 1: Generic decentralized storage architecture. Intermediary software translates the applications’ unified view of storage to the decentralized reality. Encoding transforms blocks into shares (decoding does the reverse). Sets of shares are read from (written to) storage nodes. Intermediary software may run on clients, leader storage nodes, or at some point in between.

… continued on pg. 14


Wylie et al. [2] demonstrate the feasibility and importance of careful data distribution scheme choice. The results show that the optimal choice varies as a function of workload, system characteristics, and the desired levels of availability and security. Minor (~2x) changes in these determinants have little effect, which means that the models need not be exact to be useful. Large changes, which would correspond to distinct systems, create substantially different trade-off spaces and best choices. Thus, failing to examine the trade-off space in the context of one's system can yield both poor performance and unfulfilled requirements.

Of course, the research is not done. We continue to refine the configuration approach, explore useful ways to approximate security metrics, and push the boundaries of efficient decentralization.

Self-Securing Storage:

Desktop compromises and misbehaving insiders are a fact of modern computing. Once an intruder infiltrates a system, he can generally gain control of all system resources, including its storage access rights (complete rights, in the case of an OS accessing local storage). Crafty intruders can use this control to hide their presence, weaken system security, and manipulate sensitive data. Because storage acts as a slave to authorized principals, evidence of such actions can generally be hidden. In fact, so little of the system state is trustworthy after an intrusion that the common “recovery” approach starts with reformatting storage.

Self-securing storage is an exciting new technology for enhancing intrusion survival by enabling the storage device to safeguard data even when the client OS is compromised. It capitalizes on the fact that storage servers (whether file servers, disk array controllers, or even IDE disks) run separate software on separate hardware. This opens the door to server-embedded security that cannot be disabled by any software (even the OS) running on client systems, as shown in Figure 3. Of course, such servers have a narrow view of system activity, so they cannot distinguish legitimate users from clever impostors. But, from behind the thin storage interface, a self-securing storage server can actively look for suspicious behavior, retain an audit log of all storage requests, and prevent both destruction and undetectable tampering of stored data. The latter goals are achieved by retaining all versions of all data; instead of over-writing old data when a write command is issued, the storage server simply creates a new version and keeps both. Together with the audit log, the server-retained versions represent a complete history of system activity from the storage system’s point of view.
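The sketch below captures the core write-path idea under stated assumptions: rather than overwriting a block in place, the server links a new version onto the block's history and logs the request. The structures and the append_audit_log() helper are hypothetical stand-ins, not the PDL prototype.

```c
/* Hypothetical sketch of the self-securing write path: never overwrite,
 * always version, and log every request. Error handling omitted. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct version {
    uint64_t        time;    /* when this version was written           */
    void           *data;    /* immutable copy of the block contents    */
    size_t          len;
    struct version *prev;    /* older version, kept in the history pool */
};

struct block_history {
    struct version *latest;
};

/* Stand-in for the request audit log kept alongside the versions. */
static void append_audit_log(uint64_t time, const char *op, size_t len)
{
    (void)time; (void)op; (void)len;
}

void self_securing_write(struct block_history *b,
                         const void *data, size_t len, uint64_t now)
{
    struct version *v = malloc(sizeof(*v));
    v->time = now;
    v->data = malloc(len);
    memcpy(v->data, data, len);
    v->len  = len;
    v->prev = b->latest;          /* old version stays reachable */
    b->latest = v;
    append_audit_log(now, "WRITE", len);
}
```

Keeping every version is what makes the history trustworthy: a compromised client can issue new writes, but it cannot rewrite what the storage server has already recorded behind its interface.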

Strunk et al. [3] introduced self-securing storage and evaluated its feasibility. It was demonstrated that, under a variety of workloads, a small fraction of the capacity of modern disk drives is sufficient to hold several weeks of complete storage history. With a prototype implementation, it was also demonstrated that the performance overhead of keeping the complete history is small. Two recent papers, listed elsewhere in this newsletter, delve more deeply into how self-securing storage can improve intrusion survival by safeguarding stored data and providing new information regarding storage activities before, during, and after the intrusion. Specifically, self-securing storage contributes in three ways:

First, a self-securing storage server can assist with intrusion detection by watching for suspicious storage activity. By design, a storage server sees all requests and stored data, so it can issue alerts about suspicious storage activity as it happens. Such storage-based intrusion detection can quickly and easily notice several common intruder actions, such as manipulating system utilities (e.g., to add backdoors) or tampering with audit log contents (e.g., to conceal evidence). Such activities are exposed to the storage system even when the client system’s OS is compromised.
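A storage-side check of this kind can be as simple as the sketch below; the watched paths and the truncation rule are illustrative examples of the warning signs described above, not the prototype's actual rule set.

```c
/* Illustrative storage-side rules: flag writes that modify watched
 * system files or that shrink an append-only audit log. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const char *watched[] = { "/bin/login", "/sbin/init", "/etc/passwd" };

bool suspicious_write(const char *path, size_t old_size, size_t new_size)
{
    for (size_t i = 0; i < sizeof(watched) / sizeof(watched[0]); i++)
        if (strcmp(path, watched[i]) == 0)
            return true;                      /* system utility modified    */
    if (strncmp(path, "/var/log/", 9) == 0 && new_size < old_size)
        return true;                          /* audit log shrunk/truncated */
    return false;                             /* nothing obviously wrong    */
}
```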

Second, after an intrusion has been detected and stopped, self-securing storage provides a wealth of information to security administrators who wish to analyze an intruder's actions. In current systems, little information is available for estimating

S U R V I V A B L E S T O R A G E S Y S T E M S

Figure 2: Data distribution scheme selection surface plotted in trade-off space.

… continued from pg. 13


Figure 3: The compromise independence of self-securing storage. The storage interface provides a physical boundary between a storage server and client OSes. Note that this same picture works for block protocols, such as SCSI or IDE/ATA, and distributed file system protocols such as NFS or CIFS.

… continued on pg. 16


5 neat guys

Jay demonstrates his research to Sony

Greg introduces self-securing devices to the media

Open House poster session

P D L S P R I N G O P E N H O U S E

THESIS PROPOSAL:

A System for Matching Application Resource Supply and Demand

Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, May 22, 2002.

David Petrou, ECE

My thesis will show how to ease the management of several classes of adaptive applications. The motivation is that many important applications offer the user a stupefying number of parameters, like whether a data mining workload should be function- or data-shipping, the resolution of a graphics renderer, the sizes of the caches in a web browser, etc. Further, when running more than one application, the user has the additional opportunity (or burden) of deciding how resources should be allocated among running applications. My work will demonstrate situations in which these decisions can be automated.

THESIS PROPOSAL:

Matching Access Patterns to Storage Device Characteristics

Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, June 24, 2002.

Jiri Schindler, ECE

While both operating systems (OSes) and storage devices are complex systems with advanced algorithms attempting to improve I/O performance, there is very little or no communication between the two about device strengths and weaknesses. As a result, both systems make decisions in isolation that do not realize the full potential of the storage device's capabilities. I propose to investigate what information about a device's characteristics, and in what format, should be communicated so that the OS can appropriately adjust application access patterns. These modified access patterns will enable the storage device to service the requests much more efficiently.

As part of my research, I want to identify a minimal set of attributes that can effectively describe characteristics of various classes of storage devices (e.g., disk drives or high-end storage arrays). These attributes should not include any storage-specific or proprietary information that would break the model of a single storage manager controlling different device classes. Finally, I will demonstrate the benefits of this approach on two concrete examples: a block-based file system (e.g., the FreeBSD implementation of FFS) and query evaluation inside a database system.
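As a purely hypothetical illustration of what such an attribute set might look like (the actual attributes are precisely what the proposed research will determine), a device might export a handful of generic performance hints:

```c
/* Hypothetical device-attribute set; the field names and semantics are
 * illustrative only, not a result of the proposed research. */
#include <stdint.h>

struct device_attrs {
    uint32_t pref_request_bytes;    /* request size for near-peak bandwidth     */
    uint32_t stride_boundary_lbns;  /* LBN interval that is efficient to access
                                       as a unit (no proprietary geometry)      */
    uint16_t queue_depth;           /* requests the device services concurrently */
};
```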

P R O P O S A L S & D E F E N S E S


damage (e.g., what information the intruder might have seen or what data was modified) and determining how he gained access. Because intruders can directly manipulate data and metadata in conventional systems, they can remove or obfuscate traces of such activity. With self-securing storage, intruders lose this ability; in fact, attempts to do these things become obvious red flags for intrusion detection and diagnosis efforts. Although technical challenges remain in performing such analyses, they will start with much more information than forensic techniques can usually extract from current systems.

Third, self-securing storage can speed up and simplify the intrusion recovery process. In today's systems, full recovery usually involves reformatting, reinstalling the OS from scratch, and loading user data from back-up tapes. These steps are taken to remove backdoors or Trojan horses that may have been left behind by the intruder. Given server-maintained versions, on the other hand, an administrator can simply copy forward the pre-intrusion state (both system binaries and user data) in a single step. Further, all work done by the user since the security breach remains in the history pool, allowing incremental (albeit potentially dangerous) recovery of important data.
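With per-block version chains like those sketched earlier (restated minimally here), copy-forward recovery reduces to walking each block's history back to the newest version that predates the estimated intrusion time. This is an illustration of the idea, not the prototype's recovery code.

```c
/* Illustrative copy-forward recovery: find the newest version written
 * before the estimated intrusion time; later (possibly tampered)
 * versions remain in the history pool for analysis. */
#include <stdint.h>
#include <stddef.h>

struct version {
    uint64_t        time;
    struct version *prev;    /* data fields omitted for brevity */
};

struct block_history { struct version *latest; };

const struct version *
pre_intrusion_version(const struct block_history *b, uint64_t intrusion_time)
{
    for (const struct version *v = b->latest; v != NULL; v = v->prev)
        if (v->time < intrusion_time)
            return v;        /* newest version predating the intrusion   */
    return NULL;             /* block did not exist before the intrusion */
}
```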

In continuing work, we are exploring administrative interfaces for configuring and utilizing the features of self-securing storage. We will also explore how they complement security functionality embedded in other devices, such as network interface cards and network switches/routers.

References

[1] Wylie, J., et al. Survivable Information Storage Systems. IEEE Computer, August 2000.

[2] Wylie, J., et al. Selecting the Right Data Distribution Scheme for a Survivable Storage System. CMU SCS Technical Report CMU-CS-01-120, May 2001.

[3] Strunk, J.D., et al. Self-Securing Storage: Protecting Data in Compromised Systems. Proc. of the 4th Symposium on Operating Systems Design and Implementation, October 2000.

S U R V I V A B L E S T O R A G E S Y S T E M S

F A C U L T Y

Three new faculty members have joined the PDL this academic year. Please see brief biographies for Chris Long, Adrian Perrig and Dawn Song beginning on page 3.

S T A F F

Stan Bielski joined the PDL staff as a systems programmer in May, following his graduation from Penn State University with a B.S. in Computer Engineering.

Semih Oguz left his position as a research programmer to accompany his wife to California, where she has taken up a faculty position at Stanford. Semih himself is visiting his family in Turkey before beginning his search for employment.

Before joining the PDL as office assistant to Greg, Mike and Chenxi, Linda Whipkey worked for the Hillman Company in Pittsburgh, where she staffed the Help Desk, maintained the computer inventory and managed software contracts.

G R A D S T U D E N T S

Mehmet Bakkaloglu completed his Master’s research and submitted his thesis entitled “On Correlated Failures in Survivable Storage Systems” in May. He is now at IBM.

Angela Demke Brown is now an Assistant Professor of Computer Science at the University of Toronto in Toronto, Ontario, Canada.

David Friedman completed his Master’s degree in ECE and is now working at Oracle.

James Hendricks comes to us from UC Berkeley. He is pursuing his Ph.D. in Computer Science and will be researching intrusion-tolerant software with Greg and Adrian.

Mike Mesnier, on educational sabbatical from Intel, joined the PDL as a graduate student in August. Greg is advising his research on object-based storage and iSCSI.

Vijay Pandurangan has completed his Master’s degree in ECE and is now working with Google.

Adam Pennington began his M.S. in ECE this spring. He is working on self-securing storage with Greg.

Brandon Salmon is beginning work on his M.S. with Greg, working on continuous reorganization. He joins us from Stanford University.

Eno Thereska has joined the PDL to begin his Master’s degree in ECE. He will be working with Greg Ganger on freeblock scheduling and continuous reorganization.

Monica Ullagaddi joined the PDL as an undergraduate programmer and this fall is beginning her Master’s degree in ECE, working on self-securing NICs with Greg.

U N D E R G R A D U A T E S

Two new undergrads, Chris Costa and Vinod Das Krishnan, joined us over the summer. Greg is advising Chris on AFS tracing and Vinod on freeblock scheduling. Cory Williams has completed his degrees and has moved on to Microsoft. Russ Koenig is focusing on his studies in ECE.

C O M I N G S & G O I N G S

… continued from pg. 14


R E C E N T P U B L I C A T I O N S

versions is slower than conventional versioning systems, checkpointing is shown to mitigate this effect.

Decentralized Storage Consistency via Versioning Servers

Goodson, Wylie, Ganger & Reiter

Carnegie Mellon University Technical Report CMU-CS-02-180, September 2002.

This paper describes a consistency protocol that exploits versioning storage nodes. The protocol provides linearizability with the possibility of read aborts in an asynchronous system that may suffer client and storage-node crash failures. The protocol supports both replication and erasure coding (which precludes post hoc repair of partial-writes), and avoids the excess work of two-phase commits. Versioning storage-nodes allow the protocol to avoid excess communication in the common case of no write sharing and no failures of writing clients.

Delegation of Cryptographic Servers for Capture-Resilient Devices

MacKenzie & Reiter

DIMACS Technical Report 2001-37, November 2001. DIMACS is a partnership of Rutgers University, Princeton University, AT&T Labs-Research, Bell Labs, NEC Research Inst. and Telcordia Technologies.

A device that performs private key operations (signatures or decryptions), and whose private key operations are protected by a password, can be immunized against offline dictionary attacks in case of capture by forcing the device to confirm a password guess with a designated remote server in order to perform a

B+-Trees: a factor of 1.1-1.8 improvement for search, up to a factor of 4.2 improvement for range scans, and up to a 20-fold improvement for updates, all without significant degradation of I/O performance. In addition, fpB+-Trees accelerate I/O performance for range scans by using jump-pointer arrays to prefetch leaf pages, thereby achieving a speed-up of 2.5-5 on IBM’s DB2 Universal Database.
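The jump-pointer idea can be sketched in a few lines (illustrative only, not the fpB+-Tree or DB2 implementation): each leaf carries the disk addresses of the leaves that follow it, so a range scan can issue prefetches well ahead of the leaf currently being consumed.

```c
/* Illustrative jump-pointer array: a leaf remembers the on-disk
 * addresses of upcoming leaves so a range scan can prefetch them. */
#include <stdint.h>

#define JUMP_PTRS 8

struct leaf {
    uint64_t next_leaf_addr[JUMP_PTRS];  /* 0 means "no pointer" */
    /* keys and records omitted */
};

/* Stand-in for an asynchronous read-ahead request to the buffer pool. */
static void prefetch_page(uint64_t disk_addr) { (void)disk_addr; }

void prefetch_ahead(const struct leaf *l)
{
    for (int i = 0; i < JUMP_PTRS; i++)
        if (l->next_leaf_addr[i] != 0)
            prefetch_page(l->next_leaf_addr[i]);
}
```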

Blurring the Line Between OSes and Storage Devices

Ganger

Carnegie Mellon University Technical Report CMU-CS-01-166, December 2001.

This report makes a case for more expressive interfaces between operating systems (OSes) and storage devices. In today’s systems, the storage interface consists mainly of simple read and write commands; as a result, OSes operate with little understanding of device-specific characteristics and devices operate with little understanding of system priorities. More expressive interfaces, together with extended versions of today’s OS and firmware specializations, would allow the two to cooperate to achieve performance and functionality that neither can achieve alone.

Metadata Efficiency in a Comprehensive Versioning File System

Soules, Goodson, Strunk & Ganger

Carnegie Mellon University Technical Report CMU-CS-02-145, May 2002.

A comprehensive versioning file system creates and retains a new file version for every write or other modification request. The resulting history of file modifications provides a detailed view to tools and administrators seeking to investigate a suspect system state. Conventional versioning systems do not efficiently record the many prior versions that result. In particular, the versioned metadata they keep consumes almost as much space as the versioned data. This paper examines two space-efficient metadata structures for versioning file systems and describes their integration into the Comprehensive Versioning File System (CVFS). Journal-based metadata encodes each metadata version into a single journal entry; CVFS uses this structure for inodes and indirect blocks, reducing the associated space requirements by 80%. Multiversion b-trees extend the per-entry key with a timestamp and keep current and historical entries in a single tree; CVFS uses this structure for directories, reducing the associated space requirements by 99%. Experiments with CVFS verify that its current-version performance is similar to that of non-versioning file systems. Although access to historical

… continued from pg. 7

… continued on pg. 18

Journal-based metadata system. This figure shows a single logical block of the file “log.txt” being overwritten several times. Journal-based metadata retains all versions of the data block. However, each block is tracked using journal entries. Each entry points to both the new block and the block that was overwritten. Only the current version of the inode and indirect block are kept, significantly reducing the amount of space required for metadata.
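A journal-based metadata entry of the kind described above might look like the following sketch (field names are hypothetical, not CVFS's on-disk format): one small record per write, pointing at both the newly written block and the block it logically replaced.

```c
/* Hypothetical journal-based metadata entry: one record per write,
 * pointing at both the new block and the block it replaced, so old
 * versions stay reachable without retaining old inodes or indirect
 * blocks. */
#include <stdint.h>

struct journal_entry {
    uint64_t time;          /* version timestamp                    */
    uint64_t inode_no;      /* file whose block was written         */
    uint64_t file_block;    /* logical block number within the file */
    uint64_t new_disk_blk;  /* location of the newly written data   */
    uint64_t old_disk_blk;  /* location of the data it replaced     */
};
```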



all of these clients. Although difficult, it is extremely desirable to build a multi-protocol network file system, that is, a storage solution that can be used simultaneously by clients of different protocols and semantic sets. A semantic mismatch is a major complexity in building a multi-protocol network file system. These are situations that arise when the normal behavior of a server, expected by a client using a particular semantic set, does not occur because of the effects of a client from a separate semantic set. To achieve the goal of building a multi-protocol file system, the file system semantic sets of the targeted file systems must be carefully examined to determine where semantic mismatches will occur. Next, the possible means of resolving a semantic mismatch can be analyzed for their particular trade-offs. Finally, data from file system traces can be used to determine the frequency of possible semantic mismatches. The data collected from the file system traces, when examined in the context of a cost-benefit analysis, can provide designers of multi-protocol network file systems with important information for examining and resolving semantic differences.

Exploring Congestion Control

Akella, Seshan, Shenker & Stoica

Carnegie Mellon University Technical Report CMU-CS-02-139, May 2002.

From the early days of modern congestion control, ushered in by the development of TCP's and DECbit's congestion control algorithms and by the pioneering theoretical analysis of Chiu and Jain, there has been widespread agreement that linear additive-increase-multiplicative-decrease (AIMD) control algorithms should be used. However, the early congestion control design deci-

private key operation. Recent proposals for achieving this allow untrusted servers and require no server initialization per device. In this paper we extend these proposals to enable dynamic delegation from one server to another; i.e., the device can subsequently use the second server to secure its private key operations. One application is to allow a user who is traveling to a foreign country to temporarily delegate to a server local to that country the ability to confirm password guesses and aid the user's device in performing private key operations, or in the limit, to temporarily delegate this ability to a token in the user's possession. Another application is proactive security for the device's private key, i.e., proactive updates to the device and servers to eliminate any threat of offline password guessing attacks due to previously compromised servers.

Self-Securing Network Interfaces: What, Why and How

Ganger, Economou & Bielski

Carnegie Mellon University Technical Report CMU-CS-02-144, May 2002.

Self-securing network interfaces (NIs) examine packets as they move between network links and host software, looking for and potentially blocking malicious network activity. This paper describes self-securing network interfaces, their features, and examples of how these features allow administrators to more effectively spot and contain malicious network activity. We present a software architecture for self-securing NIs that separates scanning software into applications (called scanners) running on an NI kernel. The resulting scanner API simplifies the construction of scanning software and allows its powers to be contained even if it is subverted. We illustrate the potential via a prototype self-securing NI and two example scanners: one that identifies and blocks known e-mail viruses and one that identifies and inhibits rapidly-propagating worms, e.g. Code-Red.
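The containment idea is that scanners see traffic only through a narrow callback interface, along the lines of the sketch below; the types and names here are hypothetical, not the prototype's actual scanner API.

```c
/* Hypothetical sketch of scanner containment: the NI kernel hands each
 * packet to registered scanners, which can only observe or veto it. */
#include <stddef.h>

typedef int (*scan_fn)(const unsigned char *pkt, size_t len); /* nonzero = block */

struct scanner {
    const char *name;   /* e.g., "e-mail virus scanner" */
    scan_fn     scan;
};

/* Returns 1 if the packet may be forwarded, 0 if any scanner blocked it. */
int ni_deliver(const struct scanner *scanners, int nscanners,
               const unsigned char *pkt, size_t len)
{
    for (int i = 0; i < nscanners; i++)
        if (scanners[i].scan(pkt, len))
            return 0;
    return 1;
}
```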

Examining Semantics in Multi-Protocol Network File Systems

Hogan, Gibson & Ganger

Carnegie Mellon University Technical Report CMU-CS-02-103, January 2002.

Network file systems provide a robust tool that can be used by many physically dispersed clients. They provide clients with a means of permanent storage and communication. In order to exploit the resources available on a network file system server, a client must use the protocol of the server's file system. Although the goal of any protocol is to guarantee that the client and server can communicate, the introduction of new protocols divides clients into incompatible sets. Soon clients can no longer cooperate and share because they are using different protocols. In addition, each network file system is constructed with a different set of semantics. The result is that it is increasingly difficult to provide a single storage solution that supports

… continued from pg. 17

R E C E N T P U B L I C A T I O N S

… continued on pg. 19


Self-securing NI software architecture. An “NI kernel” manages the host and network links. Scanners run as application processes. Scanner access to network traffic is limited to the API exported by the NI kernel.


sions were made in a context where loss recovery was fairly primitive (e.g. TCP Reno) and often timed-out when more than a few losses occurred and routers were FIFO drop-tail. In subsequent years, there has been significant improvement in TCP's loss recovery algorithms. For instance, TCP SACK can recover from many losses without timing out. In addition, there have been many proposals for improved router queueing behavior. For example, RED active queue management and Explicit Congestion Notification (ECN) can tolerate bursty flow behavior. Per-flow packet scheduling (DRR and Fair Queueing) can provide explicit fairness.

In view of these developments, we seek to answer the following fundamental question in this paper: Does AIMD remain the sole choice for congestion avoidance and control even in these modern settings? If not, can other mechanism(s) provide better performance? We evaluate the four linear congestion control styles (AIMD, AIAD, MIMD, MIAD) in the context of these various loss recovery and router algorithms. We show that while AIMD is an unambiguous choice for the traditional setting of Reno-style loss recovery and FIFO drop-tail routers, it fails to provide the best goodput performance in the more modern settings. Where AIMD fails, AIAD proves to be a reasonable alternative.
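For reference, the classic AIMD rule evaluated here adjusts a congestion window as in the sketch below; the constants 1 and 1/2 are the conventional TCP choices, and AIAD, MIMD, and MIAD simply swap in an additive decrease or a multiplicative increase.

```c
/* Classic AIMD window update: additive increase when a round trip
 * completes without loss, multiplicative decrease on loss. */
double aimd_update(double cwnd, int loss_detected)
{
    return loss_detected ? cwnd / 2.0   /* multiplicative decrease */
                         : cwnd + 1.0;  /* additive increase       */
}
```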

Web Servers Under Overload: How Scheduling Can Help

Schroeder & Harchol-Balter

Carnegie Mellon University Technical Report CMU-CS-02-143, May 2002.

Most well-managed web servers perform well most of the time. Occasionally, however, every popular web server experiences transient overload. An overloaded web server typically displays signs of its affliction within a few seconds. Work enters the web server at a greater rate than the web server can complete it, causing the number of connections at the server to build up. This implies large delays for clients accessing the server. This paper provides a systematic performance study of exactly what happens when a web server is run under transient overload, both from the perspective of the server and from the perspective of the client. Second, this paper proposes and evaluates a particular kernel-level solution for improving the performance of web servers under overload. The solution is based on SRPT connection scheduling. We show that SRPT-based scheduling improves overload performance across a variety of client and server-oriented metrics.
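The scheduling policy itself is simple to state; the sketch below just picks the connection with the least remaining work. How the paper integrates this into the kernel's handling of socket buffers is the substantive part and is not shown here.

```c
/* Illustrative SRPT selection: service the ready connection with the
 * fewest bytes still to send. */
#include <stddef.h>

struct conn {
    size_t bytes_remaining;
    int    ready;            /* nonzero if data can be sent now */
};

int pick_srpt(const struct conn *conns, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!conns[i].ready)
            continue;
        if (best < 0 || conns[i].bytes_remaining < conns[best].bytes_remaining)
            best = i;
    }
    return best;   /* index of the connection to service next, or -1 */
}
```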

Cuckoo: Layered Clustering for NFS

Klosterman & Ganger

Carnegie Mellon University Technical Report CMU-CS-02-183, September 2002.

Layered clustering allows unmodified distributed file systems to enjoy many of the benefits of cluster-based file services. By interposing between clients and servers, layered clustering requires no changes to clients, servers, or the client-server protocol. Cuckoo demonstrates one particular use of layered clustering: spreading load among a set of otherwise independent NFS servers. Specifically, Cuckoo replicates frequently-read, rarely-updated files from each server onto others. When one server has a queue of requests, read requests to its replicated files are offloaded to other servers. No client-server protocol changes are involved. Sitting between clients and servers, the interposer simply modifies selected fields of NFS requests and responses. Cuckoo provides this load shedding with only 2000 semicolons of C code. Further, analyses of NFS traces indicate that replicating only 1000-10,000 objects allows 42-77% of all operations to be offloaded.
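The interposer's routing decision can be sketched as follows (illustrative logic only, not Cuckoo source): reads of replicated files are shifted to a less-loaded server whenever the home server has a backlog, and everything else passes through unchanged.

```c
/* Illustrative layered-clustering redirection: offload reads of
 * replicated files when the home server has queued requests. */
int route_request(int is_read, int file_is_replicated,
                  const int *queue_len, int home, int nservers)
{
    if (!is_read || !file_is_replicated || queue_len[home] == 0)
        return home;                    /* pass through unchanged     */
    int best = home;
    for (int s = 0; s < nservers; s++)
        if (queue_len[s] < queue_len[best])
            best = s;                   /* redirect to shortest queue */
    return best;
}
```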

On Correlated Failures in Survivable Storage Systems

Bakkaloglu, Wylie, Wang & Ganger

Carnegie Mellon University Technical Report CMU-CS-02-129, May 2002.

The design of survivable storage systems involves inherent trade-offs among properties such as performance, security, and availability. A toolbox of simple and accurate models of these properties allows a designer to make informed decisions. This report focuses on availability modeling. We describe two ways of extending the classic model of availability with a single “correlation parameter” to accommodate correlated failures. We evaluate the efficacy of the models by comparing their results with real measurements. We also show the use of the models as design decision tools: we analyze the effects of availability and correlation on the ordering of data distribution schemes and we investigate the placement of related files.
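As a point of reference, the classic independent-failure model that the report starts from is easy to compute: the availability of an m-of-n scheme is the probability that at least m of the n nodes are up. The sketch below implements only that baseline; the report's correlation-parameter extensions are not reproduced here.

```c
/* Baseline (independent failures) availability of an m-of-n scheme:
 * probability that at least m of n nodes are up, each with prob. p. */
#include <math.h>

static double binom(int n, int k)
{
    double c = 1.0;
    for (int i = 1; i <= k; i++)
        c = c * (n - k + i) / i;
    return c;
}

double availability_m_of_n(int n, int m, double p)
{
    double a = 0.0;
    for (int k = m; k <= n; k++)
        a += binom(n, k) * pow(p, k) * pow(1.0 - p, n - k);
    return a;
}
```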

R E C E N T P U B L I C A T I O N S

… continued from pg. 18

… continued on pg. 20


Layered clustering architecture. Clients, servers, and the client-server protocol are unmodified. The “clustering switch” is the only change, and its role is to add the clustering functionality by transparently translating some client requests into redirected server requests. The same role can be played by a collection of small intermediaries at the front-ends of the servers.


R E C E N T   P U B L I C A T I O N S

… continued from pg. 19

Storage-based Intrusion Detection: Watching Storage Activity for Suspicious Behavior

Pennington, Strunk, Griffin, Soules, Goodson & Ganger

Carnegie Mellon University Technical Report CMU-CS-02-179, September 2002.

Storage-based intrusion detection allows storage systems to transparently watch for suspicious activity. Storage systems are well-positioned to spot several common intruder actions, such as adding backdoors, inserting Trojan horses, and tampering with audit logs. Further, an intrusion detection system (IDS) embedded in a storage device continues to operate even after client systems are compromised. This paper describes a number of specific warning signs visible at the storage interface. It describes and evaluates a storage IDS, embedded in an NFS server, demonstrating both feasibility and efficiency of storage-based intrusion detection. In particular, both the performance overhead and memory required (40 KB for a reasonable set of rules) are minimal. With small extensions, storage IDSs can also be embedded in block-based storage devices.

D E V E L O P I N G   S T O R A G E   S Y S T E M S   E D U C A T I O N

At Carnegie Mellon we are tackling the important problem of creating education in storage systems, a subject that is at least as broad and deep as other computer systems topics on which universities teach class sequences (e.g., processor architecture, operating systems, networking, databases, and compilers). Ever underappreciated, storage systems are at the core of the Information Age, offering unmatched opportunities for current and future computing professionals. Seemingly boundless growth comes with the transition from paper to digital storage and digital video, and storage systems offer fascinating design and implementation challenges. Their components’ inner workings require amazing feats of engineering. Building efficient, scalable, reliable, secure, cost-effective, manageable storage systems from these components requires a storage-oriented combination of architecture, operating systems, networking, and distributed computing knowledge. Further, storage systems usually dominate the performance of a system, making them one of the few remaining places for performance engineers to thrive. Within the field of computer systems and computer engineering, there is no area whose demand for bright people and better solutions is more robust.

Sadly, storage systems are among the least understood areas of computer systems. The field is rife with buzzwords, like RAID and NAS and SAN, and bold claims of novelty, scalability, and manageability. But many seem not to understand the details of storage systems, their consequences, or even the fact that the buzzwords rarely describe new technologies (just new names for old ideas). Historically, universities have provided little education in this space and there have been few useful books.

For two years, we have taught storage systems as a full-semester, 4th-year course focused on storage’s incorporation and role in computer systems. Topics span the design, implementation, and use of storage systems, from the characteristics and operation of individual storage devices to the OS, database, and networking approaches involved in tying them together and making them useful. Along the way, we examine several real case studies, the demands placed on storage systems by important applications, and the impact of trends and emerging technologies on future storage systems. In the Spring 2002 offering, designated “18-546: Storage Systems,” base lecture material was complemented by real-world expertise generously shared by 8 guest speakers from industry (including 2 CTOs and 4 of the 8 members of the SNIA Technical Council). The students who have taken the class will now be better prepared to contribute to the storage industry today and in the future. More information can be found at http://www.ece.cmu.edu/~ganger/ece546.spring02/.

This course and an associated book on storage systems are an attempt to fill the gaping hole in computer systems education. The book, which is evolving as the class is taught, will make it much easier for other universities to start offering storage systems education. Hopefully, it will also be useful to graduates who did not have access to a storage systems class.

Stay tuned!

Greg Ganger, David Nagle & Joan Digney

