Slide 1
Recovery Oriented Computing (ROC)
David Patterson
2002 Grad Visit Day
Slide 2
Berkeley’s Research Goals
• Have Impact, not just count Journal Papers
– Some universities have bad benchmarks
– Recently realized that when the goal is not impact, you rarely have impact (but lots of papers)
• Produce Great Students, not # Journal Papers
– Try to create projects that, if I were a student, I would almost kill myself to try to join
– Not all projects equally successful in research impact, but all can produce great students
– As you get further in your career, you realize that students are the coin of the academic realm
Slide 3
(One) Berkeley Approach to Systems
• Find an important problem crossing the HW/SW interface, with a HW/SW prototype at the end (usually started in a graduate course)
• Assemble a band of 3-6 faculty, 12-20 grad students, 1-3 staff to tackle it over 4 years
• Meet twice a year for 3-day retreats with invited outsiders
– Builds team spirit; advice on direction, change of course
– Offers milestones for project stages
– Grad students give 6 to 8 talks => great speakers
• Write papers, go to conferences, get PhDs, jobs
• End-of-project party, reshuffle faculty, go to step 1
Slide 4
Patterson’s Projects, Faculty, Commercial Impact
• Reduced Instruction Set Computer (RISC)
– What: simplified instructions to exploit VLSI: ’80-’84
– With: Sequin@UC, Hennessy@Stanford, Cocke@IBM
– Direct Impact: Sun; RISC in >90% of embedded MPUs
• Symbolic Processing Using RISCs (SPUR)
– What: desktop multiprocessor for AI: ’84-’89
– With: Fateman, Hilfinger, Hodges, Katz, Ousterhout
– Direct Impact: PLL => fast serial lines => Silicon Image
• Redundant Arrays of Inexpensive Disks (RAID)
– What: many PC disks for speed, reliability: ’88-’93
– With: Katz, Ousterhout, Stonebraker
– Direct Impact: $25B/yr (EMC); 80% of non-PC disks use RAID
Slide 5
Symbolic Processing Using RISCs: ‘85-’89
• Before commercial RISC chips
• Built workstation multiprocessor and operating system from scratch(!)
• Sprite Operating System
• 3 chips: Processor, Cache Controller, FPU
– Coined the term “snooping cache protocol”
– 3 C’s of cache misses: compulsory, capacity, conflict
Slide 6
SPUR 10 Year Reunion, January ‘99
• Everyone from North America came!
• 19 PhDs: 9 to Academia
– 8/9 got tenure, 4 full professors (already)
– 2 Romnes Fellows (3rd, 4th at Wisconsin)
– 3 NSF Presidential Young Investigator winners
– 2 ACM Dissertation Awards
– They in turn had produced 30 PhDs (1/99)
• 10 to Industry
– Founders of 6 startups (1 failed, 1 acquired, 1 public)
– 2 department heads (AT&T Bell Labs, Microsoft)
• Very successful group; SPUR Project “gave them a taste of success, lifelong friends”
• “Berkeley is on the lunatic fringe of multi-faculty projects”
Slide 7
Group Photo (in souvenir jackets)
• See www.cs.berkeley.edu/Projects/ARC to learn more about Berkeley Systems
[Photo: SPUR alumni in souvenir jackets: Garth Gibson (CMU, founder of Panasas); Dave Lee (founder, Silicon Image); Mendel Rosenblum (Stanford, founder of VMware); Ben Zorn (Colorado, Microsoft); David Wood (Wisconsin); Jim Larus (Wisconsin, Microsoft); Mark Hill (Wisconsin); Susan Eggers (Washington); Brent Welch (founder, Scriptics); George Taylor (founder, ?); Shing Kong (Silicon Image); John Ousterhout (founder, Scriptics and Electric Cloud)]
Slide 8
Patterson’s Projects, People, Impact
• Networks of Workstations (NOW)
– What: big server via switched network of workstations: ’94-’98
– With: Anderson, Brewer, Culler
– Direct Impact: Inktomi + many Internet companies
• Tertiary Disk (TD: a NOW subset project)
– What: big, cheap disk-NOW (for SF museum): ’96-’99
– Direct Impact: Scale8 (big, reliable Internet storage)
• Intelligent RAM (IRAM)
– What: media processor inside a DRAM chip: ’97-’02
– With: Yelick (and Wawrzynek)
• ISTORE/Recovery-Oriented Computing (ROC)
– What: available, maintainable servers: HW, SW, LW
– With: Yelick/Fox (and Kubiatowicz)
Slide 9
Network of Workstations (NOW) ‘94 -’98
• Construction of 2 HW/SW prototypes: NOW-1 with 32 SuperSPARCs and NOW-2 with 100 UltraSPARC 1s
• NOW-2 cluster held the world record for the fastest disk-to-disk sort for 2 years, 1997-1999
• NOW-2 cluster was 1st to crack the 40-bit key as part of a key-cracking challenge offered by RSA, 1997
• NOW-2 made the list of Top 200 supercomputers, 1997
• NOW technology is a foundation of the Virtual Interface (VI) Architecture, a proposed standard that allows fully protected, direct user-level access to the network interface, promoted by Compaq, Intel, & Microsoft
• NOW technology led directly to one Internet startup company (Inktomi), and many other Internet companies rely on clusters
Slide 10
Network of Workstations (NOW) ‘94 -’98
12 PhDs. Note that 3/4 of them went into academia, and that 1/3 are female:
• Andrea Arpaci-Dusseau, Asst. Professor, Wisconsin, Madison
• Remzi Arpaci-Dusseau, Asst. Professor, Wisconsin, Madison
• Mike Dahlin, Assoc. Professor, University of Texas, Austin
• Jeanna Neefe Matthews, Asst. Professor, Clarkson Univ.
• Douglas Ghormley, Researcher, Los Alamos National Labs
• Kim Keeton, Researcher, Hewlett-Packard Labs
• Steve Lumetta, Asst. Professor, U. Illinois, Urbana-Champaign
• Alan Mainwaring, Researcher, Intel Berkeley Labs
• Rich Martin, Asst. Professor, Rutgers University
• Nisha Talagala, Researcher, Network Storage, Sun Microsystems
• Amin Vahdat, Asst. Professor, Duke University
• Randy Wang, Asst. Professor, Princeton University
Slide 11
Research in Berkeley Courses
• RISC, SPUR, RAID, NOW, IRAM, ROC all started in advanced graduate courses
• Make the transition from undergraduate student to researcher in first-year graduate courses
– First-year architecture and operating systems courses: select topic, do research, write paper, give talk
– Prof meets each team 1-on-1 ~3 times, + TA help
– Some papers get submitted and published
– Same time to Ph.D. as places with no required courses
• Requires class size of 20-40 (e.g., Berkeley)
– If 100-200 students/course (school offers combined BS/MS or professional MS over TV broadcast) => cannot do research in grad courses
Slide 12
Retreat Research Style
• Project reviews with outsiders
– Twice a year: 3-day retreat @ Tahoe
– Faculty, students, staff + guests
– Key piece is feedback at the end
– Can change minds of faculty
– Breaks enable valuable discussion
– Builds team spirit (all play & work)
– Helps create deadlines
– Helps with technology transfer
– Always amazed at the value at the end
• By far the most important idea for running a 10-25 person project
– Cost ~ 1 grad student
– Visitors donate $ => funds 4 to 6 grads
Slide 13
Background: Tertiary Disk (part of NOW)
• Tertiary Disk (1997)
– Cluster of 20 PCs hosting 364 3.5” IBM disks (8.4 GB each) in 7 racks (19” x 33” x 84”), or 3 TB total. The 200 MHz, 96 MB P6 PCs run FreeBSD, and a switched 100 Mb/s Ethernet connects the hosts. Also 4 UPS units.
– Hosts the world’s largest art database: 72,000 images, in cooperation with the San Francisco Fine Arts Museum: try www.thinker.org
Slide 14
Tertiary Disk HW Failure Experience
Reliability of hardware components (20 months):
– 7 IBM SCSI disk failures (out of 364, or 2%)
– 6 IDE (internal) disk failures (out of 20, or 30%)
– 1 SCSI controller failure (out of 44, or 2%)
– 1 SCSI cable failure (out of 39, or 3%)
– 1 Ethernet card failure (out of 20, or 5%)
– 1 Ethernet switch failure (out of 2, or 50%)
– 3 enclosure power supply failures (out of 92, or 3%)
– 1 short power outage (covered by UPS)

Did not match expectations: SCSI disks more reliable than SCSI cables!
=> A difference between simulation and prototypes
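A quick sketch (Python) that tallies the failure rates above; the counts are from the slide, and the script itself is only illustrative:

```python
# Minimal sketch tallying the Tertiary Disk failure counts above.
failures = {                        # component: (failed, total)
    "IBM SCSI disk":          (7, 364),
    "IDE (internal) disk":    (6, 20),
    "SCSI controller":        (1, 44),
    "SCSI cable":             (1, 39),
    "Ethernet card":          (1, 20),
    "Ethernet switch":        (1, 2),
    "Enclosure power supply": (3, 92),
}
# Sort by failure rate, highest first.
for part, (failed, total) in sorted(failures.items(),
                                    key=lambda kv: kv[1][0] / kv[1][1],
                                    reverse=True):
    print(f"{part:24s} {failed:2d}/{total:<3d} = {failed / total:5.1%}")
# The surprise: SCSI cables (~2.6%) failed at a higher rate
# than the SCSI disks themselves (~1.9%).
```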
Slide 15
Lessons from Tertiary Disk Project
• Maintenance is hard on current systems
– Hard to know what is going on, who is to blame
• Everything can break
– It’s not what you expect in advance
– Follow the rule of no single point of failure
• Nothing fails fast
– Eventually behaves badly enough that the operator “fires” the poor performer, but it doesn’t “quit”
• Most failures may be predicted
Slide 16
The past: research goals and assumptions of the last 15 years
• Goal #1: Improve performance
• Goal #2: Improve performance
• Goal #3: Improve cost-performance
• Assumptions:
– Humans are perfect (they don’t make mistakes during installation, wiring, upgrade, maintenance, or repair)
– Software will eventually be bug-free (hire better programmers!)
– Hardware MTBF is already very large (~100 years between failures) and will continue to increase
– Maintenance costs are irrelevant vs. purchase price (maintenance is a function of price, so cheaper helps)
Slide 17
Learning from others: Bridges
• 1800s: 1/4 of iron truss railroad bridges failed!
• Safety is now part of Civil Engineering DNA
• Techniques invented since the 1800s:
– Learn from failures vs. successes
– Redundancy to survive some failures
– Margin of safety 3X-6X vs. calculated load
– (CS&E version of a safety margin?)
• What will people of the future think of our computers?
Slide 18
Recovery-Oriented Computing Philosophy
“If a problem has no solution, it may not be a problem, but a fact, not to be solved, but to be coped with over time”
— Shimon Peres (“Peres’s Law”)
• People/HW/SW failures are facts, not problems
• Improving recovery/repair improves availability
– UnAvailability ≈ MTTR / MTTF
– 1/10th the MTTR is just as valuable as 10X the MTBF (assuming MTTR is much less than MTTF); see the sketch after this list
• Recovery/repair is how we cope with the above facts
• Since a major sysadmin job is recovery after failure, ROC also helps with maintenance/TCO
• Since cost of ownership is 5-10X the HW/SW price, if necessary, use disk/DRAM space and processor performance for ACME
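A back-of-the-envelope sketch of the availability arithmetic above; the MTTR/MTTF numbers are made up for illustration, not taken from any project:

```python
# Minimal sketch: unavailability ~ MTTR / MTTF when MTTR << MTTF.
# Shows that cutting MTTR 10x helps about as much as raising MTTF 10x.

def unavailability(mttr_hours: float, mttf_hours: float) -> float:
    """Steady-state fraction of time the service is down."""
    return mttr_hours / (mttr_hours + mttf_hours)

baseline       = unavailability(mttr_hours=4.0, mttf_hours=10_000.0)
faster_repair  = unavailability(mttr_hours=0.4, mttf_hours=10_000.0)
fewer_failures = unavailability(mttr_hours=4.0, mttf_hours=100_000.0)

print(f"baseline:        {baseline:.6%}")
print(f"10x lower MTTR:  {faster_repair:.6%}")    # ~0.004%
print(f"10x higher MTTF: {fewer_failures:.6%}")   # ~0.004%, nearly identical
```

The two improved cases land within a rounding error of each other, which is the point: repair time is as much a lever on availability as time between failures.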
Slide 19
Approach to ROC
• Failure data collection: why do Internet services fail? What do failures look like?
– Collected data from 3 Internet sites: operator error > 50% of the time
• Recovery benchmarks
– Run recovery experiments that trigger faults and measure how long recovery takes (see the sketch below)
– SW RAID recovery: Solaris vs. Linux vs. Windows = 1:5:30
• Margin of safety: to recover from surprises
• Construct a clustered email service as an example
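A minimal sketch of what such a recovery experiment could look like, assuming a hypothetical service object with inject_fault() and is_healthy() probes; this is illustrative, not the project’s actual harness:

```python
# Sketch of a recovery benchmark: inject one fault, then time how
# long the system takes to return to healthy operation.
import time

def measure_recovery(service, fault, poll_interval=0.5, timeout=600.0):
    """Return seconds from fault injection until the service is healthy again."""
    service.inject_fault(fault)          # e.g., fail one disk in a SW RAID set
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if service.is_healthy():         # e.g., reads/writes all succeed again
            return time.monotonic() - start
        time.sleep(poll_interval)
    raise TimeoutError(f"no recovery from {fault!r} within {timeout}s")
```

Repeating this for a matrix of fault types and systems is what lets you report comparisons like the 1:5:30 SW RAID result above.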
Slide 20
Approach to Email Service
• Recovery experiments while developing the code and after it is deployed
– E.g., glibc with a script to trigger errors (FIG)
• Automated diagnosis
– E.g., trace all modules used per request, log whether each request fails or succeeds, put the traces into a database, and use data mining to find the faulty module (Pinpoint: 60% to 90% accurate); see the sketch below
• Fast restart (with Fox)
– Partition software so only a subset of the system has to restart (5X reduction in the Mercury example)
• Reversible email service: Undo for operators
– Rewind, Repair, and Redo: remove a virus via time travel
• See http://roc.cs.berkeley.edu/294fall01/
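A minimal sketch of Pinpoint-style diagnosis under stated assumptions: request traces arrive as (modules, succeeded) pairs, and the scoring is a simple failure-correlation heuristic, not Pinpoint’s actual data-mining algorithm:

```python
# Rank modules by how much more often they appear in failed
# requests than in successful ones.
from collections import Counter

def suspect_modules(traces):
    """traces: iterable of (modules_touched, succeeded) pairs."""
    in_failed, in_ok = Counter(), Counter()
    for modules, succeeded in traces:
        bucket = in_ok if succeeded else in_failed
        bucket.update(set(modules))      # count each module once per request
    n_failed = sum(1 for _, ok in traces if not ok) or 1
    n_ok = sum(1 for _, ok in traces if ok) or 1
    score = {m: in_failed[m] / n_failed - in_ok[m] / n_ok
             for m in set(in_failed) | set(in_ok)}
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical traces from an email service:
traces = [(["frontend", "smtp", "spool"], True),
          (["frontend", "imap", "index"], False),
          (["frontend", "imap", "spool"], False),
          (["frontend", "smtp", "index"], True)]
print(suspect_modules(traces)[0])   # -> ('imap', 1.0): the most suspect module
```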
Slide 21
Interested in ROCing?
• Many research opportunities, low-hanging fruit
– Failure data collection, analysis, and publication
– Create/run recovery and maintainability benchmarks: compare (by vendor) databases, file systems, routers, …
– Invent and evaluate techniques to reduce MTTR and TCO in computation, storage, and network systems

“If it’s important, how can you say it’s impossible if you don’t try?”
— Jean Monnet, a founder of the European Union

http://ROC.cs.berkeley.edu