David P. Anderson
Space Sciences Laboratory
University of California – [email protected]
Designing Middleware for Volunteer Computing
Why volunteer computing?
● 2006: 1 billion PCs, 55% privately owned
● If 100M people participate:
  – 100 PetaFLOPS, 1 Exabyte (10^18 bytes) of storage
● Consumer products drive technology
  – GPUs (NVIDIA, Sony Cell)
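The headline numbers follow from simple per-participant assumptions — roughly 1 GFLOPS of CPU and 10 GB of spare disk per PC (my figures for illustration; the slide states only the totals):

```python
# Back-of-envelope capacity of 100M volunteer PCs.
# Assumed per-PC figures (not from the slide): 1 GFLOPS, 10 GB disk.
participants = 100e6
flops_total = participants * 1e9      # 1e17 FLOPS = 100 PetaFLOPS
storage_total = participants * 10e9   # 1e18 bytes = 1 Exabyte

print(flops_total / 1e15)    # PetaFLOPS -> 100.0
print(storage_total / 1e18)  # Exabytes  -> 1.0
```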
[Diagram: computing resources — academic, business, and home PCs ("your computers")]
Volunteer computing history
[Timeline: 1995–2006]
GIMPS, distributed.net
SETI@home, folding@home
commercial projects
climateprediction.net
BOINC
Einstein@home, Rosetta@home, Predictor@home, LHC@home, BURP, PrimeGrid, ...
Scientific computing paradigms
[Diagram: supercomputers, cluster computing, grid computing, and volunteer computing arranged along two axes — control (least to most) and bang/buck (least to most)]
BOINC
[Diagram: BOINC mediates between projects (SETI, physics, climate, biomedical) and volunteers (Joe, Alice, Jens)]
Participation in >1 project
● Better short-term resource utilization
  – communicate/compute in parallel
  – match applications to resources
● Better long-term resource utilization
  – project A works while project B thinks
[Diagram: over time, a project's computing needs alternate between "think" (no work available) and "work" phases]
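The think/work alternation is why attaching to several projects improves utilization: the client can run project B while project A has no work. A minimal round-robin sketch (names hypothetical; this is an illustration, not the BOINC client's actual debt-based scheduler):

```python
# Toy client-side scheduler: give the CPU to the first attached
# project that currently has work; idle only if all are "thinking".
def pick_project(projects):
    """projects: list of (name, has_work) tuples, in attach order."""
    for name, has_work in projects:
        if has_work:
            return name
    return None  # all projects are thinking; the CPU would go idle

print(pick_project([("A", False), ("B", True)]))  # B runs while A thinks
```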
Server performance
How many clients can a project support?
Task server architecture
[Diagram: task server components — a MySQL database at the center; the scheduler serves clients from a shared-memory segment kept full by the feeder; back-end daemons: work creator, transitioner, validator, assimilator, file deleter, DB purger]
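The daemons form a pipeline over task state stored in MySQL, with the transitioner advancing each task as the other daemons finish their part. A hedged sketch of that progression (states and names are illustrative; real BOINC tracks workunits and results separately in its schema):

```python
# Simplified lifecycle of a task in the server pipeline.
# Each state is handled by one daemon (work creator, scheduler,
# validator, assimilator, file deleter, DB purger).
PIPELINE = ["created", "sent", "returned", "validated",
            "assimilated", "files_deleted", "purged"]

def transition(state):
    """Advance a task to its next pipeline state (terminal: purged)."""
    i = PIPELINE.index(state)
    return PIPELINE[i + 1] if i + 1 < len(PIPELINE) else state

s = "created"
for _ in range(6):
    s = transition(s)
print(s)  # purged
```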
Server load (CPU)
[Bar chart: CPU seconds per 100,000 tasks for each server function (create, send, validate, assimilate, file delete, DB purge), split between application and MySQL]
Server load (disk I/O)
[Bar chart: disk I/O (MB) per 100,000 tasks for each server function (create, send, validate, assimilate, file delete, DB purge)]
Server limits
● Single server (2x Xeon, 100 Mbps disk)
  – 8.8 million tasks/day
  – 4.4 PetaFLOPS (if each task is 12 hours on a 1 GFLOPS CPU)
  – CPU is the bottleneck (2.5% disk utilization)
  – 8.2 Mbps network traffic (if 10 KB per request/reply)
● Multiple servers (1 for MySQL, 2 for everything else)
  – 23.6 million tasks/day
  – MySQL CPU is the bottleneck
  – 21.9 Mbps network traffic
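The single-server figures are easy to sanity-check, assuming each task is 12 hours of work on a 1 GFLOPS CPU and roughly 10 KB of request/reply traffic, as stated:

```python
# Sanity check of the single-server throughput numbers.
tasks_per_day = 8.8e6
tasks_per_sec = tasks_per_day / 86400   # ~102 tasks/s
flops_per_task = 12 * 3600 * 1e9        # 12 h on a 1 GFLOPS CPU

pflops = tasks_per_sec * flops_per_task / 1e15
mbps = tasks_per_sec * 10e3 * 8 / 1e6   # ~10 KB per request/reply

print(round(pflops, 1), round(mbps, 1))  # ~4.4 PetaFLOPS, ~8.1 Mbps
```

The small gap between ~8.1 and the slide's 8.2 Mbps is just rounding of the per-request size.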
Credit
Credit display
Credit system goals
● Retain participants
  – fair between users and across projects
  – understandable
  – cheat-resistant
● Maximize utility to projects
  – hardware upgrades
  – assignment of projects to computers
Credit system
● Computation credit
  – benchmark-based
  – application benchmarks
  – application operation counting
  – cheat-resistance: redundancy
● Other resources
  – network, disk storage, RAM
● Other behaviors
  – recruitment
  – other participation
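Redundancy makes claimed credit cheat-resistant: the same task goes to several hosts, and the granted credit is derived from the whole set of claims rather than any single one. One simple such rule is the median of the claims (an illustration only; BOINC's actual granting policy has varied across versions):

```python
# Grant credit as the median of claimed credits from a redundant quorum.
# A single inflated claim cannot pull the grant up much.
from statistics import median

def granted_credit(claims):
    """claims: credit values claimed by the hosts that ran the task."""
    return median(claims)

print(granted_credit([10.2, 9.8, 500.0]))  # cheater's 500 is ignored -> 10.2
```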
Benchmarks not whole story
Limits of Volunteer Computing
● How much processing/disk/RAM is out there?
● Combinations of resources
● Data from 330,000 SETI@home participants
Operating system | Number of hosts | GFLOPS per host | GFLOPS total
Windows total    |         292,688 |           1.676 |      490,545
  XP             |         229,555 |           1.739 |      399,196
  2000           |          42,830 |           1.310 |       56,107
  2003           |          10,367 |           2.690 |       27,887
  98             |           6,591 |           0.680 |        4,482
  Millennium     |           1,973 |           0.789 |        1,557
  NT             |           1,249 |           0.754 |          942
  Longhorn       |              86 |           2.054 |          177
  95             |              37 |           0.453 |           17
Linux            |          21,042 |           1.148 |       24,156
Darwin           |          15,830 |           1.150 |       18,205
SunOS            |           1,091 |           0.852 |          930
Others           |           1,134 |           1.364 |        1,547
Total            |         331,785 |           1.613 |      535,169
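The Total row can be checked against the top-level rows; host counts sum exactly, and GFLOPS totals agree to within the rounding of the per-host averages:

```python
# Check the "Total" row against the top-level rows of the table.
rows = {  # OS family: (hosts, GFLOPS total)
    "Windows": (292688, 490545),
    "Linux":   (21042, 24156),
    "Darwin":  (15830, 18205),
    "SunOS":   (1091, 930),
    "Others":  (1134, 1547),
}
hosts = sum(h for h, _ in rows.values())
gflops = sum(g for _, g in rows.values())
print(hosts, gflops)  # 331785 hosts; GFLOPS within rounding of 535,169
```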
Goals of BOINC
● > 100 projects, some churn
● Handle big data better
  – BitTorrent integration
  – use GPUs and other resources
  – DAGs
● Participation
  – 10–100 million participants
  – multiple projects per participant