

David P. Anderson
Space Sciences Laboratory, University of California – Berkeley
davea@ssl.berkeley.edu

Designing Middleware for Volunteer Computing

Why volunteer computing?

● 2006: 1 billion PCs, 55% privately owned
● If 100 million people participate:
  – 100 PetaFLOPS of computing, 1 Exabyte (10^18 bytes) of storage
● Consumer products drive the technology
  – GPUs (NVIDIA, Sony Cell)

[Diagram: "your computers" – academic, business, and home PCs]

Volunteer computing history

[Timeline, 1995–2006: GIMPS and distributed.net; SETI@home and Folding@home; commercial projects; climateprediction.net; BOINC; then Einstein@home, Rosetta@home, Predictor@home, LHC@home, BURP, PrimeGrid, ...]

Scientific computing paradigms

[Chart: paradigms ranked along two axes, from most control and least bang-per-buck to least control and most bang-per-buck: supercomputers, cluster computing, grid computing, volunteer computing]

BOINC

[Diagram: volunteers (Joe, Alice, Jens) attach to projects (SETI, physics, climate, biomedical)]

Participation in >1 project

● Better short-term resource utilization
  – communicate and compute in parallel
  – match applications to resources
● Better long-term resource utilization
  – project A works while project B thinks

[Diagram: each project's computing needs alternate between "think" and "work" phases over time; the phases of different projects interleave]
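The overlap idea above can be sketched as a toy client scheduler. This is a hypothetical simplified model with invented phase lengths; BOINC's real client scheduler uses resource shares and debt accounting, not fixed cycles.

```python
# Toy model: two projects whose "work" phases are offset, so one computes
# while the other "thinks" (prepares work). All parameters are invented
# for illustration; this is not BOINC's actual scheduling policy.

def has_work(p, t):
    # A project is in its "work" phase for `work` ticks out of every `cycle`.
    return ((t - p["offset"]) % p["cycle"]) < p["work"]

def schedule(projects, total_time):
    # Greedy rule: run the first project that has work; idle only if none do.
    return [next((p["name"] for p in projects if has_work(p, t)), "idle")
            for t in range(total_time)]

projects = [
    {"name": "A", "cycle": 4, "work": 2, "offset": 0},
    {"name": "B", "cycle": 4, "work": 2, "offset": 2},
]
print("".join(schedule(projects, 8)))  # AABBAABB: the CPU is never idle
```

With only project A attached, the same client would sit idle half the time; attaching the second project fills those gaps, which is the long-term utilization argument on the slide.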

Server performance

How many clients can a project support?

Task server architecture

[Diagram: clients talk to the scheduler, which dispatches tasks from a shared-memory segment kept full by the feeder; the work creator, transitioner, validator, assimilator, file deleter, and DB purger are daemons operating on a shared MySQL database]
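A task's lifecycle through the daemons named above can be sketched as a linear pipeline. The stage names follow the slide; the single-chain transition logic is a simplification of the server's actual state machine, in which the transitioner reacts to events rather than stepping blindly.

```python
# Hypothetical sketch of a task's lifecycle through BOINC-style server
# daemons. Stage order follows the architecture slide; real transitions
# depend on validation outcomes, retries, and deadlines.

PIPELINE = [
    "created",        # work creator generates the task
    "in_shared_mem",  # feeder loads it into shared memory
    "sent",           # scheduler dispatches it to a client
    "validated",      # validator compares redundant results
    "assimilated",    # assimilator hands results to project code
    "files_deleted",  # file deleter removes input/output files
    "purged",         # DB purger removes the database record
]

def advance(state):
    """Move a task to the next stage (the transitioner's role, simplified)."""
    i = PIPELINE.index(state)
    return PIPELINE[i + 1] if i + 1 < len(PIPELINE) else state

state = "created"
for _ in range(len(PIPELINE)):
    state = advance(state)
print(state)  # purged
```

The point of splitting the pipeline into independent daemons is that each one can run on a separate machine, which is what makes the multi-server scaling numbers below possible.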

Server load (CPU)

[Chart: CPU seconds per 100,000 tasks for each server function (create, send, validate, assimilate, file delete, DB purge), split between application code and MySQL; values range from 0 to about 350]

Server load (disk I/O)

[Chart: disk I/O in MB per 100,000 tasks for each server function (create, send, validate, assimilate, file delete, DB purge); values range from 0 to about 400]

Server limits

● Single server (2× Xeon, 100 Mbps disk)
  – 8.8 million tasks/day
  – 4.4 PetaFLOPS (if each task is 12 hours on a 1 GFLOPS CPU)
  – CPU is the bottleneck (2.5% disk utilization)
  – 8.2 Mbps network (if 10 KB request/reply)
● Multiple servers (1 for MySQL, 2 for everything else)
  – 23.6 million tasks/day
  – MySQL CPU is the bottleneck
  – 21.9 Mbps network
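The single-server figures follow from the stated assumptions by straightforward arithmetic, which can be checked directly (the only assumption added here is reading "10K" as 10 KB per request/reply pair):

```python
# Re-deriving the slide's single-server capacity figures.
tasks_per_day = 8.8e6
gflop_per_task = 12 * 3600 * 1.0   # 12 hours on a 1 GFLOPS CPU

# Sustained compute rate: total GFLOP per day over a day of seconds.
pflops = tasks_per_day * gflop_per_task / 86400 / 1e6
print(f"{pflops:.1f} PetaFLOPS")   # 4.4

# Network: assume 10 KB of request+reply traffic per task.
mbps = tasks_per_day * 10e3 * 8 / 86400 / 1e6
print(f"{mbps:.1f} Mbps")          # 8.1
```

The network figure lands within rounding of the slide's 8.2 Mbps, depending on exactly what "10K" counts; the compute figure matches exactly.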

Credit

Credit display

Credit system goals

● Retain participants
  – fair between users, across projects
  – understandable
  – cheat-resistant
● Maximize utility to projects
  – inform hardware upgrades
  – guide assignment of projects to computers

Credit system

● Computation credit
  – benchmark-based
  – application benchmarks
  – application operation counting
  – cheat-resistance: redundancy
● Other resources
  – network, disk storage, RAM
● Other behaviors
  – recruitment
  – other forms of participation
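The redundancy-based cheat resistance mentioned above can be sketched as follows: send each task to several hosts and grant a robust statistic of their claimed credits, so a single inflated claim cannot raise the grant. The median rule here is one illustrative choice, not necessarily the policy any given project uses.

```python
# Sketch of cheat-resistance through redundancy: grant the median of the
# replicas' claimed credits, so one dishonest claim has no effect.
# The median is an illustrative policy choice, not BOINC's only option.
import statistics

def granted_credit(claims):
    """Grant the median of the replicas' claimed credits."""
    return statistics.median(claims)

honest = [10.2, 9.8, 10.1]
cheater = [10.2, 9.8, 500.0]   # one host wildly inflates its claim
print(granted_credit(honest))  # 10.1
print(granted_credit(cheater)) # 10.2
```

A mean-based grant would move from about 10 to about 173 under the same attack, which is why a robust statistic matters when claims come from untrusted hosts.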

Benchmarks not whole story

Limits of Volunteer Computing

● How much processing, disk, and RAM is out there?
● What combinations of resources?
● Data from 330,000 SETI@home participants

Operating system    Number of hosts   GFLOPS per host   GFLOPS total
Windows total            292,688          1.676            490,545
  XP                     229,555          1.739            399,196
  2000                    42,830          1.310             56,107
  2003                    10,367          2.690             27,887
  98                       6,591          0.680              4,482
  Millennium               1,973          0.789              1,557
  NT                       1,249          0.754                942
  Longhorn                    86          2.054                177
  95                          37          0.453                 17
Linux                     21,042          1.148             24,156
Darwin                    15,830          1.150             18,205
SunOS                      1,091          0.852                930
Others                     1,134          1.364              1,547
Total                    331,785          1.613            535,169
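The table's internal consistency can be cross-checked: the per-version Windows rows should sum to the "Windows total" row, and the OS totals to the grand total. (The GFLOPS totals only agree to within about 0.1%, since the per-host column shows rounded averages.)

```python
# Cross-checking the host-count columns of the SETI@home table above.
windows = {"XP": 229555, "2000": 42830, "2003": 10367, "98": 6591,
           "Millennium": 1973, "NT": 1249, "Longhorn": 86, "95": 37}
others = {"Linux": 21042, "Darwin": 15830, "SunOS": 1091, "Others": 1134}

win_total = sum(windows.values())
grand_total = win_total + sum(others.values())
print(win_total, grand_total)          # 292688 331785

# Average GFLOPS per host implied by the totals row.
print(round(535169 / grand_total, 3))  # 1.613
```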

Goals of BOINC

● Support > 100 projects, with some churn
● Handle big data better
  – BitTorrent integration
  – use GPUs and other resources
  – DAGs
● Participation
  – 10–100 million participants
  – multiple projects per participant