Transcript
Page 1: Public Distributed Computing with BOINC

David P. Anderson
Space Sciences Laboratory
University of California – Berkeley
[email protected]

Page 2: Public-resource computing

[Timeline, 1995–2004: GIMPS, distributed.net; SETI@home, folding@home; fight*@home; climateprediction.net. Diagram labels: home PCs, "your computers", academic, business]

names:
– public-resource computing
– peer-to-peer computing (no!)
– public distributed computing
– “@home” computing

Page 3: The potential of public computing

● SETI@home: 500,000 CPUs, 65 TeraFLOPs

● 1 billion Internet-connected PCs in 2010, 50% privately owned

● If 100M participate:

– ~ 100 PetaFLOPs

– ~ 1 Exabyte (10^18 bytes) of storage

[Chart: CPU power and storage capacity vs. cost for public computing, Grid computing, cluster computing, and supercomputing]

Page 4: Public/Grid differences

                         Public               Grid
Managed resources?       no                   yes
Secure resources?        no                   yes
Always on?               no                   yes
Always connected?        no                   yes
Network bandwidth        expensive, scarce    abundant
Network connection       1-way (pull)         2-way (pull or push)
Must be unobtrusive      yes                  no
Credit system            yes                  maybe
How to get resources?    complex              complex
EPO?                     yes                  no
Self-upgrading?          yes                  no

Page 5: Economics (0th order)

[Diagram: in cluster/Grid computing, you pay for the resources ($$) and the Internet connection ($$); in public-resource computing, the resources and the network are free]

$1 buys 1 computer-day or 20 GB of data transfer on the commercial Internet.
Suppose processing 1 GB of data takes X computer-days. Cost of processing 1 GB:
– cluster/Grid: $X
– PRC: $1/20 = $0.05

So PRC is cheaper if X > 1/20 (SETI@home: X = 1,000).
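Plugging in the SETI@home figure makes the gap concrete:

\[ \text{cluster/Grid: } \$X = \$1{,}000 \text{ per GB} \qquad\qquad \text{PRC: } \$1/20 = \$0.05 \text{ per GB} \]

i.e. a factor of 20,000 in favor of public-resource computing for this workload.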

Page 6: Economics revisited

● Underutilized free Internet (e.g. Internet2)

[Diagram: you reach other institutions over Internet2 rather than the commodity Internet]

● Bursty, underutilized flat-rate ISP connection
● Traffic shapers can send at zero priority

==> bandwidth may be free also

Page 7: Why isn't PRC more widely used?

● Lack of a platform
– JXTA, Jabber: not a solution
– Java: apps are in C, FORTRAN
– commercial platforms: business issues
– Cosm, XtremWeb: not complete

● Need to make PRC technology easy to use for scientists

Page 8: BOINC: Berkeley Open Infrastructure for Network Computing

● Goals for computing projects

– easy/cheap to create and operate projects

– wide range of applications possible

– no central authority

● Goals for participants

– easy to participate in multiple projects

– invisible use of disk, CPU, network

● NSF-funded; open source; in beta test

– http://boinc.berkeley.edu

Page 9: SETI@home requirements

[Diagram contrasting two data-distribution setups]
– current: data tapes at Berkeley; participants download over the commercial Internet
– ideal: Internet2 feeds servers at Berkeley, Stanford, and USC; participants download over the commercial Internet
– 50 Mbps
– 0.3 MB of data = 8 hrs of CPU

Page 10: Climateprediction.net

● Global climate study (Oxford Univ.)
● Input: ~10 MB executable, 1 MB data
● CPU time: 2-3 months (can't migrate)
● Output per workunit:
– 10 MB summary (always upload)
– 1 GB detail file (archive on client, may upload)
● Chaotic (incomparable results)

Page 11: Einstein@home (planned)

● Gravitational wave detection; LIGO; UW/Caltech
● 30,000 data sets of 40 MB each
● Each data set is analyzed with 40,000 different parameter sets; each analysis takes ~6 hrs of CPU
● Data distribution: replicated 2 TB servers
● Scheduling problem is more complex than “bag of tasks”
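Multiplying out the figures on this slide shows the scale that makes public resources attractive:

\[ 30{,}000 \times 40{,}000 \times 6\ \text{h} = 7.2 \times 10^{9}\ \text{CPU-hours} \approx 8 \times 10^{5}\ \text{CPU-years} \]
\[ 30{,}000 \times 40\ \text{MB} = 1.2\ \text{TB of data} \]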

Page 12: Intel/UCB Network Study (planned)

● Goal: map/measure the Internet
● Each workunit lasts for 1 day but is active only briefly (pings, UDP)
● Need to control the time of day when it is active
● Need to turn off other apps
● Need to measure system load indices (network/CPU/VM)

Page 13: General structure of BOINC

[Architecture diagram]
● Project:
– scheduling server (C++)
– BOINC DB (MySQL)
– work generation
– data servers (HTTP)
– web interfaces (PHP)
– project back end: retry generation, result validation, result processing, garbage collection
● Participant:
– core client (C++)
– applications

Page 14: Project web site features

● Download core client
● Create account
● Edit preferences
– General: disk usage, work limits, buffering
– Project-specific: allocation, graphics
– Venues (home/school/work)
● Profiles
● Teams
● Message boards, adaptive FAQs

Page 15: General preferences

Page 16: Project-specific preferences

Page 17: Data architecture

● Files
– immutable, replicated
– may originate on client or project
– may remain resident on client
● Executables are digitally signed
● Upload certificates: prevent DoS

<file_info>
    <name>arecibo_3392474_jun_23_01</name>
    <url>http://ds.ssl.berkeley.edu/a3392474</url>
    <url>http://dt.ssl.berkeley.edu/a3392474</url>
    <md5_cksum>uwi7eyufiw8e972h8f9w7</md5_cksum>
    <nbytes>10000000</nbytes>
</file_info>
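As an illustration only (not BOINC's actual code), a client-side check of a downloaded file against the <nbytes> and <md5_cksum> fields might look like this C++ sketch; the use of OpenSSL's MD5 routines and the function names are assumptions made for the example:

    #include <cstdio>
    #include <string>
    #include <openssl/md5.h>   // assumes OpenSSL is available for the MD5 routine

    // Compute the hex MD5 digest and byte count of a file; returns false if unreadable.
    static bool md5_and_size(const char* path, std::string& hex_out, long& nbytes_out) {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return false;
        MD5_CTX ctx;
        MD5_Init(&ctx);
        unsigned char buf[65536];
        size_t n;
        long total = 0;
        while ((n = std::fread(buf, 1, sizeof(buf), f)) > 0) {
            MD5_Update(&ctx, buf, n);
            total += (long)n;
        }
        std::fclose(f);
        unsigned char digest[MD5_DIGEST_LENGTH];
        MD5_Final(digest, &ctx);
        char hex[2 * MD5_DIGEST_LENGTH + 1];
        for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
            std::sprintf(hex + 2 * i, "%02x", digest[i]);
        hex_out = hex;
        nbytes_out = total;
        return true;
    }

    // Check a downloaded file against the <md5_cksum> and <nbytes> fields.
    bool verify_download(const char* path, const std::string& md5_cksum, long nbytes) {
        std::string hex;
        long total;
        if (!md5_and_size(path, hex, total)) return false;
        return total == nbytes && hex == md5_cksum;
    }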

Page 18: Computation abstractions

● Applications
● Platforms
● Application versions
– may involve many files
● Work units: inputs to a computation
– soft deadline; CPU/disk/memory estimates
● Results: outputs of a computation
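A rough C++ sketch of the last two abstractions; the field names are illustrative, not the actual BOINC database schema:

    #include <string>
    #include <vector>

    struct Workunit {                          // the inputs to a computation
        std::string name;
        std::string app_name;                  // which application processes it
        std::vector<std::string> input_files;  // names of <file_info> entries
        double estimated_fpops;                // CPU estimate: floating-point operations
        double disk_bound;                     // maximum disk usage, bytes
        double memory_bound;                   // maximum working set, bytes
        double delay_bound;                    // soft deadline: seconds after dispatch
    };

    struct Result {                            // the outputs of one computation
        std::string name;
        std::string workunit_name;             // the workunit it belongs to
        std::vector<std::string> output_files; // uploaded to a data server when done
        int exit_status;
        double cpu_time;                       // seconds of CPU actually used
    };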

Page 19: Scheduling: pull model

[Sequence diagram]
– core client -> scheduling server: host description, request for X seconds of work
– scheduling server -> core client: result 1 ... result n
– core client <-> data server: download inputs, compute, upload outputs
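A compilable but heavily simplified sketch of the client's side of this exchange; every type and function here is a placeholder invented for illustration, not the real core client:

    #include <string>
    #include <vector>

    // Placeholder types; the real messages carry much more information.
    struct HostDescription { int ncpus; double flops_per_cpu; double free_disk_bytes; };
    struct Result { std::string name; std::vector<std::string> input_urls; };
    struct SchedulerReply { std::vector<Result> results; };

    // Stubs standing in for the real RPC, HTTP transfer, and execution machinery.
    SchedulerReply scheduler_rpc(const HostDescription&, double seconds_requested) { return {}; }
    void download_inputs(const Result&) { /* fetch input files from a data server */ }
    void run_application(const Result&) { /* execute the science app under the core client */ }
    void upload_outputs(const Result&)  { /* send output files to a data server */ }

    // One cycle of the pull model: the client initiates every exchange,
    // which is why a 1-way (pull) connection is enough.
    void work_cycle(const HostDescription& host, double buffer_seconds) {
        SchedulerReply reply = scheduler_rpc(host, buffer_seconds); // "request X seconds of work"
        for (const Result& r : reply.results) download_inputs(r);
        for (const Result& r : reply.results) { run_application(r); upload_outputs(r); }
        // Completed results are reported on the next scheduler RPC.
    }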

Page 20: Redundant computing

[Dataflow diagram]
– work generator -> scheduler -> clients: each workunit is sent to multiple clients
– validator: select canonical result, assign credit
– assimilator: consumes the canonical result
– replicator: generates additional results when needed
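A toy illustration of the validation step, assuming results can be compared by exact equality of their outputs (which, as the climateprediction.net slide notes, is not always possible); in BOINC the comparison logic is supplied by the project:

    #include <map>
    #include <string>
    #include <vector>

    // Toy canonical-result selection: given the outputs of the redundant results
    // for one workunit, pick a canonical output if at least min_quorum agree exactly.
    // Returns true and fills `canonical` on success.
    bool select_canonical(const std::vector<std::string>& outputs,
                          size_t min_quorum,
                          std::string& canonical) {
        std::map<std::string, size_t> votes;
        for (const std::string& out : outputs) votes[out]++;
        for (const auto& v : votes) {
            if (v.second >= min_quorum) {   // enough matching copies
                canonical = v.first;        // results equal to `canonical` get credit
                return true;
            }
        }
        return false;                       // no quorum yet: issue more copies of the workunit
    }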

Page 21: BOINC core client

[Diagram: the core client communicates with each app through an API over shared memory]
● File transfers: restartable, concurrent, user-limited
● Program execution: semi-sandboxed; graphics control, checkpoint control, % done, CPU time
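For concreteness, a minimal application written against the BOINC runtime API (boinc_api.h); the checkpoint file handling and the actual computation are elided, and the exact set of calls in the API version described here may differ:

    #include "boinc_api.h"

    int main() {
        boinc_init();                         // attach to the core client (shared memory, status reporting)

        const int total_steps = 1000000;
        for (int i = 0; i < total_steps; i++) {
            // ... one step of the science computation ...

            if (boinc_time_to_checkpoint()) { // core client signals when a checkpoint is wanted
                // ... write application state to a checkpoint file ...
                boinc_checkpoint_completed();
            }
            boinc_fraction_done((double)i / total_steps); // shown as "% done" in the UI
        }

        boinc_finish(0);                      // report completion; does not return
    }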

Page 22: User interface

[Diagram: the screensaver and control panel talk to the core client via RPC]

Page 23: Client control panel

Page 24:

Page 25: Anonymous platform mechanism

● User compiles applications from source, registers them with core client

● Report platform as “anonymous” to scheduler

● Purposes:
– obscure platforms
– security-conscious participants
– performance tuning of applications

Page 26: Project management tools

● Python scripts for project creation/start/stop
● Remote debugging
– collect/store crash info (stack traces)
– web-based browsing interface
● Strip charts
– record and graph system performance metrics
● Watchdogs
– detect system failures; dial a pager

Page 27: Conclusion

● Public-resource computing is a distinct paradigm from Grid computing

● PRC has tremendous potential for many applications (computing and storage)

● BOINC: enabling technology for PRC

– http://boinc.berkeley.edu

