
TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Date post: 02-Jan-2016
Transcript
Page 1: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Building Internet Services With TACC Armando Fox, UC Berkeley

TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Armando Fox
University of California, Berkeley

[email protected]@cs.berkeley.edu

Page 2: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Vision: “The Content You Want”

What do above apps have in common?
  Adapt (collect, filter, transform) existing content…
    according to client constraints
    respecting network limitations
    according to per-user preferences

But: Lack of unified framework for designing apps that exploit this observation

Page 3: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Contributions

TACC, a model for structuring services
  Transformation, Aggregation, Caching, Customization of Internet content
Scalable TACC server
  Based on clusters of commodity PC’s
  Easy to author “industrial strength” services
  Scalable Network Service (SNS) platform maps app semantics onto cluster-based availability mechanisms
Experience with real users
  ~15,000 today at UCB

Page 4: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

What’s TACC?

Transformation (“local”, “one-to-one”)
  TranSend, Anonymizer
Aggregation (“nonlocal”, “many-to-one”)
  Search engines, crawlers, newswatchers
Caching
  Both original and locally-generated content
Customization
  Per user: for content generation
  Per device: data delivery, content “packaging”

Page 5: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

TACC Example: TranSend

Transparent HTTP proxy
On-the-fly, lossy compression of specific MIME types (GIF, JPG...)
Cache both original & transformed
User specifies aggressiveness and “refinement” UI
  Parameters to HTML & image transformers

[Diagram labels: T, $, C]
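The TranSend behavior listed on this slide (lossy transformation, caching of both original and transformed content, and a per-user aggressiveness setting) can be sketched roughly as follows. This is a minimal illustration, not TranSend's actual code: `TransformingProxy`, `lossy_compress`, and the byte-dropping "compression" are hypothetical stand-ins chosen so the example runs without an image library.

```python
def lossy_compress(data: bytes, aggressiveness: int) -> bytes:
    """Hypothetical stand-in for an image distiller: keep every n-th byte."""
    step = max(1, aggressiveness)
    return data[::step]


class TransformingProxy:
    """Caches originals and transformed variants separately (assumed design)."""

    def __init__(self, fetch):
        self.fetch = fetch        # callable: url -> bytes (origin server access)
        self.originals = {}       # url -> original bytes
        self.transformed = {}     # (url, aggressiveness) -> transformed bytes
        self.origin_fetches = 0   # how often the origin was actually contacted

    def get(self, url: str, aggressiveness: int) -> bytes:
        # Fetch the original only once; later requests reuse the cached copy.
        if url not in self.originals:
            self.originals[url] = self.fetch(url)
            self.origin_fetches += 1
        # Each user setting yields its own cached, transformed variant.
        key = (url, aggressiveness)
        if key not in self.transformed:
            self.transformed[key] = lossy_compress(self.originals[url], aggressiveness)
        return self.transformed[key]


proxy = TransformingProxy(lambda url: bytes(range(100)))
small = proxy.get("http://example.test/a.gif", 4)
medium = proxy.get("http://example.test/a.gif", 2)
```

Caching the original alongside each variant means a user who changes the aggressiveness setting triggers a new transformation but not a new origin fetch.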

Page 6: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Top Gun Wingman

PalmPilot web browser
Intermediate-form page layout
Image scaling & transcoding
  Controlled by layout engine
Device-specific ADU marshalling
  Including client versioning
Originals and device-specific pages cached

[Diagram labels: C, $, A, ADU, T, html]

Page 7: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Application Partitioning

Client competence
  Styled text, images, widgets are fine
  Bitmaps unnecessary
Client responsiveness
  Scrolling, etc. shouldn’t require roundtrip to server
Client independence
  Very late conversion to client-specific format

Page 8: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

TACC Conceptual Data Flow

[Diagram labels: User request, FE, T, A, C, $, WWW, To Internet]

Front end accepts RPC-like user requests
User’s customization profile retrieved
Original data fetched from cache or Internet
Aggregation/transformation workers operate on data according to customization profile
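The four steps of this data flow can be sketched as a single request handler. This is only an illustration under stated assumptions: `handle_request`, the dictionary-based profile store and cache, and the `"pipeline"` profile key are hypothetical names, not part of the TACC API.

```python
def handle_request(user, url, profiles, cache, fetch_from_internet, workers):
    """Sketch of a front end serving one request (all names are assumptions)."""
    # 1. Retrieve the user's customization profile (empty if unknown).
    profile = profiles.get(user, {})
    # 2. Fetch the original data from the cache, or from the Internet on a miss.
    if url not in cache:
        cache[url] = fetch_from_internet(url)
    data = cache[url]
    # 3. Run the workers named in the profile, each parameterized by the profile.
    for name in profile.get("pipeline", []):
        data = workers[name](data, profile)
    return data


workers = {
    "upper": lambda data, profile: data.upper(),
    "truncate": lambda data, profile: data[: profile["max_len"]],
}
profiles = {"alice": {"pipeline": ["upper", "truncate"], "max_len": 5}}
cache = {}
fetch = lambda url: "hello world"

result = handle_request("alice", "http://example.test/", profiles, cache, fetch, workers)
plain = handle_request("bob", "http://example.test/", profiles, cache, fetch, workers)
```

Here the same cached original serves both users; only the profile decides which workers touch it.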

Page 9: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

TACC Model Summary

Mostly stateless, composable workers
Unifies previously ad hoc applications under one framework
Encourages re-use through modularization
  Composition enables both new services and new clients
TACC breakdown provides unified way to think about app structure

Page 10: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Services Should Be Easy To Write

Rapid prototyping
  Insulate workers from “mundane” details
Easy to incorporate existing/legacy code
  Few assumptions about code structure
  Must support variety of languages
  May be fragile
Composition to leverage existing code

Page 11: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Building a TACC Server

Challenge: Scalable Network Service (SNS) requirements
  Scalability to 100K’s of users with high availability
  Cost effective to deploy & administer
But, services should remain easy to write
  Server provides some bug robustness
  Server provides availability
  Server handles load balancing and scaling
  Preserve modularity (& componentwise upgradability) when deploying

Page 12: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Layered Model of Internet Services

TACC Layer
  Programming model based on composable building blocks
SNS Layer: “large virtual server”
  Implements SNS requirements
  Cluster computing for hardware F/T and incremental scaling

[Diagram labels: httpd, etc.; TACC; Scalable Network Svc]

Exploit TACC model semantics for software F/T
SNS layer is reusable and isolated from TACC
  Application “content” orthogonal to SNS mechanisms
  Key to making apps easy to write

Page 13: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Why Use a Cluster?

Incremental scalability, low cost components
High availability through hardware redundancy

Goals:
  Demonstrate that clusters and TACC fit well together
  Separate SNS from TACC

Page 14: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Cluster-Based TACC Server

Component replication for scaling and availability
High-bandwidth, low-latency interconnect
Incremental scaling: commodity PC’s

[Diagram labels: C, $, LB/FT, Interconnect, FE, WWW, T, A, GUI. Legend: Front Ends; Caches; User Profile Database; Workers; Load Balancing & Fault Tolerance; Administration Interface]

Page 15: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

“Starfish” Availability: LB Death

FE detects via broken pipe/timeout, restarts LB

[Diagram labels: C, $, Interconnect, FE, WWW, T, A, LB/FT]

Page 16: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

“Starfish” Availability: LB Death

FE detects via broken pipe/timeout, restarts LB

[Diagram labels: C, $, Interconnect, FE, WWW, T, A, LB/FT, LB/FT]

New LB announces itself (multicast), contacted by workers, gradually rebuilds load tables
If partition heals, extra LB’s commit suicide
FE’s operate using cached LB info during failure

Page 17: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

“Starfish” Availability: LB Death

FE detects via broken pipe/timeout, restarts LB

[Diagram labels: C, $, Interconnect, FE, WWW, T, A, LB/FT]

New LB announces itself (multicast), contacted by workers, gradually rebuilds load tables
If partition heals, extra LB’s commit suicide
FE’s operate using cached LB info during failure
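The recovery sequence on these slides can be sketched in-process, with no real sockets or multicast. Everything here is an assumption made for illustration: `LoadBalancer`, `FrontEnd`, the `TimeoutError` standing in for a broken pipe or timeout, and the `report` call standing in for workers contacting the new LB after its announcement.

```python
class LoadBalancer:
    """Hypothetical LB holding only soft state: worker -> queue length."""

    def __init__(self):
        self.alive = True
        self.load = {}

    def report(self, worker, queue_len):
        # Workers (re)register by reporting load; this rebuilds the tables.
        self.load[worker] = queue_len

    def hints(self):
        if not self.alive:
            raise TimeoutError("load balancer did not respond")
        return dict(self.load)


class FrontEnd:
    def __init__(self, lb, spawn_lb):
        self.lb = lb
        self.spawn_lb = spawn_lb   # callable that starts a fresh LB
        self.cached_hints = {}     # last load info seen, possibly stale
        self.restarts = 0

    def pick_worker(self):
        try:
            hints = self.lb.hints()
            if hints:              # keep stale info until the new LB has data
                self.cached_hints = hints
        except TimeoutError:
            # LB death detected: restart it and carry on with cached info.
            self.lb = self.spawn_lb()
            self.restarts += 1
        if not self.cached_hints:
            raise RuntimeError("no load information available yet")
        return min(self.cached_hints, key=self.cached_hints.get)


lb = LoadBalancer()
lb.report("w1", 3)
lb.report("w2", 1)
fe = FrontEnd(lb, LoadBalancer)
before = fe.pick_worker()      # least-loaded worker from fresh hints
lb.alive = False               # simulate LB death
during = fe.pick_worker()      # answered from cached hints; LB restarted
fe.lb.report("w1", 0)          # a worker contacts the new LB
after = fe.pick_worker()       # new LB's rebuilt table now drives the choice
```

Because the LB holds nothing that workers cannot re-report, restarting it needs no recovery protocol; the front end simply tolerates stale hints in the meantime.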

Page 18: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Fault Recovery Latency

[Plots, two panels; axis label: Task queue length]

Page 19: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Behavior in the Large

TranSend: 160 image transformations/sec = 10 Ultra-1 servers
  Peak seen during UCB traces on 700-modem bank: 15/sec
  Amortized hardware cost <$0.35/user/month (one $5K PC serving ~15,000 subscribers)
Wingman: factor of 6-8 worse
Administration: one undergraduate part-time
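The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative, and the slide gives no amortization period, so the cost line simply divides the purchase price by the subscriber count.

```python
# Throughput: 160 image transformations/sec spread over 10 servers.
total_rate = 160          # transformations per second (from the slide)
servers = 10              # Ultra-1 servers (from the slide)
per_server_rate = total_rate / servers

# Demand: peak observed on the 700-modem bank.
peak_rate = 15            # transformations per second (from the slide)

# Cost: one $5K PC shared by ~15,000 subscribers.
pc_cost = 5000            # dollars (from the slide)
subscribers = 15000       # approximate subscriber count (from the slide)
cost_per_subscriber = pc_cost / subscribers

print(f"per-server capacity: {per_server_rate:.0f}/sec vs. peak {peak_rate}/sec")
print(f"hardware cost per subscriber: ${cost_per_subscriber:.2f}")
```

One server's share of the measured throughput (16/sec) slightly exceeds the observed peak (15/sec), and $5,000 over 15,000 subscribers is about $0.33 each, which stays under the quoted $0.35 bound even if the whole price is charged to a single month.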

Page 20: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Building a Big System

Restartable, atomic workers
  Read-only data from other origin server(s)
Orthogonal separation of scalability/availability from application “content”
  Multiple lines of defense
  App modules agree to obey semantics compatible with these mechanisms
  Common-case failure behavior compatible with users’ Internet experience
  Enables reuse of whole workers, however diverse

Page 21: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Availability & Scalability Summary

Pervasive strategy: timeout, retry, restart
  Transient failures usually invisible to user
  Process peers watch each other
  Mostly stateless workers, xact support possible
Simplicity from exploiting soft state
  Piggyback status info on multicast beacons
  Use of stale LB info fine in practice
“Starfish” availability works in practice
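The soft-state idea summarized here (status piggybacked on periodic beacons, with entries that lapse when beacons stop) can be sketched as a small table with a time-to-live. `SoftStateTable`, its `ttl` parameter, and the explicit `now` arguments are assumptions made to keep the example deterministic; a real system would read a clock and receive beacons over multicast.

```python
class SoftStateTable:
    """Hypothetical table of peers, refreshed by beacons and expired by age."""

    def __init__(self, ttl):
        self.ttl = ttl        # seconds an entry survives without a refresh
        self.entries = {}     # peer name -> (status, time last heard)

    def on_beacon(self, name, status, now):
        # A beacon both announces liveness and carries piggybacked status.
        self.entries[name] = (status, now)

    def live(self, now):
        # Drop peers not heard from within the TTL; no explicit teardown needed.
        self.entries = {
            name: (status, heard)
            for name, (status, heard) in self.entries.items()
            if now - heard <= self.ttl
        }
        return {name: status for name, (status, _) in self.entries.items()}


table = SoftStateTable(ttl=3.0)
table.on_beacon("w1", {"queue": 2}, now=0.0)
table.on_beacon("w2", {"queue": 0}, now=2.0)
at_three = table.live(now=3.0)   # both peers still fresh
at_four = table.live(now=4.0)    # w1 has gone silent past its TTL
table.on_beacon("w1", {"queue": 1}, now=5.0)
at_five = table.live(now=5.0)    # w1 is back simply by beaconing again
```

A crashed peer disappears by silence and a restarted peer reappears by beaconing, so the same code path handles startup, failure, and recovery.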

Page 22: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Service Authoring

Keyword hiliting: < 1 day
Wingman: 2-3 weeks
Various apps from graduate seminar projects
  Safe worker upload
  Annotate the Web
  “Channel aggregators”

Page 23: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

New Services By Composition

Compose existing services to create a new one
  ~2.5 hours to implement
  Composes with TranSend or Wingman

[Diagram labels: TranSend, Metasearch, Internet]
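Composition of this kind can be sketched as chaining an aggregator with a transformer. The code below is an illustration only: `compose`, `make_metasearch`, the fake search engines, and the `shorten` transformer (a stand-in for a TranSend-like worker) are all hypothetical.

```python
def compose(*workers):
    """Chain workers so each one's output feeds the next."""
    def pipeline(data):
        for worker in workers:
            data = worker(data)
        return data
    return pipeline


def make_metasearch(engines):
    """Aggregator (many-to-one): merge hits from several engines, dropping repeats."""
    def metasearch(query):
        seen, merged = set(), []
        for engine in engines:
            for hit in engine(query):
                if hit not in seen:
                    seen.add(hit)
                    merged.append(hit)
        return merged
    return metasearch


def shorten(hits):
    """Transformer (one-to-one): trim each hit, standing in for lossy distillation."""
    return [hit[:20] for hit in hits]


# Two fake engines with overlapping results (placeholders for real search services).
engines = [
    lambda q: [q + " a", q + " b"],
    lambda q: [q + " b", q + " c"],
]
service = compose(make_metasearch(engines), shorten)
results = service("tacc")
```

Swapping `shorten` for a different transformer would retarget the same metasearch service at another client, which is the reuse the slide describes.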

Page 24: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Experience With Real Users

Transparent enhancements
Minimal downtime
Low administration cost
  Multicast-based administration GUI
Virtually no dedicated resources at UCB
  “Overflow pool” of ~100 UltraSPARC servers
Users don’t mind relying on middleware proxy

Page 25: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Why Now?

Internet’s critical mass
Commercial push for many device types (transistor curves)
Cluster computing economically viable
A good time for infrastructural services

Page 26: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Related Work

Transformational proxy services: WBI, Strands
Application partitioning: Wit, InfoPad, PARC Ubiquitous Computing
Computing in the infrastructure: Active Networks
Soft state for simplicity and robustness: Microsoft Tiger, multicast routing protocols

Page 27: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Summary of Contributions

TACC, a composition-based Internet services programming model
  captures rich variety of apps
  one view of customization
No-hassle deployment on a cluster
  Automatic and robust partial-failure handling
  Availability & scaling strategies work in practice
New apps are easy to write, deploy, debug
  SNS behaviors are free
  Compose existing services to enable new clients

Page 28: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Non-Contributions (a/k/a Future Work)

Accidental contributions:
  Legacy code glue
  Cheap test rig for next project (prototyping path discovery; a bare bones “cluster OS”)
Non-contributions:
  Fair resource allocation over cluster
  Built-in security abstractions
  Rich state management abstractions

Page 29: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

What We Really Learned

Design for failure
  It will fail anyway
  End-to-end argument applied to availability
Orthogonality is even better than layering
  Narrow interface vs. no interface
  A great way to manage system complexity
  The price of orthogonality
  Techniques: Refreshable soft state; watchdogs/timeouts; sandboxing

Page 30: TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned

Future Work

TACC as test rig for Ninja
Taxonomy of app structure and platforms
  What is the “big picture” of different types of Internet services, and where does TACC fit in?
  Joint work with Dr. Murray Mazer at the Open Group Research Institute
Apply TACC lessons to building reliable distributed systems
Formalize programming model

