A Plea for Greater Attention to Data-Intensive Discovery, Greater Investment in Intellectual and Software Infrastructure, and Greater Use of the Commercial Cloud
Remarks to the CSTB Colloquium on Future Cyberinfrastructure for Scientific Discovery
Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering and Founding Director of the eScience Institute
University of Washington
September 2016
http://lazowska.cs.washington.edu/Cyberinfrastructure.pptx,pdf
“It’s déjà vu all over again”
Thoughts on “The Future of Advanced Cyberinfrastructure for Science and Engineering Research and Education”
Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering
Founding Director, eScience Institute
University of Washington
National Science Board
September 2013
http://lazowska.cs.washington.edu/NSB.pdf
This morning …
❚ Why must America remain the world leader in computer science?
❚ How did we gain the lead, and how can we retain it?
❚ How should our competitiveness be defined?
❚ The coming decade: Dramatic improvements in technology and algorithms enable “smart everything”
❚ Cyberinfrastructure to support 21st century “smart discovery”
  ❙ Implications for academia
  ❙ Implications for research policy
  ❙ Implications for K-12 education
A Plea for Greater Attention to Data-Intensive Discovery, Greater Investment in Intellectual and Software Infrastructure, and Greater Use of the Commercial Cloud
Remarks to the CSTB Committee on Future Directions for NSF Advanced Computing Infrastructure to Support US Science in 2017-2020
Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering and Founding Director of the eScience Institute
University of Washington
December 2014
http://lazowska.cs.washington.edu/CSTB.pptx,pdf
This morning
• Data-intensive discovery
• The University of Washington eScience Institute
• Implications for academia
• Implications for research policy
• The commercial cloud
• Some possible actions
[Partially an adaptation of material presented to the National Science Board in October 2013]
“There you go again”
“It ain’t over ‘til it’s over”
Relevant biographical information
• A.B., 1972, Brown Univ., independent concentration in “Non-Numerical Computer Science”; M.Sc., 1974, Ph.D., 1977, Univ. of Toronto, in Computer Science
• Univ. of Washington faculty member since that time
• Relevant national roles
  – Chair of NSF CISE AC (1998-99), DARPA ISAT (2005-06)
  – Co-Chair (with Marc Benioff) of the (late) PITAC, 2003-05
  – Co-Chair (with David E. Shaw) of the PCAST Working Group to review the Federal NITRD Program, 2010
  – Member of CSTB (1996-2002), DoE Pacific Northwest National Laboratory Fundamental & Computational Sciences Directorate AC (2009-15), NASA AC Information Technology Infrastructure Committee (2012-13)
• Founding Director, Univ. of Washington eScience Institute, 2008
Exponential improvements in technology and algorithms are enabling a revolution in discovery
• A proliferation of sensors
• Ever more powerful models producing data that must be analyzed
• The creation of almost all information in digital form
• Dramatic cost reductions in storage
• Dramatic increases in network bandwidth
• Dramatic cost reductions and scalability improvements in computation
• Dramatic algorithmic breakthroughs in areas such as machine learning
Nearly every field of discovery is transitioning from “data poor” to “data rich”
Astronomy: LSST
Physics: LHC
Oceanography: OOI
Sociology: The Web
Biology: Sequencing
Economics: POS terminals
Neuroscience: EEG, fMRI
The Fourth Paradigm
1. Empirical + experimental
2. Theoretical
3. Computational
4. Data-Intensive
Jim Gray
Each augments, vs. supplants, its predecessors – “another arrow in the quiver”
“From data to knowledge to action”
• The ability to extract knowledge from large, heterogeneous, noisy datasets – to move “from data to knowledge to action” – lies at the heart of 21st century discovery
• To remain at the forefront, researchers in all fields will need access to state-of-the-art data science methodologies and tools
• These methodologies and tools will need to advance rapidly, driven by the requirements of discovery
• Data science is driven more by intellectual infrastructure (human capital) and software infrastructure (shared tools and services – digital capital) than by hardware
• Data science is inextricably linked to the commercial cloud: cost-effective scalable computing and storage for everyone
My personal story, and the story of the UW eScience Institute
Early 1980s
Late 1990s
Credit: John Delaney, University of Washington
2004
Mark Emmert
Ed Lazowska, Computer Science & Engineering
Tom Daniel, Biology
Werner Stuetzle, Statistics
“When I was at LSU I porked me a supercomputer center. I was thinking I’d do that here.”
UW eScience Institute
• “All across our campus, the process of discovery will increasingly rely on researchers’ ability to extract knowledge from vast amounts of data ... In order to remain at the forefront, UW must be a leader in advancing these techniques and technologies, and in making [them] accessible to researchers in the broadest imaginable range of fields.”
This was not as obvious ~2006 as it is today
• But we asked UW’s leading faculty – across all ages and fields, and regardless of “label” – and they confirmed this view of the future
  – From its inception, this effort has been bottom-up, needs-based, grass-roots, driven by the scientists
• There was vociferous national knuckle-dragging until several years after the 2010 PCAST report
• Low-level University of Washington knuckle-dragging continues to this day
• University of Washington
  – $720,000/year for staff support
  – $750,000/year for faculty support
• National Science Foundation
  – $2.8 million over 5 years for graduate program development and Ph.D. student funding (IGERT)
• Gordon and Betty Moore Foundation and Alfred P. Sloan Foundation
  – $37.8 million over 5 years to UW, Berkeley, NYU
• Washington Research Foundation
  – $9.3 million over 5 years for faculty recruiting packages, postdocs
    • Also $7.1 million to the closely-aligned Institute for Neuroengineering
Over-arching objective
Work with our Berkeley, NYU, and Foundation partners to carry out a distributed collaborative experiment in creating university environments in which data-intensive discovery flourishes
Original core faculty team
Data science methodology
Biological sciences
Environmental sciences
Social sciences
Physical sciences
Ed Lazowska, CSE
Cecilia Aragon, Human Centered Design & Engr.
Magda Balazinska, Computer Science & Engineering
Carlos Guestrin, CSE
Bill Howe, iSchool
Randy LeVeque, Applied Mathematics
Werner Stuetzle, Statistics
Tom Daniel, Biology
Ginger Armbrust, Oceanography
Andy Connolly, Astronomy
John Vidale, Earth & Space Sciences
Josh Blumenstock, iSchool
Mark Ellis, Geography
Tyler McCormick, Sociology, Statistics, CSSS
Thom. Richardson, Statistics, CSSS
Emily Fox, Statistics
Jeff Heer, CSE
Bill Noble, Genome Sciences
David Beck, Chemical Engr.
We’re at the dawn of a revolutionary new era of discovery and of learning
Implications for academia
Implications for research policy
• Computer Science is a field that is unique in its societal and institutional impact
• NSF has a unique role in driving advances in Computer Science
  – Computer Science does not have an NIH or a Department of Energy
  – NSF provides 82% of Federal support for basic research in Computer Science in academia
    • 53% of Federal support for all research in Computer Science in academia
• Other fields are becoming information fields, not just computational fields
  – The intellectual approaches of Computer Science are as important to advances as is cyberinfrastructure
  – New approaches will enable new discoveries
  – “First we do faster … then later we do different/smarter/better”
• Meeting evolving cyberinfrastructure needs requires research, not merely procurement
  – This is true for HPC … and for data-intensive discovery … and for cyber-enabled advances in education and assessment
• Meeting evolving cyberinfrastructure needs requires investment in intellectual as well as physical infrastructure
  – We have a crazy obsession with buying shiny objects – the bigger and more expensive, the better!
• Advancing data-intensive discovery requires broad-based programs that strive to create a “virtuous cycle” – and that drive institutional change
• Nationally and institutionally, there are various policies that distort behavior – and that should be changed
  – One example: Use of commercial cloud resources is discouraged by
    • Indirect cost on outsourced services (and not on equipment purchases) – This is totally nuts!
    • MRI viewed as a pot separate from Directorates/Divisions
    • Institutional subsidies (power, cooling, space)
• We’re investing 9:1 in hardware over software¹ – it ought to be the reverse!
¹ According to Ed Seidel when he was at NSF
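The indirect-cost distortion named on this slide can be made concrete with a little arithmetic. The following is a minimal sketch, not a description of any institution's actual policy: the overhead rate and the dollar amount are hypothetical placeholders, and `charge_to_grant` is an illustrative helper name. It assumes overhead is charged on outsourced services but not on equipment purchases, as the slide states.

```python
# Illustrative only: the overhead rate and dollar amounts are hypothetical,
# chosen to show the shape of the distortion, not taken from the talk.
HYPOTHETICAL_INDIRECT_RATE = 0.50  # assumed institutional overhead rate


def charge_to_grant(direct_cost: float, indirect_applies: bool,
                    indirect_rate: float = HYPOTHETICAL_INDIRECT_RATE) -> float:
    """Total billed to a grant: direct cost plus overhead when it applies."""
    overhead = direct_cost * indirect_rate if indirect_applies else 0.0
    return direct_cost + overhead


direct_spend = 100_000.0  # same direct spend in both cases (assumed)

# Outsourced cloud services attract overhead; purchased equipment does not.
cloud = charge_to_grant(direct_spend, indirect_applies=True)
equipment = charge_to_grant(direct_spend, indirect_applies=False)

print(f"Cloud services: ${cloud:,.0f}")
print(f"Equipment:      ${equipment:,.0f}")
print(f"Penalty for choosing the cloud: ${cloud - equipment:,.0f}")
```

Under these assumed numbers the same direct spend costs the grant half again as much when it goes to the cloud, which is the incentive the slide objects to.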
• In 1984, through the establishment of the Office of Scientific Computing and the launch of the Supercomputer Centers Program, NSF leadership catalyzed the widespread adoption of numerical computational science
  – Although the focus was far too great on hardware, far too small on software and on computer science research
• NSF should be exercising the same sort of leadership and catalysis for data-intensive discovery, but it is largely whiffing
  – IMPORTANT: This is not a suggestion that “national centers” are called for!
The commercial cloud
We have a dogged resistance to utilizing commercial software, services, and systems
• We purchase our own
• We operate our own
• We roll our own
• Often with amateurs
• Why?
  – Outmoded policies
  – Subsidies
  – Defense of turf
  – Politics
  – People whose paychecks depend on convincing you that your needs are so special that no commercial offering could possibly be suitable
  – Failure to do hard-nosed cost-benefit analyses
Can a commercial RDBMS host large-scale science data?
Some Amazon customers
What’s so special about our requirements, compared to theirs, that causes us to doggedly adhere to the old world?
Key attributes of the commercial cloud
1. Essentially infinite capacity
2. You pay for exactly what you use: Instantaneous expansion and contraction
3. Zero capital cost: The user avoids investment in infrastructure that’s redundant, under-utilized, and has a short lifetime
4. Burst capacity: 1,000 processors for 1 day costs the same (or less) as 1 processor for 1,000 days – totally revolutionary!
5. 7x24x365 operations support
6. Reliability: Auxiliary power, redundant network connections, geographical diversity
7. For many services, someone else handles backup, someone else handles software updates
8. Sharing and collaboration are easy
9. This enhances reproducibility – investigators use the same tools and data … exactly the same computational environment
10. It continuously gets bigger, faster, less expensive
11. Capabilities evolve at a rapid pace
AWS Stack. Credit: Jamie Kinney, Amazon
12. Configuration choices evolve at a rapid pace
AWS Instance Type History. Credit: Jamie Kinney, Amazon
13. Purchase models evolve at a rapid pace
AWS Purchase Models. Credit: Jamie Kinney, Amazon
14. Competition is growing at a rapid pace
  – Including competition for academic and commercial science workloads and datasets!
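The burst-capacity attribute (item 4 in the list above) is simple arithmetic under pay-per-use pricing. The sketch below assumes purely linear pricing with no minimums or volume discounts; the hourly price is a made-up placeholder, not a quote from any provider, and `rental_cost` is an illustrative helper name.

```python
# Illustrative only: the hourly price is a hypothetical placeholder,
# not a quote from any cloud provider.
HYPOTHETICAL_PRICE_PER_PROCESSOR_HOUR = 0.05  # assumed, in dollars


def rental_cost(processors: int, days: int,
                price_per_processor_hour: float = HYPOTHETICAL_PRICE_PER_PROCESSOR_HOUR) -> float:
    """Cost under purely linear pay-per-use pricing: processor-hours x price."""
    processor_hours = processors * days * 24
    return processor_hours * price_per_processor_hour


burst = rental_cost(processors=1000, days=1)   # wide and short
steady = rental_cost(processors=1, days=1000)  # narrow and long

print(f"1,000 processors for 1 day: ${burst:,.2f}")
print(f"1 processor for 1,000 days: ${steady:,.2f}")
print(f"Bills are equal: {burst == steady}")
```

Both runs consume the same number of processor-hours, so the bill is identical. For a workload that parallelizes well, the burst run delivers its answer in a day rather than in nearly three years.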
Credit: Werner Vogels, Amazon
Some possible actions
• Eliminate overhead on outsourced cloud services
  – The University of Washington has unilaterally done this!
• Attribute MRIs to Directorates/Divisions
• Take steps to encourage and evolve data-intensive discovery that are at least as aggressive as the steps taken decades ago to encourage numerical computational science
• Establish the use of commercial cloud services as the strong default for science at all scales. Every request to purchase computing equipment that won’t fit on a desktop should be rigorously justified. Invest in intellectual infrastructure, software infrastructure, and outsourced services, not big shiny objects!
• Do not allow a group without a rock-solid track record to be responsible for the creation of complex mission-critical software infrastructure (e.g., for MREFCs)
• Major national facilities – to the extent that these are necessary at all – should be used only by applications that truly require them
• Take additional steps to encourage reproducible research and the useful/usable sharing of code and data
• Recognize that data has both value and cost. How should the costs be covered?
“It ain’t over ‘til it’s over”
“Three strikes and you’re out”
Thanks for inviting me!
http://lazowska.cs.washington.edu/Cyberinfrastructure.pptx,pdf