A Plea for Greater Attention to Data-Intensive Discovery...

Date post: 17-Jun-2020
A Plea for Greater Attention to Data-Intensive Discovery, Greater Investment in Intellectual and Software Infrastructure, and Greater Use of the Commercial Cloud Remarks to the CSTB Colloquium on Future Cyberinfrastructure for Scientific Discovery Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering and Founding Director of the eScience Institute University of Washington September 2016 http://lazowska.cs.washington.edu/Cyberinfrastructure.pptx,pdf
Transcript
Page 1: A Plea for Greater Attention to Data-Intensive Discovery ...lazowska.cs.washington.edu/Cyberinfrastructure.pdf · • National Science Foundation – $2.8 million over 5 years for

A Plea for Greater Attention to Data-Intensive Discovery, Greater Investment in Intellectual and Software Infrastructure, and Greater Use of the Commercial Cloud

Remarks to the CSTB Colloquium on Future Cyberinfrastructure for Scientific Discovery

Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering and Founding Director of the eScience Institute
University of Washington

September 2016
http://lazowska.cs.washington.edu/Cyberinfrastructure.pptx,pdf

Page 2

“It’s déjà vu all over again”

Page 3

Thoughts on “The Future of Advanced Cyberinfrastructure for Science and Engineering Research and Education”

Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering
Founding Director, eScience Institute
University of Washington

National Science Board

September 2013

http://lazowska.cs.washington.edu/NSB.pdf

Page 4

This morning …

❚ Why must America remain the world leader in computer science?
❚ How did we gain the lead, and how can we retain it?
❚ How should our competitiveness be defined?
❚ The coming decade: Dramatic improvements in technology and algorithms enable “smart everything”
❚ Cyberinfrastructure to support 21st century “smart discovery”
  ❙ Implications for academia
  ❙ Implications for research policy
  ❙ Implications for K-12 education

Page 5

A Plea for Greater Attention to Data-Intensive Discovery, Greater Investment in Intellectual and Software Infrastructure, and Greater Use of the Commercial Cloud

Remarks to the CSTB Committee on Future Directions for NSF Advanced Computing Infrastructure to Support US Science in 2017-2020

Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering and Founding Director of the eScience Institute
University of Washington

December 2014
http://lazowska.cs.washington.edu/CSTB.pptx,pdf

Page 6

This morning

• Data-intensive discovery
• The University of Washington eScience Institute
• Implications for academia
• Implications for research policy
• The commercial cloud
• Some possible actions

[Partially an adaptation of material presented to the National Science Board in October 2013]

Page 7

“There you go again”

Page 8

“It ain’t over ’til it’s over”

Page 9

Relevant biographical information

• A.B., 1972, Brown Univ., independent concentration in “Non-Numerical Computer Science”; M.Sc., 1974, Ph.D., 1977, Univ. of Toronto, in Computer Science
• Univ. of Washington faculty member since that time
• Relevant national roles
  – Chair of NSF CISE AC (1998-99), DARPA ISAT (2005-06)
  – Co-Chair (with Marc Benioff) of the (late) PITAC, 2003-05
  – Co-Chair (with David E. Shaw) of the PCAST Working Group to review the Federal NITRD Program, 2010
  – Member of CSTB (1996-2002), DoE Pacific Northwest National Laboratory Fundamental & Computational Sciences Directorate AC (2009-15), NASA AC Information Technology Infrastructure Committee (2012-13)
• Founding Director, Univ. of Washington eScience Institute, 2008

Page 10

This morning

• Data-intensive discovery
• The University of Washington eScience Institute
• Implications for academia
• Implications for research policy
• The commercial cloud
• Some possible actions

Page 11

Exponential improvements in technology and algorithms are enabling a revolution in discovery

• A proliferation of sensors
• Ever more powerful models producing data that must be analyzed
• The creation of almost all information in digital form
• Dramatic cost reductions in storage
• Dramatic increases in network bandwidth
• Dramatic cost reductions and scalability improvements in computation
• Dramatic algorithmic breakthroughs in areas such as machine learning

Page 12

Nearly every field of discovery is transitioning from “data poor” to “data rich”

Astronomy: LSST
Physics: LHC
Oceanography: OOI
Sociology: The Web
Biology: Sequencing
Economics: POS terminals
Neuroscience: EEG, fMRI

Page 13

The Fourth Paradigm

1. Empirical + experimental
2. Theoretical
3. Computational
4. Data-Intensive

Jim Gray

Each augments, vs. supplants, its predecessors – “another arrow in the quiver”

Page 14

“From data to knowledge to action”

• The ability to extract knowledge from large, heterogeneous, noisy datasets – to move “from data to knowledge to action” – lies at the heart of 21st century discovery
• To remain at the forefront, researchers in all fields will need access to state-of-the-art data science methodologies and tools
• These methodologies and tools will need to advance rapidly, driven by the requirements of discovery
• Data science is driven more by intellectual infrastructure (human capital) and software infrastructure (shared tools and services – digital capital) than by hardware
• Data science is inextricably linked to the commercial cloud: cost-effective scalable computing and storage for everyone

Page 15

My personal story, and the story of the UW eScience Institute

Early 1980s

Late 1990s

Page 16

Credit: John Delaney, University of Washington

Page 17

2004
Mark Emmert

Ed Lazowska, Computer Science & Engineering
Tom Daniel, Biology
Werner Stuetzle, Statistics

“When I was at LSU I porked me a supercomputer center. I was thinking I’d do that here.”

Page 18

UW eScience Institute

• “All across our campus, the process of discovery will increasingly rely on researchers’ ability to extract knowledge from vast amounts of data ... In order to remain at the forefront, UW must be a leader in advancing these techniques and technologies, and in making [them] accessible to researchers in the broadest imaginable range of fields.”

Page 19

This was not as obvious ~2006 as it is today

• But we asked UW’s leading faculty – across all ages and fields, and regardless of “label” – and they confirmed this view of the future
  – From its inception, this effort has been bottom-up, needs-based, grass-roots, driven by the scientists
• There was vociferous national knuckle-dragging until several years after the 2010 PCAST report
• Low-level University of Washington knuckle-dragging continues to this day

Page 20

• University of Washington
  – $720,000/year for staff support
  – $750,000/year for faculty support
• National Science Foundation
  – $2.8 million over 5 years for graduate program development and Ph.D. student funding (IGERT)
• Gordon and Betty Moore Foundation and Alfred P. Sloan Foundation
  – $37.8 million over 5 years to UW, Berkeley, NYU
• Washington Research Foundation
  – $9.3 million over 5 years for faculty recruiting packages, postdocs
    • Also $7.1 million to the closely-aligned Institute for Neuroengineering

Page 21

Over-arching objective

Work with our Berkeley, NYU, and Foundation partners to carry out a distributed collaborative experiment in creating university environments in which data-intensive discovery flourishes

Page 22

Original core faculty team

Ed Lazowska, CSE

Data science methodology
Biological sciences
Environmental sciences
Social sciences
Physical sciences

Cecilia Aragon, Human Centered Design & Engr.
Magda Balazinska, Computer Science & Engineering
Carlos Guestrin, CSE
Bill Howe, iSchool
Randy LeVeque, Applied Mathematics
Werner Stuetzle, Statistics
Tom Daniel, Biology
Ginger Armbrust, Oceanography
Andy Connolly, Astronomy
John Vidale, Earth & Space Sciences
Josh Blumenstock, iSchool
Mark Ellis, Geography
Tyler McCormick, Sociology, Statistics, CSSS
Thom. Richardson, Statistics, CSSS
Emily Fox, Statistics
Jeff Heer, CSE
Bill Noble, Genome Sciences
David Beck, Chemical Engr.


Page 24

We’re at the dawn of a revolutionary new era of discovery and of learning

Page 25

Implications for academia

• Computer Science is a field that is unique in its societal and institutional impact

Page 26

Implications for research policy

• NSF has a unique role in driving advances in Computer Science
  – Computer Science does not have an NIH or a Department of Energy
  – NSF provides 82% of Federal support for basic research in Computer Science in academia
    • 53% of Federal support for all research in Computer Science in academia
• Other fields are becoming information fields, not just computational fields
  – The intellectual approaches of Computer Science are as important to advances as is cyberinfrastructure
  – New approaches will enable new discoveries
  – “First we do faster … then later we do different/smarter/better”

Page 27

• Meeting evolving cyberinfrastructure needs requires research, not merely procurement
  – This is true for HPC … and for data-intensive discovery … and for cyber-enabled advances in education and assessment
• Meeting evolving cyberinfrastructure needs requires investment in intellectual as well as physical infrastructure
  – We have a crazy obsession with buying shiny objects – the bigger and more expensive, the better!

Page 28

• Advancing data-intensive discovery requires broad-based programs that strive to create a “virtuous cycle” – and that drive institutional change

Page 29

• Nationally and institutionally, there are various policies that distort behavior – and that should be changed
  – One example: Use of commercial cloud resources is discouraged by
    • Indirect cost on outsourced services (and not on equipment purchases)
      – This is totally nuts!
    • MRI viewed as a pot separate from Directorates/Divisions
    • Institutional subsidies (power, cooling, space)
• We’re investing 9:1 in hardware over software [1] – it ought to be the reverse!

[1] According to Ed Seidel when he was at NSF

Page 30

• In 1984, through the establishment of the Office of Scientific Computing and the launch of the Supercomputer Centers Program, NSF leadership catalyzed the widespread adoption of numerical computational science
  – Although the focus was far too great on hardware, far too small on software and on computer science research
• NSF should be exercising the same sort of leadership and catalysis for data-intensive discovery, but it is largely whiffing
  – IMPORTANT: This is not a suggestion that “national centers” are called for!

Page 31

The commercial cloud

Page 32

We have a dogged resistance to utilizing commercial software, services, and systems

• We purchase our own
• We operate our own
• We roll our own
• Often with amateurs
• Why?
  – Outmoded policies
  – Subsidies
  – Defense of turf
  – Politics
  – People whose paychecks depend on convincing you that your needs are so special that no commercial offering could possibly be suitable
  – Failure to do hard-nosed cost-benefit analyses

Can a commercial RDBMS host large-scale science data?

Page 33

Some Amazon customers

What’s so special about our requirements, compared to theirs, that causes us to doggedly adhere to the old world?

Page 34

Key attributes of the commercial cloud

1. Essentially infinite capacity
2. You pay for exactly what you use: Instantaneous expansion and contraction
3. Zero capital cost: The user avoids investment in infrastructure that’s redundant, under-utilized, and has a short lifetime
4. Burst capacity: 1,000 processors for 1 day costs the same (or less) as 1 processor for 1,000 days – totally revolutionary!
5. 7x24x365 operations support
6. Reliability: Auxiliary power, redundant network connections, geographical diversity
7. For many services, someone else handles backup, someone else handles software updates

Page 35

8. Sharing and collaboration are easy
9. This enhances reproducibility – investigators use the same tools and data … exactly the same computational environment
10. It continuously gets bigger, faster, less expensive
11. Capabilities evolve at a rapid pace

AWS Stack
Credit: Jamie Kinney, Amazon

Page 36

12. Configuration choices evolve at a rapid pace

AWS Instance Type History
Credit: Jamie Kinney, Amazon

Page 37

13. Purchase models evolve at a rapid pace

AWS Purchase Models
Credit: Jamie Kinney, Amazon

Page 38

14. Competition is growing at a rapid pace
  – Including competition for academic and commercial science workloads and datasets!
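
The burst-capacity attribute (item 4 in the list of key attributes) comes down to simple arithmetic, sketched below under a loudly stated assumption: purely linear pay-per-use pricing, meaning a flat price per processor-hour with no volume discounts, minimum commitments, or data-transfer charges. The function names and the 0.10 rate are hypothetical and are not taken from any provider's price list. The sketch only illustrates that cost tracks total processor-hours, while time-to-result tracks how many processors run at once.

```python
HOURS_PER_DAY = 24


def processor_hours(processors: int, days: int) -> int:
    """Total processor-hours consumed by a job."""
    return processors * days * HOURS_PER_DAY


def job_cost(processors: int, days: int, rate_per_processor_hour: float) -> float:
    """Cost under the assumed flat, linear pay-per-use pricing."""
    return processor_hours(processors, days) * rate_per_processor_hour


# Hypothetical rate per processor-hour, for illustration only.
RATE = 0.10

# Same total work, two very different wall-clock times.
burst = job_cost(processors=1000, days=1, rate_per_processor_hour=RATE)
steady = job_cost(processors=1, days=1000, rate_per_processor_hour=RATE)

print(f"1,000 processors x 1 day:  {processor_hours(1000, 1)} processor-hours, cost {burst:.2f}")
print(f"1 processor x 1,000 days: {processor_hours(1, 1000)} processor-hours, cost {steady:.2f}")
print("Same cost:", burst == steady)
```

Under these assumptions both jobs consume the same number of processor-hours and therefore cost the same, but the burst job finishes in one day rather than in nearly three years. The "or less" in item 4 is not modeled here.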

Page 39

Credit: Werner Vogels, Amazon

Page 40

Some possible actions

• Eliminate overhead on outsourced cloud services
  – The University of Washington has unilaterally done this!
• Attribute MRIs to Directorates/Divisions
• Take steps to encourage and evolve data-intensive discovery that are at least as aggressive as the steps taken decades ago to encourage numerical computational science
• Establish the use of commercial cloud services as the strong default for science at all scales. Every request to purchase computing equipment that won’t fit on a desktop should be rigorously justified. Invest in intellectual infrastructure, software infrastructure, and outsourced services, not big shiny objects!

Page 41

• Do not allow a group without a rock-solid track record to be responsible for the creation of complex mission-critical software infrastructure (e.g., for MREFCs)
• Major national facilities – to the extent that these are necessary at all – should be used only by applications that truly require them
• Take additional steps to encourage reproducible research and the useful/usable sharing of code and data
• Recognize that data has both value and cost. How should the costs be covered?

Page 42

“It ain’t over ’til it’s over”
“Three strikes and you’re out”

Page 43

Thanks for inviting me!

http://lazowska.cs.washington.edu/Cyberinfrastructure.pptx,pdf

