Data-Intensive Computing at NSF Corporate Alliance June 18, 2008 Jeannette M. Wing Assistant Director Computer and Information Science and Engineering Directorate Thanks to the NSF team: Dan Atkins, Debbie Crawford, Haym Hirsh, Jim French, Stephen Meacham, …
Transcript

Data-Intensive Computing at NSF

Corporate Alliance
June 18, 2008

Jeannette M. Wing
Assistant Director
Computer and Information Science and Engineering Directorate

Thanks to the NSF team: Dan Atkins, Debbie Crawford, Haym Hirsh, Jim French, Stephen Meacham, …

2Data-Intensive Computing Jeannette M. Wing

Science Story

How Much Data?

• NOAA has ~1 PB climate data (2007)
• Wayback machine has ~2 PB (2006)
• CERN’s LHC will generate 15 PB a year (2008)
• HP building WalMart a 4PB data warehouse (2007)
• Google processes 20 PB a day (2008)
• “all words ever spoken by human beings” ~ 5 EB
• Int’l Data Corp predicts 1.8 ZB of digital data by 2011

640K ought to be enough for anybody.
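To put two of these figures on a common scale, the per-day and per-year volumes can be turned into sustained rates. This is a rough sketch, assuming decimal units (1 PB = 10^15 bytes), a 365-day year, and perfectly even data flow; none of those assumptions come from the slide, and the names are illustrative only.

```python
# Assumption: decimal prefixes, 1 PB = 10**15 bytes.
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year


def rate_bytes_per_second(total_bytes, seconds):
    """Average rate needed to move total_bytes in the given time."""
    return total_bytes / seconds


# "Google processes 20 PB a day"
google_rate = rate_bytes_per_second(20 * PB, SECONDS_PER_DAY)
# "CERN's LHC will generate 15 PB a year"
lhc_rate = rate_bytes_per_second(15 * PB, SECONDS_PER_YEAR)

print(f"Google: about {google_rate / 10**9:.0f} GB/s sustained")
print(f"LHC:    about {lhc_rate / 10**6:.0f} MB/s sustained")
```

Under these assumptions the two figures come out to roughly 231 GB/s and 476 MB/s, which is far beyond what a single machine can stream from disk and is one way to see why the work has to be spread across many machines.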

Convergence in Trends

• Drowning in data

• Data-driven approach in computer science research
  – graphics, animation, language translation, search, …, computational biology

• Cheap storage
  – Seagate Barracuda 1TB hard drive for $195

• Growth in huge data centers

• Open Source “MapReduce” programming model

Divide and Conquer

[Diagram: a Master partitions the “Work” into pieces w1, w2, w3, hands one piece to each “worker”, and combines the partial results r1, r2, r3 into the “Result”.]
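The partition/worker/combine picture can be sketched in a few lines of Python. This is a minimal single-machine illustration, not how any production system is built: the word-count task, the function names, and the use of a thread pool to stand in for separate worker machines are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(work, n):
    """Split the work into at most n roughly equal chunks (w1, w2, ...)."""
    size = max(1, -(-len(work) // n))  # ceiling division, at least 1
    return [work[i:i + size] for i in range(0, len(work), size)]


def worker(chunk):
    """Count words in one chunk and return a partial result (r1, r2, ...)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def combine(partials):
    """Merge the partial results into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def master(work, n_workers=3):
    """Partition the work, run the workers concurrently, combine the results."""
    chunks = partition(work, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))
    return combine(partials)


lines = ["a b a", "b c", "a c c", "d"]
result = master(lines)
print(result)
```

The workers never talk to each other; only the master sees all the pieces. That independence is what lets the same shape scale from threads on one machine to many machines in a data center.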

Data-Intensive Computing: Sample Research Questions

Science
– What are the fundamental capabilities and limitations of this paradigm?
– What new programming abstractions (including models, languages, algorithms) can accentuate these fundamental capabilities?
– What are meaningful metrics of performance and QoS?

Engineering
– How can we automatically manage the hardware and software of these systems at scale?
– How can we provide security and privacy for simultaneous mutually untrusted users, for both processing and data?
– How can we reduce these systems’ power consumption?

Users
– What (new) applications can best exploit this computing paradigm?

NSF’s Interest in Data-Intensive Computing

• Broad interest, (potentially) long-term

• CISE
  – Cross-directorate: CCF, CNS, IIS
  – Short-term: CluE
    • To provide the broad academic community access to large-scale computing cluster and massive data sets
  – Longer-term: Look for cross-cutting theme in FY09 solicitation

• NSF
  – Potentially cross-foundational, e.g., via Cyber-enabled Discovery and Innovation (CDI); CISE, OCI, MPS, ENG, …
  – Why? Scientists are drowning in data!

CluE: Cluster Exploratory

• Google+IBM cluster software and services
  – Same as Academic Computing Cluster provided for six universities (announced last October)

• Seed program by NSF
  – $5M will fund SGERs and regular awards
  – Solicitation released; July 17 proposal deadline.
  – Jim French (IIS Program Director)

• Hope: CluE will be a wild success and community interest and demand will be high

Google+IBM Cluster

• Cluster
  – 1600+ processors, terabytes of memory, hundreds of terabytes of storage, internal networking
  – External network connection

• Software
  – Linux
  – Hadoop (written by Yahoo!): Open Source version of Google’s MapReduce, Google File System
  – IBM Tivoli: management, monitoring and dynamic resource provisioning of the cluster

• Services
  – Operations and maintenance, including staff, loading data and programs, energy costs
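For readers unfamiliar with the MapReduce model named in the software list, its three phases (map, shuffle, reduce) can be illustrated in plain Python. This is a toy in-memory sketch and does not use or imitate Hadoop's actual API; every function name here is hypothetical, and a real deployment would distribute each phase across the cluster and read its input from a distributed file system.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to that key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


# User-supplied functions for a word count (illustrative only).
def word_mapper(line):
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    return sum(values)


docs = ["Drowning in data", "data centers", "cheap storage for data"]
counts = reduce_phase(shuffle(map_phase(docs, word_mapper)), sum_reducer)
print(counts)
```

The programmer writes only the mapper and the reducer. Partitioning the input, grouping by key, and scheduling the work are the framework's job.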

Legal Issues

The Partnership: Roles

• Google and IBM
  – Provide data cluster, user support, scheduling

• NSF
  – Review proposals, identify awardees, funding

• Universities
  – Propose and execute research plans on data cluster

The MOU

• Codify the roles
• Establish restrictions to comply with export law
• Prescribe the need for “usage agreement”
  – Remove NSF from this industry/university process and raise awareness of university sensitivities

The Usage Agreement

• Sets out terms and conditions for use of the hardware/software suite

• Three significant issues
  – Indemnification
    • State universities prevented by constitution or law from signing
    • Private universities will not sign as a matter of policy
  – Export control
    • Barrier to university mission. May prohibit access by some students.
  – Intellectual Property
    • Jury is out on this. Part of 1 on 1 negotiation.

Indemnification Example

• University and Corporation each agree to defend, indemnify and hold harmless the other respective parties for and against any losses damages or claims for damages arising from the wrongful acts or omissions of their respective officers, employees, students or agents (including, without limitation, University Students and University Personnel) in connection with the exercise of their rights and the performance of their obligations under this Agreement, including but not limited to …

Asymmetric: We agree not to sue each other, but University pays the cost of defending Corporation should it be sued based on something a University person did.

Export Control Example

• Specifically, unless authorized by appropriate government license or regulations, you agree not to export, directly or indirectly, any technology, software or commodities provided by Corporation or their direct product (including software developed by you on the Corporate systems) to any of the following countries or to the nationals of any of the following countries, wherever they may be located: Cuba, Iran, Sudan, Syria, and North Korea.

The explicit country list discriminates against students from those countries who may be enrolled in University.

Academia-Industry-Government Partnership

• Win-win-win for all

• New model for NSF
  – CISE is breaking new ground at NSF (in many ways)

• NSF/CISE welcomes
  – Other corporations to participate in Data-Intensive Computing effort and other efforts in the future
  – This and other new models of A-I-G partnerships

Thank you!

Credits

• Copyrighted material used under Fair Use. If you are the copyright holder and believe your material has been used unfairly, or if you have any suggestions, feedback, or support, please contact: [email protected]

• Except where otherwise indicated, permission is granted to copy, distribute, and/or modify all images in this document under the terms of the GNU Free Documentation license, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation license” (http://commons.wikimedia.org/wiki/Commons:GNU_Free_Documentation_License)
