The NSF Cyberinfrastructure for the 21st Century Program
CIF21
Rob Pennington
Program Director
Office of Cyberinfrastructure
National Science Foundation
The Shift Towards a “Sea of Data”
Implications
All science is becoming data-dominated: experiment, computation, theory; fourth paradigm
Classes of data: collections, observations, experiments, simulations; software; publications
Totally new methodologies: algorithms, mathematics, culture
Data become the medium for science: multidisciplinarity, communication, publication…
Fundamental questions become focused around data: How to remove boundaries? How to incentivize sharing?
How do we attribute credit for this new publication form? How are data peer reviewed? What is a publication in the modern data-rich world?
Scientific Data Challenges
[Figure: data volume in bytes per day (GigaBytes, TeraBytes, PetaBytes, ExaBytes), 2012 and 2020. Sources shown: Genomics; LHC; Climate, Environment; TeraGrid, BlueWaters; LSST; Square Kilometer Array; Many smaller datasets…; DataNet. Other labels: Volume; Useful Lifetime; Distribution; Data Access.]
[Figure labels: Software; Analytic Tools; Compute, Modeling; Communities; Expertise, research; Networks]
Sea of Data
CIF21
Science, innovation, discovery, economic competitiveness
Grand Challenges
EarthCube, Understanding the Phenome, Clean Energy, Climate prediction, Social networking, Complex networks, Health records, Cybersecurity, Matter-by-design, Disaster recovery, etc.
Multi-disciplinary & multi-scale integration
CIF21 and Transforming Research
Discovery
Collaboration
Education
NSF CIF21 Major Areas
Organizations: universities, schools; government labs, agencies; research and medical centers; libraries, museums; virtual organizations; communities
Expertise: research and scholarship; education; learning and workforce development; interoperability and operations; cyberscience
Networking: campus, national, international networks; research and experimental networks; end-to-end throughput; cybersecurity
Computational Resources: supercomputers; clouds, grids, clusters; visualization; compute services; data centers
Data: databases, data repositories; collections and libraries; data access; storage, navigation, management, mining tools, curation, privacy
Scientific Instruments: large facilities, MREFCs, telescopes; colliders, shake tables; sensor arrays (ocean, environment, weather, buildings, climate, etc.)
Software: applications, middleware; software development and support; cybersecurity (access, authorization, authentication)
Advanced Computational Infrastructure
Data Infrastructure Program
Broad Principles to Lead CIF21
Builds national infrastructure for S&E
Leverages common methods, approaches, and applications – focus on interoperability
Catalyzes other CI investments across NSF
Provides focus and is a vehicle for coordinating efforts and programs
Based upon a shared governance model involving all parts of NSF
Managed as a coherent program by OCI
Spiral development methodology
Evolution of CIF21 and NSF Data Programs
[Diagram labels: ACCI Task Force; NSB; DataNet Awards; Community Input; NSF CIF21 Data Programs; On-going input; Science & Engineering Research + Cyberinfrastructure]
Data Related Context
National Science and Technology Council (NSTC)
http://www.whitehouse.gov/blog/2012/01/30/your-comments-access-federally-funded-scientific-research-results
Networking and Information Technology Research and Development (NITRD)
http://www.nitrd.gov/subcommittee/bigdata.aspx
National Science Board Data Policies Task Force
http://www.nsf.gov/nsb/committees/tskforce_dp.jsp
Advisory Committee for Cyberinfrastructure (ACCI)
www.nsf.gov/od/oci/taskforces/
NSTC RFIs for Public Comment - Context
Two Requests for Information (RFIs) – Nov 2011
Public Access to Digital Data Resulting from Federally Funded Scientific Research
• Preservation, Discovery and Access
• Standards for Interoperability, Re-Use and Re-Purposing
RFI for Scholarly Publications
http://www.whitehouse.gov/blog/2011/11/07/request-information-public-access-digital-data-and-scientific-publications
Comment period closed on 12 Jan 2012
Digital Data: 118 responses
Scholarly Publications: 377 responses
Individual and institutional responses
NSB Data Policy Task Force - Context
Dec 2011: NSB 11-79 Recommendations http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf
#1: Provide leadership … in the development and implementation of digital research data policies ...
#2: … require grantees to make both the data and the methods and techniques used in the creation and analysis of the data accessible … Data should be shared using persistent electronic identifiers …
#3: Continue to expand the support of computational and data-enabled science and engineering …
#4: Convene a panel … to explore and develop a range of viable long-term business models …
#5: Further the expansion of sustainable data management, including preservation and curation of pre-existing and newly generated long-lived data …
NSF Advisory Committee for Cyberinfrastructure (ACCI) Task Force - Context
Grand Challenges, HPC, Data/Viz, Software, Campus Bridging, Cyberlearning
More than 25 workshops and Birds of a Feather sessions and more than 1300 people involved
Final reports: http://www.nsf.gov/od/oci/taskforces/
ACCI Data Task Force Recommendations
Recognize data infrastructure and services as essential research assets fundamental to today’s science and as long-term investments in national prosperity
Create new citation models in which data and software tool providers are credited with their data contributions
Develop and publish realistic cost models to underpin institutional/national business plans for research repositories/data services
Identify and share best-practices for the critical areas of data management
CIF21 and Data Enabled Science
Provide critical tools and services for data mining, integration, analysis, modeling and visualization.
Overcome barriers to scaling, synthesis, and interoperability to promote effective use of large scale, shared data resources.
Strategic investments that concentrate tools, resources and expertise in support of compelling grand challenge science questions.
Data Infrastructure: A Multi-tiered and Multi-Disciplinary Landscape
Observational Communities
Modeling and Simulation Communities
Population, Climate, Environment Communities
Data Content
Data Storage
Data-enabled Science
DataNet supported
CIF21: Data-Enabled Science
Data-intensive Science Program (knowledge)
• Intensive disciplinary efforts, multi-disciplinary discovery and innovation
Data Analysis and Tools Program (information)
• Data mining, manipulation, modeling, visualization, decision-making systems
Data Services Program (data)
• Provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline
Dumped On by Data: Scientists Say a Deluge Is Drowning Research
Data Curation
Sustainable, community-based networks for management of critical scientific data resources in a life-cycle context.
Overcome challenges of culture change, policy development and implementation, sustainable operations, quality and usability control.
Strategic awards that address heterogeneity in formats, complexity, semantics of data collections that are valued by science communities of significant breadth.
Operate as a network of data services that promote interoperability, multidisciplinarity, and scalability.
Data Storage
National storage infrastructure for scientific data
Accommodate scale and heterogeneity through robust, open, and broadly accepted standards
Business model implemented with governmental, academic, non-profit, and commercial stakeholders
Make strategic investments that:
• Leverage existing resources in XSEDE, commercial clouds, federal data centers
• Meet growing capacity needs at optimum cost
• Provide coordinating and integrative functions for integrity, access control, availability, persistence
Catalyze a national data infrastructure
Cross Cutting Challenges
Balancing research into next-generation infrastructure with operation & maintenance of current capacity
Sustainability through technical design, development of business models, and integration with the research cycle
Integration
• Vertical – Linking low-level bit storage infrastructure to data collections, and to applications
• Horizontal – Achieving connectivity and interoperability between activities that vary in scale, disciplinarity, and funding source
Summary
CIF21 is focused on effective ways to approach and respond to the challenges
• Critical concepts and goals
• Realistic and innovative
• Spiral process with strong, on-going feedback
Structure for longevity
• Scalable, open, inclusive governance
• Long-term business models
• International collaborations and programs