www.geongrid.org
GEOSCIENCE NEEDS & CHALLENGES
Dogan Seber
San Diego Supercomputer Center
University of California, San Diego, USA
Earth science research is moving towards a “systems approach”: to understand the Earth, we need to look at it as a whole. Scientists have expertise in specific areas within their sub-disciplines, but their knowledge of sister disciplines is limited.
Can cyberinfrastructure help?
Some common IT problems in the Geosciences
• Exponential increase in data volumes
• Diversity and complexity of data sets
• Data storage, access and preservation
• Data integration (semantic and syntactic)
• Computational challenges and access to HPC
• Advanced visualization (3D/4D)
• Archiving publications with reusable components
A Scientific Effort Vector
[Figure: scientific effort divided among Background Research; Data Collection and Compilation/Software Issues; and Science]
Science: Analysis, Modeling, Interpretation, Discovery
Source: R. Keller
Enabling Scientific Discoveries: Pathway to Discovery
Access → Process → Analyze → Interpret → Discovery
Data → Knowledge
Large/Complex Data Volumes
• National/international observatories and projects
  EarthScope: a US project to collect data across the entire US over the next 10 years. Includes seismic, GPS and drill hole data
  LiDAR data: airborne and ground-based data collection (large volumes of data sets)
  Global observations: a variety of satellites gathering data at different resolutions
  Hydrology, environmental, natural resource development projects, etc.
• Small projects
  Individual researchers maintain a lot of data sets, such as geology maps, geochemistry databases, earthquake catalogues, etc. Collectively, reusable data reach large volumes and complex dimensions
Challenge: How to manage these data so that vast amounts of data can be used by all scientists in an easy-to-use environment
Data Storage, Access and Preservation
• Preservation of digital and legacy data sets
• Since the research needs and styles of each scientist vary, each researcher has his/her own data with their own “flavors”
• Access to other scientists’ data is limited
• When scientists do not continue to maintain their data, it is lost forever!
Challenge: How to build a framework to exchange data and help preserve collected data sets
Data Integration Issues
Integration is required at both the syntactic and the semantic level. For example, how can a geologist merge multiple geology maps into a seamless (“integrated”) map that extends across national and international boundaries?
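The distinction between syntactic and semantic integration can be illustrated with a minimal sketch. Everything in it is assumed for illustration: the two map sources, their field names, the rock-unit terms, and the shared vocabulary are all hypothetical and do not come from any real geologic map or standard. The syntactic step renames source-specific fields to one schema; the semantic step maps each source's local terms to a common term.

```python
# Hypothetical records from two geology map sources. Field names and
# rock-unit terms are invented for illustration only.
MAP_A = [{"UNIT": "Qal", "LITH": "alluvium"},
         {"UNIT": "Kg", "LITH": "granite"}]
MAP_B = [{"unit_code": "ALV", "rock_type": "unconsolidated sediment"},
         {"unit_code": "GRN", "rock_type": "granitic rock"}]

# Syntactic step: rename source-specific fields to one shared schema.
FIELD_MAPS = {
    "A": {"UNIT": "unit", "LITH": "lithology"},
    "B": {"unit_code": "unit", "rock_type": "lithology"},
}

# Semantic step: map each source's local terms to a shared vocabulary
# (an assumed, toy vocabulary, not an established standard).
SHARED_TERMS = {
    "alluvium": "sediment",
    "unconsolidated sediment": "sediment",
    "granite": "granitoid",
    "granitic rock": "granitoid",
}


def harmonize(records, source):
    """Return records in the shared schema with shared lithology terms."""
    out = []
    for rec in records:
        # Syntactic: translate field names.
        row = {FIELD_MAPS[source][key]: value for key, value in rec.items()}
        # Semantic: translate the term itself; flag anything unknown.
        row["lithology"] = SHARED_TERMS.get(row["lithology"], "unmapped")
        row["source"] = source
        out.append(row)
    return out


merged = harmonize(MAP_A, "A") + harmonize(MAP_B, "B")
```

After harmonization, units from both maps that describe the same material carry the same lithology term, so they can be drawn and queried as one layer. A real system would also have to reconcile geometry, coordinate systems, and map scale, which this sketch leaves out.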
Integrate Geologic Data From Multiple Sources
What is available is multiple distinct data sets
Integration Across Disciplines
Earthquakes
Aquifers
Tectonics
Moho depth
Geology
Faults
Magnetics
Mines
Topography
Focal Mechanisms
Sediment thickness
Gravity
Computational Challenges in Geosciences
• Developing/accessing community codes
• Parallelizing software for efficient runs
• Accessing small to very large clusters
• Technical expertise to use high-end systems/clusters
Challenges: How to build a system that helps scientists run advanced software without having access to significant resources (computational and technical)
How to build a system that helps scientists to focus on science rather than technological challenges/problems
(Goldstein 2001)
Example:
Can we build a system that the entire community, not only a privileged few, could use to run 3D seismic modeling?
Geosciences are Visualization Oriented
• Once large volume data sets are accessed, how can we visualize them to get a better understanding of each data set?
• Building an effective visualization environment requires powerful software and hardware.
Challenge: How to build a visualization system that helps scientists analyze large and complex data sets dynamically.
Archiving results and publications with reusable components
• Science progresses incrementally. New knowledge is built on top of existing knowledge.
• Scientific validity is shown by repeatability.
Challenges: How to preserve scientific results and help others to repeat the analysis as efficiently as possible?
How to share algorithms and processing flows with others?
Efforts underway…
• Numerous projects are funded to address these questions
• E.g., GEON, SCEC ITR, CUAHSI, EarthChem
• NSF funding opportunities in GEO and CISE directorates
• Professional societies getting involved in CI
• GSA Geoinformatics Division
• AGU Earth and Space informatics focus group
• Extensive level of outreach and learning activities taking place
Lessons Learned 1/2
• Building cyberinfrastructure resources is a “social experiment”
• Equal partnership between domain scientists and IT is a must
• Understand the needs of the domain sciences
• Community outreach is critical (workshops, seminars, scientific meetings, etc.)
• Get it right the first time!
• Define the goals clearly, and publicize them
• Learn to differentiate between “a system that works” and “a system that is usable”
Lessons Learned 2/2
• Work with those who are willing and interested
• Identify “killer apps” and use them to attract more interest
• Teach! Help build a community of users and resource builders
• Problems are similar. Work with other communities; solutions may be out there