http://esd.lbl.gov/BWC/
Deb Agarwal (UCB and LBNL)
Catharine van Ingen (MSFT)
Berkeley Water Center
Microsoft TCI
IndoFlux Meeting, Chennai, India, July 13, 2006
Designing CyberInfrastructure to Support End Science
Project Motivation
Data is now being gathered into common data archives
Data archives provide an opportunity for cross-discipline and cross-site investigations
Data analysis techniques that worked well on small data sets often do not scale
Current computer science tools have evolved to support other disciplines; investigate their ability to facilitate data analysis
Distributed Data Sets
Building BWC Water Cyberinfrastructure to Connect Data, Resources, and People
Science Portal
Data Harvesting and Transformations
Data Cleaning, Models, Analysis Tools
Computational Resources
Web Service Interface to Data and Tools
Data Providers: Host Ameriflux, Climate Data, Statsgo Soils Data, MODIS products
Web-based Workbench access
Tools: Statistical, Graphical
MODIS products: LAI, Temp, Fpar, Veg Index, Surf Refl, NPP, Albedo
Choose Ameriflux Area/Transect, Time Range, Data Type
Gap Fill, A technique
Gap Fill, B technique
Design Workflow
Statistical & graphical analysis
Canoak Model Site 9
Data harvest Sites 1-16
Canoak Model Site 1
Version control
Network display LAI
Statistical & Graphical analysis
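The workflow names two gap-fill techniques ("A" and "B") without specifying them. As a hedged illustration only, the Python sketch below shows one generic option, linear interpolation across short interior gaps; it is not either of the BWC techniques, and the function name `gap_fill_linear` and the `max_gap` limit are assumptions.

```python
from typing import List, Optional


def gap_fill_linear(values: List[Optional[float]], max_gap: int = 4) -> List[Optional[float]]:
    """Fill interior runs of missing values (None) by linear interpolation.

    Runs longer than max_gap, and runs touching either end of the series,
    are left as None: there is either no bracketing pair of values to
    interpolate between, or the gap is judged too long for a straight line.
    """
    filled = list(values)  # copy so the caller's series is not modified
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i  # first missing index of this run
        while i < n and filled[i] is None:
            i += 1
        end = i  # first non-missing index after the run, or n
        gap_len = end - start
        if start == 0 or end == n or gap_len > max_gap:
            continue  # leave this run unfilled
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap_len + 1)
        for k in range(gap_len):
            filled[start + k] = left + step * (k + 1)
    return filled


print(gap_fill_linear([1.0, None, 3.0]))  # → [1.0, 2.0, 3.0]
```

Leaving long gaps unfilled is a deliberate choice in this sketch: a second, model-based technique would be the natural place to handle them.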
Data Cleaning Tools
Data Mining and Analysis Tools
Modeling Tools
Visualization Tools
Ecology Toolbox
Compute Resources
Carbon Community Workbench
Climate, Statsgo, MODIS
Import other Datasets
Knowledge Generation Tools
Approach
Work closely with the end scientists to define, prototype, and test the system
Provide a solution that leverages both server-based and local desktop/laptop environments
Leverage commercial tools to the extent possible
Some Critical Capabilities
Support for versioning of data sets
Work with multiple data sets
Advanced data selection and plotting capabilities
Select data relative to an event
Simple calculation across any specified date range
Statistical information available
Plots: scatter, diurnal, time series, probability density function, tiled, correlation
Ability to access capabilities from desktop
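Two of the capabilities above, selecting data relative to an event and a simple calculation across a chosen date range, plus the hour-of-day averaging behind a diurnal plot, can be sketched in a few lines of Python. This is a minimal sketch assuming observations arrive as (timestamp, value) pairs; the function names are hypothetical and not part of the BWC tools.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (timestamp, value) pairs for one variable at one site.
Observation = Tuple[datetime, float]


def select_relative_to_event(
    obs: List[Observation], event: datetime, before: timedelta, after: timedelta
) -> List[Tuple[timedelta, float]]:
    """Keep observations in [event - before, event + after], keyed by offset from the event."""
    lo, hi = event - before, event + after
    return [(t - event, v) for t, v in obs if lo <= t <= hi]


def mean_over_range(obs: List[Observation], start: datetime, end: datetime) -> Optional[float]:
    """Mean of values with start <= timestamp < end; None if nothing falls in the range."""
    vals = [v for t, v in obs if start <= t < end]
    return mean(vals) if vals else None


def diurnal_cycle(obs: List[Observation]) -> Dict[int, float]:
    """Average value for each hour of the day, pooled across all days."""
    by_hour: Dict[int, List[float]] = defaultdict(list)
    for t, v in obs:
        by_hour[t.hour].append(v)
    return {hour: mean(vals) for hour, vals in sorted(by_hour.items())}
```

Expressing event-relative selections as offsets (for example, hours after a rain event) lets windows around several events be overlaid on one plot.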
Data Pipeline
ORNL Ameriflux Site
CSV Files
BWC SQL Server Database
Data Cube
Excel Pivot Table and Chart
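The pipeline moves site CSV files into a SQL Server database and then summarizes them in a cube and an Excel pivot table. The sketch below imitates the first two stages with Python's built-in `sqlite3` module as a stand-in for SQL Server, and a GROUP BY query in place of the pivot. The long-format column layout (site, date, variable, value), the table name `obs`, and the function names are assumptions; any sample rows fed to it should be treated as toy values.

```python
import csv
import io
import sqlite3
from typing import List, Tuple

# Hypothetical long-format layout: one row per (site, date, variable, value).
SCHEMA = """
CREATE TABLE IF NOT EXISTS obs (
    site TEXT NOT NULL,
    date TEXT NOT NULL,      -- ISO 8601 date, e.g. 2004-01-31
    variable TEXT NOT NULL,
    value REAL
)
"""


def ingest_csv(conn: sqlite3.Connection, csv_text: str) -> int:
    """Parse CSV text with a header row and bulk-insert it into the obs table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["site"], r["date"], r["variable"], float(r["value"])) for r in reader]
    conn.executemany(
        "INSERT INTO obs (site, date, variable, value) VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def monthly_means(conn: sqlite3.Connection, variable: str) -> List[Tuple[str, str, float]]:
    """Pivot-table-style summary: mean value per site per month for one variable."""
    cur = conn.execute(
        "SELECT site, substr(date, 1, 7) AS month, AVG(value) FROM obs "
        "WHERE variable = ? GROUP BY site, month ORDER BY site, month",
        (variable,),
    )
    return cur.fetchall()
```

Usage: open `sqlite3.connect(":memory:")`, run `SCHEMA`, call `ingest_csv` with the text of a site file, then `monthly_means(conn, "TA")` for one averaged row per site and month.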
Data Cleaning and Versioning
BWC SQL Server Database
Excel spreadsheet of current data
Investigator updated spreadsheet
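The cleaning loop above (database extract, spreadsheet of current data, investigator-updated spreadsheet) implies that each accepted update becomes a new version while earlier versions stay retrievable. A minimal Python sketch of that idea, assuming whole-snapshot storage keyed by hypothetical (site, timestamp) pairs, is shown below; it does not describe how the BWC database actually stores versions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # hypothetical key: (site, timestamp)


@dataclass
class VersionedDataset:
    """Append-only history of full snapshots; old versions are never overwritten."""

    versions: List[Dict[Key, float]] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)

    def commit(self, data: Dict[Key, float], note: str) -> int:
        """Store a copy of `data` as a new version and return its version number."""
        self.versions.append(dict(data))  # copy so later edits cannot alter history
        self.notes.append(note)
        return len(self.versions) - 1

    def get(self, version: int = -1) -> Dict[Key, float]:
        """Return a copy of the requested version (latest by default)."""
        return dict(self.versions[version])

    def diff(self, old: int, new: int) -> Dict[Key, Tuple[Optional[float], Optional[float]]]:
        """Values that differ between two versions, as {key: (old_value, new_value)}."""
        a, b = self.versions[old], self.versions[new]
        return {k: (a.get(k), b.get(k)) for k in a.keys() | b.keys() if a.get(k) != b.get(k)}
```

Whole snapshots keep the logic simple at the cost of storage; storing only changed values would scale better for large archives.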
Analysis Services Data Cube
An organized view of the data
A multi-dimensional view into the data
Can integrate multiple data sources
Define measures and dimensions
Measure: a value you want to be able to plot
Dimension: an axis you want to be able to use to select data and as a plot axis
Calculations: define new measures
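The cube vocabulary above maps onto a small amount of code: a measure is the value being aggregated, dimensions are the keys used to slice and group, and a calculation derives a new measure. The Python sketch below illustrates those three ideas on an in-memory list of rows; it is an analogy to, not an interface of, Analysis Services, and `aggregate` and `add_calculated_measure` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Fact = Dict[str, object]  # one row: dimension values plus measure values


def add_calculated_measure(facts: List[Fact], name: str, fn: Callable[[Fact], float]) -> List[Fact]:
    """Return new rows with an extra measure computed from each row (a "calculation")."""
    return [{**f, name: fn(f)} for f in facts]


def aggregate(
    facts: List[Fact],
    measure: str,
    dimensions: Sequence[str],
    where: Optional[Callable[[Fact], bool]] = None,
    agg: Callable[[List[float]], float] = sum,
) -> Dict[Tuple[object, ...], float]:
    """Slice rows with `where`, group by the chosen dimensions, and roll up one measure."""
    groups: Dict[Tuple[object, ...], List[float]] = defaultdict(list)
    for f in facts:
        if where is None or where(f):
            groups[tuple(f[d] for d in dimensions)].append(f[measure])
    return {key: agg(vals) for key, vals in groups.items()}
```

Swapping the `dimensions` argument (for example from site to month) is the in-memory equivalent of pivoting the cube onto a different axis.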
Precipitation trends and totals
Summer precipitation:
Tonzi and Vaira: ~2% of total
Metolius: ~24% of total
Walker Branch: ~40% of total
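Each share above is a ratio: summer precipitation divided by the annual total at that site. The slides do not say which months count as summer, so the sketch below assumes June through August; `summer_fraction` is a hypothetical helper, not a BWC tool.

```python
from typing import Dict

# Assumption: "summer" = June, July, August; the slides do not define the months.
SUMMER_MONTHS = frozenset({6, 7, 8})


def summer_fraction(monthly_precip_mm: Dict[int, float]) -> float:
    """Share of the annual precipitation total that falls in SUMMER_MONTHS.

    monthly_precip_mm maps month number (1-12) to that month's total in mm.
    """
    total = sum(monthly_precip_mm.values())
    if total <= 0:
        raise ValueError("annual precipitation total must be positive")
    summer = sum(v for month, v in monthly_precip_mm.items() if month in SUMMER_MONTHS)
    return summer / total
```

For instance, twelve equal monthly totals give a summer share of 0.25 (three of twelve months), so a site near 2% receives almost all of its rain outside summer.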
[Plot: Precipitation Trends for 2004, monthly precipitation (mm) by month for Tonzi, Vaira, Metolius, and Walker]
*Plot created by Gretchen Miller of UC Berkeley
Temperature at North American Sites
[Plot: average temperature (°C) vs. latitude]
Other applications
*Plot created by Gretchen Miller of UC Berkeley
Temperature at North American Sites
[Plot: average temperature (°C) by month, January to December, for series labeled 31.5, 40.0, 49.9, and 70.5 (latitude)]
Observations by latitude
*Plot created by Gretchen Miller of UC Berkeley
Average NEE
[Plot: average NEE (µmol m-2 s-1) by month, January to December, for deciduous broadleaf forest, evergreen needleleaf forest, and mixed forest]
Observations by ecosystem type
*Plot created by Gretchen Miller of UC Berkeley
Some Lessons Learned so Far
Consistent data naming and units are critical to easy ingest of large amounts of data
Commercial tools do not necessarily provide all the right analysis capabilities directly
Scaling capabilities of the tools are not yet clear
We will need tools to aid in notification of PIs
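The first lesson, that consistent naming and units make bulk ingest easy, suggests a single alias table applied at load time. The Python sketch below assumes hypothetical incoming column names (`TA`, `Tair_F`, `PREC`, `PREC_in`) and hypothetical canonical names; it rejects unmapped variables rather than passing them through silently.

```python
from typing import Callable, Dict, Tuple

# Hypothetical alias table: incoming column name -> (canonical name, converter to canonical units).
# Real site files would need their own entries; these are illustrative only.
ALIASES: Dict[str, Tuple[str, Callable[[float], float]]] = {
    "TA": ("air_temp_C", lambda c: c),
    "Tair_F": ("air_temp_C", lambda f: (f - 32.0) * 5.0 / 9.0),
    "PREC": ("precip_mm", lambda mm: mm),
    "PREC_in": ("precip_mm", lambda inches: inches * 25.4),
}


def normalize(record: Dict[str, float]) -> Dict[str, float]:
    """Rename variables and convert units; refuse records containing unmapped names."""
    unknown = sorted(name for name in record if name not in ALIASES)
    if unknown:
        # Failing loudly surfaces naming problems at ingest instead of during analysis.
        raise KeyError(f"unmapped variables: {unknown}")
    out: Dict[str, float] = {}
    for name, value in record.items():
        canonical, convert = ALIASES[name]
        out[canonical] = convert(value)
    return out
```

Keeping the mapping in one table means a new site's naming quirks are handled by adding entries rather than by editing the ingest code.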
Portal Deployment
Behind the portal is a collection of databases and data cubes
Distribution for ease of use:
Only see the data of interest
Private data remains stable
Distribution for scaling:
Smaller queries on smaller databases take fewer resources
Larger databases and cubes can be replicated across machines
Batch-job-like infrastructure for managing very long running queries
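For batch-job-like handling of very long running queries, one possible shape is a submit/poll interface: the portal returns a job id immediately and the client checks back later. The Python sketch below uses the standard library's `ThreadPoolExecutor` for this; the `QueryJobQueue` class and its methods are hypothetical, and a production deployment would more likely use a persistent job scheduler than in-process threads.

```python
import itertools
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import wait as wait_for_futures
from typing import Any, Callable, Dict, Optional


class QueryJobQueue:
    """Run long queries in the background and let clients poll them by job id."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[int, Future] = {}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()  # guards id allocation and the jobs table

    def submit(self, query_fn: Callable[..., Any], *args: Any) -> int:
        """Queue query_fn(*args) and return a job id immediately."""
        with self._lock:
            job_id = next(self._ids)
            self._jobs[job_id] = self._pool.submit(query_fn, *args)
        return job_id

    def status(self, job_id: int) -> str:
        """One of: cancelled, running, queued, failed, done."""
        fut = self._jobs[job_id]
        if fut.cancelled():
            return "cancelled"
        if fut.running():
            return "running"
        if not fut.done():
            return "queued"
        return "failed" if fut.exception() is not None else "done"

    def wait(self, job_id: int, timeout: Optional[float] = None) -> str:
        """Block until the job finishes (or the timeout elapses) and return its status."""
        wait_for_futures([self._jobs[job_id]], timeout=timeout)
        return self.status(job_id)

    def result(self, job_id: int, timeout: Optional[float] = None) -> Any:
        """Return the query result, re-raising any exception the query raised."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self) -> None:
        """Stop accepting jobs and wait for running ones to finish."""
        self._pool.shutdown(wait=True)
```

A bounded worker pool also caps how many expensive queries run at once, which matches the slide's goal of keeping large queries from starving smaller ones.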
Acknowledgements
Science Team: Dennis Baldocchi, Bev Law, Gretchen Miller
Cyberinfrastructure: Matt Rodriguez, Monte Goode
Microsoft: Tony Hey, Nolan Li
Oak Ridge National Lab: CDIAC personnel
Berkeley Water Center: Yoram Rubin, Susan Hubbard
URLs and Connection Coordinates
Web Site: http://esd.lbl.gov/BWC
Blog: http://dsd.lbl.gov/BWC/amfluxblog