Date posted: 20-Dec-2015
Biological Oceanography: Scientific Domain
Ed DeLong, MIT Department of Biological Engineering
Department of Civil and Environmental Engineering
DataSpace
• Coupling of physical & biological oceanographic processes
• Comparative ecosystem analysis
• Biodiversity, biomass and productivity
• C-N-P cycling and energy flow
• Production and consumption of greenhouse gases: climate
• Measurement, modeling and experiments with microbial communities in the sea
• Education, training and knowledge exchange
BIOLOGICAL OCEANOGRAPHY
Microbial and sampling scales, based on Dickey (1991) and Allen (2000): Ricardo Letelier
Oceanographic sampling approaches in the context of scales
Scope & Scale: Challenges in Biological Oceanography (Genomes to Biomes…)
ADVANCED INSTRUMENTATION
Continuous, autonomous collection of 4D physical, chemical and bio-optical datasets
• 2 eddies
• 1 frontal system
• Sub-mesoscale features?
• Higher Chl a below cyclone
• DCM constant
• Patchy distribution of small particles
• Advection/local production of small particles in the Ze
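The observation that Chl a is higher below the cyclone while the DCM (deep chlorophyll maximum) stays constant can be made concrete by locating the DCM in each vertical profile. The sketch below is a simplification, not the processing used in these surveys: it assumes one cast per profile and takes the DCM as the depth of the largest Chl a value. The function name `dcm_depth` and all profile numbers are hypothetical and purely illustrative.

```python
from typing import Sequence


def dcm_depth(depths_m: Sequence[float], chl_a: Sequence[float]) -> float:
    """Depth (m) of the deep chlorophyll maximum in one vertical profile.

    Simplification (assumption): the DCM is the depth of the single largest
    Chl a value; real processing would typically smooth or QC the cast first.
    """
    if not depths_m or len(depths_m) != len(chl_a):
        raise ValueError("depths and Chl a must be non-empty and paired")
    # index of the largest chlorophyll value in the cast
    i_max = max(range(len(chl_a)), key=lambda i: chl_a[i])
    return depths_m[i_max]


# Two hypothetical casts (illustrative numbers, not survey data):
# one inside an eddy, one outside it.
depths = [5.0, 25.0, 50.0, 75.0, 100.0, 125.0, 150.0]
inside_eddy = [0.08, 0.10, 0.18, 0.35, 0.52, 0.25, 0.06]
outside_eddy = [0.07, 0.09, 0.12, 0.20, 0.31, 0.15, 0.04]

# A "constant DCM" shows up as the same depth in both casts,
# even though the peak Chl a is higher inside the eddy.
print(dcm_depth(depths, inside_eddy), dcm_depth(depths, outside_eddy))  # 100.0 100.0
```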
Further specialization: Marine Metagenomics
• Traditional microbiology and microbial genome sequencing studies rely on laboratory cultures
• Marine metagenomics: DNA sequences obtained directly from environmental microbial assemblages
• Metagenomic data is used by scientists across multiple disciplines, e.g.:
  • Biological engineering & biotechnology
  • Genomics and computational biology
• Ecology and environmental science
• Climate: relationship between marine microbes & the ocean’s carbon cycle, productivity, greenhouse gases
2nd-Gen Sequencing Platforms

| | AB3730 | 454 FLX / Titanium | Illumina |
|---|---|---|---|
| Cost per run | ~$50 | <$12K | <$5K |
| Bases read per run | 72 Kbp | 100 Mbp (FLX); 500 Mbp (Titanium) | >2 Gbp (> 200 Gbp !!!) |
| Bases per read | 750 | 250 (FLX); 450 (Titanium) | >36 (>100+ with paired-end reads) |
| Reads per run | 96 | 400K | 20M |
| $ per Mbp | $694 | $120 | $7 |
| AB3730 work equivalent | - | 100x AB3730/day | 300x AB3730/day |
| Errors | Diverse (cloning bias) | Homopolymeric runs | Diverse (base substitution) |
| Run time | 1 hour | 6.5 hours | 2-14 days* |
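The "$ per Mbp" row follows from the other rows: yield per run is reads per run times bases per read, and cost per Mbp is cost per run divided by that yield. The sketch below reproduces the row from the table's own nominal figures, treating the "~" and "<" costs as exact values (an assumption). Worked this way, the Illumina $7 figure appears to correspond to 20M reads of 36 bp (720 Mbp) rather than to the >2 Gbp yield. The `Platform` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    cost_per_run_usd: float  # nominal cost from the table (upper bounds taken as exact)
    reads_per_run: int
    bases_per_read: int

    def mbp_per_run(self) -> float:
        # yield = reads per run x read length, converted to megabase pairs
        return self.reads_per_run * self.bases_per_read / 1e6

    def usd_per_mbp(self) -> float:
        return self.cost_per_run_usd / self.mbp_per_run()


platforms = [
    Platform("AB3730", 50, 96, 750),
    Platform("454 FLX", 12_000, 400_000, 250),
    Platform("Illumina", 5_000, 20_000_000, 36),
]

for p in platforms:
    print(f"{p.name}: {p.mbp_per_run():g} Mbp/run, ${p.usd_per_mbp():.0f} per Mbp")
# AB3730: 0.072 Mbp/run, $694 per Mbp
# 454 FLX: 100 Mbp/run, $120 per Mbp
# Illumina: 720 Mbp/run, $7 per Mbp
```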
Biological Oceanography Data Challenges
• Wide variety and heterogeneity of data types:
  • Oceanographic cruise data
  • Oceanographic time series data
  • Laboratory & field experiments
  • Remote sensing datasets
  • Data from gliders, AUVs & moorings
  • Genomics, metagenomics, gene expression data
  • Numerical simulations & synthesis products
• Distributed data (multiple institutions & researchers)
• Need to balance PI, project & public data accessibility
• Data visualization & analysis needs
• Long-term archiving requirements
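Balancing PI, project and public accessibility amounts to an access policy attached to each dataset. The sketch below is one hypothetical way to encode such a policy, assuming three tiers plus an optional embargo date after which data becomes public. It is not a DataSpace API: `Access`, `DatasetPolicy`, `can_read`, the user names and the release date are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Optional


class Access(IntEnum):
    # Hypothetical tiers; higher value = wider visibility
    PI = 1        # only the principal investigator
    PROJECT = 2   # members of the owning project
    PUBLIC = 3    # anyone


@dataclass
class DatasetPolicy:
    pi: str
    project: str
    level: Access
    public_release: Optional[date] = None  # hypothetical embargo end date


def can_read(policy: DatasetPolicy, user: str, user_projects: set[str], today: date) -> bool:
    """Three-tier (PI / project / public) read check with an optional embargo."""
    if user == policy.pi:
        return True
    # once the embargo has expired, the dataset is treated as public
    embargo_over = policy.public_release is not None and today >= policy.public_release
    if policy.level == Access.PUBLIC or embargo_over:
        return True
    # project-level data is visible to members of the owning project
    return policy.level == Access.PROJECT and policy.project in user_projects


policy = DatasetPolicy(pi="pi_a", project="C-MORE", level=Access.PROJECT,
                       public_release=date(2015, 1, 1))
print(can_read(policy, "member_b", {"C-MORE"}, date(2010, 6, 1)))  # True
print(can_read(policy, "guest", set(), date(2010, 6, 1)))          # False
print(can_read(policy, "guest", set(), date(2015, 6, 1)))          # True
```

Keeping the policy as data next to each dataset, rather than in per-site code, is one way such rules could be published and searched alongside the archive.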
DataSpace partners: MIT-OSU
Oceanographic science partners: Ed DeLong (MIT) & Ricardo Letelier (OSU)
Library IT partners: MacKenzie Smith (MIT) & Terry Reese (OSU)
DeLong and Letelier are co-PIs on three major projects:
Center for Microbial Oceanography: Research and Education (C-MORE)
Microbial Oceanography of Oxygen Minimum Zones (MOOMZ)
Microbial diversity and activity in seasonal hypoxic waters (MI-LOCO)
Existing Data Portal
Currently a distributed approach: a set of web links to individually managed, heterogeneous datasets.
http://cmore.soest.hawaii.edu/data.htm
Biological and Chemical Oceanography Data Management Office database
http://osprey.bcodmo.org/index.cfm
BCO-DMO
Where is the data now? (Oceanographic data)
Public Databases: NCBI and CAMERA
National Center for Biotechnology Information
Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis
http://camera.calit2.net/
http://www.ncbi.nlm.nih.gov/
Where is the data now? (Genomic/metagenomic)
In-house Databases …
Why do biological oceanographers need DataSpace?
• Data access, storage and search are not centralized
• Large heterogeneous datasets
• Complex data management/sharing requirements
• Shared across multiple institutions & investigators
• Long-term requirements (2017)
• Need cross-investigator, cross-institution and cross-project search
• Currently much data is effectively “lost”, i.e., not usable
Why do biological oceanographers need DataSpace?
How many autonomous surveys, cruises, mooring datasets, hydrocasts and deckboard experiments had chlorophyll concentrations greater than X?
Of those data, how many had light levels and oxygen concentrations corresponding to Y and Z?
Of those data, how many have corresponding microbial community taxonomic composition and gene content data? (retrieve)
What is the relationship between light, chlorophyll, oxygen and microbial community taxonomic composition and gene content across all datasets? How do taxa and gene content relate to oxygen levels and to the balance of production and consumption? To greenhouse gas (GHG) levels?
Are there specific gene proxies that predict oxygen or GHG levels?
Note: centralized data access, search and storage will also drive the way we (scientists) ask our questions and collect and annotate our data. This requires a collaboration between scientists, IT staff, curators and database managers.
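The first three questions form a staged query: filter on chlorophyll, then on light and oxygen, then keep only records linked to sequence data. The sketch below assumes the heterogeneous datasets have already been harmonized into one record type. The `Sample` fields, units, thresholds and records are hypothetical stand-ins for X, Y and Z, not real data. It covers only the retrieval steps, not the cross-dataset relationship analysis in the later questions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    # Hypothetical harmonized record; field names and units are illustrative only
    dataset_id: str
    source: str                      # e.g. "cruise", "mooring", "glider"
    chl_mg_m3: float                 # chlorophyll concentration
    par_umol_m2_s: Optional[float]   # light level, if measured
    o2_umol_kg: Optional[float]      # oxygen concentration, if measured
    metagenome_id: Optional[str]     # link to sequence data, if any


def within(value: Optional[float], lo: float, hi: float) -> bool:
    # a missing measurement never matches a range criterion
    return value is not None and lo <= value <= hi


def staged_query(samples, chl_min, light_range, o2_range):
    """Narrow samples step by step: chlorophyll, then light/oxygen, then sequence data."""
    step1 = [s for s in samples if s.chl_mg_m3 > chl_min]
    step2 = [s for s in step1
             if within(s.par_umol_m2_s, *light_range) and within(s.o2_umol_kg, *o2_range)]
    step3 = [s for s in step2 if s.metagenome_id is not None]
    return step1, step2, step3


samples = [
    Sample("A", "cruise", 0.45, 20.0, 180.0, "MG-1"),
    Sample("B", "mooring", 0.30, 15.0, 40.0, None),
    Sample("C", "glider", 0.05, 300.0, 210.0, None),
    Sample("D", "cruise", 0.60, None, 190.0, "MG-2"),
]
s1, s2, s3 = staged_query(samples, chl_min=0.2, light_range=(10.0, 50.0), o2_range=(150.0, 250.0))
print(len(s1), len(s2), [s.dataset_id for s in s3])  # 3 1 ['A']
```

Record D shows why harmonized metadata matters: it has sequence data, but without a light measurement it drops out at the second step.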
The DataSpace Project & Biological Oceanography
• Provide infrastructure for digital archiving & preservation at appropriate scales matching scope/complexity of data
• Enable more integrated intra- & inter-project collaborations, analyses, data encoding, documentation, sharing, visualizing, and preservation
• Establish standards & best practices to capture, express, encode and publish the policies related to archived data
• Enable new discoveries by facilitating access, search and storage of large, complex, heterogeneous datasets