Date post: 18-Dec-2015

The data flood: We need a bigger boat

James A. Foster

The Initiative for Bioinformatics and Evolutionary Studies (IBEST)

Biological Sciences, Bioinformatics and Computational Biology

University of Idaho

JAF INBRE Data Flood 8/4/09

Outline

✦ Where is this flood of data coming from?
✦ What kind of tool is appropriate for this amount of data?
✦ What kind of a tool is “bioinformatics”?
✦ How about an example?

DNA sequencing data flood

Year   bp/day          Technology
1977   7.35            Gels
1986   50-ish          ABI 370
1995   19,000          ABI 377
1998   400,000         ABI 3700
2008   1,600,000,000   454
2009   3,200,000,000   454/FLX
2012   ??              ???
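The growth in the bp/day figures above can be made concrete with a little arithmetic. The sketch below computes the fold increase between consecutive years and the implied average doubling time, assuming exponential growth between data points. It treats the slide's "50-ish" as exactly 50 bp/day, which is an assumption made only for illustration, and it leaves out the unknown 2012 row.

```python
import math

# (year, bp/day) from the slide; 1986 is "50-ish" on the slide and is
# taken here as 50.0 (an assumption for illustration).
THROUGHPUT = [
    (1977, 7.35),
    (1986, 50.0),
    (1995, 19_000.0),
    (1998, 400_000.0),
    (2008, 1_600_000_000.0),
    (2009, 3_200_000_000.0),
]


def doubling_time_years(y0, r0, y1, r1):
    """Average doubling time, assuming exponential growth between two points."""
    return (y1 - y0) / math.log2(r1 / r0)


def fold_changes(rows):
    """Fold increase in bp/day between consecutive rows."""
    return [(a[0], b[0], b[1] / a[1]) for a, b in zip(rows, rows[1:])]


if __name__ == "__main__":
    for start, end, fold in fold_changes(THROUGHPUT):
        print(f"{start}-{end}: {fold:,.0f}x")
    first, last = THROUGHPUT[0], THROUGHPUT[-1]
    overall = doubling_time_years(*first, *last)
    print(f"Average doubling time {first[0]}-{last[0]}: {overall:.2f} years")
```

Under these assumptions the throughput doubles roughly once a year on average over the whole period, and exactly once between 2008 and 2009.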

The data flood: DNA example

Year   bp/day          Notes           Water
1977   7.35            Manual: φx174   1L
1986   50-ish          Gel: ABI 370    Barrel (176 gallons)
1995   19,000          Gel: ABI 377    Big pool (2x6x12m)
1998   400,000         Cap: ABI 3700   football field, 20m deep
2008   1,600,000,000   454             Lakes Michigan/Huron
2009   3,200,000,000   454/FLX         all Great Lakes (nearly)
2012   ??              ??              ocean?

Bioinformatics tools

Year   Data volume      Technology
1977   1L               spoon
1986   barrel           hose
1995   big pool         pfd
1998   football field   Kayak
2008   Lake Michigan    Orca?
2009   Great Lakes      bigger boat?
2012   ocean?           Glomar?

Bioinformatics: bigger boat?

[Diagram labels: Your thesis; Data; The Computer (bioinformatics); Hypo; You; Your hypothesis]

Reflection on the metaphor

✦ At some point, you can use fundamentally different techniques: spoons versus boats

✦ At some point, you can test fundamentally new hypotheses: not “we need a smaller shark”

✦ Sometimes the old technology is still good: the kayak was appropriate in this picture

✦ The new technology may be for a different purpose: fishing versus deep sea exploration

Technology quiz!

What does this do?

Not that!

THIS!

A Bigger Boat

Whatever you tell it to do!

What is Bioinformatics?

Bioinformatics is what you tell the computer to do with your data

Of Boats and Bioinformatics

Bioinformatics is what you do with the boat you are in during the data flood

You might be able to do more with a bigger boat

Sampling emergent diversity

✦ Get ALL DNA along an age-variant transect
• 10 samples per site
• time since exposure: 5y, 19y, 40y, 63y, 100y, and 150y
• “chronoclines” sample ecosystems by age
✦ Who’s there?
✦ How does the ecosystem change over time?

Bioinformatics problems

✦Estimate α diversity: number of “species” in each sample and age group

✦Estimate β diversity: amount of variation in “species” between age groups

✦Determine which species (no quotes) are present in each sample (not part of this talk)

Biological questions: How do soil bacteria respond to retreating glaciers? How do microbial soil communities change?
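One simple reading of the two estimation problems above: α diversity as observed richness (the number of distinct “species” labels in a sample), and β diversity as the Jaccard dissimilarity between the label sets of two age groups. The talk does not say which estimators were actually used, so the function names and the toy labels below are hypothetical and only illustrate the idea.

```python
def alpha_richness(labels):
    """Observed richness: number of distinct "species" labels in one sample."""
    return len(set(labels))


def jaccard_dissimilarity(labels_a, labels_b):
    """1 - |A ∩ B| / |A ∪ B| over the sets of labels seen in two groups."""
    a, b = set(labels_a), set(labels_b)
    union = a | b
    if not union:
        return 0.0  # two empty groups are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical cluster labels per age group (not data from the talk).
samples = {
    "5y": ["sp1", "sp2", "sp2", "sp3"],
    "150y": ["sp2", "sp3", "sp4", "sp5", "sp5"],
}

if __name__ == "__main__":
    for age, labels in samples.items():
        print(age, "alpha richness:", alpha_richness(labels))
    beta = jaccard_dissimilarity(samples["5y"], samples["150y"])
    print("beta (Jaccard dissimilarity):", beta)
```

Observed richness tends to undercount when a community is undersampled, which is why these are framed as estimation problems rather than simple counts.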

Lots of data (post QC)

Age     Samples   Sequences   DNA (Mbp)
5y      9         35,092      8.77
19y     10        41,494      10.37
40y     8         33,665      8.42
63y     9         41,767      10.44
100y    8         41,178      10.29
150y    8         40,210      10.05
Total   52        233,406     58.35

Note: this is a SMALL run. The maximum is 37 GB per 8-hour run, or 1.6 Bbp/day.
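As a quick sanity check on the figures above, the sketch below sums the per-age rows and derives the average sequence length. The numbers are copied from the slide; the function names are illustrative.

```python
# (age, samples, sequences, Mbp) copied from the slide.
ROWS = [
    ("5y", 9, 35_092, 8.77),
    ("19y", 10, 41_494, 10.37),
    ("40y", 8, 33_665, 8.42),
    ("63y", 9, 41_767, 10.44),
    ("100y", 8, 41_178, 10.29),
    ("150y", 8, 40_210, 10.05),
]


def totals(rows):
    """Column sums: (samples, sequences, Mbp)."""
    return (
        sum(r[1] for r in rows),
        sum(r[2] for r in rows),
        sum(r[3] for r in rows),
    )


def mean_read_length_bp(sequences, mbp):
    """Average sequence length in bp, given a count and a total in Mbp."""
    return mbp * 1_000_000 / sequences


if __name__ == "__main__":
    n_samples, n_seqs, mbp = totals(ROWS)
    print(n_samples, n_seqs, round(mbp, 2))
    print(round(mean_read_length_bp(n_seqs, mbp)), "bp per sequence on average")
```

The sample and sequence totals match the slide. Summing the rounded Mbp values gives 58.34 rather than the slide's 58.35, which is presumably a rounding difference, and the average works out to about 250 bp per sequence.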

Bioinformatics objectives

determine species

cluster by species

cluster by age

Explain data in terms of biological processes and age (tell a story)

Too much data: 233K sequences!

Trick: Turn it upside down

Cluster each of 52 samples (approx. 6k each), choose a proxy sequence

Cluster proxies by age (approx. 40k each)

Cluster combined sequences to get species (quantify richness)

Build +/- matrix

++ + ++

++ - ++

+ - -

+ - +

- +++ +
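The steps on this slide can be sketched as a three-level clustering: cluster each sample and keep one proxy per cluster, cluster the proxies within each age group, then cluster the combined age-level proxies to get “species” and mark each one present (+) or absent (-) per age. The sketch below uses a simple greedy scheme and a position-wise identity on toy strings. Both are stand-ins chosen for brevity, not the method used in the talk, and every name, threshold, and sequence here is hypothetical. It also records only +/-, not the graded ++ and +++ entries shown on the slide.

```python
def identity(a, b):
    """Fraction of matching positions; a toy measure for equal-length strings."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first proxy it matches, else start a cluster.

    Returns a dict mapping proxy sequence -> list of member sequences.
    """
    clusters = {}
    for seq in seqs:
        for proxy in clusters:
            if identity(seq, proxy) >= threshold:
                clusters[proxy].append(seq)
                break
        else:
            clusters[seq] = [seq]  # seq becomes the proxy of a new cluster
    return clusters


def presence_matrix(samples_by_age, threshold):
    """Cluster per sample, then per age, then combined; mark +/- per age."""
    age_proxies = {}
    for age, samples in samples_by_age.items():
        sample_proxies = [
            p for sample in samples for p in greedy_cluster(sample, threshold)
        ]
        age_proxies[age] = list(greedy_cluster(sample_proxies, threshold))
    combined = [p for proxies in age_proxies.values() for p in proxies]
    species = greedy_cluster(combined, threshold)
    return {
        sp: {
            age: "+" if set(members) & set(proxies) else "-"
            for age, proxies in age_proxies.items()
        }
        for sp, members in species.items()
    }


# Hypothetical toy reads: two age groups, two samples each.
samples_by_age = {
    "5y": [["AAAAAAAA", "AAAAAAAT"], ["AAAAAAAA"]],
    "150y": [["AAAAAAAA", "CCCCCCCC"], ["CCCCCCCG", "GGGGGGGG"]],
}

if __name__ == "__main__":
    matrix = presence_matrix(samples_by_age, threshold=0.8)
    print("richness:", len(matrix))
    for sp, row in matrix.items():
        print(sp, row)
```

The point of turning the problem upside down is that each clustering step only compares sequences within one sample or one set of proxies, so no step has to compare all 233K sequences against each other.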

Bioinformatics challenges

✦ Move data between computers (IGS, laptop, IBEST Core)
✦ File the data in a retrievable way
✦ Associate metadata with data
✦ Cluster sequences within/between samples
✦ Associate clusters with species
✦ Compute diversity statistics
✦ Prepare publications and talks
✦ (much more)

Conclusions

✦ Biology
• There are thousands of species of bacteria in arctic soil
• Number of bacterial species increases as time of post-glacial exposure increases
✦ Algorithmics (want a job?)
• “Quantity has a quality all its own” (V.I. Lenin)
• Need new algorithms to use new hardware
• Database/dataset management is crucial

Thanks!

Ursel Schüette

Zaid Abdo

Jacob Pierson

Larry Forney

Rob Lyon

The Forney-Top lab

John Bunge, Cornell

The Relational Database project, MSU

to INBRE for the excuse

to IBEST for the science

to NIH, NSF, and UI for the money (P20RR16448, P20RR016454, EPS080935)

Discussion?

Extra stuff

Intentionally blank

Roche 454: a genome a day

Metagenomics

✦ Harvest approximately the first 300 bp of every 16S rRNA molecule, all samples
• Ribosome: required to translate mRNA into protein (conserved)
• Common marker for microbial species
✦ Cluster by evolutionary relationships (“species”)
✦ Analyze by chronocline

Future work: same tune, new lyrics

✦ Data from human microbiome
• How do microbial communities vary between healthy and sick people?
✦ Data from polluted soil (Yangtze River, PRC)
• How do microbial communities vary as pollution increases?
✦ Data from longitudinal transects
• How does microbial diversity change with latitude?
