An Overview of Intersect’s GDM System

Joe Thurbon, Intersect

Prelude

The Brief

• The research activities and communities it supports

• The data management and analysis issues it seeks to resolve

• The types and volumes of data/metadata it will manage

• The philosophy behind its design (e.g. why use semantic web technologies, if you are)

Supported Communities

Issues Addressed

Data / Metadata

• We keep
  – All the short reads
  – All the bioanalyser output
  – All the trimmed reads
  – All the tertiary analysis output ‘that works’
  – All parameters used to generate all of the above
    • Chemistry versions
    • Command line parameters
    • Species, locations, etc.

Data Counts

Platform                              Tissue Samples   Chunks of Data   Fields of Metadata
                                      Per Year         Per Year         Per Year
Illumina with standard multiplexing   ~2050            >12000           ~100000
454 with standard multiplexing        ~3500            >14000           ~120000

Data Sizes

Platform                              Untrimmed / Year   Trimmed / Year
Illumina with standard multiplexing   1.2TB              300GB
454 with standard multiplexing        ~2.5TB (?)         600GB
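As a rough worked example of what the table implies for storage, the sketch below adds the untrimmed and trimmed volumes per platform, since both are kept. It assumes 1 TB = 1000 GB, and it treats the 454 untrimmed figure as exact even though the source marks it as uncertain. The function and variable names are illustrative only.

```python
# Figures from the "Data Sizes" table, converted to GB (assuming 1 TB = 1000 GB).
SIZES_GB = {
    "Illumina": {"untrimmed": 1200, "trimmed": 300},
    # The 2.5 TB figure is approximate and flagged "(?)" in the source.
    "454": {"untrimmed": 2500, "trimmed": 600},
}


def yearly_total_gb(platform: str) -> int:
    """Storage per year if both untrimmed and trimmed reads are kept."""
    sizes = SIZES_GB[platform]
    return sizes["untrimmed"] + sizes["trimmed"]


def trimmed_fraction(platform: str) -> float:
    """Trimmed volume as a fraction of the untrimmed volume."""
    sizes = SIZES_GB[platform]
    return sizes["trimmed"] / sizes["untrimmed"]


if __name__ == "__main__":
    for name in SIZES_GB:
        print(name, yearly_total_gb(name), "GB/year,",
              f"trimmed is {trimmed_fraction(name):.0%} of untrimmed")
```

On these figures, trimming reduces the reads to roughly a quarter of their original volume on both platforms. The totals cover reads only and exclude the bioanalyser and tertiary analysis output.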

Philosophy

• Anthropology

• Epistemology

• History and Philosophy of Science

• Bio-informatics

• GDM

What do people do?

What do Researchers do?

What do Scientists do?

What do Bio-Informaticians do?

What does GDM Do?

• Allows bio-informaticians to
  – Stand on the shoulders of giants (including themselves)
  – Record their observations
  – Record their experimental parameters
  – Manage their data

• Iteratively

Demo

A result corresponds to a single experiment

• What experiment did I do? (steps)

• What parameters did I set? (parameters)

• What observations did I make? (outputs)

• What did I end up with? (data)

• What did I start from? (parents)

Metadata + Data + Parents = One Result
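The five questions above can be read as the fields of a single record. The following is a minimal sketch of such a record, not GDM's actual schema: the class name, field names, types and the `lineage` helper are all assumptions made for illustration, and the sample values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Result:
    """One experiment: metadata + data + parents (hypothetical field names)."""
    steps: List[str]            # What experiment did I do?
    parameters: Dict[str, str]  # What parameters did I set?
    outputs: Dict[str, str]     # What observations did I make?
    data: List[str]             # What did I end up with? (e.g. file names)
    parents: List["Result"] = field(default_factory=list)  # What did I start from?

    def lineage(self) -> List["Result"]:
        """Return every ancestor once, nearest ancestors first."""
        seen = []
        queue = list(self.parents)
        while queue:
            node = queue.pop(0)
            # Compare by identity so two equal-looking results stay distinct.
            if not any(node is s for s in seen):
                seen.append(node)
                queue.extend(node.parents)
        return seen


# Placeholder values, purely illustrative.
trimmed = Result(steps=["trim"], parameters={"example_param": "x"},
                 outputs={}, data=["trimmed_reads"])
assembly = Result(steps=["assemble"], parameters={"example_param": "y"},
                  outputs={"note": "looks plausible"}, data=["contigs"],
                  parents=[trimmed])
annotated = Result(steps=["search"], parameters={}, outputs={},
                   data=["hits"], parents=[assembly])
```

Because each result points at its parents, `annotated.lineage()` walks back through `assembly` to `trimmed`, which is one way to support working iteratively on top of earlier results.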

GDM Repository

[Diagram labels: SCU, Ramaciotti, ANU, AC3; VELVET, T-Coffee, BLAST; NCBI, EMBL, DDBJ]