IBU
'A bioinformatic Problem Solving Environment in the e-BioLab'
VL-e Sub Program 1.5: Bioinformatics
Timo Breit
Micro-Array Department & Integrative Bioinformatics Unit,
Faculty of Science,
University of Amsterdam
Where in the Virtual Laboratory for e-Science?
Application Layer: Dutch telescience, Data intensive science, Medical diagnosis, Bioinformatics, Biodiversity, Food Informatics
Generic Virtual Laboratory e-science layer
Grid Layer
‘BI-PSE’: BioInformatics Problem Solving Environment
Why in the VL-e? Data explosion in life sciences research.
RNA analysis by Northern blot: 1-15 genes
RNA analysis by micro-array: 1,000-40,000 genes
Figure axes: analyzed genes versus samples of cellular experiments (A to T)
Life sciences research today: whole system -omics data.
Biology
Genomics
Transcriptomics
Proteomics
Metabolomics
Integrative biology or Systems biology
Experiment
RNA
protein
metabolite
DNA
Biotechnology
Results
Bioinformatics
Data storage
Data handling
Data preprocessing
Data analysis
Data integration
Data interpretation
Biologist
Informatics
ICT infrastructure
How in VL-e? A bioinformatics problem solving environment (BI-PSE)
ICT infrastructure, a.o.: security (AAA)
Life sciences domain
e-bioscience
Generic virtual laboratory
Grid-layer
a.o.: analysis methods, information management, semantic modeling, adaptive information disclosure
a.o.: domain knowledge, domain information, domain data
a.o.: semantic modeling
Hypothesis generation
In-silico experiment
Decision process
Experiment design
Hypotheses
Wet-lab experiment
Enhancing knowledge model
Results
ICE: Interactive & Creative Environment
RESULT: Rauwerda et al: The Promise of a virtual lab. Drug Discov Today. 2006 Mar;11(5-6):228-36.
ESE: Experiment Support Environment
DSE: Decision Support Environment
Problem solving environment
Parts of the BI-PSE we work on
VL-e
Biological use case: Huntington Disease
Biological use case: Toxicogenomics
Grid computing
Resource identification; ICE resources model
Staff: PD Christian Henkel, PD Ramin Monajemi
e-BioLab
Staff: SS Han Rauwerda, SP vacancy
Staff: PD Scott Marshall, PD Tessa Pronk, SP Frans Verster
Staff: PD Marco Roos, AIO Lennart Post
IB-ICE, IB-ESE: Integrative bioinformatics; knowledge model, experiment design
Staff: MAD Martijs Jonker, MAD Oskar Bruning
Staff: PD Marcia ad Inda, SP vacancy
M-A ESE: Microarray analyses; methods, workflows
VL-e Use case SigWin; VL-e Use case Histone
VL-e use case SigWin finder
Goal: A workflow to find significant windows in data related to a given sequence (of any type).
Motivation: Find sets of genes (windows) with increased overall gene expression (significance) in expression data ordered by gene location on the chromosomes (sequence).
Figure: gene expression profile
SigWin: Significant Windows*
Márcia Alves de Inda, Dimitri, Frans Verster, Marco Roos
Given a data set we compute Sliding Window (SW) Medians for a given window size.
Using the SW Medians data we compute a False Discovery Rate (FDR) threshold.
Windows with values above the FDR threshold are called significant windows (or Windows Beyond the Threshold)
*R. Versteeg et al. Genome Res 2003 13: 1998-2004.
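The three steps above can be sketched in Python as follows. This is a minimal illustration and not the SigWin implementation: the slides do not state how the FDR threshold is derived, so the sketch assumes a permutation-based null distribution (shuffling the values destroys any positional structure). The function names, the `fdr` and `n_permutations` parameters, and the toy data are all hypothetical.

```python
import random
from statistics import median


def sliding_window_medians(values, window_size):
    """Median of every run of `window_size` consecutive values."""
    return [median(values[i:i + window_size])
            for i in range(len(values) - window_size + 1)]


def fdr_threshold(values, window_size, fdr=0.05, n_permutations=200, seed=0):
    """Smallest observed window median m for which
    (expected number of windows >= m in shuffled data)
    / (observed number of windows >= m) <= fdr.
    Assumption: the null distribution comes from random permutations.
    Returns None when no candidate meets the target."""
    rng = random.Random(seed)
    observed = sliding_window_medians(values, window_size)
    null = []
    shuffled = list(values)
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null.extend(sliding_window_medians(shuffled, window_size))
    for candidate in sorted(set(observed)):
        n_observed = sum(m >= candidate for m in observed)
        expected_false = sum(m >= candidate for m in null) / n_permutations
        if expected_false / n_observed <= fdr:
            return candidate
    return None


def significant_windows(values, window_size, threshold):
    """Start indices of windows whose median is at or above the threshold."""
    medians = sliding_window_medians(values, window_size)
    return [i for i, m in enumerate(medians) if m >= threshold]


# Toy data: a stretch of high values inside a flat background.
values = [1] * 100 + [10] * 10 + [1] * 100
threshold = fdr_threshold(values, window_size=5)
windows = significant_windows(values, window_size=5, threshold=threshold)
```

In this toy run the significant windows are exactly those covering at least three of the ten high values. Real expression data would normally be ranked first, as in the workflow's "Rank sequence" step.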
VLAM SigWin-finder workflow
1) Read sequence
2) Rank sequence
3) SW Medians
4) Sample to Frequency
5) SW Medians Prob
6) FDR Threshold
7) WinBeTs
8) GnuPlot
Modules
SigWins and periodic data
Example periodic data: Temperature in Amsterdam
Integration genomic & transcriptomics data
Integration genomic & transcriptomics data (zoom)
VL-e use case Histone code and semantic modeling
Lennart Post, Scott Marshall, Marco Roos
Hypothesis: A relationship exists between histone modification and transcription factor binding sites
Histones
Histone modification
Transcription factor binding site
Transcription factor
Transcription
Design ‘myModel’: Protégé - OWL plug-in
http://protege.stanford.edu
Data integration through semantic modeling
Result: data integration via semantic modeling
UCSC genome browser snapshot
Result: Correlation between histone modification and transcription factor binding sites
Overlap
etc…
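The overlap behind this result can be illustrated with a plain interval comparison. The sketch below is not the semantic-model integration shown on the slides; it only shows what it means for two genomic features to overlap. The half-open `(chromosome, start, end)` representation, the function name, and the coordinates are hypothetical.

```python
from collections import defaultdict


def fraction_overlapping(features_a, features_b):
    """Fraction of features in `features_a` that overlap at least one
    feature in `features_b`. Features are (chromosome, start, end)
    tuples with half-open coordinates [start, end)."""
    by_chromosome = defaultdict(list)
    for chromosome, start, end in features_b:
        by_chromosome[chromosome].append((start, end))
    if not features_a:
        return 0.0
    hits = 0
    for chromosome, start, end in features_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chromosome[chromosome]):
            hits += 1
    return hits / len(features_a)


# Made-up coordinates, for illustration only.
histone_marks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
binding_sites = [("chr1", 150, 160), ("chr2", 300, 310)]
share = fraction_overlapping(histone_marks, binding_sites)
```

Here one of the three histone-modification regions contains a binding site, so `share` is one third. A correlation claim would additionally need a comparison against the overlap expected by chance.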
Bioinformatics Problem Solving Environment
Domain interaction: Basic concept of an e-BioScience Laboratory (e-BioLab)
non-formalized knowledge + ideas + intuition + discussion
Biologists
e-BioScientist
Tools, Grid, Methods, Workflows
Basic model of problem area
e-BioOperator
Readily accessible data + models, data mining
Easy visualization
Small integration experiments + integration methods
Vague results
Basic configuration of e-BioLab
Figure: floor plan of the VL e-bio laboratory (basic set-up of the e-BioLab). Legend:
a  glass switch
b  tiled display
c  barebones
d  plasterboard wall
e  mobile tiled display
f  mobile barebones
g  video cameras
h  desktop PCs
i  plasma touch screen
j  electronic whiteboard
k  speakers
l  office desk
m  tables on wheels
n  tablet PCs
o  lab chairs
p  desk chairs
q  desk blocks
r  A3 inkjet printer
s  printer table
t  file cupboard
u  headsets
Anticipated tiled display in e-BioLab
Display panels: SOM; Hier. clust.; clusters 1, 2 and 3, each with P1, P2 and P3; Gene lists; Chrom. map 1, Chrom. map 2, Chrom. map 3; Pathways displayed; Video remote collaboration; Remote whiteboard
Acknowledgements
Within SP1.5:
Marco Roos, Molecular biologist
Han Rauwerda, Bioinformatician
Roel van Driel, Biochemist
Christiaan Henkel, Molecular biologist
Lennart Post, AIO (vDriel)
Martijs Jonker, Bioinformatician
Marcia Alves de Inda, Computational scientist
Oskar Brunning, Bioinformatician
Scott Marshall, Informatician
Tessa Pronk, Molecular biologist
Frans Verster, Scientific programmer
Ramin Monajemi, Informatician
Timo Breit, Molecular biologist
Within VL-e:
SP1.2: use ontologies in semantic modeling
SP1.4: use case R on Grid, e-bioscience
SP2.2: AID; ontologies and semantic modeling
SP2.4: information management
SP2.5: workflow methods and tools
SP3.3: e-BioLab
SP4.1: VLEIT team
More information: www.micro-array.nl
Outside VL-e:
BioRange, NBIC; Dutch bioinformatics
- Content driven data modeling (Kok-LUMC, Adriaans-UvA, etc…)
- Test case systems biology (RUG, CMBI, TNO, UvA, etc…)
- SigWin (vKampen-AMC, etc…)
- e-BioLab (vdVeer-VU, vd Vet-UT, Nikhef, SARA, etc…)
BioAssist
- Microarray workflow (many…)
- Re-annotation (Leunissen-WU, Neerincx-WU, etc…)
Vacancies @ IBU:
Bioinformatician: micro-array data analysis (HBO/WO, 2 years)
Scientific Programmers: building the e-BioLab
Where in the Virtual Laboratory for e-Science?
Application Layer: Dutch telescience, Data intensive science, Medical diagnosis, Bioinformatics ASP, Biodiversity, Food Informatics
Generic Virtual Laboratory e-science layer
Grid Layer
‘IB-PSE’: Integrative Bioinformatics Problem Solving Environment
BioRange & BioAssist
SP1. Bioinformatics for Microarray Technology
1. Experimental design
2. Understanding biological processes
3. Genotype-phenotype analysis
4. Dissemination of bioinformatics tools and expertise, and education
SP2. Bioinformatics for Proteomics and Metabolomics
5. Preprocessing and identification tools
6. Analysis and modeling tools
7. Molecular interactions tools
SP3. Integrative Bioinformatics.
8. Structural genomics
9. Comparative genomics
10. Phenotype-genotype modelling
11. Pathway modelling and visualisation
12. Content driven data modelling
13. Content driven text mining
SP4. VL-E Informatics for Bioinformatics Applications.
14. Adaptive information disclosure
15. User interface and visualization
16. Collaborative information management
SP5. Test bed with “Real-Life Applications”.
17. Selection of bioinformatics applications, scaling approach, & real-life test applications
18. Dedicated scaling and validation approach
19. Integrated scaling and validation approach
Dissemination
Subprograms & research themes in national bioinformatics initiative BioRange.
Bioinformatics
Informatics
ICT infrastructure
Use cases (user scenarios)
1. R on grid (IUC1.5.1) (finished)
• Creation of a web service that executes an R-script that invokes a LAM-MPI distributed calculation on the grid on a number of nodes that can be chosen by the user.
2. R in workflows (IUC1.5.4) (started)
• Proof of principle of a micro-array analysis workflow by invocation of web services. Requirements are visualization of intermediate results and enabling human interaction.
3. Re-annotation of micro-array libraries (IUC1.5.5) (started, with J. Leunissen WU)
• Re-annotation from sequence by invocation of remotely hosted web services in a workflow environment.
4. ‘SigWin’ (IUC1.5.3) => Significant Window Finder (proof of principle given)
• Generalization of method that finds ‘Regions of IncreaseD Gene Expression’ (RIDGEs) into workflow in VLAM environment that finds significant windows in sequences of values.
5. Histone Code case 1 (IUC1.5.2) (proof of principle given)
• Proof-of-concept data integration via semantic models
6. Scaling problems in semantic data integration (RUC1.5.1) (finished, led to 2 new IUCs)
• Provide guidelines for the infrastructure to use for semantic data integration
A view on bioinformatics research and IBU
Informatics research
Applied bioinformatics
Bioinformatics research
Biology research
Bio - - informatics
IBU
Outline of presentation.
- Where are we positioned in the VL-e project?
- Why do we need an Integrative Bioinformatics Problem Solving Environment?
- What do we want to do with an IB-PSE?
- How do we plan to create a functional IB-PSE?
- Who are we?
- Where do we start?
- When do we think we will have a functional IB-PSE?
- Who are our collaborators?
What do we want to do with an IB-PSE? Concept of integrative bioinformatics
Analysis methods
ICT infrastructure
Experiment design
VL-e
Integrative & computational bioinformatics
experiment
Model
Visualization
Biological solutions
Biological phenomena
Biological knowledge
Omics data
Data-driven hypothesis
Problem-driven hypothesis
biological problem
Biological research domain; e-bioscience core domain; Enabling science domain
Computational experimentation through advanced data integration.
Figure: excerpt of a numeric data table (raw measurements)
Data source A
Randomiser
Source: ‘RNA sequences’
N-plet distributions
N-plet distributions
Code-length distributor
Code-length distributor
Binomial tester
Coding length likelihood distribution
Optimum extractor
Most likely: triplet
Computational experiment
Semantic modelling
Interface model A
Figure: excerpt of a numeric data table (raw measurements)
Data source B
Semantic modelling
Ontology B
Interface model B
Ontology A
Bioinformatics in the Netherlands
University of Amsterdam
NBIC - Bioinformatics
NBIC, national foundation for Dutch bioinformatics. Involves all academic and several industrial life sciences research organizations.
VL-e Consortium - Informatics
VL-e, informatics Bsik project by WTCW supporting BioRange.
Local bioinformatics initiatives, mainly focused on directly supporting specific local life sciences research questions.
VL-e Experimental (rapid prototyping) Environment
VL-e Proof-of-concept Environment
VL-e Exploitation Environment (SARA)
BioRange Proof-of-concept Environment
Life sciences researchers mainly focused on resolving specific life sciences research questions.
BioRange Bioinformatics Research
BioRange, bioinformatics Bsik project by NBIC and “Nationaal Regieorgaan Genomics”.
BioASP, Bioinformatics Service Provider for life sciences researchers by NBIC and “Nationaal Regieorgaan Genomics”.
Bio-Application Support Program (Bio-ASP)
component interaction
stimuli
mechanism
program
history
response
presence state
Data integration: basic concept of any cell
Assumption: the complexity of life is organized via a limited number of general cellular mechanisms.
ED DA
DA
DI
LC