Computational Systems Biology 16106, 5 cr
Olli Yli-Harja
Tampere University of Technology
Institute of Signal Processing
Topics covered on CSB1
Introduction
  Systems biology
  Introduction to Medicel integrator (SBSW)
Measurement systems in biosciences
  Array techniques
  Models of measurement systems
  Microscopy and image analysis
Data analysis
  Microarray data
  Supervised / unsupervised learning
Role of modeling in biosciences
  Quantitative models
  Discrete models
  Stochastic simulation
Sequence analysis
CSB1 – and other courses
Course homepage: SGN-6106 COMPUTATIONAL SYSTEMS BIOLOGY I.htm
Lecture and exercise schedule: lectures08.txt
Prerequisites
  Basic vocabulary in biosciences (ICSP_Program_2006-07.pdf)
  Basic skills in computation and signal processing (SGN-6106.html)
Continuation
  CSB2 – Genomic data, models and data analysis in depth
    CSB1 required
    Some more computational skills required (e.g. statistics)
Other course offerings at TUT/ISP
Computational biosciences
  Genomic signal processing, SGN-6206, 5 cr
  Computational models in complex systems, SGN-6467, 5 cr
  Complex systems 1, SGN-6307
Graduate seminars
  Seminar on Signal Processing for Systems Biology
  Signal Processing Graduate Seminar IV
Other useful courses
  Image processing
  Pattern recognition
  Statistical signal processing
  Knowledge mining
First lecture, 5.2.2007
Introduction to the research environment: TUT / Institute of Signal Processing
Introduction to the CSB research team at TUT/ISP: Signal Processing for Systems Biology
Teaching provided by the research team
  Curriculum in biotechnology
  Curriculum in computer science
Research topics, examples
Signal Processing for Systems Biology
Olli Yli-Harja
Tampere University of Technology
Institute of Signal Processing
Research environment
Institute of Signal Processing (TUT/ISP)
  Research personnel 170, 10 professors
  30-40% from abroad, also in faculty
  Topics: audio, transforms, image processing, multimedia, medical imaging, systems biology, cell phones, cellular signals, …
  Academy of Finland Centre of Excellence, 1996-2000-2006-2011
Tampere International Centre of Signal Processing
  Invites researchers to engage in research at the TUT/ISP
  ~40 visits yearly since 1997
  Catalyst of research and international connections
A truly international environment for research
What is systems biology?
Based on the availability of large-scale measurement data
  Knowledge of DNA & genes, microarray technology, single cell manipulation and measurement …
Advancing the use of regular scientific research approaches in biology:
  Mathematical models, model-based prediction, …, control, …, design
  Biology becoming part of physics! General laws in biology?
  Interplay between models and experiments
Why is it necessary?
  Amount of measurement data increases
  Biological knowledge increases
  Interpretation without tools is impossible
Problems
  Prohibitive complexity: cell, tissue, organism, environment
  Repeatability
    Controllability of the experimental setup
    How to obtain a comparable set of cells for a repeat experiment?
  Huge engineering effort needed in data integration
What has signal processing to offer for systems biology?
Signal processing:
  In the intersection of computer science, statistics, dynamic systems theory
  Manipulation, storing, retrieval, analysis of measurement data
  Systematic modeling of observations
  Design of optimal algorithms based on mathematical model
Develop novel signal processing methods that can be applied in modern systems biology
  Building on fundamental research of signal processing
  Inter-disciplinary collaboration with research partners
Main directions
  Tool development
    Quantitative models of biological systems
    Modelling of novel biosystem measurement technologies
    Image processing and analysis in an important role
  Applied research in medicine
    Data analysis
  Basic research in theoretical biology
    Discrete network models
Research team (started in 2001)
Statistical learning of gene regulation
  Dr. Harri Lähdesmäki, Dr. Miika Ahdesmäki, Antti Larjo, Xiaofeng Dai, Kirsti Laurila
Stochastic models of gene networks
  Dr. Andre Ribeiro, Tiina Rajala, Antti Häkkinen, Olli-Pekka Smolander
Boolean network dynamics
  Dr. Juha Kesseli, Manu Harju, Aku Antikainen
Computational Neuroscience
  Academy research fellow *Marja-Leena Linne, Tiina Manninen, Antti Pettinen, Katri Hituri, Eeva Mäkiraatikka, Heidi Teppola, Antti Saarinen, Jukka Intosalmi
Computational systems biology
  Professor Olli Yli-Harja, Dr. Matti Nykter, Dr. Antti Niemistö, Jari Yli-Hietanen, Tommi Aho, Nikhil, *Jenni Seppälä, Kaisa-Leena Taattola, Kalle Leinonen, Antti Ylipää, Virpi Kivinen, Timo Erkkilä, Roger Mallol Parera
Image processing
  Lecturer Heikki Huttunen, Pekka Ruusuvuori, Jyrki Selinummi, Kalle Rutanen, Tapio Manninen, Sharif Chowdhury, Hamid Dadkhahi
53 journal papers and 8 PhD theses in 2003-2007
Teaching in computational bioscience
New major in Bio- and environmental engineering in 2004:
  Computational systems biology for Biotechnology students
New major in Institute of Signal Processing in 2005:
  Computational systems biology for Computer Science students
Joint projects and joint guidance of students within Tampere area
Courses offered by the CSB research group
New courses, starting 2005 ->
  Introduction to computational systems biology, SGN-6056, 5 cr
  Computational systems biology 1, SGN-6106, 5 cr
  Computational systems biology 2, SGN-6156, 5 cr
  Computational models in complex systems, SGN-6467, 5 cr
  Complex systems 1, SGN-6307, 5 cr
Graduate seminars
  Seminar on Signal Processing for Systems Biology, SGN-6906, 2-3 cr
  Signal Processing Graduate Seminar IV, SGN-9406, 3-8 cr
Old courses
  Signal processing for systems biology, 40 students, 2003-2004
  Seminars in microscopy arranged, 15-20 students, 2001-2005
  Weekly research seminar, 20-30 students, since 2001
International collaboration
Research visits in 2002-2007 (~ 178 person-months ~ 15 person-years)
  Institute of Systems Biology, Seattle: Dr. Harri Lähdesmäki, Dr. Antti Niemistö, Dr. Matti Nykter, Antti Larjo, Jyrki Selinummi, Miika Ahdesmäki (74m)
  University of Texas, M. D. Anderson Cancer Center, Houston: Dr. Harri Lähdesmäki, Dr. Antti Niemistö, Dr. Matti Nykter, Prof. Olli Yli-Harja, Dr. Daniel Nicorici (27m)
  Institute for Biocomplexity and Informatics, University of Calgary: Dr. Pauli Rämö, Juha Kesseli, Tiina Manninen, Antti Lehmussola, Andre Ribeiro (12m)
  NHGRI, Washington & MIT Boston: Dr. Sampsa Hautaniemi (36m)
  University of Leipzig, Germany: Miika Ahdesmäki (3m)
  University of Jena, Germany: Tommi Aho (3m)
  Texas A&M University: Pekka Ruusuvuori (3m)
  University of Uppsala: Jenni Seppälä (2m)
  University of Antwerp: Katri Hituri (3m)
  University of York, UK: Kathryn Williams, X (6m)
  *Visiting Tampere from University of California in Santa Barbara: John Berger (12m)
TICSP Workshop on Computational Systems Biology held annually 2003-2006 WCSB06.htm
IEEE Workshop on Genomic Signal Processing in June 2007 Gensips 2007.htm
Xochicalco, Mexico, 2006
Research directions
[Diagram: research directions, arranged between basic research, applied research, and engineering]
  Simulation (quantitative models)
  Models and tools for measurement systems
  Image processing and analysis
  Statistical inference & stochastic models
  Data analysis & experiment design
  Discrete models & information approach
  Data integration
Simulation
Quantitative models of signalling and metabolic networks
  Ordinary and stochastic differential equations
  Case studies in mammal neurons, yeast, bacteria
Bioelectrical neuron models:
  Linne et al., A Model Integrating the Cerebellar Granule Neuron Excitability and Calcium Signaling Pathways, Neurocomputing, 2004
Evaluation of existing simulation tools:
  ***Pettinen et al., Simulation Tools for Biochemical Networks: Evaluation of Performance and Usability, Bioinformatics, 2005
Using stochastic differential equations:
  Manninen et al., Developing Itô stochastic differential equation models for neuronal signal transduction pathways, Computational Biology and Chemistry, 2006
  Saarinen et al., Stochastic differential equation model for cerebellar granule cell excitability, PLoS Computational Biology, 2008
Main collaborators
  Ari Huovila, Institute of Medical Technology, University of Tampere
  Jaakko Puhakka, Institute of Environmental Engineering and Biotechnology, Tampere University of Technology
[Figure: PKC signalling network in mammal neurons]
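The difference between an ordinary and a stochastic differential equation model can be illustrated with a minimal sketch. It uses the Euler-Maruyama scheme on a hypothetical one-species production/decay model; the rate constants `k`, `g` and noise level `sigma` are illustrative assumptions, not values from the cited papers, and the scheme is a generic textbook integrator rather than the method used in those studies.

```python
import math
import random


def euler_maruyama(x0, drift, diffusion, dt, n_steps, rng):
    """Integrate dX = drift(X) dt + diffusion(X) dW with the Euler-Maruyama scheme."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Wiener increment over one step: Gaussian with variance dt
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


# Hypothetical toy model: constant production k, first-order decay g * x
k, g, sigma = 10.0, 1.0, 0.5


def drift(x):
    return k - g * x


# With zero diffusion the scheme reduces to the explicit Euler method for the ODE
ode_path = euler_maruyama(0.0, drift, lambda x: 0.0, 0.01, 1000, random.Random(0))
# With additive noise the same drift gives a stochastic trajectory
sde_path = euler_maruyama(0.0, drift, lambda x: sigma, 0.01, 1000, random.Random(0))

print("ODE end value:", round(ode_path[-1], 3))
print("SDE end value:", round(sde_path[-1], 3))
```

The deterministic path settles near k/g, while the stochastic path keeps fluctuating around that level.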
Reactor simulation
Joint research project with Jaakko Puhakka, TUT Bio- and environmental engineering
[Diagram: computational research cycle (r/min) coupled to an experimental research cycle (r/y); stages: designing experiments, performing experiments, analyzing results, proposing new hypotheses]
Microbial hydrogen production
  Nikhil et al., Application of the clustering hybrid regression approach to model xylose-based fermentative hydrogen production, Energy and Fuels, 2007
  Akhbardeh et al., Towards the experimental evaluation of novel supervised fuzzy adaptive resonance theory for pattern classification, Pattern Recognition Letters, 2007
Models of biosystem measurement systems
Modeling and inversion of the effects of the measurement system:
  Lähdesmäki et al., In silico microdissection of microarray data from heterogeneous cell populations, BMC Bioinformatics, 2005
  Ahdesmäki et al., Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data, BMC Bioinformatics, 2007
Microarray model
  ***Nykter et al., Simulation of microarray data with realistic characteristics, BMC Bioinformatics, 2005
  Lehmussola et al., Evaluating the performance of microarray segmentation algorithms, Bioinformatics, 2006
Main collaborators:
  Wei Zhang, University of Texas M. D. Anderson Cancer Center
  Stuart Kauffman, Institute for Biocomplexity and Informatics, University of Calgary
Signal processing approach:
  Systematic modeling of observations
  Design of optimal algorithms based on mathematical models
  Validation of models and analysis tools
  Integrated image analysis
Effects of Cell Population Asynchrony in Gene Expression Time-Series
Gene expression is measured from a cell population
Cell population gradually loses its synchrony
This corresponds to a time-varying low-pass filtering
  Lähdesmäki et al., Estimation and Inversion of the Effects of Cell Population Asynchrony in Gene Expression Time-Series, Signal Processing, 2003
  Niemistö et al., Computational methods for estimation of cell cycle phase distributions of yeast cells, EURASIP Journal on Bioinformatics and Systems Biology, 2007
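The smoothing effect of lost synchrony can be sketched with a toy simulation. The assumptions here are illustrative and not taken from the cited papers: every cell expresses the gene as a cosine of its own cell-cycle phase, and cycle lengths vary between cells as a Gaussian with mean 100 and standard deviation 5 (arbitrary time units). The measured signal is the population average.

```python
import math
import random


def population_signal(times, periods):
    """Average expression over cells, each oscillating with its own cycle length."""
    n_cells = len(periods)
    return [
        sum(math.cos(2.0 * math.pi * t / p) for p in periods) / n_cells
        for t in times
    ]


def peak_amplitude(signal, times, t_start, t_end):
    """Largest absolute value of the signal within the window [t_start, t_end)."""
    return max(abs(s) for s, t in zip(signal, times) if t_start <= t < t_end)


rng = random.Random(1)
# Hypothetical cycle-length distribution: all cells start in phase at t = 0
periods = [rng.gauss(100.0, 5.0) for _ in range(2000)]
times = list(range(0, 1001, 5))
signal = population_signal(times, periods)

early = peak_amplitude(signal, times, 0, 100)     # first cycle, cells in synchrony
late = peak_amplitude(signal, times, 900, 1001)   # tenth cycle, synchrony lost
print("early amplitude:", round(early, 3))
print("late amplitude:", round(late, 3))
```

Each cell still oscillates with full amplitude, but the population average is progressively damped, which is the behaviour the slide describes as a low-pass filter whose smoothing grows with time.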
Subcellular Image Analysis
Analysis of dispersion of the Golgi apparatus after different treatments (SH-SY5Y neuroblastoma cells)
Jyrki Selinummi, Antti Lehmussola, Jertta-Riina Sarkanen, Jonna Nykky, Tuula O. Jalonen, Olli Yli-Harja, “Automated Analysis of Golgi Apparatus Dispersion in Neuronal Cell Images”, Proc. 4th TICSP Workshop on Computational Systems Biology (WCSB 2006), Tampere, 2006.
Image processing and analysis for biomeasurements
Quantitative measurements based on microscope images of cell populations
Analysis and synthesis of images
  Simulation of microscope images!?
Measurements based on imaging:
  Niemistö et al., Robust quantification of in vitro angiogenesis through image analysis, IEEE Transactions on Medical Imaging, 2005
  Niemistö et al., Analysis of angiogenesis using in vitro experiments and stochastic growth models, Physical Review E, 2005
  ***Selinummi et al., Software for quantification of labeled bacteria from digital microscope images by automated image analysis, Biotechniques, 2005
  CellC software, freely available
Main collaborators
  Wei Zhang, University of Texas M. D. Anderson Cancer Center
  Jaakko Puhakka, Institute of Environmental Engineering and Biotechnology, Tampere University of Technology
  Olli-Pekka Kallioniemi, Technical Research Centre and University of Turku, Finland
Motivation for the Use of Simulation in Validation
Xiaobo Zhou, Stephen T.C. Wong, ”Informatics challenges of high-throughput microscopy”, IEEE Signal Processing Magazine, May 2006:
”..when we detect or track lots of cells, proteins, and neuron spines, how can we validate the extracted results of automated analysis since it is almost impossible to perform similar manual analysis in high-throughput scale?”
”Manual methods can only be used to tens or at most up to hundreds of images. They would make lots of counting errors when analyzing hundreds or thousands of images… Thus, simulation of complex biological processes and images becomes an urgent issue.”
”Should such simulation model exist, it would be invaluable in validating HTS image analysis algorithms without relying on laborious and costly manual validation”
Simulation of Images of Cell Populations
Motivation for simulation of measurement systems
  Unlimited amounts of free data
    Free of financial and time constraints
  Total control of the experiments
    Hypothesis testing
    Education
  Ground-truth information
    Valuable information for benchmarking various analysis methods
    Reduces requirements for preprocessing and quality control
Lehmussola et al., Computational framework for simulating fluorescence microscope images with cell populations, IEEE Transactions on Medical Imaging, 2007
Simulation of Images of Cell Populations
Simulated images share properties of real fluorescence microscope images
  Example: real image (A) and synthetic image (B)
  Realistic enough for validating e.g. image analysis algorithms
Comparison of Cell Enumeration Algorithms
Cell enumeration software packages are compared
  Five different clustering conditions
  Each image has the same number of cells (1000)
  Cells become more clustered and more overlapping
Results of Comparison
•ImageJ: National Institutes of Health
•CellProfiler: Whitehead Institute for Biomedical Research and MIT's CSAIL
•MCID Analysis: Imaging Research, Inc.
•CellC: TUT
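The idea of benchmarking a counting method against simulated ground truth can be sketched as follows. This is a deliberately crude stand-in, not the simulator of Lehmussola et al. nor any of the compared packages: cells are drawn as binary discs, and the "enumeration algorithm" is plain connected-component counting. The image size, cell count, radius, and cluster spread are hypothetical choices.

```python
import random


def render_cells(centers, radius, size):
    """Draw each cell as a filled disc in a size x size binary image."""
    image = [[0] * size for _ in range(size)]
    for cx, cy in centers:
        for y in range(max(0, cy - radius), min(size, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(size, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    image[y][x] = 1
    return image


def count_objects(image):
    """Count 4-connected foreground components with an iterative flood fill."""
    size = len(image)
    visited = [[False] * size for _ in range(size)]
    count = 0
    for y0 in range(size):
        for x0 in range(size):
            if image[y0][x0] and not visited[y0][x0]:
                count += 1
                visited[y0][x0] = True
                stack = [(x0, y0)]
                while stack:
                    x, y = stack.pop()
                    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                        if (0 <= nx < size and 0 <= ny < size
                                and image[ny][nx] and not visited[ny][nx]):
                            visited[ny][nx] = True
                            stack.append((nx, ny))
    return count


def clamp(value, size):
    """Keep a coordinate inside the image."""
    return min(size - 1, max(0, value))


rng = random.Random(3)
size, n_cells, radius = 200, 50, 3  # hypothetical simulation parameters

# Condition 1: cells spread uniformly over the image
spread = [(rng.randrange(size), rng.randrange(size)) for _ in range(n_cells)]

# Condition 2: the same number of cells, packed around five cluster centres
hubs = [(rng.randrange(size), rng.randrange(size)) for _ in range(5)]
clustered = []
for _ in range(n_cells):
    hx, hy = rng.choice(hubs)
    clustered.append((clamp(round(rng.gauss(hx, 4.0)), size),
                      clamp(round(rng.gauss(hy, 4.0)), size)))

count_spread = count_objects(render_cells(spread, radius, size))
count_clustered = count_objects(render_cells(clustered, radius, size))
print("true cells:", n_cells)
print("counted (spread):", count_spread)
print("counted (clustered):", count_clustered)
```

Because the true number of cells is known by construction, the counting error under each clustering condition can be read off directly, with no manual annotation.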
Explorative data analysis 1
Clustering and explorative analysis:
  ***Lähdesmäki et al., (2004) Distinguishing Key Biological Pathways Between Primary Breast Cancers and their Lymph Node Metastases by Gene Function-based Clustering Analysis, International Journal of Oncology
  D. Nicorici et al., (2006) Finding Large Domains of Similarly Expressed Genes using MDL Principle, IEEE Engineering in Medicine and Biology Magazine
Inter-disciplinary cancer research
Normalization, visualization
Main collaborators:
  Wei Zhang, University of Texas M. D. Anderson Cancer Center
  Olli-Pekka Kallioniemi, Technical Research Centre and University of Turku, Finland
  Jukka Partanen, Finnish blood service
  Lauri Aaltonen, University of Helsinki
Explorative data analysis 2
Visualization, validation of results
  ***Nykter et al., Unsupervised analysis uncovers changes in histopathologic diagnosis in supervised genomic studies, Technology in Cancer Research and Treatment, 2006
Main collaborators:
  Wei Zhang, University of Texas M. D. Anderson Cancer Center
  Jukka Partanen, Finnish blood service
[Figure: MDS of microarray data, labels from clinical database]
Statistical learning algorithms to complex problems in systems biology
  ***Lähdesmäki et al., Probabilistic inference of transcription factor binding from multiple data sources, PLoS ONE, 2008
  Lähdesmäki et al., Learning the structure of dynamic Bayesian networks from time series and steady state measurements (submitted)
  Lähdesmäki et al., On learning gene regulatory networks under the Boolean network model, Machine Learning, 2003
  Ahdesmäki et al., Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data, BMC Bioinformatics, 2007
  Ahdesmäki et al., Robust detection of periodic time series measured from biological systems, BMC Bioinformatics, 2005
[Figure legend]
  1-3 Promoter sequence specificities
  4 Conservation (comparative to 28 other species)
  5 Predicted nucleosome positions
  6 Estimated neutral vs. regulatory
Transcriptional regulatory processes
Signalling pathways
Statistical inference
Main collaborators:
  Ilya Shmulevich, Institute of Systems Biology, Seattle
From stochastic gene regulatory networks to phenotype
Stochasticity in GRNs affects the development and function of organisms. We investigate phenotypic variability arising from noise in the underlying GRN.
Milestones:
  Ribeiro et al., General Modeling Strategy for Gene Regulatory Networks with Stochastic Dynamics, J. Comp. Bio., 2006
  Ribeiro et al., SGN Sim, a Stochastic Genetic Networks Simulator, Bioinformatics, 2007
  Ribeiro et al., Noisy Attractors and Ergodic Sets in Models of Genetic Regulatory Networks, J. Theo. Bio., 2007
  Ribeiro et al., Effects of coupling strength and space on the dynamics of coupled toggle switches in stochastic gene networks with multiple-delayed reactions, Phys Rev E, 2007
  Ribeiro et al., Dynamics of a two-dimensional model of cell tissues with coupled stochastic gene networks, Phys Rev E, 2007
  Ribeiro et al., Mutual Information in Random Boolean models of regulatory networks, Phys Rev E, 2007
Main collaborators:
  Stuart A. Kauffman, Inst. Biocomp and Informatics, Univ. of Calgary
  Fred G. Biddle, Dep. of Medical Genetics, Univ. of Calgary
  John J. Grefenstette, Dep. Bioinformatics, George Mason University
  Daniel Cloud, Dep. of Philosophy, Princeton Univ.
  Rui Zhu, Dep. of Chemistry, Univ. of Calgary
Current goals:
Cells are parallel information processing systems, binding past events to future actions. Cells in tissues work collectively. We seek general principles governing these organizations.
  Variability of cells in lineages and populations’ phenotypes
  Effects of mutations and irradiation on the P53-Mdm2 loop
  Dynamics of cells in tissues
  Information propagation in GRN
Simulation of P53 concentration time series in normal cells of a cell lineage responding to DNA double strand breaks due to irradiation in the mother cell.
Results: simulation of P53-Mdm2 feedback loop using the multi time-delayed SSA
[Figure: P53 time series of a cell lineage. Cell divisions occur at each 1000 s. Axes: #P53 (0 to 2000) against time (0 to 3000 s). Series: P53 in cells 1,1; 2,1; 2,2; 3,1; 3,4; 3,3; DSB's]
[Figure: Time series of P53 in a cell line where ProP53 is subject to duplication. Axes: #P53 (0 to 2000) against time (0 to 6000 s). Series: P53 in cells 1,1; 2,1; 2,2; 3,1; 3,4; DSB (cell 1,1); DSB (cell 2,1)]
P53 time series in cells of lineage where the P53 gene is subject to duplication during each cell life time.
Experimental measures of P53 concentration in a cell lineage responding to DNA double strand breaks due to irradiation.
Geva-Zatorsky et al, Molec Sys Bio, 2, 2006
A.S. Ribeiro et al., CellLine - Cell Lineages Gene Networks Simulator, Bioinformatics, 2007.
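For readers unfamiliar with the stochastic simulation algorithm (SSA) behind these results, a minimal sketch of the basic Gillespie method follows. It is not the multi time-delayed SSA used for the P53-Mdm2 loop, and it is not the CellLine or SGN Sim implementation: it simulates a hypothetical single protein with constant production and first-order degradation, with made-up rate constants.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, rng):
    """Basic (non-delayed) Gillespie SSA for: 0 -> P at k_prod, P -> 0 at k_deg * n."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        # Propensities of the two reactions in the current state
        a_prod = k_prod
        a_deg = k_deg * n
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break
        # Waiting time to the next reaction is exponential with rate a_total
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to its propensity
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


# Hypothetical rates; the expected steady-state level is k_prod / k_deg = 100
times, counts = gillespie_birth_death(
    k_prod=10.0, k_deg=0.1, n0=0, t_end=200.0, rng=random.Random(42)
)
print("reactions simulated:", len(times) - 1)
print("final molecule count:", counts[-1])
```

A delayed SSA extends this loop by queueing products that are released only after a specified delay; that extension is omitted here.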
Discrete large-scale models of biosystems
Complex systems
  General laws in biology
  New biological observables
  Randomized network ensembles
Annealed Boolean dynamics:
  Kesseli et al., On Spectral Techniques in Analysis of Boolean Networks, Physica D, 2005
  Kesseli et al., Tracking perturbations in Boolean networks with spectral methods, Physical Review E, 2005
  Kesseli et al., Iterated maps for annealed Boolean dynamics, Physical Review E, 2006
Criticality in gene regulatory networks
  Rämö et al., Stability of Functions in Boolean Models of Gene Regulatory Networks, Chaos, 2005
  ***Rämö et al., Perturbation Avalanches and Criticality in Gene Regulatory Networks, Journal of Theoretical Biology, 2006
Flux analysis
  Aho et al., RMBNToolbox: random models for biochemical networks, BMC Systems Biology, 2007
Main collaborators:
  Stuart Kauffman, Institute for Biocomplexity and Informatics, University of Calgary
  Ilya Shmulevich, Institute of Systems Biology, Seattle
$x_{t+1} = f(x_t)$
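In a synchronous Boolean network the whole state vector is updated at once, so the next state is a function of the current state only. The sketch below builds a small random Boolean network and iterates that update until the trajectory enters an attractor. The network size, the number of inputs per node, and the random truth tables are illustrative assumptions, not a model from the cited papers.

```python
import random


def make_random_network(n_nodes, k_inputs, rng):
    """Give every node k_inputs distinct random input nodes and a random truth table."""
    inputs = [rng.sample(range(n_nodes), k_inputs) for _ in range(n_nodes)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k_inputs)]
              for _ in range(n_nodes)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronous update: every node reads the current state, then all switch together."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for i in node_inputs:
            index = (index << 1) | state[i]  # encode the input values as a table row
        new_state.append(table[index])
    return tuple(new_state)


def find_attractor(state, inputs, tables):
    """Iterate until a state repeats; return (transient length, attractor cycle length)."""
    first_seen = {}
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return first_seen[state], t - first_seen[state]


rng = random.Random(7)
inputs, tables = make_random_network(n_nodes=10, k_inputs=2, rng=rng)
initial = tuple(rng.randint(0, 1) for _ in range(10))
transient, cycle_length = find_attractor(initial, inputs, tables)
print("transient length:", transient)
print("attractor cycle length:", cycle_length)
```

Since the state space is finite and the update is deterministic, every trajectory must eventually repeat a state, so the search always terminates.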
Information theoretic approach to biology
Inference of Boolean networks from data:
  Lähdesmäki et al., Relationships between probabilistic Boolean networks and dynamic Bayesian networks as models of gene regulatory networks, Signal Processing, 2006
Information theoretic approach
  Rämö et al., Information Propagation in Models of Gene Regulatory Networks, Physica D, 2007
  Nykter et al., Critical networks exhibit maximal information diversity in structure-dynamics relationships, Physical Review Letters, 2007
  ***Nykter et al., (2008) Gene expression dynamics in the macrophage exhibit criticality, PNAS, 2008
Main collaborators:
  Ilya Shmulevich, Institute of Systems Biology, Seattle
  Stuart Kauffman, Institute for Biocomplexity and Informatics, University of Calgary
  Leroy Hood, Institute of Systems Biology, Seattle
Abstraction and zooming in
Adding up the details
We need to balance both: To analyze the trees (idiosyncratic details), yet appreciate the forest (‘universal’ principles)
Image courtesy of Sui Huang, 2006. Back to the biology in systems biology, Briefings in Functional Genomics & Proteomics, Oxford Univ. Press, 2004