Date post: 22-Dec-2015

STBC2023 – Introduction to Bioinformatics

Intermolecular Interactions & Biological Pathways

M. Firdaus Raih
Room 1166, Bangunan Sains Biologi

Office Hours: Wednesdays
Phone: 0389215961
Email: [email protected]

Ver. 02-Mar-09-1

Pre-session Questions

• Do biological macromolecules act alone?
• What are intermolecular interactions?
• What are biological pathways?
• How do we identify biological interactions?
• Can these interactions and pathways be organised?
• Where can information on intermolecular interactions and biological pathways be found?
• Can these interactions and pathways be visualised?
• Can these interactions and pathways be searched and compared?
• What can be derived from the compiled information on interactions/pathways?

Do biological macromolecules act alone?

• What are the macromolecules involved?
  DNA, RNA, proteins and carbohydrates/sugars.
• Do these macromolecules act alone?
  No. Think of the central dogma of molecular biology and relate it to these questions.
• For example, in the central dogma:
  • DNA replication involves the interaction of a protein (DNA polymerase) with the DNA being copied.
  • RNA synthesis involves interactions of proteins with DNA and RNA.
  • Protein synthesis involves interactions of proteins with RNA.
• Many biological functions depend on intermolecular interactions in order to take place.

What are intermolecular interactions?

• What are intermolecular interactions?
  Interactions between different molecules. We are interested in the functions or outcomes of these interactions.
• What are biological pathways?
  Interactions result in reactions, and the outcomes of those reactions take part in further interactions, and so on. Such a series of reactions forms a pathway. A pathway can be defined as a modular unit of interacting molecules that fulfils a cellular function, e.g. a metabolic pathway. Can you think of other examples?
• Where is all this leading to?
  The post-genomics age: a huge amount of data is now available. The individual biochemical components of an organism can be assembled into a representative model of that organism (systems biology).

What are intermolecular interactions?

Challenges of the post-genomics age:

– Data integration: integrate diverse biological information (amidst constant data generation)
  • Scientific literature, the existing body of knowledge about cellular systems
  • Genomic sequences
  • Protein sequences, motifs and structures
  • Expression data from microarrays, dbEST and RT-PCR
  • Protein-protein interaction data from large-scale screening
– Functional discovery: assign functions to the 60,000+ human genes
  • Only 5% of known genes have an assigned function.
  • We have no clue what the function is for the majority of discovered genes.
  • In addition to the genes themselves, the epigenome is also believed to play an important role.
  • Without understanding function, drug discovery cannot be done for either small molecules or biopharmaceuticals.
  • This will be the focus of the next 20 years of life-science research.

Where does the data come from?

• Generally, the data on interactions come from the (wet) laboratory.

• Numerous methods are available to study different types of macromolecular interactions.

• These studies are then collated into pathways.

• It then becomes necessary to organise/store, integrate and visualise interactions/pathways data.

• It must also be possible to interrogate and compare the integrated data.

Databases of Interactions and Pathways

• The level of complexity of pathway databases is higher than that of the databases looked at previously.

– The genome
  • 4 bases
  • 3 billion bp total
  • 3 billion bp/cell, identical
– The proteome
  • 20 amino acids
  • ~60K genes, ~200K proteins
  • ~10K proteins/cell; different cells/conditions, different expressions
– The pathome
  • ~200K reactions
  • ~20K pathways
  • ~1K pathways/cell; different cells/conditions, different expressions

Databases of Interactions and Pathways

• Homology (evolutionary relationships) is perhaps the most important assumption for bioinformatics analysis at the sequence level.

• For pathways, evolutionary relationships also apply.

• For pathways we look at:
  • Evolution of a simple pathway into a more complex pathway
  • Duplication, diversification and modular re-use of pathways

• This can be done by first building the pathways and then comparing them. This is not unlike sequence and structure comparison, albeit with its own particularities.

Databases of Interactions and Pathways

• Why study pathways?
  To answer fundamental questions about biology.
  • What are the minimum pathways for life?
  • How does a new function arise?

• How can answers to these fundamental questions be applied? Some examples:
  • Complete information on pathways will give rise to knowledge of what is essential and what is redundant. Such knowledge can then be tapped for the discovery and/or design of drugs.
  • The uniqueness of individual pathomes can be tapped for concepts such as personalized medicine.

Databases of Interactions and Pathways

What are the data types, file formats and data structures/representations for pathway databases?

– Data types:
  • Sequence, interaction, ligand, literature, annotations (functions, mechanisms etc.), structures, structural interactions etc.
– Data (file) formats – mainly text
  • Such as FASTA format for sequences, PDB format for structures, and miscellaneous text files containing data on interactions, relationships etc.
– These individual bits of data can be represented visually, while the data can be structured and arranged as tables in relational databases.

Databases of Interactions and Pathways

Relational database implementation example (with only protein nodes shown).

[Figure: relational schema diagram. Arrows show the info flow direction between tables; the linking keys are gene_id, seq_id, motif_id, pathway_id, literature_id, and protein = seq_id.]

Table               Fields
Gene_Table          gene_id, chromosome, start, stop
Protein_Table       seq_id, cellular location, seq_txt, gene_id
Interaction_Table   protein A, protein B, pathway_id, literature_id
Pathway_Table       pathway_id, pathway_name, description, species, curator, entry_data
Protein_Motifs      motif_id, seq_id <fk>
Motif_Def_Table     motif_id, description, regular expression, HMM_matrix
Literature_Table    literature_id, author, journal, pub_date, PDF_file
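
The relational layout above can be tried out with a small script. The following is only a minimal sketch using Python's built-in sqlite3 module: it keeps four of the seven tables and a subset of their fields, the column names protein_a, protein_b and cellular_location are adaptations of the slide's labels, and every row is invented toy data rather than real pathway content.

```python
import sqlite3

# In-memory database. Table names follow the slide; only a subset of the
# tables and fields is created to keep the sketch short (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gene_Table (
    gene_id     TEXT PRIMARY KEY,
    chromosome  TEXT,
    "start"     INTEGER,
    "stop"      INTEGER
);
CREATE TABLE Protein_Table (
    seq_id            TEXT PRIMARY KEY,
    cellular_location TEXT,
    seq_txt           TEXT,
    gene_id           TEXT REFERENCES Gene_Table(gene_id)
);
CREATE TABLE Pathway_Table (
    pathway_id   TEXT PRIMARY KEY,
    pathway_name TEXT,
    curator      TEXT
);
CREATE TABLE Interaction_Table (
    protein_a  TEXT REFERENCES Protein_Table(seq_id),
    protein_b  TEXT REFERENCES Protein_Table(seq_id),
    pathway_id TEXT REFERENCES Pathway_Table(pathway_id)
);
""")

# Toy rows, invented purely for illustration.
conn.execute("INSERT INTO Gene_Table VALUES ('g1', '1', 100, 900)")
conn.execute("INSERT INTO Gene_Table VALUES ('g2', '2', 50, 700)")
conn.execute("INSERT INTO Protein_Table VALUES ('p1', 'cytoplasm', 'MKT', 'g1')")
conn.execute("INSERT INTO Protein_Table VALUES ('p2', 'nucleus', 'MSA', 'g2')")
conn.execute("INSERT INTO Pathway_Table VALUES ('pw1', 'toy pathway', 'curator_x')")
conn.execute("INSERT INTO Interaction_Table VALUES ('p1', 'p2', 'pw1')")

# Which proteins interact within each pathway, and who curated the entry?
rows = conn.execute("""
    SELECT pw.pathway_name, i.protein_a, i.protein_b, pw.curator
    FROM Interaction_Table AS i
    JOIN Pathway_Table AS pw ON pw.pathway_id = i.pathway_id
""").fetchall()
print(rows)
```

The join on pathway_id is what the linking keys in the diagram make possible: an interaction row stores only identifiers, and the descriptive information is looked up in the table that owns it.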

Visualisation of Interactions and Pathways

• Lab data can be collated and built into a pathway.

• In general, pathways can be visualised symbolically.

[Figure: a circle indicates a protein or a non-protein biomolecule. A symbol in between indicates the nature of the molecule-molecule interaction.]

Visualisation of Interactions and Pathways

• Generally we can visualise a pathway as a network.

• In graph theory:
  Relationships between data can be visualised using constructs referred to as graphs. A graph consists of a set of nodes (vertices) connected by edges (arcs), which define the relationship(s) between the nodes.

[Figure: example graph with its vertices/nodes and arcs/edges labelled.]

Visualisation of Interactions and Pathways

• Another feature of graphs in graph theory is that they can be directed or undirected. In a directed graph each edge has a direction, so the relationship between two nodes can be drawn as an arrow (a vector) from one node to the other.

• How can a pathway visualisation be a graph?

• What other examples of graphs can you think of?
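
One way to approach the question of how a pathway can be a graph is to write a small pathway down as a list of edges. The sketch below is an illustration only: the molecules A, B and C are placeholders for a linear pathway A -> B -> C, and the helper name neighbours is made up for this example.

```python
# A toy linear pathway, A -> B -> C, written as directed edges.
# The molecule names are placeholders, not real pathway data.
directed_edges = [("A", "B"), ("B", "C")]

def neighbours(edges, node, directed=True):
    """Return the set of nodes reachable from `node` in one step."""
    result = set()
    for source, target in edges:
        if source == node:
            result.add(target)
        # In an undirected graph an edge can be followed both ways.
        if not directed and target == node:
            result.add(source)
    return result

print(neighbours(directed_edges, "B"))                  # directed reading
print(neighbours(directed_edges, "B", directed=False))  # undirected reading
```

Read as a directed graph, B leads only to C; read as an undirected graph, B is connected to both A and C. Which reading is appropriate depends on whether the interaction being modelled has a direction, such as substrate to product.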

Visualisation of Interactions and Pathways

• How can graphs be represented for the computer?
  The graphs can be represented by matrices.

[Figure: graph with nodes a, b, c and d, and edges weighted 8, 4, 6, 2 and 10.]

Adjacency matrix

     a    b    c    d
a              8    4
b
c         6
d        10    2

Adjacency list

a: c (8), d (4)
b:
c: b (6)
d: c (2), b (10)
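
Both representations can be built from the same list of weighted edges. The following is a minimal sketch in plain Python; it assumes that each adjacency-list entry on the slide is a directed edge from the row node to the listed node, and it uses 0 to mark a missing edge, which is a convention chosen here rather than something stated on the slide.

```python
# Weighted edges read off the adjacency list on the slide,
# interpreted as directed edges (source, target, weight).
edges = [("a", "c", 8), ("a", "d", 4), ("c", "b", 6), ("d", "c", 2), ("d", "b", 10)]
nodes = ["a", "b", "c", "d"]
index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: matrix[i][j] holds the weight of edge i -> j, 0 if absent.
matrix = [[0] * len(nodes) for _ in nodes]
# Adjacency list: each node maps to its (neighbour, weight) pairs.
adj_list = {name: [] for name in nodes}

for source, target, weight in edges:
    matrix[index[source]][index[target]] = weight
    adj_list[source].append((target, weight))

for name, row in zip(nodes, matrix):
    print(name, row)
print(adj_list)
```

The matrix makes it quick to check whether a particular edge exists, while the list stores only the edges that are present, which saves space when most pairs of nodes are not connected.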

Curation of Pathway Databases

• Database entries can come from:
  • Raw experimental data
  • Other databases / existing datasets
  • Annotations / supplementary information
  The source data can themselves contain errors, in addition to errors introduced during entry.

• Database curation involves not only annotation and integration of new data but also measures which include quality control and data integrity safeguards.

• Typically, databases are populated via high-throughput means and then curated by computer programs or by expert manual curation via appropriate interfaces. The same concepts also apply to pathway databases.

• Many databases are relational databases; however, some employ other approaches such as XML.

• Some databases validate data at the same time as they curate it.

Databases of Interactions and Pathways

Relational database implementation example (with only protein nodes shown; same schema as in the earlier slide).

In this example:
• A pathway database can be built by integrating data from external databases.
• Tables other than the Pathway_Table are sourced from existing data sources.
• Data for the Pathway_Table are curated manually, either from raw data or from collated data/information.
• A 'curator' field identifies the curator of a particular entry.

Navigating Pathway Databases

• Overall search strategy
  • Relevance and specificity of purpose of the database used: there is a growing number of databases; some are specific to particular pathways, diseases or organisms, while others are well established for well-known metabolic pathways. Searches must therefore be targeted at the correct resource.

• Several methods can be used to navigate pathway databases:
  • Browsing
    • Hyperlinks and menus, based on protein name/family, ligand name, substrates, pathway name etc.
    • Hyperlinked images / pathway diagrams to browse pathways reaction by reaction.
  • Directed searches
    • Boolean-type searches / ID or keyword searches
    • Sequence database searches
    • Profile searches
    • Motif searches
    • Structure matching searches (e.g. small-molecule comparisons to identify similar ligands)
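
As an illustration of a motif search, a motif stored as a regular expression can be scanned against a set of protein sequences. The sketch below uses Python's re module with the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} rewritten as a regular expression; the sequence identifiers and sequences are invented, and the helper name motif_hits is hypothetical. It is not how any particular pathway database implements its search.

```python
import re

# PROSITE N-glycosylation pattern N-{P}-[ST]-{P} as a regular expression.
MOTIF = re.compile(r"N[^P][ST][^P]")

# Hypothetical records: identifiers and sequences are invented for illustration.
records = {
    "seq1": "MKNGSAL",
    "seq2": "MKNPSAL",
    "seq3": "AANATQNLTK",
}

def motif_hits(sequences, pattern):
    """Map each sequence ID to the 1-based start positions of its motif matches."""
    hits = {}
    for seq_id, seq in sequences.items():
        # finditer reports non-overlapping matches, scanning left to right.
        positions = [m.start() + 1 for m in pattern.finditer(seq)]
        if positions:
            hits[seq_id] = positions
    return hits

print(motif_hits(records, MOTIF))
```

Sequences without a match are simply left out of the result, so the output doubles as a filtered list of candidate entries to inspect further.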

Searching & Comparing Pathways

• Comparisons of pathway graphs can be used to search for similar pathways.

• Comparisons can be done using several established algorithms, and the choice depends on the type of comparison carried out.
  • Example: graph comparisons can be carried out as graph/subgraph isomorphism problems.

Can you present how this can be done in concept?

[Figure: an input pattern (a small graph of vertices/nodes and arcs/edges) is compared by the algorithm against subgraphs of a larger graph.]

How can the previous subgraph isomorphism representation be turned into an input for computation?

Can you find other ways in which pathway searches and comparisons can be carried out?

Can you find other applications of graph theory in bioinformatics?
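
As one possible answer to the question of how a subgraph comparison can be made computable, both the input pattern and the pathway can be given as node and edge lists, and every assignment of pattern nodes to pathway nodes can be tested. The sketch below is a brute-force illustration only: it checks that each pattern edge is present in the pathway (it does not require the absence of extra edges), it becomes slow for large graphs, and the function name and all node names are made up for this example. Established algorithms are far more efficient.

```python
from itertools import permutations

def find_subgraph_matches(pattern_nodes, pattern_edges, target_nodes, target_edges):
    """Return every mapping of pattern nodes onto distinct target nodes that
    preserves all directed pattern edges. Brute force over all assignments."""
    target_edge_set = set(target_edges)
    matches = []
    for candidate in permutations(target_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, candidate))
        if all((mapping[u], mapping[v]) in target_edge_set for u, v in pattern_edges):
            matches.append(mapping)
    return matches

# Toy pathway graph and a two-step input pattern; node names are placeholders.
pathway_nodes = ["a", "b", "c", "d"]
pathway_edges = [("a", "c"), ("a", "d"), ("c", "b"), ("d", "c"), ("d", "b")]
pattern_nodes = ["x", "y", "z"]
pattern_edges = [("x", "y"), ("y", "z")]

matches = find_subgraph_matches(pattern_nodes, pattern_edges, pathway_nodes, pathway_edges)
for match in matches:
    print(match)
```

Each printed mapping is one place in the pathway where the two-step pattern x -> y -> z occurs, which is the sense in which a pattern graph can be used as a search query.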

Investigating Interactions

• To identify which approach is best fitted for your purposes, several questions need to be posed and answered. Examples:
  • What is it ultimately that you are interested in?
    • Components of a single interaction?
    • Interactions involving members of a family of proteins?

• Some strategies which can be considered:
  • Interested in a single interaction?
    • Extract the information available for the interaction of interest; identify the components and other information of interest.
    • Is structural interaction data available? Visualize and explore the interaction at the atomic level. What is involved in this interaction at the atomic level?
    • Are there existing inhibitors? Find other similar inhibitors, perhaps by small-molecule structure comparison methods.
  • Interested in understanding interactions for a family of proteins, for specific sets, or in general (getting the macro information out)?
    • What is important at this level? Generalisation, which can be presented as statistics.
    • For example: (1) Identify all protein structures which are bound to DNA. (2) Get statistics for the binding: which residues prefer to bind DNA, which motifs, etc.?

What can such information be used for?
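
The second step of the DNA-binding example, getting statistics for the binding, could be sketched as a simple count over contact lists. The code below assumes that step (1) has already produced, for each protein-DNA complex, the list of residues in contact with DNA; the complex names and residue lists are invented toy data and carry no biological weight.

```python
from collections import Counter

# Hypothetical input: for each protein-DNA complex, the residues found in
# contact with DNA. Complex names and residue lists are invented toy data.
dna_contacts = {
    "complex_1": ["ARG", "LYS", "ARG", "SER"],
    "complex_2": ["ARG", "ASN", "LYS"],
    "complex_3": ["LYS", "ARG", "THR"],
}

# Pool the contacts from all complexes and count each residue type.
counts = Counter(res for residues in dna_contacts.values() for res in residues)
total = sum(counts.values())

for residue, n in counts.most_common():
    print(f"{residue}: {n} contacts ({n / total:.0%})")
```

With real structural data, the same tally would need to be normalised against how common each residue is in proteins overall before any preference could be claimed.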

Building Pathway Databases: in Practice

Pathway Databases: Examples

Pathway Databases: in Practice

Applying Pathway Information

What can pathway information be used for?
• Understanding pathways in order to inhibit / intervene.
• To build up background data for simulation purposes.

Useful for:
• Drug discovery and design.
• Experimental design and refinements.
• Whole cell/organism simulations.
• Synthetic biology.
• Personalized medicine.

Self Study and Self Assessment

• The self study module for this series of lectures on the analysis of intermolecular interactions and biological pathways will be available for download from SPIN. The file format is PowerPoint Show (.pps).

• The self assessment quiz is accessible from within the SPIN interface.

• Both these materials are for self assessment and self study use and DO NOT contribute to your final grades for this course.

• Also explore the references and texts listed in the course information file and reading list.

• Explore resources made available via this self-study material.

Guide

• This is an electronic self study and self assessment module based on the lectures which cover the topic Intermolecular Interactions & Biological Pathways of the STBC2023 – Introduction to Bioinformatics course.

• To navigate this module, use the buttons provided, mostly in the bottom right-hand corner of the page or, in some slides, the bottom left-hand corner. The Home icon button returns to the slide with the key questions which we are trying to answer with this course material. Several pages have hyperlinks which navigate either to specific slides or away from this module via the default web browser. To return, simply switch back to this file. If a button is not clicked properly, the slides advance in normal PowerPoint slideshow order instead of navigating to the directed pages.

• Practicals and self assessment questions to gauge your comprehension of a given concept or practical session are also provided throughout. Please attempt the practicals and the questions on your own before resorting to the solutions or answers provided.

Further Reading

Recommended Textbook (Lesk, 2nd Ed.)
• Basics – Chapter 1
  – Pages 1-59
• Sequence alignments – Chapter 5, Chapter 1
  – Pages 242-270
  – Pages 21-59

Other Textbooks
• Baxevanis & Ouellette, 3rd edition
  – Chapters 5-7

• Pevsner

