
creativecommons/licenses/by-sa/2.0

Date post: 01-Jan-2016
Upload: adria-ewing
Description:
http://creativecommons.org/licenses/by-sa/2.0/. Bioinformatics. Prof:Rui Alves [email protected] 973702406 Dept Ciencies Mediques Basiques, 1st Floor, Room 1.08 Website: http://web.udl.es/usuaris/pg193845/testsite/ Course Website: http://web.udl.es/usuaris/pg193845/Bioinformatics_2009/. - PowerPoint PPT Presentation
Transcript
Page 1: creativecommons/licenses/by-sa/2.0

http://creativecommons.org/licenses/by-sa/2.0/

Page 2: creativecommons/licenses/by-sa/2.0

Bioinformatics

Prof: Rui Alves

[email protected]

973702406

Dept Ciencies Mediques Basiques, 1st Floor, Room 1.08

Website: http://web.udl.es/usuaris/pg193845/testsite/

Course Website: http://web.udl.es/usuaris/pg193845/Bioinformatics_2009/

Page 3: creativecommons/licenses/by-sa/2.0

Language of the course

• Mine: English

• Slides: English

• Webpage: English

• Yours: Whichever you choose as long as I understand it. ALWAYS ASK WHEN YOU DON’T UNDERSTAND SOMETHING!!

Page 4: creativecommons/licenses/by-sa/2.0

Web Page of the course

http://web.udl.es/usuaris/pg193845/Bioinformatics_2009/

• There, you will find all the information about your tasks, links to bioinformatics resources, and the lecture

Page 5: creativecommons/licenses/by-sa/2.0

Goals of this course

• Give you an integrated view of how to use computers and informatics to gain a systemic understanding of biological systems at the molecular level.

• Integrate bioinformatics, mathematical modelling and other areas of computational biology to save lab work and address problems that cannot yet be solved in the lab.

Page 6: creativecommons/licenses/by-sa/2.0

Course Plan

• First part of the course (2 weeks): Broad introduction to bioinformatics and computational biology in molecular biology.

• Second part of the course: Problems for you to solve in groups at home, plus in-depth lectures on the different subjects you need in order to solve the problems.

Page 7: creativecommons/licenses/by-sa/2.0

Evaluation Plan

• 5 tasks in groups of four. At the end of each task you deliver a paper as a group (overall, all tasks will account for 50% of the final grade).

• Final paper presenting the whole story together (20%).

• Individual discussion of the final paper with me (20%).

• Class participation (10%).

• CAUTION: YOU NEED TO HAVE AT LEAST 5/10 IN EACH TASK, IN THE FINAL PAPER AND IN THE DISCUSSION.
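
The weighting above can be checked with a few lines of Python. This is only a sketch: the function and variable names are hypothetical, and it assumes that the five tasks count equally within their 50% share, which the slide does not state.

```python
# Weights are taken from the evaluation plan; everything else is illustrative.
WEIGHTS = {
    "tasks": 0.50,
    "final_paper": 0.20,
    "discussion": 0.20,
    "participation": 0.10,
}
PASS_MARK = 5.0  # minimum (out of 10) in each task, the final paper and the discussion


def final_grade(task_marks, final_paper, discussion, participation):
    """Return (weighted grade out of 10, True if every minimum is met)."""
    # Assumption: all tasks carry the same weight inside the 50% block.
    tasks_mean = sum(task_marks) / len(task_marks)
    grade = (
        WEIGHTS["tasks"] * tasks_mean
        + WEIGHTS["final_paper"] * final_paper
        + WEIGHTS["discussion"] * discussion
        + WEIGHTS["participation"] * participation
    )
    minimums_met = (
        all(mark >= PASS_MARK for mark in task_marks)
        and final_paper >= PASS_MARK
        and discussion >= PASS_MARK
    )
    return grade, minimums_met


grade, ok = final_grade([6, 7, 8, 5, 9], final_paper=7, discussion=8, participation=10)
print(round(grade, 2), ok)
```

A single task below 5/10 makes the second value False even when the weighted grade is high.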

Page 8: creativecommons/licenses/by-sa/2.0

Index

• Why bioinformatics?

• Ontologies & Classification schemes

• Databases and servers

Page 9: creativecommons/licenses/by-sa/2.0

Why Bioinformatics?


Page 10: creativecommons/licenses/by-sa/2.0

What obvious problems do large scale sets create?

• Imagine the 6 500 000 000 human beings born within the last 130 years and still alive.

• By and large, a majority of them have had an education.

• What problems need solving to ensure that education?

1 – Organize knowledge

2 – Organize its transmission

Page 11: creativecommons/licenses/by-sa/2.0

First problem: organizing knowledge

• We do not need to know all there is to know in order to be productive in society.

• Furthermore, we cannot learn everything at the same time.

• Problem: How to organize knowledge into bite-sized packages that can be parceled out one after another and built upon?

Page 12: creativecommons/licenses/by-sa/2.0

Organizing knowledge

Communication (read, write, count)

Humanities

Sciences

Page 13: creativecommons/licenses/by-sa/2.0

Second problem: organizing the transmission of knowledge

•The school system is a way in which the most people can be trained with the least societal effort

Not effective

Page 14: creativecommons/licenses/by-sa/2.0

Schools and books are the servers and databases for educating people

[Diagram: Users, Database, Server. New Server: You]

Page 15: creativecommons/licenses/by-sa/2.0

Understanding biological systems

Hey, it’s raining!!! Why don’t we try and figure out how all the little molecular pieces in a cell work together?!?!?!

We were WRONG!!!!!

I need more data!!! How do I plan what to do now?

Page 16: creativecommons/licenses/by-sa/2.0

The “omics” revolution in molecular biology

•Over many decades, a huge amount of biological data has accumulated.

•Unlike the “KNOWLEDGE” we discussed before, this data is not well organized and the connections between the different parcels of data are obscure.

•The omics revolution has compounded this problem 1000-fold because data now accumulates faster than ever.

Page 17: creativecommons/licenses/by-sa/2.0

What is the “omics” revolution in molecular biology?

•The omics revolution is a period of about ten years in which several different technologies were developed that can be applied to study the complete molecular landscape of cells!!!

•Genomics

•Proteomics

•Metabolomics

•Et caeteromics

Page 18: creativecommons/licenses/by-sa/2.0

The “omics” revolution in molecular biology

•(We!!) Biologists want the data to make sense and they want it now!!!

Page 19: creativecommons/licenses/by-sa/2.0

Understanding biological systems

I need more data!!! Why don’t they give it to me

Page 20: creativecommons/licenses/by-sa/2.0

Comparison between the two problems

People organized the knowledge transmission system and its connections over millennia of trial and error.

It is impossible for people to organize the biological knowledge brought about by omics in the 10 years that have passed since the beginning of the omics era.

Page 21: creativecommons/licenses/by-sa/2.0

Why?

•Data is not well classified.

•Data is not well connected.

•Data is not well understood.

•Not enough people to do it in a short amount of time.

Page 22: creativecommons/licenses/by-sa/2.0

New types of servers and databases are required for very fast organization and data mining

[Diagram: Users, Database, Server. BIOINFORMATICS!!]

Page 23: creativecommons/licenses/by-sa/2.0

What is Bioinformatics?

• Development and application of computational/informatic tools to the solution of biological problems

• The standard of internet bioinformatics: LAMP

– L: Linux (operating system)

– A: Apache (internet server)

– M: MySQL (database server)

– P: Perl, PHP, Python (programming language(s))

Page 24: creativecommons/licenses/by-sa/2.0

The standards are changing

• Java lets servers launch fewer processes by using the clients’ machines for computation, which allows a larger number of simultaneous connections.

• Tomcat “talks” very well with Java.

• The new standard: LTMJ

– L: Linux (operating system)

– T: Tomcat (internet server)

– M: MySQL (database server)

– J: Java (programming language)

Page 25: creativecommons/licenses/by-sa/2.0

What does a computer need to be effective?

• Well classified data

– Ontologies, classification schemes

• Well organized data

– Databases, servers

• Good users

Page 26: creativecommons/licenses/by-sa/2.0

Index

• Why bioinformatics?

• Ontologies & Classification schemes

• Databases and servers

Page 27: creativecommons/licenses/by-sa/2.0

Ontologies and classification schemes for data


Page 28: creativecommons/licenses/by-sa/2.0

Biological Classification Schemes

• What is an Ontology (in the Biological sense)?

A set of definitions of controlled vocabularies with hierarchical relationships to one another, that can easily be dealt with by computers

Page 29: creativecommons/licenses/by-sa/2.0

What are Bio-Ontologies?

Biological ontologies (bio-ontologies) can be defined as complex hierarchical structures in which biological concepts are described by their meanings (definitions) and their relationships to each other.

There are many bio-ontologies available and in use by databases. The Plant Ontology, along with other ontologies such as the Gene Ontology, is included in the open source Open Biological Ontologies project at Sourceforge.

http://obofoundry.org/

Page 30: creativecommons/licenses/by-sa/2.0

The Gene Ontology

The best-known example of a bio-ontology is the Gene Ontology (GO; http://www.geneontology.org), which describes three biological domains: cellular component (where the gene product is located), molecular function (what the gene product does) and biological process (the cellular, developmental or physiological events the gene product is involved in).

GO terms are used to describe gene products. Because these descriptions are independent of species-specific nomenclature and uniformly applied, it is possible to make meaningful and efficient comparisons of genes across diverse taxa.

Page 31: creativecommons/licenses/by-sa/2.0

Three “Super Categories” of GO

• Molecular Function (what)

– Tasks performed at the molecular level

• Biological Process (why)

– How it pertains to the organism

• Cellular Component (where)

– Its location

Page 32: creativecommons/licenses/by-sa/2.0

Example

• Gene Name: BRCA1

• Molecular Function: protein binding

• Biological Process: DNA Replication and Chromosome Cycle

• Cellular Component: nucleus
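
As a rough illustration, the BRCA1 example above can be held in a small record with one slot per GO category. The class and field names below are hypothetical and are not part of any GO software; the three terms are the ones listed on the slide.

```python
from dataclasses import dataclass, field

# The three GO "super categories" (aspects).
ASPECTS = ("molecular_function", "biological_process", "cellular_component")


@dataclass
class GeneAnnotation:
    """Hypothetical container: GO terms for one gene, grouped by aspect."""
    gene: str
    terms: dict = field(default_factory=lambda: {aspect: [] for aspect in ASPECTS})

    def add(self, aspect, term):
        # Reject anything that is not one of the three GO aspects.
        if aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {aspect}")
        self.terms[aspect].append(term)


brca1 = GeneAnnotation("BRCA1")
brca1.add("molecular_function", "protein binding")
brca1.add("biological_process", "DNA replication and chromosome cycle")
brca1.add("cellular_component", "nucleus")

for aspect in ASPECTS:
    print(aspect, "->", brca1.terms[aspect])
```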

Page 33: creativecommons/licenses/by-sa/2.0

Structure of GO

• How to define the relationship between concepts?

• Example: How to relate the terms “cell”, “nucleus” and “membrane”?
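
One common way to answer this question is to store typed relationships (such as "is_a" and "part_of") between terms and walk them upwards. The sketch below is a toy graph, not the real GO structure: the relation list is hand-written for illustration, and "nuclear membrane" is an extra term added here to show a term with two parents.

```python
# (child, relationship, parent) triples. Illustrative only, not copied from GO.
RELATIONS = [
    ("nucleus", "part_of", "cell"),
    ("membrane", "part_of", "cell"),
    ("nuclear membrane", "is_a", "membrane"),
    ("nuclear membrane", "part_of", "nucleus"),
]


def parents(term):
    """Direct parents of a term, together with the relationship type."""
    return [(rel, parent) for child, rel, parent in RELATIONS if child == term]


def ancestors(term):
    """All terms reachable by following relationships upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _rel, parent in parents(current):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(parents("nuclear membrane"))
print(sorted(ancestors("nuclear membrane")))
```

Because a term may have more than one parent, the structure is a graph rather than a simple tree, which is why the traversal keeps a set of visited terms.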

Page 34: creativecommons/licenses/by-sa/2.0

How is GO Annotated?

• Manual

– Humans sifting through primary literature

• Electronic

– Assign GO terms using already existing information in databases.

Page 35: creativecommons/licenses/by-sa/2.0

Evidence Code for GO Annotation

IEA Inferred from Electronic Annotation

ISS Inferred from Sequence Similarity

IEP Inferred from Expression Pattern

IMP Inferred from Mutant Phenotype

IGI Inferred from Genetic Interaction

IPI Inferred from Physical Interaction

IDA Inferred from Direct Assay

RCA Inferred from Reviewed Computational Analysis

TAS Traceable Author Statement

NAS Non-traceable Author Statement

IC Inferred by Curator

ND No biological Data available

Detailed info available from:

http://www.geneontology.org/doc/GO.evidence.html
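
Evidence codes are mainly useful as a filter. The sketch below stores the codes listed above in a dictionary and drops annotations that were assigned electronically (IEA), which is a frequent first step when only curated evidence is wanted. The annotation rows and gene names are made up for illustration.

```python
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}

# Codes assigned without a human curator reviewing the individual annotation.
ELECTRONIC = {"IEA"}

# Hypothetical annotation rows: (gene, GO term, evidence code).
annotations = [
    ("geneA", "protein binding", "IPI"),
    ("geneA", "nucleus", "IEA"),
    ("geneB", "DNA repair", "IMP"),
    ("geneB", "nucleus", "ND"),
]


def curated_only(rows):
    """Keep only annotations whose evidence code is not electronic."""
    return [row for row in rows if row[2] not in ELECTRONIC]


for gene, term, code in curated_only(annotations):
    print(gene, term, code, "=", EVIDENCE_CODES[code])
```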

Page 36: creativecommons/licenses/by-sa/2.0

How to use GO in data analysis

• Simple queries

• Find over-represented GO categories in a list of genes

– Search biological “themes”

• Binning

– Obtain a broad view of the distribution of major GO terms in a list of genes.

• Clustering genes on GO terms

– Group together functionally related genes based on GO terms.
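
Finding over-represented GO categories is usually framed as a statistical test. A minimal sketch, assuming a one-sided hypergeometric test (one common choice; the slides do not say which test the tools use) and made-up gene identifiers:

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k): chance of seeing at least k annotated genes when n genes
    are drawn from a population of N genes, K of which carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def term_enrichment(study_genes, population_genes, genes_with_term):
    """Return (number of study genes with the term, one-sided p-value)."""
    population = set(population_genes)
    study = set(study_genes) & population
    with_term = set(genes_with_term) & population
    k = len(study & with_term)
    return k, hypergeom_pvalue(len(population), len(with_term), len(study), k)


# Hypothetical data: 20 genes in total, 5 annotated with some GO term,
# and a study list of 4 genes of which 2 carry that term.
population = [f"g{i}" for i in range(20)]
with_term = ["g0", "g1", "g2", "g3", "g4"]
study = ["g0", "g1", "g10", "g11"]

k, p = term_enrichment(study, population, with_term)
print(k, round(p, 4))
```

In a real analysis the test is repeated for every GO term, so the p-values need a multiple-testing correction; that step is left out here.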

Page 37: creativecommons/licenses/by-sa/2.0

GO Tools

• NetAffx – Get GO annotation

• AmiGO – Browser and simple queries

• GO Term Mapper – Binning (GO Slim)

• GeneToolBox

– Finding over-represented GO categories

– Clustering based on similar GO terms

– Query for genes with similar function

Page 38: creativecommons/licenses/by-sa/2.0

GO is not very good

• EC numbers

• Protein classification schemes

• TF classification schemes

• Transport proteins classification schemes

• Etc.

Page 39: creativecommons/licenses/by-sa/2.0

The EC number database

Page 40: creativecommons/licenses/by-sa/2.0

The BRENDA database

Page 41: creativecommons/licenses/by-sa/2.0

The TF classification database

Page 42: creativecommons/licenses/by-sa/2.0

The signal transduction classification database

Page 43: creativecommons/licenses/by-sa/2.0

The transport proteins classification database

All these classifications are reminiscent of the Dewey classification system for books!!!! (Remember public libraries?)
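
The Dewey-like structure is easiest to see in EC numbers, where each of the four fields narrows the classification (for example, EC 3.4.21.4 is trypsin). The sketch below parses such codes; the helper names are hypothetical, and it only handles plain four-field numbers, with "-" allowed for an unspecified level.

```python
# Top-level EC classes (first field of the number).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec):
    """Turn 'EC 3.4.21.4' or '3.4.21.4' into a tuple of four levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four fields in EC number: {ec!r}")
    # '-' marks a level that is not specified, e.g. '3.4.-.-'.
    return tuple(None if part == "-" else int(part) for part in parts)


def same_group(ec_a, ec_b, depth):
    """True if two EC numbers agree on their first `depth` levels."""
    return parse_ec(ec_a)[:depth] == parse_ec(ec_b)[:depth]


levels = parse_ec("EC 3.4.21.4")
print(levels, EC_CLASSES[levels[0]])
print(same_group("3.4.21.4", "3.4.21.1", depth=3))  # trypsin vs chymotrypsin
print(same_group("3.4.21.4", "1.1.1.1", depth=1))   # trypsin vs alcohol dehydrogenase
```

Comparing prefixes of the tuple is the same idea as walking down library shelves: the more leading fields two codes share, the more closely related the entries are.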

Page 44: creativecommons/licenses/by-sa/2.0

A general protein classification database

Page 45: creativecommons/licenses/by-sa/2.0

What does a computer need to be effective?

• Well classified data

– Ontologies, classification schemes

• Well organized data

– Databases, servers

Page 46: creativecommons/licenses/by-sa/2.0

Index

• Why bioinformatics?

• Ontologies & Classification schemes

• Databases and servers

Page 47: creativecommons/licenses/by-sa/2.0

Databases & Servers


Page 48: creativecommons/licenses/by-sa/2.0

What is a Database?

• A database is a collection of data organized in such a way that it is easy to store in a computer and to mine by appropriate software

• A database is usually organized as a set of tables in which information about an object is stored

• The tables are related to each other in different ways.
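
A minimal sketch of "tables related to each other", using Python's built-in sqlite3 module instead of the MySQL server mentioned earlier so that it runs without any setup. The table and column names are invented for illustration; the rows reuse the BRCA1 example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Two tables: one row per gene, many annotation rows per gene.
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL UNIQUE
);
CREATE TABLE annotation (
    annotation_id INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    aspect  TEXT NOT NULL,
    term    TEXT NOT NULL
);
""")

conn.execute("INSERT INTO gene (gene_id, symbol) VALUES (1, 'BRCA1')")
conn.executemany(
    "INSERT INTO annotation (gene_id, aspect, term) VALUES (?, ?, ?)",
    [
        (1, "molecular_function", "protein binding"),
        (1, "biological_process", "DNA replication and chromosome cycle"),
        (1, "cellular_component", "nucleus"),
    ],
)

# The relationship between the tables is used through a JOIN on gene_id.
rows = conn.execute("""
    SELECT g.symbol, a.aspect, a.term
    FROM gene AS g
    JOIN annotation AS a ON a.gene_id = g.gene_id
    ORDER BY a.aspect
""").fetchall()

for row in rows:
    print(row)
```

Storing the gene once and pointing to it from the annotation table avoids repeating the gene's details in every annotation row.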

Page 49: creativecommons/licenses/by-sa/2.0

What does database technology allow?

•Making information useful

•Avoiding “accidental disorganisation”

•Making information easily accessible and integrated with the rest of our work

Page 50: creativecommons/licenses/by-sa/2.0

S(tructured) Q(uery) L(anguage)

• ANSI (American National Standards Institute) standard computer language for accessing and manipulating database systems.

• SQL statements are used to retrieve and update data in a database.

• Includes:

– Data Manipulation Language (DML)

– Data Definition Language (DDL)
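
The difference between the two sub-languages can be shown in a few statements. This is a sketch run through Python's sqlite3 module; the table, accession and protein name are hypothetical, and SQL dialects differ in details between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the structure that will hold the data.
conn.execute(
    "CREATE TABLE protein (accession TEXT PRIMARY KEY, name TEXT, length INTEGER)"
)

# DML: stores, changes, retrieves and deletes the data itself.
# (SELECT is commonly grouped with DML; some texts treat it separately.)
conn.execute("INSERT INTO protein VALUES (?, ?, ?)", ("X0001", "example kinase", 350))
conn.execute("UPDATE protein SET length = ? WHERE accession = ?", (352, "X0001"))
row = conn.execute(
    "SELECT name, length FROM protein WHERE accession = ?", ("X0001",)
).fetchone()
print(row)

conn.execute("DELETE FROM protein WHERE accession = ?", ("X0001",))
remaining = conn.execute("SELECT COUNT(*) FROM protein").fetchone()[0]
print(remaining)
```

The "?" placeholders pass values separately from the SQL text, which avoids quoting mistakes when values come from users.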

Page 51: creativecommons/licenses/by-sa/2.0

Web Databases

• Data is accessible through the Internet

• Have different underlying database models

• Example: biological databases

– Molecular data: NCBI, Swissprot, PDB, KEGG, GO

– Protein interaction: DIP, BIND

– Organism specific: Mouse, Worm, Yeast

– Literature: Pubmed

– Disease: OMIM
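
Many of these web databases can also be queried by programs rather than through a browser. As one example, NCBI offers the E-utilities web interface; the sketch below only builds a PubMed search URL and does not send the request. The helper name and the search term are illustrative, and the usage policy on the NCBI site should be checked before querying in bulk.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for the given database and search term."""
    query = urlencode({"db": db, "term": term, "retmax": retmax, "retmode": "json"})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


url = esearch_url("pubmed", "BRCA1 AND DNA repair", retmax=5)
print(url)
```

The resulting URL can be opened in a browser or fetched with urllib.request.urlopen to obtain the matching record identifiers.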

Page 52: creativecommons/licenses/by-sa/2.0

How to make databases useful

• Attach it to a server

• Let people use it to mine for knowledge
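
A toy version of "attach it to a server" is sketched below, using only the Python standard library (wsgiref and sqlite3) rather than the Apache and MySQL stack discussed earlier. The table, the gene and the URL parameter name are all hypothetical.

```python
import json
import sqlite3
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def build_database():
    """Create a small in-memory database to serve (illustrative data)."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE annotation (gene TEXT, term TEXT)")
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?)",
        [("BRCA1", "protein binding"), ("BRCA1", "nucleus")],
    )
    return conn


DB = build_database()


def app(environ, start_response):
    """WSGI application: answer /?gene=NAME with that gene's terms as JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    gene = params.get("gene", [""])[0]
    rows = DB.execute(
        "SELECT term FROM annotation WHERE gene = ? ORDER BY term", (gene,)
    ).fetchall()
    body = json.dumps({"gene": gene, "terms": [r[0] for r in rows]}).encode("utf-8")
    status = "200 OK" if rows else "404 Not Found"
    start_response(
        status,
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def serve(port=8000):
    """Start a local development server (blocks until interrupted)."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling serve() and then visiting http://localhost:8000/?gene=BRCA1 would return the stored terms. A production setup would put a real web server and database server behind the same idea.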

Page 53: creativecommons/licenses/by-sa/2.0

An example of WAMP

• The bioinformatics class server


Page 56: creativecommons/licenses/by-sa/2.0

An example of WAMP

• The bioinformatics class server

Wireless


Page 58: creativecommons/licenses/by-sa/2.0

Summary

• Why bioinformatics:

– Because there is simply too much data out there for human beings to deal with without computer assistance.

– Because many of the calculations needed to extract knowledge from the data would take too long without computers.

• How to do bioinformatics:

– Organize data well using appropriate classification systems.

– Use databases and server technology.

