Bin Liu ( 刘滨 ), PhD, Associate Professor Intelligent Computing Research Center Homepage: Email:...

Date post: 21-Dec-2015
Transcript
Page 1: Bin Liu ( 刘滨 ), PhD, Associate Professor Intelligent Computing Research Center Homepage:  Email: bliu@insun.hit.edu.cnEmail:

Bin Liu (刘滨), PhD, Associate Professor
Intelligent Computing Research Center

Homepage: http://bioinformatics.hitsz.edu.cn/

Email: bliu@insun.hit.edu.cn or [email protected]

Bioinformatics

Page 2:

Before we start
Course name: Bioinformatics
Instructor: Bin Liu, PhD, Associate Professor
Office hours: by appointment; Office: C303B
Evaluation: attendance and presentation (30%); projects and report (30%); examination (40%)
Class hours: 32; Credits: 2
Audience: master's students in Computer Science and related majors.
Note: A biology background is not required.

Page 3:

Before we start: Under the dome

Page 4:

Why should we study this course?
To understand ourselves
Most biologists don’t know computer science. Most computer scientists don’t know biology.
For study
• It is very easy to find a position at top universities around the world.
For jobs
• Jobs in academia
• Jobs in industry

Page 5:

References (not limited to)

Carlos Setubal, Joao Meidanis. Introduction to Computational Molecular Biology
Dan E. Krane, Michael L. Raymer. Fundamental Concepts of Bioinformatics
Marketa Zvelebil, Jeremy O. Baum. Understanding Bioinformatics

Page 6:

Definitions

Biology easily has 500 years of exciting problems to work on.
-- Donald E. Knuth (高德纳), Professor Emeritus of The Art of Computer Programming at Stanford University

Names:
1. Bioinformatics: an interdisciplinary field that develops and improves on methods for storing, retrieving, organizing and analyzing biological data. A major activity in bioinformatics is to develop software tools to generate useful biological knowledge.
2. Computational Biology: involves the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

Participating fields:
1. Computer Science: (1) algorithms; (2) AI; (3) databases
2. Biological Science
3. Mathematics

Page 7:
Page 8:

Definition in Baidu Baike (百度百科): Bioinformatics is the science of storing, retrieving, and analyzing biological information in life-science research, using the computer as a tool. It is one of the major frontier fields of today's life sciences and natural sciences, and it will also be one of the core fields of natural science in the 21st century.

Page 9:

History of bioinformatics

Dr. Hwa A. Lim is often credited with coining the word “bioinformatics” in 1987.

Page 10:

History of bioinformatics

Page 11:

History of Bioinformatics

1950s, the first period
• The A=T, G=C relationships in DNA were discovered in 1949
• Pauling and Corey discovered the α and β structures of proteins in 1951
• Watson and Crick proposed the DNA structure in 1953
• The first bioinformatics meeting was held in the USA in 1956

Page 12:

History of bioinformatics

1960s and 1970s, the second period
• The basic concept of bioinformatics: sequence comparison
• Margaret Dayhoff: collected protein family data; in the 1970s, the PAM (Percent Accepted Mutation) matrices were proposed
• Needleman & Wunsch: sequence comparison algorithm, 1970

Page 13:

History of bioinformatics

1980s
• EMBL, GenBank, DDBJ
• Smith & Waterman: local alignment algorithm
• Pearson & Lipman: FASTA tool

Page 14:

History of bioinformatics

1990s
• Human Genome Project (HGP)
• Other genome projects: Mus musculus (house mouse), C. elegans (nematode), …
• Lipman and colleagues developed the BLAST tool and later PSI-BLAST.

Page 15:

History of Bioinformatics

[Figure: The growth rate comparison between protein sequence and structure data. Series: Swiss-Prot (protein sequence) and PDB (protein structure). x-axis: database update date, 1985 to 2010; y-axis: number of protein sequences, 0 to 400k. Annotation: unbalanced.]

Page 16:

Preface

Bioinformatics
• Biologists: creators and ultimate users of the data
• Scientists from mathematics and computer science: sheer size and complexity of the data

Techniques
• Databases: new database models to record changes
• Pattern recognition: to understand molecular sequences; AI, machine learning, etc.
• Algorithms
• Internet

Page 17:

Preface

Can Biology Help Computing?

Computational techniques inspired by biology:
• Neural network (artificial intelligence)
• Genetic algorithm

A new driver of computer science:
• Better hardware (supercomputers)
• New data representation
• New driver for algorithm development

Develop new theoretical framework:
• DNA computing
• Ant colony algorithm (communication between ants)

Page 18:

Preface

Develop new theoretical framework:
• Ant colony algorithm (communication between ants)

[Figure labels: ant nest (蚁穴), ant nest (蚁穴)]

Page 19:

Preface

This course:
• To present a representative sample of bioinformatics problems in biology
• Efficient algorithms for the above problems

Algorithms
• Definition: a step-by-step procedure that tries to solve a certain well-defined problem within a limited time bound
• Efficient algorithms: they should not take “too long” to solve a problem, even a large one. E.g., sequence comparison ⇒ Chapter 2
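To make "efficient" concrete for sequence comparison, here is a minimal sketch of a dynamic programming score for aligning two whole sequences, in the style of Needleman & Wunsch. It is an illustration under stated assumptions, not necessarily the formulation used later in the course: the function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are hypothetical choices. Trying every possible alignment grows exponentially with sequence length, whereas this table-filling approach takes time proportional to the product of the two lengths.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the best global alignment score of a and b.

    The scoring values are illustrative assumptions, not course-mandated.
    Runs in O(len(a) * len(b)) time and O(len(b)) memory.
    """
    # prev[j] = best score for aligning the processed prefix of a with b[:j].
    # Aligning an empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Option 1: align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Option 2: align a[i-1] with a gap.
            up = prev[j] + gap
            # Option 3: align b[j-1] with a gap.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept at a time because the score alone is requested; recovering the alignment itself would require keeping the full table or traceback pointers.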

Page 20:

Why does computation work?

The digital computer
• Analog signals get degraded over time
• Digital information can be propagated unaltered
• The cell is a mixture of analog and digital components

The digital molecules of life
• DNA: carries inherited genetic information across generations
• RNA: carries temporary messages within the cell
• Protein: executes molecular processes as dictated in the code

Properties of each molecule are tailored to its role
• DNA: highly stable, protected, self-complementary
• RNA: quickly degraded, single-stranded, mobile
• Protein: versatile code (n × 20), complex 3D structure
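Treating these molecules as digital strings can be sketched in a few lines. The example below illustrates two of the properties listed above: DNA is self-complementary (one strand determines the other), and RNA is a copy of the message in a slightly different alphabet. The function names `reverse_complement` and `transcribe` are hypothetical, and the sketch assumes uppercase sequences over A, C, G, T with no ambiguity codes.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand of a DNA sequence, read 5' to 3'."""
    # The two strands run in opposite directions, so reverse the sequence
    # and then substitute each base with its pairing partner.
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_strand.replace("T", "U")


if __name__ == "__main__":
    strand = "ATGC"
    print(reverse_complement(strand))
    print(transcribe(strand))
```

Because the complement is fully determined by the original strand, applying `reverse_complement` twice returns the input unchanged.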

Page 21:

Bioinformatics in China
• Research started early, at the end of the 1960s
• The first bioinformatics center was established in the life science department of Peking University in 1996

Page 22:

Bioinformatics websites

Page 23:

National Center for Biotechnology Information (NCBI): http://www.ncbi.nlm.nih.gov/

Databases, bioinformatics tools and software.

Page 24:

European Bioinformatics Institute (EBI): http://www.ebi.ac.uk/

Page 25:

DDBJ (DNA Data Bank of Japan): http://www.ddbj.nig.ac.jp/

Page 26:

Sanger: http://www.sanger.ac.uk

Tools

Page 27:

http://www.isb-sib.ch/

Page 28:

Peking University Center for Bioinformatics: http://www.cbi.pku.edu.cn

It is the China node of EMBnet and of the Asia Pacific Bioinformatics Network (APBioNet).

Page 29:

Bioinformation Center of the Shanghai Institutes for Biological Sciences (上海生命科学研究院生物信息中心): http://www.biosino.org/

Page 30:

Bioinformatics Centre of The Chinese University of Hong Kong (HKBIC): http://www.hkbic.bch.cuhk.edu.hk/

Page 31:

Taiwan Molecular Information Center (台湾分子信息中心): http://bioinfo.life.nctu.edu.tw/index.php

Page 32:

http://www.chgc.sh.cn/

Page 33:

http://www.genomics.cn/index

Page 34:

Useful web sites

http://emuch.net/ (小木虫, Xiaomuchong)
http://www.dxy.cn/ (丁香园, DXY)
http://www.bioon.com/ (生物谷, Bioon)
http://www.bio-soft.net (生物软件, "biological software")

Page 35:

Course overview

Chapter 1: fundamental concepts from biology
• basic structure and function of proteins and nucleic acids
• mechanisms of molecular genetics
• the most important laboratory techniques for studying the genome of organisms
• an overview of existing sequence databases

Chapter 2: strings
• the most important mathematical objects used in the course
• medical literature retrieval

Page 36:

Course overview

Chapter 3: sequence comparison
• two-sequence problem: classic dynamic programming algorithm
• more general cases of the problem: extensions of the algorithm
• multiple-sequence comparison problem
• programs used in database searches
• some other miscellaneous issues

Chapter 4: phylogenetic trees
• proteins and nucleic acids also evolve through the ages; an important tool for studying this is the phylogenetic tree
• help understand protein function
• some of the mathematical problems related to phylogenetic tree reconstruction
• simple algorithms for certain special cases

Page 37:

Course overview

Chapter 5: genome rearrangements
• an important new field: some organisms are genetically different, not so much at the sequence level, but in the order in which large similar chunks of their DNA appear in their respective genomes
• mathematical models

Chapter 6: molecular structure prediction
• methods that try to predict a molecule's structure based on its primary sequence
• RNA structure prediction: dynamic programming algorithms
• protein structure prediction: difficulties; protein threading, which attempts to align a protein sequence with a known structure
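As a taste of what a dynamic programming algorithm for RNA structure prediction can look like, the sketch below counts the maximum number of nested base pairs a sequence can form, a simplified formulation often associated with Nussinov. It is an illustration only and may differ from the method the course presents: the function name `max_base_pairs`, the `min_loop` parameter, and the restriction to Watson-Crick pairs (A-U, G-C) are assumptions, and real predictors minimize free energy rather than simply counting pairs.

```python
# Allowed pairs in this simplified model (Watson-Crick only; an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs in rna.

    min_loop is the minimum number of unpaired bases required between
    the two partners of a pair (a hypothetical, illustrative constraint).
    """
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within rna[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    # Fill the table by increasing span so that sub-results are ready.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some base k, splitting the problem
            # into the part left of k and the part enclosed by (k, j).
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

The three nested loops give cubic running time in the sequence length, which is still far cheaper than enumerating every possible set of pairs.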

Page 38:

Course overview

Chapter 7 Data Driven Machine Learning Approaches for Bioinformatics

