
Gene Finding. Changhui (Charles) Yan, 6/10/2015.

Date post: 19-Dec-2015
Transcript
Page 1: 6/10/2015Changhui (Charles) Yan1 Gene Finding Changhui (Charles) Yan.

04/18/23 Changhui (Charles) Yan 1

Gene Finding

Changhui (Charles) Yan

Page 2:

Gene Finding

Genomes of many organisms have been sequenced

Page 3:

Genome

Page 4:

Completely Sequenced Genomes

Page 5:

Gene Finding

More than 60 eukaryotic genome sequencing projects are underway

Page 6:

Human Genome Project (HGP)

To determine the sequences of the 3 billion bases that make up human DNA
  99% of the human DNA sequence finished to 99.99% accuracy (April 2003)

To identify the approximately 100,000 genes in human DNA
  (The estimate had been revised to 20,000-25,000 by Oct 2004)
  15,000 full-length human genes identified (March 2003)

To store this information in databases

To develop tools for data analysis

Page 7:

Gene Finding

Genomes of many organisms have been sequenced

We need to decipher the raw sequences
  Where are the genes?
  What do they encode?
  How are the genes regulated?

Page 8:

Gene Finding

Homology-based methods, also called 'extrinsic methods'
  It seems that only approximately half of the genes can be found by homology to other known genes (although this percentage is of course increasing as more genomes get sequenced).

Gene prediction methods, or 'intrinsic methods' (http://www.nslij-genetics.org/gene/)

Page 9:

Machine Learning Approach

Split data into a training set and a test set
Use the training set to train a classifier
Test the classifier on the test set
The classifier can then be applied to novel data

[Diagram: training data -> machine learning algorithm -> classifier; test data -> classifier -> evaluation of classifier; novel data -> classifier -> prediction]
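The four steps on this slide can be sketched in a few lines of Python. This is a toy illustration, not a real gene finder: the k-mer profile "classifier", the helper names (`kmer_counts`, `train`, `predict`, `split`, `accuracy`), and the toy sequences are all assumptions made for the example.

```python
import random
from collections import Counter

def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def split(examples, test_fraction=0.25, seed=0):
    """Step 1: shuffle and split into (training set, test set)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train(examples, k=3):
    """Step 2: build one summed k-mer profile per class (toy classifier)."""
    model = {}
    for seq, label in examples:
        model.setdefault(label, Counter()).update(kmer_counts(seq, k))
    return model

def predict(model, seq, k=3):
    """Return the class whose k-mer profile best matches the sequence."""
    counts = kmer_counts(seq, k)

    def score(label):
        profile = model[label]
        total = sum(profile.values()) or 1
        return sum(n * profile[kmer] / total for kmer, n in counts.items())

    return max(sorted(model), key=score)

def accuracy(model, examples):
    """Step 3: fraction of held-out examples classified correctly."""
    correct = sum(predict(model, seq) == label for seq, label in examples)
    return correct / len(examples)

# Hypothetical toy data: (sequence, class) pairs.
toy = [("atgatgatgatg", 1), ("atgatgatg", 1), ("atgatgatgatgatg", 1), ("atgatg", 1),
       ("cccccccccc", 0), ("ccccccc", 0), ("cccccccccccc", 0), ("ccccc", 0)]

training_set, test_set = split(toy)
classifier = train(training_set)
print("test accuracy:", accuracy(classifier, test_set))
# Step 4: apply the trained classifier to novel data.
print("prediction for novel data:", predict(classifier, "atgatgcc"))
```

The test set is never shown to `train`, so the accuracy it reports is an estimate of how the classifier would behave on novel data.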

Page 10:

Data, examples, classes, classifier

ccgctttttgccagcataacggtgtcga, 1
accacgttttttgccagcatttgccagca, 0
atcatcacgatcacgaacatcaccacg, 0
…
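A short sketch of how the data on this slide could be read into examples and classes. It assumes each example is a sequence followed by a comma and an integer class label; the slide does not say what classes 1 and 0 stand for, and `parse_examples` is a hypothetical helper name.

```python
def parse_examples(lines):
    """Turn 'sequence, label' lines into (sequence, class) pairs."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line or line == "…":
            continue  # skip blanks and the trailing ellipsis
        seq, label = line.rsplit(",", 1)
        examples.append((seq.strip(), int(label)))
    return examples

# The three examples shown on the slide.
raw = [
    "ccgctttttgccagcataacggtgtcga, 1",
    "accacgttttttgccagcatttgccagca, 0",
    "atcatcacgatcacgaacatcaccacg, 0",
]

examples = parse_examples(raw)
classes = sorted({label for _, label in examples})
print(examples)
print("classes:", classes)
```

A classifier is then any function that maps a sequence to one of the values in `classes`.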

Page 11:

N-fold cross-validation

[Figure: 3-fold cross-validation on the E. coli K12 genome (4,639,675). Rounds 1, 2, and 3 each show the data divided into a training set and a test set.]
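A minimal sketch of how N-fold cross-validation divides the data, assuming the examples are simply indexed 0 to n-1 and cut into contiguous folds (the slide does not say how the folds were chosen). In each round one fold is the test set and the remaining folds form the training set. `n_fold_splits` is a hypothetical helper name.

```python
def n_fold_splits(n_items, n_folds=3):
    """Yield (train_indices, test_indices) once per round."""
    indices = list(range(n_items))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_items // n_folds + (1 if i < n_items % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for round_no, (train, test) in enumerate(n_fold_splits(9, 3), start=1):
    print(f"Round {round_no}: train={train} test={test}")
```

Every item is used for testing exactly once, and the per-round evaluation results are usually averaged.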

Page 12:

Machine Learning Approach

[Diagram: training data -> machine learning algorithm -> classifier; test data -> classifier -> evaluation of classifier; novel data -> classifier -> prediction]

Page 13:

Gene-finders

Page 14:

Prokaryotes vs. Eukaryotes

Prokaryotes are organisms without a cell nucleus.
  Most prokaryotes are bacteria.
  Prokaryotes can be divided into Bacteria and Archaebacteria (Archaea).

Eukaryotes are organisms whose cells have a membrane-bound nucleus.

Page 15:

Prokaryotes vs. Eukaryotes

Prokaryotes’ genomes are relatively simple: coding region (genes) vs. non-coding region.

Eukaryotes’ genomes are complicated.
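Because a prokaryotic genome is largely a matter of coding versus non-coding regions, a first rough pass at locating candidate coding regions is to scan for open reading frames (ORFs). The sketch below is an illustration only, not a method taken from these slides: it assumes the standard start codon ATG and stop codons TAA, TAG, TGA, looks at the forward strand only, and `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"taa", "tag", "tga"}

def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs of ATG...stop stretches.

    Forward strand only; end is exclusive and includes the stop codon.
    """
    seq = seq.lower()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "atg":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)

dna = "ccatgaaatttggataacc"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Real gene finders go well beyond this: they also scan the reverse strand and score candidates with statistical models, and in eukaryotes an ORF scan is not enough on its own because genes are interrupted by introns.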

Page 16:

Eukaryotic genes

