
Tutorial 7

Date post: 06-Jan-2016
Page 1: Tutorial 7

Tutorial 7

Gene expression analysis


Page 2: Tutorial 7

Gene expression analysis

• How to interpret an expression matrix
• Expression data DBs - GEO
• General clustering methods
  – Unsupervised clustering
  – Hierarchical clustering
  – K-means clustering
• Tools for clustering - EPCLUST
• Functional analysis - GO annotation

Page 3: Tutorial 7

Gene expression data sources


• Microarrays
• RNA-seq experiments

Page 4: Tutorial 7

How to interpret an expression data matrix

• Each column represents all the gene expression levels from:
  – In two-color array: a single experiment.
  – In one-color array: a single sample.

• Each row represents the expression of a gene across all experiments.

Exp / Sample      1       2       3       4       5       6
Gene 1         -1.2    -2.1    -3      -1.5     1.8     2.9
Gene 2          2.7     0.2    -1.1     1.6    -2.2    -1.7
Gene 3         -2.5     1.5    -0.1    -1.1    -1       0.1
Gene 4          2.9     2.6     2.5    -2.3    -0.1    -2.3
Gene 5          0.1     2.6     2.2     2.7    -2.1
Gene 6         -2.9    -1.9    -2.4    -0.1    -1.9     2.9
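As a minimal sketch of the row/column reading described above, the first four rows of the example matrix can be held in a plain Python dictionary. The names `matrix`, `gene_profile`, and `sample_column` are illustrative assumptions, not part of the tutorial.

```python
# Column labels and the first four rows of the example matrix (copied from the slide).
samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4", "Sample 5", "Sample 6"]
matrix = {
    "Gene 1": [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    "Gene 2": [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
    "Gene 3": [-2.5, 1.5, -0.1, -1.1, -1.0, 0.1],
    "Gene 4": [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],
}


def gene_profile(gene):
    """Row view: one gene's expression across all experiments/samples."""
    return dict(zip(samples, matrix[gene]))


def sample_column(sample):
    """Column view: all gene expression levels from one experiment/sample."""
    j = samples.index(sample)
    return {gene: values[j] for gene, values in matrix.items()}


print(gene_profile("Gene 1"))
print(sample_column("Sample 1"))
```

Clustering genes compares rows with each other; clustering samples would compare columns instead.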

Page 5: Tutorial 7

How to interpret an expression data matrix

Each element is a log-transformed value:

• In two-color array: the log ratio log2(T/R).
  – T - the gene expression level in the testing sample
  – R - the gene expression level in the reference sample
• In one-color array: log2(X).
  – X - the gene expression level in the current sample
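A small worked example may help with the sign of the values. The sketch below assumes raw intensities are available as positive numbers; the function names and the intensity values are made up for illustration.

```python
import math


def log_ratio(test, reference):
    """Two-color array: log2(T/R) for test intensity T and reference intensity R."""
    return math.log2(test / reference)


def log_intensity(x):
    """One-color array: log2(X) for the intensity X in the current sample."""
    return math.log2(x)


print(log_ratio(400, 100))   # 2.0  -> T is four times R
print(log_ratio(100, 400))   # -2.0 -> T is a quarter of R
print(log_ratio(250, 250))   # 0.0  -> T equals R
print(log_intensity(1024))   # 10.0
```

A value of +1 therefore means a two-fold increase relative to the reference, and -1 a two-fold decrease, which is why up- and down-regulation appear symmetric on the log scale.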


Page 6: Tutorial 7

How to interpret an expression data matrix

In two-color array (color scale):

• Red indicates a positive log ratio: T > R
• Black indicates a log ratio of zero: T ≈ R
• Green indicates a negative log ratio: T < R

[Heat map: Gene 1 to Gene 6 (rows) by Samp 1 to Samp 6 (columns)]

In one-color array (color scale):

• Bright green indicates a high expression value
• Black indicates no expression

[Heat map: Gene 1 to Gene 6 (rows) by Expr.1 to Expr.6 (columns)]

Page 7: Tutorial 7

Microarray Data: Different representations

[Figure: log ratio plotted against experiment (Exp), with the regions T<R and T>R marked]

Page 8: Tutorial 7


How to analyze gene expression data

Page 9: Tutorial 7

Expression profiles DBs

• GEO (Gene Expression Omnibus): http://www.ncbi.nlm.nih.gov/geo/
• Human genome browser: http://genome.ucsc.edu/
• ArrayExpress: http://www.ebi.ac.uk/arrayexpress/

Page 10: Tutorial 7

The current rate of submission and processing is over 10,000 Samples per month.

In 2002, the Nature journals announced a requirement that microarray data be deposited in public databases.

Page 11: Tutorial 7

Searching for expression profiles in GEO: http://www.ncbi.nlm.nih.gov/geo/

Page 12: Tutorial 7

GEO accession IDs

• GPL**** - platform ID
• GSM**** - sample ID
• GSE**** - series ID
• GDS**** - dataset ID

• A Series record defines a set of related Samples considered to be part of a group.
• A GDS record represents a collection of biologically and statistically comparable GEO samples. Not every experiment has a GDS.
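The prefix convention above is easy to check mechanically. The following is a sketch only: it assumes an accession is one of the four prefixes followed by digits, and the function name and the accession numbers in the usage lines are made up for illustration.

```python
import re

# Prefix -> record type, as listed on the slide.
GEO_PREFIXES = {
    "GPL": "platform",
    "GSM": "sample",
    "GSE": "series",
    "GDS": "dataset",
}

# Assumption: an accession is a known prefix followed only by digits.
_ACCESSION = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def classify_accession(accession):
    """Return the GEO record type for an accession string, or None if it does not match."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        return None
    return GEO_PREFIXES[match.group(1)]


print(classify_accession("GSE1234"))  # series
print(classify_accession("gds507"))   # dataset
print(classify_accession("ABC1"))     # None
```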

Page 13: Tutorial 7

Download dataset

Clustering

Statistical analysis

Page 14: Tutorial 7

Clustering analysis


Page 15: Tutorial 7

Clustering analysis – zoom in


Page 16: Tutorial 7


Clustering analysis – zoom in

Page 17: Tutorial 7


Page 18: Tutorial 7

Viewing the expression levels


Page 19: Tutorial 7


Viewing the expression levels

Page 20: Tutorial 7


Page 21: Tutorial 7

Clustering

Grouping together “similar” genes

Page 22: Tutorial 7

Clustering

• Unsupervised learning: The classes are unknown a priori and need to be “discovered” from the data.
• Supervised learning: The classes are predefined and the task is to understand the basis for the classification from a set of labeled objects. This information is then used to classify future observations.

http://www.bioconductor.org/help/course-materials/2002/Seattle02/Cluster/cluster.pdf

Page 23: Tutorial 7

Unsupervised Clustering

• Hierarchical methods - These methods provide a hierarchy of clusters, from the smallest set, in which all objects are in one cluster, through to the largest set, in which each observation is in its own cluster.

• Partitioning methods - These usually require the number of clusters to be specified in advance. A mechanism for apportioning objects to clusters must then be determined.

http://www.bioconductor.org/help/course-materials/2002/Seattle02/Cluster/cluster.pdf

Page 24: Tutorial 7

Hierarchical Clustering

This clustering method is based on distances between expression profiles of different genes. Genes with similar expression patterns are grouped together.

Page 25: Tutorial 7

Rings a bell?...

• In both phylogenetic trees and clustering, we create a tree based on a distance matrix.
• When computing phylogenetic trees, we compute distances between sequences.
• When computing clustering dendrograms, we compute distances between expression values.

[Figure: a score computed between sequences (ATCTGTCCGCTCGATGTGTGCGCTTG) next to a score computed between the expression rows of Gene 1 and Gene 2 across Expr.1 to Expr.6]

Page 26: Tutorial 7

How to determine the similarity between two genes?

Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23, 1499 - 1501 (2005) , http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
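Two commonly used answers to this question are Euclidean distance and a distance based on Pearson correlation. The sketch below computes both for Gene 1 and Gene 2 of the example matrix; it is an illustration under the assumption that these two measures are of interest, and the function names are not taken from the tutorial or the cited article.

```python
import math

# Rows for Gene 1 and Gene 2, copied from the example expression matrix.
gene1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene2 = [2.7, 0.2, -1.1, 1.6, -2.2, -1.7]


def euclidean(a, b):
    """Straight-line distance: sensitive to the absolute expression levels."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation: compares the shape of two profiles, ignoring offset and scale."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_distance(a, b):
    """0 for perfectly correlated profiles, 2 for perfectly anti-correlated ones."""
    return 1 - pearson(a, b)


print(euclidean(gene1, gene2))
print(pearson(gene1, gene2))  # negative: the two genes tend to move in opposite directions
print(correlation_distance(gene1, gene2))
```

The choice matters: a profile shifted by a constant is far away in Euclidean terms but has a correlation distance of essentially zero, so the two measures can group genes differently.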


Page 27: Tutorial 7

Hierarchical clustering methods produce a tree, or dendrogram. They avoid specifying how many clusters are appropriate by providing a partition for each K. The partitions are obtained by cutting the tree at different levels.

[Figure: dendrogram cut at different levels to give 2 clusters, 4 clusters, or 6 clusters]
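The idea of "a partition for each K" can be sketched with a small bottom-up (agglomerative) procedure. This is a toy illustration, not the algorithm used by any particular tool: it assumes Euclidean distance and average linkage, and the five two-point profiles A to E are invented for the example.

```python
import math

# Hypothetical toy profiles: A and B are close, C and D are close, E is on its own.
profiles = {
    "A": [1.0, 1.1],
    "B": [1.2, 0.9],
    "C": [5.0, 5.1],
    "D": [5.2, 4.9],
    "E": [9.0, 0.0],
}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(data):
    """Merge the two closest clusters repeatedly; record the partition for every K."""
    clusters = [(name,) for name in sorted(data)]
    partitions = {len(clusters): list(clusters)}
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
        partitions[len(clusters)] = list(clusters)
    return partitions


partitions = agglomerate(profiles)
for k in sorted(partitions):
    print(k, partitions[k])
```

Here `partitions[3]` groups A with B and C with D and leaves E alone, while `partitions[2]` merges the first two groups; each entry corresponds to cutting the same tree at a different level.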

Page 28: Tutorial 7

The more clusters you request, the higher the similarity within each cluster.

http://discoveryexhibition.org/pmwiki.php/Entries/Seo2009

Page 29: Tutorial 7

Hierarchical clustering results

http://www.spandidos-publications.com/10.3892/ijo.2012.1644

Page 30: Tutorial 7

Unsupervised Clustering - K-means clustering

An algorithm to classify the data into K groups.

[Figure: example with K=4]

Page 31: Tutorial 7

How does it work?

The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a grouping in which each gene is assigned to its nearest center; it depends on the initial means and is not guaranteed to be the best possible grouping for K clusters.

1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
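The four steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation of any specific tool: the six two-dimensional points are invented, k is 2 rather than 3, and the `initial_means` parameter is an assumption added so that the example is reproducible.

```python
import math
import random


def nearest(point, means):
    """Index of the mean closest to the point (step 2)."""
    return min(range(len(means)), key=lambda i: math.dist(point, means[i]))


def centroid(points):
    """Coordinate-wise average of a non-empty list of points (step 3)."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, initial_means=None, seed=0, max_iter=100):
    # Step 1: pick k starting means, randomly unless they are supplied.
    rng = random.Random(seed)
    means = list(initial_means) if initial_means else rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: associate every observation with the nearest mean.
        new_assignment = [nearest(p, means) for p in points]
        # Step 4: stop once the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = centroid(members)
    return assignment, means


# Hypothetical data: three points near (0, 0) and three near (10, 10).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, means = kmeans(points, 2, initial_means=[(0, 0), (1, 0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(means)
```

Even though both starting means lie in the same group here, the means drift apart over two iterations. With less clearly separated data, different starting means can give different final clusters, which is why the procedure is often run several times.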

Page 32: Tutorial 7

How should we determine K?

• Trial and error
• Take K as square root of gene number
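The square-root rule of thumb is simple arithmetic. The helper below is only a sketch of that rule; its name and the rounding to the nearest whole number are assumptions.

```python
import math


def k_from_gene_count(n_genes):
    """Rule of thumb from the slide: K is about the square root of the number of genes."""
    return max(1, round(math.sqrt(n_genes)))


print(k_from_gene_count(100))   # 10
print(k_from_gene_count(2500))  # 50
```

The value is a starting point for trial and error rather than a final answer.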

Page 33: Tutorial 7

Tool for clustering - EPclust

http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

Page 34: Tutorial 7


Page 35: Tutorial 7

• Choose distance metric
• Choose algorithm

Page 36: Tutorial 7


Hierarchical clustering

Page 37: Tutorial 7


Zoom in by clicking on the nodes

Page 38: Tutorial 7


Page 39: Tutorial 7

K-means clustering

Page 40: Tutorial 7

• Graphical representation of the cluster
• Samples found in cluster

Page 41: Tutorial 7

10 clusters, as requested


Page 42: Tutorial 7

Now what?

Now that we have clusters, we want to know the function of each group.

There is a need for some kind of generalization for gene functions.

Page 43: Tutorial 7

Gene Ontology (GO): http://www.geneontology.org/

The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

Page 44: Tutorial 7

Gene Ontology (GO)

• Cellular Component (CC) - the parts of a cell or its extracellular environment.
• Molecular Function (MF) - the elemental activities of a gene product at the molecular level, such as binding or catalysis.
• Biological Process (BP) - operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

Page 45: Tutorial 7

The GO tree

Page 46: Tutorial 7

GO sources

• ISS - Inferred from Sequence/Structural Similarity
• IDA - Inferred from Direct Assay
• IPI - Inferred from Physical Interaction
• TAS - Traceable Author Statement
• NAS - Non-traceable Author Statement
• IMP - Inferred from Mutant Phenotype
• IGI - Inferred from Genetic Interaction
• IEP - Inferred from Expression Pattern
• IC - Inferred by Curator
• ND - No Data available
• IEA - Inferred from Electronic Annotation

Page 47: Tutorial 7
Page 48: Tutorial 7

DAVID

Functional Annotation Bioinformatics Microarray Analysis

 

• Identify enriched biological themes, particularly GO terms
• Discover enriched functional-related gene/protein groups
• Cluster redundant annotation terms
• Explore gene names in batch

http://david.abcc.ncifcrf.gov/
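"Enriched" here means that a term annotates more genes in your list than chance would suggest. One standard way to quantify this is a one-sided hypergeometric (Fisher exact) test, sketched below. DAVID documents its EASE score as a modified Fisher exact p-value, so this unmodified version is an assumption-laden illustration and not DAVID's implementation; the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """P(X >= list_hits) when list_size genes are drawn at random from a background
    of background_size genes, of which background_hits carry the term."""
    upper = min(list_size, background_hits)
    total = comb(background_size, list_size)
    tail = sum(
        comb(background_hits, k) * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: a cluster of 10 genes, 5 of which carry a term that
# annotates 50 of the 1000 genes in the background.
print(enrichment_p_value(5, 10, 50, 1000))
print(enrichment_p_value(1, 1, 1, 2))  # 0.5
```

A small p-value suggests that the term is over-represented in the cluster; when many terms are tested at once, the p-values need a multiple-testing correction before they are interpreted.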

Page 49: Tutorial 7

ID conversion

annotation

classification

Page 50: Tutorial 7

Functional annotation

• Upload
• Genes from your list involved in this category
• Charts for each category

Page 51: Tutorial 7

• Minimum number of genes for corresponding term
• Maximum EASE score / E-value
• Genes from your list involved in this category
• E-Value
• Enriched terms associated with your genes
• Source of term

Page 52: Tutorial 7


A group of terms having similar biological meaning due to sharing similar gene members

Page 53: Tutorial 7

Gene expression analysis

• How to interpret an expression matrix
• Expression data DBs - GEO
• General clustering methods
  – Unsupervised clustering
  – Hierarchical clustering
  – K-means clustering
• Tools for clustering - EPCLUST
• Functional analysis - GO annotation

