Post on 21-Dec-2015
RedundancyMiner
A novel method of clustering in genomic studies
Barry Zeeberg, NCI
Hongfang Liu, NCI and GU
Gene Ontology (GO) AmiGO browser
Hierarchical organization of categories and mapped genes
Typical HTGM result
Clustered image map (CIM)
Redundancy problem
• Because of the hierarchical nature of the GO structure, parent-child categories may contain partially redundant gene mappings
• This can “inflate” the number of categories in the CIM
• The inflation in turn obscures the core information content of the CIM
• The redundancy itself can be studied to examine fine-grained, nuanced associations among category clusters
RedundancyMiner (RM) is an attempt to solve that problem
• Remove the redundancy from the CIM
– Redundancy can cause the CIM to be inflated by, e.g., 3-fold
• Place the redundancy into a META CIM
– Study the redundancy as nuanced themes of association among groups of GO categories
RM paradigm
• Similarity metric is a probabilistic value based on the number of genes mapped in common to two GO categories
• Groups in the META CIM follow a “complete linkage” criterion at a selected p-value threshold
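The two bullets above can be illustrated with a minimal sketch. It is not the RedundancyMiner implementation; it rests on two assumptions that the slides do not confirm. First, the "probabilistic value" is taken to be a hypergeometric upper-tail p-value for the observed gene overlap. Second, a "complete linkage" group is read as a set of categories in which every pair passes the threshold, i.e. a maximal clique. The names `overlap_p_value`, `complete_linkage_groups`, `universe_size`, and `p_threshold` are hypothetical.

```python
from itertools import combinations
from math import comb


def overlap_p_value(a, b, universe_size):
    """Assumed metric: hypergeometric P(overlap >= observed) for two gene sets."""
    k = len(a & b)
    n_a, n_b = len(a), len(b)
    total = comb(universe_size, n_b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total


def complete_linkage_groups(categories, universe_size, p_threshold):
    """Return groups in which EVERY pair of categories has p <= p_threshold.

    Groups are maximal cliques of the thresholded similarity graph, so a
    category may appear in several groups, and a category linked to nothing
    appears in none.
    """
    names = sorted(categories)
    linked = {name: set() for name in names}
    for x, y in combinations(names, 2):
        p = overlap_p_value(categories[x], categories[y], universe_size)
        if p <= p_threshold:
            linked[x].add(y)
            linked[y].add(x)

    groups = []

    def expand(group, candidates, excluded):
        # Basic Bron-Kerbosch enumeration of maximal cliques (no pivoting).
        if not candidates and not excluded:
            if len(group) > 1:  # singletons are outliers, not groups
                groups.append(sorted(group))
            return
        for name in sorted(candidates):
            expand(group | {name}, candidates & linked[name], excluded & linked[name])
            candidates = candidates - {name}
            excluded = excluded | {name}

    expand(set(), set(names), set())
    return sorted(groups)
```

Under this reading, a category that is significantly similar to nothing is simply left out of the META CIM, and a category that is similar to two otherwise unrelated sets of categories shows up in both groups.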
RM overcomes two problems of traditional hierarchical clustering
• All objects are put into one cluster or another, even if the object truly is an outlier
• Each object can appear in only one cluster, even though it may be related to several clusters
Additional example
Gene expression in NCI-60 cell lines
• NCI-60 is a set of 60 well-studied cancer cell lines
• Composed of roughly 5 or 6 cell lines from each of 8 or 9 different cancer types
Problem
• Full CIM of 60 cell lines x 20,000 gene expression values is too dense to allow meaningful viewing
• Solution is to select a sub-portion of the CIM based on RM analysis
NCI-60 META CIM based on correlation threshold = 0.20
Sub-CIM of highest correlating genes from group 33
Gene expression values are adjusted z-scores
Red = positive z-score
Green = negative z-score
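As a rough illustration of the color coding, the sketch below converts one gene's expression values across cell lines to plain z-scores and maps their sign to a color. The slides do not say how the z-scores are "adjusted", so no adjustment is applied here. The function names and the "neutral" label for a z-score of exactly zero are assumptions.

```python
from statistics import mean, pstdev


def row_z_scores(values):
    """Plain z-scores of one gene's expression across cell lines."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A gene with constant expression has no variation to standardize.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def cim_color(z):
    """Map the sign of a z-score to the CIM color named on the slide."""
    if z > 0:
        return "red"
    if z < 0:
        return "green"
    return "neutral"  # assumed label; the slide does not cover z = 0


expression = [1.0, 2.0, 3.0]  # made-up values for a single gene
colors = [cim_color(z) for z in row_z_scores(expression)]
```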
Sub-CIM of highest correlating genes from group 32