
Cell State1 State2State i Input Response State: genes, proteins, metabolites, ions…… The...

Date post: 14-Dec-2015
Upload: scott-chandler
Transcript
Page 1: Cell State1 State2State i Input Response State: genes, proteins, metabolites, ions…… The Parts-List Problem proteins peptides amino acids nucleotides retinoids.

Cell

State1, State2, State i

Input, Response

State: genes, proteins, metabolites, ions……

The Parts-List Problem

•proteins

•peptides

•amino acids

•nucleotides

•retinoids

Gene Expression

2003

Page 2:

Mouse Genome

Cell-Specific Genes

Signaling Proteins

Molecule Pages

Cell-Specific Gene Products

Cell-Specific Gene Products Invoked by Input

Signaling Proteins Invoked in Input-Specific Response

The Parts-List: AfCS Strategy

2003

Page 3:

• Annotation Pipeline: Brian Saunders

• B-Cell Gene List from Agilent Array Data: Dennis Mock

• B-Cell Gene List from Affy Array Data: Eugene Ke, Chris Benner

• AfCS Protein List: Several People at UTSWMED, DUKE, etc.

The Parts-List Problem

Page 4:

• Array contains an aggregate of Riken (Fantom and non-Fantom), NIA, Research Genetics, and Genome Systems clones

• Provided with clone ID as basis for analysis – no resequencing was performed

• Sequence information from clone ID
  – Full-length cDNA
    • Genbank
    • Non-Genbank (from Fantom or NIA databases)
  – 3’ and 5’ ESTs
    • Genbank ID
    • Non-Genbank (NIA database)

2003

Agilent cDNA Microarray

Page 5:

2003

Agilent cDNA Microarray Details

Clone Type          Unique   Total
Riken                14828   14890
NIA                    447     723
Research Genetics      155     155
Genome Systems          64      64
Total                15494   15832

Clone type distribution

Page 6:

2003

Agilent cDNA Microarray Details

Riken Fantom cDNA (Genbank)        13982
Riken Fantom cDNA (non-Genbank)      604
NIA cDNA (non-Genbank)               270
No cDNA (EST only)                   728
Total                              15494

Sequence type

Page 7:

[Flowchart. Nodes: Clone ID (NIA or Fantom Databases); Sequences (Genbank Accessions); MGI; LocusLink; Unigene; Ensembl; Protein Records; Chromosome; Gene Ontology (GO); InterPro; AFCS Proteins. Edge labels: Reference; Blast; Blast or reference; LocusLink Annotation; Merge with AFCS Molecule Page.]

2003

Annotation Procedure

Page 8:

Annotation Procedure

• Choose representative sequence if possible
• Choose representative gene
  – LocusLink or MGI membership
  – BlastN against database of all nucleotides in LocusLink and MGI (use all sequences if no representative has been chosen)
  – Unigene membership
• Choose representative sequence if necessary
  – Sequence used to choose representative gene
  – Sequence length if gene-selection method fails

2003

Annotation Procedure (contd..)

Page 9:

• If the gene must be determined by BlastN and more than one gene matches above a given threshold (around 30% identity), Unigene agreement is used to choose the “best” gene.

• If no gene matches above threshold, then Unigene is used to choose best gene

• If no genes match the above criteria, the top Blast hit regardless of threshold is chosen

2003

Annotation: Choice of Representative Gene
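
The three rules on this slide can be sketched as a small selection function. This is only an illustration under stated assumptions: the `BlastHit` record, its field names, and the reading of "Unigene agreement" as "the hit's Unigene cluster equals the clone's Unigene cluster" are hypothetical, and the 0.30 default simply mirrors the "around 30% identity" figure quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical record for one BlastN hit of a clone against a gene."""
    gene_id: str
    identity: float            # fraction in [0, 1]
    unigene_id: Optional[str]  # Unigene cluster of the hit gene, if known


def choose_representative_gene(hits: List[BlastHit],
                               clone_unigene: Optional[str],
                               threshold: float = 0.30) -> Optional[str]:
    """Pick one gene for a clone following the slide's three rules."""
    if not hits:
        return None
    ranked = sorted(hits, key=lambda h: h.identity, reverse=True)
    above = [h for h in ranked if h.identity >= threshold]
    # A single gene above threshold needs no tie-breaking.
    if len(above) == 1:
        return above[0].gene_id
    # Several genes above threshold, or none: let Unigene agreement decide.
    pool = above if above else ranked
    agreeing = [h for h in pool
                if clone_unigene is not None and h.unigene_id == clone_unigene]
    if agreeing:
        return agreeing[0].gene_id
    # No gene satisfied the criteria: top Blast hit regardless of threshold.
    return ranked[0].gene_id
```

With two hits above threshold, the one sharing the clone's Unigene cluster wins even if its identity is lower, which is where the "incorrect gene choice" risk mentioned later in the deck can arise.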

Page 10:

• Build database of (potentially) related genes
  – LocusLink MGI
  – LocusLink Unigene
  – MGI Unigene
  – Fantom MGI
  – NIA LocusLink/MGI
  – Top Blast hit (when above threshold, and not the representative gene)
• Challenges
  – Outdated data sources (especially Fantom)
  – Incorrect annotation or errors
  – ESTs clustering to different Unigene IDs

2003

Further Gene Annotation: Relationships

Page 11:

• Ensembl
  – LocusLink
  – Blast (top hits above threshold)
• Protein Database Records
  – MGI
  – LocusLink
• Chromosome
  – LocusLink
  – MGI
  – Unigene

2003

Other Annotations

Page 12:

• Gene Ontology
  – MGI
  – LocusLink
  – Fantom
• InterPro
  – MGI
  – Protein database records (Swissprot/Trembl)
• AfCS
  – LocusLink merge with Molecule Page annotation
• Other miscellaneous gene annotation
  – Fantom
  – NIA

2003

Other Annotations

Page 13:

2003

Annotation Schema

Page 14:

2003

Viewing and Searching Gene Annotations

Page 15:

2003

Viewing and Searching Annotations

Go to “data searches” from the “data center” page

Page 16:

2003

Querying Annotation

Page 17:

2003

Choosing Query Responses of Annotation

Annotation of 4931440G06

Page 18:

• H3091H05
  – Non-Genbank cDNA used for representative sequence
• H3150F07
  – No cDNA available, 3’ EST used for representative sequence (also an example of multiple Ensembl transcripts)
• 0610007B22
  – Example of more than one potential gene for a clone

2003

Other Annotation Examples

Page 19:

• Of the 15,494 unique clones
  – 10,734 unique “genes”
    • 4,576 have meaningful gene symbols
    • 6,156 have “Riken” gene symbols (e.g. 8030469F12Rik)
      – Some of those are homologs to known human genes
        » Example: 1810037O03
  – 2,172 map to multiple LocusLink IDs
    • Potential for “incorrect” gene choice
      – Example: 0610005A07
  – 2,116 matches to AfCS
    • 1,490 unique AfCS IDs

2003

Agilent cDNA Microarray Annotation Summary

Page 20:

• Of the 15,494 unique clones
  – Over 7,800 have GO and InterPro annotation
  – 13,243 are matched to at least one Ensembl gene
    • 1,323 clones have multiple Ensembl matches
    • 3,574 clones match genes with multiple transcripts
      – Example: H3150F07
    • 9,706 unique Ensembl genes in all
      – 2,071 of the unique Ensembl genes have transcript variants

2003

Agilent cDNA Microarray Annotation Summary

Page 21:

• There is a wide disparity between predicted transcript variants depending on which database one uses (which makes sense, since they use different draft genomes and different gene prediction programs)

• Within a gene, the databases may present a different number of potential variants, with little overlap between the databases

• Variants grouped as one gene in one database may be grouped into multiple genes in another database

2003

Splice Variants: Database Disparities

Page 22:

• Example: GNAS
  – Ensembl
    • 6 transcripts in one record
  – LocusLink (NCBI Evidence)
    • 4 transcripts split across 2 records, with 1 transcript not aligning with the draft contig
  – Only one translation (the “main” gene) is shared between the two sources
• Take-home message: need to pick one (the best if we are lucky) reference
  – Ensembl seems to have the most available features in a digestible form

2003

Splice Variants: Database Disparities

Page 23:

Types of classification

• Domain or motif
  – Need to consider specific regions before assigning attributes to an entire class
  – Automatic class assignment should be relatively safe
• Sequence identity clustering
  – No notion of function, but for high identity should give a conservative class prediction
  – Results cannot be entirely automated; cutoffs that are used for one class of proteins might not be strict enough for another class of proteins

2003

AfCS Protein Classification

Page 24:

• Genes that are expressed in untreated cells and are statistically differentially expressed in at least one other treatment (at 4 hours)

• This method is not based on ratios or intensity levels

• Reduces false positive predictions (e.g. hemoglobin gene is not picked up!)

• Provides a conservative estimate of B-cell gene parts list

2003

Which genes are expressed in B-Cells?

Page 25:

Note: the ligands cluster according to early/late conditions with 90-100% accuracy (metrics: sample = Euclidean; gene = Pearson)

[Figure: two-way hierarchical clustering heat map. Group labels: 0 hr; early 0.5-1 hr; late 2-4 hr; mitogenic; (non-mitogenic); Interleukins.]

2003

Two-way hierarchical clustering: Unsupervised, n=33 (0.5, 1, 2, 4 hrs)

Page 26:

(R. Tibshirani, G. Chu 2002)

Objective: The replicated expression for each gene at the 4 hr time condition (untreated vs. ligand) is used to determine whether the gene is statistically differentially up- or down-regulated.

The t-statistics for all the genes are ordered and noted. The labels are then permuted and the t-statistics are calculated again. After many iterations, the cumulative t-statistic is averaged for each gene. Finally, for a given false positive rate (called the “False Discovery Rate” or FDR), the significant genes are selected.

For each gene, define the adjusted “t-statistic” as follows:

$$ d = \frac{\bar{x}_{\text{treated}} - \bar{x}_{\text{untreated}}}{s + s_0} $$

where $\bar{x}$ is the mean of replicates, $s$ is the standard deviation for the gene, and $s_0$ is the adjustment factor.

2003

Significance Analysis of Microarrays (SAM) Method
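
As a rough illustration of the adjusted statistic, the sketch below computes it for a single gene from its treated and untreated replicates. It is not the SAM implementation: the pooled form used for the per-gene scatter `s` and the choice of the adjustment factor `s0` are assumptions, and the permutation step that yields the FDR is left out. It needs at least three replicates in total.

```python
import math
from statistics import mean


def adjusted_t(treated, untreated, s0):
    """Adjusted t-statistic: (mean(treated) - mean(untreated)) / (s + s0).

    `s` is computed here as a pooled scatter of the replicates around their
    group means (an assumed form); `s0` is the adjustment factor that keeps
    genes with very small `s` from dominating the ranking.
    """
    n1, n2 = len(treated), len(untreated)
    m1, m2 = mean(treated), mean(untreated)
    # Sum of squared deviations of each replicate from its own group mean.
    ss = (sum((x - m1) ** 2 for x in treated)
          + sum((x - m2) ** 2 for x in untreated))
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)
```

A positive value indicates up-regulation relative to untreated and a negative value down-regulation; without `s0`, a gene with identical replicates would divide by zero.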

Page 27:

Differentially expressed genes for ligands vs. UNTREATED at 4 hr [SAM; False Discovery Rate in parentheses]

[Bar chart. x-axis: ligand (4 hr) with FDR: 40L (1%), LPS (1%), AIG (1%), IL4 (1%), CPG (1%), IFB (1.5%), GRH (1%), 2MA (18%), LPA (17%), CGS (2.9%), BOM (35%), IGF (8%), S1P (38%), PAF (2.4%), 70L (6%), NPY (10%), DIM (9%), LB4 (23%), M3A (3.5%), FML (11%), TGF (2.5%), TER (35%), IL10 (20%), ELC (26%), PGE (11%), BAFF (11%), BLC (57%), NGF (42%), TNF (33%), SDF (20%), IFG (25%), NEB (25%), SLC (NA). y-axis: number of genes (probes) differentially expressed (ticks 0-200 and 500-1100). Legend: down-regulated, up-regulated.]

2003

Differentially Expressed Genes: Ligands 4 hr vs. Untreated

Page 28:

“mitogenic” ligands FDR = 1%

FDR = 35%

FDR = 18%

FDR = 1%-3%

Two-way dendrogram using significantly expressed genes (4 hrs)

Page 29:

[Histogram. x-axis: number of ligands (0-12); y-axis: number of genes (ticks 0-2000 and 8000-14000).]

Number of genes that are significantly different from UNTREATED in a given number of ligands at 4 hr

Annotation: genes that were not significantly differentially expressed in any of the 33 ligands at 4 hr

D. Fambrough, K. McClure, A. Kazlauskas, and E.S. Lander (1999). Diverse signaling pathways activated by growth factor receptors induce broadly overlapping, rather than independent sets of genes. Cell 97: 727-741

2003

Expressed Genes: Significant vs. non-significant

Page 30:

[Chart of gene counts by category]

2003

149  cell communication
 36  cell death
 13  cell differentiation
369  cell growth and/or maintenance
 13  cell motility
  1  digestion
  1  embryonic development
491  metabolism
 11  morphogenesis
  1  reproduction
128  response to external stimulus
  9  response to stress

A Conservative Gene Parts-List

Page 31:

• Create a splice-variant gene DB using ENSEMBL
• Identify 60-80 mer oligo sequences that are splice specific using the criteria
  – Appropriate GC content
  – Appropriate melting temperature
  – Appropriate 5’ and 3’ ends
• For exons that are small use extended window method to obtain 60-80 mer sequences
• Validate mouse-specific oligos against human genome sequences.
• Explore motif-specific oligos where splice variation is known but exon sequences are as yet undetermined.

2003

Design of Splice-Specific Oligo Arrays
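
A minimal sketch of the GC-content and melting-temperature screen might look like the following. Everything numeric here is an assumption for illustration: the slides do not give the acceptable GC or Tm ranges, the Tm estimate is one common empirical salt-adjusted approximation rather than the formula the authors used, and the 5’/3’ end criteria are not modeled.

```python
import math


def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def approx_tm(seq, na_molar=0.05):
    """Rough melting temperature in deg C for a long oligo.

    Uses the empirical approximation
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N,
    with an assumed sodium concentration.
    """
    return (81.5 + 16.6 * math.log10(na_molar)
            + 41.0 * gc_fraction(seq) - 675.0 / len(seq))


def candidate_oligos(exon, length=60, gc_range=(0.40, 0.60),
                     tm_range=(60.0, 80.0)):
    """Slide a fixed-length window over an exon and keep windows that pass
    the (hypothetical) GC and Tm ranges. Returns (start, sequence) pairs."""
    hits = []
    for start in range(len(exon) - length + 1):
        window = exon[start:start + length]
        if (gc_range[0] <= gc_fraction(window) <= gc_range[1]
                and tm_range[0] <= approx_tm(window) <= tm_range[1]):
            hits.append((start, window))
    return hits
```

An exon shorter than `length` yields no windows, which is the case the "extended window" bullet addresses by borrowing flanking sequence.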

Page 32:

Mouse Genome

Cell-Specific Genes

Signaling Proteins

Molecule Pages

Cell-Specific Gene Products

Cell-Specific Gene Products Invoked by Input

Signaling Proteins Invoked in Input-Specific Response

The Parts-List: AfCS Strategy

2003

Page 33:

Sequence Identity Clustering

• Pairwise BlastP (no filtering), with a homology ratio defined by raw score divided by the self-Blast raw score of the shorter sequence

• Single-linkage clustering on homology
  – 3347 AfCS proteins
  – 2913 cluster with a homology of 0.1 or better
  – 2308 with a homology of 0.3 or better

2003

AfCS Protein Classification
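
The homology ratio and the single-linkage step can be sketched as below. The function names, the input layout (a dictionary of pairwise ratios), and the union-find implementation are illustrative assumptions; the slides only define the ratio and name the clustering method.

```python
def homology_ratio(raw_score, self_score_a, len_a, self_score_b, len_b):
    """BlastP raw score divided by the self-Blast raw score of the shorter
    of the two sequences."""
    shorter_self = self_score_a if len_a <= len_b else self_score_b
    return raw_score / shorter_self


def single_linkage_clusters(names, pair_ratios, cutoff):
    """Group proteins so that any pair with ratio >= cutoff shares a cluster.

    `pair_ratios` maps (name_a, name_b) to a homology ratio. Single linkage
    means one qualifying link is enough to merge two clusters.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), ratio in pair_ratios.items():
        if ratio >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())
```

Raising the cutoff from 0.1 to 0.3 can only split clusters, never merge them, which is consistent with fewer proteins clustering at the stricter cutoff.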

Page 34:

2003

Example Tree from Identity Clustering

Page 35:

The AfCS Molecule Pages

2002

A Comprehensive Expert-Curated Resource For Signaling Proteins

Page 36:

What Are The Molecule Pages?

2002

• The “AfCS Molecule Pages” is a website containing comprehensive information about selected signaling proteins
• Each protein has a dedicated “Molecule Page”, the public’s one-stop shop for everything pertinent to that protein
• A “Molecule Page” is continuously updated with data from an expert author, and with automated data obtained from the public databases
• Each published update becomes an official “Molecule Page Version”

Page 37:

• The author is responsible for entering information about
  – AfCS protein’s functional states
  – Interactions of their protein with other proteins, and small molecules
  – Mutations of the protein, and their consequences and/or phenotypes
  – Relevant experimental information

Molecule Pages – Author-Entered Data

2002

Page 38:

• “Automated Data” for each protein is provided to both the public and the authors, and can be referenced by the author when entering their own data

• The types of automated data available will include:
  – Summaries of and links to external database records that correspond to, or are related to, the author’s protein
    • (e.g., Genbank, SwissProt and PDB records)

Molecule Pages – Automated Data

2002

Page 39:

Molecule Pages

2002

[State diagram. States: RG, RG*T, RGD; GA, G*AT, GAD; G, G*T, GD; RGA, RG*AT, RGAD. Labels: GTP, GDP. Transitions: T1-T4, P1-P4, D1-D4, A1-A6, R1-R6.]

Page 40:

OR

Page 41:
Page 42:
Page 43:
Page 44:

Mini molecule page documentation for AfCS protein A002002 Rac2

   Protein A002002

   Overview

   Database Links

   Protein Family

   Domains & Motifs

   Protein Structure

   Gene Info

   Orthologs & Paralogs

   Blast Data

   Mini Molecule Page

AfCS Protein ID: A002002

Protein Name: Rac2

Protein Synonyms: EN-7; RAS-related C3 botulinum substrate 2; Rac2; RacB; Ras-related C3 botulinum toxin substrate 2; p21-Rac2

Author: Gary M Bokoch

Co-Authors: -

Protein Function: Rac2 is a member of the Rho family of small GTPases. Rac2 regulates several cellular functions by cycling between its inactive GDP-bound state (Rac2-GDP) and its active GTP-bound state (Rac2-GTP).

Protein Regulation: The activity of Rac2 is controlled by several regulators. In its inactive state, Rac2-GDP is bound to RhoGDI (GDP dissociation inhibitor). The signal(s) that lead to dissociation of the Rac2-GDP-GDI complex still remain to be determined. The exchange of GDP for GTP of Rho GTPases is regulated by a group of over 30 proteins collectively called GEFs (guanine nucleotide exchange factors). The intrinsic rate of GTP hydrolysis by Rho GTPases is controlled by another group of proteins known as GAPs (GTPase activating proteins). Rac2 is post-translationally processed at its carboxyl terminal CAAX motif with a geranylgeranyl lipid modification that allows it to bind to membranes. RhoGDI, however, appears to sequester Rac and prevents Rac from interacting with membranes.

Concentration Regulation: Unknown

Subcellular Localization: The Rac2-RhoGDI complex is located in the cytoplasm. Upon receiving a stimulatory signal, Rac2 is released from RhoGDI and is regulated by RhoGDI that prevents the interaction of Rac with membranes.

Phenotypes: Neutrophils of Rac2-deficient mice displayed significant defects in chemotaxis, in shear-dependent L-selectin-mediated capture on the endothelial substrate Glycam-1, F-actin generation, and p38 and p42/44 MAP kinase activation induced by chemoattractants. Superoxide generation by bone marrow neutrophils was significantly reduced, but it was normal in activated peritoneal exudate neutrophils. These defects were reflected in vivo by baseline neutrophilia, reduced inflammatory peritoneal exudate formation, and increased mortality when challenged with Aspergillus fumigatus.

Splice Variants: unknown

Mouse Gene Symbol: Rac2

Genbank Accession: 6679600

Genbank Organism: Mouse

Major Sites of Expression: T-cells, B-lymphocytes, hematopoietic cells

Cardiac Myocyte Expression: no (-)

B Lymphocyte Expression: yes (Dorseuil, O. et al. (1992) J. Biol. Chem. 267:20540-20542)

Interactions:
• Ligands: GTP, GDP
• Proteins: p67phox, p21-activated kinase (PAK), GAPs (Bcr, Abr), GEFs, smgGDS (small GTPase guanine nucleotide dissociation stimulator), RhoGDI, D4GDI

Antibodies: Rabbit polyclonal available from Santa Cruz Biotechnology, Inc. Mouse monoclonal available from Upstate Biotechnology, Inc.

References:
• Bokoch, G.M. (1995) Immunol. Res. 21:139-148
• Bishop, A.L. and Hall, A. (2000) Biochem. J. 348:241-255
• Roberts, A.W. et al. (1999) Immunity 10:183-196
• Williams, D.A. et al. (2000) Blood 96:1646-1654
• Scheffzek, K., Stephan, I., Jensen, O.N., Illenberger, D., and Gierschik, P. (2000) Nat. Struct. Biol. 7:122-126

Page 45:

Determining Gene Parts List from Affymetrix Data

For Shankar Subramaniam

Eugene Ke

May 15th

2003

Page 46:

Differentially Expressed Genes

• A typical Affymetrix experiment consists of two microarrays
  – Control
  – Experiment
• Comparing two chips
  – Generates ratios
  – Generates a p-value estimate
    • Empirically corrected significance value

2003

Page 47:

Calling Significant Genes

• Affymetrix suggests a consensus scheme
  – Consider all aggregate measures
  – If greater than some percentage of arrays agree, call the change truly significant
  – All criteria are ultimately arbitrary

2003

Page 48:

Identifying Nondifferentially Expressed Genes

• Affymetrix returns signal values for each transcript
  – Estimate of transcript number
• Quality controls are important to consider
  – Statistical measure that signal is provably different than zero
  – Need to adjust for multiple testing problem

2003

Page 49:

Generating a parts list

• Differentially expressed genes (D)
  – Significant genes
• Nondifferentially expressed genes (N)
  – Adjust detection p-values
  – Determine reasonable threshold
• Apply union of sets
  – Take minimum of possible p-values
• Determine reasonable cutoff

Gene      D       N       Parts List
Gene 1    0.001   0.03    0.001
Gene 2    0.4     0.3     0.3
Gene 3    0.46    0.08    0.08
Gene 4    0.61    0.006   0.006
Gene 5    0.11    0.309   0.11
Gene 6    0.01    0.69    0.01
Gene 7    0.15    0.023   0.023
Gene 8    0.43    0.5     0.43
Gene 9    0.72    0.087   0.087
Gene 10   0.043   0.45    0.043
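
The "union of sets" step in the table amounts to taking, for each gene, the smaller of its D and N p-values and then applying a cutoff. A minimal sketch using the table's own numbers follows; the 0.05 cutoff is a hypothetical choice, since the slide leaves the cutoff open.

```python
# (D, N) p-values per gene, copied from the table above.
P_VALUES = {
    "Gene 1": (0.001, 0.03),
    "Gene 2": (0.4, 0.3),
    "Gene 3": (0.46, 0.08),
    "Gene 4": (0.61, 0.006),
    "Gene 5": (0.11, 0.309),
    "Gene 6": (0.01, 0.69),
    "Gene 7": (0.15, 0.023),
    "Gene 8": (0.43, 0.5),
    "Gene 9": (0.72, 0.087),
    "Gene 10": (0.043, 0.45),
}


def combine(p_values):
    """Union of the D and N sets: keep the minimum p-value for each gene."""
    return {gene: min(d, n) for gene, (d, n) in p_values.items()}


def parts_list(p_values, cutoff=0.05):
    """Genes whose combined p-value is at or below the (assumed) cutoff."""
    return sorted(g for g, p in combine(p_values).items() if p <= cutoff)
```

Taking the minimum means a gene enters the parts list if either line of evidence supports it, so the combined list is at least as large as either set alone at the same cutoff.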

2003

Page 50:

Issues to Consider

• How comparable are detection and change p-values?
  – Detection p-values are “normal” p-values
  – Change p-values are “estimated” p-values
• How to adequately compensate for multiple testing problem?
  – Traditional methods much too stringent
  – Affymetrix applies some empirical approaches
  – What are other approaches?
• Replication
  – Yields better result
  – Apply joint probabilities
    • Probability of appearing in all arrays
• How to weight arrays?
  – Each array uniquely affected by noise
  – Ideally, would have some method to weight “good” and “bad” arrays
    • Ill-posed problem

2003
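
To make the "too stringent" point concrete, the sketch below contrasts a traditional Bonferroni adjustment with the Benjamini-Hochberg false-discovery-rate adjustment. The slides do not name either method; both are shown only as well-known examples of a strict and a less strict correction, and the function names are illustrative.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the same input, each Benjamini-Hochberg value is no larger than the corresponding Bonferroni value, so more genes survive a given cutoff, at the price of controlling the expected fraction of false calls rather than the chance of any false call.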

Page 51:

Comparing to other technologies

• How to relate probes from an Affymetrix chip to another chip?
  – Annotation
  – Sequence Comparison
• Annotation is fluid
  – Original probe sequences may not reflect current realities
  – In-house annotation poses synchronization problems
• Sequence Comparison
  – What is considered a reasonable match?

2003

Page 52:

References

• Detailed Statistical Algorithms white paper
  – http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
• Affymetrix Probe Sequences
  – http://www.affymetrix.com/analysis/download_center.affx

2003

Page 53:

Y2H Physical Data Model

[Entity-relationship diagram. Relationship lines are labeled with 0..* cardinalities; one element is labeled "Molecule Page".]

AFCS_PROT
  AFCS_PID            VARCHAR2(12)    <pk>
  PROT_NAME           VARCHAR2(200)
  PROT_SYNONYMS       VARCHAR2(2000)
  PROT_CATEGORY       VARCHAR2(200)

BAIT
  BAIT_ID             NUMBER          <pk>
  BAIT_AFCS_ID        VARCHAR2(12)    <fk>
  AFCS_PROT_VERSION   NUMBER(2)
  PROT_GI             VARCHAR2(12)
  NUCLEOTIDE_GI       VARCHAR2(12)
  BAIT_SEQ_START      NUMBER
  BAIT_SEQ_END        NUMBER
  BAIT_SEQ            VARCHAR2(2000)

BAIT_PREY
  BAIT_ID             NUMBER          <pk,fk1>
  LIBRARY_NAME        VARCHAR2(15)    <pk,fk3>
  PREY_ID             NUMBER(12)      <pk,fk2>

BAIT_STATUS
  BAIT_STATUS         VARCHAR2(30)    <pk>
  DESCRIPTION         VARCHAR2(500)

FILE_BAIT
  BAIT_ID             NUMBER          <pk,fk1>
  Y2H_FILE_ID         NUMBER          <pk,fk3>
  BAIT_STATUS         VARCHAR2(30)    <fk2>

PREY
  PREY_ID             NUMBER(12)      <pk>
  PREY_AFCS_ID        VARCHAR2(12)    <fk1>
  PREY_SEQ_ID         NUMBER(12)      <fk2>
  PREY_NAME           VARCHAR2(300)
  PREY_TYPE           VARCHAR2(12)
  NUCLEOTIDE_GI       VARCHAR2(12)
  PREY_SEQ_START      NUMBER
  PREY_SEQ_END        NUMBER

PREY_LIBRARY
  LIBRARY_NAME        VARCHAR2(15)    <pk>
  DESCRIPTION         VARCHAR2(500)

PREY_SEQ
  PREY_SEQ_ID         NUMBER(12)      <pk>
  PROT_GI             VARCHAR2(12)
  PROT_SEQ            CLOB
  NUCLEOTIDE_SEQ      CLOB
  NOVEL_NUC_CHECKSUM  VARCHAR2(12)
  NUC_SEQ_TYPE        VARCHAR2(12)
  NOVEL_SEQ_START     NUMBER(12)
  NOVEL_SEQ_END       NUMBER(12)

Y2H_FILE
  Y2H_FILE_ID         NUMBER          <pk>
  FILE_NAME           VARCHAR2(200)
  FILE_LOCATION       VARCHAR2(25)
  EXPT_DATE           DATE
  DATE_RECEIVED       DATE
  DATE_INSERTED       VARCHAR2(12)

Page 54:

Y2H Database Views

AFCS_BAIT
  AFCS_PID, PROTEIN_NAME, PROTEIN_SYNONYMS, PROTEIN_CATEGORY, AFCS_PROTEIN_VERSION, BAIT_ID, BAIT_PROTEIN_GI, BAIT_NUCLEOTIDE_GI, BAIT_N_TERMINAL_START, BAIT_C_TERMINAL_END, BAIT_SEQUENCE, Y2H_FILE_ID, BAIT_STATUS

BP_PROTEIN
  AFCS_ID, PROTEIN_NAME, PROTEIN_SYNONYMS, PROTEIN_CATEGORY

B_PROTEIN
  AFCS_ID, PROTEIN_NAME, PROTEIN_SYNONYMS, PROTEIN_CATEGORY

Y2H_INTERACTION
  BAIT_AFCS_ID, BAIT_PROTEIN_NAME, BAIT_ID, BAIT_AFCS_VERSION, BAIT_PROTEIN_GI, BAIT_NUCLEOTIDE_GI, BAIT_N_TERMINAL_START, BAIT_C_TERMINAL_END, PREY_ID, PREY_PROTEIN_NAME, PREY_LIBRARY_NAME, PREY_NUCLEOTIDE_GI, PREY_N_TERMINAL_START, PREY_C_TERMINAL_END

prey_V (source tables: Y2H.BAIT_PREY, Y2H.PREY, Y2H.PREY_SEQ)
  PREY_DB_ID, Protein_AFCS_ID, Protein_NAME, Prey_Protein_TYPE, Prey_LIBRARY, NUCLEOTIDE_GI, Prey_Protein_N_Termimal, PREY_protein_C_Terminal, PREY_Sequence_ID, PROTein_GI, PROTein_SEQuence, NUCLEOTIDE_SEQuence, NOVEL_NUC_CHECKSUM, NUC_SEQuence_TYPE, NOVEL_Nuc_SEQ_START, NOVEL_nuc_SEQ_END

AFCS_PREY
  PREY_AFCS_ID, PREY_AFCS_NAME, PREY_ID, PREY_TYPE, PREY_LIBRARY, PREY_NUCLEOTIDE_GI, PREY_N_TERMINAL_START, PREY_C_TERMINAL_END

Views in y2h

2003

