Analysis of the yeast transcriptional regulatory network
Transcription Factor (TF)
A TF is a protein that binds to specific DNA sequences and regulates the transcription of the corresponding genes.
The binding site of a TF is usually a short, specific segment of the promoter sequence.
The activity of a TF is regulated according to the cell’s need, largely through signal transduction. It may not be directly observed, but can be reflected by the genes it regulates.
Expression regulatory network
Identifying the expression regulatory network is a crucial step towards understanding the cellular regulation system.
Inferring network from microarray data alone
Inferring network from microarray data and TF-TG (Target Gene) Information
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N. Revealing modular organization in the yeast transcriptional network. Nat Genet. 2002 Aug;31(4):370-7.
Segal E et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet. 2003 Jun;34(2):166-76.
TF Activity
Using TF-TG relations benefits regulatory network identification.
TF expression level is not a good measure of the TF activity. The activated protein level of a TF, rather than its expression level, is what controls gene expression.
The activity of a transcription factor is regulated according to the cell’s need, largely through signal transduction. It may not be directly observed, but can be reflected by the genes it regulates.
Identify TF Activity by NCA
Network Component Analysis
Liao JC et al. Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15522-7.
NCA compared with PCA, ICA
NCA Model
Without further constraints, [E] cannot be uniquely decomposed into [A] and [P].
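A short worked identity shows why the decomposition is not unique without constraints. Here [X] is a symbol introduced only for this illustration: any invertible L x L matrix, where L is the number of regulatory signals.

```latex
[E] = [A][P] = \left([A][X]\right)\left([X]^{-1}[P]\right)
```

Both ([A], [P]) and ([A][X], [X]^{-1}[P]) reproduce [E] exactly, so the data alone cannot distinguish them. Requiring zeros at fixed positions of [A] restricts which [X] are admissible.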
Criteria for Unique NCA [E] = [A][P]
1. The connectivity matrix [A] must have full-column rank.
2. When a node in the regulatory layer is removed along with all of the output nodes connected to it, the resulting network must be characterized by a connectivity matrix that still has full-column rank. This condition implies that each column of [A] must have at least L-1 zeros.
3. [P] must have full row rank. In other words, each regulatory signal cannot be expressed as a linear combination of the other regulatory signals.
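The three criteria above can be checked numerically. The following is a minimal sketch, assuming [A] and [P] are available as NumPy arrays; the function names `full_column_rank` and `check_nca_criteria` are hypothetical and not taken from Liao et al.

```python
import numpy as np

def full_column_rank(M):
    """True if M has full column rank (a matrix with no columns counts as full rank)."""
    if M.shape[1] == 0:
        return True
    if M.shape[0] < M.shape[1]:
        return False
    return bool(np.linalg.matrix_rank(M) == M.shape[1])

def check_nca_criteria(A, P):
    """Return (criterion1, criterion2, criterion3) for E = A P.

    A: (N x L) connectivity matrix, zero where there is no connection.
    P: (L x M) matrix of regulatory signals.
    """
    A = np.asarray(A, dtype=float)
    P = np.asarray(P, dtype=float)
    L = A.shape[1]

    # Criterion 1: A has full column rank.
    c1 = full_column_rank(A)

    # Criterion 2: remove regulator j and every output node connected to it;
    # the remaining connectivity matrix must still have full column rank.
    c2 = True
    for j in range(L):
        unconnected = A[:, j] == 0            # output nodes not regulated by j
        reduced = np.delete(A[unconnected], j, axis=1)
        if not full_column_rank(reduced):
            c2 = False
            break

    # Criterion 3: P has full row rank.
    c3 = bool(np.linalg.matrix_rank(P) == P.shape[0])
    return c1, c2, c3
```

For example, a network in which every gene is regulated by the first TF fails criterion 2, because removing that TF removes all genes.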
Criterion 2
Estimation of [E]=[A][P]
Iteratively estimate [A] and [P]: A0 → P1 → A1 → P2 → … until convergence.
Convergence criterion: decrease of least-squares error < cutoff
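The alternating estimation can be sketched as below. This is an illustrative sketch rather than the published implementation: the function name `nca_als`, the random initialisation of A0, and the default cutoff are assumptions, and the normalisation step used in practice is omitted. `Z` is a boolean matrix marking where [A] is allowed to be non-zero.

```python
import numpy as np

def nca_als(E, Z, n_iter=200, tol=1e-8, seed=0):
    """Alternate least-squares updates of P and A for E ~ A P.

    E: (N x M) expression matrix.
    Z: (N x L) boolean pattern; A[i, j] is forced to 0 where Z[i, j] is False.
    """
    rng = np.random.default_rng(seed)
    N = E.shape[0]
    A = np.where(Z, rng.normal(size=Z.shape), 0.0)   # A0: random values on the pattern
    prev_err = np.inf
    err = np.inf
    for _ in range(n_iter):
        # Update P with A fixed: ordinary least squares.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # Update A with P fixed: one regression per gene,
        # using only the regulators allowed by the pattern.
        for i in range(N):
            idx = np.flatnonzero(Z[i])
            A[i] = 0.0
            if idx.size:
                A[i, idx] = np.linalg.lstsq(P[idx].T, E[i], rcond=None)[0]
        err = float(np.sum((E - A @ P) ** 2))
        # Stop when the decrease of the least-squares error falls below the cutoff.
        if prev_err - err < tol:
            break
        prev_err = err
    return A, P, err
```

Each update solves a least-squares problem with the other factor held fixed, so the error cannot increase from one iteration to the next.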
NCA, infer TF activity in Yeast
[E] = [A] [P]
How should the restrictions on CS be defined, i.e. which CS{i,j} = 0?
Identify the TF-TG relation by ChIP-chip experiment
Yeast cell cycle regulation
441 genes vs. 33 transcription factors
Inference of regulatory network by two-stage constrained factor analysis
Yu T, Li KC.
Inference of transcriptional regulatory network by two-stage constrained space factor analysis.
Bioinformatics. 2005 Nov 1;21(21):4033-8.
Inference of regulatory network by Two-stage constrained factor analysis
Shortcoming of Liao et al.'s approach: E = AP
Let Cij = I{Aij ≠ 0}, the constraint on where the loading matrix A can be non-zero.
C comes from a very noisy source.
Estimate C, A, P simultaneously.
Model setting
Gene expression matrix: Gene x Condition
Regulation strength matrix B (to be estimated): Gene x TF
TF activity matrix T (to be estimated): TF x Condition
Error matrix
Connection constraint matrix $C_{N \times K}$: Gene x TF; 1: connection; 0: no connection
Constrained by: $b_{i,j} \times c_{i,j} \equiv b_{i,j}, \ \forall i, j$
Up to here, it is the NCA model by Liao et al.
Model Fitting
However, we do not assume full knowledge of C. We require C to be bounded by CMIN and CMAX:
CMIN: higher-confidence set, from biological evidence
CMAX: lower-confidence set, from ChIP data
Model Fitting
Difficulties:
Simultaneous estimation of both the structure and the coefficients amounts to finding the optimum of a very complex function.
The number of parameters to be estimated is overwhelming.
Solution:
Find a reasonable local optimum.
Use the high-confidence set to find a starting point as close to the global optimum as possible.
Implementation:
Stepwise model fitting.
Start with a network backbone with only the high-confidence set, and grow the network gradually, drawing new connections from the low-confidence set.
(1) Set C = CMIN and estimate each activity profile tk by the consensus of the expression of the genes it regulates.
(2) Estimate B and T by alternating least squares, using ridge regression.
(3) Is the reduction of total RSS in the last few steps too small?
NO: from (CMAX - C), find the TF-gene pair that best agrees with the current estimates of B and T, add it to C, and return to (2).
YES: fix the estimate of T and regress each gene expression profile on the activity profiles of the TFs associated with it in CMAX. Use BIC and p-values to select TFs.
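The growth stage of the stepwise fitting can be sketched as follows. This is a simplified illustration under several assumptions that are not in the slides: the consensus profile is taken as the mean expression of the regulated genes, a candidate pair is scored by the absolute cosine similarity between the gene's residual and the TF activity, the stopping rule looks at a single step rather than the last few, and the final BIC and p-value selection is omitted. The names `ridge`, `fit_BT`, `grow_network`, `lam`, and `min_gain` are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge solution of min ||y - X b||^2 + lam ||b||^2."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

def fit_BT(Y, C, T, lam=0.1, n_iter=20):
    """Alternating ridge least squares for Y ~ B T, with B zero outside C."""
    N, K = C.shape
    B = np.zeros((N, K))
    for _ in range(n_iter):
        for i in range(N):                      # update B row by row
            idx = np.flatnonzero(C[i])
            B[i] = 0.0
            if idx.size:
                B[i, idx] = ridge(T[idx].T, Y[i], lam)
        T = ridge(B, Y, lam)                    # update T with B fixed
    return B, T

def grow_network(Y, C_min, C_max, lam=0.1, min_gain=1e-3, max_steps=50):
    """Start from C_min and add connections from C_max one at a time."""
    C = C_min.copy()
    K, M = C.shape[1], Y.shape[1]
    # Initial activity profiles: mean expression of each TF's high-confidence targets.
    T = np.vstack([Y[C[:, k]].mean(axis=0) if C[:, k].any() else np.zeros(M)
                   for k in range(K)])
    B, T = fit_BT(Y, C, T, lam)
    rss = np.sum((Y - B @ T) ** 2)
    for _ in range(max_steps):
        free = C_max & ~C                       # candidate connections
        if not free.any():
            break
        R = Y - B @ T                           # current residuals
        norms = np.linalg.norm(R, axis=1)[:, None] * np.linalg.norm(T, axis=1)[None, :]
        score = np.abs(R @ T.T) / (norms + 1e-12)
        score[~free] = -np.inf
        i, k = np.unravel_index(np.argmax(score), score.shape)
        C_try = C.copy()
        C_try[i, k] = True
        B_try, T_try = fit_BT(Y, C_try, T, lam)
        rss_try = np.sum((Y - B_try @ T_try) ** 2)
        if rss - rss_try < min_gain * rss:      # RSS reduction too small: stop growing
            break
        C, B, T, rss = C_try, B_try, T_try, rss_try
    return C, B, T
```

By construction the returned C always contains C_min and stays inside C_max, which mirrors the bound on C required by the model.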
Result
Data:
Regular growth ChIP data;
cell-cycle microarray data;
99 TFs enter our study.
Start with 891 evidenced relationships and 29154 lower-confidence relationships.
Final network has 3846 TF-gene connections.
TF’s that exhibit correlated expression and activity:
Time-shifting between a TF’s activity profile and its expression profile:
(1) Fit the activity profile with a cubic spline.
(2) Interpolate the spline to obtain the shifted profile.
(3) Compute the correlation between the expression profile and the shifted activity profile.
(4) Maximize the absolute correlation with respect to the shift, in minutes.
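The four steps above can be sketched with SciPy's `CubicSpline`. This is a minimal sketch under assumptions not stated in the slides: the function name `best_time_shift`, the search grid (`max_shift`, `step`), and the sign convention (a positive shift means activity lags behind expression) are all illustrative choices.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def best_time_shift(times, activity, expression, max_shift=30.0, step=1.0):
    """Find the shift (in minutes) maximizing |corr(expression(t), activity(t + shift))|."""
    times = np.asarray(times, dtype=float)
    activity = np.asarray(activity, dtype=float)
    expression = np.asarray(expression, dtype=float)
    spline = CubicSpline(times, activity)             # (1) fit the activity profile
    best_shift, best_corr = 0.0, 0.0
    for s in np.arange(-max_shift, max_shift + step, step):
        shifted_t = times + s
        # Only use time points where the shifted profile stays inside the
        # observed range, to avoid extrapolating the spline.
        ok = (shifted_t >= times[0]) & (shifted_t <= times[-1])
        if ok.sum() < 4:
            continue
        shifted = spline(shifted_t[ok])               # (2) shifted activity profile
        r = np.corrcoef(expression[ok], shifted)[0, 1]  # (3) correlation
        if abs(r) > abs(best_corr):                   # (4) keep the best |correlation|
            best_shift, best_corr = float(s), float(r)
    return best_shift, best_corr
```

For periodic profiles such as cell-cycle data, several shifts can give nearly the same absolute correlation, so the search window should be kept shorter than half a period.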
TF’s that have activity lagging behind expression:
SWI4
Between-TF regulations: