
Step 07 - Looking at Codon and TRNA Adaptation Indices

Date post: 07-Mar-2016
Upload: qatco

  • tAI Calculation

    I'm using codonR to calculate tRNA adaptation for my constructs. I'm getting this first set of commands from codonR/README.

    I first had to run the Perl script codonM on the coding sequence only. I also ran the same codonM script on the E. coli genome, for comparison.

    $ perl codonR/codonM \
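    For orientation, the tRNA adaptation index of a gene is the geometric mean of the relative adaptiveness weights of its codons. The sketch below shows only that arithmetic in base R. It does not call codonR, and the weights in `w`, the helper `split_codons`, and the function `tai_sketch` are made up for illustration.

```r
# Hypothetical relative adaptiveness weights for a handful of codons
# (illustrative numbers only, not real E. coli values).
w <- c(GCT = 0.40, GCC = 0.25, GCA = 0.60, GCG = 1.00,
       AAA = 1.00, AAG = 0.35)

# Split a coding sequence into codons (assumes the length is a multiple of 3).
split_codons <- function(cds) {
  starts <- seq(1, nchar(cds), by = 3)
  substring(cds, starts, starts + 2)
}

# tAI as the geometric mean of the weights of the codons in the sequence.
tai_sketch <- function(cds, w) {
  codons <- split_codons(cds)
  exp(mean(log(w[codons])))
}

tai_sketch("GCGAAAGCG", w)  # every codon has weight 1
tai_sketch("GCCAAGGCT", w)  # only low-weight codons, so a lower score
```

    Working in log space avoids underflow when many small weights are multiplied over a long coding sequence.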

  • multiplot(
        ggplot(melt(subset(ngs, Promoter=='BBaJ23100'),
                    measure.vars= c('RNA','Count.DNA','Prot','Trans')),
               aes(x=tAI, color=RBS, y=value)) +
            geom_point(alpha=0.05) + theme_bw() +
            stat_smooth(method=lm, se=F) +
            facet_wrap(RBS~variable, scale='free') +
            scale_x_continuous(name="tRNA Adaptation Index (tAI)") +
            scale_y_log10("Log10 of Dependent Variable") +
            opts(title="tRNA adaptation correlations (Strong Promoter)"),

        ggplot(melt(subset(ngs, Promoter=='BBaJ23108'),
                    measure.vars= c('RNA','Count.DNA','Prot','Trans')),
               aes(x=tAI, color=RBS, y=value)) +
            geom_point(alpha=0.05) + theme_bw() +
            stat_smooth(method=lm, se=F) +
            facet_wrap(RBS~variable, scale='free') +
            scale_x_continuous(name="tRNA Adaptation Index (tAI)") +
            scale_y_log10("Log10 of Dependent Variable") +
            opts(title="tRNA adaptation correlations (Weak Promoter)"),
        cols=1)

  • I checked a few things to see if I did this right, because I expected to see a much larger effect.

    Maybe I should compare this by gene as well:

  • ngs,
  • ##      Effect Pct.Explained
    ## 1  Promoter        30.167
    ## 2       RBS        19.841
    ## 3      Gene        13.775
    ## 4       tAI         1.045
    ## 5  Gene:tAI         1.084
    ## 6 Residuals        34.088

    Now let's look at percent explained variation of the relative fold change in expression within a promoter, RBS, and gene combination. Since we're looking within promoter and gene, the only components of this ANOVA are tAI and gene identity.

    tai.lm,
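    One way to get a "percent explained" table like the ones in this step is to divide each term's sum of squares from `anova()` by the total. The sketch below does that on simulated data. The data frame `toy`, its values, and the helper `pct_explained` are assumptions for illustration, not the code that produced the tables here.

```r
set.seed(1)

# Simulated stand-in data; the column names mirror the ones used in this
# step, but the values are made up.
toy <- data.frame(
  Gene = factor(rep(c("g1", "g2", "g3"), each = 40)),
  tAI  = runif(120, 0.2, 0.5)
)
toy$Prot <- 2 + 0.5 * as.integer(toy$Gene) + 1.5 * toy$tAI +
  rnorm(120, sd = 0.3)

# Percent of the total sum of squares attributed to each model term.
# anova() on an lm fit reports sequential (type I) sums of squares,
# so the split depends on the order of terms in the formula.
pct_explained <- function(fit) {
  tab <- anova(fit)
  data.frame(Effect = rownames(tab),
             Pct.Explained = 100 * tab[["Sum Sq"]] / sum(tab[["Sum Sq"]]))
}

fit <- lm(Prot ~ Gene * tAI, data = toy)
pct_explained(fit)
```

    Because the sums of squares are sequential, listing tAI before Gene in the formula can shift variance between the two terms when they are correlated.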

  • ngs$Gene,
  • multiplot(
        ggplot(melt(subset(ngs,
                           Promoter=='BBaJ23100' & grepl('Rare',ngs$CDS.type)),
                    measure.vars= c('RNA','Count.DNA','Prot','Trans')),
               aes(x=tAI, group=CDS.type, color=CDS.type, y=value)) +
            geom_point(alpha=0.15) + theme_bw() +
            facet_wrap(RBS~variable, scale='free') +
            scale_x_continuous(name="tRNA Adaptation Index (tAI)") +
            scale_y_log10("Log10 of Dependent Variable") +
            opts(title="tRNA adaptation correlations (Strong Promoter)") +
            geom_boxplot(fill=NA, outlier.shape=NA),

        ggplot(melt(subset(ngs,
                           Promoter=='BBaJ23108' & grepl('Rare',ngs$CDS.type)),
                    measure.vars= c('RNA','Count.DNA','Prot','Trans')),
               aes(x=tAI, group=CDS.type, color=CDS.type, y=value)) +
            geom_point(alpha=0.15) + theme_bw() +
            facet_wrap(RBS~variable, scale='free') +
            scale_x_continuous(name="tRNA Adaptation Index (tAI)") +
            scale_y_log10("Log10 of Dependent Variable") +
            opts(title="tRNA adaptation correlations (Weak Promoter)") +
            geom_boxplot(fill=NA, outlier.shape=NA),
        cols=1)

  • Alright, there is an effect here, but why is it so strong for RNA? Perhaps the max protein measurement is washing out the effect? Or the fitness cost makes the cells divide more slowly, increasing the amount of RNA per cell? Something weird is definitely going on here. I split the tAI into regions and plot it with boxplots:

  • multiplot(
        ggplot(melt(subset(ngs, Promoter == "BBaJ23100"),
                    measure.vars = c("RNA", "Count.DNA", "Prot", "Trans")),
               aes(y = value, x = cut(tAI, breaks = 5), color = RBS)) +
            geom_boxplot(alpha = 0.15) + theme_bw() +
            facet_wrap(RBS ~ variable, scale = "free") +
            opts(title = "tRNA adaptation correlations (Strong Promoter)") +
            scale_y_log10("Log10 of Dependent Variable") +
            geom_boxplot(fill = NA, outlier.shape = NA),
        ggplot(melt(subset(ngs, Promoter == "BBaJ23108"),
                    measure.vars = c("RNA", "Count.DNA", "Prot", "Trans")),
               aes(y = value, x = cut(tAI, breaks = 5), color = RBS)) +
            geom_boxplot(alpha = 0.15) + theme_bw() +
            facet_wrap(RBS ~ variable, scale = "free") +
            opts(title = "tRNA adaptation correlations (Weak Promoter)") +
            scale_y_log10("Log10 of Dependent Variable") +
            geom_boxplot(fill = NA, outlier.shape = NA),
        cols = 1)

  • So the max/min rare codon design has a strong effect, but across all sequences, including those not explicitly designed for high rare codon usage, the tAI effect is weak (though still significant). What if we split it up into the extremes, the 1% and 99% quantiles:

    multiplot(
        ggplot(melt(subset(ngs, Promoter == "BBaJ23100"),
                    measure.vars = c("RNA", "Count.DNA", "Prot", "Trans")),
               aes(y = value,
                   x = cut(tAI,
                           breaks = c(0, quantile(ngs$tAI, c(0.01, 0.99)), 1),
                           labels = c("1%", "mid", "99%")),
                   color = RBS)) +
            geom_boxplot(alpha = 0.15) + theme_bw() +
            facet_wrap(RBS ~ variable, scale = "free") +
            opts(title = "tRNA adaptation correlations (Strong Promoter)") +
            scale_y_log10("Log10 of Dependent Variable") +
            geom_boxplot(fill = NA, outlier.shape = NA),
        ggplot(melt(subset(ngs, Promoter == "BBaJ23108"),
                    measure.vars = c("RNA", "Count.DNA", "Prot", "Trans")),
               aes(y = value,
                   x = cut(tAI,
                           breaks = c(0, quantile(ngs$tAI, c(0.01, 0.99)), 1),
                           labels = c("1%", "mid", "99%")),
                   color = RBS)) +
            geom_boxplot(alpha = 0.15) + theme_bw() +
            facet_wrap(RBS ~ variable, scale = "free") +
            opts(title = "tRNA adaptation correlations (Weak Promoter)") +
            scale_y_log10("Log10 of Dependent Variable") +
            geom_boxplot(fill = NA, outlier.shape = NA),
        cols = 1)
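    The binning inside those `cut()` calls can be checked in isolation. The sketch below applies the same breaks (0, the 1% and 99% quantiles, and 1) to made-up values. The vector `tai` is simulated and stands in for `ngs$tAI`.

```r
set.seed(2)
tai <- runif(1000, 0.15, 0.45)  # made-up tAI values, all strictly inside (0, 1)

# Breaks at 0, the 1% and 99% quantiles, and 1 give three bins:
# the bottom 1%, the middle 98%, and the top 1%.
q <- quantile(tai, c(0.01, 0.99))
bins <- cut(tai, breaks = c(0, q, 1), labels = c("1%", "mid", "99%"))

table(bins)
```

    By default `cut()` builds right-closed intervals, so a value of exactly 0 would fall outside the first bin and become `NA`. Passing `include.lowest = TRUE` covers that case if it can occur in the data.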

  • quantile_list,
  • So even the gap between the top 1% and the bottom 1% of tAI values is not as large as the difference between the min/max rare codon types. It seems as if the min/max rare constructs have some other property that tAI alone is not catching. Could it be re-use of the same tRNA for consecutive codons? Some other metric of codon usage? It is unclear, but I will have to explore further.

    Relative Codon Frequencies

    Instead of the tRNA adaptation index, what if we calculate the geometric mean of the codon frequencies for these 10 amino acids?

    I've calculated the relative frequency for each codon like this:

  • \[ f_{\,codon} = \frac{freq_{\,codon,\,genome}}{freq_{\,AA,\,genome}} \]

    The codon score for each gene is calculated like this:

    \[ F_{gene} = \exp\left( \frac{1}{peptide\,length}\sum_{codon}^{n_{codons}} \ln\left( f_{\,codon} \cdot freq_{\,codon,\,gene}\right) \right) \]
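    A small worked example of the two steps, in base R. The codon counts are made up, and `genome_counts`, `codon_to_aa`, and `codon_score` are hypothetical names. The sketch takes the plain geometric mean of f_codon over the codon positions of a gene, which is the reading that matches the "geometric mean of the codon frequencies" description. How freq_codon,gene enters the actual calculation is not shown in this step, so treat that part as an assumption.

```r
# Made-up genome-wide codon counts for two amino acids (illustrative only).
genome_counts <- c(GCT = 20, GCC = 30, GCA = 25, GCG = 25,  # Ala
                   AAA = 75, AAG = 25)                      # Lys
codon_to_aa <- c(GCT = "A", GCC = "A", GCA = "A", GCG = "A",
                 AAA = "K", AAG = "K")

# f_codon: each codon's count divided by the total count of all codons
# that encode the same amino acid.
f_codon <- genome_counts / ave(genome_counts, codon_to_aa, FUN = sum)

# Gene score: geometric mean of f_codon over the codons of the gene
# (assumes the sequence length is a multiple of 3).
codon_score <- function(cds, f_codon) {
  starts <- seq(1, nchar(cds), by = 3)
  codons <- substring(cds, starts, starts + 2)
  exp(mean(log(f_codon[codons])))
}

f_codon
codon_score("GCCAAAGCG", f_codon)
```

    In this toy table the lysine codon AAA gets f = 0.75 and AAG gets f = 0.25, so a gene that uses AAG throughout scores lower than one that uses AAA.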

    load_genomic_codon_data,

  • lib_seqs,
  • ngs,
  • ##                Effect Pct.Explained
    ## 1      Rel.Codon.Freq        4.8597
    ## 2                Gene        0.9269
    ## 3 Rel.Codon.Freq:Gene        6.5388
    ## 4           Residuals       87.6746

    It's a much stronger effect than tAI (4.8% versus 3.8%). The effect is stronger for some genes than others. Let's look at fitness also:

    rcf.lm,

  • quantile_list,
  • multiplot(
        ggplot(ddply(melt(subset(ngs, Promoter == "BBaJ23100"),
                          measure.vars = c("RNA", "Count.DNA", "Prot", "Trans")),
                     c("variable", "RBS"), transform, scaled = scale(value)),
               aes(y = scaled,
                   x = cut(Rel.Codon.Freq, breaks = c(0, quantile_list, 1),
                           labels = quantile_labels),
                   color = RBS)) +
            geom_boxplot(fill = NA, outlier.shape = NA) + theme_bw() +
            geom_jitter(alpha = 0.01) +
            facet_wrap(RBS ~ variable, scale = "free") +
            opts(title = "Codon adaptation correlations by SD (Strong Promoter)",
                 axis.text.x = theme_text(angle = -90, size = 6)) +
            scale_y_continuous("x * SD of Dependent Variable", limits = c(-4, 4)),
        ggplot(ddply(melt(subset(ngs, Promoter == "BBaJ23108"),
                          measure.vars = c("RNA", "Count.DNA", "Prot", "Trans")),
                     c("variable", "RBS"), transform, scaled = scale(value)),
               aes(y = scaled,
                   x = cut(Rel.Codon.Freq, breaks = c(0, quantile_list, 1),
                           labels = quantile_labels),
                   color = RBS)) +
            geom_boxplot(fill = NA, outlier.shape = NA) + theme_bw() +
            geom_jitter(alpha = 0.01) +
            facet_wrap(RBS ~ variable, scale = "free") +
            opts(title = "Codon adaptation correlations by SD (Weak Promoter)",
                 axis.text.x = theme_text(angle = -90, size = 6)) +
            scale_y_continuous("x * SD of Dependent Variable", limits = c(-4, 4)),
        cols = 1)
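    The `ddply(..., transform, scaled = scale(value))` step standardizes each measurement within its variable and RBS group, so the y axis reads in standard deviations. A base-R sketch of the same idea, on simulated data (the `toy` data frame and its values are made up):

```r
set.seed(3)
toy <- data.frame(
  RBS      = rep(c("strong", "weak"), each = 50),
  variable = rep(c("RNA", "Prot"), times = 50),
  value    = rexp(100)
)

# z-score each value within its variable x RBS group:
# subtract the group mean, then divide by the group standard deviation.
toy$scaled <- ave(toy$value, toy$variable, toy$RBS,
                  FUN = function(v) (v - mean(v)) / sd(v))

# Each group should now have mean 0 and standard deviation 1.
tapply(toy$scaled, list(toy$variable, toy$RBS), mean)
tapply(toy$scaled, list(toy$variable, toy$RBS), sd)
```

    Scaling within groups puts strong and weak RBS panels on a comparable axis, at the cost of hiding their absolute expression levels.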

  • A 'chunk' contains 711.7 sequences on average. Here is the same data, not log10 adjusted:

    quantile_list,

  • Let's try using protein level relative to the mean for that gene:

    multiplot(
        ggplot(melt(subset(ngs, Promoter == "BBaJ23100"),
                    measure.vars = c("RNA", "Count.DNA",
                                     "Prot.FoldChange.Codons", "Trans")),
               aes(y = value,
                   x = cut(Rel.Codon.Freq, breaks = c(0, quantile_list, 1),
                           labels = quantile_labels),
                   color = RBS)) +
            geom_boxplot(fill = NA, outlier.shape = NA) + theme_bw() +
            geom_jitter(alpha = 0.01) +
            facet_wrap(RBS ~ variable, scale = "free") +
            opts(title = "Codon adaptation correlations (Strong Promoter)",
                 axis.text.x = theme_text(angle = -90, size = 6)) +
            scale_y_continuous("Gene-Relative Fold change of Dependent Variable"),
        ggplot(melt(subset(ngs, Promoter == "BBaJ23108"),
                    measure.vars = c("RNA", "Count.DNA",
                                     "Prot.FoldChange.Codons", "Trans")),
               aes(y = value,
                   x = cut(Rel.Codon.Freq, breaks = c(0, quantile_list, 1),
                           labels = quantile_labels),
                   color = RBS)) +
            geom_boxplot(fill = NA, outlier.shape = NA) + theme_bw() +
            geom_jitter(alpha = 0.01) +
            facet_wrap(RBS ~ variable, scale = "free") +
            opts(title = "Codon adaptation correlations (Weak Promoter)",
                 axis.text.x = theme_text(angle = -90, size = 6)) +
            scale_y_continuous("Gene-Relative Fold change of Dependent Variable"),
        cols = 1)
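    How `Prot.FoldChange.Codons` was computed is not shown in this step. One plausible reading of "protein level relative to the mean for that gene" is each protein value divided by the arithmetic mean of its gene, sketched below on made-up numbers. The column name `Prot.FoldChange` and the choice of the arithmetic mean are assumptions.

```r
toy <- data.frame(
  Gene = c("g1", "g1", "g1", "g2", "g2"),
  Prot = c(100, 200, 300, 10, 30)
)

# Fold change relative to the per-gene mean: ave() returns, for every row,
# the mean of Prot over all rows that share that row's Gene.
toy$Prot.FoldChange <- toy$Prot / ave(toy$Prot, toy$Gene)

toy
```

    With this normalization a value of 1 means "average for this gene", so genes with very different baseline expression can share one axis.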

  • So it does look like rare codons matter, but it is clear that only heavy use of the rarest codons makes a difference. Maybe we can find a 'cutoff' codon value and see which codons are present much more often in those sequences?

    cdnfrq.get_sum_sq,

  • ## Error: object 'cdnfrq.lm' not found

    ggplot(subset(ngs, !is.na(dG) & !is.na(Prot)),
           aes(x = reorder(Gene, log10(Prot), mean), y = CDS.type)) +
        geom_tile(aes(fill = scaled_residuals)) +
        opts(panel.background = theme_rect(fill = "gray80"),
             axis.ticks = theme_blank()) +
        scale_fill_gradient2(low = "blue", mid = "white", high = "red") +
        facet_grid(RBS ~ Promoter) +
        opts(title = "Residuals after ANOVA on all measured variables so far",
             plot.title = theme_text(size = 14, lineheight = 0.8, face = "bold"),
             legend.position = NA,
             axis.text.x = theme_text(angle = -90))

    ## Error: object 'dG' not found

    I was hoping there would be some rhyme or reason to the remaining variation, but it doesn't appear to be the case.

