
OSU-IU CCSB 2010 Progress Report

I. Seq, seq, seq

a. Statistics: Illumina sequencing performed for CCSB projects (Dr. Pearlly Yan)

i. Methylation Immunoprecipitation Recovery Assay (MIRA-seq): 324 single-end 36 bp lanes

ii. Chromatin Immunoprecipitation (ChIP-seq): 26 single-end 36 bp lanes

iii. Chromosome Conformation Capture assay (3C): 48 paired-end 36 bp lanes

iv. Mate-Pair seq: 46 paired-end 36 bp lanes

v. RNA-seq: 7 single-end 51 bp lanes

b. Sequencing output: quality and quantity (Dr. Pearlly Yan)

i. Total Number of Bases Sequenced (raw): 804,273,870,360

ii. Total Number of Bases Sequenced (passed filter): 666,390,709,200

iii. Total Number of Bases Aligned (passed filter): 443,730,734,048

iv. Clusters (% passed filter): 80.185311 (mean); 10.41649879 (SD)

v. Aligned cluster (% passed filter): 65.51593301 (mean); 13.73843401 (SD)

vi. Alignment Score (% passed filter): 78.50961722 (mean); 33.24720802 (SD)

vii. Error Rate (% passed filter): 0.941770335 (mean); 1.235052684 (SD)
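The totals above can be cross-checked with a few lines of arithmetic. The sketch below only restates the figures from items a and b; the variable names are ours and are not part of any CCSB tool.

```python
# Lane counts and base totals copied from items a and b above.
lanes = {
    "MIRA-seq": 324,
    "ChIP-seq": 26,
    "3C": 48,
    "Mate-Pair seq": 46,
    "RNA-seq": 7,
}
bases_raw = 804_273_870_360
bases_pf = 666_390_709_200
bases_aligned = 443_730_734_048

total_lanes = sum(lanes.values())
pf_fraction = bases_pf / bases_raw           # share of raw bases passing filter
aligned_fraction = bases_aligned / bases_pf  # share of passed-filter bases aligned

print(f"total lanes: {total_lanes}")
print(f"bases passing filter: {pf_fraction:.1%}")
print(f"passed-filter bases aligned: {aligned_fraction:.1%}")
```

These base-level ratios need not equal the means in items iv and v, which are averaged per lane rather than weighted by the number of bases.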

c. From sequencer to supercomputer – an automatic data processing pipeline (Drs. Kun Huang, and Pearlly Yan)

In phase I of the CCSB project, we developed a data management system called QUEST to manage large volumes of high-throughput experimental data such as gene expression microarray, ChIP-chip, and an early batch of ChIP-seq data. During the past year, we extensively expanded QUEST, as proposed, to accommodate data from next-generation sequencing (NGS) modalities beyond ChIP-seq. We also integrated QUEST into an automatic sequencing data processing pipeline, which makes QUEST a laboratory information management system (LIMS). Our system incorporates the following features:

i. Configuration manager for setting up NGS experiments that tracks metadata related to accounts/labs, users, samples, projects, experiments, genomes, flow cells, and lanes.

ii. A staging server for bundling NGS raw data sets for transfer to and from a compute cluster (supercomputer).

iii. A compute cluster with the required software programs, genomes, and pipelines installed to process NGS raw data into a suitable result format, including mapping the short reads to the reference genomes.

iv. Integration with a Laboratory Information Management System (LIMS) for cataloguing NGS runs and the resulting output files for easy lookup and retrieval.

v. Notification and logging system that reports pipeline and data transfer progress and any data processing errors, and sends automatic emails to end-users when pipeline execution and data transfer are completed.

vi. Data Portal (including both sftp and http access) for users to search and download results.

vii. Pluggable architecture to accommodate additional NGS sequencers from different manufacturers, computing clusters, and/or data portals.

With these features, the system manages large NGS experimental data sets through a complete workflow for primary analysis (image analysis and base calling) and secondary analysis (sequence mapping, splice junction detection, file conversion, and visualization of results). It can also be extended to integrate further analysis tools such as peak detection (for ChIP-seq applications) and splice variant inference (using Cufflinks).

Figure 1. The architecture of the automatic NGS data processing pipeline. Besides QUEST, the two computer clusters are highlighted.

c. Hosting multiple types of NGS data and expanding to multiple sequencing platforms (Dr. Kun Huang)

The Configuration Manager has been deployed in production since December 2009 and has configured and cataloged all the runs from the two GAIIx sequencers. The automation pipeline has been in production since May 2010. Over 700 samples have been processed, and more than 5,000 result files and 4 TB of results are accessible. These samples include data generated from ChIP-seq, MIRA-seq, RNA-seq, targeted DNA re-sequencing, and de novo genome sequencing experiments on multiple organisms, including human, mouse, rat, fungus, and bacteria. The system is currently being expanded to accommodate a SOLiD 4 sequencer in Dr. Wolfgang Sadee’s group in the Department of Pharmacology. In addition, we are in discussion with the OSU Plant Genomics Center, led by Dr. Robert Tabita, to adapt our system for their Roche/454 GLX sequencer.

d. Develop a data management system on a high performance computing platform for integrating different sources of data and informative visualization and analysis of the data (Dr. Victor Jin): Current methods for visualizing raw data from high-throughput experiments, and the information produced by downstream analyses, face two main difficulties: the quantity of information makes experimental results hard to interpret, and the size of the data makes it hard to transmit to and share with collaborators. Because novel uses of genome-wide sequencing technology appear every month, there is a need for a technology that visualizes new results from an experiment immediately and also lets collaborators view the same data in the same way, quickly and without transmitting the many gigabytes of raw data in full.

Several existing solutions address these problems separately. The UCSC Genome Browser is unmatched in its breadth, and because it is an online service, data may be shared with collaborators. Online genome browsers have several advantages: they usually require very little configuration and they make it easy to communicate data to collaborators. They also have disadvantages: they are remote, and the sometimes large quantities of data must be uploaded, which can consume significant amounts of time. This matters because a common use case is to perform an iterative series of computational analyses on genomic data, viewing the results after each pass in order to refine the analytic parameters. Desktop visualization programs, by contrast, solve this problem by acting directly on locally available files to produce high-quality, nearly instant visualization. However, desktop visualization software cannot effectively allow collaborators to browse the same data at their own pace.

To address these seemingly conflicting needs, we will develop a robust, flexible, web-accessible genomic data management system, the Portable Omic Regulator Framework (PORF), for storage, query, and visualization of ‘omics data (Figure 1), as part of our comprehensive informatics platform. A first version of the hormone receptor target binding loci database (HRTargetDB), based on this tool, is described in our publication.

The PORF is designed to visualize discrete and continuous data on a genome. It may be configured for any species, and it can render raw experimental reads, ChIP-seq peaks, RNA-seq data, methylation levels, protein binding sites, genes, miRNAs, and secondary binding sites within a larger region (e.g., computationally predicted sites within a MIRA-seq locus). The visualization is high quality, configurable, and interactive. Users may view detailed information about visualized elements within the browser itself: moving the mouse over a genomic feature displays information about it instantly. The ability to run as a desktop program while also offering the option of sharing experimental results online makes the PORF a flexible and highly valuable visualization tool. The general features of the PORF are server-side user customization, data storage, backup, and access control. The client-side interaction includes data search, visualization, and custom track upload/visualization. With its ease of customization and deployment, its powerful search, and the simplicity of its client-side interaction, PORF is well suited to storing, searching, and visualizing the genomic data produced by a lab of any size.

The PORF is a comprehensive, highly modular application for the storage of genomic data, data search, and graphical display through a web browser. The server code is implemented in PHP using the CakePHP 1.3.2 framework. The data is rendered graphically in a dynamic, modular fashion using Scalable Vector Graphics (SVG), which is generated in the browser by JavaScript. The server supplies the data to the client in JavaScript Object Notation (JSON), and the client program uses it to dynamically render both the SVG visualization and the HTML tabular representation of the data being visualized. This approach modularizes the client-side visualization.
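The JSON-to-SVG data flow described above can be illustrated with a minimal sketch. It is not the PORF code: the PORF renders in the browser with JavaScript, whereas Python is used here only for illustration, and the payload field names (`start`, `end`, `features`, `name`) and the function `render_track` are hypothetical.

```python
import json
from xml.sax.saxutils import escape


def render_track(payload: str, width: int = 600, height: int = 20) -> str:
    """Turn a JSON description of a genomic window into a one-track SVG string."""
    data = json.loads(payload)
    span = data["end"] - data["start"]
    scale = width / span  # pixels per base
    rects = []
    for feature in data["features"]:
        x = (feature["start"] - data["start"]) * scale
        # Keep very short features visible by giving them at least one pixel.
        w = max((feature["end"] - feature["start"]) * scale, 1.0)
        rects.append(
            f'<rect x="{x:.1f}" y="0" width="{w:.1f}" height="{height}">'
            f'<title>{escape(feature["name"])}</title></rect>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(rects)
        + "</svg>"
    )


example = json.dumps(
    {
        "start": 1000,
        "end": 2000,
        "features": [{"start": 1250, "end": 1500, "name": "peak&1"}],
    }
)
svg = render_track(example)
print(svg)
```

The `<title>` child is what lets a browser show feature details on mouse-over without a further request to the server.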

Figure 1. A. An overview of our comprehensive informatics platform, which integrates the different sources of data into our data store.

B. An illustration of how interoperating, well-standardized technologies move information throughout the system, bringing modularity and external compatibility at every point in the PORF's operation.

e. A new comparative method applicable to large number of NGS datasets (Dr. Kun Huang)

With the large amount of NGS data generated in this project, comparing multiple experiments on different cell lines under different experimental conditions becomes a major challenge. We have investigated ways to compare multiple ChIP-seq experiments. Specifically, we studied epigenetic regulation of breast cancer and the effect of estrogen using 50 ChIP-seq datasets from the Illumina Genome Analyzer IIx. First, we evaluate the correlation among experiments, focusing on the total number of reads in transcribed regions of the genome. Then, we adopt the method used to identify the most stable genes in RT-PCR experiments in order to characterize the background signal across all experiments and to identify the most variably transcribed regions of the genome. Gene ontology and function enrichment analysis of the 100 most variable genes demonstrates the biological relevance of the results. This method can also effectively select differentially transcribed regions across multiple experiments using the real data points, without any normalization among the samples. This work was presented at Computational Systems Bioinformatics (CSB), Palo Alto, California, 2010.
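The stability ranking can be sketched as follows. The report does not name the RT-PCR method it adopts, so this example assumes a geNorm-style stability measure M (the mean standard deviation of pairwise log ratios across experiments); the function name, the pseudocount of 1, and the toy counts are all illustrative.

```python
import math
from statistics import stdev
from typing import Dict, List


def stability_m(counts: Dict[str, List[int]]) -> Dict[str, float]:
    """Return a geNorm-style M value per region; larger M means more variable.

    counts maps a region name to its read counts, one per experiment.
    """
    logs = {g: [math.log2(c + 1) for c in v] for g, v in counts.items()}
    m_values = {}
    for g in logs:
        pairwise_sd = []
        for h in logs:
            if h == g:
                continue
            # Log ratio of region g to region h in each experiment.
            ratios = [a - b for a, b in zip(logs[g], logs[h])]
            pairwise_sd.append(stdev(ratios))
        m_values[g] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


toy = {
    "regionA": [100, 200, 400],   # tracks regionB across experiments
    "regionB": [10, 20, 40],
    "regionC": [100, 10, 1000],   # behaves differently from the others
}
m = stability_m(toy)
most_variable = max(m, key=m.get)
print(most_variable, m)
```

Because M is built from ratios within each experiment, a global scaling difference between experiments cancels out, which is consistent with the statement that no normalization among samples is needed.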

f. A novel wavelet-based enriched region detection algorithm for ChIP-seq data (Dr. Kun Huang)

Unlike regular transcription factors, histone markers do not bind to specific narrow regions of the genome that can be identified as “peaks” in ChIP-seq data. Instead, the genomic regions enriched with histone markers can span a long range, and many peak finding algorithms are not applicable. To resolve this issue, we propose a signal processing approach for detecting enriched regions in ChIP-seq datasets. A wavelet transform of the ChIP-seq data offers a direct characterization of both short- and long-range patterns in the genome-wide mapping profile of protein binding sites on DNA. To locate enriched binding sites in ChIP-seq data, we propose a wavelet-based peak detection algorithm. Unlike prior methods, which explore the statistics of peaks across the whole genome, our method uses the scalogram of the raw data. In addition, we propose an SNR-like parameter that is used in place of the raw data to detect peaks. Peak depth and the length of peak regions can also be obtained by measuring the SNR-like parameter under a threshold constraint. Furthermore, to eliminate false positives, we apply a filter that removes peaks with sufficient SNR but insufficient sequence depth. We demonstrate the effectiveness of our method by applying it to STAT1 ChIP-seq data and comparing it with a well-known published method, PeakSeq. The experimental results show that a large fraction of the peaks identified by our method are consistent with the results of the PeakSeq algorithm, while our results show more consistent motif conservation scores. This work will be presented at the IEEE International Conference on Bioinformatics and BioMedicine (BIBM’10), Hong Kong, 2010. We are currently extending this work to the analysis of our histone marker ChIP-seq data.
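The general idea (multi-scale wavelet coefficients, an SNR-like score, a threshold, and a depth filter) can be sketched as below. This is not the published algorithm: the choice of a Ricker (Mexican hat) wavelet, the median absolute coefficient as the noise estimate, and every parameter value are assumptions made for illustration.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def ricker(scale: int) -> List[float]:
    """Discrete Ricker (Mexican hat) wavelet truncated at +/- 4 * scale."""
    half = 4 * scale
    return [
        (1 - (t / scale) ** 2) * math.exp(-(t * t) / (2 * scale * scale))
        for t in range(-half, half + 1)
    ]


def cwt_row(signal: Sequence[float], scale: int) -> List[float]:
    """One row of the scalogram: the signal correlated with the wavelet."""
    kernel = ricker(scale)
    half = len(kernel) // 2
    n = len(signal)
    row = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < n:
                acc += w * signal[k]
        row.append(acc)
    return row


def enriched_regions(
    depth: Sequence[float],
    scales: Sequence[int] = (2, 4, 8),
    snr_min: float = 3.0,
    depth_min: float = 5.0,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals passing SNR and depth filters."""
    n = len(depth)
    snr = [0.0] * n
    for s in scales:
        row = cwt_row(depth, s)
        noise = median(abs(c) for c in row) or 1.0  # assumed noise estimate
        for i, c in enumerate(row):
            snr[i] = max(snr[i], c / noise)
    regions = []
    start = None
    for i in range(n + 1):
        inside = i < n and snr[i] >= snr_min
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            # Depth filter: drop regions with high SNR but shallow coverage.
            if max(depth[start:i]) >= depth_min:
                regions.append((start, i))
            start = None
    return regions


profile = [1.0] * 100
for pos in range(45, 55):
    profile[pos] = 20.0
found = enriched_regions(profile)
print(found)
```

Using several scales lets the same scan respond to both narrow and broad enrichment; the depth filter then removes regions, such as boundary artifacts, that score well but have little coverage.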

f. Computational modeling on microRNA-mediated regulatory network (Dr. Yunlong Liu): Constructing and modeling gene regulatory networks is one of the central themes of systems biology. With the growing understanding of the mechanism of microRNA biogenesis and its biological function, establishing a microRNA-mediated gene regulatory network is not only desirable but also achievable. In this study, we designed a bioinformatics strategy to construct the microRNA-mediated regulatory network using genome-wide binding patterns of transcription factor(s) and RNA polymerase II (RPol II), derived using chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq). Our strategy includes three key steps: (1) identification of transcription start sites (TSS) and promoter regions of primary microRNA transcripts using RPol II binding patterns; (2) selection of cooperating transcription factors that function together with the transcription factors targeted by the ChIP-seq assay; and (3) construction of a network that contains regulatory cascades of both transcription factors and microRNAs. We have tested our model in two biological systems, interrogating the microRNA-mediated regulatory network in interferon-stimulated HeLa cells (Wang et al. PLoS One, 2010a) and in steroid hormone-treated breast cancer cells (Wang et al. PLoS One, 2010b). This work has been featured in GenomeWeb news.

We used this model to investigate the transcription of microRNAs in response to hormone treatment in two breast cancer cell lines: estrogen-dependent breast cancer cells (MCF7) and the anti-estrogen (tamoxifen) resistant subline (MCF7-T). Our model identified TSS for 72 microRNAs in at least one of four conditions (treatment of MCF7 or MCF7-T with either vehicle or 17β-estradiol). We validated our findings by examining the conservation, CpG content, and activating histone marks (H3K4Me2) in the identified promoter regions. We applied our model to assess changes in microRNA transcription in steroid hormone-treated breast cancer cells. The results demonstrate that many microRNA genes have lost hormone-dependent regulation in tamoxifen-resistant breast cancer cells. MicroRNA promoter identification based on RPol II binding patterns provides important temporal and spatial measurements of transcription initiation, and therefore allows comparison of transcription activity between conditions, such as normal and disease states. Our results suggest that microRNA predisposition can contribute to the development of antiestrogen resistance in hormone-dependent breast cancer cells.

g. Prioritization of disease microRNAs through a human phenome-microRNAome network (Dr. Yunlong Liu): In collaboration with Harbin Institute of Technology in China (part of our outreach effort), we have published a computational method that infers potential microRNA-disease associations by prioritizing the entire human microRNAome for diseases of interest (Jiang et al. BMC Systems Biology, 2010). Many of the trainee coauthors of this work (Hao Y., Wang G., Juan L., Teng M.) are either financially supported by the ICBP program or conducting research within the ICBP environment. Our goal in this study is to infer potential microRNA-disease associations and derive testable hypotheses for experimental efforts to identify the roles of microRNAs in human diseases. The method is a logical extension of previous network-based methods for predicting or prioritizing disease-related protein-coding genes. We first constructed a functionally related microRNA network based on the overlap of microRNA targets, and then a human phenome-microRNAome network by integrating the microRNA network with a phenome network using experimentally verified microRNA-disease associations. We subsequently examined whether functionally related microRNAs tend to be associated with phenotypically similar diseases, and prioritized microRNAs for human diseases. We tested the model on 270 experimentally verified microRNA-disease associations and achieved an area under the ROC curve of 75.80%. Moreover, we demonstrated that the model is applicable to diseases with no known associated microRNAs. The microRNAome-wide prioritization of microRNAs for 1,599 disease phenotypes is publicly released to facilitate future identification of disease-related microRNAs.
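Linking two microRNAs by the overlap of their targets can be sketched with a hypergeometric tail probability. The text above does not state which overlap statistic was used, so the choice of test, the function name `overlap_pvalue`, and the toy target sets are assumptions for illustration only.

```python
from math import comb
from typing import Set


def overlap_pvalue(universe: int, targets_a: Set[str], targets_b: Set[str]) -> float:
    """P(overlap >= observed) when targets_b is drawn at random from the universe.

    universe is the total number of candidate target genes.
    """
    k = len(targets_a & targets_b)
    size_a, size_b = len(targets_a), len(targets_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, i) * comb(universe - size_a, size_b - i)
        for i in range(k, min(size_a, size_b) + 1)
    )
    return tail / total


mir_x = {"g1", "g2", "g3", "g4", "g5"}
mir_y = {"g1", "g2", "g3", "g4", "g5"}
mir_z = {"g6", "g7"}

p_xy = overlap_pvalue(10, mir_x, mir_y)  # identical target sets
p_xz = overlap_pvalue(10, mir_x, mir_z)  # no shared targets
print(p_xy, p_xz)
```

A small p-value would justify an edge between the two microRNAs in the functional network; the threshold for drawing an edge is a further choice that this sketch leaves open.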

h. Developing RNA-seq analysis pipeline (Dr. Yunlong Liu): In anticipation of RNA-seq data from the CCSB biology labs in the Center (Huang, Nephew, Wang), we developed a customized pipeline for RNA-seq analysis. The pipeline includes three major components (QC filtering, sequence alignment, and gene expression and isoform identification for alternative splicing studies), followed by an analysis of genetic variation in transcripts. (1) QC filtering. We will first use ELAND (Illumina) for read quality recalibration, and then apply a series of filters based on sequencing quality. Second, sequences containing more than two ‘N’ or ‘.’ wildcard characters will be discarded; these characters mark positions where a base call could not be made reliably. Third, each sequence will be scanned for low-quality regions. If a 5-base sliding window has an average quality score below 20, the read will be truncated at that position. Because base-calling quality on an Illumina system declines with position in the read, confidence in correctly matching the remaining sequence deteriorates once a low-quality region is found. Any read shorter than 35 bases will be discarded. We have applied this strategy to a variety of sequencing applications, and our experience suggests that it effectively eliminates low-quality reads while retaining high-quality regions. (2) Alignment. We will use BFAST (Homer N, Merriman B, and Nelson SF, PLoS ONE, 2009) as the primary alignment algorithm, and adopt a TopHat-like strategy to align the sequencing reads that cross splice junctions. We chose BFAST as our primary alignment algorithm because it has high sensitivity for identifying small insertions and deletions. After aligning the sequence reads to a filtering index that includes repeats, rRNAs, and other sequences that are not of interest, we will align the remaining reads at three levels: genomic alignment, known junction alignment, and novel junction alignment. The known junctions will be constructed from the UCSC known gene annotation, and novel junctions will be constructed from the enriched regions identified in the genomic alignment. We have developed a mature pipeline using the Indiana University supercomputer system. (3) Isoform identification. Based on the alignment data, we will use Cufflinks to identify the distribution of transcript isoforms in each biological sample. The expression level of each isoform will be represented by the FPKM value reported by Cufflinks. (4) Genetic variation in transcripts. Based on the sequence alignment results described above, we will use the pileup tools in the samtools software package to identify genetic variations in the transcripts and evaluate the confidence of each call (with sequencing error as the null hypothesis) using a binomial model. Allele-specific expression can be identified for highly expressed transcripts with heterozygous variants.
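Two steps of this pipeline lend themselves to a short sketch: the QC filter in step (1) and the binomial model in step (4). The code below is an illustration, not the pipeline's implementation. It assumes that "truncated at that position" means cutting at the first base of the first low-quality window, that quality scores are already decoded to integers, and that the per-base error rate is 0.01 (an illustrative value close to the mean error rate reported in section I.b). The function names are hypothetical.

```python
from math import comb
from typing import List, Optional, Tuple


def qc_filter(
    seq: str,
    quals: List[int],
    window: int = 5,
    min_avg_q: float = 20.0,
    min_len: int = 35,
    max_wildcards: int = 2,
) -> Optional[Tuple[str, List[int]]]:
    """Return the (possibly truncated) read, or None if it is discarded."""
    # Discard reads with more than two 'N' or '.' characters.
    if seq.count("N") + seq.count(".") > max_wildcards:
        return None
    # Truncate at the start of the first window whose mean quality is too low.
    cut = len(seq)
    for start in range(len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg_q:
            cut = start
            break
    # Discard reads that end up shorter than the minimum length.
    if cut < min_len:
        return None
    return seq[:cut], quals[:cut]


def variant_pvalue(depth: int, alt_reads: int, error_rate: float = 0.01) -> float:
    """P(at least alt_reads mismatches among depth reads) under sequencing error alone."""
    return sum(
        comb(depth, i) * error_rate ** i * (1 - error_rate) ** (depth - i)
        for i in range(alt_reads, depth + 1)
    )


read = "A" * 51
kept = qc_filter(read, [30] * 40 + [5] * 11)     # quality drops near the 3' end
dropped = qc_filter(read, [30] * 30 + [5] * 21)  # quality drops too early
p_het = variant_pvalue(depth=20, alt_reads=10)
print(len(kept[0]), dropped, p_het)
```

A variant supported by half of the reads at a well-covered position yields a very small p-value under the error-only null, which is why allele-specific expression can be called with confidence only for highly expressed transcripts.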

i. The DIME Package (Dr. Shili Lin): We have developed a software package, DIME, for analyzing ChIP-seq data. The approach implemented in DIME is an ensemble of mixture models. More specifically, three classes of mixture models, GNG (Gamma-Normal-Gamma), NUDGE (Normal-Uniform) and iNUDGE (improved Normal-Uniform) were considered. The best model, the output of DIME, is selected through a two-stage procedure. Although DIME was motivated by the analysis of ChIP-seq data, the package can also be applied to analyzing other high throughput data. A book chapter on the underlying methodology of DIME and a step-by-step guide to analyzing ChIP-seq data, including pre-processing and normalization, has been accepted for publication. The software package and the accompanying manuscript describing the software and its usage have been submitted to the journal Bioinformatics for potential publication. DIME can be freely downloaded from our website.

II. Specific Aims

a. Aim 1: Using an empirical Bayesian mixture and hidden Markov method for classifying spatiotemporal patterns of genomic targets in a signaling network.

i. Estrogen-induced enhancer amplification (Drs. Tim Huang, Victor Jin, Ken Nephew, Lang Li, Yunlong Liu, Qianben Wang, Kun Huang, Raghu Machiraju, Changyu Shen, Pearlly Yan): Genomic amplification is frequently observed but poorly understood in cancers. We demonstrate that distant-acting regulatory elements, or enhancers, may contribute to this oncogenic event. Enhancers are transcription factor binding sites known to remotely regulate transcription through chromatin looping. Genome-wide screening of chromatin loops comprehensively identifies inter- and intra-chromosomal interactions between estrogen receptor α binding sites (ERαBSs) and target loci. Focal ERαBSs, or enhancer clusters, formed synchronous, multiple loops with distal target loci upon estrogen stimulation.

Fig. 1. Integrative mapping of ERα-mediated chromatin loops. (A) Experimental scheme. Cross-linked DNAs of E2 (17β-estradiol, 70 nM, 24 hr)-treated and-untreated MCF-7 cells were digested with BamHI and ligated in a diluted condition. Ligated DNA was subjected to paired-end sequencing. Mate-pair sequencing was also conducted to map structural fusions (i.e., genomic rearrangements) in MCF-7 cells. To identify estrogen-responsive loops, these structural fusions were filtered-out from the paired-end dataset. The filtered data were integrated with an ERα ChIP-seq dataset to identify ERα-mediated loops. (B) Representative circular maps of 20q13-related chromatin loops before (blue) and after (red) estrogen stimulation using the Circos software (http://mkweb.bcgsc.ca/circos/). Chromosomes are individually colored for circular visualization.

Fig. 2. Amplification and genomic rearrangements of the 20q13 enhancer cluster. (A) Genomic distribution of ERα binding sites (ERαBSs) on 20q13. ChIP-seq profiles of ERαBSs are shown in MCF-7 cells treated with E2 (17β-estradiol, 70 nM) in different time periods as indicated. Total numbers (n) of ERαBSs are indicated for two main and two satellite clusters, respectively. While ERα has been shown to bind to some target sequences in the absence of estrogen stimulation (27), the activity of this binding was drastically increased at 0.5 hr after E2 treatment and lasted over a 24-hr period. Translocation breakpoints on the 20q13 region were mapped using mate-pair sequencing data. (B) Positive correlation of ERα binding density and structural fusion frequency. An “unstable region” was defined as a genomic area displaying ≥4 rearrangement sites in 1-Mb of a sliding window based on mate-pair sequencing data. Twenty unstable regions were located in four ERαBS clusters (1p13, 3p14, 17q23, and 20q13) in MCF-7 cells. Compared to ERα binding profiles of 100 randomly chosen regions, greater (p=0.00045) number of ERαBSs were observed in the four unstable regions, suggesting that structural fusions frequently occur in these enhancer clusters. (C) Amplification of 20q13. Left panel: Interphase fluorescence in situ hybridization (FISH) was performed on both normal breast epithelial cells (upper) and MCF-7 cancer cells (lower). Right panel: Three-dimensional (3D) FISH was performed on intact nuclei. While a total of ~60 copies were counted in a “flatten” MCF-7 nucleus by conventional FISH, only 8-9 fuzzy clusters were identified in an intact nucleus by 3D-FISH (see further explanation in the text).
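The "unstable region" definition in panel (B) of Fig. 2 (at least 4 rearrangement sites within a 1-Mb sliding window) can be sketched as follows. The sketch assumes the sites are breakpoint coordinates on a single chromosome and that overlapping qualifying windows are merged; both the merging rule and the function name are our assumptions.

```python
from typing import Iterable, List, Tuple


def unstable_regions(
    sites: Iterable[int], window: int = 1_000_000, min_sites: int = 4
) -> List[Tuple[int, int]]:
    """Return (first_site, last_site) spans with >= min_sites sites inside a window."""
    pts = sorted(sites)
    regions: List[Tuple[int, int]] = []
    left = 0
    for right in range(len(pts)):
        # Shrink the window until it spans less than `window` bases.
        while pts[right] - pts[left] >= window:
            left += 1
        if right - left + 1 >= min_sites:
            start, end = pts[left], pts[right]
            if regions and start <= regions[-1][1]:
                # Overlaps the previous unstable span: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


clustered = unstable_regions([100, 200_000, 500_000, 900_000, 5_000_000])
sparse = unstable_regions([0, 1_000_000, 2_000_000, 3_000_000])
print(clustered, sparse)
```

The two-pointer scan visits each breakpoint once after sorting, so it scales to genome-wide mate-pair breakpoint lists.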


Furthermore, frequent estrogen stimulation of normal epithelial cells intensified these long-distance interactions, leading to aberrant increases in enhancer copy number. The amplification event included widespread enhancer cluster insertions into multiple genomic locations and was frequently observed in cancer cells. Unlike amplification of single oncogenes, our results indicate that enhancer copy-number gain increases transcriptional interaction frequency and profoundly alters the expression of multiple target genes during tumor progression.

ii. Inference of hierarchical regulatory network of estrogen-dependent breast cancer through ChIP-based data (Drs. Victor Jin and Tim Huang): Global profiling of in vivo protein-DNA interactions using ChIP-based technologies has evolved rapidly in recent years. Although many genome-wide studies have identified thousands of ERα binding sites and have revealed the associated transcription factor (TF) partners, such as AP1, FOXA1 and CEBP, little is known about ERα associated hierarchical transcriptional regulatory networks. In this study, we applied computational approaches to analyze three public available ChIP-based datasets: ChIP-seq, ChIP-PET and ChIP-chip, and to investigate the hierarchical regulatory network for ERα and ERα partner TFs regulation in estrogen-dependent breast cancer MCF7 cells. 16 common TFs

A summary of the computational analytical approach. (A) Comparison of common genes between MCF7 and MCF7-T cells in the ChIP-seq dataset. Genes with both ER and Pol-II peak binding in the gene region (between 100 kb upstream of the TSS and 100 kb downstream of the 3’UTR): 2661 (273 common genes overlapped with gene expression data) for MCF7 cells and 530 (438 common genes overlapped with gene expression data) for MCF7-T cells, respectively. 58 (~21.2%) common genes were found between MCF7 and MCF7-T cells. (B) Transcription factor motifs identified by our de novo ChIPMotifs approach for MCF7-T cells. (C) The regulatory network for E2-treated MCF7-T cells.

and two common TF partners (RORA and PITX2) were found among the ChIP-seq, ChIP-chip and ChIP-PET datasets. The regulatory networks were constructed by scanning the ChIP-peak regions with TF-specific position weight matrices (PWMs). A permutation test was performed to test the reliability of each connection in the network. We then used DREM software to perform gene ontology function analysis on the common genes. We found that FOS, PITX2, RORA and FOXA1 were involved in the up-regulated genes. We also conducted ERα and Pol-II ChIP-seq experiments in tamoxifen-resistant MCF7 cells (denoted as MCF7-T in this study) and compared MCF7 and MCF7-T cells. The results showed very little overlap between these two cell lines in terms of targeted genes (21.2% common genes) and targeted TFs (25% common TFs). This marked dissimilarity may indicate substantially different transcriptional regulatory mechanisms in the two cell lines. Our study uncovers new estrogen-mediated regulatory networks by mining three ChIP-based datasets in MCF7 cells and ChIP-seq data in MCF7-T cells. We compared the different ChIP-based technologies as well as different breast cancer cells. Our computational analytical approach may guide biologists to further study the underlying mechanisms in breast cancer cells or other human diseases.
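The two computational steps named above, scanning a peak region with a PWM and assessing a connection by permutation, can be sketched as follows. This is a minimal illustration under stated assumptions: the log-odds scoring with a uniform background, the forward-strand-only scan, and the shuffle-the-sequence null model are all choices made here for clarity, and the function names are hypothetical. The report does not specify which scoring scheme or permutation scheme was used.

```python
import math
import random

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_hit(sequence, pwm):
    """Return (score, offset) of the best-scoring window on the forward strand."""
    width = len(pwm)
    best = (float("-inf"), -1)
    for i in range(len(sequence) - width + 1):
        score = sum(pwm[j][sequence[i + j]] for j in range(width))
        best = max(best, (score, i))
    return best


def permutation_p_value(sequence, pwm, n_perm=1000, seed=0):
    """Empirical p-value: share of shuffled sequences scoring >= the observed one.

    Shuffling preserves base composition; this null model is an assumption.
    """
    rng = random.Random(seed)
    observed, _offset = best_hit(sequence, pwm)
    letters = list(sequence)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(letters)
        shuffled_score, _offset = best_hit("".join(letters), pwm)
        if shuffled_score >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_perm + 1)
```

A TF-to-target edge would then be kept only when the best hit in the target's peak region has a small empirical p-value; the cutoff is left to the analyst.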

iii. To define spatial interactions between TFBSs and target promoters in hormone-chemo-sensitive vs. –insensitive prostate cancer cells (Dr. Qianben Wang): The UBE2C oncogene is overexpressed in many types of solid tumors including

Figure 1. Identification of three putative UBE2C enhancers in PC-3 cells. 3C assays were performed in LNCaP cells and PC-3 cells in the presence and absence of DHT. The black shading shows the position of the fixed fragment (the UBE2C promoter). The grey shading indicates three fragments that have significantly higher crosslinking frequencies with the UBE2C promoter in PC-3 cells than in LNCaP cells.

the lethal castration-resistant prostate cancer (CRPC) that is highly heterogeneous in androgen receptor (AR) expression. While our recent studies have identified two CRPC-specific AR-bound enhancers that drive UBE2C overexpression in AR-positive CRPC, little is known about the regulation of UBE2C in AR-negative CRPC. We found that UBE2C mRNA and protein levels were significantly greater in PC-3 (an AR-negative CRPC cell model) vs. LNCaP (an earlier stage androgen-dependent prostate cancer (ADPC) cell model) (data not shown). Significantly, downregulation of UBE2C expression level resulted in decreased proliferation of PC-3 but not LNCaP cells (data not shown). Given that cell-specific enhancers drive cell-specific gene expression, we hypothesized that PC-3-specific UBE2C enhancers direct the upregulation of the

UBE2C gene in PC-3 cells. We thus performed quantitative 3C assays for the UBE2C locus in LNCaP and PC-3 cells, in order to identify potential PC-3-specific UBE2C enhancers. Analysis of the 3C results revealed greater interactions between the -20 kb fragment (E1, 1.9-fold), -14 kb fragment (E2, 1.6-fold) and +2 kb fragment (E3, 1.8-fold) and the UBE2C promoter in PC-3 cells than in LNCaP cells in the presence or absence of DHT (Figure 1). As a negative control, no difference in crosslinking frequencies between the +46 kb fragment and the UBE2C promoter was found between the two cell lines (Figure 1). We next investigated whether specific transcription factors and coactivators are recruited to E1, E2 and E3 regions. As shown in Figure 2, significantly higher levels of FoxA1 and MED1 (1.5-fold) were observed on E1, E2, E3 and the promoter region but not on the control region in the

absence and/or presence of hormone (Figure 2). To investigate whether the increased binding of FoxA1 and MED1 at the PC-3 specific UBE2C enhancers plays a causal role in long-range interactions between the UBE2C enhancers and the UBE2C promoter, we examined the effect of silencing of FoxA1 and MED1 on loop formation. As shown in Figures 3A and 3B, FoxA1 and MED1 silencing caused a modest and a significant decrease of crosslinking frequencies between E1, E2, E3 and the UBE2C promoter, respectively. Consistent with the notion that formation of chromatin loops is a prerequisite for transcription, silencing of FoxA1 or MED1 (but more notably MED1) significantly decreased UBE2C mRNA expression (Figure 3C).

Figure 3. Silencing of FoxA1 and MED1 impairs long-range interactions in the UBE2C locus and decreases UBE2C gene expression. (A) Left panel: Knocking down of FoxA1 and MED1 decreases crosslinking frequencies between the UBE2C enhancers and the UBE2C promoter. 3C assays were performed in siControl, siFoxA1 or siMED1 transfected PC-3 cells. Right panel: The 3C results were presented as fold changes in relative crosslinking frequencies. (B) Suppression of FoxA1 and MED1 levels by siRNAs. PC-3 cells were transiently transfected with siControl, siFoxA1 or siMED1 and protein levels were determined by Western blot analysis. (C) Silencing of FoxA1 and MED1 decreases UBE2C gene expression in PC-3 cells. PC-3 cells were transfected with siRNA targeting FoxA1 and MED1. Seventy-two h after siRNA transfection, real-time RT-PCR was performed (mean ± SD, n=3).

These findings indicated that MED1 was a crucial mediator of chromatin looping and gene expression in the UBE2C locus in AR-negative CRPC. We next investigated the underlying mechanisms for MED1-mediated looping in the UBE2C locus in AR-negative CRPC. Given that previous studies suggested that protein-protein interactions between enhancer-bound proteins and promoter-bound proteins mediate chromatin looping and protein phosphorylation significantly affects protein-protein interaction, we hypothesized that MED1 phosphorylation codes for chromatin looping by enhancing protein-protein interactions on chromatin. We used a retrovirus-mediated gene transfer technique to establish stable PC-3 cell lines expressing a FLAG/HA epitope-tagged wild-type MED1 (termed PC-3/WT MED1) and a FLAG/HA epitope-tagged double phosphomutant (T1032A/T1457A) MED1 (termed PC-3/DM MED1). Importantly, phospho-T antibodies only recognized WT MED1 but not DM MED1 immunoprecipitated by FLAG antibodies (data not shown). To examine the

Figure 2. Increased recruitment of FoxA1 and MED1 to the UBE2C enhancers in AR-negative CRPC compared with AR-positive ADPC. (A) Greater FoxA1 binding at the UBE2C enhancers in PC-3 cells than in LNCaP cells. ChIP assays were performed using antibodies against FoxA1 in LNCaP treated with (blue color) or without (red color), and PC-3 treated with (brown color) or without (green color) 10 nM DHT (mean ± SD, n=3). (B) Increased recruitment of MED1 to the UBE2C enhancers and promoter. ChIP assays were conducted as above using MED1 antibodies.

effect of MED1 phosphorylation on protein-protein interactions at the UBE2C locus, a siRNA targeting the MED1 3’ untranslated region (UTR) was used to deplete endogenous MED1 (data not shown), as the ectopic epitope-tagged WT MED1 and DM MED1 constructs lack the 3’ UTR. Re-ChIP experiments revealed that interactions between FoxA1 and Pol II/TBP were significantly attenuated (2-fold) on the UBE2C enhancers and the UBE2C promoter (Figure 4A). We next tested whether the decreased protein-protein interactions at the UBE2C locus attenuate chromatin looping. As shown in Figure 4B, the crosslinking frequencies between all three UBE2C enhancers and the UBE2C promoter were significantly lower in PC-3/DM MED1 cells versus PC-3/WT MED1 cells, suggesting that MED1 phosphorylation was necessary for UBE2C locus looping.

Figure 4. Phosphorylation of MED1 is required for UBE2C locus looping. (A) Decreased interactions between FoxA1 and Pol II/TBP in the UBE2C locus in PC-3/DM MED1 cells compared with PC-3/WT MED1 cells. PC-3/DM MED1 cells and PC-3/WT MED1 cells were transfected with siControl and siMED1 3’UTR. Re-ChIP assays were conducted with antibodies against FoxA1 (for first ChIP) and Pol II/TBP (for re-ChIP) (mean ± SD, n=3). (B) Disrupted UBE2C locus looping in PC-3/DM MED1 cells compared with PC-3/WT MED1 cells. 3C assays were performed using siControl and siMED1 3’UTR transfected PC-3/DM MED1 cells and PC-3/WT MED1 cells.

iv. Modeling and Analysis of Histone Modification Patterns (Dr. Shili Lin): Our goal is to model the histone methylation patterns in the promoter regions of known Refseq genes and in the known enhancer regions of AR target genes, so that the models can be used to predict novel promoters and novel enhancers. Specifically, we have considered the profiles of H3K4me1, H3K4me2, and H3K4me3 around Refseq TSSs for LNCaP and abl, which are androgen-dependent and androgen-independent prostate cancer cell lines, respectively. Our analysis uncovered three unique binding patterns in both cell lines (Figure 1(a)). However, we found almost 700 genes whose profiles changed from LNCaP to abl (Figure 1(b)). We have also identified novel promoters, but this work is continuing.

Figure 1(a): LNCaP and abl Profiles: three unique patterns Figure 1(b): Patterns of changing profiles (top 2 rows)

v. Modeling and Analysis of Data for Regulation on the 3rd Dimension (Dr. Shili Lin): The expression of a gene is usually controlled by the regulatory elements in its promoter region. However, it has long been hypothesized that, in complex genomes, such as the human genome, a gene may be controlled by distant enhancers and repressors. Using Hi-C and ChIA-PET, data substantiating such hypotheses have been produced. However, due to the experimental process, it is known that false “loops” may exist due to “random collision”. We have been working on a hierarchical Bayesian model for analyzing such

Figure 1: The two numbers on the two ends of a segment indicate the originating chromosome(s) of the pair, where solid lines are for intra-chromosomal pairs and long dashed lines are for inter-chromosomal pairs.

large-scale genome-wide looping data, with particular attention focused on how to deal with the problem of random collision to reduce false positives. Our result clearly shows the existence of cluster centers on chromosomes 17 and 20 of inter-chromosomal interactions for Hi-C data from a breast cancer cell line (Figure 1). We have further extended our model to detect changes in the looping mechanism across time points or between cells with different conditions, such as hormone dependent and hormone independent. We are continuing this work to integrate with data from ChIP-chip/ChIP-seq and genomic annotation to increase the power of detecting true spatial interactions. Work on a more sophisticated model for analyzing time-course data is also continuing.

vi. CTCF Enhances AR Regulation in Conjunction with H3K4me2 and FoxA1 (Dr. Shili Lin): CTCF is highly conserved across different cells. It has been hypothesized that CTCF plays an important role in transcription regulation, including the notion that it may act as an inhibitor of the long-range looping mechanism. We consider common CTCF binding sites across three cell lines: Jurkat, HeLa and CD4. Since it is believed that CTCF is highly conserved, we use this common CTCF set as binding sites for LNCaP cells. Our work studies the role of CTCF in AR regulation in conjunction with H3K4me2, AR and FoxA1. This integrative study shows that genes in CTCF blocks with AR have higher expression than those in blocks without AR, leading to the hypothesis that CTCF can further enhance AR regulation within the block. Our optimal logit model, in which the distances of AR and FoxA1 binding sites to the TSS and the simultaneous binding of AR and H3K4me2 are included as covariates, shows high sensitivity and specificity in predicting gene expression changes (Figure 2(a)). Most interestingly, the set of genes predicted to be responsive is significantly more enriched in cancer-related pathways than the set predicted to be non-responsive, and the responsive genes are closer to AR and FoxA1 (Figure 2(b)). This result is novel in that it shows the role of CTCF in predicting cancer-related genes in conjunction with the usual factors.
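A logit model of this kind, together with the sensitivity and specificity used to judge it, can be sketched in a few lines. Everything specific below is an assumption for illustration: the toy covariate values, the log-scaled distances, the plain gradient-ascent fit, and the 0.5 cutoff. The report does not give the fitted coefficients, the model-selection procedure behind “optimal”, or the data.

```python
import math


def sigmoid(z):
    """Logistic function, clipped to avoid overflow in math.exp."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(rows, labels, lr=0.3, epochs=5000):
    """Fit weights (intercept first) by batch gradient ascent on the log-likelihood."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            err = y - p
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict(weights, x, cutoff=0.5):
    """Return 1 (responsive) or 0 (non-responsive) for one gene."""
    p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
    return int(p >= cutoff)


def sensitivity_specificity(y_true, y_pred):
    """Assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy genes: (log10 AR-to-TSS distance in kb,
#                          log10 FoxA1-to-TSS distance in kb,
#                          1 if AR and H3K4me2 bind together else 0)
TOY_ROWS = [
    (0.5, 0.7, 1), (0.8, 0.4, 1), (1.0, 1.1, 1), (0.6, 0.9, 1),
    (2.0, 1.9, 0), (1.8, 2.2, 0), (2.3, 2.0, 0), (1.9, 1.7, 0),
]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = expression responds
```

In practice the two rates would be computed on held-out genes rather than on the training set, since training-set figures overstate predictive performance.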

Figure 1(a): logit model predictions Figure 1(b): Genes predicted as responsive are closer to AR and FoxA1.

b. Aim 2: Interactive modeling of transcription hubs after ligand stimulation.

i. A Modulated Empirical Bayes Model for Identifying Topological and Temporal Estrogen Receptor α Regulatory Networks in Breast Cancer (Drs. Changyu Shen and Lang Li): Estrogens regulate diverse physiological processes in various tissues through genomic and non-genomic mechanisms that result in activation or repression of gene expression (Fig 1). Transcription regulation upon estrogen stimulation is a critical biological process underlying the onset and progression of the majority of breast cancers. Dynamic gene expression changes have been shown to characterize the breast cancer cell response to estrogens, the molecular mechanism of which is still not well understood. We developed a modulated empirical Bayes model and constructed a novel topological and temporal transcription factor (TF) regulatory network in the MCF7 breast cancer cell line upon stimulation by 17β-estradiol. In the network, significant TF genomic hubs were identified, including ER-alpha and AP-1; significant non-genomic hubs include ZFP161, TFDP1, NRF1, TFAP2A, EGR1, E2F1, and PITX2 (Fig 1). Although the early and late networks were distinct (<5% overlap of ERα target genes between the 4 and 24 h time points), all nine hubs were significantly represented in both networks. In MCF7 cells with acquired resistance to tamoxifen, the ERα regulatory network was unresponsive to 17β-estradiol stimulation. The significant loss of hormone responsiveness was associated with marked epigenomic changes, including hyper- or hypo-methylation of promoter CpG islands and repressive histone methylation.

Fig 1. (A) ERα regulatory network after 4 hours of E2 stimulation in MCF7 cells; and (B) ERα regulatory network after 24 hours of E2 stimulation in MCF7 cells.

ii. RNA-seq analysis of platinum-resistant vs. platinum-sensitive ovarian cancer cells (Drs. Ken Nephew and Yunlong Liu): Toward our overall objective of identifying signaling networks (Specific Aim 2, Section N4.4) that contribute to chemotherapy resistance, we have achieved significant progress toward identifying resistance-associated misexpression of specific microRNAs (miRNAs), as well as misexpression of a more recently discovered family of noncoding RNAs, large intergenic non-coding RNAs (lincRNAs). Similar to miRNAs, lincRNAs have now been demonstrated to play a role in oncogenesis, both as tumor suppressors and as tumor promoters. To accurately and extensively assess altered expression of coding and noncoding RNAs in chemotherapy-resistant ovarian cancer cells, we have now optimized and successfully performed a high-throughput, precise approach, “strand-specific RNA-seq.” RNA-seq has been used extensively in the past two years to study transcriptome structure and function, and during this time more than 100 papers have been published using this method on the Illumina (San Diego, CA) Genome Analyzer sequencing platform. In progress toward our projected rigorous study of ovarian cancer transcriptome changes linked to acquired cisplatin resistance, we have now successfully performed RNA-seq on our parental, chemosensitive A2780 cells. Total cellular RNA was fragmented using a protocol accompanying an RNA fragmentation reagent kit (Ambion, Austin, TX), reverse transcribed to cDNA and end-repaired, followed by ligation to adapters (SRA v1.5, Illumina) complementary to Genome Analyzer flow cell surface oligonucleotides.
Subsequent to adapter ligation, the cDNA library was subjected to elimination of high-copy cDNA, largely resulting from ribosomal RNA (an essential enrichment step, as rRNA accounts for ~85% of total RNA in mammalian cells). This was accomplished by moderate-temperature (68 °C) annealing (for preferential hybridization of high-abundance nucleic acids), followed by digestion with duplex-specific nuclease (DSN; Evrogen, Inc.). However, using the Illumina protocol, which specifies a 20-min digestion at 68 °C in 500 mM NaCl, we observed substantial (unacceptable) digestion of the cDNA library. Based on the well-known effect of salt in reducing the stringency of nucleic acid annealing, we hypothesized that 500 mM NaCl was too permissive (thus allowing undesirable annealing of the cDNA library-ligated adapters). Consequently, based on DNA sequence (OligoAnalyzer 3.1, Integrated DNA Technologies, Coralville, IA), we determined that at 68 °C, adapter annealing would be inhibited at 20 mM NaCl, thus restricting annealing to high-copy cDNA species (Figure 1). Following our DSN optimization, RNA-seq was performed on the A2780 parental (cisplatin-naive) cells, using the Illumina platform (currently, our RNA-seq analysis of our five-fold drug-treated, chemoresistant cells is ongoing).
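The direction of the salt effect can be illustrated with a rough calculation. The sketch below uses one commonly quoted empirical salt-adjusted melting temperature approximation, Tm = 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 675/N; this is not the computation performed by OligoAnalyzer, and the adapter sequence shown is a made-up placeholder, not the Illumina adapter. It is meant only to show that lowering the sodium concentration lowers the predicted duplex stability.

```python
import math

# Hypothetical 33-nt sequence used purely for illustration.
HYPOTHETICAL_ADAPTER = "ACGTTGCAAGCTTGCATGCCTGCAGGTCGACTC"


def salt_adjusted_tm(seq, na_molar):
    """Approximate duplex Tm (deg C) from an empirical salt-adjusted formula.

    seq: DNA sequence (A/C/G/T); na_molar: monovalent cation concentration in M.
    """
    seq = seq.upper()
    n = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n


tm_500mM = salt_adjusted_tm(HYPOTHETICAL_ADAPTER, 0.5)
tm_20mM = salt_adjusted_tm(HYPOTHETICAL_ADAPTER, 0.02)
# In this formula the shift depends only on the salt term:
# 16.6 * log10(0.5 / 0.02), independent of the sequence.
tm_shift = tm_500mM - tm_20mM
```

Because the salt term is additive in this approximation, the 25-fold reduction in NaCl lowers the predicted Tm by the same amount for any sequence; whether a given adapter falls below the 68 °C annealing temperature depends on its length and GC content.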

iii. Integrated microRNA and mRNA analysis, of ovarian cancer cells treated with histone deacetylase

Figure 1. Optimization of duplex-specific nuclease (DSN)-mediated reduction of 18S and 28S ribosomal RNA (light blue bar represents ionic strength recommended by Illumina).

Figure 2. Gene expression clustering and pathway analysis following siRNA downregulation of miRs-221 and -222 in fulvestrant-resistant MCF7 cells.

(HDAC) inhibitors (Drs. Ken Nephew and Sun Kim): As part of our pilot project (Section N4.7), we wished to assess the mechanism of action of a specific HDAC inhibitor, OSU-HDAC42, which we had previously demonstrated to possess anticancer activity against ovarian cancer cells and xenograft tumors. In this study, we assessed OSU-HDAC42 for possible changes in cell signaling pathways, using our previously published analysis web tool, “MicroRNA and mRNA Integrated Analysis” (MMIA, cancer.informatics.indiana.edu/mmia). Following identical treatments (1.0 µM for 24 or 48 hr), total RNA was isolated from cisplatin-resistant CP70 cells and analyzed for miRNA (24 hr) and protein-coding (mRNA, 48 hr) gene expression, using customized and Affymetrix (Santa Clara, CA) microarrays, respectively. Those miRNA/mRNA expression results were then compiled and analyzed using the MMIA algorithm, yielding several intracellular pathways predicted to be altered by OSU-HDAC42-associated changes in miRNA expression. Consistent with previous studies demonstrating OSU-HDAC42 anticancer activity, this composite analysis predicted upregulation of the apoptotic death receptor pathway and downregulation of the oncogenic Wnt cascade. Ongoing pathway reporter studies have now preliminarily validated decreased Wnt signaling following CP70 treatment with OSU-HDAC42 (data not shown). Overexpression of two miRNAs in fulvestrant-resistant breast cancer. In another study of pathways contributing to breast cancer antiestrogen resistance (a significant and dire clinical phenomenon), we established a fulvestrant-resistant subline (“MCF7-F”) from the widely used, estrogen receptor (ER)-positive breast cancer cell line MCF7. Similar to our HDAC inhibitor study (described above), we subjected MCF7-F cells to miRNA profiling, demonstrating >10-fold upregulation of two miRNAs, miR-221 and miR-222, previously implicated in several carcinomas.
We next examined the biological effects of miRs-221/222, by loss of function in MCF7-F cells (using stable antisense siRNA). Those results demonstrated that MCF7-F miR-221/222 “knockdown” resulted in upregulation of several tumor suppressive pathways, including p53, adherens junctions, and cell adhesion (Figure 2). This study is now “in press”.

iv. ChIP-seq defined genome-wide map of TGFβ/SMAD4 targets: implications with clinical outcome of ovarian cancer patients (Drs. Victor Jin and Tim Huang): A2780 is a human epithelial ovarian cancer cell line, but not an aggressive cancer line. A2780 cells are still sensitive to a key chemotherapeutic drug, cisplatin (cis-diamminedichloroplatinum(II)). A2780 cells have only an intermediate level of TGFβ dysregulation: they are still able to induce SMAD4 expression and translocate existing SMAD4 from the cytoplasm to the nucleus following TGFβ stimulation. As such, this cancer line is often used as a model for studying ovarian cancer. In our study, we first use this cell line for genome-wide mapping of SMAD4 target genes, and then identify the TGFβ/SMAD4-regulated gene pathways implicated in ovarian cancer patients. In this study, we applied ChIP-seq technology to study TGFβ/SMAD4 regulation in the platinum-sensitive ovarian cancer cell line A2780. We profiled SMAD4 binding sites in this cell line with TGFβ stimulation. Combining this with computational approaches, we investigated the binding pattern for SMAD4 and compared it with that of normal ovarian epithelial cells (IOSE) from our previous study. The TGFβ-stimulated and unstimulated samples were each produced in two biological replicates, A and B. The reads from these two replicates were combined into a single data set, with the stimulated sample having 32,228,812 uniquely mapping reads and the unstimulated sample having 25,705,696. Both samples in the combined data set were processed using BELT with a 300-nucleotide bin size at an acceptance threshold of 0.996. This produced 1,723 and 1,499 peaks, respectively, after excluding peaks that were not within 100 knt of a known

Figure labels recovered from panels A and B: Treated Sample, 1723 peaks (453, 24, 1246); Control Sample, 1499 peaks (1063, 412, 24).

gene. We further processed these two datasets by removing spurious peaks that overlapped with read peaks in the input lane for replicate A. The distribution of SMAD4 peaks relative to the nearest gene is plotted as a histogram for all genes (Figure 1A). TGFβ “stimulated-only” peaks are peaks whose nearest gene has a peak only in the stimulated set; “unstimulated-only” peaks are defined likewise. A “shift” peak appears on the same gene in both samples, with the two peaks more than 1,000 nt apart. “Basal” peaks appear on the same gene in both samples and are ≤1,000 nt apart (Figure 1B). We also correlated the SMAD4 binding peaks with gene expression changes after TGFβ stimulation. A heatmap shows the expression fold changes for genes between the unstimulated sample and the stimulated sample; the segments separated by small gaps are, in vertical order, up-regulated, no change, and down-regulated. Up- and down-regulated genes are defined as having a log fold change greater than 0.5 or less than -0.5, respectively (Figure 2A).
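The peak categories and the expression calls defined above translate directly into simple rules. The sketch below is illustrative only: the function names are hypothetical, and when a gene carries several peaks in a sample, comparing the closest pair of peaks across samples is an assumption, since the report does not say how multiple peaks per gene were handled.

```python
def classify_gene(stim_peaks, unstim_peaks, max_basal_gap=1000):
    """Label a gene from the peak positions (nt) assigned to it in each sample."""
    if not stim_peaks and not unstim_peaks:
        return "no_peak"
    if stim_peaks and not unstim_peaks:
        return "stimulated_only"
    if unstim_peaks and not stim_peaks:
        return "unstimulated_only"
    # Peaks in both samples: compare the closest pair (assumption).
    gap = min(abs(s - u) for s in stim_peaks for u in unstim_peaks)
    return "basal" if gap <= max_basal_gap else "shift"


def expression_call(log_fold_change, cutoff=0.5):
    """Up/down call using the +/-0.5 log fold change rule from the text."""
    if log_fold_change > cutoff:
        return "up"
    if log_fold_change < -cutoff:
        return "down"
    return "no_change"
```

Cross-tabulating the two labels per gene gives the kind of peak-versus-expression comparison that the heatmap and overlap figures summarize.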

Figure labels recovered from panels A, B and C: Total, Treated Only, Shift; axis ticks 0 to 350; Up, Down; Gene Expression; 1827 genes; 1303 genes.

Figure 2B shows the overlap between genes with SMAD4 binding in the stimulated data set and genes with significant differential expression between the two microarray samples. This yields three main categories of genes in relation to the expression and ChIP-seq data: those with differential expression and no nearby peak, those with a nearby peak and similar expression, and those with both. Figure 2C shows a sample of the differences in GO annotation among the three main segments of the Venn diagram in Figure 2B: genes with only a peak but no differential expression, genes with both differential expression and a nearby peak, and genes with only differential expression but no nearby peak.

c. Aim 3: Stochastic modeling of permissive and non-permissive epigenetic marks based on altered histone/DNA methylation profiles.

i. An empirical Bayes model for gene expression and methylation profiles in antiestrogen resistant breast cancer (Dr. Changyu Shen): The nuclear transcription factor estrogen receptor alpha (ER-alpha) is the target of several antiestrogen therapeutic agents for breast cancer. However, many ER-alpha positive patients do not respond to these treatments from the beginning, or stop responding after being treated for a period of time. Because of the association of gene transcription alteration and drug resistance and the emerging evidence on the role of DNA methylation on transcription regulation, understanding of these relationships can facilitate development of approaches to re-sensitize breast cancer cells to treatment by restoring DNA methylation patterns.

We constructed a hierarchical empirical Bayes model to investigate the simultaneous change of gene expression and promoter DNA methylation profiles among wild type (WT) and OHT/ICI resistant MCF7 breast cancer cell lines. We found that, compared with the WT cell lines, almost all of the genes in OHT or ICI resistant cell lines either show no methylation change or are hypomethylated. Moreover, the correlations between gene expression and methylation are quite heterogeneous across genes, suggesting the involvement of other factors in regulating transcription. Analysis of our results in combination with H3K4me2 data on OHT resistant cell lines suggests a clear interplay between DNA methylation and H3K4me2 in the regulation of gene expression. For hypomethylated genes with altered gene expression, most (~80%) are up-regulated, consistent with the current view of the relationship between promoter methylation and gene expression.
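Grouping genes by the direction of their expression change and their methylation change gives a three-by-three table. The sketch below shows the bookkeeping with a plain threshold rule; it is not the hierarchical empirical Bayes model itself, and the cutoff values and function names are hypothetical placeholders for whatever decision rule the model supplies.

```python
from collections import Counter

GE_LABELS = {-1: "DOWN", 0: "No change", 1: "UP"}
M_LABELS = {-1: "HYPO", 0: "No change", 1: "HYPER"}


def direction(delta, cutoff):
    """Return 1, -1 or 0 for an increase, decrease, or no call."""
    if delta > cutoff:
        return 1
    if delta < -cutoff:
        return -1
    return 0


def nine_way_table(genes, ge_cutoff=1.0, m_cutoff=0.2):
    """Count genes per (methylation, expression) category.

    genes: iterable of (expression_change, methylation_change) pairs,
           resistant line minus wild type. Cutoffs are illustrative only.
    """
    table = Counter()
    for ge_delta, m_delta in genes:
        key = (M_LABELS[direction(m_delta, m_cutoff)],
               GE_LABELS[direction(ge_delta, ge_cutoff)])
        table[key] += 1
    return table
```

Running the same function with stricter or looser cutoffs yields alternative tables, which is one way several counts per cell can arise from different category definitions.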

ii. Heritable Methylation in Cisplatin Resistant Ovarian Cancer Cells (Dr. Lang Li): Deoxycytosine methylation, within the dinucleotide CpG, is a stable and heritable epigenetic modification that is crucial for regulating gene dosages and silencing harmful DNA elements. In tumor progression, however, this process is highly dysregulated, resulting in a dramatic global redistribution of CpG methylation patterns. In this study, using a DNA methylation microarray assay, we examined CpG methylation patterns associated with the progression to cisplatin resistance in chemosensitive A2780 ovarian cancer cells. Following 1, 3, and 5 cycles of cisplatin selection, with GI50 doses increasing from 5 to 20 to 35 µM, respectively, we observed a linear increase in the associated number of hypermethylated (400, 650, and 900) and hypomethylated (50, 400, and 1200) loci, as compared to untreated cells. Hierarchical clustering further revealed that methylation levels in subsets of hyper- and hypomethylated loci sequentially increased or decreased with drug selection, respectively, thus demonstrating heritable hyper- and hypomethylation, while most loci displayed random methylation patterns. Heritable hypomethylation also correlated (p=0.0001) with gene upregulation. Interestingly, numerous transcription factor-binding sites (TFBSs) showed enrichment within heritable hypermethylated (51) and hypomethylated loci (p=0.00003), but not in randomly methylated loci. Pathway analysis further revealed hypermethylated sequences to comprise pathways such as cytoskeleton remodeling, cell cycle regulation, and proteolysis, while hypomethylated loci were enriched in processes such as apoptosis/stress response, epithelial-to-mesenchymal transition (EMT), oxidative response, and MAPK signaling. Similarly, specific TFBSs enriched in hypomethylated regions correlated with oncogenic processes including EMT, Wnt signaling, and cell proliferation.
Together, these results demonstrate that heritable DNA hypomethylation and hypermethylation of specific loci are likely pivotal to pathways that mediate ovarian cancer chemoresistance, presumably by influencing transcription factor binding.
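The distinction between heritable and random methylation rests on whether a locus changes in the same direction at every successive round of selection. The report identified these groups by hierarchical clustering; the sketch below swaps in a simpler monotonic-trend rule to make the idea concrete. The function name and the `min_step` tolerance are hypothetical.

```python
def methylation_trend(levels, min_step=0.0):
    """Classify one locus from methylation levels ordered by selection round.

    levels: e.g. [untreated, round 1, round 3, round 5].
    min_step: smallest per-round change counted as a real step (assumption).
    """
    if len(levels) < 2:
        raise ValueError("need at least two time points")
    steps = [later - earlier for earlier, later in zip(levels, levels[1:])]
    if all(step > min_step for step in steps):
        return "heritable_hyper"
    if all(step < -min_step for step in steps):
        return "heritable_hypo"
    return "random"
```

A locus that rises in one round and falls in the next is labeled random under this rule, matching the intuition that only sequential, same-direction changes indicate a heritable shift.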

iii. To define chromatin structure for permissive or non-permissive binding of gTFs, cTFs, and RNA Pol II in hormone-/chemo-sensitive vs. –insensitive prostate cancer cells (Dr. Qianben Wang): In mouse and human prostate cancer, the expression of FoxA1, a member of the Forkhead transcription factor family, persists in early- and

Fig 2. Grouping genes into nine categories based on gene expression (down, no change, up) and methylation (hyper, no change, hypo). The three numbers correspond to different rules of defining the nine categories. (A) OHT vs. WT. (B) ICI vs. WT.

Methylation (M, rows) by gene expression (GE, columns):

M \ GE       DOWN       No change        UP
HYPER        0/0/0      1/1/0            0/0/0
No change    86/20/9    2309/1518/922    237/114/64
HYPO         50/12/7    1350/811/526     149/61/40

Heatmap labels: columns Round 1 Control, Round 3 Control, Round 5 Control, Round 1 Treated, Round 3 Treated, Round 5 Treated; row groups Heritable Hypomethylation, Heritable Hypermethylation, Random Methylation; color scale Low to High.

Figure 3. Heatmap of methylation patterns. Three groups of methylation are arranged to indicate the dramatically different methylation trends from control to treatments, including heritable hypo-methylation, heritable hyper-methylation and random methylation. Color doesn’t reflect intensity difference among rows, but only among columns of the same row.

late-phase of the disease. However, the functional role of FoxA1 in critical cellular processes such as prostate cancer growth is poorly understood. Given that FoxA1 functions as a pioneer factor for AR and enhances the expression of a subset of AR target genes (e.g. PSA and UBE2C) in ADPC and CRPC, and in view of the fact that AR expression is necessary for both ADPC and CRPC growth, we hypothesize that FoxA1 is also required for ADPC and CRPC growth. We tested the effect of FoxA1 silencing on cell proliferation. Surprisingly, we found that FoxA1 silencing selectively decreases the growth of CRPC cell model abl but not ADPC cell model LNCaP (data not shown).

Since we have found that FoxA1 and AR co-regulate M-phase target genes such as UBE2C in abl cells, we reasoned that the decreased growth of abl cells is caused by a cell cycle G2/M block. Unexpectedly, FACS analysis on unsynchronized LNCaP and abl cells transfected with siFoxA1 or siControl revealed that FoxA1 silencing specifically leads to a G1/S block in abl cells but not in LNCaP cells (Figure 1C). Since AR silencing blocks abl cells at G2/M phase, these data suggest that the function of FoxA1 in promoting the cell cycle G1/S transition is independent of AR.

To identify FoxA1-regulated genes that account for the G1/S transition, we performed quantitative RT-PCR and Western blots to measure the mRNA and protein expression of the known G1 phase genes CCND1, CCND3, CCNE1, CCNE2, CCNA2, CDK2, CDK4, CDKN1A and CDKN1B in LNCaP and abl cells transfected with siFoxA1 or siControl. We found that the mRNA and protein expression levels of CCNA2 and CCNE2 are significantly higher in abl cells than in LNCaP cells. Interestingly, FoxA1 silencing selectively decreases the expression of CCNA2 and CCNE2 in abl but not in LNCaP cells (data not shown). To investigate the mechanisms for differential regulation of CCNA2 and CCNE2 by FoxA1 between LNCaP and abl cells, we performed whole genome FoxA1 ChIP-on-chip in LNCaP and abl cells. Based on a threshold of p<1E-4, we identified 22,492 FoxA1 binding sites in LNCaP cells and 43,132 FoxA1 binding regions in abl cells using the MAT algorithm. The observation that there are no FoxA1 binding sites within 100 kb (-50 kb ~ +50 kb) of the transcription start sites (TSS) of CCNA2 in abl cells suggests that CCNA2 is not a direct target of FoxA1. By contrast, we found greater occupancy of 8 FoxA1 binding sites near the CCNE2 gene in abl cells than in LNCaP cells. Direct ChIP analysis confirmed the stronger FoxA1 binding at these 8 sites in abl cells than in LNCaP cells (data not shown). These data suggest that CCNE2 is a direct target gene of FoxA1 in abl but not in LNCaP cells. The molecular mechanisms underlying the selective binding of FoxA1 at CCNE2 regulatory regions are currently under investigation.
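The check for binding sites within -50 kb ~ +50 kb of a TSS amounts to an interval overlap test. The sketch below is a minimal illustration in Python; the function name, the tuple layout, and all coordinates are hypothetical and are not taken from the ChIP-on-chip data or from the MAT output format.

```python
def sites_near_tss(sites, chrom, tss, flank=50_000):
    """Return the binding sites on `chrom` that overlap [tss - flank, tss + flank].

    sites: iterable of (chrom, start, end) tuples with start <= end.
    """
    lo, hi = tss - flank, tss + flank
    return [s for s in sites if s[0] == chrom and s[1] <= hi and s[2] >= lo]


# Hypothetical coordinates, chosen only to exercise the function.
TOY_TSS = ("chr1", 1_000_000)
TOY_SITES = [
    ("chr1", 980_000, 980_400),      # about 20 kb from the toy TSS: inside the window
    ("chr1", 1_060_000, 1_060_300),  # 60 kb from the toy TSS: outside the window
    ("chr2", 1_000_100, 1_000_500),  # right position, wrong chromosome
]

if __name__ == "__main__":
    hits = sites_near_tss(TOY_SITES, TOY_TSS[0], TOY_TSS[1])
    print(len(hits), "site(s) within 50 kb of the toy TSS")
```

An empty result for a gene, as reported for CCNA2 in abl cells, is what argues against direct regulation under this window definition.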

Figure 5. FoxA1 and AR binding at the regulatory region of CCNE2. The arrows indicate eight regions that have higher FoxA1 binding in abl cells than in LNCaP cells.

d. Aim 4: Pattern recognition algorithms for predicting transcription factor binding sites and methylation-prone or resistant sequences.

i. Modeling DNA methylation susceptibility (Dr. Sun Kim): A number of studies have recently identified distinct features of genomic sequences that can be used for modeling specific DNA sequences that may be susceptible to aberrant CpG methylation in both cancer and normal cells. Furthermore, next generation sequencing technology now makes it possible to measure CpG site-specific methylation on a genome-wide scale across different cell types. However, there are currently no reports on modeling cell-type specific DNA methylation susceptibility. We therefore conducted comprehensive modeling of cell-type specific DNA methylation susceptibility at three different resolutions: individual CpG sites, CpG segments, and promoter regions. Using a k-mer ({xi}, figure to the right) mixture logistic regression model, we showed that DNA methylation susceptibility was effectively modeled across five different cell types. Further, at the segment level, an AUC of up to 0.75 was achieved in a 10-fold cross-validation study using a mixture of k-mers. The significance of these results is threefold: 1) this is the first report to indicate that CpG methylation-susceptible segments {Sk} exist; these segments are defined by boundary variables {Bi} (figure to the right) chosen by minimizing errors (the objective function O(S) in the figure to the right), as opposed to modeling methylation susceptibility at the promoter region level, the CpG island level, or the CpG site-specific level; 2) our model shows the significance of certain k-mers for the mixture model, which can potentially highlight DNA sequence features (k-mers) of differentially methylated promoter CpG island sequences across different tissue types; 3) as previous studies have only used 3 or 4 bp patterns in modeling DNA methylation susceptibility, our study is the first to demonstrate that k-mer modeling can be performed up to 6-mers without loss of modeling accuracy. A manuscript on the k-mer logistic regression modeling is currently in review.
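To make the k-mer modeling idea concrete, the sketch below fits a plain (single-component) logistic regression on k-mer counts by gradient descent. It is not the mixture model described above, whose details are not given in this report; the toy segments, labels, k = 2, learning rate, and all function names are assumptions for illustration.

```python
import math
from itertools import product


def kmer_index(k):
    """Map each k-mer over ACGT to a column index, in lexicographic order."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def kmer_counts(seq, k, index):
    """Count overlapping k-mers in seq; k-mers with other symbols (e.g. N) are skipped."""
    vec = [0.0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            vec[j] += 1.0
    return vec


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted probability that a segment with k-mer counts x is susceptible."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = predict(weights, bias, x) - label
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy segments and labels (1 = methylation-susceptible): assumptions for illustration.
K = 2
INDEX = kmer_index(K)
SEGMENTS = ["CGCGCGCGCG", "GCGCGCCGCG", "ATATATATAT", "TTAATTAATA"]
LABELS = [1, 1, 0, 0]
X = [kmer_counts(s, K, INDEX) for s in SEGMENTS]
WEIGHTS, BIAS = fit_logistic(X, LABELS)

if __name__ == "__main__":
    for seg, x in zip(SEGMENTS, X):
        print(seg, round(predict(WEIGHTS, BIAS, x), 3))
```

The feature vector has 4^k columns, so moving from 4-mers to 6-mers grows it from 256 to 4,096 entries; a real analysis would also need held-out evaluation such as the 10-fold cross-validation mentioned above.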

III. Education and Outreach

a. OSU-IU CCSB Computation Workshop (June 7th, 2010) was held in conjunction with the Ohio-Michigan Illumina Sequencing meeting (June 8th to 9th, 2010) this year. This joint platform allowed us to showcase our integrative cancer epigenetics approaches to 30+ scientists from 8 Midwest sequencing facilities and to 70+ researchers from Columbus, Indiana and West Virginia. The agendas of both meetings are displayed below.

b. 2010 NCI CCSB Summer Fellows

i. IU-CCSB summer fellows: In our commitment to cancer systems biology education, we hosted two Summer 2010 CCSB Fellows, Mr. Karl (Cong) Gao of Georgia Institute of Technology (Atlanta, GA) and Mr. Phillip Wu of the University of California at San Diego. Mr. Wu received extensive training in bioinformatic analyses of gene expression and DNA methylation assessments of ovarian cancer patient tumor biopsies, before and after treatment with a DNA methyltransferase inhibitor, decitabine, in a phase I clinical trial (National Cancer Institute clinical trials identifier NCT00477386, clinicaltrials.gov/ct2/show/NCT00477386) (Sept. 2010, Cancer 116(17):4043-4053). Phillip was mentored by Dr. Meng Li, our Senior Bioinformatician (currently a Bioinformatics Specialist of Health Sciences, University of Southern California Norris Medical Library), who trained him in the use of methylumi (a Bioconductor package for R) for analyzing methylation data (Infinium BeadChip, Illumina), and in the use of Gene Set Enrichment Analysis (GSEA; Subramanian et al., 2005, Proc Nat Acad Sci, 15545-15550; Mootha et al., 2003, Nat Genet 34: 267-273) and PathwayExpress (Draghici et al. 2007, Genome Res 17: W1537-W1545) for integrating gene expression (Affymetrix) and DNA methylation (Infinium BeadChip, Illumina) data for gene set/pathway analyses. That contribution to our phase I clinical trial resulted in the identification of several genes upregulated by decitabine, including the tumor suppressor/axon guidance-related gene, Semaphorin 3a.

In Mr. Gao’s ICBP Summer 2010 research project (Mentor, Mr. David Miller, Senior Research Associate), he extensively utilized quantitative PCR (qPCR) analysis to assess RNA/cDNA integrity within a library generated from nucleic acids present in cisplatin-resistant vs. cisplatin-sensitive ovarian cancer cells, in preparation for the use of that library in the high-throughput systems biology method RNA-seq. Additionally, Karl contributed significantly to the characterization of a number of other genes found dysregulated in our chemoresistant ovarian cancer cell line, including the gene encoding the long intergenic noncoding RNA (lincRNA) HOTAIR, whose overexpression was previously demonstrated in aggressive breast cancer cells (Apr. 2010, Nature 464:1071).

ii. OSU-CCSB summer fellows: We hosted two summer 2010 CCSB fellows: Ms. Leslie Watkins of Meredith College (Raleigh, NC), and Ms. Ye (Phoenix) Xu of Rutgers University (New Jersey). The first day of their training coincided with our computation workshops. The students were exposed to cutting-edge sequencing experiments and new approaches to integrating high-throughput epigenetic data. The students enrolled in the Mathematical Biosciences Institute Summer Undergraduate Research Program and participated in all the morning lectures. For the hands-on research components, we implemented a team approach to mentoring our summer fellows. Michael Trimachi, a PhD candidate in the Tim Huang Lab, trained the students in basic laboratory techniques, including cell culture protocols, pipetting, and scientific calculations. Ben Rodriguez, a PhD candidate in the Tim Huang Lab, helped the summer fellows understand cancer epigenetics. He provided pertinent review articles to the students and discussed with them the key concepts they needed for their summer project. In addition, Ben showed the students how to perform bisulfite-modified DNA PCR assays, including primer design, database creation for storing assay records, standard PCR techniques, and data interpretation. Ben also introduced the students to the data analysis functions found in Microsoft Excel and explained how the NCBI (National Center for Biotechnology Information) and UCSC (University of California-Santa Cruz) curate gene records. He also showed the students how to parse these records and how to retrieve sequence information from each database, and provided an overview of the various features of the UCSC Genome Browser and how this information applied to their summer project. Dr. Cenny Taslim, a post-doctoral researcher in the Kun Huang/Shili Lin Group, showed the students how to pre-process ChIP-seq data for subsequent data analysis. This included performing a scaling normalization of the sequencing data using a Matlab program. The students then performed a comparative analysis between MCF7 and tamoxifen-resistant MCF7 cells before and after estradiol treatment using a custom R-based analysis program. The students also learned to convert analyzed sequencing data so that they could be visualized on the UCSC Genome Browser. Finally, the students were trained to perform functional analysis on gene promoters showing differential epigenetic binding in the two breast cancer cell lines after estradiol treatment. Dr. Pearlly Yan, EOC representative, assisted the students in project presentations and final report preparation.
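The scaling normalization step taught to the students can be illustrated with a minimal sketch. The report does not specify the method used in the Matlab program, so the version below assumes simple total-count scaling of per-bin read counts to a common sequencing depth, written in Python; the bin counts, the pseudocount, and the function names are hypothetical.

```python
import math


def scale_to_depth(counts, target_total):
    """Rescale per-bin read counts so that they sum to target_total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot scale a sample with zero reads")
    factor = target_total / total
    return [c * factor for c in counts]


def log2_ratio(treated, control, pseudocount=1.0):
    """Per-bin log2 ratio of two depth-matched samples; the pseudocount avoids log(0)."""
    return [math.log2((t + pseudocount) / (c + pseudocount))
            for t, c in zip(treated, control)]


# Toy per-bin read counts (hypothetical values).
CONTROL = [10, 20, 30, 40]   # 100 reads in total
TREATED = [30, 50, 60, 60]   # 200 reads in total

# Bring the treated sample down to the control's sequencing depth before comparing.
TREATED_SCALED = scale_to_depth(TREATED, sum(CONTROL))

if __name__ == "__main__":
    print(TREATED_SCALED)
    print([round(r, 3) for r in log2_ratio(TREATED_SCALED, CONTROL)])
```

Without the scaling step, the treated sample would appear enriched in every bin simply because it was sequenced twice as deeply.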

iii. Visiting computation scientists at IU-Bloomington: At Indiana University, we hosted a Postdoctoral Fellow, Dr. Seung Yoon Nam, who joined us after receiving his Ph.D. (Seoul National University), from September 2008 through February 2010. Dr. Nam contributed significantly to our computational cancer biology analyses, successfully designing an integrated microRNA/mRNA web tool to assess microRNA regulation of target mRNAs (denoted as “microRNA and mRNA integrated analysis,” MMIA, cancer.informatics.indiana.edu/mmia), while also predicting the impact of such regulation on specific biological pathways, using gene-set enrichment and transcription factor-binding site assessments (Jul 2009, Nucleic Acids Res 37:W356-362). Dr. Nam also provided invaluable bioinformatics support for numerous ongoing laboratory projects, including successfully identifying an 11-gene “signature” of breast cancer cell bone metastasis/oncolysis (Dec. 2010, Semin Cell Dev Biol 29(9):951-960), while also performing rigorous gene-set/pathway enrichment analyses of gene/microRNA expression data for drug-resistant ovarian and breast cancer cells, and for ovarian cancer cells treated with a histone deacetylase inhibitor (ongoing work). Additionally, at Indiana University, we hosted a Visiting Scientist, Dr. Man-Wook Hur, a Professor of Biochemistry and Molecular Biology at Yonsei University, Seoul, Korea, from March to August 2010. With respect to systems biology, Dr. Hur received training in genome-wide mapping of transcription factors, using computational approaches for the analysis of the high-throughput technique of chromatin immunoprecipitation coupled to deep sequencing (“ChIP-seq”).

IV. OSU CCSB 2010 Supplemental Grant

a. Our proposal, entitled ‘Contributing Methylation Immunoprecipitation Recovery Assay (MIRA)-Seq Data to Enrich Integrated Analysis of ICBP-43 and LBNL-17 Breast Cancer Cell Lines’, was funded by the NCI CCSB grant officers. This project was performed in collaboration with the LBNL-CCSB Center. Dr. Paul Spellman, in conjunction with Nick Wang, provided us with genomic DNA from the cell lines for which they had previously derived Affymetrix exon array and aCGH data. Methylated regions of the genome were captured with MBD2 protein (Invitrogen MethylMiner Kit), followed by sequencing library generation and massively parallel sequencing. The entire project took less than two months to complete. Data were quickly disseminated to the LBNL-CCSB group for data sharing. At our site, the data are stored in the QUEST database and are currently being analyzed by the Specific Aim 4 computation scientists. The figure below shows a screenshot of this data set in QUEST.

V. Publications

1. An empirical Bayes model for gene expression and methylation profiles in antiestrogen resistant breast cancer. Jeong J, Li L, Liu Y, Nephew KP, Huang TH, Shen C. BMC Med Genomics. 2010 Nov 25; 3(1):55. [Epub ahead of print] PMID: 21108837

2. An overview of epigenetics and chemoprevention. Huang YW, Kuo CT, Stoner K, Huang TH, Wang LS. FEBS Lett. 2010 Nov 5. [Epub ahead of print] PMID: 21056563

3. Genome-wide mapping of RNA Pol-II promoter usage in mouse tissues by ChIP-seq. Sun H, Wu J, Wickramasinghe P, Pal S, Gupta R, Bhattacharyya A, Agosto-Perez FJ, Showe LC, Huang TH, Davuluri RV. Nucleic Acids Res. 2010 Sep 14. [Epub ahead of print] PMID: 20843783

4. Chromatin remodeling in mammary gland differentiation and breast tumorigenesis. Huang TH, Esteller M. Cold Spring Harb Perspect Biol. 2010 Sep;2(9):a004515. Epub 2010 Jul 7. PMID: 20610549

5. Methods in DNA methylation profiling. Zuo T, Tycko B, Liu TM, Lin HJ, Huang TH. Epigenomics. 2009 Dec 1; 1(2):331-345. PMID: 20526417

6. Estrogen-mediated epigenetic repression of large chromosomal regions through DNA looping. Hsu PY, Hsu HK, Singer GA, Yan PS, Rodriguez BA, Liu JC, Weng YI, Deatherage DE, Chen Z, Pereira JS, Lopez R, Russo J, Wang Q, Lamartiniere CA, Nephew KP, Huang TH. Genome Res. 2010 Jun;20(6):733-44. Epub 2010 May 4. PMID: 20442245

7. Profiling DNA methylomes from microarray to genome-scale sequencing. Huang YW, Huang TH, Wang LS. Technol Cancer Res Treat. 2010 Apr; 9(2):139-47. Review. PMID: 20218736

8. Nucleosome dynamics defines transcriptional enhancers. He HH, Meyer CA, Shin Y, Bailey S, Wei G, Wang Q, Zhang Y, Xu K, Ni M, Lupien M, Mieczkowski P, Lieb JD, Zhao K, Brown M and Liu XS. Nat Genet, 42: 343-347, 2010. PMCID: PMC2932437

9. Histone modifications and chromatin organization in prostate cancer. Chen Z, Wang L, Wang Q* and Li W* (* Corresponding author). Epigenomics, 2:551-560, 2010.

10. Phospho-MED1-induced chromatin looping drives androgen receptor negative prostate cancer growth. Chen, Z., Zhang, C., Wu, D., Rorick A., Zhang, X., and Wang, Q. EMBO J, Under revision, 2010.

11. Ca2+/Calmodulin-dependent protein kinase -mediated activation of AMP-activated kinase is required for androgen-dependent migration of prostate cancer cells. Frigo DE, Howe MK, Wittmann BM, Brunner AM, Cushman I, Wang Q, Brown M, Means AR, McDonnell DP. Cancer Res, 2010 Nov 22 [Epub ahead of print].

12. Multivalent epigenetic marks confer microenvironment-responsive epigenetic plasticity to ovarian cancer cells. Bapat SA, Jin V, Berry N, Balch C, Sharma N, Kurrey N, Zhang S, Fang F, Lan X, Li M, Kennedy B, Bigsby RM, Huang TH, Nephew KP (2010). Epigenetics 5(8): 716-729.

13. Cytokeratins potentiate the antiestrogenic activity of fulvestrant. Long X, Fan M, Nephew KP (2010). Cancer Biol Ther 9:389-96.

14. MicroRNA-221/222 confers breast cancer fulvestrant resistance by regulating multiple signaling pathways. Rao X, Di Leva G, Li M, Fang F, Hartman-Frey C, Burow ME, Croce C, Nephew KP (2010). Oncogene Nov 8, 2010. [Epub ahead of print].

15. Derepression of CLDN3 and CLDN4 expression during ovarian tumorigenesis by crosstalk between multiple epigenetic modifications. Kwon MJ, Kim S-S, Choi Y-L, Jung HS, Kim S-H, Song Y-S, Balch C, Marquez VE, Nephew KP, Shin YK (2010). Carcinogenesis, 31:974-83. PMCID: PMC2878357

16. A phase I and pharmacodynamic study of decitabine in combination with carboplatin in patients with recurrent, platinum-resistant, epithelial ovarian cancer. Fang F, Balch C, Schilder S, Breen T, Zhang S, Shen C, Li L, Kulesavage C, Snyder AJ, Nephew KP*, Matei DE* (2010). Cancer 116(17):4043-53 (*corresponding authors). PMCID: PMC2930033

17. A rationally designed histone deacetylase inhibitor with distinct antitumor activity against ovarian cancer. Yang YT, Balch C*, Kulp SK, Mand MR, Nephew KP, Chen CS* (2009) (*co-corresponding authors). Neoplasia 11(6):552-563. PMCID: PMC2685444

18. Epigenetics and ovarian cancer. In: Ovarian Cancer: Second Edition, S Stack, DA Fishman, Editors. Nephew KP, Balch C, Zhang S, Huang TH (2009). Springer Medicine, New York, NY.

19. Role of epigenomics in ovarian and endometrial cancers. Balch C, Matei DM, Huang TH-M, Nephew KP (2010). Epigenomics, 2:419-447.

20. Minireview: epigenetic changes in ovarian cancer. Balch C, Fang F, Matei DE, Huang TH, Nephew KP (2010). Endocrinology 150(9): 4003-4011. PMCID: PMC2736079

21. EGFR signaling in breast cancer: Bad to the bone. Foley J, Nickerson N, Nam S, Allen KT, Gilmore JL, Nephew KP, Riese DJ 2nd (2010). Semin Cell Dev Biol, in press.

22. Comparing multiple protein binding profiles in ChIP-seq experiments. Ozer H-G, Wu J, Huang Y-W, Parvin J, Huang T, Huang K (2010). The Proceedings of the Computational Systems Bioinformatics (CSB), Palo Alto, California.

23. Peak detection on ChIP-Seq data using wavelet transformation. Wu H-Y, Zhang J, Huang K (2010). The Proceedings of the Workshop on Data Mining for NGS Data in the IEEE International Conference on Bioinformatics and BioMedicine (BIBM’10), Hong Kong.

24. HRTBLDb: an informative data resource for Hormone Receptors Target Binding Loci. Kennedy, B.A., Gao, W., Huang, T.H.-M., Jin, V.X (2010). Nucleic Acids Res. 38, D676-D681, 2010.

25. Analyzing ChIP-seq data: preprocessing, normalization, differential identification and binding pattern characterization. Taslim, C., Huang, K., Huang, T., Lin, S. (2010). In Methods in Molecular Biology. Ed. Junbai Wang. Humana Press.

26. Prioritization of disease microRNAs through a human phenome-microRNAome network. Jiang Q., Hao Y., Wang G., Juan L., Zhang T., Teng M., Liu Y., Wang Y (2010). BMC Systems Biology, 2010 28; 4 S1:S2. PMCID: PMC20522252

27. Signal transducers and activators of transcription-1 (STAT1) regulates microRNA transcription in interferon γ-stimulated HeLa cells. Wang G., Wang Y., Teng M., Zhang D., Li L., and Liu Y. (2010). PLoS ONE, 2010 26;5 (7):e11794. PMCID: PMC20668688

28. RNA Polymerase II binding patterns reveal genomic regions involved in microRNA gene regulation. Wang G., Wang Y., Shen C., Huang Y., Huang K., Huang T. H-M, Nephew K.P., Li L., Liu Y. (2010). PLoS ONE. 2010 2;5(11):e13798. PMCID: PMC21072189

29. An empirical Bayes model for gene expression and methylation profiles in antiestrogen resistant breast cancer. Jeong, J., Li, L., Liu, Y., Nephew, K.P., Huang, T. H-M., and Shen, C. (2010). BMC Medical Genomics doi:10.1186/1755-8794-3-55