
Research Article

COPAR: A ChIP-Seq Optimal Peak Analyzer

Binhua Tang,1,2 Xihan Wang,1 and Victor X. Jin3

1 Epigenetics & Function Group, School of Internet of Things, Hohai University, Jiangsu 213022, China
2 School of Public Health & Biostatistics, Shanghai Jiao Tong University, Shanghai 200025, China
3 Department of Molecular Medicine & Biostatistics, University of Texas Health Science Center, San Antonio, TX 78249, USA

Correspondence should be addressed to Binhua Tang; [email protected]

Received 28 October 2016; Accepted 14 February 2017; Published 5 March 2017

Academic Editor: Xingming Zhao

Copyright © 2017 Binhua Tang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Sequencing data quality and peak alignment efficiency of ChIP-sequencing profiles are directly related to the reliability and reproducibility of NGS experiments. To date, no tool has been specifically designed for optimal peak alignment estimation and quality-related genomic feature extraction for ChIP-sequencing profiles. We developed COPAR, an open-source, user-friendly package, to statistically investigate, quantify, and visualize the optimal peak alignment and inherent genomic features using ChIP-seq data from NGS experiments. It gives biologists a versatile way to perform quality checks on high-throughput experiments and to optimize their experimental design. COPAR processes mapped ChIP-seq read files in BED format and outputs statistically sound results for multiple high-throughput experiments. We verified the package with three public ChIP-seq data sets and have deposited COPAR on GitHub under a GNU GPL license.

1. Introduction

Next-generation sequencing (NGS) integrated with ChIP technology provides a genome-wide perspective for biomedical research and clinical diagnosis applications [1–3].

Data quality and peak alignment of ChIP-sequencing profiles are directly related to the reliability and reproducibility of analysis results. For example, ChIP-seq data characterize changes in transcription factor (TF) binding activity in response to chemical or environmental stimuli, but if the ChIP-seq alignment is poorly selected, any follow-up analysis may yield inaccurate TF binding results and an inevitable loss of biological meaning [4, 5].

The quantities most often examined in ChIP-seq peak-calling procedures are the peak number, the false discovery rate (FDR), the corresponding bin-size, and other statistical thresholds selected in each analysis. Such arguments make it difficult for biologists and bioinformaticians to choose a suitable pair of conditions for analyzing experimental results.

To our knowledge, few publications or application notes address this topic; we therefore propose a flexible package, based on feature extraction and signal processing algorithms, for solving this argument-selection optimization problem in optimal peak alignment.

In summary, the package COPAR can quantitatively measure NGS/ChIP-seq experiment quality through global peak alignment comparison and extract genomic features based on a spectrum method for in-depth analysis of ChIP-sequencing profiles.

2. Materials and Methods

2.1. Optimal Peak Alignment Estimation. To determine the optimal ChIP-seq alignment, we analyze peak numbers under specific argument constraints. The optimal peak number is obtained by constraining these arguments, which can be formalized as a class of optimal track analysis, illustrated as

$$
\begin{aligned}
\arg\max_{i}\ & P_i, \quad i \in N \\
\text{s.t.}\ & f_i \le \chi, \\
& b_i = \beta, \\
& p_i \le \delta,
\end{aligned}
\tag{1}
$$

Hindawi, BioMed Research International, Volume 2017, Article ID 5346793, 4 pages, https://doi.org/10.1155/2017/5346793


[Figure 1 flowchart. Input: NGS experiment DB (ChIP-seq peak alignment and genomic features). COPAR: (i) optimal estimation for NGS peak alignment; (ii) genomic feature extraction and comparison; (iii) genome-wide analysis for multiple samples. Optimal estimation for peak alignment: (1) fast global alignment for collecting candidates; (2) argument pairing for peak number and FDR; (3) optimal peak estimation, s.t. statistical FDR. Genomic feature extraction: (1) SP-based frequency spectrum analysis; (2) normalization for frequency spectrum; (3) randomized sample for statistical comparison.]

Figure 1: Flowchart for optimal peak alignment estimation and genomic feature analysis with COPAR. The package can perform optimal peak estimation based on global alignment of ChIP-seq data; it can then use the frequency spectrum approach for genomic feature extraction and carry out statistical comparison for multiple ChIP-seq samples.

where $P_i$ denotes a set of optimal peak numbers under the corresponding argument constraints, $f_i$ stands for the argument FDR, $b_i$ stands for bin-size, $p_i$ denotes the $p$-value threshold, and $\chi$, $\beta$, and $\delta$ represent the presupposed argument values, respectively.
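The selection in (1) can be sketched as a filter followed by a maximum. The snippet below is only an illustration under stated assumptions: the `Candidate` record, the `select_optimal` function, and the toy values are hypothetical and are not taken from the COPAR source code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One peak-calling run; all field names are illustrative assumptions."""
    bin_size: int       # b_i, in bp
    p_threshold: float  # p_i
    fdr: float          # f_i, as a fraction
    n_peaks: int        # P_i


def select_optimal(candidates: Iterable[Candidate], chi: float,
                   beta: int, delta: float) -> Optional[Candidate]:
    """Return the feasible candidate with the most peaks, or None if none is feasible."""
    feasible = [
        c for c in candidates
        if c.fdr <= chi and c.bin_size == beta and c.p_threshold <= delta
    ]
    return max(feasible, key=lambda c: c.n_peaks, default=None)


# Toy values, made up purely to exercise the constraints.
runs = [
    Candidate(bin_size=200, p_threshold=0.96, fdr=0.04, n_peaks=9000),
    Candidate(bin_size=200, p_threshold=0.99, fdr=0.12, n_peaks=14000),
    Candidate(bin_size=200, p_threshold=0.97, fdr=0.05, n_peaks=11000),
    Candidate(bin_size=300, p_threshold=0.97, fdr=0.03, n_peaks=12000),
]
best = select_optimal(runs, chi=0.05, beta=200, delta=0.98)
```

Returning `None` when no run satisfies the constraints keeps an infeasible argument choice visible to the caller instead of silently falling back to an unconstrained maximum.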

2.2. Spectrum-Based Genomic Feature Extraction. For a finite random variable sequence, the power spectrum is normally estimated from its autocorrelation sequence by use of the discrete-time Fourier transform (DTFT), denoted as [6–8]

$$
P(\omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} C_{xx}(n)\, e^{-jn\omega}, \tag{2}
$$

where $C_{xx}$ denotes the autocorrelation sequence of a discrete signal $x_n$, defined as

$$
C_{xx}(i, j) = \frac{E\left[(X_i - \mu_i)(X_j - \mu_j)\right]}{\sigma_i \sigma_j}, \tag{3}
$$

where $\mu$ and $\sigma$ stand for the mean and standard deviation, respectively.

In our study, in consideration of the ChIP-seq data characteristics, we use 128 sampling points to calculate the discrete Fourier transform, with a sampling frequency of 1 kHz.
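As an illustration of (2) and (3), the following standard-library sketch estimates a power spectrum from a normalized autocorrelation sequence, using the 128 points and 1 kHz sampling frequency stated above. It assumes a stationary sequence, so that $C_{xx}(i, j)$ depends only on the lag, and it uses the biased autocorrelation estimator; both choices, and all function names, are assumptions made for illustration rather than COPAR's actual implementation.

```python
import math
from typing import List, Sequence, Tuple

N_POINTS = 128   # number of sampling points stated in the text
FS_HZ = 1000.0   # sampling frequency stated in the text (1 kHz)


def autocorrelation(x: Sequence[float]) -> List[float]:
    """Biased autocorrelation for lags 0..len(x)-1, normalized so that C(0) = 1."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(v * v for v in centered) / n
    if var == 0:
        raise ValueError("a constant signal has no defined autocorrelation")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / (n * var)
        for lag in range(n)
    ]


def power_spectrum_db(x: Sequence[float], n_points: int = N_POINTS,
                      fs_hz: float = FS_HZ) -> Tuple[List[float], List[float]]:
    """Return (frequencies in Hz, magnitude in dB) on [0, fs/2] from the first n_points samples."""
    c = autocorrelation(list(x[:n_points]))
    freqs, mags = [], []
    for k in range(n_points // 2 + 1):
        omega = 2.0 * math.pi * k / n_points
        # C is symmetric in the lag, so the two-sided sum in (2) reduces to cosines.
        p = (c[0] + 2.0 * sum(c[lag] * math.cos(lag * omega)
                              for lag in range(1, len(c)))) / (2.0 * math.pi)
        freqs.append(k * fs_hz / n_points)
        mags.append(10.0 * math.log10(max(p, 1e-12)))  # floor avoids log of ~0
    return freqs, mags


# Toy input: a 125 Hz tone, which falls exactly on a frequency bin.
tone = [math.sin(2.0 * math.pi * 125.0 * t / FS_HZ) for t in range(N_POINTS)]
freqs, mags = power_spectrum_db(tone)
peak_hz = freqs[mags.index(max(mags))]
```

With 128 points at 1 kHz the frequency resolution is 1000/128 ≈ 7.8 Hz, and the one-sided spectrum covers 0 to 500 Hz, matching the frequency range reported for the spectrum plots.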

3. Results

The COPAR package was developed and open-sourced for academic biologists; it uses built-in functions to determine the optimal peak alignment candidate and to extract genomic features from a ChIP-seq dataset.

The package is designed to handle BED-formatted ChIP-seq data as input [9]. It can process a single ChIP-seq sample for optimal peak alignment and feature extraction analysis, and it can also perform genome-wide statistical comparison for multiple ChIP-seq samples. The analysis flowchart for the package is given in Figure 1.
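To make the input format concrete, the sketch below reads BED lines (chromosome, start, and end in the first three columns) and counts reads per fixed-size bin. It is a hypothetical illustration only: the function name `bin_bed_reads`, the midpoint-based bin assignment, and the toy reads are assumptions, and COPAR's own parsing and binning may differ.

```python
import io
from collections import Counter
from typing import Dict, Iterable, Tuple


def bin_bed_reads(lines: Iterable[str], bin_size: int) -> Dict[Tuple[str, int], int]:
    """Count reads per (chromosome, bin index), assigning each read by its midpoint."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC-style header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        midpoint = (start + end) // 2
        counts[(chrom, midpoint // bin_size)] += 1
    return dict(counts)


# Toy BED content, made up for illustration.
bed_text = """chr1\t100\t136\tread1\t0\t+
chr1\t180\t216\tread2\t0\t-
chr1\t250\t286\tread3\t0\t+
chr2\t40\t76\tread4\t0\t+
"""
counts = bin_bed_reads(io.StringIO(bed_text), bin_size=200)
```

Assigning a read by its midpoint is one simple convention; counting a read in every bin it overlaps would be an equally reasonable alternative.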

COPAR can automatically determine the optimal peak alignment with a statistically meaningful FDR through fast global alignment comparison; the global comparison is subject to two statistical arguments, namely, bin-size and $p$-value threshold.

The functionalities of our package are largely complementary to, and extend, current tools used for ChIP-seq data analysis. The optimal peak alignment estimation is shown in Figures 2(a) and 2(b), and the spectrum-based feature extraction is given in Figures 2(c) and 2(d). Figures 2(a) and 2(b) use heatmaps to represent the peak number and the corresponding FDR candidate for each argument pair of bin-size (vertical axis) and $p$-value threshold (horizontal axis); Figure 2(c) shows the spectrum distribution of the global peak alignment candidate sequence, normalized with its frequency range [0, 500] Hz and magnitude within [−40, −3] dB; Figure 2(d) shows the randomized case.
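The comparison between a candidate sequence and its randomized counterpart can be illustrated with a permutation baseline. The text does not state how the randomized sequence is generated, so shuffling the samples is an assumption here, as are the function names and the toy periodic signal.

```python
import cmath
import math
import random
import statistics
from typing import List, Sequence


def periodogram_db(x: Sequence[float]) -> List[float]:
    """Magnitude in dB relative to the strongest bin, for bins 0..len(x)//2."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        power.append(abs(coeff) ** 2)
    peak = max(power)
    if peak == 0:
        raise ValueError("a constant signal has no spectrum to normalize")
    return [10.0 * math.log10(max(p / peak, 1e-12)) for p in power]


def shuffled_copy(x: Sequence[float], seed: int = 0) -> List[float]:
    """Permute the samples, destroying ordering while keeping the value distribution."""
    values = list(x)
    random.Random(seed).shuffle(values)
    return values


# Toy periodic signal with 8 cycles over 128 samples.
signal = [math.sin(2.0 * math.pi * 8 * t / 128) for t in range(128)]
original_db = periodogram_db(signal)
random_db = periodogram_db(shuffled_copy(signal))

# A structured signal concentrates power in few bins, so its median level is far
# below its peak; the shuffled copy spreads power across all bins.
median_original = statistics.median(original_db)
median_random = statistics.median(random_db)
```

The median level relative to the peak is used here only as a simple summary of how concentrated the spectrum is; any other contrast measure between the two spectra could be substituted.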

4. Conclusions

Based on global peak alignment, COPAR optimizes the argument selection in ChIP-seq analysis; meanwhile, COPAR uses the signal spectrum processing method to further extract genomic features and statistically compare multiple ChIP-seq samples for NGS high-throughput experiments.

In summary, COPAR can process mapped read files in BED format and output statistically sound results for diverse high-throughput sequencing experiments. We further verified the package with three GEO ChIP-seq datasets as case studies and included the analysis results in the package manual. COPAR is currently available under a GNU GPL license from https://github.com/gladex/COPAR.

[Figure 2 panels: (a) Peak number; (b) False discovery rate (%); (c) Spectrum distribution; (d) Spectrum distribution (random). Axes in (a) and (b): $p$-value threshold (0.951 to 0.999) and bin-size (100 to 500 bp); axes in (c) and (d): Frequency (0–500 Hz) and Time (1–111 ms).]

Figure 2: Global optimal peak analysis result subject to the arguments bin-size and FDR. (a) Global distributions for peak number candidates and (b) corresponding false discovery rate, subject to bin-size (vertical axis, from 100 through 500 bp) and $p$-value threshold (horizontal axis, from 0.951 to 0.999), respectively; (c) genomic feature extraction based on spectrum distribution for global peak number candidates identified from COPAR; (d) spectrum distribution for the randomized sequence.

Abbreviations

NGS: Next-generation sequencing
ChIP-seq: Chromatin immunoprecipitation-sequencing
FDR: False discovery rate
TF: Transcription factor
DTFT: Discrete-time Fourier transform.

Competing Interests

The authors declare that they have no competing interests.

Authors’ Contributions

Binhua Tang and Victor X. Jin conceived the method; Binhua Tang and Xihan Wang wrote and compiled the package; Binhua Tang, Xihan Wang, and Victor X. Jin drafted and proof-checked the manuscript.

Acknowledgments

This work has been supported by the Natural Science Foundation of Jiangsu, China (BE2016655 and BK20161196), Fundamental Research Funds for China Central Universities (2016B08914), and Changzhou Science & Technology Program (CE20155050). This work made use of the resources supported by the NSFC-Guangdong Mutual Funds for Super Computing Program (2nd Phase) and the Open Cloud Consortium- (OCC-) sponsored project resource, supported in part by grants from Gordon and Betty Moore Foundation and the National Science Foundation (USA) and major contributions from OCC members.

References

[1] E. R. Mardis, “ChIP-seq: welcome to the new frontier,” Nature Methods, vol. 4, no. 8, pp. 613–614, 2007.

[2] G. J. Martinez and A. Rao, “Cooperative transcription factor complexes in control,” Science, vol. 338, no. 6109, pp. 891–892, 2012.

[3] H. Kilpinen and J. C. Barrett, “How next-generation sequencing is transforming complex disease genetics,” Trends in Genetics, vol. 29, no. 1, pp. 23–30, 2013.

[4] M. D. Chikina and O. G. Troyanskaya, “An effective statistical evaluation of chipseq dataset similarity,” Bioinformatics, vol. 28, no. 5, pp. 607–613, 2012.

[5] T. S. Furey, “ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions,” Nature Reviews Genetics, vol. 13, no. 12, pp. 840–852, 2012.

[6] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, Upper Saddle River, NJ, USA, 3rd edition, 2010.

[7] B. Tang, H.-K. Hsu, P.-Y. Hsu et al., “Hierarchical modularity in ER𝛼 transcriptional network is associated with distinct functions and implicates clinical outcomes,” Scientific Reports, vol. 2, article 875, 2012.

[8] S.-L. Wang, Y.-H. Zhu, W. Jia, and D.-S. Huang, “Robust classification method of tumor subtype by using correlation filters,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 580–591, 2012.

[9] X. Lan, R. Bonneville, J. Apostolos, W. Wu, and V. X. Jin, “W-ChIPeaks: a comprehensive web application tool for processing ChIP-chip and ChIP-seq data,” Bioinformatics, vol. 27, no. 3, pp. 428–430, 2011.
