Peak identification for ChIP-seq data with no controls
-
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is increasingly used for genome-wide profiling of transcriptional regulation, because it enables the dissection of gene regulatory networks. When an input sample is available as a control, a variety of statistical methods have been proposed for identifying enriched regions in the genome, i.e., transcription factor binding sites and chromatin modifications. However, whether peak calling remains reliable when no control is available has not been systematically evaluated. To address this question, we used a Bayesian framework to demonstrate the effectiveness of peak calling without controls (PCWC). Using several different types of ChIP-seq data, we showed that PCWC achieves relatively high accuracy, with a false discovery rate (FDR) below 5%. Compared with previously published methods, e.g., model-based analysis of ChIP-seq (MACS), PCWC is reliable and has a lower FDR. Furthermore, interpreting the biological significance of the called peaks in combination with microarray gene expression data, gene ontology annotation, and subsequent motif discovery indicates that PCWC is highly efficient. Additionally, only a small number of peaks were identified in in silico data, suggesting a markedly low FDR for PCWC.