Volume 33 Issue 6
Nov. 2012

YE Dan-Dan, FAN Meng-Meng, GUAN Qiong, CHEN Hong-Ju, MA Zhan-Shan. A review on the bioinformatics pipelines for metagenomic research. Zoological Research, 2012, 33(6): 574-585. doi: 10.3724/SP.J.1141.2012.06574

A review on the bioinformatics pipelines for metagenomic research

doi: 10.3724/SP.J.1141.2012.06574
  • Received Date: 2012-09-17
  • Rev Recd Date: 2012-11-15
  • Publish Date: 2012-12-08
  • Metagenome, a term coined by Handelsman in 1998 as “the genomes of the total microbiota found in nature”, refers to sequence data sampled directly from the environment, which may be any habitat in which microbes live, such as the guts of humans and animals, milk, soil, lakes, glaciers, and oceans. Metagenomic technologies originated in environmental microbiology, and their wide application has been greatly facilitated by next-generation high-throughput sequencing. As in genomics, the bottleneck of metagenomic research is analyzing the enormous volume of sequence data effectively and efficiently with bioinformatics pipelines to obtain meaningful biological insights. In this article, we briefly review the state-of-the-art bioinformatics software tools for metagenomic research. Because data obtained from whole-genome sequencing (i.e., shotgun metagenomics) differ from data obtained from amplicon sequencing (i.e., 16S rRNA and gene-targeted metagenomics), the corresponding bioinformatics tools differ significantly; accordingly, we review the computational pipelines for the two data types separately.
  • [1] Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. 2000. Gene ontology: tool for the unification of biology[J]. Nat Genet, 25(1): 25-29.
    [2] Batzoglou S, Jaffe DB, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP, Lander ES. 2002. ARACHNE: a whole-genome shotgun assembler[J]. Genome Res, 12(1): 177-189.
    [3] Borodovsky M, McIninch J. 1993. GeneMark: parallel gene recognition for both DNA strands[J]. Computers & Chemistry, 17(2): 123-133.
    [4] Burge C, Karlin S. 1997. Prediction of complete gene structures in human genomic DNA[J]. J Mol Biol, 268(1): 78-94.
    [5] Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R. 2010. QIIME allows analysis of high-throughput community sequencing data[J]. Nat Methods, 7(5): 335–336.
    [6] Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. 2010. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants[J]. Nucleic Acids Res, 38(6): 1767-1771.
    [7] Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM, Tiedje JM. 2009. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis[J]. Nucleic Acids Res, 37: 141–145.
    [8] Conesa A, Götz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research[J]. Bioinformatics, 21: 3674-3676.
    [9] Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. 1999. Improved microbial gene identification with GLIMMER[J]. Nucleic Acids Res, 27: 4636-4641.
    [10] Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA. 2003. DAVID: Database for Annotation, Visualization, and Integrated Discovery[J]. Genome Biol, 4(5): 3.
    [11] DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. 2006. Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB[J]. Appl Environ Microbiol, 72(7): 5069-5072.
    [12] Frigaard NU, Martinez A, Mincer TJ, DeLong EF. 2006. Proteorhodopsin lateral gene transfer between marine planktonic Bacteria and Archaea[J]. Nature, 439(7078): 847–850.
    [13] Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM. 1998. Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products[J]. Chem Biol, 5(10): 245-249.
    [14] He JZ, Zhang LM, Shen JP, Zhu YG. 2008. Advances and perspectives of metagenomics[J]. Acta Scientiae Circumstantiae, 25(2): 231-234. [贺纪正, 张丽梅, 沈菊培, 朱永官. 2008. 宏基因组学(Metagenomics)的研究现状和发展趋势. 环境科学学报, 25(2): 231-234.]
    [15] Huang X, Yang SP. 2005. Generating a genome assembly with PCAP[M] // Current Protocols in Bioinformatics. New York: John Wiley & Sons.
    [16] Miller JR, Koren S, Sutton G. 2010. Assembly algorithms for next-generation sequencing data[J]. Genomics, 95(6): 315-327.
    [17] Kanehisa M, Goto S. 2000. KEGG: kyoto encyclopedia of genes and genomes[J]. Nucleic Acids Res, 28(1): 27-30.
    [18] Korf I, Flicek P, Duan D, Brent MR. 2001. Integrating genomic homology into gene structure prediction[J]. Bioinformatics, 17(1): 140-148.
    [19] Krogh A, Mian IS, Haussler D. 1994. A hidden Markov model that finds genes in E. coli DNA[J]. Nucleic Acids Res, 22(22): 4768-4778.
    [20] Li H, He JJ, Zhang Y, Xu H, Chen GX. 2008a. Application of metagenomic technique in the exploring of uncultured environmental microbial gene resource[J]. Acta Ecol Sin, 28(4): 1762-1762, 1773. [李慧, 何晶晶, 张颖, 徐慧, 陈冠雄. 2008. 宏基因组技术在开发未培养环境微生物基因资源中的应用. 生态学报, 28(4): 1762-1762, 1773.]
    [21] Li R, Li Y, Kristiansen K, Wang J. 2008b. SOAP: short oligonucleotide alignment program[J]. Bioinformatics, 24(5): 713-714.
    [22] Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, Förster W, Brettske I, Gerber S, Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, König A, Liss T, Lüßmann R, May M, Nonhoff B, Reichel R, Strehlow R, Stamatakis A, Stuckmann S, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer KH. 2004. ARB: a software environment for sequence data[J]. Nucleic Acids Res, 32(4): 1363-1371.
    [23] Lukashin AV, Borodovsky M. 1998. GeneMark.hmm: new solutions for gene finding[J]. Nucleic Acids Res, 26(4): 1107-1115.
    [24] Ma ZS. 2012. A note on extending Taylor’s power law for characterizing human microbial communities: Inspiration from comparative studies on the distribution patterns of insects and galaxies, and as a case study for medical ecology. [Online] Available: arXiv.org/abs/1205.3504 (2012/5/15).
    [25] Ma ZS, Geng JW, Abdo Z, Forney LJ. 2012. A Bird’s Eye View of Microbial Community Dynamics // Microbial Ecology Theory: Current Perspectives. Norwich, UK: Horizon Scientific Press: 57-70.
    [26] Maidak BL, Olsen GJ, Larsen N, Overbeek R, McCaughey MJ, Woese CR. 1997. The RDP (Ribosomal Database Project)[J]. Nucleic Acids Res, 25(1): 109–110.
    [27] Melsted P, Pritchard JK. 2011. Efficient counting of K-mers in DNA sequence using a bloom filter[J]. BMC Bioinformatics, 12(1): 333.
    [28] Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA, Anson EL, Bolanos RA, Chou HH, Jordan CM, Halpern AL, Lonardi S, Beasley EM, Brandon RC, Chen L, Dunn PJ, Lai Z, Liang Y, Nusskern DR, Zhan M, Zhang Q, Zheng X, Rubin GM, Adams MD, Venter JC. 2000. A whole-genome assembly of Drosophila[J]. Science, 287(5461): 2196-2204.
    [29] Pevzner PA, Tang HX, Waterman MS. 2001. An Eulerian path approach to DNA fragment assembly[J]. Proc Natl Acad Sci USA, 98(17): 9748-9753.
    [30] Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glockner FO. 2007. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB[J]. Nucleic Acids Res, 35(21): 7188-7196.
    [31] Salamov AA, Solovyev VV. 2000. Ab initio gene finding in Drosophila genomic DNA[J]. Genome Res, 10(4): 516-522.
    [32] Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. 2009. Introducing mothur: open-source, platform independent, community-supported software for describing and comparing microbial communities[J]. Appl Environ Microbiol, 75(23): 7537-7541.
    [33] Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. 2009. ABySS: A parallel assembler for short read sequence data[J]. Genome Res, 19(6): 1117-1123.
    [34] Sun HX, Wang XJ. 2009. The development and future perspectives of DNA sequencing technology[J]. e-Science, 2(3): 19-29. [孙海汐, 王秀杰. 2009. DNA测序技术发展及其展望. e-Science技术, 2(3): 19-29.]
    [35] Sun RY, Li QF, Niu CJ, Lou AR. 2002. Basic Ecology [M]. Beijing: Higher Education Press: 112-144. [孙儒泳, 李庆芬, 牛翠娟, 娄安如. 2002. 基础生态学. 北京: 高等教育出版社: 112-114.]
    [36] Wang X. 2011. The Generation Algorithm Based on de Brujin Graph DNAContig[D]. Harbin: Harbin Institute of Technology. [王旭. 2011. 基于de Brujin图的DNAContig生成算法. 哈尔滨: 哈尔滨工业大学.]
    [37] Warren RL, Sutton GG, Jones SJ, Holt RA. 2007. Assembling millions of short DNA sequences using SSAKE[J]. Bioinformatics, 23(4): 500-501.
    [38] Wu QF. 2003. An introduction of several programs used in genomic analysis[J]. Hereditas, 25(6): 708-712. [吴清发. 2003. 基因组学研究中一些常用软件的概述. 遗传, 25(6): 708-712.]
    [39] Ye CX, Ma ZS, Cannon CH, Pop M, Yu DW. 2011a. SparseAssembler: de novo Assembly with the Sparse de Bruijn Graph. [Online] Available: arXiv.org/abs/1106.2603 (2011/6/14).
    [40] Ye CX, Cannon CH, Ma ZS, Yu DW, Pop M. 2011b. SparseAssembler2: Sparse k-mer Graph for Memory Efficient Genome Assembly. [Online] Available: arXiv.org/abs/1108.3556 (2011/8/17).
    [41] Ye CX, Ma ZS, Cannon CH, Pop M, Yu DW. 2012. Exploiting sparseness in de novo genome assembly[J]. BMC Bioinformatics, 13(S1): S1.
    [42] Ye J, Fang L, Zheng HK, Zhang Y, Chen J, Zhang ZJ, Wang J, Li ST, Li RQ, Bolund L, Wang J. 2006. WEGO: a web tool for plotting GO annotations[J]. Nucleic Acids Res, 34(S2): 293-297.
    [43] Zdobnov EM, Apweiler R. 2001. InterProScan - an integration platform for the signature-recognition methods in InterPro[J]. Bioinformatics, 17(9): 847-848.
    [44] Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs[J]. Genome Res, 18(5): 821-829.
    [45] Zhang H, Cui HZ. 2010. Metagenomics and its research progress[J]. China Animal Husbandry & Veterinary Medicine, 37(3): 87-90. [张辉, 崔焕忠. 2010. 宏基因组学及其研究进展, 中国畜牧兽医, 37(3): 87-90.]
    [46] Zhang EM, Hai R, Yu DZ. 2009. Research progress of gene prediction methods[J]. Chin J Vector Bio & Control, 20(3): 271-273. [张恩民, 海荣, 俞东征. 2009. 基因预测方法的研究进展. 中国媒介生物学及控制杂志, 20(3): 271-273.]
    [47] Zhou DQ. 1993. Laboratory Experiments in Microbiology[M]. Beijing: Higher Education Press, 396-398. [周德庆. 1993. 微生物学教程. 北京: 高等教育出版社, 396-398.]
    [48] Zhu HQ, Hu GQ, Yang YF, Wang J, She ZS. 2007. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes[J]. BMC Bioinformatics, 8(1): 97.

