A review on the bioinformatics pipelines for metagenomic research

YE Dan-Dan; FAN Meng-Meng; GUAN Qiong; CHEN Hong-Ju; MA Zhan-Shan

doi:10.3724/SP.J.1141.2012.06574

Citation: YE Dan-Dan, FAN Meng-Meng, GUAN Qiong, CHEN Hong-Ju, MA Zhan-Shan. 2012. A review on the bioinformatics pipelines for metagenomic research. Zoological Research, 33(6): 574-585. DOI: 10.3724/SP.J.1141.2012.06574

Abstract

Metagenome, a term coined by Handelsman in 1998 as “the genomes of the total microbiota found in nature”, refers to sequence data sampled directly from the environment, which may be any habitat in which microbes live, such as the guts of humans and animals, milk, soil, lakes, glaciers, and oceans. Metagenomic technologies originated in environmental microbiology, and their wide application has been greatly facilitated by next-generation high-throughput sequencing. As in genomics, the bottleneck of metagenomic research is the effective and efficient analysis of enormous volumes of sequence data with bioinformatics pipelines to obtain meaningful biological insights. In this article, we briefly review the state-of-the-art bioinformatics software tools for metagenomic research. Metagenomic data obtained by whole-genome sequencing (i.e., shotgun metagenomics) differ from data obtained by amplicon sequencing (i.e., 16S rRNA and gene-targeted metagenomics), and the corresponding bioinformatics tools differ significantly as well; accordingly, we review the computational pipelines for these two types of data separately.