IMP: Imperial Metagenomics Pipeline for high-throughput sequence data


Author(s): Hoyles, L.; Abbott, J.C.; Holmes, E.; Dumas, M.-E.; Butcher, S.A.; Nicholson, J.K.
Date(s)

2015

Abstract

We have developed an in-house pipeline for the processing and analysis of sequence data generated during Illumina technology-based metagenomic studies of the human gut microbiota. Each component of the pipeline has been selected following comparative analysis of available tools; however, the modular nature of the software facilitates replacement of any individual component with an alternative should a better tool become available in due course. The pipeline consists of quality analysis and trimming followed by taxonomic filtering of sequence data, allowing reads associated with samples to be binned according to whether they represent human, prokaryotic (bacterial/archaeal), viral, parasite, fungal or plant DNA. Viral, parasite, fungal and plant DNA can be assigned to species level on a presence/absence basis, allowing, for example, identification of dietary intake of plant-based foodstuffs and their derivatives. Prokaryotic DNA is subject to taxonomic and functional analyses, with assignment to taxonomic hierarchies (kingdom, class, order, family, genus, species, strain/subspecies) and abundance determination. After de novo assembly of sequence reads, genes within samples are predicted and used to build a non-redundant gene catalogue. From this catalogue, per-sample gene abundance can be determined after normalization of data based on gene length. Functional annotation of genes is achieved by mapping gene clusters against KEGG proteins and with InterProScan. The pipeline is undergoing validation using the human faecal metagenomic data of Qin et al. (2014, Nature 513, 59–64). Outputs from the pipeline allow development of tools for the integration of metagenomic and metabolomic data, moving metagenomic studies beyond determination of gene richness and representation towards microbial-metabolite mapping. There is scope to improve the outputs from viral, parasite, fungal and plant DNA analyses, depending on the depth of sequencing associated with samples.
The pipeline can easily be adapted for the analyses of environmental and non-human animal samples, and for use with data generated via non-Illumina sequencing platforms.
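
The gene-length normalization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula used by IMP, so the approach below (reads per base, rescaled so that each sample sums to 1) is an assumption, and the function name, gene identifiers and counts are hypothetical.

```python
from typing import Dict


def length_normalized_abundance(
    read_counts: Dict[str, int],
    gene_lengths: Dict[str, int],
) -> Dict[str, float]:
    """Return per-sample relative gene abundances corrected for gene length.

    Assumed scheme (not confirmed by the source): divide each gene's mapped
    read count by its length in bases, then rescale so the values sum to 1.
    """
    per_base = {}
    for gene, count in read_counts.items():
        length = gene_lengths[gene]
        if length <= 0:
            raise ValueError(f"gene {gene!r} has non-positive length {length}")
        # Longer genes attract more reads, so compare reads per base instead.
        per_base[gene] = count / length

    total = sum(per_base.values())
    if total == 0:
        # No reads mapped to any gene in this sample.
        return {gene: 0.0 for gene in per_base}
    return {gene: value / total for gene, value in per_base.items()}


# Hypothetical catalogue entries: equal read counts, different gene lengths.
example_counts = {"geneA": 100, "geneB": 100}
example_lengths = {"geneA": 1000, "geneB": 500}
example_abundance = length_normalized_abundance(example_counts, example_lengths)
```

In this hypothetical sample both genes receive the same number of reads, but the shorter gene ends up with twice the relative abundance of the longer one, which is the effect that length normalization is meant to capture.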

Format

application/pdf

Identifier

http://westminsterresearch.wmin.ac.uk/15260/1/Hoyles_Metagenomics_Cambridge.pdf

Hoyles, L., Abbott, J.C., Holmes, E., Dumas, M.-E., Butcher, S.A. and Nicholson, J.K. (2015) IMP: Imperial Metagenomics Pipeline for high-throughput sequence data. In: Exploring Human Host-Microbiome Interactions in Health and Disease, 29 Jun 2015, Cambridge, UK. (Unpublished)

Language(s)

en

Relation

http://westminsterresearch.wmin.ac.uk/15260/

Keywords

Science and Technology

Type

Conference or Workshop Item

NonPeerReviewed