Record Details

Data used for the analysis of CoMeta program

Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)

 
 
Field Value
 
Title Data used for the analysis of CoMeta program
 
Identifier https://doi.org/10.7910/DVN/29265
 
Creator Kawulok, Jolanta
Deorowicz, Sebastian
 
Publisher Harvard Dataverse
 
Description The data used in the experiments reported in the paper doi: 10.1371/journal.pone.0121453. Because the data on the NCBI website are continuously updated, we provide the data that we used in our research.

### Taxonomy linked data: ###
* taxdump.tar.gz - this file was downloaded on August 6, 2012 from ftp://ftp.ncbi.nih.gov/pub/taxonomy. It includes 2 files:
  * nodes.dmp - represents taxonomy nodes
  * names.dmp - includes taxonomy names
* gi_taxid_nucl.dmp.gz - this file was downloaded on December 19, 2012 from ftp://ftp.ncbi.nih.gov/pub/taxonomy. It contains two columns: the GenBank identifier (gi) and the taxonomy identifier (taxid).

### Reference sequences: ###
* nt.xx.tar - xx: from 00 to 12. Compressed archives containing the reference sequences. These files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/db/. The files were last updated on Jul 27, 2012.

### Metagenomic sets: ###
Each read in every metagenomic set is annotated with a taxonomy identifier (taxid), which we added.
* facs_full.fa - contains 100,000 reads with an average length of 269 bp. These reads were acquired from the FACS web site http://facs.scilifelab.se/ and were originally used by Stranneheim et al. [Str].
* facs_reduced.fa - the facs_full set after reduction; it contains 93,653 reads with an average length of 269 bp.
* carma.fa - contains 25,000 reads with an average length of 265 bp. These reads were acquired from the WebCARMA web site http://wwww.cebitec.uni-bielefeld.de/webcarma.cebitec.uni-bielefeld.de/ and were originally used by Gerlach and Stoye [GeSt].
* metaphyler.fa - contains 66,841 reads of 300 bp length. The set was originally used by Liu et al. [Liu], where it contained 73,086 reads, some of which had no information about their origin.
* phylopythia.fa - contains 114,457 unique reads with an average length of 961 bp. The set was originally used by Patil et al. [Pat], where it contained 124,941 reads, some of which were repeated.

The MetaPhyler and PhyloPythia sets are courtesy of Adam Bazinet, who used them in his paper [BaCu].
The HiSeq and MiSeq datasets were downloaded from the https://ccb.jhu.edu/software/kraken website. These data were originally used by Wood and Salzberg [WoSa].

### References ###
[GeSt] Gerlach W, Stoye J (2011) Taxonomic classification of metagenomic shotgun sequences with CARMA3. Nucleic Acids Research 39.
[Str] Stranneheim H, Käller M, Allander T, Andersson B, Arvestad L, Lundeberg J (2010) Classification of DNA sequences using Bloom filters. Bioinformatics 26: 1595-1600.
[Liu] Liu B, Gibbons T, Ghodsi M, Pop M (2010) MetaPhyler: Taxonomic profiling for metagenomic sequences. In: Proceedings of the 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010. pp. 95-100.
[Pat] Patil KR, Haider P, Pope PB, Turnbaugh PJ, Morrison M, Scheffer T, McHardy AC (2011) Taxonomic metagenome sequence assignment with structured output models. Nature Methods 8: 191-192.
[BaCu] Bazinet A, Cummings M (2012) A comparative evaluation of sequence classification programs. BMC Bioinformatics 13: 1-13.
[WoSa] Wood DE, Salzberg SL (2014) Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 15: R46.
 
Subject CoMeta
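
Usage sketch for the taxonomy files listed in the Description. This is a minimal Python example, not part of the deposited data or of the CoMeta program. It assumes the usual NCBI dump layout, which the record itself does not spell out: gi_taxid_nucl.dmp holds two tab-separated integer columns (gi, taxid), and nodes.dmp separates fields with "|", starting with the tax id, the parent tax id, and the rank. The function names (read_gi_taxid, read_parents, lineage) are hypothetical, and the sample values are invented for illustration.

```python
import io


def read_gi_taxid(stream):
    """Build a gi -> taxid dict from lines holding two integer columns."""
    mapping = {}
    for line in stream:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        mapping[int(parts[0])] = int(parts[1])
    return mapping


def read_parents(stream):
    """Build a taxid -> (parent taxid, rank) dict from nodes.dmp-style lines."""
    parents = {}
    for line in stream:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        parents[int(fields[0])] = (int(fields[1]), fields[2])
    return parents


def lineage(taxid, parents):
    """Return the list of tax ids from taxid up to the root (its own parent)."""
    path = [taxid]
    seen = {taxid}
    while taxid in parents:
        parent = parents[taxid][0]
        if parent in seen:  # root points to itself; also guards against cycles
            break
        path.append(parent)
        seen.add(parent)
        taxid = parent
    return path


# Invented sample data. For the real files one would open them instead, e.g.
# gzip.open("gi_taxid_nucl.dmp.gz", "rt") and open("nodes.dmp").
sample_gi = io.StringIO("100\t20\n101\t10\n")
sample_nodes = io.StringIO(
    "1\t|\t1\t|\tno rank\t|\n"
    "10\t|\t1\t|\tgenus\t|\n"
    "20\t|\t10\t|\tspecies\t|\n"
)

gi_to_taxid = read_gi_taxid(sample_gi)
parents = read_parents(sample_nodes)
print(lineage(gi_to_taxid[100], parents))
```

The full gi_taxid_nucl.dmp file is very large, so a plain in-memory dict like this one may be impractical for it. Filtering the stream down to the gi values actually needed is one possible workaround.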