Data used for the analysis of CoMeta program
Harvard Dataverse (Africa Rice Center, Bioversity International, CCAFS, CIAT, IFPRI, IRRI and WorldFish)
Field | Value | |
Title |
Data used for the analysis of CoMeta program
|
|
Identifier |
https://doi.org/10.7910/DVN/29265
|
|
Creator |
Kawulok, Jolanta
Deorowicz, Sebastian |
|
Publisher |
Harvard Dataverse
|
|
Description |
The data used in the experiments reported in the paper doi: 10.1371/journal.pone.0121453. Because the data on the NCBI website are continuously updated, we provide the data that we used in our research.
### Taxonomy linked data: ###
* taxdump.tar.gz - this file was downloaded on August 6, 2012 from ftp://ftp.ncbi.nih.gov/pub/taxonomy. It includes 2 files:
  * nodes.dmp - represents taxonomy nodes
  * names.dmp - includes taxonomy names
* gi_taxid_nucl.dmp.gz - this file was downloaded on December 19, 2012 from ftp://ftp.ncbi.nih.gov/pub/taxonomy. It contains two columns: the GenBank identifier (gi) and the taxonomy identifier (taxid).
### Reference sequences: ###
nt.xx.tar - xx: from 00 to 12. Compressed files containing the reference sequences. These files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/db/. The update date was Jul 27, 2012.
### Metagenomic sets: ###
We added the taxonomy identifier (taxid) to each metagenomic set.
* facs_full.fa - contains 100,000 reads with an average length of 269 bp. These reads were acquired from the FACS web site http://facs.scilifelab.se/ and were originally used by Stranneheim et al. [Str].
* facs_reduced.fa - the facs_full set after reduction; it contains 93,653 reads with an average length of 269 bp.
* carma.fa - contains 25,000 reads with an average length of 265 bp. These reads were acquired from the WebCARMA web site http://wwww.cebitec.uni-bielefeld.de/webcarma.cebitec.uni-bielefeld.de/ and were originally used by Gerlach and Stoye [GeSt].
* metaphyler.fa - contains 66,841 reads of 300 bp length. This set was originally used by Liu et al. [Liu] and contained 73,086 reads, some of which had no information about their origin.
* phylopythia.fa - contains 114,457 unique reads with an average length of 961 bp. This set was originally used by Patil et al. [Pat] and contained 124,941 reads, some of which were repeated.
The MetaPhyler and PhyloPythia sets are provided courtesy of Adam Bazinet, who used them in his paper [BaCu].
The HiSeq and MiSeq datasets were downloaded from the https://ccb.jhu.edu/software/kraken website. These data were originally used by Wood and Salzberg [WoSa].
### References ###
[GeSt] Gerlach W, Stoye J (2011) Taxonomic classification of metagenomic shotgun sequences with CARMA3. Nucleic Acids Research 39.
[Str] Stranneheim H, Käller M, Allander T, Andersson B, Arvestad L, Lundeberg J (2010) Classification of DNA sequences using Bloom filters. Bioinformatics 26: 1595-1600.
[Liu] Liu B, Gibbons T, Ghodsi M, Pop M (2010) MetaPhyler: Taxonomic profiling for metagenomic sequences. In: Proceedings of the 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010. pp. 95-100.
[Pat] Patil KR, Haider P, Pope PB, Turnbaugh PJ, Morrison M, Scheffer T, McHardy AC (2011) Taxonomic metagenome sequence assignment with structured output models. Nature Methods 8: 191-192.
[BaCu] Bazinet A, Cummings M (2012) A comparative evaluation of sequence classification programs. BMC Bioinformatics 13: 1-13.
[WoSa] Wood DE, Salzberg SL (2014) Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 15: R46. |
|
Subject |
CoMeta
|
|
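The description states that gi_taxid_nucl.dmp.gz holds two columns, the GenBank identifier (gi) and the taxonomy identifier (taxid). A minimal Python sketch of reading such a table is given below. It assumes the columns are separated by whitespace (for example a tab) and that both are integers; the function names `parse_gi_taxid` and `load_gi_taxid` are hypothetical helpers introduced here for illustration and are not part of CoMeta. The sample rows are invented, not taken from the real file.

```python
import gzip
import io


def parse_gi_taxid(lines):
    """Map each GenBank identifier (gi) to its taxonomy identifier (taxid).

    Assumption: every data line holds exactly two whitespace-separated
    integers, gi first and taxid second.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gi, taxid = fields
        mapping[int(gi)] = int(taxid)
    return mapping


def load_gi_taxid(path):
    """Read a gzip-compressed gi/taxid table such as gi_taxid_nucl.dmp.gz."""
    with gzip.open(path, "rt") as handle:
        return parse_gi_taxid(handle)


# Invented rows for illustration only.
sample = io.StringIO("101\t9606\n102\t562\n\n")
gi_to_taxid = parse_gi_taxid(sample)
```

The real file may be large, so an in-memory dictionary is only one option; streaming the lines and keeping just the gi values that occur in the reference sequences would be an alternative under the same format assumption.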