TY - JOUR
T1 - Compression-based distance (CBD)
T2 - A simple, rapid, and accurate method for microbiota composition comparison
AU - Yang, Fang
AU - Chia, Nicholas
AU - White, Bryan A.
AU - Schook, Lawrence B.
N1 - Funding Information:
This research was supported by grants AG2008-34480-19328 and 454538AG58-5438-7-317l (LBS) from United States Department of Agriculture and Agricultural Research Service and by the Institute for Genomic Biology at the University of Illinois at Urbana-Champaign (NC). We are also indebted to Han Jiang for his technical support and discussion contributions; to Maksim Sipos for his technical support contributions; and to Dr. Saurabh Sinha, Shuyi Ma and Matthew A. Richards for their critical editing contributions.
PY - 2013/4/23
Y1 - 2013/4/23
N2 - Background: Perturbations in intestinal microbiota composition have been associated with a variety of gastrointestinal tract-related diseases. The alleviation of symptoms has been achieved using treatments that alter the gastrointestinal tract microbiota toward that of healthy individuals. Identifying differences in microbiota composition through the use of 16S rRNA gene hypervariable tag sequencing has profound health implications. Current computational methods for comparing microbial communities are usually based on multiple alignments and phylogenetic inference, making them time consuming and requiring exceptional expertise and computational resources. As sequencing data rapidly grows in size, simpler analysis methods are needed to meet the growing computational burdens of microbiota comparisons. Thus, we have developed a simple, rapid, and accurate method, independent of multiple alignments and phylogenetic inference, to support microbiota comparisons. Results: We create a metric, called compression-based distance (CBD) for quantifying the degree of similarity between microbial communities. CBD uses the repetitive nature of hypervariable tag datasets and well-established compression algorithms to approximate the total information shared between two datasets. Three published microbiota datasets were used as test cases for CBD as an applicable tool. Our study revealed that CBD recaptured 100% of the statistically significant conclusions reported in the previous studies, while achieving a decrease in computational time required when compared to similar tools without expert user intervention. Conclusion: CBD provides a simple, rapid, and accurate method for assessing distances between gastrointestinal tract microbiota 16S hypervariable tag datasets.
AB - Background: Perturbations in intestinal microbiota composition have been associated with a variety of gastrointestinal tract-related diseases. The alleviation of symptoms has been achieved using treatments that alter the gastrointestinal tract microbiota toward that of healthy individuals. Identifying differences in microbiota composition through the use of 16S rRNA gene hypervariable tag sequencing has profound health implications. Current computational methods for comparing microbial communities are usually based on multiple alignments and phylogenetic inference, making them time consuming and requiring exceptional expertise and computational resources. As sequencing data rapidly grows in size, simpler analysis methods are needed to meet the growing computational burdens of microbiota comparisons. Thus, we have developed a simple, rapid, and accurate method, independent of multiple alignments and phylogenetic inference, to support microbiota comparisons. Results: We create a metric, called compression-based distance (CBD) for quantifying the degree of similarity between microbial communities. CBD uses the repetitive nature of hypervariable tag datasets and well-established compression algorithms to approximate the total information shared between two datasets. Three published microbiota datasets were used as test cases for CBD as an applicable tool. Our study revealed that CBD recaptured 100% of the statistically significant conclusions reported in the previous studies, while achieving a decrease in computational time required when compared to similar tools without expert user intervention. Conclusion: CBD provides a simple, rapid, and accurate method for assessing distances between gastrointestinal tract microbiota 16S hypervariable tag datasets.
KW - Compression-based distance
KW - Microbiome analysis
KW - Microbiota comparison
UR - http://www.scopus.com/inward/record.url?scp=84876733400&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876733400&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-14-136
DO - 10.1186/1471-2105-14-136
M3 - Article
C2 - 23617892
AN - SCOPUS:84876733400
SN - 1471-2105
VL - 14
JO - BMC Bioinformatics
JF - BMC Bioinformatics
M1 - 136
ER -