Course Reading List
Microbiome tools overview
Tools for Analysis of the Microbiome
PMID: 32002757 DOI: 10.1007/s10620-020-06091-y
Abstract
Over the past decade, it has become exceedingly clear that the microbiome is a critical factor in human health and disease and thus should be investigated to develop innovative treatment strategies. The field of metagenomics has come a long way in leveraging the advances of next-generation sequencing technologies resulting in the capability to identify and quantify all microorganisms present in human specimens. However, the field of metagenomics is still in its infancy, specifically in regard to the limitations in computational analysis, statistical assessments, standardization, and validation due to vast variability in the cohorts themselves, experimental design, and bioinformatic workflows. This review summarizes the methods, technologies, computational tools, and model systems for characterizing and studying the microbiome. We also discuss important considerations investigators must make when interrogating the involvement of the microbiome in health and disease in order to establish robust results and mechanistic insights before moving into therapeutic design and intervention.
Best practices for analysing microbiomes
PMID: 29795328 DOI: 10.1038/s41579-018-0029-9
Abstract
Complex microbial communities shape the dynamics of various environments, ranging from the mammalian gastrointestinal tract to the soil. Advances in DNA sequencing technologies and data analysis have provided drastic improvements in microbiome analyses, for example, in taxonomic resolution, false discovery rate control and other properties, over earlier methods. In this Review, we discuss the best practices for performing a microbiome study, including experimental design, choice of molecular analysis technology, methods for data analysis and the integration of multiple omics data sets. We focus on recent findings that suggest that operational taxonomic unit-based analyses should be replaced with new methods that are based on exact sequence variants, methods for integrating metagenomic and metabolomic data, and issues surrounding compositional data analysis, where advances have been particularly rapid. We note that although some of these approaches are new, it is important to keep sight of the classic issues that arise during experimental design and relate to research reproducibility. We describe how keeping these issues in mind allows researchers to obtain more insight from their microbiome data sets.
The Madness of Microbiome: Attempting To Find Consensus "Best Practice" for 16S Microbiome Studies
PMID: 29427429 DOI: 10.1128/aem.02627-17
Abstract
The development and continuous improvement of high-throughput sequencing platforms have stimulated interest in the study of complex microbial communities. Currently, the most popular sequencing approach to study microbial community composition and dynamics is targeted 16S rRNA gene metabarcoding. To prepare samples for sequencing, there are a variety of processing steps, each with the potential to introduce bias at the data analysis stage. In this short review, key information from the literature pertaining to each processing step is described, and consequently, general recommendations for future 16S rRNA gene metabarcoding experiments are made.
Microbiome sequencing methods
Unravelling the enigma of the human microbiome: Evolution and selection of sequencing technologies
PMID: 37929823 DOI: 10.1111/1751-7915.14364
Abstract
The human microbiome plays a crucial role in maintaining health, with advances in high-throughput sequencing technology and reduced sequencing costs triggering a surge in microbiome research. Microbiome studies generally incorporate five key phases: design, sampling, sequencing, analysis, and reporting, with sequencing strategy being a crucial step offering numerous options. Present mainstream sequencing strategies include Amplicon sequencing, Metagenomic Next-Generation Sequencing (mNGS), and Targeted Next-Generation Sequencing (tNGS). Two innovative technologies recently emerged, namely MobiMicrobe high-throughput microbial single-cell genome sequencing technology and 2bRAD-M simplified metagenomic sequencing technology, compensate for the limitations of mainstream technologies, each boasting unique core strengths. This paper reviews the basic principles and processes of these three mainstream and two novel microbiological technologies, aiding readers in understanding the benefits and drawbacks of different technologies, thereby guiding the selection of the most suitable method for their research endeavours.
Current challenges and best-practice protocols for microbiome analysis
PMID: 31848574 DOI: 10.1093/bib/bbz155
Abstract
Analyzing the microbiome of diverse species and environments using next-generation sequencing techniques has significantly enhanced our understanding on metabolic, physiological and ecological roles of environmental microorganisms. However, the analysis of the microbiome is affected by experimental conditions (e.g. sequencing errors and genomic repeats) and computationally intensive and cumbersome downstream analysis (e.g. quality control, assembly, binning and statistical analyses). Moreover, the introduction of new sequencing technologies and protocols led to a flood of new methodologies, which also have an immediate effect on the results of the analyses. The aim of this work is to review the most important workflows for 16S rRNA sequencing and shotgun and long-read metagenomics, as well as to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization. To simplify and standardize the computational analysis, we provide a set of best-practice workflows for 16S rRNA and metagenomic sequencing data (available at https://github.com/grimmlab/MicrobiomeBestPracticeReview).
Data management, processing, QC
Data pre-processing for analyzing microbiome data - A mini review
PMID: 37841330 DOI: 10.1016/j.csbj.2023.10.001
Abstract
The human microbiome is an emerging research frontier due to its profound impacts on health. High-throughput microbiome sequencing enables studying microbial communities but suffers from analytical challenges. In particular, the lack of dedicated preprocessing methods to improve data quality impedes effective minimization of biases prior to downstream analysis. This review aims to address this gap by providing a comprehensive overview of preprocessing techniques relevant to microbiome research. We outline a typical workflow for microbiome data analysis. Preprocessing methods discussed include quality filtering, batch effect correction, imputation of missing values, normalization, and data transformation. We highlight strengths and limitations of each technique to serve as a practical guide for researchers and identify areas needing further methodological development. Establishing robust, standardized preprocessing will be essential for drawing valid biological conclusions from microbiome studies.
Measuring the microbiome: Best practices for developing and benchmarking microbiomics methods
PMID: 33363701 DOI: 10.1016/j.csbj.2020.11.049
Abstract
Microbiomes are integral components of diverse ecosystems, and increasingly recognized for their roles in the health of humans, animals, plants, and other hosts. Given their complexity (both in composition and function), the effective study of microbiomes (microbiomics) relies on the development, optimization, and validation of computational methods for analyzing microbial datasets, such as from marker-gene (e.g., 16S rRNA gene) and metagenome data. This review describes best practices for benchmarking and implementing computational methods (and software) for studying microbiomes, with particular focus on unique characteristics of microbiomes and microbiomics data that should be taken into account when designing and testing microbiomics methods.
Microbiome data analysis
Analysis of Microbiome Data
PMID: 38962089 DOI: 10.1146/annurev-statistics-040522-120734
Abstract
The microbiome represents a hidden world of tiny organisms populating not only our surroundings but also our own bodies. By enabling comprehensive profiling of these invisible creatures, modern genomic sequencing tools have given us an unprecedented ability to characterize these populations and uncover their outsize impact on our environment and health. Statistical analysis of microbiome data is critical to infer patterns from the observed abundances. The application and development of analytical methods in this area require careful consideration of the unique aspects of microbiome profiles. We begin this review with a brief overview of microbiome data collection and processing and describe the resulting data structure. We then provide an overview of statistical methods for key tasks in microbiome data analysis, including data visualization, comparison of microbial abundance across groups, regression modeling, and network inference. We conclude with a discussion and highlight interesting future directions.
Supplementary Reading List
Next-generation sequencing technology
Next-generation DNA sequencing technology
PMID: 18846087 DOI: 10.1038/nbt1486
Abstract
DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection. Over the past three years, massively parallel DNA sequencing platforms have become widely available, reducing the cost of DNA sequencing by over two orders of magnitude, and democratizing the field by putting the sequencing capacity of a major genome center in the hands of individual investigators. These new technologies are rapidly evolving, and near-term challenges include the development of robust protocols for generating sequencing libraries, building effective new approaches to data-analysis, and often a rethinking of experimental design. Next-generation DNA sequencing has the potential to dramatically accelerate biological and biomedical research, by enabling the comprehensive analysis of genomes, transcriptomes and interactomes to become inexpensive, routine and widespread, rather than requiring significant production-scale efforts.
Sequencing technologies - the next generation
PMID: 19997069 DOI: 10.1038/nrg2626
Abstract
Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.
Coming of age: ten years of next-generation sequencing technologies
PMID: 27184599 DOI: 10.1038/nrg.2016.49
Abstract
Since the completion of the human genome project in 2003, extraordinary progress has been made in genome sequencing technologies, which has led to a decreased cost per megabase and an increase in the number and diversity of sequenced genomes. An astonishing complexity of genome architecture has been revealed, bringing these sequencing technologies to even greater advancements. Some approaches maximize the number of bases sequenced in the least amount of time, generating a wealth of data that can be used to understand increasingly complex phenotypes. Alternatively, other approaches now aim to sequence longer contiguous pieces of DNA, which are essential for resolving structurally complex regions. These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.