Data Overview

Background

The 16S ribosomal RNA (rRNA) gene is one of the most commonly used marker genes for assigning phylogenetic affiliation to prokaryotic organisms and has enabled compositional analysis of the bacterial component of the microbiome. The gene contains several hypervariable regions that allow differentiation of closely related organisms. The region of choice (e.g. V4, V1-V3) is PCR amplified, and the resulting libraries are sequenced to the desired depth using a paired-end protocol.

Quality-controlled sequences are then partitioned into meaningful units (e.g. OTUs or ASVs) prior to taxonomic assignment. Together with the number of sequence reads per OTU/ASV, this information provides an overview of community composition as well as an estimate of the relative abundance of individual taxa within that community.
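As an illustration of how read counts per OTU/ASV translate into relative abundance, the following is a minimal Python sketch. The function name, the nested-dictionary layout, and the toy counts are assumptions made for this example only; they are not part of the delivered pipeline or data.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Convert read counts into per-sample proportions.

    counts: dict mapping feature ID -> {sample ID: read count}
    (a hypothetical layout chosen for this sketch).
    Returns the same shape, with each sample's values summing to 1.0.
    """
    # Total reads per sample across all features.
    totals = defaultdict(int)
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            totals[sample] += n

    # Divide each count by its sample total; guard against empty samples.
    return {
        feature: {
            sample: (n / totals[sample] if totals[sample] else 0.0)
            for sample, n in per_sample.items()
        }
        for feature, per_sample in counts.items()
    }


# Hypothetical toy counts, not taken from any real project.
toy_counts = {
    "Feature1": {"SampleA": 60, "SampleB": 10},
    "Feature2": {"SampleA": 40, "SampleB": 30},
}
toy_rel = relative_abundance(toy_counts)
```

Because each sample is normalized by its own total, samples sequenced to different depths become comparable as proportions.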

You have chosen the region and for the analysis of your 16S rRNA amplicon sequencing project. Please find a full description of the methods used in the downloadable Methods documents provided.

Notes

Sequences

Raw data files in binary base call (BCL) format are converted into FASTQ files and demultiplexed based on the single-index barcodes using the Illumina bcl2fastq software. The files provided below are the demultiplexed FASTQ reads and a Read_QC file reporting the number of raw, mapped, and unmapped reads.

RawSequences.zip
Read_QC.txt

After downloading RawSequences.zip, you can verify the file's integrity using its built-in CRC checksums. On Mac or Linux, run the command unzip -t RawSequences.zip. On Windows, install 7-Zip, then right-click RawSequences.zip and select "7-Zip" -> "Test archive".
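If you prefer a cross-platform check, the same CRC test can be sketched with Python's standard zipfile module. The function name below is hypothetical, and the script assumes RawSequences.zip sits in the current working directory; adjust the path as needed.

```python
import os
import zipfile


def first_corrupt_member(path):
    """Return the name of the first archive member that fails its CRC check,
    or None if every member passes."""
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and compares it to its stored CRC.
        return archive.testzip()


# Assumes the archive was downloaded to the current directory.
if os.path.exists("RawSequences.zip"):
    bad = first_corrupt_member("RawSequences.zip")
    print("Archive OK" if bad is None else f"Corrupt member: {bad}")
```

A return value of None corresponds to a clean result from unzip -t or the 7-Zip archive test.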

Atima2 Customizable Plots

Atima2 is a web application combining publicly available R packages with purpose-written code to import sample data and identify trends in taxa abundance, alpha-diversity, and beta-diversity as they relate to sample metadata. The FeatureTable.biom file has been pre-loaded into Atima2, available at the link below. The Atima2 web interface allows you to explore correlations between taxonomic or functional profiles and sample metadata via highly customizable plots and corresponding statistical analyses.

Standard Outputs

Demultiplexed read pairs are first quality trimmed, and Illumina adapters and PhiX reads are removed using bbduk; the reads are then merged using bbmap. Reads are partitioned into meaningful taxonomic units (Features) based on the chosen method, and read counts per sample are provided in FeatureTable.tsv. To determine taxonomy assignment, representative sequences (CentroidInformation.fa) are mapped against an optimized version of the SILVA Database version containing only sequences from the chosen region of the 16S rRNA gene. A phylogenetic tree (FeatureTable.tre) of the representative sequences is generated by creating a multiple sequence alignment with MAFFT and inferring the tree with FastTree. A custom program developed in-house generates and summarizes the final FeatureTable.biom file (including abundances, taxonomy, phylogeny and representative sequences), recording read statistics and taxonomy information.

Summary_table.xlsx
FeatureTable.tsv
FeatureTable.biom (includes phylogeny and representative sequences)
CentroidInformation.fa
FeatureTable.tre
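For readers who want to inspect the representative sequences programmatically, here is a minimal sketch of a FASTA reader in Python. It assumes CentroidInformation.fa follows the usual FASTA convention (a header line starting with ">" followed by one or more sequence lines); the exact header layout of the delivered file is not specified here, so the function name and the toy records below are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Works on an open file handle or any list of strings.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical toy records, not real centroid sequences.
toy_fasta = [">Feature1", "ACGT", "ACG", ">Feature2", "TTGA"]
toy_lengths = {name: len(seq) for name, seq in read_fasta(toy_fasta)}
```

To read the delivered file, the same function could be called as read_fasta(open("CentroidInformation.fa")), assuming the file is in the working directory.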

The final datasets are available as XLSX and BIOM files. XLSX files open in Microsoft Excel and present the data in a human-readable manner. BIOM is a standards-compliant format for importing your data into other bioinformatics software (e.g. ATIMA).