Glossary

Terms are defined as they are used in VTAM. They might not hold or be sufficiently precise in another context outside VTAM.

ASV or variant

Amplicon Sequence Variant: Unique amplicon sequence (Callahan et al., 2017). Identical reads are pooled into a variant. Variants are characterized by the number of reads in each replicate of each sample, which we also call “sample-replicate”.

ASV table

Representation of presence of each of the variants in each sample; Variants are in lines, samples are in columns, read numbers or presence absence are in cells.

BLAST hit

A sequence from the BLAST database that has significant similarity to the query sequence (variant).

Chimera borderline

When the chimera formation happens near the extremity of the parental sequences, the resulting chimera is very similar to one of the parental sequences. These chimeras are difficult to tell apart from real variants.

Coverage

[in BLAST] the percentage of the length of the query sequence that is covered by the BLAST alignment.

Demultiplexing

Sorting reads to sample-replicates according to the presence of primers and tags at their extremities.

Dereplication

Identical reads are pooled into a variant, and the read count is kept as an information.

Flag

If variants or occurrences are flagged, they remain in the dataset after the corresponding filtering step, but they will be marked (flagged) in the ASV table.

Locus/Gene

genomic region (COI, LSU, MatK, RBCL…)

Lowest Taxonomic Group (LTG)

The taxonomic group of the highest resolution (species is high resolution, phylum is low), that contains all or a given % of the sequences.

Marker

A region amplified by one primer pair.

Merge

Assemble each forward and reverse reads (read pair) to a single sequence.

Mock sample

Sample with known DNA composition.

Modulo

Remainder after a division of one number by another. (e.g. the modulo 3 of 4 is 1)

Occurrence

Presence of a variant in a sample or sample-replicate

Unexpected or ‘delete’ occurrence

A variant in a sample that is known to be erroneous. It can be a variant in a negative control, an unexpected variant in a mock sample, or variant identified from a clearly different habitat than that of the sample.

Expected or ‘keep’ occurrence

A variant that should be present in the given sample after filtering. These are expected variants in the mock samples.

OLSP

One-Locus-Several-Primers Strategy of using more than one primer pairs that amplify the same locus (slight variation in the position of the annealing sites) in order to increase the taxonomic coverage (Corse et al., 2019).

Renkonen distance

1 - sum(p1i, p2i), where p1i is the frequency of variant i in sample-replicate1 (Renkonen, 1938)

Replicate

or Replicate series : Pool of a single replicate from each sample of a run.

Run

A pool of samples and the associated positive (mock) and negative controls. Ideally, they are obtained in the same sequencing run.

Sample

DNA extraction from a given environment/individual.

Sample-Replicate

Technical replicate of the same sample. e.g different PCR on the same DNA extraction.

Tag-jump

Generation of artefactual sequences in which amplicons carry different tags than originally applied (Schnell et al., 2015)

Tag

Short DNA sequences present at one or both extremities of the amplified DNA fragment. A tag or the combination of forward and reverse tags determine the sample-replicate where the read comes from.

Trimming

Removing part of the extremities of a sequence (e.g. trim the tags/adapters/primers from a read to obtain the biological sequence)

TSV

A text file format with tab-separated values.