Input/Output Files

Usage help and the complete list of I/O arguments for each command are available from the command line:

vtam COMMAND --help
e.g.
vtam filter --help

The contents of the I/O files are detailed below.

params

Input of most commands. YML file with numerical parameters, written as a simple text file in “parameter name: parameter value” format, one parameter per line. Can be omitted if all parameters keep their default values. For example:

lfn_variant_cutoff: 0.001
lfn_sample_replicate_cutoff: 0.003
lfn_read_count_cutoff: 70
pcr_error_var_prop: 0.05
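To illustrate the “parameter name: parameter value” format, here is a minimal sketch that parses such a file with the Python standard library only. The function name `read_params` is hypothetical and not part of VTAM; a YAML library would be the more general choice for real files.

```python
def read_params(text):
    """Parse 'name: value' lines into a dict, converting values to int or float."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue  # skip blank lines
        name, value = line.split(":", 1)
        value = value.strip()
        try:
            params[name.strip()] = int(value)
        except ValueError:
            params[name.strip()] = float(value)
    return params


example = """\
lfn_variant_cutoff: 0.001
lfn_sample_replicate_cutoff: 0.003
lfn_read_count_cutoff: 70
pcr_error_var_prop: 0.05
"""
params = read_params(example)
print(params)
```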

fastqinfo

Input of merge. TSV file with the following columns:

TagFwd: Sequence of the tag on the forward primer (5’=>3’)

PrimerFwd: Sequence of the forward primer (5’=>3’)

TagRev: Sequence of the tag on the reverse primer (5’=>3’)

PrimerRev: Sequence of the reverse primer (5’=>3’)

Marker: Name of the marker (e.g. MFZR)

Sample: Name of the sample

Replicate: ID of the replicate

Run: Name of the sequencing run

FastqFwd: Name of the forward fastq file

FastqRev: Name of the reverse fastq file
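As a sketch of what such a file looks like, the following writes a one-row fastqinfo table with the columns listed above. All values (tags, primers, sample and file names) are placeholders invented for illustration, not real sequences or files.

```python
import csv
import io

FASTQINFO_COLUMNS = [
    "TagFwd", "PrimerFwd", "TagRev", "PrimerRev", "Marker",
    "Sample", "Replicate", "Run", "FastqFwd", "FastqRev",
]

# Placeholder values for illustration only (not real tags, primers or files).
rows = [
    {
        "TagFwd": "acgtac", "PrimerFwd": "ACGTACGTACGT",
        "TagRev": "tgcatg", "PrimerRev": "TGCATGCATGCA",
        "Marker": "MFZR", "Sample": "sample1", "Replicate": "1",
        "Run": "run1", "FastqFwd": "sample1_R1.fastq",
        "FastqRev": "sample1_R2.fastq",
    },
]

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# use open("fastqinfo.tsv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=FASTQINFO_COLUMNS, delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
fastqinfo_text = buffer.getvalue()
print(fastqinfo_text)
```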

fastainfo

Output of merge, input of sortreads. TSV file with the following columns:

run: Name of the sequencing run

marker: Name of the marker (e.g. MFZR)

sample: Name of the sample

replicate: ID of the replicate

tagfwd: Sequence of the tag on the forward primer (5’=>3’)

primerfwd: Sequence of the forward primer (5’=>3’)

tagrev: Sequence of the tag on the reverse primer (5’=>3’)

primerrev: Sequence of the reverse primer (5’=>3’)

mergedfasta: Name of the fasta file with merged sequences

sortedinfo

Output of sortreads, input of filter and optimize. TSV file with the following columns:

run: Name of the sequencing run

marker: Name of the marker (e.g. MFZR)

sample: Name of the sample

replicate: ID of the replicate

sortedfasta: Name of the fasta file containing merged, demultiplexed, trimmed sequences

db

Input and output of filter and taxassign; input of optimize and pool. SQLite database containing variants, samples, replicates, read counts, information on filtering steps, and taxonomic assignments.
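Because the database is a regular SQLite file, it can be inspected with any SQLite client. The sketch below lists table names through the standard `sqlite_master` catalogue. The tables created here are hypothetical stand-ins so that the example runs on its own; they do not reflect the actual VTAM schema.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an SQLite database, sorted by name."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cursor]


# Stand-in in-memory database with hypothetical tables.
# For a real run, connect to the database file instead of ":memory:".
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE demo_variant (id INTEGER PRIMARY KEY, sequence TEXT)")
connection.execute("CREATE TABLE demo_sample (id INTEGER PRIMARY KEY, name TEXT)")
tables = list_tables(connection)
print(tables)
connection.close()
```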

asvtable

Output of filter or pool, input of taxassign. TSV file with one row per variant that passed all filtering steps and one column per sample. Cells contain presence-absence (output of pool) or read counts (output of filter). The columns are:

run: Name of the sequencing run

marker: Name of the marker (e.g. MFZR)

variant: Variant ID

pooled_variants (only in output of pool): IDs of variants pooled since identical in their overlapping regions

sequence_length: Length of the variant

read_count: Total number of reads of the variant in the samples listed in the table

[one column per sample]: presence-absence (output of pool) or read counts (output of filter)

clusterid: ID of the centroid of the cluster (0.97 clustering of all variants of the ASV table)

clustersize: Number of variants in the cluster

chimera_borderline (only in output of filter): Potential chimeras (very similar to one of the parental sequences)

[keep_mockXX; One column per mock sample, if known_occurrences option is used]: 1 if variant is expected in the mock sample, 0 otherwise

pooled_sequences (only in output of pool): Sequences of pooled_variants

sequence: Sequence of the variant
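Since sample columns are not named in advance, one way to find them is to exclude the fixed columns listed above. The sketch below does this on a made-up table and converts read counts to presence-absence. Treating every column that starts with `keep_` as a mock column is an assumption based on the `keep_mockXX` naming, and the cell values are invented.

```python
import csv
import io

# Non-sample columns named in the list above.
FIXED_COLUMNS = {
    "run", "marker", "variant", "pooled_variants", "sequence_length",
    "read_count", "clusterid", "clustersize", "chimera_borderline",
    "pooled_sequences", "sequence",
}


def sample_columns(header):
    """Columns that are neither fixed columns nor keep_* columns (assumed prefix)."""
    return [c for c in header if c not in FIXED_COLUMNS and not c.startswith("keep_")]


# Made-up two-variant table laid out like the output of filter.
asvtable_text = (
    "run\tmarker\tvariant\tsequence_length\tread_count\tsample1\tsample2\t"
    "clusterid\tclustersize\tchimera_borderline\tsequence\n"
    "run1\tMFZR\t10\t8\t150\t150\t0\t10\t1\tFalse\tACGTACGT\n"
    "run1\tMFZR\t11\t8\t90\t40\t50\t11\t1\tFalse\tACGTTCGT\n"
)

reader = csv.DictReader(io.StringIO(asvtable_text), delimiter="\t")
samples = sample_columns(reader.fieldnames)

# Presence-absence per variant: 1 if the read count is above zero, else 0.
presence = {
    row["variant"]: {s: int(int(row[s]) > 0) for s in samples}
    for row in reader
}
print(samples)
print(presence)
```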

known_occurrences

Input of filter and optimize. Output of make_known_occurrences. TSV file with expected occurrences (keep) and known false positives (delete).

marker: Name of the marker (e.g. MFZR)

run: Name of the sequencing run

sample: Name of the sample

mock: 1 if sample is a mock, 0 otherwise

variant: Variant ID (can be empty)

action: keep (occurrences that should be kept after filtering) or delete (clear false positives)

sequence: Sequence of the variant

tax_name: optional, not used by optimize
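A hand-edited known_occurrences file can be checked for invalid action values before it is passed to a command. The following is a minimal sketch with a hypothetical helper, `invalid_action_rows`, and made-up rows; it only checks the action column against the two values described above.

```python
import csv
import io

ALLOWED_ACTIONS = {"keep", "delete"}


def invalid_action_rows(tsv_text):
    """Return (line number, action) pairs whose action is not keep or delete."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (line_number, row["action"])
        for line_number, row in enumerate(reader, start=2)  # line 1 is the header
        if row["action"] not in ALLOWED_ACTIONS
    ]


# Made-up rows; the last one contains a deliberate typo in the action column.
known_occurrences_text = (
    "marker\trun\tsample\tmock\tvariant\taction\tsequence\ttax_name\n"
    "MFZR\trun1\tmock1\t1\t\tkeep\tACGTACGT\t\n"
    "MFZR\trun1\tneg1\t0\t\tdelete\tACGTTCGT\t\n"
    "MFZR\trun1\tmock1\t1\t\tkept\tACGTTTGT\t\n"
)
problems = invalid_action_rows(known_occurrences_text)
print(problems)
```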

mock_composition

Input of filter. TSV file with expected sequences (keep) in mock samples.

marker: Name of the marker (e.g. MFZR)

run: Name of the sequencing run

sample: Name of the sample

mock: 1 if sample is a mock, 0 otherwise

variant: Variant ID (can be empty)

action: keep (occurrences that should be kept after filtering) or delete (clear false positives) or tolerate (variant present in a mock sample but amplifies badly)

sequence: Sequence of the variant

tax_name: optional, not used by optimize

sample_types

Input of make_known_occurrences. TSV file with the following columns:

run: Name of the sequencing run

sample: Name of the sample

sample_type: real, negative (negative control) or mock

habitat: Habitat type (e.g. freshwater, marine), NA for negative control samples. It is used to detect occurrences that do not correspond to the habitat type.

missing_occurrences

Output of make_known_occurrences. TSV file with keep occurrences that are missing from the input ASV table.

marker: Name of the marker (e.g. MFZR)

run: Name of the sequencing run

sample: Name of the sample

mock: 1 if sample is a mock, 0 otherwise

variant: Variant ID (can be empty)

action: keep (occurrences that should be kept after filtering) or delete (clear false positives)

sequence: Sequence of the variant

tax_name: optional, not used by optimize

optimize_lfn_sample_replicate.tsv

Output of optimize. TSV file with the following columns:

run: Name of the sequencing run

marker: Name of the marker (e.g. MFZR)

sample: Name of the sample

replicate: ID of the replicate

variant: Variant ID

N_ijk: Number of reads of variant i, in sample j and replicate k

N_jk: Number of reads in sample j and replicate k (all variants)

N_ijk/N_jk: Proportion of the reads of sample j and replicate k that belong to variant i

round_down: Value of N_ijk/N_jk rounded down

sequence: Variant sequence
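The relation between N_ijk, N_jk and round_down can be sketched as follows on made-up read counts. The number of decimals kept when rounding down (4 here) is an assumption, since it is not stated above. Integer arithmetic is used so that the rounding is not affected by floating-point error.

```python
from collections import defaultdict

# Made-up read counts: (sample j, replicate k, variant i) -> N_ijk
n_ijk = {
    ("sample1", 1, "v1"): 900,
    ("sample1", 1, "v2"): 97,
    ("sample1", 1, "v3"): 3,
    ("sample1", 2, "v1"): 2,
    ("sample1", 2, "v2"): 1,
}

# N_jk: total reads per sample-replicate, summed over all variants.
n_jk = defaultdict(int)
for (sample, replicate, _variant), count in n_ijk.items():
    n_jk[(sample, replicate)] += count


def round_down_ratio(numerator, denominator, decimals=4):
    """numerator/denominator rounded down; 4 decimals is an assumed precision."""
    factor = 10 ** decimals
    return (numerator * factor // denominator) / factor


round_down = {
    key: round_down_ratio(count, n_jk[key[:2]]) for key, count in n_ijk.items()
}
for key, value in round_down.items():
    print(key, value)
```

With these counts, variant v1 in replicate 2 has N_ijk/N_jk = 2/3, which rounds down to 0.6666 rather than up to 0.6667.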

optimize_lfn_read_count_and_lfn_variant.tsv OR optimize_lfn_read_count_and_lfn_variant_replicate.tsv

Output of optimize. TSV file with the following columns:

occurrence_nb_keep: Number of keep occurrences left after filtering with the lfn_nijk_cutoff and lfn_variant_cutoff values

occurrence_nb_delete: Number of delete occurrences left after filtering with the lfn_nijk_cutoff and lfn_variant_cutoff values

lfn_nijk_cutoff: Value of lfn_read_count_cutoff

lfn_variant_cutoff or lfn_variant_replicate_cutoff (depending on the output file): Value of the cutoff

run: Name of the sequencing run

marker: Name of the marker (e.g. MFZR)

optimize_lfn_variant_specific.tsv OR optimize_lfn_variant_replicate_specific.tsv

Output of optimize. TSV file with the following columns:

run: Name of the sequencing run

marker: Name of the marker (e.g. MFZR)

variant: Variant ID

replicate: (if optimize_lfn_variant_replicate_specific.tsv) ID of the replicate

action: Type of occurrence (delete/keep)

read_count_max: Max of N_ijk for a given i

N_i (optimize_lfn_variant_specific.tsv): Number of reads of variant i

N_ik (optimize_lfn_variant_replicate_specific.tsv): Number of reads of variant i in replicate k

lfn_variant_cutoff: read_count_max/N_i or read_count_max/N_ik

sequence: Variant sequence
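The lfn_variant_cutoff column is the ratio read_count_max/N_i. The sketch below computes it for one variant from made-up read counts. It covers only the variant-level case; the replicate-level file would use N_ik in the same way.

```python
from collections import defaultdict

# Made-up occurrences: (variant i, sample j, replicate k) -> N_ijk
n_ijk = {
    ("v1", "sample1", 1): 120,
    ("v1", "sample1", 2): 80,
    ("v1", "sample2", 1): 300,
}

n_i = defaultdict(int)             # total reads of each variant
read_count_max = defaultdict(int)  # largest N_ijk of each variant
for (variant, _sample, _replicate), count in n_ijk.items():
    n_i[variant] += count
    read_count_max[variant] = max(read_count_max[variant], count)

# Variant-specific value: read_count_max / N_i
lfn_variant_cutoff = {v: read_count_max[v] / n_i[v] for v in n_i}
print(lfn_variant_cutoff)
```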

optimize_pcr_error.tsv

Output of optimize. TSV file with the following columns:

run: Name of the sequencing run

marker: Name of the marker (e.g. MFZR)

sample: Name of the sample

variant_expected: ID of a keep variant

N_ij_expected: Number of reads of the expected variant in the sample (all replicates)

variant_unexpected: ID of an unexpected variant with one mismatch to the keep variant

N_ij_unexpected: Number of reads of the unexpected variant in the sample (all replicates)

N_ij_unexpected_to_expected_ratio: N_ij_unexpected/N_ij_expected

sequence_expected: Sequence of the expected variant

sequence_unexpected: Sequence of the unexpected variant
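The two ideas behind this table, "one mismatch" and the unexpected-to-expected ratio, can be sketched as follows. The helper `has_one_mismatch` is hypothetical and assumes that a mismatch means a single substitution between sequences of equal length; how VTAM itself compares the sequences is not described here. Sequences and read counts are made up.

```python
def has_one_mismatch(seq_a, seq_b):
    """True if two equal-length sequences differ at exactly one position."""
    if len(seq_a) != len(seq_b):
        return False  # assumption: insertions and deletions are not considered
    return sum(a != b for a, b in zip(seq_a, seq_b)) == 1


# Made-up sequences and read counts summed over all replicates of one sample.
sequence_expected = "ACGTACGT"
sequence_unexpected = "ACGTTCGT"
n_ij_expected = 2000
n_ij_unexpected = 30

if has_one_mismatch(sequence_expected, sequence_unexpected):
    ratio = n_ij_unexpected / n_ij_expected
    print(ratio)
```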

output (taxassign)

Output of taxassign. The input asvtable completed with the following columns:

ltg_tax_id: TaxID of the LTG (Lowest Taxonomic Group)

ltg_tax_name: Name of the LTG

ltg_rank: Taxonomic rank of the LTG

identity: Percentage of identity used to determine the LTG

blast_db: Name of the taxonomic BLAST database files (without extensions)

phylum: Phylum of the LTG

class: Class of the LTG

order: Order of the LTG

family: Family of the LTG

genus: Genus of the LTG

species: Species of the LTG

taxonomy

Output of taxonomy, input of taxassign. TSV file with information on all taxa in the reference (BLAST) database.

tax_id: Taxonomic identifier of the taxon

parent_tax_id: Taxonomic identifier of the direct parent of the taxon

rank: Taxonomic rank of the taxon (e.g. class, species, no rank)

name_txt: Name of the taxon

old_tax_id: TaxID of taxa merged into the taxon (no longer valid)

taxlevel: Index of the taxonomic level (optional; 0 = root, 1 = superkingdom, 2 = kingdom, 3 = phylum, 4 = class, 5 = order, 6 = family, 7 = genus, 8 = species, x.5 for intermediate levels)
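The tax_id and parent_tax_id columns describe a tree, so the lineage of a taxon can be recovered by following the parent links. The sketch below does this on a made-up toy taxonomy; the IDs and names are placeholders, not real database entries, and the root being its own parent is an assumption of the toy data.

```python
import csv
import io

# Made-up toy taxonomy (placeholder IDs and names).
taxonomy_text = (
    "tax_id\tparent_tax_id\trank\tname_txt\told_tax_id\ttaxlevel\n"
    "1\t1\tno rank\troot\t\t0\n"
    "10\t1\tphylum\tPhylumA\t\t3\n"
    "20\t10\tclass\tClassA\t\t4\n"
    "30\t20\tspecies\tSpeciesA\t\t8\n"
)

# Index rows by tax_id for direct lookup.
taxa = {
    row["tax_id"]: row
    for row in csv.DictReader(io.StringIO(taxonomy_text), delimiter="\t")
}


def lineage(tax_id, taxa):
    """Follow parent_tax_id links from a taxon up to the root."""
    path, seen = [], set()
    # The seen set stops the walk at a self-parented root or on any cycle.
    while tax_id in taxa and tax_id not in seen:
        seen.add(tax_id)
        row = taxa[tax_id]
        path.append((row["rank"], row["name_txt"]))
        tax_id = row["parent_tax_id"]
    return path


species_lineage = lineage("30", taxa)
print(species_lineage)
```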

runmarker

Input of pool. TSV file with the list of all run-marker combinations to be pooled.

run: Name of the sequencing run

marker: Name of the marker (e.g. MFZR)
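One way to build a runmarker file is to collect the distinct run-marker pairs from a table that already has both columns. The sketch below does this from made-up rows; the marker name `marker2` and all sample names are placeholders.

```python
import csv
import io

# Made-up rows with run, marker, sample and replicate values.
rows = [
    ("run1", "MFZR", "sample1", "1"),
    ("run1", "MFZR", "sample1", "2"),
    ("run1", "marker2", "sample1", "1"),
    ("run2", "MFZR", "sample2", "1"),
]

# Distinct (run, marker) pairs, sorted for a stable output order.
combinations = sorted({(run, marker) for run, marker, *_ in rows})

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# use open("runmarker.tsv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["run", "marker"])
writer.writerows(combinations)
runmarker_text = buffer.getvalue()
print(runmarker_text)
```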