Input/Output Files¶
Usage help and the complete list of I/O arguments for each command are available from the command line:
vtam COMMAND --help
e.g.
vtam filter --help
The content of each I/O file is detailed below.
params¶
Input of most commands. YML file with numerical parameters. It can be omitted if all parameters keep their default values. It is a simple text file with one parameter per line in “parameter name: parameter value” format, e.g.
lfn_variant_cutoff: 0.001
lfn_sample_replicate_cutoff: 0.003
lfn_read_count_cutoff: 70
pcr_error_var_prop: 0.05
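The “parameter name: parameter value” layout is simple enough to check by hand before running a command. The following is a minimal sketch, not part of VTAM: `read_params` is a hypothetical helper that assumes one flat parameter per line with an integer or decimal value, and it does not handle general YAML features.

```python
def read_params(text):
    """Parse flat 'name: value' lines into a dict (hypothetical helper)."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, value = line.split(":", 1)
        value = value.strip()
        try:
            params[name.strip()] = int(value)
        except ValueError:
            params[name.strip()] = float(value)
    return params


# The example parameters shown above
example = """\
lfn_variant_cutoff: 0.001
lfn_sample_replicate_cutoff: 0.003
lfn_read_count_cutoff: 70
pcr_error_var_prop: 0.05
"""

params = read_params(example)
for name, value in params.items():
    print(name, value)
```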
fastqinfo¶
Input of merge. TSV file with the following columns:
- TagFwd: Sequence of the tag on the forward primer (5’=>3’)
- PrimerFwd: Sequence of the forward primer (5’=>3’)
- TagRev: Sequence of the tag on the reverse primer (5’=>3’)
- PrimerRev: Sequence of the reverse primer (5’=>3’)
- Marker: Name of the marker (e.g. MFZR)
- Sample: Name of the sample
- Replicate: ID of the replicate
- Run: Name of the sequencing run
- FastqFwd: Name of the forward fastq file
- FastqRev: Name of the reverse fastq file
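As an illustration of the column layout, the sketch below writes a one-row fastqinfo table with Python's standard `csv` module. It assumes the file starts with a header row carrying the column names listed above. All tag and primer sequences, sample names and file names are made-up placeholders, and `write_fastqinfo` is a hypothetical helper, not a VTAM function.

```python
import csv
import io

COLUMNS = [
    "TagFwd", "PrimerFwd", "TagRev", "PrimerRev", "Marker",
    "Sample", "Replicate", "Run", "FastqFwd", "FastqRev",
]


def write_fastqinfo(rows, handle):
    """Write rows (dicts keyed by COLUMNS) as a tab-separated table."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Placeholder values for illustration only
rows = [{
    "TagFwd": "acgtacgt",
    "PrimerFwd": "ACGTACGTACGTACGT",
    "TagRev": "tgcatgca",
    "PrimerRev": "TGCATGCATGCATGCA",
    "Marker": "MFZR",
    "Sample": "sample1",
    "Replicate": "1",
    "Run": "run1",
    "FastqFwd": "sample1_R1.fastq.gz",
    "FastqRev": "sample1_R2.fastq.gz",
}]

buffer = io.StringIO()
write_fastqinfo(rows, buffer)
print(buffer.getvalue())
```

Passing an open file instead of the `io.StringIO` buffer would write the table to disk.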
fastainfo¶
Output of merge, input of sortreads. TSV file with the following columns:
- run: Name of the sequencing run
- marker: Name of the marker (e.g. MFZR)
- sample: Name of the sample
- replicate: ID of the replicate
- tagfwd: Sequence of the tag on the forward primer (5’=>3’)
- primerfwd: Sequence of the forward primer (5’=>3’)
- tagrev: Sequence of the tag on the reverse primer (5’=>3’)
- primerrev: Sequence of the reverse primer (5’=>3’)
- mergedfasta: name of the fasta file with merged sequences
sortedinfo¶
Output of sortreads, input of filter and optimize. TSV file with the following columns:
- run: Name of the sequencing run
- marker: Name of the marker (e.g. MFZR)
- sample: Name of the sample
- replicate: ID of the replicate
- sortedfasta: name of the fasta file containing merged, demultiplexed, trimmed sequences
db¶
Input and output of filter and taxassign; input of optimize and pool. SQLite database containing variants, samples, replicates, read counts, information on filtering steps and taxonomic assignments.
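Because the database is a regular SQLite file, its content can be explored with Python's built-in `sqlite3` module. The sketch below only lists the table names through the standard `sqlite_master` catalogue, so it makes no assumption about VTAM's schema. The demonstration runs on a throwaway database with a made-up table named `demo`; `list_tables` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, sorted by name."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    with closing(sqlite3.connect(db_path)) as con:
        return [row[0] for row in con.execute(query).fetchall()]


# Demonstration on a temporary database (not a real VTAM database)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.sqlite")
    with closing(sqlite3.connect(path)) as con:
        con.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY)")
        con.commit()
    tables = list_tables(path)

print(tables)
```

Pointing `list_tables` at the file passed as db to the commands should show which tables are available before writing any query against them.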
asvtable¶
Output of filter or pool, input of taxassign. TSV file with the variants that passed all filtering steps in rows, samples in columns, and presence-absence (output of pool) or read counts (output of filter) in cells, plus additional columns:
- run: Name of the sequencing run
- marker: Name of the marker (e.g. MFZR)
- variant: Variant ID
- pooled_variants (only in output of pool): IDs of the variants that were pooled because they are identical in their overlapping regions
- sequence_length: Length of the variant
- read_count: Total number of reads of the variants in the samples listed in the table
- [one column per sample] : presence-absence (output of pool) or read counts (output of filter)
- clusterid: ID of the centroid of the cluster (0.97 clustering of all variants of the asv table)
- clustersize: Number of variants in the cluster
- chimera_borderline (only in output of filter): Potential chimeras (very similar to one of the parental sequences)
- [keep_mockXX; One column per mock sample, if known_occurrences option is used]: 1 if variant is expected in the mock sample, 0 otherwise
- pooled_sequences (only in output of pool): Sequences of pooled_variants
- sequence: Sequence of the variant
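Since the sample columns have no fixed names, a script reading an asvtable has to work out which columns are samples. The sketch below is one possible approach for the output of filter: it treats every column that is neither one of the columns listed above nor a keep_ column as a sample, then checks that the sample read counts add up to read_count. The table content is made up (including the format of the chimera_borderline values), and the approach would need adapting for tables completed by taxassign.

```python
import csv
import io

# Column names listed in this section that are not samples
FIXED = {
    "run", "marker", "variant", "pooled_variants", "sequence_length",
    "read_count", "clusterid", "clustersize", "chimera_borderline",
    "pooled_sequences", "sequence",
}


def sample_columns(header):
    """Guess sample columns: anything not fixed and not a keep_ column."""
    return [c for c in header if c not in FIXED and not c.startswith("keep_")]


def check_read_counts(handle):
    """Return (sample columns, variants whose read_count differs from the sum)."""
    reader = csv.DictReader(handle, delimiter="\t")
    samples = sample_columns(reader.fieldnames)
    mismatches = []
    for row in reader:
        total = sum(int(row[s]) for s in samples)
        if total != int(row["read_count"]):
            mismatches.append(row["variant"])
    return samples, mismatches


# Made-up table with two hypothetical samples, s1 and s2
TABLE = (
    "run\tmarker\tvariant\tsequence_length\tread_count\ts1\ts2\t"
    "clusterid\tclustersize\tchimera_borderline\tsequence\n"
    "run1\tMFZR\t1\t4\t30\t10\t20\t1\t2\tFalse\tACGT\n"
    "run1\tMFZR\t2\t4\t7\t7\t0\t1\t2\tFalse\tACGA\n"
)

samples, mismatches = check_read_counts(io.StringIO(TABLE))
print(samples, mismatches)
```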
known_occurrences¶
Input of filter and optimize. Output of make_known_occurrences. TSV file with expected occurrences (keep) and known false positives (delete).
- marker: Name of the marker (e.g. MFZR)
- run: Name of the sequencing run
- sample: Name of the sample
- mock: 1 if sample is a mock, 0 otherwise
- variant: Variant ID (can be empty)
- action: keep (occurrences that should be kept after filtering) or delete (clear false positives)
- sequence: Sequence of the variant
- tax_name: optional, not used by optimize
mock_composition¶
Input of filter. TSV file with expected sequences (keep) in mock samples.
- marker: Name of the marker (e.g. MFZR)
- run: Name of the sequencing run
- sample: Name of the sample
- mock: 1 if sample is a mock, 0 otherwise
- variant: Variant ID (can be empty)
- action: keep (occurrences that should be kept after filtering) or delete (clear false positives) or tolerate (variant present in a mock sample but amplifies badly)
- sequence: Sequence of the variant
- tax_name: optional, not used by optimize
sample_types¶
Input of make_known_occurrences. TSV file.
- run: Name of the sequencing run
- sample: Name of the sample
- sample_type: real/negative(negative control)/mock
- habitat: habitat type (e.g. freshwater, marine), NA for negative control samples. It is used to detect occurrences that do not correspond to the habitat type.
missing_occurrences¶
Output of make_known_occurrences. TSV file with keep occurrences that are missing from the input ASV table.
- marker: Name of the marker (e.g. MFZR)
- run: Name of the sequencing run
- sample: Name of the sample
- mock: 1 if sample is a mock, 0 otherwise
- variant: Variant ID (can be empty)
- action: keep (occurrences that should be kept after filtering) or delete (clear false positives)
- sequence: Sequence of the variant
- tax_name: optional, not used by optimize
optimize_lfn_sample_replicate.tsv¶
Output of optimize. TSV file with the following columns:
- run: Name of the sequencing run
- marker: Name of the marker (e.g. MFZR)
- sample: Name of the sample
- replicate: ID of the replicate
- variant: Variant ID
- N_ijk: Number of reads of variant i, in sample j and replicate k
- N_jk: Number of reads in sample j and replicate k (all variants)
- N_ijk/N_jk: Proportion of the reads of sample j and replicate k that belong to variant i
- round_down: Value of N_ijk/N_jk rounded down
- sequence: Variant sequence
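The N_ijk/N_jk and round_down columns can be reproduced with a few lines of arithmetic. This is a minimal sketch: the read counts are made up, and the number of decimals kept when rounding down (4 here) is an assumption, since it is not stated above. `Decimal` is used so that rounding down is not affected by binary floating-point representation.

```python
from decimal import Decimal, ROUND_DOWN


def read_proportion(n_ijk, n_jk, decimals=4):
    """Return (N_ijk/N_jk, the same value rounded down to `decimals` places)."""
    ratio = Decimal(n_ijk) / Decimal(n_jk)
    step = Decimal(10) ** -decimals  # e.g. 0.0001 for 4 decimals (assumed)
    return ratio, ratio.quantize(step, rounding=ROUND_DOWN)


# Made-up counts: 35 reads of variant i out of 9000 reads in the sample-replicate
ratio, rounded = read_proportion(35, 9000)
print(ratio)    # 0.00388...
print(rounded)  # 0.0038
```

Rounding down rather than to the nearest value ensures that the rounded proportion never exceeds the observed one.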
optimize_lfn_read_count_and_lfn_variant.tsv OR optimize_lfn_read_count_and_lfn_variant_replicate.tsv¶
Output of optimize. TSV file with the following columns:
- occurrence_nb_keep: Number of keep occurrences left after filtering with the lfn_nijk_cutoff and lfn_variant_cutoff values
- occurrence_nb_delete: Number of delete occurrences left after filtering with the lfn_nijk_cutoff and lfn_variant_cutoff values
- lfn_nijk_cutoff: Value of lfn_read_count_cutoff
- lfn_variant_cutoff or lfn_variant_replicate_cutoff (depending on the file): Value of the cutoff used for filtering
- run: Name of the sequencing run
- marker: Name of the marker (e.g. MFZR)
optimize_lfn_variant_specific.tsv OR optimize_lfn_variant_replicate_specific.tsv¶
Output of optimize. TSV file with the following columns:
- run: Name of the sequencing run
- marker: Name of the marker (e.g. MFZR)
- variant: Variant ID
- replicate: (if optimize_lfn_variant_replicate_specific.tsv) ID of the replicate
- action: Type of occurrence (delete/keep)
- read_count_max: Max of N_ijk for a given i
- N_i (optimize_lfn_variant_specific.tsv) : Number of reads of variant i
- N_ik (optimize_lfn_variant_replicate_specific.tsv): Number of reads of variant i in replicate k
- lfn_variant_cutoff: read_count_max/N_i or read_count_max/N_ik
- sequence: Variant sequence
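The relation between read_count_max, N_i and lfn_variant_cutoff can be illustrated as follows. The sketch assumes that N_i is the sum of N_ijk over all samples and replicates, which is an interpretation of the column descriptions above. The read counts are made up and `variant_specific_cutoff` is a hypothetical helper.

```python
def variant_specific_cutoff(counts):
    """counts maps (sample, replicate) to N_ijk for a single variant i.

    Returns (read_count_max, N_i, read_count_max / N_i).
    """
    read_count_max = max(counts.values())
    n_i = sum(counts.values())  # assumed: total reads of variant i
    return read_count_max, n_i, read_count_max / n_i


# Made-up read counts of one variant in two samples with two replicates each
counts = {
    ("s1", 1): 12,
    ("s1", 2): 8,
    ("s2", 1): 0,
    ("s2", 2): 5,
}

read_count_max, n_i, cutoff = variant_specific_cutoff(counts)
print(read_count_max, n_i, cutoff)
```

For the replicate-specific file, the same calculation would be done within each replicate, with N_ik in place of N_i.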
optimize_pcr_error.tsv¶
Output of optimize. TSV file with the following columns:
- run: Name of the sequencing run
- marker: Name of the marker (e.g. MFZR)
- sample: Name of the sample
- variant_expected: ID of a keep variant
- N_ij_expected: Number of reads of the expected variant in the sample (all replicates)
- variant_unexpected: ID of an unexpected variant with one mismatch to the keep variant
- N_ij_unexpected: Number of reads of the unexpected variant in the sample (all replicates)
- N_ij_unexpected_to_expected_ratio: N_ij_unexpected/N_ij_expected
- sequence_expected: Sequence of the expected variant
- sequence_unexpected: Sequence of the unexpected variant
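The ratio column is a plain division of the two read counts. The sketch below recomputes it for a few made-up rows and reports the largest value, which may be a useful figure to compare with pcr_error_var_prop. That comparison is an assumption of this example rather than something stated above, and `ratios` is a hypothetical helper.

```python
def ratios(rows):
    """rows: list of (N_ij_expected, N_ij_unexpected) pairs."""
    return [unexpected / expected for expected, unexpected in rows]


# Made-up read counts for two expected/unexpected variant pairs
rows = [(1000, 12), (500, 20)]

values = ratios(rows)
largest = max(values)
print(values, largest)
```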
output (taxassign)¶
Output of taxassign. The input asvtable completed with the following columns:
- ltg_tax_id: TaxID of the LTG (Lowest Taxonomic Group)
- ltg_tax_name: Name of the LTG
- ltg_rank: Taxonomic rank of the LTG
- identity: Percentage of identity used to determine the LTG
- blast_db: Name of the taxonomic BLAST database files (without extensions)
- phylum: Phylum of LTG
- class: class of LTG
- order: order of LTG
- family: family of LTG
- genus: genus of LTG
- species: species of LTG
taxonomy¶
Output of taxonomy, input of taxassign. TSV file with information on all taxa in the reference (BLAST) database.
- tax_id: Taxonomic identifier of the taxon
- parent_tax_id: Taxonomic identifier of the direct parent of the taxon
- rank: Taxonomic rank of the taxon (e.g. class, species, no rank)
- name_txt: Name of the taxon
- old_tax_id: TaxID of a taxon that was merged into this taxon (no longer valid)
- taxlevel: Index of the taxonomic level (optional; 0 = root, 1 = superkingdom, 2 = kingdom, 3 = phylum, 4 = class, 5 = order, 6 = family, 7 = genus, 8 = species, x.5 for intermediate levels)
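The tax_id and parent_tax_id columns together encode a tree, so the lineage of a taxon can be recovered by following parent links up to the root. The sketch below shows the idea on a made-up four-taxon table. The TaxIDs and names are invented for illustration, the root is assumed to be its own parent, and `load_taxonomy` and `lineage` are hypothetical helpers.

```python
import csv
import io


def load_taxonomy(handle):
    """Index the rows of a taxonomy TSV by integer tax_id."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {int(row["tax_id"]): row for row in reader}


def lineage(taxonomy, tax_id):
    """Return [(rank, name), ...] from the root down to tax_id."""
    path = []
    seen = set()  # guards against cycles, e.g. a root that is its own parent
    while tax_id in taxonomy and tax_id not in seen:
        seen.add(tax_id)
        row = taxonomy[tax_id]
        path.append((row["rank"], row["name_txt"]))
        tax_id = int(row["parent_tax_id"])
    return path[::-1]


# Made-up taxonomy for illustration only
TAXONOMY = (
    "tax_id\tparent_tax_id\trank\tname_txt\told_tax_id\n"
    "1\t1\tno rank\troot\t\n"
    "10\t1\tphylum\tPhylum A\t\n"
    "100\t10\tgenus\tGenus b\t\n"
    "1000\t100\tspecies\tGenus b species c\t\n"
)

taxonomy = load_taxonomy(io.StringIO(TAXONOMY))
print(lineage(taxonomy, 1000))
```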