VTAM - Validation and Taxonomic Assignation of Metabarcoding Data
VTAM is a metabarcoding package with various commands to process high-throughput sequencing (HTS) data of amplicons of one or several metabarcoding markers in FASTQ format and produce a table of amplicon sequence variants (ASVs) assigned to taxonomic groups. If you use VTAM in scientific work, please cite the following article:
González, A., Dubut, V., Corse, E., Mekdad, R., Dechartre, T. and Meglécz, E. VTAM: A robust pipeline for processing metabarcoding data using internal controls. Submitted to Methods in Ecology and Evolution.
Commands for a quick installation:
conda create --name vtam python=3.7 -y
conda activate vtam
python3 -m pip install --upgrade cutadapt
conda install -c bioconda blast
conda install -c bioconda vsearch
python3 -m pip install --upgrade vtam
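After installation, it can help to confirm that the command-line tools are reachable. The following is a minimal sketch, not part of VTAM itself. The executable names are assumptions: `blastn` stands in for the BLAST suite, and `vtam`, `cutadapt` and `vsearch` are assumed to be installed under their package names. Run it in the environment where the tools were installed.

```python
import shutil

# Executable names are assumptions (see the text above); adjust as needed.
REQUIRED_TOOLS = ("vtam", "cutadapt", "vsearch", "blastn")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH.")
```

An empty result only means that the executables were found; it does not check their versions.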
Commands for a quick working example:
vtam example
cd example
snakemake --printshellcmds --resources db=1 --snakefile snakefile.yml --cores 4 --configfile asper1/user_input/snakeconfig_mfzr.yml --until asvtable_taxa
The table of amplicon sequence variants (ASVs) is here:
(vtam) user@host:~/vtam/example$ head -n4 asper1/run1_mfzr/asvtable_default_taxa.tsv
run marker variant sequence_length read_count tpos1_run1 tnegtag_run1 14ben01 14ben02 clusterid clustersize chimera_borderline ltg_tax_id ltg_tax_name ltg_rank identity blast_db phylum class order family genus species sequence
run1 MFZR 25 181 478 478 0 0 0 25 1 False 131567 cellular organisms no rank 80 coi_blast_db_20200420 ACTATACCTTATCTTCGCAGTATTCTCAGGAATGCTAGGAACTGCTTTTAGTGTTCTTATTCGAATGGAACTAACATCTCCAGGTGTACAATACCTACAGGGAAACCACCAACTTTACAATGTAATCATTACAGCTCACGCATTCCTAATGATCTTTTTCATGGTTATGCCAGGACTTGTT
run1 MFZR 51 181 165 0 0 0 165 51 1 False coi_blast_db_20200420 ACTATATTTAATTTTTGCTGCAATTTCTGGTGTAGCAGGAACTACGCTTTCATTGTTTATTAGAGCTACATTAGCGACACCAAATTCTGGTGTTTTAGATTATAATTACCATTTGTATAATGTTATAGTTACGGGTCATGCTTTTTTGATGATCTTTTTTTTAGTAATGCCTGCTTTATTG
run1 MFZR 88 175 640 640 0 0 0 88 1 False 1592914 Caenis pusilla species 100 coi_blast_db_20200420 Arthropoda Insecta Ephemeroptera Caenidae Caenis Caenis pusilla ACTATATTTTATTTTTGGGGCTTGATCCGGAATGCTGGGCACCTCTCTAAGCCTTCTAATTCGTGCCGAGCTGGGGCACCCGGGTTCTTTAATTGGCGACGATCAAATTTACAATGTAATCGTCACAGCCCATGCTTTTATTATGATTTTTTTCATGGTTATGCCTATTATAATC
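The ASV table can also be inspected programmatically. The sketch below assumes the file is tab-separated (as the `.tsv` extension suggests) and that it has the `read_count` and `ltg_rank` columns shown in the header above. The helper names `read_asv_table`, `assigned_to_rank` and `total_reads` are hypothetical and are not part of VTAM.

```python
import csv
import os


def read_asv_table(path):
    """Read a tab-separated ASV table into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def assigned_to_rank(rows, rank="species"):
    """Keep the rows whose ltg_rank column equals `rank`."""
    return [row for row in rows if row.get("ltg_rank") == rank]


def total_reads(rows):
    """Sum the read_count column over all rows."""
    return sum(int(row["read_count"]) for row in rows)


if __name__ == "__main__":
    # Path taken from the example above; skipped if the example was not run.
    table_path = "asper1/run1_mfzr/asvtable_default_taxa.tsv"
    if os.path.exists(table_path):
        rows = read_asv_table(table_path)
        print(len(rows), "variants,", total_reads(rows), "reads")
        print(len(assigned_to_rank(rows)), "variants assigned to species level")
```

All values are read as strings, so numeric columns such as `read_count` need an explicit conversion, as in `total_reads`.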
The database of intermediate data is here:
(vtam) user@host:~/vtam/example$ sqlite3 asper1/db.sqlite '.tables'
FilterChimera Sample
FilterChimeraBorderline SampleInformation
FilterCodonStop SortedReadFile
FilterIndel TaxAssign
FilterLFN Variant
FilterLFNreference VariantReadCount
FilterMinReplicateNumber wom_Execution
FilterMinReplicateNumber2 wom_FileInputOutputInformation
FilterMinReplicateNumber3 wom_Option
FilterPCRerror wom_TableInputOutputInformation
FilterRenkonen wom_TableModificationTime
Marker wom_ToolWrapper
ReadCountAverageOverReplicates wom_TypeInputOrOutput
Run
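The same listing can be obtained from Python with the standard `sqlite3` module. This is a minimal sketch that assumes only that `asper1/db.sqlite` is a regular SQLite file, as the `sqlite3` command above implies. The helper names `list_tables` and `count_rows` are hypothetical and are not part of VTAM.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of the tables in an SQLite database, sorted by name."""
    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [name for (name,) in cursor]
    finally:
        connection.close()


def count_rows(db_path, table):
    """Return the number of rows in `table`, which must exist in the database."""
    if table not in list_tables(db_path):
        raise ValueError(f"unknown table: {table}")
    connection = sqlite3.connect(db_path)
    try:
        # The name was validated above, so quoting it in the query is safe.
        (count,) = connection.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        return count
    finally:
        connection.close()


if __name__ == "__main__":
    # sqlite3.connect would create an empty file if the path did not exist,
    # so the path is checked first.
    db_path = "asper1/db.sqlite"
    if os.path.exists(db_path):
        for table in list_tables(db_path):
            print(table, count_rows(db_path, table))
```

Both helpers only read from the database; neither modifies the intermediate data.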
Table of Contents
- Overview of VTAM
- Installation
- Tutorial
- Data
- merge: Merge FASTQ files
- random_seq: Create a smaller randomized dataset from the main dataset (Optional)
- sortreads: Demultiplex and trim the reads
- filter: Filter variants and create the ASV table
- taxassign: Assign variants of ASV table to taxa
- make_known_occurrences: Create the known_occurrences.tsv file to be used as an input for optimize
- optimize: Compute optimal filter parameters based on mock and negative samples
- filter: Create an ASV table with optimal parameters and assign variants to taxa
- Add new run-marker data to the existing database
- Running VTAM for data with several run-marker combinations
- Run VTAM with snakemake
- Reference
- The numerical parameter file
- The command merge
- The command sortreads
- The command random_seq (OPTIONAL)
- The command filter
- The command taxassign
- The command make_known_occurrences (OPTIONAL)
- The command optimize
- The command pool
- The command taxonomy and the taxonomic lineage input
- The BLAST database
- Traceability
- Input/Output Files
- params
- fastqinfo
- fastainfo
- sortedinfo
- db
- asvtable
- known_occurrences
- mock_composition
- sample_types
- missing_occurrences
- optimize_lfn_sample_replicate.tsv
- optimize_lfn_read_count_and_lfn_variant.tsv OR optimize_lfn_read_count_and_lfn_variant_replicate.tsv
- optimize_lfn_variant_specific.tsv OR optimize_lfn_variant_replicate_specific.tsv
- optimize_pcr_error.tsv
- output (taxassign)
- taxonomy
- runmarker
- Glossary
- List of References
- Citation
- Change Log
- Contributor Guide