nf-core/detaxizer      
 A pipeline to identify (and remove) certain sequences from raw genomic data. Default taxon to identify (and remove) is Homo sapiens. Removal is optional.
de-identification, decontamination, edna, fastq, filter, long-reads, metabarcoding, metagenomics, microbiome, nanopore, short-reads, shotgun, taxonomic-classification, taxonomic-profiling
   Version history
Summary of changes
- filtering is now enabled by default
- defaults reflect the best-performing settings from benchmarking of human decontamination
- improvements to memory and time requirements
Detailed changes
Added
- PR #70 - Filtering is now the default; --skip_filter was added to turn it off (by @d4straub)
- PR #71 - Add usage information learned from our benchmarking (by @d4straub)
Changed
- PR #65, PR #69 - Template update for nf-core/tools 3.3.2 (by @d4straub)
- PR #72 - The default for --kraken2db was changed from ‘https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240904.tar.gz’ to ‘https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240605.tar.gz’. This database is much larger (60 GB), but with it the default settings reflect the best decontamination performance in benchmarks (by @d4straub)
- PR #73 - Doubled memory allocation for ISOLATE_BBDUK_IDS (by @d4straub)
- PR #75 - Updated version and contributors (by @d4straub)
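As a rough illustration of PR #70 and PR #72, the Python sketch below assembles a `nextflow run` command that opts out of the new default filtering and points --kraken2db back at the smaller 8 GB database (at the cost of the benchmarked decontamination performance). The helper name `build_detaxizer_command`, the profile, and the samplesheet and output paths are hypothetical placeholders; --input, --outdir and -profile are the usual nf-core options, and only --skip_filter and --kraken2db come from this changelog.

```python
import shlex

# The previous (smaller, 8 GB) default database, taken from the PR #72 entry.
SMALL_KRAKEN2_DB = "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240904.tar.gz"


def build_detaxizer_command(samplesheet, outdir, profile="docker",
                            skip_filter=False, kraken2db=None):
    """Hypothetical helper: return a `nextflow run` argument list for nf-core/detaxizer."""
    cmd = ["nextflow", "run", "nf-core/detaxizer",
           "-profile", profile,
           "--input", samplesheet,
           "--outdir", outdir]
    if skip_filter:
        # Filtering runs by default since PR #70; this boolean flag opts out of it.
        cmd.append("--skip_filter")
    if kraken2db is not None:
        # Override the large (60 GB) default Kraken2 database.
        cmd += ["--kraken2db", kraken2db]
    return cmd


if __name__ == "__main__":
    cmd = build_detaxizer_command("samplesheet.csv", "results",
                                  skip_filter=True, kraken2db=SMALL_KRAKEN2_DB)
    # Print a shell-safe command line that can be pasted into a terminal.
    print(shlex.join(cmd))
```

Building an argument list rather than a single string keeps each value a separate argument, so the list can also be passed directly to `subprocess.run` without shell quoting issues.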
Fixed
- PR #62 - Use dnaio to reduce memory spikes during renaming (by @bede)
- PR #77 - Fixed conda versions to exactly follow container versions (by @d4straub)
- PR #78 - Fixed typos, re-added code comments (by @d4straub)
Dependencies
| Software | Previous version | New version | 
|---|---|---|
| MultiQC | 1.27 | 1.29 | 
| tar | 1.3 | 1.34 | 
Removed
 Added
- PR #34 - Added bbduk to the classification step (kraken2 is the default; both can be run together) (by @jannikseidelQBiC)
- PR #34 - Added the --fasta_bbduk parameter to provide a FASTA file with contaminants (by @jannikseidelQBiC)
- PR #34 - Rewrote the summary step of the classification to work with bbduk and/or kraken2 (by @jannikseidelQBiC)
- PR #34 - Made preprocessing with fastp optional and added the parameter --fastp_eval_duplication to turn on duplication removal (off by default; in v1.0.0 it was always on and could not be changed) (by @jannikseidelQBiC)
- PR #34 - The removed reads can now optionally be written to the output folder (by @jannikseidelQBiC)
- PR #34 - Added optional classification of filtered and removed reads via kraken2 (by @jannikseidelQBiC)
- PR #39 - Added generation of input samplesheets for nf-core/mag and nf-core/taxprofiler (by @Joon-Klaps)
Parameters
Added parameters:
| Parameter | 
|---|
| --fasta_bbduk | 
| --preprocessing | 
| --output_removed_reads | 
| --classification_kraken2 | 
| --classification_bbduk | 
| --kraken2confidence_filtered | 
| --kraken2confidence_removed | 
| --classification_kraken2_post_filtering | 
| --fastp_eval_duplication | 
| --bbduk_kmers | 
Changed default values of parameters:
| Parameter | Old default value | New default value | 
|---|---|---|
| --fastp_cut_mean_quality | 15 | 1 | 
| --kraken2db | 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20231009.tar.gz' | 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240605.tar.gz' | 
| --kraken2confidence | 0.05 | 0.00 | 
| --tax2filter | 'Homo' | 'Homo sapiens' | 
| --cutoff_tax2filter | 2 | 0 | 
| --cutoff_tax2keep | 0.5 | 0.0 | 
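Users who want to keep the previous defaults listed above can pass them explicitly. A minimal Python sketch, assuming the values are supplied through a JSON params file via Nextflow's -params-file option; the helper name `params_json` is hypothetical, and the old database URL is only useful if it is still hosted:

```python
import json

# Previous defaults, copied from the "Old default value" column.
# In a params file, parameter names are written without the leading "--".
PREVIOUS_DEFAULTS = {
    "fastp_cut_mean_quality": 15,
    "kraken2db": "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20231009.tar.gz",
    "kraken2confidence": 0.05,
    "tax2filter": "Homo",
    "cutoff_tax2filter": 2,
    "cutoff_tax2keep": 0.5,
}


def params_json(params):
    """Serialise pipeline parameters as JSON for `nextflow run ... -params-file`."""
    return json.dumps(params, indent=2)


if __name__ == "__main__":
    print(params_json(PREVIOUS_DEFAULTS))
```

Redirect the output to a file (for example `python previous_defaults.py > previous_defaults.json`) and pass that file with `-params-file previous_defaults.json`; any parameter given directly on the command line still takes precedence.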
Changed
- PR #42 - Template update for nf-core/tools 3.0.2; for details, see the corresponding nf-core blog post
Fixed
- PR #33 - Added quotation marks in parse_kraken2report.nf, which prevents the pipeline from failing when a taxon name containing a space (e.g. Homo sapiens) is passed to the --tax2filter parameter (by @jannikseidelQBiC)
- PR #34 - Made validation via blastn optional (disabled by default) (by @jannikseidelQBiC)
- PR #34 - Renamed the parameter --fasta to --fasta_blastn (by @jannikseidelQBiC)
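The quoting fix in PR #33 is consistent with ordinary shell word splitting: an unquoted value that contains a space is split into two arguments. The Python sketch below reproduces that behavior with the standard `shlex` module. We assume this was the failure mode; `some_script.py` is a hypothetical stand-in, not the command actually run by parse_kraken2report.nf.

```python
import shlex

taxon = "Homo sapiens"

# Unquoted value: the shell splits it at the space, so the (hypothetical)
# script would see the taxon "Homo" plus a stray extra argument "sapiens".
unquoted = shlex.split(f"some_script.py --tax2filter {taxon}")

# Quoted value (shlex.quote wraps it in single quotes): it stays one argument.
quoted = shlex.split(f"some_script.py --tax2filter {shlex.quote(taxon)}")

print(unquoted)  # ['some_script.py', '--tax2filter', 'Homo', 'sapiens']
print(quoted)    # ['some_script.py', '--tax2filter', 'Homo sapiens']
```

The same applies on the command line: a taxon with a space should be passed as --tax2filter 'Homo sapiens'.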
Dependencies
Updated and added dependencies
| Tool | Previous version | Current version | 
|---|---|---|
| bbmap | - | 39.10 | 
| blastn | 2.14.1 | 2.15.0 | 
| MultiQC | 1.21 | 1.25.1 | 
| kraken2 | 2.1.2 | 2.1.3 | 
| seqkit | 2.8.0 | 2.8.2 | 
Deprecated
| Parameter | New parameter | Reason | 
|---|---|---|
| --fasta | --fasta_blastn | Introduction of --fasta_bbduk made it necessary to distinguish the two parameters | 
| --skip_blastn | --validation_blastn | blastn must now be enabled explicitly; it is too resource-intensive for a default setting | 
| --max_cpus | - | New Nextflow behavior: resourceLimits can now be set via a config | 
| --max_memory | - | New Nextflow behavior: resourceLimits can now be set via a config | 
| --max_time | - | New Nextflow behavior: resourceLimits can now be set via a config | 
First release of nf-core/detaxizer!
This is the initial version of the pipeline:
- Read QC (FastQC)
- Pre-processing (fastp)
- Classification of reads (Kraken2)
- Optional validation of searched taxon/taxa (blastn)
- Optional filtering of the searched taxon/taxa from the reads (either from the raw files or the preprocessed reads, using either the output from kraken2 or blastn)
- Summary of the processes (how many reads were initially present after preprocessing, how many were classified as the tax2filter taxon plus its potential taxonomic subtree, and, optionally, how many were validated)
- Present QC for raw reads (MultiQC)
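The summary counts reads classified as the tax2filter taxon plus its potential taxonomic subtree. As a hedged sketch of what "subtree" means in a standard Kraken2 report (tab-separated columns: percentage, clade reads, directly assigned reads, rank code, NCBI taxid, and a name indented by two spaces per level, listed in depth-first order), the Python below collects the taxids at and below a given name and reads its clade count. The function names and the toy report with invented counts are hypothetical; this is not detaxizer's own parsing code, which is not shown here and presumably also involves cutoffs such as --cutoff_tax2filter and --cutoff_tax2keep.

```python
# Toy Kraken2 report with invented counts.
TOY_REPORT = [
    "100.00\t1000\t0\tR\t1\troot",
    " 60.00\t600\t0\tG\t9605\t  Homo",
    " 60.00\t600\t590\tS\t9606\t    Homo sapiens",
    "  1.00\t10\t10\tS1\t63221\t      Homo sapiens neanderthalensis",
    " 40.00\t400\t400\tD\t2\t  Bacteria",
]


def parse_line(line):
    """Split one report line into (indent, name, taxid, clade_reads), or None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        return None  # blank or malformed line
    raw_name = fields[-1]
    indent = len(raw_name) - len(raw_name.lstrip(" "))
    # Taxid and name are the last two columns, clade reads the second column.
    return indent, raw_name.strip(), int(fields[-2]), int(fields[1])


def subtree_taxids(report_lines, taxon_name):
    """Taxids of `taxon_name` and every node listed below it."""
    taxids = []
    target_indent = None
    for line in report_lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        indent, name, taxid, _ = parsed
        if target_indent is None:
            if name == taxon_name:
                target_indent = indent
                taxids.append(taxid)
        elif indent > target_indent:
            taxids.append(taxid)  # deeper than the target: still in its subtree
        else:
            break  # same or shallower level: the subtree has ended
    return taxids


def clade_reads(report_lines, taxon_name):
    """Reads assigned to `taxon_name` or any descendant; 0 if the name is absent."""
    for line in report_lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[1] == taxon_name:
            return parsed[3]
    return 0


if __name__ == "__main__":
    print(subtree_taxids(TOY_REPORT, "Homo sapiens"))  # [9606, 63221]
    print(clade_reads(TOY_REPORT, "Homo sapiens"))     # 600
```

Because Kraken2 writes the report in depth-first order, a node's subtree is the contiguous run of more deeply indented lines that follows it, which is why the loop can stop at the first line that is not indented further.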