Commits on Source (7)
...@@ -87,7 +87,7 @@ NextFlow pipeline used by the Developmental Cognitive Neuroscience Lab (DCNL) to ...@@ -87,7 +87,7 @@ NextFlow pipeline used by the Developmental Cognitive Neuroscience Lab (DCNL) to
## Pipeline parameters ## Pipeline parameters
The default values for all parameters are set in `src/nexflow.config`. Please notice that it's required to overwrite a few because they depend on the procedure you need to run (e.g., `--step`). The specifics of each are described next. The default values for all parameters are set in `src/nextflow.config`. Please notice that it's required to overwrite a few because they depend on the procedure you need to run (e.g., `--step`). The specifics of each are described next.
### Global options ### Global options
...@@ -100,7 +100,7 @@ The default values for all parameters are set in `src/nexflow.config`. Please no ...@@ -100,7 +100,7 @@ The default values for all parameters are set in `src/nexflow.config`. Please no
```txt ```txt
--step --step
<Type: String. Options are "1", "2_from_step_1", "2_from_minknow", "3". Select 1 for basecalling. Select "2_from_step_1" for alignment filtering and quality control continuing after executing step 1 of this pipeline. Select "2_from_minknow" for alignment filtering and quality control from input basecalled using MinKNOW. Select "3" for methylation call pre-processing with modkit and generating a thorough multiQC report for sequencing stats. This parameter needs to be set for the pipeline to run. Default: "None"> <Type: String. Options are "1", "2_from_step_1", "2_from_minknow", "3". Select 1 for basecalling. Select "2_from_step_1" for alignment filtering and quality control continuing after executing step 1 of this pipeline. Select "2_from_minknow" for alignment filtering and quality control from input basecalled using MinKNOW. Select "3" for methylation call pre-processing with modkit and generating a thorough multiQC report for sequencing stats. This parameter needs to be set for the pipeline to run. Default: null>
``` ```
```txt ```txt
...@@ -112,13 +112,13 @@ The default values for all parameters are set in `src/nexflow.config`. Please no ...@@ -112,13 +112,13 @@ The default values for all parameters are set in `src/nexflow.config`. Please no
```txt ```txt
--steps_2_and_3_input_directory --steps_2_and_3_input_directory
<Type: Path. When performing a step other than 1, this parameter must be set to the output path of the step 1. Example: "./results/<out_dir>". Default = "None"> <Type: Path. When performing a step other than 1, this parameter must be set to the output path of the step 1. Example: "./results/<out_dir>". Default = null>
``` ```
```txt ```txt
--prefix --prefix
<Type: String. Adds a prefix to the beggining of your filenames, good when wanting to keep track of batches of data. Example: "Batch_1". Default: "None"> <Type: String. Adds a prefix to the beggining of your filenames, good when wanting to keep track of batches of data. Example: "Batch_1". Default: null>
``` ```
```txt ```txt
...@@ -158,7 +158,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [ ...@@ -158,7 +158,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [
```txt ```txt
--basecall_config --basecall_config
<Type: String Configuration name for basecalling setting. This is not necessary since dorado is able to automatically determine the appropriate configuration. Default: "None"> <Type: String Configuration name for basecalling setting. This is not necessary since dorado is able to automatically determine the appropriate configuration. Default: null>
``` ```
```txt ```txt
...@@ -170,7 +170,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [ ...@@ -170,7 +170,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [
```txt ```txt
--barcoding_kit --barcoding_kit
<Type: String. Kit name used to barcode the samples. Use "None" to skip --kit-name in basecalling. Default: "SQK-RBK114-24"> <Type: String. Kit name used to barcode the samples. Use null to skip --kit-name in basecalling. Default: "SQK-RBK114-24">
``` ```
```txt ```txt
...@@ -182,13 +182,13 @@ Many of the parameters for this step are based on dorado basecaller, see their [ ...@@ -182,13 +182,13 @@ Many of the parameters for this step are based on dorado basecaller, see their [
```txt ```txt
--basecall_demux --basecall_demux
<Type: Boolean. "True", "False". Whether you want the data to be demultiplexed setting it to "True" will perform demultiplexing. Default: false> <Type: Boolean. "true", "false". Whether you want the data to be demultiplexed setting it to "true" will perform demultiplexing. Default: false>
``` ```
```txt ```txt
--trimmed_barcodes --trimmed_barcodes
<Type: Boolean. "True", "False". Only relevant is --demux is set to "True". if set to "True" barcodes will be trimmed during demultiplexing and will not be present in output "fastq" files. Default: "False"> <Type: Boolean. "true", "false". Only relevant is --demux is set to "true". if set to "true" barcodes will be trimmed during demultiplexing and will not be present in output "fastq" files. Default: true>
``` ```
```txt ```txt
...@@ -226,7 +226,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [ ...@@ -226,7 +226,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [
```txt ```txt
--is_barcoded --is_barcoded
<Type: Boolean. Only applies if performing "step_2_from_minknow", this parameter will be ignore for "step_2_from_step_1". If is_barcoded is set to "True" the files will be grouped by barcode, otherwise all files will be grouped by sequencing run regardless of barcode. Default: True> <Type: Boolean. Only applies if performing "step_2_from_minknow", this parameter will be ignore for "step_2_from_step_1". If is_barcoded is set to "True" the files will be grouped by barcode, otherwise all files will be grouped by sequencing run regardless of barcode. Default: true>
``` ```
### Step 3: Methylation Calling and MultiQC ### Step 3: Methylation Calling and MultiQC
...@@ -310,10 +310,8 @@ The following examples assume your current directory is the root directory of th ...@@ -310,10 +310,8 @@ The following examples assume your current directory is the root directory of th
--gpu_devices "all" \ --gpu_devices "all" \
--basecall_mods "5mC_5hmC" \ --basecall_mods "5mC_5hmC" \
--qscore_thresh 9 \ --qscore_thresh 9 \
--basecall_config "False" \
--basecall_trim "none" \ --basecall_trim "none" \
--basecall_compute "gpu" \ --basecall_compute "gpu" \
--basecall_demux "False" \
--queue_size 1 \ --queue_size 1 \
--out_dir "$OUTPUT_DIR_NAME" --out_dir "$OUTPUT_DIR_NAME"
``` ```
...@@ -335,7 +333,7 @@ The following examples assume your current directory is the root directory of th ...@@ -335,7 +333,7 @@ The following examples assume your current directory is the root directory of th
nextflow ./src/main.nf \ nextflow ./src/main.nf \
--steps_2_and_3_input_directory "./results/$OUTPUT_DIR_NAME/" \ --steps_2_and_3_input_directory "./results/$OUTPUT_DIR_NAME/" \
--min_mapped_reads_thresh 500 \ --min_mapped_reads_thresh 500 \
--is_barcoded "True" \ --is_barcoded \
--qscore_thresh 9 \ --qscore_thresh 9 \
--mapq 10 \ --mapq 10 \
--step "2_from_step_1" --step "2_from_step_1"
......
...@@ -56,16 +56,16 @@ workflow { ...@@ -56,16 +56,16 @@ workflow {
samtools_threads = Channel.value(params.samtools_threads) samtools_threads = Channel.value(params.samtools_threads)
// step conditionals // step conditionals
if (params.step == 1) { if (params.step == 1) {
if (params.prefix == "None") { if (params.prefix == null) {
fast5_path = Channel.fromPath("${params.basecall_path}/**.fast5").map{file -> tuple(file.parent.toString().split("/")[-3] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple() fast5_path = Channel.fromPath("${params.basecall_path}/**.fast5").map{file -> tuple(file.parent.toString().split("/")[-3] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
pod5_path = Channel.fromPath("${params.basecall_path}/**.pod5").map{file -> tuple(file.parent.toString().split("/")[-3] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple() pod5_path = Channel.fromPath("${params.basecall_path}/**.pod5").map{file -> tuple(file.parent.toString().split("/")[-3] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
} else { } else {
fast5_path = Channel.fromPath("${params.basecall_path}/**.fast5").map{file -> tuple("${params.prefix}_" + file.parent.toString().split("/")[-2] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple() fast5_path = Channel.fromPath("${params.basecall_path}/**.fast5").map{file -> tuple("${params.prefix}_" + file.parent.toString().split("/")[-2] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
pod5_path = Channel.fromPath("${params.basecall_path}/**.pod5").map{file -> tuple("${params.prefix}_" + file.parent.toString().split("/")[-2] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple() pod5_path = Channel.fromPath("${params.basecall_path}/**.pod5").map{file -> tuple("${params.prefix}_" + file.parent.toString().split("/")[-2] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
} }
basecall_speed = Channel.value(params.basecall_speed) basecall_speed = params.basecall_speed
basecall_mods = Channel.value(params.basecall_mods) basecall_mods = params.basecall_mods
basecall_config = Channel.value(params.basecall_config) basecall_config = params.basecall_config
basecall_trim = Channel.value(params.basecall_trim) basecall_trim = Channel.value(params.basecall_trim)
qscore_thresh = Channel.value(params.qscore_thresh) qscore_thresh = Channel.value(params.qscore_thresh)
barcoding_kit = Channel.value(params.barcoding_kit) barcoding_kit = Channel.value(params.barcoding_kit)
......
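The `map` closures in the hunk above build each sample ID from pieces of the file path, which is hard to read in one line. The following bash sketch reproduces the same string operations on a single path. The path, directory layout, and file name are hypothetical and chosen only for illustration; they are not taken from the repository.

```shell
# Hypothetical input file; real MinKNOW layouts and names may differ.
file="/data/run_A/sample_1/pod5_pass/PAK12345_pass_ab12cd34_ef56ab78_0.pod5"
prefix="Batch_1"   # hypothetical value for --prefix

parent="$(dirname "$file")"      # equivalent of file.parent
base="$(basename "$file")"
simple_name="${base%%.*}"        # equivalent of file.simpleName (no extension)

# Split the parent directory on "/" and the simple name on "_".
IFS='/' read -r -a dir_parts <<< "$parent"
IFS='_' read -r -a name_parts <<< "$simple_name"
n=${#dir_parts[@]}
m=${#name_parts[@]}

# First field of the name plus the third-to-last and second-to-last fields,
# mirroring split('_')[0] and split('_')[-3..-2].join("_").
name_id="${name_parts[0]}_${name_parts[m-3]}_${name_parts[m-2]}"

# Without a prefix the closure uses the directory three levels from the end.
id_no_prefix="${dir_parts[n-3]}_${name_id}"
# With a prefix it uses the directory two levels from the end instead.
id_with_prefix="${prefix}_${dir_parts[n-2]}_${name_id}"

echo "$id_no_prefix"
echo "$id_with_prefix"
```

Note that the two branches index different directory levels (`[-3]` without a prefix, `[-2]` with one), so the same input tree yields IDs rooted at different folders depending on whether `--prefix` is set.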
...@@ -24,9 +24,7 @@ process BASECALL { ...@@ -24,9 +24,7 @@ process BASECALL {
input: input:
tuple val(id), path(pod5_dir) tuple val(id), path(pod5_dir)
val basecall_speed val basecall_arg
val basecall_mods
val basecall_config
val basecall_trim val basecall_trim
val qscore_thresh val qscore_thresh
val barcoding_kit val barcoding_kit
...@@ -42,30 +40,12 @@ process BASECALL { ...@@ -42,30 +40,12 @@ process BASECALL {
script: script:
""" """
echo "Basecalling started for: ${id}" echo "Basecalling started for: ${id}"
if [[ "${basecall_config}" == "None" ]]; then dorado basecaller "${basecall_arg}" . \
if [[ "${basecall_mods}" == "None" ]]; then ${barcoding_kit != null ? "--kit-name ${barcoding_kit}" : ""} \
dorado basecaller "${basecall_speed}" . \ --trim "${basecall_trim}" \
${barcoding_kit != "None" ? "--kit-name ${barcoding_kit}" : ""} \ --min-qscore "${qscore_thresh}" \
--trim "${basecall_trim}" \ --reference "${reference_file}" \
--min-qscore "${qscore_thresh}" \ --device "cuda:${gpu_devices}" > "${id}.bam"
--reference "${reference_file}" \
--device "cuda:${gpu_devices}" > "${id}.bam"
else
dorado basecaller "${basecall_speed},${basecall_mods}" . \
${barcoding_kit != "None" ? "--kit-name ${barcoding_kit}" : ""} \
--trim "${basecall_trim}" \
--min-qscore "${qscore_thresh}" \
--reference "${reference_file}" \
--device "cuda:${gpu_devices}" > "${id}.bam"
fi
else
dorado basecaller "${basecall_config}" . \
${barcoding_kit != "None" ? "--kit-name ${barcoding_kit}" : ""} \
--trim "${basecall_trim}" \
--min-qscore "${qscore_thresh}" \
--reference "${reference_file}" \
--device "cuda:${gpu_devices}" > "${id}.bam"
fi
echo "Basecalling completed, sorting bams..." echo "Basecalling completed, sorting bams..."
samtools sort -@ ${samtools_threads} "${id}.bam" -o "${id}_sorted.bam" samtools sort -@ ${samtools_threads} "${id}.bam" -o "${id}_sorted.bam"
...@@ -73,7 +53,7 @@ process BASECALL { ...@@ -73,7 +53,7 @@ process BASECALL {
mv "${id}_sorted.bam" "${id}.bam" mv "${id}_sorted.bam" "${id}.bam"
echo "Bams sorted, demultiplexing..." echo "Bams sorted, demultiplexing..."
if [[ "${trimmed_barcodes}" == "True" ]]; then if [[ ${trimmed_barcodes} ]]; then
echo "Demultiplexing with barcode trimming..." echo "Demultiplexing with barcode trimming..."
dorado demux --output-dir "./demux_data/" --no-classify "${id}.bam" dorado demux --output-dir "./demux_data/" --no-classify "${id}.bam"
else else
......
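One point worth checking in the new `if [[ ${trimmed_barcodes} ]]; then` line: in bash, a bare `[[ word ]]` only tests whether the word is a non-empty string. Assuming Nextflow renders the boolean parameter as the text `true` or `false` inside the script block, both values would take the first branch. The sketch below demonstrates the bash behaviour in isolation; the shell variable is a stand-in for the interpolated value, not code from the pipeline.

```shell
# Stand-in for the text that would be substituted into the script block.
trimmed_barcodes="false"

# Bare test: succeeds for any non-empty string, including "false".
if [[ ${trimmed_barcodes} ]]; then
    bare_result="trim"
else
    bare_result="no_trim"
fi

# Explicit comparison: distinguishes "true" from "false".
if [[ "${trimmed_barcodes}" == "true" ]]; then
    explicit_result="trim"
else
    explicit_result="no_trim"
fi

echo "bare test:     ${bare_result}"
echo "explicit test: ${explicit_result}"
```

If the rendered value is indeed the literal string, comparing against `"true"` (as in the second test) would keep the two branches distinct.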
...@@ -58,7 +58,7 @@ process CONVERT_INPUT_FROM_MINKNOW_NOT_BARCODED { ...@@ -58,7 +58,7 @@ process CONVERT_INPUT_FROM_MINKNOW_NOT_BARCODED {
script: script:
""" """
# Define the input directory path # Define the input directory path
input_dir="${input.toString()}" input_dir="${input}"
# Check if the input directory exists # Check if the input directory exists
if [ -d "\${input_dir}" ]; then if [ -d "\${input_dir}" ]; then
echo "Input directory exists." echo "Input directory exists."
......
...@@ -8,9 +8,9 @@ params { ...@@ -8,9 +8,9 @@ params {
// Project name (used to identify which project you're working on) // Project name (used to identify which project you're working on)
project_name = "default" project_name = "default"
// Input reference fasta file // Input reference fasta file
reference_file = "None" reference_file = null
// Step of pipeline to execute // Step of pipeline to execute
step = "None" step = null
// Output directory for pipeline results // Output directory for pipeline results
out_dir = "results_${params.project_name}/" out_dir = "results_${params.project_name}/"
// directory of basecalling data // directory of basecalling data
...@@ -23,24 +23,24 @@ params { ...@@ -23,24 +23,24 @@ params {
basecall_speed = "sup@latest" basecall_speed = "sup@latest"
// Desired basecaller modifications (4mC_5mC, 5mCG_5hmCG, 5mC_5hmC, 6mA). Can't use more than one modification per nucleotide. // Desired basecaller modifications (4mC_5mC, 5mCG_5hmCG, 5mC_5hmC, 6mA). Can't use more than one modification per nucleotide.
basecall_mods = "5mC_5hmC" basecall_mods = "5mC_5hmC"
// Kit name (kit used to barcode the samples (e.g. SQK-RBK114-24); Use "None" to skip --kit-name in basecalling) // Kit name (kit used to barcode the samples (e.g. SQK-RBK114-24); Use null to skip --kit-name in basecalling)
barcoding_kit = "SQK-RBK114-24" barcoding_kit = "SQK-RBK114-24"
// Threshold for mapped reasds // Threshold for mapped reasds
min_mapped_reads_thresh = 500 min_mapped_reads_thresh = 500
// Desired basecall model version as a path (e.g. ./models/dna_r10.4.1_e8.2_400bps_sup@v5.2.0) // Desired basecall model version as a path (e.g. ./models/dna_r10.4.1_e8.2_400bps_sup@v5.2.0)
basecall_config = "None" basecall_config = null
// Type of read trimming during basecalling ("all", "primers", "adapters", "none"); You should change to "none" if you don't want to trim in the basecalling // Type of read trimming during basecalling ("all", "primers", "adapters", "none"); You should change to "none" if you don't want to trim in the basecalling
basecall_trim = "all" basecall_trim = "all"
// Basecalling demultiplexing // Basecalling demultiplexing
basecall_demux = false basecall_demux = false
// Barcodes were trimmed? (if True = demux will only separate the files; if False = demux will trim after basecalling and separate them) // Barcodes were trimmed? (if True = demux will only separate the files; if False = demux will trim after basecalling and separate them)
trimmed_barcodes = "True" trimmed_barcodes = true
// Add prefix to all output files // Add prefix to all output files
prefix = "None" prefix = null
// Which GPU devices to use for basecalling? // Which GPU devices to use for basecalling?
gpu_devices = "all" gpu_devices = "all"
// Previous results // Previous results
steps_2_and_3_input_directory = "None" steps_2_and_3_input_directory = null
// MultiQC config // MultiQC config
multiqc_config = "./references/multiqc_config.yaml" multiqc_config = "./references/multiqc_config.yaml"
// Are the files from MinKNOW barcoded or not // Are the files from MinKNOW barcoded or not
......
...@@ -18,8 +18,18 @@ workflow BASECALLING { ...@@ -18,8 +18,18 @@ workflow BASECALLING {
main: main:
FAST5_to_POD5(fast5_path, pod5_threads) FAST5_to_POD5(fast5_path, pod5_threads)
pod5_path = FAST5_to_POD5.out.mix(pod5_path) pod5_path = FAST5_to_POD5.out.mix(pod5_path)
BASECALL(pod5_path, basecall_speed, basecall_mods, basecall_config, basecall_trim, qscore_thresh, barcoding_kit, trimmed_barcodes, gpu_devices, reference_file, samtools_threads) // Saves model, modifications and speed on a separate variable for the basecall
basecall_arg = null
if(basecall_config != null) {
basecall_arg = basecall_config
} else if(basecall_mods != null) {
basecall_arg = "${basecall_speed},${basecall_mods}"
} else {
basecall_arg = basecall_speed
}
basecall_arg = Channel.value(basecall_arg)
BASECALL(pod5_path, basecall_arg, basecall_trim, qscore_thresh, barcoding_kit, trimmed_barcodes, gpu_devices, reference_file, samtools_threads)
bams = BASECALL.out.bam.toSortedList { a, b -> a[0] <=> b[0] }.flatten().buffer(size: 2) bams = BASECALL.out.bam.toSortedList { a, b -> a[0] <=> b[0] }.flatten().buffer(size: 2)
txts = BASECALL.out.txt.toSortedList { a, b -> a.baseName <=> b.baseName }.flatten() txts = BASECALL.out.txt.toSortedList { a, b -> a.baseName <=> b.baseName }.flatten()
......
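The new `basecall_arg` selection in the `BASECALLING` workflow replaces the three nested bash branches that were removed from the `BASECALL` process. Its precedence (explicit config first, then speed plus modifications, then speed alone) can be sketched in bash as follows. The function name `select_basecall_arg` is hypothetical, and an empty string stands in for Groovy's `null`.

```shell
# Hypothetical helper mirroring the Groovy if/else chain in the workflow.
# Arguments: speed, mods, config (pass "" where the pipeline would use null).
select_basecall_arg() {
    local speed="$1" mods="$2" config="$3"
    if [[ -n "$config" ]]; then
        # An explicit model path or config overrides everything else.
        printf '%s\n' "$config"
    elif [[ -n "$mods" ]]; then
        # Speed and modifications are joined with a comma.
        printf '%s\n' "${speed},${mods}"
    else
        printf '%s\n' "$speed"
    fi
}

# Values taken from the defaults shown in the config hunk of this diff.
default_arg="$(select_basecall_arg "sup@latest" "5mC_5hmC" "")"
no_mods_arg="$(select_basecall_arg "sup@latest" "" "")"
config_arg="$(select_basecall_arg "sup@latest" "5mC_5hmC" "./models/dna_r10.4.1_e8.2_400bps_sup@v5.2.0")"

echo "$default_arg"
echo "$no_mods_arg"
echo "$config_arg"
```

With the defaults in `nextflow.config` (`basecall_config = null`, `basecall_mods = "5mC_5hmC"`), the middle branch applies, so the value passed to `dorado basecaller` would be `sup@latest,5mC_5hmC`.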