Nicolas Cendron · Carlos Gomes
--- a/README.md
+++ b/README.md
@@ -87,7 +87,7 @@ NextFlow pipeline used by the Developmental Cognitive Neuroscience Lab (DCNL) to

 ## Pipeline parameters

-The default values for all parameters are set in `src/nexflow.config`. Please notice that it's required to overwrite a few because they depend on the procedure you need to run (e.g., `--step`). The specifics of each are described next.
+The default values for all parameters are set in `src/nextflow.config`. Note that a few must be overridden because they depend on the procedure you need to run (e.g., `--step`). Each parameter is described below.

 ### Global options

@@ -100,7 +100,7 @@ The default values for all parameters are set in `src/nexflow.config`. Please no
 ```txt
 --step

-<Type: String. Options are "1", "2_from_step_1", "2_from_minknow", "3". Select 1 for basecalling. Select "2_from_step_1" for alignment filtering and quality control continuing after executing step 1 of this pipeline. Select "2_from_minknow" for alignment filtering and quality control from input basecalled using MinKNOW. Select "3" for methylation call pre-processing with modkit and generating a thorough multiQC report for sequencing stats. This parameter needs to be set for the pipeline to run. Default: "None">
+<Type: String. Options are "1", "2_from_step_1", "2_from_minknow", "3". Select "1" for basecalling. Select "2_from_step_1" for alignment filtering and quality control continuing after executing step 1 of this pipeline. Select "2_from_minknow" for alignment filtering and quality control from input basecalled using MinKNOW. Select "3" for methylation call pre-processing with modkit and generating a thorough multiQC report for sequencing stats. This parameter must be set for the pipeline to run. Default: null>
 ```

 ```txt
@@ -112,13 +112,13 @@ The default values for all parameters are set in `src/nexflow.config`. Please no
 ```txt
 --steps_2_and_3_input_directory

-<Type: Path. When performing a step other than 1, this parameter must be set to the output path of the step 1. Example: "./results/<out_dir>". Default = "None">
+<Type: Path. When performing a step other than 1, this parameter must be set to the output path of step 1. Example: "./results/<out_dir>". Default: null>
 ```

 ```txt
 --prefix

-<Type: String. Adds a prefix to the beggining of your filenames, good when wanting to keep track of batches of data. Example: "Batch_1". Default: "None">
+<Type: String. Adds a prefix to the beginning of your filenames, useful for keeping track of batches of data. Example: "Batch_1". Default: null>
 ```

 ```txt
@@ -158,7 +158,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [
 ```txt
 --basecall_config

-<Type: String Configuration name for basecalling setting. This is not necessary since dorado  is able to automatically determine the appropriate configuration. Default: "None">
+<Type: String. Configuration name for the basecalling setting. This is not necessary since dorado is able to automatically determine the appropriate configuration. Default: null>
 ```

 ```txt
@@ -170,7 +170,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [
 ```txt
 --barcoding_kit

-<Type: String. Kit name used to barcode the samples. Use "None" to skip --kit-name in basecalling. Default: "SQK-RBK114-24">
+<Type: String. Kit name used to barcode the samples. Use null to skip --kit-name in basecalling. Default: "SQK-RBK114-24">
 ```

 ```txt
@@ -182,13 +182,13 @@ Many of the parameters for this step are based on dorado basecaller, see their [
 ```txt
 --basecall_demux

-<Type: Boolean. "True", "False". Whether you want the data to be demultiplexed setting it to "True" will perform demultiplexing. Default: false>
+<Type: Boolean. Options are "true", "false". Whether the data should be demultiplexed; setting it to "true" performs demultiplexing. Default: false>
 ```

 ```txt
 --trimmed_barcodes

-<Type: Boolean. "True", "False". Only relevant is --demux is set to "True". if set to "True" barcodes will be trimmed during demultiplexing and will not be present in output "fastq" files. Default: "False">
+<Type: Boolean. Options are "true", "false". Only relevant if --demux is set to "true". If set to "true", barcodes will be trimmed during demultiplexing and will not be present in the output "fastq" files. Default: true>
 ```

 ```txt
@@ -226,7 +226,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [
 ```txt
 --is_barcoded

-<Type: Boolean. Only applies if performing "step_2_from_minknow", this parameter will be ignore for "step_2_from_step_1". If is_barcoded is set to "True" the files will be grouped by barcode, otherwise all files will be grouped by sequencing run regardless of barcode. Default: True>
+<Type: Boolean. Only applies if performing "step_2_from_minknow"; this parameter will be ignored for "step_2_from_step_1". If is_barcoded is set to "true", the files will be grouped by barcode; otherwise all files will be grouped by sequencing run regardless of barcode. Default: true>
 ```

 ### Step 3: Methylation Calling and MultiQC
@@ -310,10 +310,8 @@ The following examples assume your current directory is the root directory of th
            --gpu_devices "all" \
            --basecall_mods "5mC_5hmC" \
            --qscore_thresh 9 \
-            --basecall_config "False" \
            --basecall_trim "none" \
            --basecall_compute "gpu" \
-            --basecall_demux "False" \
            --queue_size 1 \
            --out_dir "$OUTPUT_DIR_NAME"
    ```
@@ -335,7 +333,7 @@ The following examples assume your current directory is the root directory of th
    nextflow ./src/main.nf \
              --steps_2_and_3_input_directory "./results/$OUTPUT_DIR_NAME/" \
              --min_mapped_reads_thresh 500 \
-              --is_barcoded "True" \
+              --is_barcoded \
              --qscore_thresh 9 \
              --mapq 10 \
              --step "2_from_step_1"

--- a/src/main.nf
+++ b/src/main.nf
@@ -56,16 +56,16 @@ workflow {
    samtools_threads = Channel.value(params.samtools_threads)
    // step conditionals
    if (params.step == 1) {
-        if (params.prefix == "None") {
+        if (params.prefix == null) {
            fast5_path = Channel.fromPath("${params.basecall_path}/**.fast5").map{file -> tuple(file.parent.toString().split("/")[-3] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
            pod5_path = Channel.fromPath("${params.basecall_path}/**.pod5").map{file -> tuple(file.parent.toString().split("/")[-3] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
        } else {
            fast5_path = Channel.fromPath("${params.basecall_path}/**.fast5").map{file -> tuple("${params.prefix}_" + file.parent.toString().split("/")[-2] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
            pod5_path = Channel.fromPath("${params.basecall_path}/**.pod5").map{file -> tuple("${params.prefix}_" +  file.parent.toString().split("/")[-2] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
        }
-        basecall_speed = Channel.value(params.basecall_speed)
-        basecall_mods = Channel.value(params.basecall_mods)
-        basecall_config = Channel.value(params.basecall_config)
+        basecall_speed = params.basecall_speed
+        basecall_mods = params.basecall_mods
+        basecall_config = params.basecall_config
        basecall_trim = Channel.value(params.basecall_trim)
        qscore_thresh = Channel.value(params.qscore_thresh)
        barcoding_kit = Channel.value(params.barcoding_kit)

--- a/src/modules/basecall.nf
+++ b/src/modules/basecall.nf
@@ -24,9 +24,7 @@ process BASECALL {

    input:
        tuple val(id), path(pod5_dir)
-        val basecall_speed
-        val basecall_mods
-        val basecall_config
+        val basecall_arg
        val basecall_trim
        val qscore_thresh
        val barcoding_kit
@@ -42,30 +40,12 @@ process BASECALL {
    script:
        """
        echo "Basecalling started for: ${id}"
-        if [[ "${basecall_config}" == "None" ]]; then
-            if [[ "${basecall_mods}" == "None" ]]; then
-                dorado basecaller "${basecall_speed}" . \
-                ${barcoding_kit != "None" ? "--kit-name ${barcoding_kit}" : ""} \
-                --trim "${basecall_trim}" \
-                --min-qscore "${qscore_thresh}" \
-                --reference "${reference_file}" \
-                --device "cuda:${gpu_devices}" > "${id}.bam" 
-            else
-                dorado basecaller "${basecall_speed},${basecall_mods}" . \
-                ${barcoding_kit != "None" ? "--kit-name ${barcoding_kit}" : ""} \
-                --trim "${basecall_trim}" \
-                --min-qscore "${qscore_thresh}" \
-                --reference "${reference_file}" \
-                --device "cuda:${gpu_devices}" > "${id}.bam"
-            fi
-        else
-                dorado basecaller "${basecall_config}" . \
-                ${barcoding_kit != "None" ? "--kit-name ${barcoding_kit}" : ""} \
-                --trim "${basecall_trim}" \
-                --min-qscore "${qscore_thresh}" \
-                --reference "${reference_file}" \
-                --device "cuda:${gpu_devices}" > "${id}.bam"
-        fi
+        dorado basecaller "${basecall_arg}" . \
+        ${barcoding_kit != null ? "--kit-name ${barcoding_kit}" : ""} \
+        --trim "${basecall_trim}" \
+        --min-qscore "${qscore_thresh}" \
+        --reference "${reference_file}" \
+        --device "cuda:${gpu_devices}" > "${id}.bam" 

        echo "Basecalling completed, sorting bams..."
        samtools sort -@ ${samtools_threads} "${id}.bam" -o "${id}_sorted.bam"
@@ -73,7 +53,7 @@ process BASECALL {
        mv "${id}_sorted.bam" "${id}.bam"

        echo "Bams sorted, demultiplexing..."
-        if [[ "${trimmed_barcodes}" == "True" ]]; then
+        if [[ "${trimmed_barcodes}" == "true" ]]; then
            echo "Demultiplexing with barcode trimming..."
            dorado demux --output-dir "./demux_data/" --no-classify "${id}.bam"
        else
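
A note on testing an interpolated boolean in bash: Nextflow renders a Groovy boolean into the script as the bare word `true` or `false`, and `[[ word ]]` succeeds for any non-empty string, including `false`. Only a comparison against the literal `true` distinguishes the two values. The sketch below illustrates the difference. The function names and the branch labels are hypothetical, and the function argument stands in for the interpolated `trimmed_barcodes` value.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: "$1" stands in for the text Nextflow would
# interpolate for a boolean parameter ("true" or "false").

naive_check() {
    # [[ word ]] only checks that the string is non-empty
    if [[ $1 ]]; then echo "then-branch"; else echo "else-branch"; fi
}

explicit_check() {
    # Compare against the literal string to tell true from false
    if [[ "$1" == "true" ]]; then echo "then-branch"; else echo "else-branch"; fi
}

naive_check false      # prints "then-branch"
explicit_check false   # prints "else-branch"
explicit_check true    # prints "then-branch"
```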

--- a/src/modules/convert_input_from_minknow.nf
+++ b/src/modules/convert_input_from_minknow.nf
@@ -58,7 +58,7 @@ process CONVERT_INPUT_FROM_MINKNOW_NOT_BARCODED {
    script:
        """
        # Define the input directory path
-        input_dir="${input.toString()}"
+        input_dir="${input}"
        # Check if the input directory exists
        if [ -d "\${input_dir}" ]; then
            echo "Input directory exists."

--- a/src/nextflow.config
+++ b/src/nextflow.config
@@ -8,9 +8,9 @@ params {
    // Project name (used to identify which project you're working on)
    project_name = "default"
    // Input reference fasta file
-    reference_file = "None" 
+    reference_file = null
    // Step of pipeline to execute
-    step = "None"
+    step = null 
    // Output directory for pipeline results
    out_dir = "results_${params.project_name}/" 
    // directory of basecalling data
@@ -23,24 +23,24 @@ params {
    basecall_speed = "sup@latest"
    // Desired basecaller modifications (4mC_5mC, 5mCG_5hmCG, 5mC_5hmC, 6mA). Can't use more than one modification per nucleotide.
    basecall_mods = "5mC_5hmC"
-    // Kit name (kit used to barcode the samples (e.g. SQK-RBK114-24); Use "None" to skip --kit-name in basecalling)
+    // Kit name (kit used to barcode the samples (e.g. SQK-RBK114-24); Use null to skip --kit-name in basecalling)
    barcoding_kit = "SQK-RBK114-24"
    // Threshold for mapped reasds
    min_mapped_reads_thresh = 500
    // Desired basecall model version as a path (e.g. ./models/dna_r10.4.1_e8.2_400bps_sup@v5.2.0)
-    basecall_config = "None"
+    basecall_config = null
    // Type of read trimming during basecalling ("all", "primers", "adapters", "none"); You should change to "none" if you don't want to trim in the basecalling
    basecall_trim = "all"
    // Basecalling demultiplexing
    basecall_demux = false
    // Barcodes were trimmed? (if True = demux will only separate the files; if False = demux will trim after basecalling and separate them)
-    trimmed_barcodes = "True"
+    trimmed_barcodes = true
    // Add prefix to all output files
-    prefix = "None"
+    prefix = null
    // Which GPU devices to use for basecalling?
    gpu_devices = "all"
    // Previous results
-    steps_2_and_3_input_directory = "None"
+    steps_2_and_3_input_directory = null
    // MultiQC config
    multiqc_config = "./references/multiqc_config.yaml"
    // Are the files from MinKNOW barcoded or not 

--- a/src/sub_workflows/BASECALLING.nf
+++ b/src/sub_workflows/BASECALLING.nf
@@ -18,8 +18,18 @@ workflow BASECALLING {
        
    main:
        FAST5_to_POD5(fast5_path, pod5_threads)
-        pod5_path = FAST5_to_POD5.out.mix(pod5_path)       
-        BASECALL(pod5_path, basecall_speed, basecall_mods, basecall_config, basecall_trim, qscore_thresh, barcoding_kit, trimmed_barcodes, gpu_devices, reference_file, samtools_threads)
+        pod5_path = FAST5_to_POD5.out.mix(pod5_path)
+        // Combine model config, modifications and speed into a single argument for the basecaller
+        basecall_arg = null
+        if(basecall_config != null) {
+            basecall_arg = basecall_config
+        } else if(basecall_mods != null) {
+            basecall_arg = "${basecall_speed},${basecall_mods}"
+        } else {
+            basecall_arg = basecall_speed
+        }
+        basecall_arg = Channel.value(basecall_arg)
+        BASECALL(pod5_path, basecall_arg, basecall_trim, qscore_thresh, barcoding_kit, trimmed_barcodes, gpu_devices, reference_file, samtools_threads)
        bams = BASECALL.out.bam.toSortedList { a, b -> a[0] <=> b[0] }.flatten().buffer(size: 2)
        txts = BASECALL.out.txt.toSortedList { a, b -> a.baseName <=> b.baseName }.flatten()
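
The precedence used to build `basecall_arg` in this hunk (config first, then speed plus modifications, then speed alone) can be checked outside Nextflow with a small shell sketch. `pick_basecall_arg` is a hypothetical helper, not part of the pipeline, and an empty string stands in for Groovy's `null`.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the if / else-if / else chain in the workflow.
# Arguments: config, mods, speed (an empty string plays the role of null).
pick_basecall_arg() {
    local config="$1" mods="$2" speed="$3"
    if [[ -n "$config" ]]; then
        # An explicit model config wins over speed and modifications
        printf '%s\n' "$config"
    elif [[ -n "$mods" ]]; then
        # Otherwise join speed and modifications with a comma
        printf '%s,%s\n' "$speed" "$mods"
    else
        # Fall back to the speed alone
        printf '%s\n' "$speed"
    fi
}

pick_basecall_arg "" "5mC_5hmC" "sup@latest"   # prints "sup@latest,5mC_5hmC"
pick_basecall_arg "" "" "sup@latest"           # prints "sup@latest"
```

With the defaults from `src/nextflow.config` (`basecall_config = null`, `basecall_mods = "5mC_5hmC"`, `basecall_speed = "sup@latest"`), the first call corresponds to the value the workflow would pass to `BASECALL`.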