Commits on Source (7)
......@@ -87,7 +87,7 @@ NextFlow pipeline used by the Developmental Cognitive Neuroscience Lab (DCNL) to
## Pipeline parameters
The default values for all parameters are set in `src/nexflow.config`. Please notice that it's required to overwrite a few because they depend on the procedure you need to run (e.g., `--step`). The specifics of each are described next.
The default values for all parameters are set in `src/nextflow.config`. Please notice that it's required to overwrite a few because they depend on the procedure you need to run (e.g., `--step`). The specifics of each are described next.
### Global options
......@@ -100,7 +100,7 @@ The default values for all parameters are set in `src/nexflow.config`. Please no
```txt
--step
<Type: String. Options are "1", "2_from_step_1", "2_from_minknow", "3". Select 1 for basecalling. Select "2_from_step_1" for alignment filtering and quality control continuing after executing step 1 of this pipeline. Select "2_from_minknow" for alignment filtering and quality control from input basecalled using MinKNOW. Select "3" for methylation call pre-processing with modkit and generating a thorough multiQC report for sequencing stats. This parameter needs to be set for the pipeline to run. Default: "None">
<Type: String. Options are "1", "2_from_step_1", "2_from_minknow", "3". Select 1 for basecalling. Select "2_from_step_1" for alignment filtering and quality control continuing after executing step 1 of this pipeline. Select "2_from_minknow" for alignment filtering and quality control from input basecalled using MinKNOW. Select "3" for methylation call pre-processing with modkit and generating a thorough multiQC report for sequencing stats. This parameter needs to be set for the pipeline to run. Default: null>
```
```txt
......@@ -112,13 +112,13 @@ The default values for all parameters are set in `src/nexflow.config`. Please no
```txt
--steps_2_and_3_input_directory
<Type: Path. When performing a step other than 1, this parameter must be set to the output path of the step 1. Example: "./results/<out_dir>". Default = "None">
<Type: Path. When performing a step other than 1, this parameter must be set to the output path of step 1. Example: "./results/<out_dir>". Default: null>
```
```txt
--prefix
<Type: String. Adds a prefix to the beggining of your filenames, good when wanting to keep track of batches of data. Example: "Batch_1". Default: "None">
<Type: String. Adds a prefix to the beginning of your filenames, useful for keeping track of batches of data. Example: "Batch_1". Default: null>
```
```txt
......@@ -158,7 +158,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [
```txt
--basecall_config
<Type: String Configuration name for basecalling setting. This is not necessary since dorado is able to automatically determine the appropriate configuration. Default: "None">
<Type: String. Configuration name for the basecalling setting. This is not necessary since dorado is able to automatically determine the appropriate configuration. Default: null>
```
```txt
......@@ -170,7 +170,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [
```txt
--barcoding_kit
<Type: String. Kit name used to barcode the samples. Use "None" to skip --kit-name in basecalling. Default: "SQK-RBK114-24">
<Type: String. Kit name used to barcode the samples. Use null to skip --kit-name in basecalling. Default: "SQK-RBK114-24">
```
```txt
......@@ -182,13 +182,13 @@ Many of the parameters for this step are based on dorado basecaller, see their [
```txt
--basecall_demux
<Type: Boolean. "True", "False". Whether you want the data to be demultiplexed setting it to "True" will perform demultiplexing. Default: false>
<Type: Boolean. "true", "false". Whether to demultiplex the data; setting it to "true" performs demultiplexing. Default: false>
```
```txt
--trimmed_barcodes
<Type: Boolean. "True", "False". Only relevant is --demux is set to "True". if set to "True" barcodes will be trimmed during demultiplexing and will not be present in output "fastq" files. Default: "False">
<Type: Boolean. "true", "false". Only relevant if --basecall_demux is set to "true". If set to "true", barcodes will be trimmed during demultiplexing and will not be present in the output "fastq" files. Default: true>
```
```txt
......@@ -226,7 +226,7 @@ Many of the parameters for this step are based on dorado basecaller, see their [
```txt
--is_barcoded
<Type: Boolean. Only applies if performing "step_2_from_minknow", this parameter will be ignore for "step_2_from_step_1". If is_barcoded is set to "True" the files will be grouped by barcode, otherwise all files will be grouped by sequencing run regardless of barcode. Default: True>
<Type: Boolean. Only applies if performing "step_2_from_minknow"; this parameter will be ignored for "step_2_from_step_1". If is_barcoded is set to true, the files will be grouped by barcode; otherwise all files will be grouped by sequencing run regardless of barcode. Default: true>
```
### Step 3: Methylation Calling and MultiQC
......@@ -310,10 +310,8 @@ The following examples assume your current directory is the root directory of th
--gpu_devices "all" \
--basecall_mods "5mC_5hmC" \
--qscore_thresh 9 \
--basecall_config "False" \
--basecall_trim "none" \
--basecall_compute "gpu" \
--basecall_demux "False" \
--queue_size 1 \
--out_dir "$OUTPUT_DIR_NAME"
```
......@@ -335,7 +333,7 @@ The following examples assume your current directory is the root directory of th
nextflow ./src/main.nf \
--steps_2_and_3_input_directory "./results/$OUTPUT_DIR_NAME/" \
--min_mapped_reads_thresh 500 \
--is_barcoded "True" \
--is_barcoded \
--qscore_thresh 9 \
--mapq 10 \
--step "2_from_step_1"
......
......@@ -56,16 +56,16 @@ workflow {
samtools_threads = Channel.value(params.samtools_threads)
// step conditionals
if (params.step == 1) {
if (params.prefix == "None") {
if (params.prefix == null) {
fast5_path = Channel.fromPath("${params.basecall_path}/**.fast5").map{file -> tuple(file.parent.toString().split("/")[-3] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
pod5_path = Channel.fromPath("${params.basecall_path}/**.pod5").map{file -> tuple(file.parent.toString().split("/")[-3] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
} else {
fast5_path = Channel.fromPath("${params.basecall_path}/**.fast5").map{file -> tuple("${params.prefix}_" + file.parent.toString().split("/")[-2] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
pod5_path = Channel.fromPath("${params.basecall_path}/**.pod5").map{file -> tuple("${params.prefix}_" + file.parent.toString().split("/")[-2] + "_" + file.simpleName.split('_')[0] + "_" + file.simpleName.split('_')[-3..-2].join("_"), file) }.groupTuple()
}
basecall_speed = Channel.value(params.basecall_speed)
basecall_mods = Channel.value(params.basecall_mods)
basecall_config = Channel.value(params.basecall_config)
basecall_speed = params.basecall_speed
basecall_mods = params.basecall_mods
basecall_config = params.basecall_config
basecall_trim = Channel.value(params.basecall_trim)
qscore_thresh = Channel.value(params.qscore_thresh)
barcoding_kit = Channel.value(params.barcoding_kit)
......
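The sample ID built in the `workflow` hunk above is easier to follow with a concrete path. The following is a minimal bash sketch of the same splitting rule, not the pipeline's own code: the function name `derive_id` and the example path are hypothetical, and an empty second argument stands in for `params.prefix == null`.

```bash
#!/usr/bin/env bash
# Sketch of the ID rule from the workflow hunk, re-expressed in bash.
# derive_id and the example path are illustrative assumptions.

derive_id() {
  local path="$1" prefix="${2:-}"
  local parent="${path%/*}"     # directory containing the file
  local name="${path##*/}"      # file name including extensions
  local simple="${name%%.*}"    # name up to the first dot (like simpleName)
  local -a dirs parts
  IFS='/' read -r -a dirs <<< "$parent"
  IFS='_' read -r -a parts <<< "$simple"
  local n=${#dirs[@]} m=${#parts[@]}
  # first name field plus the third- and second-from-last name fields
  local tail="${parts[0]}_${parts[m-3]}_${parts[m-2]}"
  if [[ -z "$prefix" ]]; then
    # no prefix: third-from-last directory component, as in split("/")[-3]
    echo "${dirs[n-3]}_${tail}"
  else
    # with prefix: second-from-last directory component, as in split("/")[-2]
    echo "${prefix}_${dirs[n-2]}_${tail}"
  fi
}

example="/data/run1/sampleA/pod5_pass/PAO12345_pass_abcd1234_ef567890_0.pod5"
derive_id "$example"             # prints run1_PAO12345_abcd1234_ef567890
derive_id "$example" "Batch_1"   # prints Batch_1_sampleA_PAO12345_abcd1234_ef567890
```

Note that the two branches pick different directory levels (`[-3]` without a prefix, `[-2]` with one), so the same file yields a different directory component depending on whether `--prefix` is set.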
......@@ -24,9 +24,7 @@ process BASECALL {
input:
tuple val(id), path(pod5_dir)
val basecall_speed
val basecall_mods
val basecall_config
val basecall_arg
val basecall_trim
val qscore_thresh
val barcoding_kit
......@@ -42,30 +40,12 @@ process BASECALL {
script:
"""
echo "Basecalling started for: ${id}"
if [[ "${basecall_config}" == "None" ]]; then
if [[ "${basecall_mods}" == "None" ]]; then
dorado basecaller "${basecall_speed}" . \
${barcoding_kit != "None" ? "--kit-name ${barcoding_kit}" : ""} \
--trim "${basecall_trim}" \
--min-qscore "${qscore_thresh}" \
--reference "${reference_file}" \
--device "cuda:${gpu_devices}" > "${id}.bam"
else
dorado basecaller "${basecall_speed},${basecall_mods}" . \
${barcoding_kit != "None" ? "--kit-name ${barcoding_kit}" : ""} \
--trim "${basecall_trim}" \
--min-qscore "${qscore_thresh}" \
--reference "${reference_file}" \
--device "cuda:${gpu_devices}" > "${id}.bam"
fi
else
dorado basecaller "${basecall_config}" . \
${barcoding_kit != "None" ? "--kit-name ${barcoding_kit}" : ""} \
--trim "${basecall_trim}" \
--min-qscore "${qscore_thresh}" \
--reference "${reference_file}" \
--device "cuda:${gpu_devices}" > "${id}.bam"
fi
dorado basecaller "${basecall_arg}" . \
${barcoding_kit != null ? "--kit-name ${barcoding_kit}" : ""} \
--trim "${basecall_trim}" \
--min-qscore "${qscore_thresh}" \
--reference "${reference_file}" \
--device "cuda:${gpu_devices}" > "${id}.bam"
echo "Basecalling completed, sorting bams..."
samtools sort -@ ${samtools_threads} "${id}.bam" -o "${id}_sorted.bam"
......@@ -73,7 +53,7 @@ process BASECALL {
mv "${id}_sorted.bam" "${id}.bam"
echo "Bams sorted, demultiplexing..."
if [[ "${trimmed_barcodes}" == "True" ]]; then
if [[ "${trimmed_barcodes}" == "true" ]]; then
echo "Demultiplexing with barcode trimming..."
dorado demux --output-dir "./demux_data/" --no-classify "${id}.bam"
else
......
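When a Nextflow boolean such as `trimmed_barcodes` is interpolated into a bash script block, the shell only sees the literal word `true` or `false`. The sketch below, with hypothetical helper names, shows how bash evaluates a bare word inside `[[ ]]` compared with an explicit string comparison; it illustrates shell semantics only and is not taken from the pipeline.

```bash
#!/usr/bin/env bash
# Both helpers receive the text that Nextflow would have substituted.
# The function names are illustrative assumptions.

is_nonempty() {          # mirrors: if [[ <word> ]]; then
  if [[ $1 ]]; then echo "yes"; else echo "no"; fi
}

is_literal_true() {      # mirrors: if [[ "<word>" == "true" ]]; then
  if [[ "$1" == "true" ]]; then echo "yes"; else echo "no"; fi
}

for value in true false; do
  echo "value=${value} bare-word=$(is_nonempty "$value") compare=$(is_literal_true "$value")"
done
# prints:
# value=true bare-word=yes compare=yes
# value=false bare-word=yes compare=no
```

A bare word inside `[[ ]]` is a non-empty-string test, so the word `false` still takes the `then` branch; comparing against `"true"` is what distinguishes the two values.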
......@@ -58,7 +58,7 @@ process CONVERT_INPUT_FROM_MINKNOW_NOT_BARCODED {
script:
"""
# Define the input directory path
input_dir="${input.toString()}"
input_dir="${input}"
# Check if the input directory exists
if [ -d "\${input_dir}" ]; then
echo "Input directory exists."
......
......@@ -8,9 +8,9 @@ params {
// Project name (used to identify which project you're working on)
project_name = "default"
// Input reference fasta file
reference_file = "None"
reference_file = null
// Step of pipeline to execute
step = "None"
step = null
// Output directory for pipeline results
out_dir = "results_${params.project_name}/"
// directory of basecalling data
......@@ -23,24 +23,24 @@ params {
basecall_speed = "sup@latest"
// Desired basecaller modifications (4mC_5mC, 5mCG_5hmCG, 5mC_5hmC, 6mA). Can't use more than one modification per nucleotide.
basecall_mods = "5mC_5hmC"
// Kit name (kit used to barcode the samples (e.g. SQK-RBK114-24); Use "None" to skip --kit-name in basecalling)
// Kit name (kit used to barcode the samples (e.g. SQK-RBK114-24); Use null to skip --kit-name in basecalling)
barcoding_kit = "SQK-RBK114-24"
// Threshold for mapped reads
min_mapped_reads_thresh = 500
// Desired basecall model version as a path (e.g. ./models/dna_r10.4.1_e8.2_400bps_sup@v5.2.0)
basecall_config = "None"
basecall_config = null
// Type of read trimming during basecalling ("all", "primers", "adapters", "none"); You should change to "none" if you don't want to trim in the basecalling
basecall_trim = "all"
// Basecalling demultiplexing
basecall_demux = false
// Barcodes were trimmed? (if True = demux will only separate the files; if False = demux will trim after basecalling and separate them)
trimmed_barcodes = "True"
trimmed_barcodes = true
// Add prefix to all output files
prefix = "None"
prefix = null
// Which GPU devices to use for basecalling?
gpu_devices = "all"
// Previous results
steps_2_and_3_input_directory = "None"
steps_2_and_3_input_directory = null
// MultiQC config
multiqc_config = "./references/multiqc_config.yaml"
// Are the files from MinKNOW barcoded or not
......
......@@ -18,8 +18,18 @@ workflow BASECALLING {
main:
FAST5_to_POD5(fast5_path, pod5_threads)
pod5_path = FAST5_to_POD5.out.mix(pod5_path)
BASECALL(pod5_path, basecall_speed, basecall_mods, basecall_config, basecall_trim, qscore_thresh, barcoding_kit, trimmed_barcodes, gpu_devices, reference_file, samtools_threads)
pod5_path = FAST5_to_POD5.out.mix(pod5_path)
// Saves model, modifications and speed on a separate variable for the basecall
basecall_arg = null
if(basecall_config != null) {
basecall_arg = basecall_config
} else if(basecall_mods != null) {
basecall_arg = "${basecall_speed},${basecall_mods}"
} else {
basecall_arg = basecall_speed
}
basecall_arg = Channel.value(basecall_arg)
BASECALL(pod5_path, basecall_arg, basecall_trim, qscore_thresh, barcoding_kit, trimmed_barcodes, gpu_devices, reference_file, samtools_threads)
bams = BASECALL.out.bam.toSortedList { a, b -> a[0] <=> b[0] }.flatten().buffer(size: 2)
txts = BASECALL.out.txt.toSortedList { a, b -> a.baseName <=> b.baseName }.flatten()
......
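The precedence used to build `basecall_arg` in the `BASECALLING` hunk above (explicit config first, then speed plus modifications, then speed alone) can be checked in isolation. This is a minimal bash sketch under stated assumptions: `select_basecall_arg` is a hypothetical name, and an empty string stands in for Groovy `null`. The values are the defaults and the example model path shown in `nextflow.config`.

```bash
#!/usr/bin/env bash
# Re-expresses the basecall_arg selection as a shell function.
# select_basecall_arg is an illustrative assumption, not pipeline code.

select_basecall_arg() {
  local speed="$1" mods="${2:-}" config="${3:-}"
  if [[ -n "$config" ]]; then
    echo "$config"              # an explicit model/config overrides everything
  elif [[ -n "$mods" ]]; then
    echo "${speed},${mods}"     # speed and modifications joined by a comma
  else
    echo "$speed"               # speed alone
  fi
}

select_basecall_arg "sup@latest" "5mC_5hmC"
# prints sup@latest,5mC_5hmC
select_basecall_arg "sup@latest"
# prints sup@latest
select_basecall_arg "sup@latest" "5mC_5hmC" "./models/dna_r10.4.1_e8.2_400bps_sup@v5.2.0"
# prints ./models/dna_r10.4.1_e8.2_400bps_sup@v5.2.0
```

Resolving the argument once in the workflow, rather than branching inside the process script, leaves a single `dorado basecaller` invocation to maintain.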