Development #34

lnblum · 2025-08-19T14:33:23Z

No description provided.

… a mac (once tasmanian is updated with an installable pandas version) modified: main.nf modified: modules/alignment.nf modified: modules/compute_statistics.nf modified: modules/methylation.nf modified: run_test.sh new file: test_data/emseq_test_regions.bed new file: test_data/reference.fa.fai

mattsoup

That's a lot! Looks good to me though, just some mostly minor comments.

.DS_Store

conf/base.config

fastq_to_ubam.nf

main.nf

mattsoup · 2025-09-09T15:12:59Z

modules/methyldackel_mbias.nf

@@ -1,26 +1,26 @@
-
-
 process methylDackel_mbias {


I think you've done some performance testing with this, but if it ends up being problematic with larger datasets this may be low-ish hanging fruit in the future.

mattsoup · 2025-09-09T15:15:10Z

modules/merge_and_mark_duplicates.nf

+
+    optical_distance=\$(echo \${inst_name} | awk '{if (\$1~/^M0|^NS|^NB/) {print 100} else {print 2500}}')
+
+    samtools merge --threads ${task.cpus} ${library}.bam ${bams}


should this be in a separate process? maybe it doesn't matter.
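For anyone reading the escaped form in the hunk above, here is a minimal sketch of the same instrument-prefix rule as plain shell, outside a Nextflow script block and therefore without the backslash escapes. The function name `optical_distance_for` is hypothetical; the prefixes and the two distances are copied from the diff. How the value is used further down is not visible in this hunk, so nothing about that is assumed here.

```shell
# Hypothetical helper (name is an assumption, not from the PR):
# map an instrument name to an optical duplicate pixel distance.
# Prefixes M0, NS, NB and the values 100 / 2500 are copied from the diff.
optical_distance_for() {
    local inst_name="$1"
    echo "${inst_name}" | awk '{ if ($1 ~ /^M0|^NS|^NB/) { print 100 } else { print 2500 } }'
}
```

With this sketch, `optical_distance_for NB501234` prints `100` and `optical_distance_for A00123` prints `2500`. Keeping the rule in a small function like this would also make it easy to move into its own process later if that turns out to matter.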

mattsoup · 2025-09-09T15:44:51Z

modules/insert_size_metrics.nf

+    picard -Xmx${task.memory.toGiga()}g CollectInsertSizeMetrics \
+        --INCLUDE_DUPLICATES --VALIDATION_STRINGENCY SILENT \
+        -I "\$good_mapq_pipe" -O ${library}.good_mapq.insert_size_metrics.txt \
+        --MINIMUM_PCT 0 -H /dev/null &
+    picard_good_mapq_pid=\$!
+
+    picard -Xmx${task.memory.toGiga()}g CollectInsertSizeMetrics \
+        --INCLUDE_DUPLICATES --VALIDATION_STRINGENCY SILENT \
+        -I "\$bad_mapq_pipe" -O ${library}.bad_mapq.insert_size_metrics.txt \
+        --MINIMUM_PCT 0 -H /dev/null &
+    picard_bad_mapq_pid=\$!
+
+    wait \$samtools_pid
+    wait \$picard_good_mapq_pid
+    wait \$picard_bad_mapq_pid


if we're having to run the tool multiple times and wait for its outputs to be done, does it seem like it should be split into a 'CollectInsertSize' process, and a 'CombineInsertSize' process?
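To make the trade-off concrete, here is a rough sketch of the fan-out pattern the hunk appears to use: one producer feeds two named pipes, two consumers run in the background, and each PID is waited on separately so a failure in any of them is noticed. This is not the PR's code. `tee`, `awk` and `wc` stand in for samtools and picard, and the function name, the two-column input format and the threshold argument are all assumptions made for illustration.

```shell
# Sketch only: count records at or above / below a MAPQ threshold in one pass.
# Input format (assumed): whitespace-separated "read_name mapq" per line.
count_by_mapq() {
    local input="$1" threshold="$2" workdir
    workdir=$(mktemp -d)
    local good_pipe="$workdir/good" bad_pipe="$workdir/bad"
    mkfifo "$good_pipe" "$bad_pipe"

    # Consumers start first; each blocks until the producer opens its pipe.
    awk -v t="$threshold" '$2 + 0 >= t + 0' "$good_pipe" | wc -l > "$workdir/good.count" &
    local good_pid=$!
    awk -v t="$threshold" '$2 + 0 < t + 0' "$bad_pipe" | wc -l > "$workdir/bad.count" &
    local bad_pid=$!

    # Producer: duplicate the input stream into both pipes.
    tee "$good_pipe" < "$input" > "$bad_pipe" &
    local producer_pid=$!

    # Wait on every background job and remember whether any of them failed.
    local status=0
    wait "$producer_pid" || status=1
    wait "$good_pid" || status=1
    wait "$bad_pid" || status=1

    if [ "$status" -eq 0 ]; then
        echo "good=$(tr -d ' ' < "$workdir/good.count") bad=$(tr -d ' ' < "$workdir/bad.count")"
    fi
    rm -rf "$workdir"
    return "$status"
}
```

The point of the pattern is that the input is read once and both consumers run concurrently. Splitting into separate collect and combine processes would be easier to follow, but each collect step would then presumably need to read the input on its own.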

mattsoup · 2025-09-09T15:45:56Z

modules/insert_size_metrics.nf

+                no_cols[i] = 0
+                if      (\$i ~ /insert_size/) {isize=i}
+                else if (\$i ~ /rf_count/)    {rf=i}
+                else if (\$i ~ /fr_count/)    {fr=i}
+                else if (\$i ~ /tandem/)      {tandem=i}
+                else                         {no_cols[i]++}
+            }
+            # columns that are not present still need to be printed (with 0 value)
+            for (i in no_cols) {
+                if (no_cols[i] > 0) {
+                    if      (! isize )  { isize = i}
+                    else if (! rf )     { rf = i}
+                    else if (! fr )     { fr = i}
+                    else if (! tandem ) { tandem = i}
+                }
+            }
+        }


i'm not exactly following this, but can we just cut/join the files together?
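One possible reason a plain cut/join is not enough is the comment in the hunk: columns that are absent from a histogram still have to be printed with a 0 value. A simpler way to get that behaviour might look like the sketch below. It is not the PR's logic: the function name and the output header are hypothetical, and it assumes a tab-separated table whose first line is the header and which always contains an `insert_size` column.

```shell
# Hypothetical helper: print insert_size, fr, rf and tandem counts from a
# tab-separated histogram, writing 0 for any count column the header lacks.
# Assumes the header always has an insert_size column.
normalize_histogram() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 {
            # Locate each column by matching its header name.
            for (i = 1; i <= NF; i++) {
                if      ($i ~ /insert_size/) { isize  = i }
                else if ($i ~ /rf_count/)    { rf     = i }
                else if ($i ~ /fr_count/)    { fr     = i }
                else if ($i ~ /tandem/)      { tandem = i }
            }
            print "insert_size", "fr_count", "rf_count", "tandem_count"
            next
        }
        {
            # A column index of 0 (unset) means the column was not in the header.
            print $isize, (fr ? $fr : 0), (rf ? $rf : 0), (tandem ? $tandem : 0)
        }
    ' "$1"
}
```

With every file normalized to the same four columns, combining them afterwards could be a straightforward paste or join keyed on `insert_size`.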

mattsoup · 2025-09-09T16:40:11Z

main.nf

+        if (params.single_end) {
+            fastq_chunks = fastp.out.trimmed_fastq
+            .flatMap { library, fq_files ->                 
+                def fq_list = fq_files instanceof List ? fq_files : [fq_files]


i'd like to play around with this a bit to see if we can't find something more elegant, but in the meantime, if it works then it's fine.

yeah it took me a while to get here but I didn't try to pare it down too much.

lnblum · 2025-09-17T17:27:54Z

modules/group_bed_intersections.nf

+    output_file="\${intersect_basename}_intersections.tsv"
+    summary_file="\${intersect_basename}_positional_summary.tsv"
+
+    echo -e "methylkit_file\\tchr\\tstart\\tend\\tcontext\\tmethylation\\ttarget_locus\\ttarget_name" > \${output_file}


@bwlang does splitting by chromosome for the position/length summary align with your current plan for this tool?

we need to break out the control contigs for detailed reporting (intersections.tsv), but not the human contigs, i think. i think there is somewhere else in this code where we do something similar. I'm planning to add the control bed regions to the bed file so we'll include those in both, but then we need to refine this output.
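As a talking point for what a per-chromosome summary could look like, here is a small sketch that groups an intersections table by its `chr` column. The column order is taken from the header line in the hunk above; the function name, the output columns, and the choice of site count plus mean methylation are assumptions, not what the PR's positional summary actually contains.

```shell
# Hypothetical helper: summarize an intersections TSV per chromosome.
# Assumes the column order from the header in the diff:
# methylkit_file, chr, start, end, context, methylation, target_locus, target_name
summarize_by_chr() {
    printf 'chr\tn_sites\tmean_methylation\n'
    awk -F'\t' '
        NR == 1 { next }                   # skip the header row
        { n[$2]++; sum[$2] += $6 }         # column 2 = chr, column 6 = methylation
        END {
            for (chr in n) { printf "%s\t%d\t%.2f\n", chr, n[chr], sum[chr] / n[chr] }
        }
    ' "$1" | sort -k1,1
}
```

Grouping on `chr` keeps control contigs on their own rows, which is the distinction the thread is after; restricting the detailed output to control contigs only would be an extra filter on the same column.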

bwlang and others added 30 commits June 15, 2025 18:06

updated readme, gitignore, config cleanup

bd5a7d1

adds bed intersection feature, updated test suite that can now run on…

c29ace8

… a mac (once tasmanian is updated with an installable pandas version)

Update modules/alignment.nf

6e5b3f4

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

docs in response to copilot review

b013bee

adds artifacts to gh actions for easier debugging

c368195

adds timeouts

557758f

Update README.md

839fde3

tiny readme changes

we don't use the chart

3c706e2

Merge branch 'master' into bed_intersect

5ad3c70

simpler handling of genome and index files, --CHART is a required arg…

b74cfb3

…ument for gc bias - switching back to picard from picard-slim

narrows artifact generation to just emseq output, a bit more time to …

2e95cfb

…complete work

moves bed processing to its own file

4254927

rest of genome index cleanup

enhance BED processing module with validation and standardization to …

91531e6

…BED6 format and simpler downstream logic

Merge branch 'master' into bed_intersect

350975a

Update README.md

3bd00a2

we are more explicit on using read1 not read{1,2}.

Spelling fix, wording update

8c504b8

removing email from channels, nextflow email sending, simpler file si…

43cb228

…ze checking, UNTESTED

accept fq.gz

cc0991d

syntax

de9fa0b

syntax

e286acd

syntax

6a062c8

syntax

7b5bdb2

syntax

61d5a4e

syntax

68c612d

original bam_fq does not provide num_reads_used

3356388

placeholder_r2 should be created before enough_reads (#33)

18365ef

* placeholder_r2 should be created before enough_reads
* ensure emseq after completion of create_placeholder
* placeholder.r2 before emseq workflow

simpler implementation of placeholder for read2

0339961

simpler handling of genome and index files, --CHART is a required arg…

85d4e4e

…ument for gc bias - switching back to picard from picard-slim modified: modules/alignment.nf modified: modules/compute_statistics.nf modified: modules/methylation.nf

moves bed processing to its own file

872698a

rest of genome index cleanup modified: main.nf new file: modules/bed_processing.nf modified: modules/methylation.nf

lnblum added 3 commits September 5, 2025 09:19

split positional summaries by chr so we can distinguish controls and …

69c7f52

…add to snapshot since it might be easy to mess up these steps with future edits

try to make this faster

6239412

add support for single end

9c6e725

lnblum requested a review from mattsoup September 8, 2025 20:40

mattsoup reviewed Sep 9, 2025

lnblum added 17 commits September 9, 2025 14:08

use process_single whenever we want 1 cpu

da9cee3

explicit param for fai

3dd7628

reduce conditionals

6e5317e

doesn't appear to be needed

ad3688d

Merge branch 'master' into development

db61264

making tests more specific, they reproduce locally

23c6e3a

this should be faster

15a6c23

resources for more data

92d7d38

for testing

ddb20b9

add target_bed file for t2t

95df7a6

taking header from the first file makes it a little more dry

8ffdba8

add workflow name modifier

2ecbd73

doesn't need to be a function if we only use it once

cfc06b9

match process name to tool name

d078391

more specific output name

cf0e629

make output file paths more specific where possible

d2fb48b

Merge branch 'master' into development

0d65b0c

lnblum commented Sep 17, 2025

lnblum added 6 commits September 22, 2025 16:10

Merge branch 'master' into development

6edcde7

Merge branch 'master' into development

9d71547

update readme

9c284d1

update readme

9d37d08

make compatible for local mac

015b669

development workflow uses submodule in seq shepherd

3316fba

lnblum merged commit 41159c3 into master Oct 9, 2025
1 check passed




5 participants