Skip to content

Conversation

@lnblum
Copy link
Contributor

@lnblum lnblum commented Aug 19, 2025

No description provided.

bwlang and others added 30 commits June 15, 2025 18:06
… a mac (once tasmanian is updated with an installable pandas version)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
tiny readme changes
…ument for gc bias - switching back to picard from picard-slim
rest of genome index cleanup
we are more explicit on using read1 not read{1,2}.
* placeholder_r2 should be created before enough_reads

* ensure emseq after complition of create_placeholder

* placeholder.r2 before emseq workflow
… a mac (once tasmanian is updated with an installable pandas version)

	modified:   main.nf
	modified:   modules/alignment.nf
	modified:   modules/compute_statistics.nf
	modified:   modules/methylation.nf
	modified:   run_test.sh
	new file:   test_data/emseq_test_regions.bed
	new file:   test_data/reference.fa.fai
…ument for gc bias - switching back to picard from picard-slim

	modified:   modules/alignment.nf
	modified:   modules/compute_statistics.nf
	modified:   modules/methylation.nf
rest of genome index cleanup

	modified:   main.nf
	new file:   modules/bed_processing.nf
	modified:   modules/methylation.nf
@lnblum lnblum requested a review from mattsoup September 8, 2025 20:40
Copy link
Contributor

@mattsoup mattsoup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot! Looks good to me though, just some mostly minor comments.

@@ -1,26 +1,26 @@


process methylDackel_mbias {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've done some performance testing with this, but if it ends up being problematic with larger datasets this may be low-ish hanging fruit in the future.

optical_distance=\$(echo \${inst_name} | awk '{if (\$1~/^M0|^NS|^NB/) {print 100} else {print 2500}}')
samtools merge --threads ${task.cpus} ${library}.bam ${bams}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be in a separate process? maybe it doesn't matter.

Comment on lines +29 to +43
picard -Xmx${task.memory.toGiga()}g CollectInsertSizeMetrics \
--INCLUDE_DUPLICATES --VALIDATION_STRINGENCY SILENT \
-I "\$good_mapq_pipe" -O ${library}.good_mapq.insert_size_metrics.txt \
--MINIMUM_PCT 0 -H /dev/null &
picard_good_mapq_pid=\$!
picard -Xmx${task.memory.toGiga()}g CollectInsertSizeMetrics \
--INCLUDE_DUPLICATES --VALIDATION_STRINGENCY SILENT \
-I "\$bad_mapq_pipe" -O ${library}.bad_mapq.insert_size_metrics.txt \
--MINIMUM_PCT 0 -H /dev/null &
picard_bad_mapq_pid=\$!
wait \$samtools_pid
wait \$picard_good_mapq_pid
wait \$picard_bad_mapq_pid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're having to run the tool multiple times and wait for its outputs to be done, does it seem like it should be split into a 'CollectInsertSize' process, and a 'CombineInsertSize' process?

Comment on lines +57 to +73
no_cols[i] = 0
if (\$i ~ /insert_size/) {isize=i}
else if (\$i ~ /rf_count/) {rf=i}
else if (\$i ~ /fr_count/) {fr=i}
else if (\$i ~ /tandem/) {tandem=i}
else {no_cols[i]++}
}
# columns that are not present still need to be printed (with 0 value)
for (i in no_cols) {
if (no_cols[i] > 0) {
if (! isize ) { isize = i}
else if (! rf ) { rf = i}
else if (! fr ) { fr = i}
else if (! tandem ) { tandem = i}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not exactly following this, but can we just cut/join the files together?

Comment on lines +73 to +76
if (params.single_end) {
fastq_chunks = fastp.out.trimmed_fastq
.flatMap { library, fq_files ->
def fq_list = fq_files instanceof List ? fq_files : [fq_files]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd like to play around with this a bit to see if we can't find something more elegant, but in the meantime, if it works then it's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it took me a while to get here but I didn't try to pare it down too much.

output_file="\${intersect_basename}_intersections.tsv"
summary_file="\${intersect_basename}_positional_summary.tsv"
echo -e "methylkit_file\\tchr\\tstart\\tend\\tcontext\\tmethylation\\ttarget_locus\\ttarget_name" > \${output_file}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwlang does splitting by chromosome for the position/length summary align with your current plan for this tool?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to break out the control contigs for detailed reporting (intersections.tsv)... but not the human contigs i think. i think there is somewhere else we do something similar in this code. I'm planning to add the control bed region s to the bed file so we'll include those in both - but then we need to refine this output.

@lnblum lnblum merged commit 41159c3 into master Oct 9, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants