Development #34
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
tiny readme changes
…BED6 format and simpler downstream logic
we are more explicit about using read1, not read{1,2}.
…ze checking, UNTESTED
* placeholder_r2 should be created before enough_reads
* ensure emseq runs after completion of create_placeholder
* placeholder.r2 before emseq workflow
… a mac (once tasmanian is updated with an installable pandas version)
  modified: main.nf
  modified: modules/alignment.nf
  modified: modules/compute_statistics.nf
  modified: modules/methylation.nf
  modified: run_test.sh
  new file: test_data/emseq_test_regions.bed
  new file: test_data/reference.fa.fai
…ument for gc bias - switching back to picard from picard-slim
  modified: modules/alignment.nf
  modified: modules/compute_statistics.nf
  modified: modules/methylation.nf
rest of genome index cleanup
  modified: main.nf
  new file: modules/bed_processing.nf
  modified: modules/methylation.nf
…add to snapshot since it might be easy to mess up these steps with future edits
mattsoup left a comment
That's a lot! Looks good to me though; just a few comments, mostly minor.
```
@@ -1,26 +1,26 @@
process methylDackel_mbias {
```
I think you've done some performance testing with this, but if it becomes a problem with larger datasets, this may be low-ish hanging fruit in the future.
```
optical_distance=\$(echo \${inst_name} | awk '{if (\$1~/^M0|^NS|^NB/) {print 100} else {print 2500}}')
samtools merge --threads ${task.cpus} ${library}.bam ${bams}
```
should this be in a separate process? maybe it doesn't matter.
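For reference, the `optical_distance` line in the excerpt above chooses an optical-duplicate pixel distance from the instrument name: 100 for names starting with `M0`, `NS` or `NB` (prefixes that look like MiSeq and NextSeq instrument IDs), otherwise 2500. Presumably this feeds Picard MarkDuplicates' `OPTICAL_DUPLICATE_PIXEL_DISTANCE`, whose documentation gives 100 as the default for unpatterned flowcells and recommends 2500 for patterned ones. A minimal standalone sketch of the same mapping, wrapped in a hypothetical `optical_distance` function with the Nextflow `\$` escapes removed, so the cut-offs are easy to check:

```shell
#!/bin/sh
# Hypothetical helper reproducing the excerpt's awk mapping outside Nextflow.
# Prints 100 for instrument names starting with M0, NS or NB, else 2500.
optical_distance() {
  echo "$1" | awk '{ if ($1 ~ /^M0|^NS|^NB/) { print 100 } else { print 2500 } }'
}

# Illustrative instrument names, not taken from this pipeline's data.
for inst in M00123 NS500200 NB501001 A00123; do
  printf '%s\t%s\n' "$inst" "$(optical_distance "$inst")"
done
```

Each alternative in `^M0|^NS|^NB` is anchored separately, so only the start of the name is tested; another unpatterned instrument family would be one more `^XX` alternative.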
```
picard -Xmx${task.memory.toGiga()}g CollectInsertSizeMetrics \
--INCLUDE_DUPLICATES --VALIDATION_STRINGENCY SILENT \
-I "\$good_mapq_pipe" -O ${library}.good_mapq.insert_size_metrics.txt \
--MINIMUM_PCT 0 -H /dev/null &
picard_good_mapq_pid=\$!
picard -Xmx${task.memory.toGiga()}g CollectInsertSizeMetrics \
--INCLUDE_DUPLICATES --VALIDATION_STRINGENCY SILENT \
-I "\$bad_mapq_pipe" -O ${library}.bad_mapq.insert_size_metrics.txt \
--MINIMUM_PCT 0 -H /dev/null &
picard_bad_mapq_pid=\$!
wait \$samtools_pid
wait \$picard_good_mapq_pid
wait \$picard_bad_mapq_pid
```
if we have to run the tool multiple times and wait for each output to finish, should this be split into a 'CollectInsertSize' process and a 'CombineInsertSize' process?
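The excerpt fans one `samtools` stream out to two `CollectInsertSizeMetrics` runs through named pipes, then `wait`s on each PID. A minimal sketch of that pattern with stand-ins: `awk` plays the producer, splitting toy records on a MAPQ-like score in column 2, and two `wc -l` calls play the consumers. The threshold of 20 and the records are illustrative assumptions, not the pipeline's values.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
good_pipe="$workdir/good.fifo"
bad_pipe="$workdir/bad.fifo"
mkfifo "$good_pipe" "$bad_pipe"

# Consumers start first; each blocks until the producer opens its FIFO.
wc -l < "$good_pipe" > "$workdir/good.count" &
good_pid=$!
wc -l < "$bad_pipe" > "$workdir/bad.count" &
bad_pid=$!

# Producer: route each record by its score (column 2). Both FIFOs are opened
# in BEGIN so neither consumer hangs if one route receives no records.
printf 'r1\t60\nr2\t3\nr3\t42\nr4\t0\nr5\t30\n' |
  awk -F'\t' -v good="$good_pipe" -v bad="$bad_pipe" '
    BEGIN { printf "" > good; printf "" > bad }
    $2 >= 20 { print > good; next }
    { print > bad }
  ' &
producer_pid=$!

# Waiting on each PID individually lets set -e abort if any step failed.
wait "$producer_pid"
wait "$good_pid"
wait "$bad_pid"

good_n=$(tr -d ' ' < "$workdir/good.count")
bad_n=$(tr -d ' ' < "$workdir/bad.count")
echo "good: $good_n bad: $bad_n"   # prints "good: 3 bad: 2"
```

On the split question: Nextflow processes exchange files, not streams, so separate collect/combine processes would likely need either MAPQ-split BAMs written to the work directory or each process re-reading and filtering the merged BAM. Combining the small metrics files afterwards would be cheap either way.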
```
    no_cols[i] = 0
    if (\$i ~ /insert_size/) {isize=i}
    else if (\$i ~ /rf_count/) {rf=i}
    else if (\$i ~ /fr_count/) {fr=i}
    else if (\$i ~ /tandem/) {tandem=i}
    else {no_cols[i]++}
  }
  # columns that are not present still need to be printed (with 0 value)
  for (i in no_cols) {
    if (no_cols[i] > 0) {
      if (! isize ) { isize = i}
      else if (! rf ) { rf = i}
      else if (! fr ) { fr = i}
      else if (! tandem ) { tandem = i}
    }
  }
}
```
i'm not exactly following this, but can we just cut/join the files together?
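A plain `cut` is fragile here because, as the excerpt's own comment notes, orientation columns that are absent still have to be printed as 0, so column positions can differ between files. A minimal sketch of the same idea that selects columns by header name instead, using a hypothetical `normalize_histogram` function. The `All_Reads.*` header names in the sample inputs are illustrative of Picard's histogram layout and should be checked against real output.

```shell
#!/bin/sh
# Hypothetical helper: reorder an insert-size histogram to a fixed column
# order (insert_size, fr, rf, tandem), writing 0 for any missing column.
normalize_histogram() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /insert_size/)   isize = i
        else if ($i ~ /fr_count/) fr = i
        else if ($i ~ /rf_count/) rf = i
        else if ($i ~ /tandem/)   tandem = i
      }
      print "insert_size", "fr_count", "rf_count", "tandem_count"
      next
    }
    { print $isize, (fr ? $fr : 0), (rf ? $rf : 0), (tandem ? $tandem : 0) }
  ' "$1"
}

# Illustrative inputs: a.tsv lacks rf and tandem, b.tsv lacks tandem and
# has its columns in a different order.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'insert_size\tAll_Reads.fr_count\n100\t5\n101\t7\n' > "$tmp/a.tsv"
printf 'insert_size\tAll_Reads.rf_count\tAll_Reads.fr_count\n100\t1\t4\n' > "$tmp/b.tsv"
normalize_histogram "$tmp/a.tsv"
normalize_histogram "$tmp/b.tsv" | tail -n +2
```

Once every file shares this column order, stacking them or `join`ing on `insert_size` (after sorting) becomes straightforward.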
```
if (params.single_end) {
    fastq_chunks = fastp.out.trimmed_fastq
        .flatMap { library, fq_files ->
            def fq_list = fq_files instanceof List ? fq_files : [fq_files]
```
i'd like to play around with this a bit to see if we can't find something more elegant, but in the meantime, if it works then it's fine.
yeah it took me a while to get here but I didn't try to pare it down too much.
```
output_file="\${intersect_basename}_intersections.tsv"
summary_file="\${intersect_basename}_positional_summary.tsv"
echo -e "methylkit_file\\tchr\\tstart\\tend\\tcontext\\tmethylation\\ttarget_locus\\ttarget_name" > \${output_file}
```
@bwlang does splitting by chromosome for the position/length summary align with your current plan for this tool?
we need to break out the control contigs for detailed reporting (intersections.tsv), but not the human contigs, I think. I think there is somewhere else in this code where we do something similar. I'm planning to add the control bed regions to the bed file so we'll include those in both, but then we need to refine this output.
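A minimal sketch of breaking out only the control contigs from an intersections file: filter on the `chr` column (column 2 in the header the `echo -e` line above writes) and keep the header. The `control_rows` function is hypothetical, and the contig names `lambda` and `pUC19` are placeholders chosen because they are EM-seq's usual spike-in controls; the real names depend on the reference used here.

```shell
#!/bin/sh
# Hypothetical helper: keep the header plus rows whose chr (column 2) is in a
# comma-separated list of control contigs; all other contigs are dropped.
control_rows() {
  awk -F'\t' -v controls="$2" '
    BEGIN { n = split(controls, names, ","); for (i = 1; i <= n; i++) keep[names[i]] = 1 }
    NR == 1 || ($2 in keep)
  ' "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Illustrative toy rows following the intersections header from the excerpt.
{
  printf 'methylkit_file\tchr\tstart\tend\tcontext\tmethylation\ttarget_locus\ttarget_name\n'
  printf 's1\tlambda\t100\t101\tCpG\t0.0\tt1\tctrl_a\n'
  printf 's1\tchr1\t5000\t5001\tCpG\t85.0\tt2\tregion1\n'
  printf 's1\tpUC19\t200\t201\tCpG\t100.0\tt3\tctrl_b\n'
} > "$tmp/sample_intersections.tsv"

control_rows "$tmp/sample_intersections.tsv" "lambda,pUC19"
```

Rows on the other contigs would then feed only the per-chromosome positional summary.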
No description provided.