Enhancement/bam2fastq #960
base: develop
As discussed earlier today, we should consider accommodating cases in which the BAM has the OQ tag, so that we can get something as close to the original FASTQ as possible. This can be done with GATK's RevertSam tool. We might have to parse the input BAM file first to check whether the OQ tag is present before running the tool. We should also consider adding OQ to our aligned BAM outputs, which was brought up in #947.
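A minimal sketch of the OQ check mentioned above, assuming the records are available as SAM text lines (for example, piped from `samtools view -h`). The function name `has_oq_tag` and the `max_records` cutoff are hypothetical and not part of TEMPO; the only fact relied on is that SAM optional fields start at column 12 and the original-quality tag is written as `OQ:Z:`.

```python
def has_oq_tag(sam_lines, max_records=1000):
    """Return True if any of the first `max_records` alignment records has an OQ tag.

    `sam_lines` is any iterable of SAM-formatted text lines (hypothetical input;
    in practice these could be read from the output of `samtools view -h`).
    """
    seen = 0
    for line in sam_lines:
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # The 11 mandatory columns come first; optional TAG:TYPE:VALUE fields follow.
        if any(field.startswith("OQ:Z:") for field in fields[11:]):
            return True
        seen += 1
        if seen >= max_records:
            break
    return False
```

Sampling only the first records keeps the check cheap, at the cost of missing files in which OQ appears only later; whether that trade-off is acceptable here is an open question.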
Yes. I think this can be done by first checking if
|
We need to bring this back onto our radar.
|
Currently, if the input BAM mapping file is named bamMapping.tsv (the default TEMPO output name), the script will overwrite it and the user will lose their original mapping file. We should add a check: if bam2fastq is enabled and the mapping filename is the default bamMapping.tsv, exit with an error. This forces the user to rename their input bamMapping file when bam2fastq is used.
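The proposed guard could look roughly like the sketch below. It is written in Python purely for illustration; TEMPO itself is a Nextflow pipeline, and the names `check_mapping_input` and `DEFAULT_MAPPING_NAME` are hypothetical, not taken from the codebase.

```python
from pathlib import Path

# Default name TEMPO uses for its BAM mapping output, per the comment above.
DEFAULT_MAPPING_NAME = "bamMapping.tsv"


def check_mapping_input(mapping_path, bam2fastq_enabled):
    """Refuse to run bam2fastq on an input file that the pipeline would overwrite."""
    if bam2fastq_enabled and Path(mapping_path).name == DEFAULT_MAPPING_NAME:
        raise ValueError(
            f"Input mapping file is named {DEFAULT_MAPPING_NAME}, which bam2fastq "
            "would overwrite. Please rename the input file and rerun."
        )
    return mapping_path
```

Comparing only the file name, as suggested in the comment, is stricter than necessary when the input lives in a different directory from the output, but it is the simplest rule to explain to users.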
…2fastq from bamMapping.tsv file
Solved here:
|
Currently, unpaired reads and secondary alignments may cause problems for SamToFastq. Either they need to be filtered out with samtools view like so:
or set these to false:
|
The issue is that this samtools view filter may be too strict,
|
It is a valid concern that secondary and supplementary alignments could be duplicated when converting back to FASTQ files. My understanding is that by setting
And unmapped reads will be kept by
The one question that is still unclear: even if we keep the original quality scores in the BAM file, it's not clear that
So the plan is to run tests that fully check these two points, and also to run fastp and compare the original FASTQ files with the reverted FASTQ files before we merge this PR.
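To make the read categories in this discussion concrete, here is a small sketch of how they map onto SAM FLAG bits. The bit values come from the SAM specification; the helper names `describe_flag` and `is_primary_paired` are hypothetical, and this does not reproduce the exact samtools filter or Picard settings referred to above.

```python
# FLAG bits defined by the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def describe_flag(flag):
    """Decode the FLAG bits relevant to this discussion into a dict of booleans."""
    return {
        "paired": bool(flag & PAIRED),
        "unmapped": bool(flag & UNMAPPED),
        "secondary": bool(flag & SECONDARY),
        "supplementary": bool(flag & SUPPLEMENTARY),
    }


def is_primary_paired(flag):
    """True for paired reads that are neither secondary nor supplementary.

    Unmapped reads are kept on purpose, since they were part of the original FASTQ.
    """
    return bool(flag & PAIRED) and not flag & (SECONDARY | SUPPLEMENTARY)
```

For example, a record with flag 355 (99 plus the secondary bit, 256) would be dropped by `is_primary_paired`, while a paired unmapped read with flag 77 would be kept.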
|
I found some options:
- Use RevertOriginalBaseQualitiesAndAddMateCigar in Picard before we run SamToFastq.
- Use RevertSam in Picard before we run SamToFastq.
I'll do some exploring.
|
And so far the parameters I'm planning to use are:
|
Since TEMPO expects files to have the following format:
and tempo/modules/subworkflow/alignment_wf.nf Line 59 in 803ad84
However, after running RevertSam and SamToFastq with the parameters in this branch and then manually renaming the files to conform to this FASTQ naming convention, TEMPO runs without problems.
This might also be useful since we added #1008 and are planning to store only BAMs instead of FASTQs.