
Overestimation of number of reads from nanopore data (flagstat) #31

@rebeelouise


This is the same issue as the one reported for minimap2: lh3/minimap2#236 (comment)

For example, for nanopore reads aligned to the host transcriptome, the flagstat output is:

5953480 + 0 in total (QC-passed reads + QC-failed reads)
2961480 + 0 secondary
22696 + 0 supplementary
0 + 0 duplicates
4195469 + 0 mapped (70.47% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
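`samtools flagstat` counts alignment records, not reads. The `in total` line includes every secondary and supplementary alignment, and so does the `mapped` line. Subtracting those two lines should leave one primary record per read. The Python sketch below does that arithmetic on the figures above. It assumes that every secondary and supplementary record is also counted as mapped, i.e. none of them carries the unmapped bit.

```python
def primary_counts(total, secondary, supplementary, mapped):
    """Derive per-read figures from samtools flagstat record counts.

    Assumption: `total` counts every alignment record (primary, secondary
    and supplementary), and every secondary/supplementary record is also
    included in `mapped`.
    """
    extra = secondary + supplementary       # records that are not primary alignments
    primary = total - extra                 # one primary record per read
    primary_mapped = mapped - extra         # reads whose primary record is mapped
    pct = 100.0 * primary_mapped / primary if primary else 0.0
    return primary, primary_mapped, pct


# Figures copied from the flagstat output above.
primary, primary_mapped, pct = primary_counts(
    total=5953480, secondary=2961480, supplementary=22696, mapped=4195469
)
print(primary, primary_mapped, f"{pct:.2f}%")  # prints: 2969304 1211293 40.79%
```

Under that assumption, the primary count equals the 2969304 reads in the input. About 40.79% of reads have a mapped primary alignment, compared with the 70.47% of records shown in the flagstat line. More recent samtools releases also print `primary` and `primary mapped` lines that report these values directly.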

However, the actual number of reads is 2969304, and these reads are about 750 nt long. I am assuming this over-reporting is due to the presence of long reads. Is there a more appropriate way of calculating the number of reads and the percentage of reads mapped in an alignment file? Can the percentage of reads mapped still be trusted?
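Per-read numbers can also be taken straight from the alignment file by counting only primary records. This means skipping every record whose FLAG has the secondary (0x100) or supplementary (0x800) bit set. `samtools view -c -F 0x900` counts exactly these records. `samtools view -c -F 0x904` also drops unmapped reads. The Python sketch below applies the same bit tests to a list of FLAG integers. Reading the flags from a BAM file is not shown; pysam's `AlignedSegment.flag` is one possible source. The example records are hypothetical.

```python
# SAM FLAG bits (SAM specification, FLAG field)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads(flags):
    """Return (primary reads, primary reads mapped, percent mapped) for an
    iterable of SAM FLAG integers, one per alignment record."""
    primary = primary_mapped = 0
    for flag in flags:
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignment of a read already counted via its primary record
        primary += 1
        if not flag & UNMAPPED:
            primary_mapped += 1
    pct = 100.0 * primary_mapped / primary if primary else 0.0
    return primary, primary_mapped, pct


# Hypothetical records: read A maps with one secondary (256) and one
# supplementary (2048) alignment, read B maps once on the reverse strand (16),
# and read C is unmapped (4).
example_flags = [0, 256, 2048, 16, 4]
print(count_reads(example_flags))
```

The five records come from three reads, and two of those reads are mapped. Counting records instead would report five entries with four of them mapped, which is the same inflation seen in the flagstat output.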
