Description
At present, every reference sequence among the best hits is used, irrespective of the taxonomic resolution of its annotation: some are identified to species level, others only to a higher level.
This can reduce the taxonomic resolution of the assignment. For example, if we have 2 hits at 97% identity, where one reference sequence is identified to species but the other only to family, the variant will be assigned to the family.
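To make the example concrete, here is a minimal sketch of how one poorly resolved hit pulls the assignment up. It assumes the variant is assigned to the lowest taxon shared by all retained hits; the function `common_taxon`, the shortened rank list and the taxon names are only illustrative and are not taken from the actual code.

```python
# Ranks from high to low; shortened to three levels for the illustration.
RANKS = ["family", "genus", "species"]

def common_taxon(lineages):
    """Return (rank, name) of the lowest rank on which all lineages agree, or None."""
    result = None
    for rank in RANKS:
        names = {lineage.get(rank) for lineage in lineages}
        # Stop as soon as the hits disagree or one of them is not resolved to this rank.
        if len(names) != 1 or None in names:
            break
        result = (rank, names.pop())
    return result

# Two hypothetical hits at the same % identity.
hit_species = {"family": "Baetidae", "genus": "Baetis", "species": "Baetis rhodani"}
hit_family = {"family": "Baetidae"}  # reference identified only to family

both = common_taxon([hit_species, hit_family])
species_only = common_taxon([hit_species])
print(both)
print(species_only)
```

With both hits the result stays at family level; once the family-level reference is excluded, the same variant can be assigned to the species.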
I suggest that users should be able to set the minimum resolution of the reference sequences for each % identity threshold.
It could look something like this:
100% species
97% genus
95% family
90% order
85% class
80% phylum
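A rough sketch of how such a setting could be represented and looked up is given below. The names `MIN_RESOLUTION` and `required_rank` are hypothetical, and the sketch assumes that a hit is judged against the highest threshold its % identity reaches.

```python
# Hypothetical user setting: (minimum % identity, minimum rank of the reference taxon).
MIN_RESOLUTION = [
    (100.0, "species"),
    (97.0, "genus"),
    (95.0, "family"),
    (90.0, "order"),
    (85.0, "class"),
    (80.0, "phylum"),
]

def required_rank(identity, table=MIN_RESOLUTION):
    """Return the minimum rank for the highest threshold that `identity` reaches.

    Returns None when `identity` is below every threshold.
    """
    for threshold, rank in sorted(table, reverse=True):
        if identity >= threshold:
            return rank
    return None

print(required_rank(98.2))
print(required_rank(79.9))
```

A hit at 98.2% identity falls under the 97% threshold, so its reference sequence would need to be identified at least to genus; a hit below 80% matches no threshold.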
I have already made a taxonomy file with an additional column that contains the resolution index:
8: species
7: genus
6: family
5: order
4: class
3: phylum
2: kingdom
1: superkingdom
For other levels the index is a non-integer, e.g. 7.5 for subgenus.
This greatly simplifies the selection of the reference sequences.
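With a numeric index, the selection reduces to a single comparison. The sketch below assumes each hit carries the resolution index of its reference sequence; the function `select_references`, the dictionary keys and the example hits are made up for illustration.

```python
# Resolution index of the main ranks, as listed above.
RESOLUTION_INDEX = {
    "superkingdom": 1, "kingdom": 2, "phylum": 3, "class": 4,
    "order": 5, "family": 6, "genus": 7, "species": 8,
}

def select_references(hits, min_rank):
    """Keep the hits whose reference taxon is resolved at least to `min_rank`."""
    min_index = RESOLUTION_INDEX[min_rank]
    return [hit for hit in hits if hit["resolution"] >= min_index]

# Hypothetical hits at 97% identity with different resolution indices.
hits = [
    {"ref": "seqA", "identity": 97.0, "resolution": 8.0},  # species
    {"ref": "seqB", "identity": 97.0, "resolution": 6.0},  # family
    {"ref": "seqC", "identity": 97.0, "resolution": 7.5},  # subgenus
]

kept = select_references(hits, "genus")
print([hit["ref"] for hit in kept])
```

Because intermediate levels such as subgenus (7.5) sit between the integer indices, they pass or fail the same comparison without any special handling.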