@yinshiyi
Contributor

It was unclear to me whether the unknown group contains the artefact counts (theoretically it does).
The previous format reports 9 unknown reads and 2 artefacts, but it is unclear to the user whether the 2 artefacts are counted within the 9 unknown.

Previous json format example
Screenshot 2025-12-22 at 11 58 30 AM

New JSON format, which clearly shows the breakdown for the user.
Screenshot 2025-12-22 at 11 54 00 AM

@yinshiyi yinshiyi marked this pull request as ready for review December 23, 2025 21:26
@jakob-schuster
Collaborator

Thanks Shiyi! I've been thinking about it for a few days, and chatted with @QGouil. Here are some thoughts; keen to hear what you think as well.

In the current format, the fields artefactStats and strandStats each represent independent, separate views breaking down the 100 total reads. If a user has intuited that, then they might work out that the 9 unknowns include the 2 artefacts, since TSO-TSO and RTP-RTP artefacts have no clear strandedness and must be included in one of the strandStats categories for the counts to add up. If there were a category of artefacts we could still restrand, we would, and so we would count them both as artefacts and as stranded reads. I really do agree, though, with your observation that it's unintuitive and not immediately obvious to the user; I've been irked by it for a while too.

My main issue with this new proposed format is that it only makes sense when 'report-artefacts' is set to true in the config json. When it is set to false, what should happen? Either:

  • The artefact field is removed and only the no artefact field is retained, as in {"stats": {"artefactBreakdown": {"no artefact": {"+": 49, "-": 42, "?": 9}}, "totalReads": 100}}. This may be misleading, since we didn't actually search for artefacts and yet are claiming there are none; it is also a confusing structure to show to a user who turned off artefact detection.
  • The whole structure of the output changes to something more minimal like {"stats": {"+": 49, "-": 42, "?": 9, "totalReads": 100}}. But I don't think the structure of the output json should change based on input parameters, only that fields should be added or removed as appropriate. If a user is feeding this output json into another script for parsing, it seems problematic that changing Restrander's input config should break their json parsing.
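
To make the parsing concern concrete, here is a minimal Python sketch of a hypothetical downstream script. The function name read_counts is an assumption, as is the idea that artefactStats is simply omitted when 'report-artefacts' is false; the key names are the ones from the current format discussed in this thread.

```python
import json

def read_counts(report_text):
    """Hypothetical downstream parser for the current (flat) stats format."""
    stats = json.loads(report_text)["stats"]
    # strandStats is looked up at a fixed path, so it must not move
    # when the config changes.
    strand = stats["strandStats"]
    # artefactStats is treated as optional (assumption: the field is
    # omitted when artefact reporting is turned off).
    artefacts = stats.get("artefactStats")
    return strand, artefacts

with_artefacts = json.dumps({"stats": {
    "artefactStats": {"RTP-RTP": 1, "TSO-TSO": 1, "no artefact": 98},
    "strandStats": {"+": 49, "-": 42, "?": 9},
    "totalReads": 100,
}})
without_artefacts = json.dumps({"stats": {
    "strandStats": {"+": 49, "-": 42, "?": 9},
    "totalReads": 100,
}})

# The same parser handles both reports, because only an optional
# field was removed and nothing was re-nested.
print(read_counts(with_artefacts))
print(read_counts(without_artefacts))
```

A parser written against a nested layout would instead have to look under a different path depending on the config, which is the breakage described above.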

One additional benefit of the current approach is if we were ever to add other stats categories, e.g. readLengthStats, they would just be added as an additional field within stats as another independent view on totalReads. The proposed format, whereby the artefact stats are on top and the strand stats are hierarchically nested under the no artefact category, would require a redesign when adding new features like this. Not sure if this will ever come up but worth noting.

Still, I agree it should be made clearer whether artefacts are included in ?. Maybe if we added total as a (redundant but harmless) field to each category, that would make it obvious that they are each separate views on the total reads, and make it a little easier for the user to then intuit that artefacts are included in ?. And/or we could add a note to the README that artefacts are included in the ? category, so it's at least written down somewhere. I'm still not sure that's a perfect solution, and I'm open to any other ideas.

{
	"stats": {
		"artefactStats": {
			"RTP-RTP": 1,
			"TSO-TSO": 1,
			"no artefact": 98,
			"total": 100
		},
		"strandStats": {
			"+": 49,
			"-": 42,
			"?": 9,
			"total": 100
		},
		"totalReads": 100
	}
}
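
The "separate views" idea can be checked mechanically. The following is a rough Python sketch, not part of Restrander: check_views is a hypothetical helper that treats every nested object under stats as one view and verifies that its counts (ignoring the redundant total) add up to totalReads.

```python
import json

report = json.loads("""
{
    "stats": {
        "artefactStats": {"RTP-RTP": 1, "TSO-TSO": 1, "no artefact": 98, "total": 100},
        "strandStats": {"+": 49, "-": 42, "?": 9, "total": 100},
        "totalReads": 100
    }
}
""")

def check_views(stats):
    """Hypothetical check: each nested view must account for every read."""
    total = stats["totalReads"]
    for name, view in stats.items():
        if not isinstance(view, dict):
            continue  # skip scalar fields such as totalReads
        # Sum the category counts, leaving out the redundant "total" field.
        counted = sum(v for k, v in view.items() if k != "total")
        if counted != total:
            raise ValueError(f"{name} sums to {counted}, expected {total}")
        # If the view carries its own total, it must agree with totalReads.
        if view.get("total", total) != total:
            raise ValueError(f"{name} reports a different total")
    return True

print(check_views(report["stats"]))  # True
```

Because the loop does not name the views, a later field of the same shape would be covered by the same check without changes.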

@yinshiyi
Contributor Author

yinshiyi commented Jan 2, 2026

Great points, especially on 1) future features and 2) a config-independent, stable json structure.
I agree.
I would like to suggest we state this explicitly in the README, along the following lines:

Only the ? category can contain artefacts such as RTP-RTP and TSO-TSO reads. The + and - categories never include these artefacts.
Importantly, the ? category is not limited to artefacts: a read may be classified as ? for reasons other than being an artefact, for example when there is insufficient information to confidently determine its strand.
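
As a small worked example of that wording, the sketch below splits the ? count into artefacts and other unknown reads. It assumes, as the proposed README text states, that every artefact is counted in ?; unknown_breakdown is a hypothetical helper and the numbers are the ones used earlier in this thread.

```python
def unknown_breakdown(stats):
    """Hypothetical helper: split the '?' count into artefacts and the rest."""
    # Every artefactStats entry except "no artefact" (and an optional
    # "total") is an artefact category such as RTP-RTP or TSO-TSO.
    artefacts = sum(
        count
        for name, count in stats["artefactStats"].items()
        if name not in ("no artefact", "total")
    )
    unknown = stats["strandStats"]["?"]
    # Assumption: all artefacts sit inside "?", so the remainder is the
    # number of reads that are unknown for some other reason.
    return {"artefact": artefacts, "other": unknown - artefacts}

stats = {
    "artefactStats": {"RTP-RTP": 1, "TSO-TSO": 1, "no artefact": 98},
    "strandStats": {"+": 49, "-": 42, "?": 9},
    "totalReads": 100,
}

print(unknown_breakdown(stats))  # {'artefact': 2, 'other': 7}
```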
