WIP: GRIN2: Implementing dynamic y-max cap for manhattan plot #4059
Description
WIP: Uses a fixed-bin histogram approach to bin points and decide whether the y-max cap needs to be raised (calculate_dynamic_y_cap).

We first check whether the y max of the data is less than or equal to the default cap; if so, no capping is applied. We then check whether the number of points above the default cap exceeds the threshold (default 5). If it does not, the default cap value (40) is used. If it does, we dynamically generate a new cap from the fixed histogram bins. The bins have width 10, and there are as many bins as needed to go from the default cap of 40 to the hard cap of 200. We start from the lowest bin and accumulate points as we walk up the bins until we reach a bin where the number of points exceeds the threshold; we cap right before this bin.

The hard cap (default 200) always applies: even if the data distribution suggests a higher cap, we cap at this value. This means the number of points at the top of the plot can often exceed the threshold of 5.

The hard cap and maxCappedPoints are sent from the client to Rust. The default cap comes from manhattan.js's MANHATTAN_LOG_QVALUE_CUTOFF constant.

We introduce the jitter and the additional golden/yellow jitter box because, even when no hard cap is applied, there are still lines of samples, and right now this is only with thousands of samples. Long term, I imagine the GWAS will certainly need the jitter when operating on millions of dots.

Rust will need to be recompiled.

Closes GRIN2 roadmap number 4
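For reviewers, the capping logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the signature, the `Option` return for "no capping", and the `BIN_WIDTH` constant are assumptions. The stop rule in particular is one plausible reading of the description: walk up the bins from the default cap and stop at the first bin edge where no more than `max_capped_points` points remain at or above it. The actual implementation may use a different rule.

```rust
// Hypothetical sketch of the dynamic y-cap logic. The names, signature, and
// stop rule are assumptions for illustration, not the PR's actual code.

/// Bin width from the description (assumed to be a fixed constant here).
const BIN_WIDTH: f64 = 10.0;

/// Returns `None` when no capping is needed, otherwise the cap to apply.
pub fn calculate_dynamic_y_cap(
    y_values: &[f64],
    default_cap: f64,         // e.g. 40, from MANHATTAN_LOG_QVALUE_CUTOFF
    hard_cap: f64,            // e.g. 200, sent from the client
    max_capped_points: usize, // e.g. 5, sent from the client
) -> Option<f64> {
    // Step 1: if nothing exceeds the default cap, apply no capping at all.
    let y_max = y_values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if y_values.is_empty() || y_max <= default_cap {
        return None;
    }

    // Step 2: if only a few points exceed the default cap, keep the default.
    let above: Vec<f64> = y_values
        .iter()
        .copied()
        .filter(|&y| y > default_cap)
        .collect();
    if above.len() <= max_capped_points {
        return Some(default_cap);
    }

    // Step 3: histogram the points between the default cap and the hard cap.
    let n_bins = ((hard_cap - default_cap) / BIN_WIDTH).ceil() as usize;
    let mut counts = vec![0usize; n_bins];
    for &y in &above {
        if y < hard_cap {
            let idx = ((y - default_cap) / BIN_WIDTH) as usize;
            counts[idx.min(n_bins - 1)] += 1;
        }
    }

    // Step 4 (assumed stop rule): walk up from the lowest bin, subtracting
    // each bin's count from the points still at or above the current edge,
    // and stop once few enough points remain.
    let mut remaining = above.len();
    for (i, &count) in counts.iter().enumerate() {
        remaining -= count;
        if remaining <= max_capped_points {
            let cap = default_cap + (i as f64 + 1.0) * BIN_WIDTH;
            return Some(cap.min(hard_cap));
        }
    }

    // Step 5: the distribution would justify a higher cap, but the hard cap
    // always wins, so more than `max_capped_points` points may be capped.
    Some(hard_cap)
}
```

Under these assumptions, a dataset with a dense cluster just above 40 and a handful of extreme values would get a cap somewhere between 40 and 200, while a dataset with many values beyond 200 would fall through to the hard cap.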
To test
Recompile Rust. Go to ASH and run with all lesion data types checked. Also test with no filters on all ASHOP samples. You should see capping applied for both. Finally, test with tdbtest; for this you should see no capping applied.
Checklist
Check each task that has been performed or verified to be not applicable.