-
Notifications
You must be signed in to change notification settings - Fork 4
Development #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Development #44
Conversation
Madc2vcf strawberry fix
Added MADC Functions
madc2vcf exception added: presence of N
Added thinSNP function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request introduces significant new functionality to the BIGr package including MADC file filtering, genomic relationship matrix generation, and SNP thinning capabilities. It also updates package metadata, dependencies, and includes various bug fixes.
- Added
filterMADCfunction for quality control filtering of MADC files - Added
madc2gmatfunction to convert MADC files to additive genomic relationship matrices - Added
thinSNPfunction for SNP pruning based on physical distance
Reviewed Changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/testthat/test-thinSNP.R | Test coverage for new thinSNP function |
| tests/testthat/test-madc2vcf_targets.R | Enhanced tests with bottom strand marker validation |
| tests/testthat/test-madc2gmat.R | Test coverage for new madc2gmat function |
| tests/testthat/test-filterMADC.R | Test coverage for new filterMADC function |
| R/thinSNP.R | New SNP thinning function implementation |
| R/madc2gmat.R | New genomic relationship matrix generation function |
| R/filterMADC.R | New MADC file filtering function |
| R/breedtools_functions.R | Exported previously internal functions |
| R/updog2vcf.R | Added RefAlt parameter for reference/alternate allele specification |
| R/madc2vcf_targets.R | Fixed bottom strand marker REF/ALT handling |
| R/madc2vcf_all.R | Fixed column name preservation and chromosome parsing |
| Documentation files | Updated documentation for new and modified functions |
| Package metadata | Version bump, maintainer email change, dependency updates |
Comments suppressed due to low confidence (2)
R/filterMADC.R:1
- The condition checks
alt_basebut then uses the same condition to filterref_baseandpos_ref_idx. This should filter based on the positions wherealt_basecontains invalid nucleotides.
#' Filter MADC Files
R/filterMADC.R:1
- The condition checks
alt_basebut then uses the same condition to filterref_baseandpos_ref_idx. This should filter based on the positions wherealt_basecontains invalid nucleotides.
#' Filter MADC Files
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| if(isBotLoci) one_tag$AlleleSequence <- sapply(one_tag$AlleleSequence, function(x) as.character(reverseComplement(DNAString(x)))) | ||
|
|
||
| chr <- sapply(strsplit(cloneID, "_"), function(x) x[-length(x)]) | ||
| if(length(chr > 1)) chr <- paste(chr, collapse = "_") |
Copilot
AI
Sep 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comparison chr > 1 should be length(chr) > 1 to check if the chromosome vector has more than one element.
| if(length(chr > 1)) chr <- paste(chr, collapse = "_") | |
| if(length(chr) > 1) chr <- paste(chr, collapse = "_") |
|
|
||
| test_that("Thinning SNPs",{ | ||
|
|
||
| ##' # Create sample SNP data |
Copilot
AI
Sep 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment uses roxygen2 syntax ##' which is not appropriate in test files. Use regular R comments # instead.
| ##' # Create sample SNP data | |
| # Create sample SNP data |
| #Calculate the denominator | ||
| #Note: need to use 1/ploidy for ratios, where ploidy should be used for dosages. | ||
| #This is because the ratios are from 0-1, which is what you get when dosage/ploidy, whereas dosages are from 0 to ploidy | ||
| denominator <- (1/as.numeric(ploidy))*sum(p * q, na.rm = TRUE) |
Copilot
AI
Sep 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The magic number calculation (1/as.numeric(ploidy)) should be clarified or extracted to a named variable. The comment explains the logic but the implementation could be more explicit.
| denominator <- (1/as.numeric(ploidy))*sum(p * q, na.rm = TRUE) | |
| ploidy_ratio <- 1 / as.numeric(ploidy) # Scaling factor for allele ratios based on ploidy | |
| denominator <- ploidy_ratio * sum(p * q, na.rm = TRUE) |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #44 +/- ##
==========================================
- Coverage 84.13% 83.35% -0.78%
==========================================
Files 16 19 +3
Lines 1147 1358 +211
==========================================
+ Hits 965 1132 +167
- Misses 182 226 +44 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This pull request introduces several new features and improvements to the BIGr package, including new functions for filtering and processing MADC files, generating genomic relationship matrices, and enhancements to existing functions. It also updates package metadata and dependencies. The most important changes are summarized below:
New Functionality
filterMADCfunction to filter and process MADC files with various quality control options. This function supports filtering based on mean read depth, number of microhaplotypes per locus, and other criteria, and can optionally save filtered results to a file.madc2gmatfunction to convert MADC files into additive genomic relationship matrices using the VanRaden (2008) method, supporting both "unique" and "collapsed" processing modes.thinSNPfunction (exported), and new exports forallele_freq_poly,solve_composition_poly, andmadc2gmatto the package namespace. [1] [2] [3]Improvements and Bug Fixes
madc2vcf_allto read MADC files without altering column names, preserving original headers.NEWS.md.Package Metadata and Dependency Updates
rlangandstringr. [1] [2] [3]NEWS.mdwith new features, bug fixes, and version history.