Skip to content

Conversation

@alex-sandercock
Copy link
Collaborator

This pull request introduces several new features and improvements to the BIGr package, including new functions for filtering and processing MADC files, generating genomic relationship matrices, and enhancements to existing functions. It also updates package metadata and dependencies. The most important changes are summarized below:

New Functionality

  • Added filterMADC function to filter and process MADC files with various quality control options. This function supports filtering based on mean read depth, number of microhaplotypes per locus, and other criteria, and can optionally save filtered results to a file.
  • Added madc2gmat function to convert MADC files into additive genomic relationship matrices using the VanRaden (2008) method, supporting both "unique" and "collapsed" processing modes.
  • Added thinSNP function (exported), and new exports for allele_freq_poly, solve_composition_poly, and madc2gmat to the package namespace. [1] [2] [3]

Improvements and Bug Fixes

  • Updated madc2vcf_all to read MADC files without altering column names, preserving original headers.
  • Various bug fixes and improvements to existing functions, as described in the updated NEWS.md.

Package Metadata and Dependency Updates

  • Updated package version to 0.6.1, changed maintainer email, and added new dependencies on rlang and stringr. [1] [2] [3]
  • Updated NEWS.md with new features, bug fixes, and version history.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request introduces significant new functionality to the BIGr package including MADC file filtering, genomic relationship matrix generation, and SNP thinning capabilities. It also updates package metadata, dependencies, and includes various bug fixes.

  • Added filterMADC function for quality control filtering of MADC files
  • Added madc2gmat function to convert MADC files to additive genomic relationship matrices
  • Added thinSNP function for SNP pruning based on physical distance

Reviewed Changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/testthat/test-thinSNP.R Test coverage for new thinSNP function
tests/testthat/test-madc2vcf_targets.R Enhanced tests with bottom strand marker validation
tests/testthat/test-madc2gmat.R Test coverage for new madc2gmat function
tests/testthat/test-filterMADC.R Test coverage for new filterMADC function
R/thinSNP.R New SNP thinning function implementation
R/madc2gmat.R New genomic relationship matrix generation function
R/filterMADC.R New MADC file filtering function
R/breedtools_functions.R Exported previously internal functions
R/updog2vcf.R Added RefAlt parameter for reference/alternate allele specification
R/madc2vcf_targets.R Fixed bottom strand marker REF/ALT handling
R/madc2vcf_all.R Fixed column name preservation and chromosome parsing
Documentation files Updated documentation for new and modified functions
Package metadata Version bump, maintainer email change, dependency updates
Comments suppressed due to low confidence (2)

R/filterMADC.R:1

  • The condition checks alt_base but then uses the same condition to filter ref_base and pos_ref_idx. This should filter based on the positions where alt_base contains invalid nucleotides.
#' Filter MADC Files

R/filterMADC.R:1

  • The condition checks alt_base but then uses the same condition to filter ref_base and pos_ref_idx. This should filter based on the positions where alt_base contains invalid nucleotides.
#' Filter MADC Files

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

if(isBotLoci) one_tag$AlleleSequence <- sapply(one_tag$AlleleSequence, function(x) as.character(reverseComplement(DNAString(x))))

chr <- sapply(strsplit(cloneID, "_"), function(x) x[-length(x)])
if(length(chr > 1)) chr <- paste(chr, collapse = "_")
Copy link

Copilot AI Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comparison chr > 1 should be length(chr) > 1 to check if the chromosome vector has more than one element.

Suggested change
if(length(chr > 1)) chr <- paste(chr, collapse = "_")
if(length(chr) > 1) chr <- paste(chr, collapse = "_")

Copilot uses AI. Check for mistakes.

test_that("Thinning SNPs",{

##' # Create sample SNP data
Copy link

Copilot AI Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment uses roxygen2 syntax ##' which is not appropriate in test files. Use regular R comments # instead.

Suggested change
##' # Create sample SNP data
# Create sample SNP data

Copilot uses AI. Check for mistakes.
#Calculate the denominator
#Note: need to use 1/ploidy for ratios, where ploidy should be used for dosages.
#This is because the ratios are from 0-1, which is what you get when dosage/ploidy, whereas dosages are from 0 to ploidy
denominator <- (1/as.numeric(ploidy))*sum(p * q, na.rm = TRUE)
Copy link

Copilot AI Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The magic number calculation (1/as.numeric(ploidy)) should be clarified or extracted to a named variable. The comment explains the logic but the implementation could be more explicit.

Suggested change
denominator <- (1/as.numeric(ploidy))*sum(p * q, na.rm = TRUE)
ploidy_ratio <- 1 / as.numeric(ploidy) # Scaling factor for allele ratios based on ploidy
denominator <- ploidy_ratio * sum(p * q, na.rm = TRUE)

Copilot uses AI. Check for mistakes.
@codecov
Copy link

codecov bot commented Sep 18, 2025

Codecov Report

❌ Patch coverage is 79.46429% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.35%. Comparing base (297f058) to head (a248e93).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
R/madc2gmat.R 74.41% 22 Missing ⚠️
R/updog2vcf.R 16.66% 15 Missing ⚠️
R/filterMADC.R 95.23% 3 Missing ⚠️
R/madc2vcf_all.R 70.00% 3 Missing ⚠️
R/madc2vcf_targets.R 90.00% 2 Missing ⚠️
R/thinSNP.R 96.29% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #44      +/-   ##
==========================================
- Coverage   84.13%   83.35%   -0.78%     
==========================================
  Files          16       19       +3     
  Lines        1147     1358     +211     
==========================================
+ Hits          965     1132     +167     
- Misses        182      226      +44     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@alex-sandercock alex-sandercock merged commit 77085b8 into main Sep 18, 2025
8 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants