@ramosv ramosv commented Nov 8, 2025

The first commit changes utils.data. We added several functions to the module that enhance data preprocessing and reproducibility:

- impute_omics_knn: Imputes missing values (NaNs) in omics data using K-Nearest Neighbors (KNN) imputation.
- normalize_omics: Normalizes omics data using specified methods: standard (Z-score), minmax, or log2.
- set_seed: Sets global random seed for reproducibility across Python, NumPy, and PyTorch.
- impute_omics: Imputes missing values (NaNs) using simple methods: mean, median, or zero.
- beta_to_m: Converts methylation Beta-values to M-values using log2 transformation for statistical analysis.
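
For readers who want a concrete picture of the simple imputation and normalization options listed above, here is a minimal NumPy sketch. The names `impute_simple` and `normalize_simple` are hypothetical stand-ins, not the functions merged in this PR, and the details (column-wise statistics, population standard deviation, a pseudocount of 1 for log2) are assumptions made for illustration.

```python
import numpy as np


def impute_simple(x, method="mean"):
    """Fill NaNs column by column in a 2-D array (hypothetical stand-in)."""
    x = np.asarray(x, dtype=float)
    if method == "mean":
        fill = np.nanmean(x, axis=0)
    elif method == "median":
        fill = np.nanmedian(x, axis=0)
    elif method == "zero":
        fill = np.zeros(x.shape[1])
    else:
        raise ValueError(f"unknown imputation method: {method!r}")
    # Broadcast the per-column fill value into the NaN positions only.
    return np.where(np.isnan(x), fill, x)


def normalize_simple(x, method="standard"):
    """Scale each column of a 2-D array (hypothetical stand-in)."""
    x = np.asarray(x, dtype=float)
    if method == "standard":
        std = x.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
        return (x - x.mean(axis=0)) / std
    if method == "minmax":
        low = x.min(axis=0)
        span = x.max(axis=0) - low
        span[span == 0] = 1.0  # same guard for constant columns
        return (x - low) / span
    if method == "log2":
        # Assumes non-negative input; the pseudocount keeps zeros finite.
        return np.log2(x + 1.0)
    raise ValueError(f"unknown normalization method: {method!r}")


raw = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 8.0]])
filled = impute_simple(raw, method="mean")
scaled = normalize_simple(filled, method="minmax")
```

Imputing before normalizing matters in this sketch: `normalize_simple` uses plain `mean`, `min`, and `max`, so any remaining NaN would propagate through the whole column.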

The second commit includes the corresponding pytest tests for the new functions.

More commits will follow to develop the respective tests and documentation for these new functions.

Copilot AI left a comment

Pull Request Overview

This PR adds new data preprocessing utilities to the bioneuralnet.utils.data module and updates the test suite accordingly. The changes include imputation methods (mean, median, zero, KNN), normalization strategies (standard, minmax, log2), beta-to-M-value conversion for methylation data, and a seed-setting function for reproducibility. The PR also updates documentation references and fixes a parameter inconsistency in the SAGE model initialization.
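
As background for the beta-to-M-value conversion mentioned above, the standard transform is M = log2(beta / (1 - beta)). The sketch below shows it in NumPy. The function name `beta_to_m_sketch` and the clipping constant `eps` are assumptions for illustration; the signature and edge-case handling of the merged `beta_to_m` are not shown in this thread.

```python
import numpy as np


def beta_to_m_sketch(beta, eps=1e-6):
    """Convert methylation Beta-values in [0, 1] to M-values (hypothetical sketch)."""
    # Clip away exact 0 and 1 so the logit stays finite; eps is an assumed choice.
    beta = np.clip(np.asarray(beta, dtype=float), eps, 1.0 - eps)
    return np.log2(beta / (1.0 - beta))


m_values = beta_to_m_sketch([0.2, 0.5, 0.8])
```

A Beta-value of 0.5 maps to an M-value of 0, and the transform is symmetric around that point, which is why M-values are often preferred for statistical testing.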

  • Adds five new utility functions: impute_omics, impute_omics_knn, normalize_omics, beta_to_m, and set_seed
  • Enhances existing function documentation with detailed docstrings
  • Updates explore_data_stats to use logger instead of print statements
  • Adds comprehensive test coverage for all new functions
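
To illustrate what a seed-setting helper of the kind listed above typically does, here is a minimal sketch that seeds Python, NumPy, and (when installed) PyTorch. The name `set_seed_sketch` is hypothetical, and the merged `set_seed` may set additional flags that are not shown in this thread.

```python
import random

import numpy as np


def set_seed_sketch(seed):
    """Seed the common random number generators (hypothetical sketch)."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        return
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


set_seed_sketch(42)
first_draw = (random.random(), float(np.random.rand()))
set_seed_sketch(42)
second_draw = (random.random(), float(np.random.rand()))
```

Reseeding with the same value reproduces the same draws, which is the property a test for such a helper would usually check.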

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `bioneuralnet/utils/data.py` | Implements new imputation, normalization, beta-to-M conversion, and seed-setting functions; enhances all function docstrings; migrates print statements to logger |
| `tests/test_data_utils.py` | Adds test cases for all new utility functions and updates the existing test for logger-based output |
| `bioneuralnet/utils/__init__.py` | Exports the new utility functions in the module's public API |
| `bioneuralnet/downstream_task/dpmon.py` | Comments out incorrect output_dim parameter from SAGE model initialization |
| `bioneuralnet/datasets/dataset_loader.py` | Updates docstring examples and comments out placeholder code for future dataset support |
| `docs/source/index.rst` | Corrects Zenodo DOI badge URL from .17503084 to .17503083 |
| `README.md` | Corrects Zenodo DOI badge URL from .17503084 to .17503083 |
| `.gitignore` | Adds patterns for new output directories and datasets |

Comments suppressed due to low confidence (1)

bioneuralnet/downstream_task/dpmon.py:588

  • The GIN class constructor does not accept an output_dim parameter (verified in gnn_models.py line 176). This line should be removed or commented out like the SAGE model above it. If left as-is, this will cause a TypeError when 'GIN' model_type is used.
                output_dim=gnn_hidden_dim,
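
The failure mode described in this comment can be reproduced in isolation. The sketch below uses a hypothetical `TinyGIN` class whose constructor, like the one the comment describes, has no `output_dim` parameter; its other parameters are invented for illustration and are not the real signature in gnn_models.py.

```python
class TinyGIN:
    """Hypothetical stand-in for a model class without an output_dim parameter."""

    def __init__(self, input_dim, hidden_dim, layer_num=2):
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.layer_num = layer_num


message = None
try:
    # Passing a keyword the constructor does not declare fails at call time.
    TinyGIN(input_dim=16, hidden_dim=8, output_dim=8)
except TypeError as exc:
    message = str(exc)
```

Because the error is raised only when the constructor is actually called, a code path like this stays hidden until the affected model type is selected.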


@ElyasYassin ElyasYassin left a comment

Added a few minor comments for clarity; the logic looks good.
Overall this looks great, LGTM.

@ramosv ramosv merged commit 29f6183 into main Nov 8, 2025
17 checks passed
@ramosv ramosv deleted the imputation-util branch November 8, 2025 21:58