Imputation Utils + Tests #94

ramosv · 2025-11-08T19:08:57Z

The first commit changes the utils.data module, adding several functions that enhance data preprocessing and reproducibility:

- impute_omics_knn: Imputes missing values (NaNs) in omics data using K-Nearest Neighbors (KNN) imputation.
- normalize_omics: Normalizes omics data using specified methods: standard (Z-score), minmax, or log2.
- set_seed: Sets global random seed for reproducibility across Python, NumPy, and PyTorch.
- impute_omics: Imputes missing values (NaNs) using simple methods: mean, median, or zero.
- beta_to_m: Converts methylation Beta-values to M-values using log2 transformation for statistical analysis.
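As a rough illustration of the two imputation helpers listed above, the sketch below shows one way mean/median/zero and KNN imputation could be written with pandas and scikit-learn. The names `simple_impute` and `knn_impute`, their parameters, and their defaults are hypothetical stand-ins, not the signatures in bioneuralnet.utils.data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def simple_impute(df: pd.DataFrame, method: str = "mean") -> pd.DataFrame:
    """Fill NaNs column-wise. Hypothetical stand-in, not the PR's impute_omics."""
    if method == "mean":
        return df.fillna(df.mean())
    if method == "median":
        return df.fillna(df.median())
    if method == "zero":
        return df.fillna(0)
    raise ValueError(f"Unknown imputation method: {method!r}")


def knn_impute(df: pd.DataFrame, n_neighbors: int = 5) -> pd.DataFrame:
    """Fill NaNs from the nearest samples (rows). Hypothetical stand-in."""
    imputer = KNNImputer(n_neighbors=n_neighbors)
    filled = imputer.fit_transform(df)  # returns a NumPy array
    return pd.DataFrame(filled, index=df.index, columns=df.columns)


# Toy samples-by-features table with two missing entries (made-up values)
omics = pd.DataFrame(
    {
        "gene_a": [1.0, np.nan, 3.0, 5.0],
        "gene_b": [2.0, 4.0, np.nan, 8.0],
        "gene_c": [0.5, 1.0, 1.5, 2.5],
    },
    index=["s1", "s2", "s3", "s4"],
)
mean_filled = simple_impute(omics, "mean")
knn_filled = knn_impute(omics, n_neighbors=2)
```

Both helpers return a new DataFrame and leave observed values untouched. The KNN variant assumes no column is entirely NaN, since `KNNImputer` drops such columns by default and the column labels would then no longer match.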

The second commit includes the respective pytest tests for the new functions.
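To make the testing idea concrete, here is a minimal pytest-style sketch of a reproducibility check for a seed helper. `seed_everything` is a hypothetical stand-in for set_seed: it seeds Python and NumPy, and seeds PyTorch only if it is installed. The actual tests in tests/test_data_utils.py may be structured differently.

```python
import random

import numpy as np


def seed_everything(seed: int) -> None:
    """Seed the common global RNGs. Hypothetical stand-in, not the PR's set_seed."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch  # optional: only seeded when PyTorch is installed
    except ImportError:
        return
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


def draw_sample() -> tuple:
    """Draw one value from each of the Python and NumPy global RNGs."""
    return random.random(), float(np.random.rand())


def test_same_seed_gives_same_draws():
    seed_everything(42)
    first = draw_sample()
    seed_everything(42)
    second = draw_sample()
    assert first == second
```

Re-seeding before each draw is what makes the two samples comparable; without the second call the generators would simply continue their sequences.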

More commits will follow to develop the respective tests and documentation for these new functions.

Copilot

Pull Request Overview

This PR adds new data preprocessing utilities to the bioneuralnet.utils.data module and updates the test suite accordingly. The changes include imputation methods (mean, median, zero, KNN), normalization strategies (standard, minmax, log2), beta-to-M-value conversion for methylation data, and a seed-setting function for reproducibility. The PR also updates documentation references and fixes a parameter inconsistency in the SAGE model initialization.
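The normalization strategies and the beta-to-M conversion mentioned in the overview can be sketched as follows. This is an illustration under assumptions, not the code under review: `normalize` and `beta_to_m_values` are hypothetical names, and the `+ 1` offset in the log2 branch and the `eps` clipping are choices made here to keep the results finite. The M-value formula itself, M = log2(beta / (1 - beta)), is the standard definition.

```python
import numpy as np
import pandas as pd


def normalize(df: pd.DataFrame, method: str = "standard") -> pd.DataFrame:
    """Column-wise normalization. Hypothetical stand-in for normalize_omics."""
    if method == "standard":
        # Z-score using the population standard deviation (ddof=0)
        return (df - df.mean()) / df.std(ddof=0)
    if method == "minmax":
        # Rescale each column to the [0, 1] range
        return (df - df.min()) / (df.max() - df.min())
    if method == "log2":
        # log2(x + 1) keeps zeros finite; the offset is an assumption
        return np.log2(df + 1)
    raise ValueError(f"Unknown normalization method: {method!r}")


def beta_to_m_values(beta: pd.DataFrame, eps: float = 1e-6) -> pd.DataFrame:
    """Convert Beta-values in [0, 1] to M-values: log2(beta / (1 - beta))."""
    # Clip away exact 0 and 1 so the ratio never becomes 0 or infinite
    clipped = beta.clip(lower=eps, upper=1 - eps)
    return np.log2(clipped / (1 - clipped))


# Made-up toy values for illustration
expr = pd.DataFrame({"g1": [0.0, 1.0, 3.0], "g2": [2.0, 4.0, 6.0]})
z = normalize(expr, "standard")
scaled = normalize(expr, "minmax")
logged = normalize(expr, "log2")

beta = pd.DataFrame({"cg1": [0.5, 0.8, 0.2]})
m = beta_to_m_values(beta)
```

A Beta-value of 0.5 maps to an M-value of 0, and values symmetric around 0.5 (such as 0.8 and 0.2) map to M-values of equal magnitude and opposite sign. Constant columns would produce NaN in the standard and minmax branches because of the zero denominator; this sketch does not guard against that.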

- Adds five new utility functions: impute_omics, impute_omics_knn, normalize_omics, beta_to_m, and set_seed
- Enhances existing function documentation with detailed docstrings
- Updates explore_data_stats to use logger instead of print statements
- Adds comprehensive test coverage for all new functions
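The move from print statements to a logger also changes how output is checked in tests. The sketch below uses only the standard library; `log_basic_stats`, `ListHandler`, and the logger name are hypothetical and stand in for whatever explore_data_stats actually reports.

```python
import logging

logger = logging.getLogger("data_stats_demo")


def log_basic_stats(values: list) -> None:
    """Report summary statistics through the logger rather than print()."""
    n = len(values)
    mean = sum(values) / n if n else float("nan")
    logger.info("n=%d mean=%.3f", n, mean)


class ListHandler(logging.Handler):
    """Collects log messages in a list so they can be inspected afterwards."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)

log_basic_stats([1.0, 2.0, 3.0])
```

After the call, `handler.messages` holds the single formatted message. In a pytest suite the built-in `caplog` fixture plays the role of `ListHandler`, which is presumably why the existing test had to be updated for logger-based output.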

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| bioneuralnet/utils/data.py | Implements new imputation, normalization, beta-to-M conversion, and seed-setting functions; enhances all function docstrings; migrates print statements to logger |
| tests/test_data_utils.py | Adds test cases for all new utility functions and updates the existing test for logger-based output |
| bioneuralnet/utils/__init__.py | Exports the new utility functions in the module's public API |
| bioneuralnet/downstream_task/dpmon.py | Comments out incorrect `output_dim` parameter from SAGE model initialization |
| bioneuralnet/datasets/dataset_loader.py | Updates docstring examples and comments out placeholder code for future dataset support |
| docs/source/index.rst | Corrects Zenodo DOI badge URL from .17503084 to .17503083 |
| README.md | Corrects Zenodo DOI badge URL from .17503084 to .17503083 |
| .gitignore | Adds patterns for new output directories and datasets |

Comments suppressed due to low confidence (1)

bioneuralnet/downstream_task/dpmon.py:588

The GIN class constructor does not accept an output_dim parameter (verified in gnn_models.py line 176). This line should be removed or commented out like the SAGE model above it. If left as-is, this will cause a TypeError when 'GIN' model_type is used.

                output_dim=gnn_hidden_dim,
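The failure mode described in this comment is ordinary Python behavior: passing a keyword that the constructor does not declare raises a TypeError. A minimal sketch, using a hypothetical `TinyGIN` class in place of the real GIN model (whose actual signature is not shown here):

```python
class TinyGIN:
    """Hypothetical stand-in whose constructor has no output_dim parameter."""

    def __init__(self, input_dim: int, hidden_dim: int, layer_num: int = 2):
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.layer_num = layer_num


# Mirrors the flagged call: an extra output_dim keyword is passed along
kwargs = dict(input_dim=8, hidden_dim=16, output_dim=16)

try:
    TinyGIN(**kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)  # names the unexpected keyword argument

# Dropping the unsupported keyword lets construction succeed
kwargs.pop("output_dim")
model = TinyGIN(**kwargs)
```

Because the error only surfaces when that branch runs, a code path that is rarely selected can carry this kind of mismatch unnoticed until a user picks that model type.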



ElyasYassin

Added a few minor comments for clarity; the logic looks good.
Overall this looks great, LGTM.

ramosv added 2 commits November 8, 2025 11:37

Added respective tests

6fbf9bf

ramosv requested review from ElyasYassin, SundousHussein, abdelhafizm and Copilot November 8, 2025 19:08

Copilot AI reviewed Nov 8, 2025

bioneuralnet/utils/data.py (resolved)

bioneuralnet/utils/data.py (resolved)

bioneuralnet/datasets/dataset_loader.py (resolved)

ElyasYassin reviewed Nov 8, 2025

bioneuralnet/downstream_task/dpmon.py (resolved)

ElyasYassin reviewed Nov 8, 2025

bioneuralnet/datasets/dataset_loader.py (resolved)

ElyasYassin reviewed Nov 8, 2025

tests/test_data_utils.py (resolved)

ElyasYassin approved these changes Nov 8, 2025

ramosv merged commit 29f6183 into main Nov 8, 2025
17 checks passed

ramosv deleted the imputation-util branch November 8, 2025 21:58
