Imputation Utils + Tests #94
Inside the utils.data module, we added several functions to enhance data preprocessing and reproducibility:
- impute_omics_knn: Imputes missing values (NaNs) in omics data using K-Nearest Neighbors (KNN) imputation.
- normalize_omics: Normalizes omics data using one of the specified methods: standard (Z-score), minmax, or log2.
- set_seed: Sets a global random seed for reproducibility across Python, NumPy, and PyTorch.
- impute_omics: Imputes missing values (NaNs) using simple methods: mean, median, or zero.
- beta_to_m: Converts methylation Beta-values to M-values using a log2 transformation for statistical analysis.
More commits will follow to develop the respective tests and documentation for these new functions.
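A minimal sketch of the two imputation styles listed above, assuming the omics data arrive as a pandas DataFrame with samples as rows and features as columns. The function names mirror the PR, but the bodies are illustrative, not the PR's code. In particular, the KNN variant uses a simplified NaN-aware distance written in NumPy; the PR may instead rely on an off-the-shelf imputer such as scikit-learn's KNNImputer.

```python
import numpy as np
import pandas as pd


def impute_omics(df: pd.DataFrame, method: str = "mean") -> pd.DataFrame:
    """Fill NaNs feature by feature with the column mean, column median, or zero."""
    if method == "mean":
        return df.fillna(df.mean())
    if method == "median":
        return df.fillna(df.median())
    if method == "zero":
        return df.fillna(0.0)
    raise ValueError(f"unknown imputation method: {method!r}")


def impute_omics_knn(df: pd.DataFrame, n_neighbors: int = 5) -> pd.DataFrame:
    """Fill each NaN with the mean of that feature over the k nearest samples.

    Distance between two samples is the root-mean-square difference over the
    features observed in both (a simplified NaN-aware Euclidean distance).
    """
    X = df.to_numpy(dtype=float)
    observed = ~np.isnan(X)
    filled = X.copy()
    for i in np.where(~observed.all(axis=1))[0]:  # samples with at least one NaN
        shared = observed & observed[i]  # features observed in both samples
        counts = shared.sum(axis=1)
        sq_diff = np.where(shared, (X - X[i]) ** 2, 0.0)
        dist = np.sqrt(sq_diff.sum(axis=1) / np.maximum(counts, 1))
        dist[counts == 0] = np.inf  # no overlapping features: not a usable neighbor
        dist[i] = np.inf  # never use the sample as its own neighbor
        for j in np.where(~observed[i])[0]:
            donors = np.where(observed[:, j] & np.isfinite(dist))[0]
            if donors.size == 0:
                filled[i, j] = np.nanmean(X[:, j])  # fall back to the column mean
                continue
            nearest = donors[np.argsort(dist[donors])[:n_neighbors]]
            filled[i, j] = X[nearest, j].mean()
    return pd.DataFrame(filled, index=df.index, columns=df.columns)
```

Both functions return a new DataFrame and leave the input untouched. The simple methods fill every gap in a feature with one value, whereas the KNN variant borrows values from the samples that look most similar on the features they share.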
Pull Request Overview
This PR adds new data preprocessing utilities to the bioneuralnet.utils.data module and updates the test suite accordingly. The changes include imputation methods (mean, median, zero, KNN), normalization strategies (standard, minmax, log2), beta-to-M-value conversion for methylation data, and a seed-setting function for reproducibility. The PR also updates documentation references and fixes a parameter inconsistency in the SAGE model initialization.
- Adds five new utility functions: impute_omics, impute_omics_knn, normalize_omics, beta_to_m, and set_seed
- Enhances existing function documentation with detailed docstrings
- Updates explore_data_stats to use logger instead of print statements
- Adds comprehensive test coverage for all new functions
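The normalization strategies and the Beta-to-M conversion can be sketched as follows, again assuming a samples-by-features pandas DataFrame. The M-value formula M = log2(beta / (1 - beta)) is the standard definition; the remaining details are assumptions of this sketch and may differ from the PR: population standard deviation (ddof=0, as in scikit-learn's StandardScaler), a pseudocount of 1 for the log2 transform, and clipping Beta-values by a hypothetical eps so that 0 and 1 stay finite.

```python
import numpy as np
import pandas as pd


def normalize_omics(df: pd.DataFrame, method: str = "standard") -> pd.DataFrame:
    """Scale each feature (column) with Z-score, min-max, or a log2(x + 1) transform."""
    if method == "standard":
        std = df.std(ddof=0).replace(0, 1.0)  # constant features map to 0, not NaN
        return (df - df.mean()) / std
    if method == "minmax":
        span = (df.max() - df.min()).replace(0, 1.0)  # avoid dividing by a zero range
        return (df - df.min()) / span
    if method == "log2":
        if (df < 0).any().any():
            raise ValueError("log2 normalization expects non-negative values")
        return np.log2(df + 1.0)  # +1 pseudocount keeps zero counts finite
    raise ValueError(f"unknown normalization method: {method!r}")


def beta_to_m(beta: pd.DataFrame, eps: float = 1e-6) -> pd.DataFrame:
    """Convert methylation Beta-values to M-values: M = log2(beta / (1 - beta))."""
    # Clip into (0, 1) so that neither log2(0) nor a division by zero can occur.
    clipped = beta.clip(lower=eps, upper=1.0 - eps)
    return np.log2(clipped / (1.0 - clipped))
```

A Beta-value of 0.5 maps to an M-value of 0, and values symmetric around 0.5 map to M-values of opposite sign (0.8 gives 2, 0.2 gives -2), which is why M-values behave better in statistical tests than the bounded Beta scale.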
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| bioneuralnet/utils/data.py | Implements new imputation, normalization, beta-to-M conversion, and seed-setting functions; enhances all function docstrings; migrates print statements to logger |
| tests/test_data_utils.py | Adds test cases for all new utility functions and updates the existing test for logger-based output |
| `bioneuralnet/utils/__init__.py` | Exports the new utility functions in the module's public API |
| bioneuralnet/downstream_task/dpmon.py | Comments out incorrect output_dim parameter from SAGE model initialization |
| bioneuralnet/datasets/dataset_loader.py | Updates docstring examples and comments out placeholder code for future dataset support |
| docs/source/index.rst | Corrects Zenodo DOI badge URL from .17503084 to .17503083 |
| README.md | Corrects Zenodo DOI badge URL from .17503084 to .17503083 |
| .gitignore | Adds patterns for new output directories and datasets |
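The move from print statements to a logger also changes how tests observe the output: instead of capturing stdout, a test captures log records. A minimal sketch using the standard library's unittest assertLogs (pytest's caplog fixture serves the same purpose); this explore_data_stats body is hypothetical and only reports shape and missing-value counts, not whatever statistics the real function computes.

```python
import logging
import unittest

import pandas as pd

logger = logging.getLogger(__name__)


def explore_data_stats(df: pd.DataFrame, name: str = "data") -> None:
    """Hypothetical sketch: report basic statistics through logging, not print()."""
    logger.info("%s: %d samples x %d features", name, df.shape[0], df.shape[1])
    logger.info("%s: missing values = %d", name, int(df.isna().sum().sum()))


class ExploreDataStatsTest(unittest.TestCase):
    def test_reports_via_logger(self):
        df = pd.DataFrame({"gene_a": [1.0, None], "gene_b": [0.5, 0.7]})
        # assertLogs captures records at INFO and above emitted by this logger.
        with self.assertLogs(logger, level="INFO") as captured:
            explore_data_stats(df, name="omics")
        self.assertIn("omics: 2 samples x 2 features", captured.output[0])
        self.assertIn("omics: missing values = 1", captured.output[1])
```

Because the records go through logging, callers can silence or redirect them by configuring log levels and handlers, which a bare print() does not allow.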
Comments suppressed due to low confidence (1)
bioneuralnet/downstream_task/dpmon.py:588
- The GIN class constructor does not accept an output_dim parameter (verified in gnn_models.py line 176). This line should be removed or commented out like the SAGE model above it. If left as-is, this will cause a TypeError when 'GIN' model_type is used.
output_dim=gnn_hidden_dim,
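The TypeError the review predicts is ordinary Python behavior: a constructor rejects any keyword argument it does not declare. A small reproduction with a hypothetical GINStub class (its parameters are illustrative, not the real gnn_models GIN signature):

```python
class GINStub:
    """Hypothetical stand-in for a GNN class whose constructor declares no output_dim."""

    def __init__(self, input_dim: int, hidden_dim: int, layer_num: int = 2):
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.layer_num = layer_num


gnn_hidden_dim = 32

# Passing the undeclared keyword fails at construction time, as the review warns.
try:
    GINStub(input_dim=16, hidden_dim=gnn_hidden_dim, output_dim=gnn_hidden_dim)
    failed = False
except TypeError as err:
    failed = True
    print(f"TypeError: {err}")

# Dropping (or commenting out) the argument, as was done for SAGE, avoids the error.
model = GINStub(input_dim=16, hidden_dim=gnn_hidden_dim)
```

The error surfaces only when the 'GIN' branch actually runs, which is why a test that exercises each model_type is a cheap way to catch this class of mismatch.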
ElyasYassin left a comment
Added a few minor comments for clarity, logic looks good.
Overall looks great, LGTM.
First commit: adds several functions to utils.data to enhance data preprocessing and reproducibility.
Second commit: adds the respective pytests for the new functions.
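A pytest-style reproducibility test in the spirit of the second commit. The set_seed body is a hypothetical sketch that follows the description (Python, NumPy, PyTorch); the PR's implementation may differ. PyTorch is treated as optional here so the sketch also runs without it.

```python
import os
import random

import numpy as np

try:
    import torch  # optional: seeded only when installed
except ImportError:
    torch = None


def set_seed(seed: int) -> None:
    """Seed Python's, NumPy's global and (if present) PyTorch's RNGs."""
    random.seed(seed)
    np.random.seed(seed)  # legacy global RNG; np.random.default_rng() is unaffected
    # Only affects interpreters started after this point (e.g. worker subprocesses).
    os.environ["PYTHONHASHSEED"] = str(seed)
    if torch is not None:
        torch.manual_seed(seed)  # seeds the CPU and all CUDA generators
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


def draw_samples() -> list:
    """Draw one value from each seeded generator."""
    values = [random.random(), float(np.random.rand())]
    if torch is not None:
        values.append(float(torch.rand(1)))
    return values


def test_set_seed_is_reproducible():
    set_seed(42)
    first = draw_samples()
    set_seed(42)
    second = draw_samples()
    assert first == second
```

pytest collects test_set_seed_is_reproducible automatically; the function can also be called directly.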