Bug Fix: DPMON CPU issues for tuning #99
Conversation
Pull request overview
This PR addresses CPU resource management issues in the DPMON hyperparameter tuning workflow and introduces several quality-of-life improvements across the clustering, plotting, and graph utilities modules. The main focus is on improving Ray Tune's resource allocation, retry logic, and reproducibility options to prevent resource exhaustion during hyperparameter search.
Key Changes
- DPMON Tuning Enhancements: Added a retry mechanism for Ray Tune with automatic sample reduction on resource/OOM errors, improved GPU detection logic, and introduced a `seed_trials` parameter for reproducible hyperparameter sampling
- Clustering Improvements: Enhanced logging granularity (debug vs. info levels), added DataFrame-to-NetworkX conversion support in HybridLouvain, and improved iteration metrics reporting
- Graph & Plotting Utilities: Added a `self_loops` parameter to graph connectivity repair, improved label handling with `.ravel()`, a new `find_omics_modality` function for feature-to-source mapping, and enhanced embedding visualization with custom legend labels
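The `seed_trials` option targets reproducible hyperparameter sampling. A minimal sketch of the underlying idea, using a private seeded RNG over a `tune.choice`-style search space (the function and parameter names here are illustrative, not the actual DPMON or Ray Tune API):

```python
import random

def sample_config(search_space, seed=None):
    """Draw one hyperparameter configuration; identical seeds give identical draws."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    return {name: rng.choice(options) for name, options in search_space.items()}

# Search space mirroring the choices discussed in this PR
space = {
    "nn_hidden_dim1": [128, 256, 512],
    "dropout": [0.2, 0.3, 0.4, 0.5, 0.6],
}
```

Calling `sample_config(space, seed=42)` twice returns the same configuration, which is the property a `seed_trials` parameter would provide across tuning runs.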
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| bioneuralnet/downstream_task/dpmon.py | Core CPU/GPU resource management fixes: added retry logic with adaptive num_samples reduction, proper CUDA availability checks, seed_trials parameter for reproducibility, removed unused variables, and adjusted dropout hyperparameter range |
| bioneuralnet/clustering/hybrid_louvain.py | Added Union[nx.Graph, pd.DataFrame] type support with automatic conversion, enhanced iteration logging with conductance/correlation metrics, trailing whitespace cleanup |
| bioneuralnet/clustering/correlated_louvain.py | Refined logging levels (info→debug for verbose operations, info→error for exceptions) and added cluster count summary |
| bioneuralnet/utils/graph_tools.py | Added self_loops parameter to repair_graph_connectivity and improved label handling with .ravel() for array flattening |
| bioneuralnet/metrics/plot.py | Added legend_labels parameter to plot_embeddings for custom legends, new find_omics_modality function for feature-to-source mapping |
| bioneuralnet/datasets/lgg/target.csv | Breaking change: renamed column from "vital_status" to "target" |
```python
feature_to_source = {}
for src_name, df in zip(source_names, dfs):
    for feat in df.columns:
```
Copilot AI · Nov 28, 2025
If a feature appears in multiple DataFrames in dfs, the last occurrence will overwrite previous mappings in feature_to_source. This could lead to incorrect source attribution. Consider adding a check for duplicate features across DataFrames and either warn the user or use a different strategy (e.g., store all sources or raise an error).
Suggested change:

```python
for feat in df.columns:
    if feat in feature_to_source:
        logger.warning(
            f"Feature '{feat}' found in multiple sources: '{feature_to_source[feat]}' and '{src_name}'. "
            "The last occurrence will be used for source attribution."
        )
```
```python
"nn_hidden_dim1": tune.choice([128, 256, 512]),
"nn_hidden_dim2": tune.choice([32, 64, 128]),
"ae_encoding_dim": tune.choice([4, 8, 16]),
"num_epochs": tune.choice([512, 1024, 2048]),
```
Copilot AI · Nov 28, 2025
[nitpick] The dropout range has been changed from [0.0, 0.1, 0.2, 0.3, 0.4, 0.5] to [0.2, 0.3, 0.4, 0.5, 0.6], removing lower dropout values (0.0, 0.1) and adding a higher one (0.6). This suggests a deliberate shift toward more aggressive regularization. Consider documenting this change in comments or the commit message to explain why the lower dropout values were removed, as this could impact model performance.
Suggested change:

```python
"num_epochs": tune.choice([512, 1024, 2048]),
# Dropout range updated: removed lower values (0.0, 0.1) and added 0.6 to encourage
# more aggressive regularization. This change may impact model performance; see
# commit message for rationale.
```
```python
self.logger.info(
    f"Iteration {iteration+1}: cluster size={new_size}, "
    f"Conductance={cond:.3f} Correlation={corr:.3f} score={score:.3f}"
```
Copilot AI · Nov 28, 2025
If any of the values (conductance, correlation, or composite_score) are None, the string formatting with .3f will raise a TypeError. Add a check to handle None values gracefully, e.g., by using a default value or conditional formatting like f"{cond:.3f}" if cond is not None else "N/A".
Suggested change (the conditionals must be precomputed; chaining conditional expressions inside adjacent f-strings is not valid Python):

```python
cond_str = f"{cond:.3f}" if cond is not None else "N/A"
corr_str = f"{corr:.3f}" if corr is not None else "N/A"
score_str = f"{score:.3f}" if score is not None else "N/A"
self.logger.info(
    f"Iteration {iteration+1}: cluster size={new_size}, "
    f"Conductance={cond_str} Correlation={corr_str} score={score_str}"
)
```
```diff
@@ -1,4 +1,4 @@
-patient,vital_status
+patient,target
```
Copilot AI · Nov 28, 2025
Changing the column name from vital_status to target is a breaking change that will affect any code referencing this specific column name. Ensure that all references to vital_status in the codebase have been updated to target, or provide backward compatibility by supporting both column names.
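One way to keep older datasets working is to normalize the column name at load time. A sketch of that backward-compatibility shim, assuming a pandas-based loader (the helper name is hypothetical, not the actual BioNeuralNet code):

```python
import pandas as pd

def normalize_target_column(df):
    """Accept either the legacy 'vital_status' or the new 'target' column name."""
    if "vital_status" in df.columns and "target" not in df.columns:
        # Legacy support: silently migrate the old column name to the new one.
        df = df.rename(columns={"vital_status": "target"})
    return df

# A frame using the pre-rename schema, as older copies of target.csv would produce
legacy = pd.DataFrame({"patient": ["p1"], "vital_status": [1]})
```

Downstream code can then reference only `target`, while older CSVs still load without modification.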
```python
    f"(cpu_per_trial={cpu_per_trial}, gpu_per_trial={gpu_per_trial})."
)
num_samples = new_num_samples
```
Copilot AI · Nov 28, 2025
The else clause after the for loop is used here, which executes only if the loop completes without breaking. This is correct Python syntax but can be confusing. Consider adding a comment to clarify that this else clause triggers only if all retry attempts are exhausted without a successful break.
Suggested change:

```python
# The else clause below executes only if the for loop completes without a break,
# i.e., if all retry attempts are exhausted without a successful run.
```
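The pattern under review can be sketched in isolation. Here `run_trial` and `MemoryError` are stand-ins for the Ray Tune call and its resource/OOM failures; function and variable names are illustrative:

```python
def tune_with_retries(run_trial, num_samples, max_retries=3):
    """Retry with halved num_samples on resource errors, using for/else."""
    for _ in range(max_retries):
        try:
            result = run_trial(num_samples)
            break  # success: skip the else clause below
        except MemoryError:
            if num_samples <= 1:
                raise  # cannot reduce further; surface the original error
            num_samples = max(1, num_samples // 2)
    else:
        # Runs only if the loop never hit `break`: all retries exhausted.
        raise RuntimeError("all retry attempts exhausted")
    return result, num_samples
```

A run that fails at 8 and 4 samples but succeeds at 2 returns after three attempts, and the `num_samples <= 1` guard addresses the clearer-error-path suggestion made later in this review.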
```python
max_iter (int): Maximum number of iterations.
weight (str): Edge weight parameter name.
tune (bool): Flag to enable tuning of parameters
```
Copilot AI · Nov 28, 2025
Trailing whitespace should be removed for consistency.
Suggested change (the two lines differ only in trailing whitespace, which is invisible when rendered):

```python
tune (bool): Flag to enable tuning of parameters
```
```python
if legend_labels is not None:
    handles, _ = scatter.legend_elements(prop="colors")
    ax.legend(handles, legend_labels, title="Omics Type", loc="best")
```
Copilot AI · Nov 28, 2025
The legend creation assumes that the number of unique colors in the scatter plot matches the length of legend_labels, but this is not validated. If legend_labels has a different number of elements than unique values in c_values, the legend will be incorrect or cause an error. Consider adding validation: if len(legend_labels) != len(set(c_values)) and either warn the user or handle the mismatch.
Suggested change:

```python
num_unique = len(np.unique(c_values))
if len(legend_labels) != num_unique:
    logger.warning(
        f"Number of legend_labels ({len(legend_labels)}) does not match number of "
        f"unique values in node_labels ({num_unique}). Skipping legend."
    )
else:
    ax.legend(handles, legend_labels, title="Omics Type", loc="best")
```
```python
new_num_samples = max(1, num_samples // 2)
if new_num_samples == num_samples:
    raise
```
Copilot AI · Nov 28, 2025
The condition if new_num_samples == num_samples will only be true when num_samples == 1 (since max(1, 1 // 2) == 1). However, this check happens after the assignment, so if the initial num_samples was already 1, the code will raise an exception. Consider checking if num_samples <= 1 before attempting to halve it, to provide a clearer error path.
Suggested change:

```python
if num_samples <= 1:
    raise
new_num_samples = max(1, num_samples // 2)
```
Pushing without review; running experiments in a cloud environment. Will push CHANGELOG updates in the next PR.