Conversation

ramosv (Member) commented Nov 28, 2025

Bug fix: DPMON CPU issues during hyperparameter tuning

Copilot AI review requested due to automatic review settings November 28, 2025 23:45

Copilot AI left a comment


Pull request overview

This PR addresses CPU resource management issues in the DPMON hyperparameter tuning workflow and introduces several quality-of-life improvements across the clustering, plotting, and graph utilities modules. The main focus is on improving Ray Tune's resource allocation, retry logic, and reproducibility options to prevent resource exhaustion during hyperparameter search.

Key Changes

  • DPMON Tuning Enhancements: Added retry mechanism for Ray Tune with automatic sample reduction on resource/OOM errors, improved GPU detection logic, and introduced seed_trials parameter for reproducible hyperparameter sampling
  • Clustering Improvements: Enhanced logging granularity (debug vs info levels), added DataFrame-to-NetworkX conversion support in HybridLouvain, and improved iteration metrics reporting
  • Graph & Plotting Utilities: Added self_loops parameter to graph connectivity repair, improved label handling with .ravel(), new find_omics_modality function for feature-to-source mapping, and enhanced embedding visualization with custom legend labels
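
The retry mechanism with automatic sample reduction described above can be sketched roughly as follows. This is an illustrative toy, not the actual dpmon.py code: `run_tuning` is a placeholder callable, and real resource/OOM failures from Ray Tune would surface as different exception types than the plain `MemoryError` used here.

```python
def tune_with_retries(run_tuning, num_samples, max_retries=3):
    # Sketch of retry-with-sample-halving; `run_tuning` is a hypothetical
    # callable standing in for a Ray Tune launch.
    for attempt in range(max_retries):
        try:
            return run_tuning(num_samples)
        except MemoryError:
            # Halve the number of trials, but never go below 1.
            new_num_samples = max(1, num_samples // 2)
            if new_num_samples == num_samples:
                raise  # already at the minimum; nothing left to reduce
            num_samples = new_num_samples
    else:
        # Runs only if every attempt failed and was halved without success.
        raise RuntimeError("all retry attempts exhausted without success")
```

Each failed attempt halves the search budget, so the tuner degrades gracefully under memory pressure instead of failing outright on the first OOM.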

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Summary per file

  • bioneuralnet/downstream_task/dpmon.py: Core CPU/GPU resource management fixes: added retry logic with adaptive num_samples reduction, proper CUDA availability checks, a seed_trials parameter for reproducibility, removed unused variables, and adjusted the dropout hyperparameter range
  • bioneuralnet/clustering/hybrid_louvain.py: Added Union[nx.Graph, pd.DataFrame] type support with automatic conversion, enhanced iteration logging with conductance/correlation metrics, trailing-whitespace cleanup
  • bioneuralnet/clustering/correlated_louvain.py: Refined logging levels (info→debug for verbose operations, info→error for exceptions) and added a cluster count summary
  • bioneuralnet/utils/graph_tools.py: Added a self_loops parameter to repair_graph_connectivity and improved label handling with .ravel() for array flattening
  • bioneuralnet/metrics/plot.py: Added a legend_labels parameter to plot_embeddings for custom legends and a new find_omics_modality function for feature-to-source mapping
  • bioneuralnet/datasets/lgg/target.csv: Breaking change: renamed column "vital_status" to "target"



feature_to_source = {}
for src_name, df in zip(source_names, dfs):
    for feat in df.columns:

Copilot AI Nov 28, 2025


If a feature appears in multiple DataFrames in dfs, the last occurrence will overwrite previous mappings in feature_to_source. This could lead to incorrect source attribution. Consider adding a check for duplicate features across DataFrames and either warn the user or use a different strategy (e.g., store all sources or raise an error).

Suggested change
for feat in df.columns:
for feat in df.columns:
    if feat in feature_to_source:
        logger.warning(
            f"Feature '{feat}' found in multiple sources: '{feature_to_source[feat]}' and '{src_name}'. "
            "The last occurrence will be used for source attribution."
        )
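
If the stricter "store all sources" strategy mentioned in the comment is preferred over a warning, it could look roughly like this. `map_features_to_sources` is a hypothetical helper for illustration, not part of the library; only `source_names` and `dfs` mirror the variables in the snippet above.

```python
from collections import defaultdict

import pandas as pd

def map_features_to_sources(source_names, dfs):
    # Record *all* sources per feature instead of letting the last one win.
    feature_to_sources = defaultdict(list)
    for src_name, df in zip(source_names, dfs):
        for feat in df.columns:
            feature_to_sources[feat].append(src_name)
    # Surface features that appear in more than one DataFrame.
    duplicates = {f: s for f, s in feature_to_sources.items() if len(s) > 1}
    return dict(feature_to_sources), duplicates
```

The caller can then decide whether duplicates warrant a warning or a hard error, rather than silently overwriting earlier mappings.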

"nn_hidden_dim1": tune.choice([128, 256, 512]),
"nn_hidden_dim2": tune.choice([32, 64, 128]),
"ae_encoding_dim": tune.choice([4, 8, 16]),
"num_epochs": tune.choice([512, 1024, 2048]),

Copilot AI Nov 28, 2025


[nitpick] The dropout range has been changed from [0.0, 0.1, 0.2, 0.3, 0.4, 0.5] to [0.2, 0.3, 0.4, 0.5, 0.6], removing lower dropout values (0.0, 0.1) and adding a higher one (0.6). This suggests a deliberate shift toward more aggressive regularization. Consider documenting this change in comments or the commit message to explain why the lower dropout values were removed, as this could impact model performance.

Suggested change
"num_epochs": tune.choice([512, 1024, 2048]),
"num_epochs": tune.choice([512, 1024, 2048]),
# Dropout range updated: removed lower values (0.0, 0.1) and added higher (0.6) to encourage more aggressive regularization.
# This change may impact model performance; see commit message for rationale.


self.logger.info(
    f"Iteration {iteration+1}: cluster size={new_size}, "
    f"Conductance={cond:.3f} Correlation={corr:.3f} score={score:.3f}"
)

Copilot AI Nov 28, 2025


If any of the values (conductance, correlation, or composite_score) are None, the string formatting with .3f will raise a TypeError. Add a check to handle None values gracefully, e.g., by using a default value or conditional formatting like f"{cond:.3f}" if cond is not None else "N/A".

Suggested change (precompute None-safe strings before the logging call; conditional expressions cannot simply be dropped between implicitly concatenated f-strings)
f"Conductance={cond:.3f} Correlation={corr:.3f} score={score:.3f}"
cond_str = f"{cond:.3f}" if cond is not None else "N/A"
corr_str = f"{corr:.3f}" if corr is not None else "N/A"
score_str = f"{score:.3f}" if score is not None else "N/A"
self.logger.info(
    f"Iteration {iteration+1}: cluster size={new_size}, "
    f"Conductance={cond_str} Correlation={corr_str} score={score_str}"
)

@@ -1,4 +1,4 @@
-patient,vital_status
+patient,target

Copilot AI Nov 28, 2025


Changing the column name from vital_status to target is a breaking change that will affect any code referencing this specific column name. Ensure that all references to vital_status in the codebase have been updated to target, or provide backward compatibility by supporting both column names.

f"(cpu_per_trial={cpu_per_trial}, gpu_per_trial={gpu_per_trial})."
)
num_samples = new_num_samples


Copilot AI Nov 28, 2025


The else clause after the for loop is used here, which executes only if the loop completes without breaking. This is correct Python syntax but can be confusing. Consider adding a comment to clarify that this else clause triggers only if all retry attempts are exhausted without a successful break.

Suggested change
# The else clause below executes only if the for loop completes without a break,
# i.e., if all retry attempts are exhausted without a successful run.
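
For readers unfamiliar with the construct, Python's for/else semantics can be demonstrated in isolation; this is a standalone toy, unrelated to the dpmon.py code.

```python
def first_success(attempts):
    """Return the first non-None value, or 'exhausted' if none succeed."""
    for value in attempts:
        if value is not None:
            result = value
            break  # success: the else clause below is skipped
    else:
        # Runs only when the loop completes without hitting `break`.
        result = "exhausted"
    return result
```

In the retry loop under review, the else branch plays exactly this "loop ran out without success" role.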

max_iter (int): Maximum number of iterations.
weight (str): Edge weight parameter name.
tune (bool): Flag to enable tuning of parameters
tune (bool): Flag to enable tuning of parameters

Copilot AI Nov 28, 2025


Trailing whitespace should be removed for consistency.

Suggested change
tune (bool): Flag to enable tuning of parameters
tune (bool): Flag to enable tuning of parameters


if legend_labels is not None:
    handles, _ = scatter.legend_elements(prop="colors")
    ax.legend(handles, legend_labels, title="Omics Type", loc="best")

Copilot AI Nov 28, 2025


The legend creation assumes that the number of unique colors in the scatter plot matches the length of legend_labels, but this is not validated. If legend_labels has a different number of elements than unique values in c_values, the legend will be incorrect or cause an error. Consider adding validation: if len(legend_labels) != len(set(c_values)) and either warn the user or handle the mismatch.

Suggested change
ax.legend(handles, legend_labels, title="Omics Type", loc="best")
num_unique = len(np.unique(c_values))
if len(legend_labels) != num_unique:
    logger.warning(
        f"Number of legend_labels ({len(legend_labels)}) does not match number of unique values in node_labels ({num_unique}). Skipping legend."
    )
else:
    ax.legend(handles, legend_labels, title="Omics Type", loc="best")

Comment on lines +930 to +933
new_num_samples = max(1, num_samples // 2)
if new_num_samples == num_samples:
    raise


Copilot AI Nov 28, 2025


The condition if new_num_samples == num_samples will only be true when num_samples == 1 (since max(1, 1 // 2) == 1). However, this check happens after the assignment, so if the initial num_samples was already 1, the code will raise an exception. Consider checking if num_samples <= 1 before attempting to halve it, to provide a clearer error path.

Suggested change
new_num_samples = max(1, num_samples // 2)
if new_num_samples == num_samples:
    raise
if num_samples <= 1:
    raise
new_num_samples = max(1, num_samples // 2)

ramosv (Member, Author) commented Nov 29, 2025

Pushing without review; running experiments in a cloud environment. Will include CHANGELOG updates in the next PR.

@ramosv ramosv merged commit b2208f6 into main Nov 29, 2025
11 checks passed