-
Notifications
You must be signed in to change notification settings - Fork 0
Classification #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Classification #13
Conversation
Use native xgboost output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces gamma/hadron classification functionality and significantly refactors the codebase to reduce duplication and improve maintainability. The changes consolidate common patterns for both stereo reconstruction and classification tasks.
Key Changes:
- Adds new classification training and application scripts for gamma/hadron separation
- Introduces energy and zenith binning for classification models
- Refactors data processing, model loading, and evaluation code into shared modules
- Adds pre-training quality cuts for data filtering
- Removes all existing unit tests without replacement
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit_tests/test_*.py | All unit tests deleted - significant test coverage reduction |
| tests/resources/classify-parameter.json | New configuration file defining energy and zenith bins for classification |
| tests/conftest.py | Updated with shared fixture for telescope data |
| src/eventdisplay_ml/features.py | New module centralizing feature definitions for both analysis types |
| src/eventdisplay_ml/hyper_parameters.py | New module defining hyperparameters and pre-cuts |
| src/eventdisplay_ml/config.py | New configuration module handling CLI arguments for both training and application |
| src/eventdisplay_ml/models.py | New module consolidating model loading and application logic |
| src/eventdisplay_ml/scripts/train_xgb_classify.py | New classification training script |
| src/eventdisplay_ml/scripts/apply_xgb_classify.py | New classification application script |
| src/eventdisplay_ml/scripts/train_xgb_stereo.py | Refactored to use new config and model modules |
| src/eventdisplay_ml/scripts/apply_xgb_stereo.py | Refactored to use new model application infrastructure |
| src/eventdisplay_ml/evaluate.py | Enhanced with classification evaluation and improved feature importance handling |
| src/eventdisplay_ml/data_processing.py | Significantly refactored with support for both analysis types and energy/zenith binning |
| src/eventdisplay_ml/utils.py | Added model parameter loading and output filename generation utilities |
| src/eventdisplay_ml/training_variables.py | Deleted - functionality moved to features.py |
| pyproject.toml | Added new CLI entry points for classification scripts |
| docs/changes/*.md | Added changelog entries for new features and refactoring |
Comments suppressed due to low confidence (1)
tests/unit_tests/test_utils.py:1
- This PR deletes all existing unit tests (test_utils.py, test_training_variables.py, test_evaluate.py, test_data_processing.py, test_train_xgb_stereo.py, test_apply_xgb_stereo.py) without providing replacement tests. This represents a significant reduction in test coverage for the codebase. The new functionality introduced (classification routines, refactored data processing, new model loading, etc.) also lacks test coverage. Consider adding comprehensive unit tests to maintain code quality and prevent regressions.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
| flatten_data = flatten_data.reindex(columns=models[n_tel]["features"]) | ||
| model = models[n_tel]["model"] | ||
| preds[group_df.index] = model.predict(flatten_data) |
Copilot
AI
Jan 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The apply_regression_models function uses group_df.index to index into the preds array on line 211. However, if the DataFrame has been reset with a non-default index or has gaps in indices, this could lead to IndexError or incorrect assignment. The function should use positional indices or ensure the index is properly aligned with the predictions array.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
No description provided.