Sam2 refactor #10
Merged
Major Changes:
- Migrated from manual SAM2 library to HuggingFace transformers
- Moved legacy code to legacy/ directory
- Implemented clean class-based architecture with CLI and YAML configs

New Modules:
- data/: MSLoader, Preprocessor, SAMDataset (clean data pipeline)
- training/: SAM2Trainer with validation loss tracking
- inference/: RFIPredictor with iterative flagging support
- config/: YAML configuration loader
- data_generation/: MS and synthetic data generators

Features:
- Training & validation loss plots (dual curves)
- Iterative flagging: N-pass RFI detection with cumulative masking
- GPU profiling: validate_gpu.py with memory/utilization monitoring
- Batch size optimization for V100/A100
- Real unit tests (removed mock-heavy tests)

CLI Commands:
- generate-data: Create datasets from MS or synthetic
- train: Train on pre-generated HuggingFace datasets
- predict: Single-pass or iterative flagging
- create-config/validate-config: Config management

Package:
- pyproject.toml with proper dependencies (numpy>=1.26, pandas>=2.2)
- pytest configuration
- Example configs for training and validation

Fixes:
- Resolved pandas/numpy version conflicts
- Separated data generation from training
- Clean imports, no legacy dependencies
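For readers unfamiliar with the "N-pass RFI detection with cumulative masking" item above, here is a minimal sketch of the idea. It is not the RFIPredictor code from this PR: `flag_once` is a hypothetical stand-in detector (a robust median/MAD threshold) used in place of the SAM2 model, and the names `iterative_flag`, `n_passes` and `n_sigma` are assumptions made for illustration only.

```python
import numpy as np


def flag_once(data, mask, n_sigma=5.0):
    """Hypothetical stand-in detector (NOT the SAM2 model).

    Flags unmasked samples that lie more than n_sigma robust standard
    deviations above the median of the currently unmasked data.
    """
    good = data[~mask]
    if good.size == 0:
        return np.zeros_like(mask)
    med = np.median(good)
    # 1.4826 * MAD estimates the standard deviation for Gaussian noise.
    sigma = 1.4826 * np.median(np.abs(good - med))
    if sigma == 0:
        return np.zeros_like(mask)
    return (~mask) & (data > med + n_sigma * sigma)


def iterative_flag(data, n_passes=3, n_sigma=5.0):
    """Run the detector up to n_passes times, accumulating flags.

    Each pass only sees samples that earlier passes left unflagged, so
    strong RFI removed early no longer biases the statistics of later passes.
    """
    mask = np.zeros(data.shape, dtype=bool)
    for _ in range(n_passes):
        new_flags = flag_once(data, mask, n_sigma)
        if not new_flags.any():
            break  # nothing new was found, further passes would be no-ops
        mask |= new_flags  # cumulative masking: flags are never removed
    return mask
```

Because the mask is only ever OR-ed with new flags, the result of a single pass is always a subset of the result of several passes.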
… The dataset generation now saves torch tensors directly, which allows them to be loaded straight onto the GPU. Dataset generation and preprocessing are therefore done together, which avoids preprocessing compute at load time.
…on tools

Per-file changes:

preprocessor.py:
- Add automatic padding in _patchify_single_waterfall for arrays smaller than patch_size
- Pad to multiples of patch_size for patchify compatibility
- Store original_shapes in metadata for reconstruction cropping

predictor.py:
- Add save_probabilities parameter to save raw probability maps
- Implement adaptive thresholding (threshold=None uses mean of probabilities)
- Add upscaling of SAM2 256x256 outputs to patch_size using scipy.ndimage.zoom
- Calculate padded shape for reconstruction, crop result to original dimensions

evaluation/statistics.py (new):
- Add compute_statistics for before/after flagging analysis
- Add compute_ffi for Flagging Fidelity Index metric
- Add print_statistics_comparison for formatted output

evaluation/__init__.py:
- Export compute_statistics, compute_ffi, print_statistics_comparison

scripts/validate_single_array.py (new):
- Standalone validation for synthetic or real single arrays
- Probability heatmaps and histograms
- Adaptive threshold testing
- 2x4 grid (synthetic with GT) or 2x3 grid (real with FFI)

ms_loader.py:
- Add load_single_baseline method for extracting single baseline/pol

sam_dataset.py:
- Fix empty mask bbox: use full image [0,0,W,H] instead of center box

sam2_trainer.py:
- Fix logging check: use hasHandlers() instead of checking root logger

configs/validation.yaml:
- Fix stretch: sqrt → null for synthetic data

pyproject.toml:
- Add viz extras for holoviews/datashader visualization tools
- Add samrfi.visualization package

docs/batched_dataset_training.md:
- Fix file extension examples: .npz → .pt

This commit message and the doc updates are all made using Claude Code.
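The padding, cropping and adaptive-threshold items listed for preprocessor.py and predictor.py above can be illustrated with a short sketch. This is not the code from the PR: the function names are hypothetical, and zero padding on the bottom and right edges is an assumption, since the commit message does not say which padding mode or side is used.

```python
import numpy as np


def pad_to_multiple(arr, patch_size):
    """Pad a 2D array so both dimensions are multiples of patch_size.

    Assumption: zeros are appended on the bottom/right edges. The original
    shape is returned so the result can be cropped back after reconstruction.
    """
    h, w = arr.shape
    pad_h = (-h) % patch_size  # 0 when h is already a multiple
    pad_w = (-w) % patch_size
    padded = np.pad(arr, ((0, pad_h), (0, pad_w)), mode="constant")
    return padded, (h, w)


def crop_to_original(arr, original_shape):
    """Undo pad_to_multiple by cropping back to the stored original shape."""
    h, w = original_shape
    return arr[:h, :w]


def threshold_probabilities(probs, threshold=None):
    """Turn a probability map into a boolean mask.

    With threshold=None the mean of the probabilities is used, mirroring
    the adaptive behaviour described in the commit message.
    """
    if threshold is None:
        threshold = float(probs.mean())
    return probs > threshold
```

For example, a 100x300 waterfall with `patch_size=128` would be padded to 128x384, processed patch by patch, and the reconstructed mask cropped back to 100x300.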
…where I have incorporated the calcquality metric. Also introduces a test for the metrics module.
Lazy-load casa and set up CI for a pip install without heavy dependencies.
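A common way to lazy-load a heavy optional dependency is to defer the import until the function that needs it is actually called. The sketch below shows that pattern; it is not the code from this PR. `lazy_import` and `open_table` are hypothetical helpers, and the use of the `casatools` package as the CASA dependency is an assumption.

```python
import importlib


def lazy_import(module_name, hint=""):
    """Import a module on first use and fail with a readable message.

    Hypothetical helper: keeps `import samrfi`-style top-level imports cheap
    so a plain pip install (and CI) works without the heavy dependency.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        message = f"'{module_name}' is required for this feature but is not installed."
        if hint:
            message = f"{message} {hint}"
        raise ImportError(message) from exc


def open_table(path):
    """Open a measurement set table; CASA is only imported when this runs.

    Assumption: the CASA dependency is the `casatools` package.
    """
    casatools = lazy_import("casatools", hint="Install CASA to read measurement sets.")
    table = casatools.table()
    table.open(path)
    return table
```

With this layout, modules that never touch a measurement set can be imported and tested in an environment where CASA is not installed.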
This is a complete overhaul of SAM-RFI to accommodate SAM2 models. We also explored SAM3 but have decided against it for now, because its native training image size is 1008 rather than 1024.