Skip to content

Conversation

@EiffL
Copy link
Member

@EiffL EiffL commented Nov 25, 2025

This pull request introduces the initial implementation of the SHINE pipeline, providing a complete configuration-driven workflow for Bayesian shear inference using JAX, NumPyro, and GalSim. The changes include new configuration models, data loading and synthetic generation, scene/model construction, inference engines, and a main entrypoint script. These updates establish a modular, extensible foundation for both synthetic and (future) real data analysis.

Configuration and Workflow Foundation:

  • Added a comprehensive configuration system using Pydantic models in shine/config.py, supporting flexible priors, image, PSF, galaxy, and inference options, with a YAML-based config handler.
  • Created an example YAML configuration file (configs/test_run.yaml) specifying all parameters for a test run, including galaxy, PSF, image, and inference settings.

Data Handling and Synthetic Generation:

  • Implemented shine/data.py with a DataLoader class that generates synthetic galaxy image data using GalSim when no real data is provided, including noise and PSF handling.

Model and Inference Engines:

  • Added shine/scene.py featuring a SceneBuilder class that constructs a NumPyro model for the scene, supporting flexible priors and differentiable rendering with JAX-GalSim.
  • Introduced shine/inference.py with HMC (NUTS) and MAP inference engines, supporting ArviZ output for posterior analysis and MAP parameter extraction.

Entrypoint and Workflow Integration:

  • Added shine/main.py, a CLI script to run the full pipeline: configuration loading, data generation, model building, inference execution (HMC or MAP), and result saving (including residuals and reduced chi-squared for MAP).
  • Updated the design diagram in DESIGN.md to reflect the new modular workflow and clarify component relationships.

Copy link

@CentofantiEze CentofantiEze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emma and I gave a good read to all the modules. The implementation is well structured and seems functional. We left a few comments and questions, specially in the parts that might need further refinement. We haven't tried to run the code yet.

Main comments

  • MAP: the main concern is the way the MAP is handled. In this code it seems that the MAP is a standalone inference option (we can do MCMC or MAP). However, we want the map to be an (optional) extra step to be performed prior to running the chains, to tune the initial starting points.
  • Ellipticities: the intrinsic ellipticity variables are missing.
  • Modularity. PSF modelling and galaxy morphology should be encapsulated in separate submodules/subpackages.
  • All the galaxies are being generated at the centre of the postage stamp (overlapped).
  • It seems that the synthetic observations are individual postage stamps but the scene modelling creates a single field.

type: Gaussian
sigma: 0.1

gal:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameters are for the data generation or the generative forward modelling?

Missing ellipticity prior information.

type: str = "Exponential" # Changed default from Sersic to Exponential
n: Optional[Union[float, DistributionConfig]] = None # Make optional for Exponential
flux: Union[float, DistributionConfig]
half_light_radius: Union[float, DistributionConfig] = Field(..., alias="half_light_radius")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does Field do? If it is for adding an alias to half_light_radius shouldn't the alias be hlr? And why is Field only used here?

def generate_synthetic(config: ShineConfig) -> Observation:
import galsim

# 1. Define PSF

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually this will be a single line:

psf = psf_utils.get_psf(config.psf)

and all the code should be written in the psf_utils sub package/module.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same applies to the following lines.

g1 = get_mean(config.gal.shear.g1)
g2 = get_mean(config.gal.shear.g2)
shear = galsim.Shear(g1=g1, g2=g2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing intrinsic ellipticity.

g2 = get_mean(config.gal.shear.g2)
shear = galsim.Shear(g1=g1, g2=g2)

# Create Galaxy Object - Use Exponential (Sersic n=1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle if we follow the previous logic it should check the galaxy type:

if config.gal.type == "exponential":
...

sigma = self.config.image.noise.sigma
numpyro.sample("obs", dist.Normal(model_image, sigma), obs=observed_data)

return model

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't return a log likelihood function but the forward model function.

if extra_args is None:
extra_args = {}

kernel = NUTS(self.model, dense_mass=self.dense_mass)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do NUTS and MCMC deal with initial samples?

from typing import Dict, Any
import arviz as az

class HMCInference:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be named simply Inference because it could implement various inference methods (HMC, MCMC, etc).

# Print summary
print(results.posterior)

elif args.mode == "map":

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using map estimation for getting the initial samples closer to the solution, thus it should be run before the inference.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAP shouldn't be an inference mode option. It should be an optional step to run prior to inference (mcmc) to improve the starting point of the chains.

# We want 'model_image'.
# We didn't expose 'model_image' as a deterministic site in the builder.
# Let's modify the builder to expose it, OR we can just inspect the 'obs' distribution mean in the trace.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lines below might not be useful given the comments above (MAP shouldn't be an independent inference method).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants