@nickcorona nickcorona commented Dec 2, 2024

Description
This pull request implements the Tweedie distribution in the GAM package. The Tweedie distribution is useful for modeling data with both a discrete and a continuous component, such as insurance claims, where many observations are exactly zero and the rest are positive and continuous. This addition makes the package more flexible in handling real-world datasets with such mixed-type responses.

Key Changes

  1. Added Tweedie distribution support.
  2. Included associated tests to ensure robustness.
  3. Updated documentation and examples for ease of use.
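For readers less familiar with the family, here is a minimal sketch of the two quantities a Tweedie distribution usually contributes to a GLM/GAM fit: the variance function V(mu) = mu^p and the unit deviance for the compound Poisson range 1 < p < 2. The function names are illustrative assumptions and are not taken from this PR's code.

```python
def tweedie_variance(mu, power):
    """Tweedie variance function V(mu) = mu ** power."""
    return mu ** power


def tweedie_unit_deviance(y, mu, power):
    """Unit deviance for 1 < power < 2 (compound Poisson-gamma case).

    Hypothetical helper for illustration, not the PR's implementation.
    """
    if not 1 < power < 2:
        raise ValueError("this sketch only covers 1 < power < 2")
    if mu <= 0 or y < 0:
        raise ValueError("requires mu > 0 and y >= 0")
    # The first term vanishes at y == 0, which is what allows exact zeros.
    a = y ** (2 - power) / ((1 - power) * (2 - power)) if y > 0 else 0.0
    b = y * mu ** (1 - power) / (1 - power)
    c = mu ** (2 - power) / (2 - power)
    return 2 * (a - b + c)
```

The deviance is zero when y equals mu and stays finite at y = 0, which is the property that makes this range of p suitable for responses with exact zeros.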

@ouslan
Contributor

ouslan commented Dec 9, 2024

Could you cite the literature you used to implement the Tweedie distribution? Seems very interesting.

@nickcorona
Author

Could you cite the literature you used to implement the Tweedie distribution? Seems very interesting.

  1. Jørgensen, B. (1997). The Theory of Dispersion Models. Chapman & Hall.

    • This book provides an in-depth exploration of dispersion models, including the Tweedie family, discussing their theoretical foundations and practical applications.
  2. Gilchrist, R., & Drinkwater, D. (2000). The use of the Tweedie distribution in statistical modelling. In COMPSTAT (pp. 313–318). Physica, Heidelberg.

    • This paper focuses on parameter estimation for Tweedie distributions, particularly the compound Poisson (1 < p < 2) and stable form (p > 2) cases, and demonstrates their application in modeling data with zero observations and large dispersion.
  3. Dunn, P. K., & Smyth, G. K. (2005). Series evaluation of Tweedie exponential dispersion model densities. Statistics and Computing, 15(4), 267–280.

    • This article presents methods for evaluating the densities of Tweedie exponential dispersion models, which is crucial for implementing these distributions in statistical software.
  4. Smyth, G. K., & Jørgensen, B. (2002). Fitting Tweedie’s compound Poisson model to insurance claims data: dispersion modelling. ASTIN Bulletin: The Journal of the IAA, 32(1), 143–157.

    • This paper discusses fitting Tweedie models to insurance claims data, highlighting the practical implementation of these models in actuarial science.
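To make the compound Poisson case (1 < p < 2) from these references concrete, here is a rough sketch that simulates Tweedie draws as a Poisson number of gamma-distributed amounts. The mapping from (mu, phi, p) to the Poisson rate and gamma shape/scale is the standard one; the function names are made up for this illustration and do not come from the PR.

```python
import math
import random


def compound_poisson_params(mu, phi, power):
    """Map Tweedie (mu, phi, power), 1 < power < 2, to Poisson-gamma parameters."""
    lam = mu ** (2 - power) / (phi * (2 - power))  # Poisson rate
    shape = (2 - power) / (power - 1)              # gamma shape per event
    scale = phi * (power - 1) * mu ** (power - 1)  # gamma scale per event
    return lam, shape, scale


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates only."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_tweedie(mu, phi, power, size, seed=0):
    """Draw `size` values; a draw is exactly 0.0 when no events occur."""
    rng = random.Random(seed)
    lam, shape, scale = compound_poisson_params(mu, phi, power)
    draws = []
    for _ in range(size):
        n = sample_poisson(lam, rng)
        # A sum of n gammas with a common scale is gamma with shape n * shape.
        draws.append(rng.gammavariate(n * shape, scale) if n > 0 else 0.0)
    return draws
```

Under this mapping the mean is mu, the variance is phi * mu^p, and the probability of an exact zero is exp(-lam).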

@nickcorona
Author

can I get a review for this PR?


@fkiraly fkiraly left a comment

Looks great!

On a semi-superficial glance, the math looks correct.

Some API considerations:

  • in GAM, we are adding an undocumented parameter power. I think this is not a good idea, since the user will not be able to guess whether and how to pass this.
  • instead, I would suggest to allow for distribution objects to be passed in addition to strings, and "tweedie" to translate to TweedieDist with some sensible default for power.
  • this needs to be clearly documented for the distribution parameter, i.e., which values are allowed - which strings, and that distribution objects are also allowed (e.g., Tweedie)
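A rough sketch of what the suggestion above could look like: strings resolve to a default-constructed distribution object, while an already instantiated object is passed through unchanged. The classes below are stand-ins written for this example, not pyGAM's actual classes, and both the default power of 1.5 and the restriction to 1 < power < 2 are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalDist:
    """Stand-in for an existing distribution class (illustrative only)."""


@dataclass(frozen=True)
class TweedieDist:
    """Stand-in Tweedie distribution; default power is an assumed choice."""
    power: float = 1.5

    def __post_init__(self):
        if not 1 < self.power < 2:
            raise ValueError("this sketch only allows 1 < power < 2")


# Hypothetical registry from string names to distribution classes.
DISTRIBUTIONS = {"normal": NormalDist, "tweedie": TweedieDist}


def resolve_distribution(distribution):
    """Accept either a registered string or an instantiated distribution."""
    if isinstance(distribution, str):
        try:
            return DISTRIBUTIONS[distribution]()
        except KeyError:
            allowed = ", ".join(sorted(DISTRIBUTIONS))
            raise ValueError(
                f"unknown distribution {distribution!r}; allowed: {allowed}"
            ) from None
    if isinstance(distribution, tuple(DISTRIBUTIONS.values())):
        return distribution
    raise TypeError("distribution must be a string or a distribution object")
```

With this shape, `resolve_distribution("tweedie")` gives the default power, and a user who needs a different value passes `TweedieDist(power=1.2)` directly, so no separate `power` argument is needed on the model.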

minor things:

  • TweedieDist docstring should be completed.
  • Could you also kindly ensure that the linting passes, i.e., the code-quality job? For this, you ought to run pre-commit.

Related FYI but not necessary for merging this, in skpro we are currently looking to:

  • interface pygam as an estimator
  • add the Tweedie distribution - this PR is stuck and needs help! sktime/skpro#428

@fkiraly fkiraly changed the title from "Add Tweedie distribution implementation and tests" to "[ENH] Tweedie distribution support in GAM" on Nov 18, 2025
@fkiraly
Collaborator

fkiraly commented Nov 18, 2025

FYI @nickcorona, sorry for the long delay (handover/maintenance period which is now over)

@dswah
Owner

dswah commented Nov 20, 2025

@fkiraly

instead, I would suggest to allow for distribution objects to be passed in addition to strings, and "tweedie" to translate to TweedieDist

This should be already fine, since the GAM class accepts both distribution strings or instantiated distribution objects
https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108

with some sensible default for power.

This is the critical aspect, since we don't have a method for estimating the power parameter
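For context, one common way to choose the power is a profile likelihood over a grid of candidate values. Below is a simplified, intercept-only sketch for 1 < p < 2. It evaluates the density by summing directly over the Poisson count in the compound Poisson-gamma representation, truncated at a fixed number of terms, and it plugs in a Pearson-style dispersion estimate instead of maximizing over phi. Both are simplifying assumptions, and none of the names below exist in pyGAM.

```python
import math


def tweedie_logpdf(y, mu, phi, power, max_terms=200):
    """Log density for 1 < power < 2 via a truncated Poisson-gamma mixture."""
    lam = mu ** (2 - power) / (phi * (2 - power))
    shape = (2 - power) / (power - 1)
    scale = phi * (power - 1) * mu ** (power - 1)
    if y == 0:
        return -lam  # log P(Y = 0) = log P(no events)
    terms = []
    for n in range(1, max_terms + 1):
        k = n * shape  # gamma shape of the sum of n amounts
        log_pois = -lam + n * math.log(lam) - math.lgamma(n + 1)
        log_gamma = ((k - 1) * math.log(y) - y / scale
                     - k * math.log(scale) - math.lgamma(k))
        terms.append(log_pois + log_gamma)
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def profile_power(ys, grid):
    """Return (best_power, best_loglik) for an intercept-only model."""
    mu = sum(ys) / len(ys)
    best = None
    for power in grid:
        # Pearson-style dispersion estimate for this candidate power.
        phi = sum((y - mu) ** 2 / mu ** power for y in ys) / (len(ys) - 1)
        loglik = sum(tweedie_logpdf(y, mu, phi, power) for y in ys)
        if best is None or loglik > best[1]:
            best = (power, loglik)
    return best


# Toy data with exact zeros, made up for the example.
toy = [0.0, 0.0, 1.2, 0.0, 3.4, 0.7, 0.0, 2.1, 5.0, 0.3]
grid = [1 + k / 10 for k in range(1, 10)]
best_power, best_loglik = profile_power(toy, grid)
```

In a real GAM the mean would come from the fitted model for each candidate power rather than from the sample mean, so the fit would have to be repeated per grid point.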

that distribution objects are also allowed (e.g., Tweedie)

Our docstrings already document that: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108.
Is that sufficient?

@fkiraly
Collaborator

fkiraly commented Nov 21, 2025

This should be already fine, since the GAM class accepts both distribution strings or instantiated distribution objects
https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108

Thanks for the pointer!

Our docstrings already document that: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108.
Is that sufficient?

I would say: no. The docstring should either list the possible distribution strings that can be passed, and the classes that can be passed, or link to a list thereof. Otherwise, the user has to start searching if they want to understand how they can use the distribution parameter, if they start at the docstring.

I would say, string options should be listed, and a link to a page with the distributions should also be provided.
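As a sketch of one way to keep such a listing from drifting, the allowed strings could be injected into the docstring from a single tuple. The class below is a stand-in, not pyGAM's `GAM`. The first five names are what I recall pyGAM accepting and should be checked against the actual registry, "tweedie" is the addition from this PR, and the decorator is a hypothetical helper.

```python
# Assumed list of accepted names; verify against the real registry.
DISTRIBUTION_NAMES = (
    "normal", "binomial", "poisson", "gamma", "inv_gauss", "tweedie",
)


def document_distributions(names):
    """Fill the {distribution_names} placeholder in a class docstring."""
    def decorator(cls):
        listing = ", ".join(f"'{name}'" for name in names)
        cls.__doc__ = cls.__doc__.format(distribution_names=listing)
        return cls
    return decorator


@document_distributions(DISTRIBUTION_NAMES)
class GAMSketch:
    """Stand-in model class used only to illustrate the docstring.

    Parameters
    ----------
    distribution : str or distribution object, default='normal'
        Either one of the strings {distribution_names}, or an
        instantiated distribution object. Pass an object when the
        distribution has parameters of its own, such as the Tweedie
        power.
    """

    def __init__(self, distribution="normal"):
        self.distribution = distribution
```

A link to the distributions page of the documentation could sit in the same parameter description, which would cover both parts of the request above.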
