-
Notifications
You must be signed in to change notification settings - Fork 187
[ENH] Tweedie distribution support in GAM #365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ease sample size in test_sample
|
Could you site the literature you used to implement the Tweedie distribution? Seems very interesting. |
|
|
can I get a review for this PR? |
fkiraly
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
On semi-superficial glance, the math looks correct.
Some API considerations:
- in
GAM, we are adding an undocumented parameterpower. I think this is not a good idea, since the user will not be able to guess whether and how to pass this. - instead, I would suggest to allow for distribution objects to be passed in addition to strings, and
"tweedie"to translate toTweedieDistwith some sensible default forpower. - this needs to be clearly documented for the
distributionparameter, i.e., which values are allowed - which strings, and that distribution objects are also allowed (e.g.,Tweedie)
minor things:
TweedieDistdocstring should be completed.- Could you also kindly ensure that the linting passes, i.e., the
code-qualityjob? For this, you ought to runpre-commit.
Related FYI but not necessary for merging this, in skpro we are currently looking to:
- interface
pygamas an estimator - add the Tweedie distribution - this PR is stuck and needs help! sktime/skpro#428
|
FYI @nickcorona, sorry for the long delay (handover/maintenance period which is now over) |
This should be already fine, since the GAM class accepts both distribution strings or instantiaited distribution objects
This is the critical aspect, since we dont have a method for estimating the power parameter
Our docstrings already document that: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108. |
Thanks for the pointer!
I would say: no. The docstring should either list the possible distribution strings that can be passed, and the classes that can be passed, or link to a list thereof. Otherwise, the user has to start searching if they want to understand how they can use the I would say, string options should be listed, and a link to a page with the distributions should also be provided. |
Description
This pull request implements the Tweedie distribution in the GAM package. The Tweedie distribution is critical for modeling data that exhibit characteristics of both continuous and discrete components, such as insurance claims or zero-inflated data. This addition enhances the package's flexibility in handling real-world datasets with mixed-type responses.
Key Changes