feat(flagd): introduce fatalStatusCodes option #1624

leakonvalinka · 2025-11-10T14:11:06Z

This PR

adds the fatalStatusCodes option and the corresponding environment variable

Related Issues

Notes

I'm not too happy with how the fatal error is communicated through the different components (received at SyncStreamQueueSource -> FlagStore -> InProcessResolver -> FlagdProvider, or, respectively, RpcResolver -> FlagdProvider). It "misuses" the STALE state to differentiate between normal errors and fatal errors. Unfortunately, I couldn't find a cleaner solution for this, so feedback would be highly appreciated!

Will work on the remaining failing tests once we agree on how to proceed!

Follow-up Tasks

How to test


chrfwow

Since we do not want to introduce breaking changes into the API by adding a PROVIDER_FATAL type to ProviderEvent, I have two suggestions for how we might work around the "misuse" of the stale event:
We could add an isFatal flag to FlagdProviderEvent to track the type of error. I don't really like this, because the flag could also be set when the event is not an error event, and it splits information that should be stored in one place across two places.
Or we create an enum class ExtendedProviderEvent, which is a copy of ProviderEvent (enums cannot be extended in Java) plus the additional PROVIDER_FATAL constant. We would then have to map between the two types where needed (not 100% sure this will work). I don't like this either, because we would duplicate the ProviderEvent enum.
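For illustration, the second suggestion could look roughly like the sketch below. It is not taken from the PR: the ProviderEvent enum here is a local stand-in with assumed constants so that the example compiles on its own, and fromPublic/toPublic are hypothetical helper names. Reporting PROVIDER_FATAL as PROVIDER_ERROR on the way out is also just one possible choice.

```java
/** Local stand-in for the SDK's ProviderEvent enum (assumed constants, for illustration only). */
enum ProviderEvent {
    PROVIDER_READY,
    PROVIDER_CONFIGURATION_CHANGED,
    PROVIDER_ERROR,
    PROVIDER_STALE
}

/** Hypothetical internal copy of ProviderEvent with one extra constant. */
enum ExtendedProviderEvent {
    PROVIDER_READY,
    PROVIDER_CONFIGURATION_CHANGED,
    PROVIDER_ERROR,
    PROVIDER_STALE,
    PROVIDER_FATAL;

    /** Lossless direction: every public event has a same-named internal counterpart. */
    static ExtendedProviderEvent fromPublic(ProviderEvent event) {
        return valueOf(event.name());
    }

    /** Lossy direction: PROVIDER_FATAL has no public counterpart, so it is reported as PROVIDER_ERROR. */
    ProviderEvent toPublic() {
        return this == PROVIDER_FATAL ? ProviderEvent.PROVIDER_ERROR : ProviderEvent.valueOf(name());
    }
}
```

The name-based mapping keeps the two enums in sync only as long as the constants match; a constant added to the public enum without updating the copy would make fromPublic throw an IllegalArgumentException, which is the duplication cost mentioned above.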

chrfwow · 2025-12-17T11:39:09Z

...e/contrib/providers/flagd/resolver/process/storage/connector/sync/SyncStreamQueueSource.java

    private final BlockingQueue<QueuePayload> outgoingQueue = new LinkedBlockingQueue<>(QUEUE_SIZE);
    private final FlagSyncServiceStub flagSyncStub;
    private final FlagSyncServiceBlockingStub metadataStub;
+    private final List<String> fatalStatusCodes;


Since we do lots of .contains operations on this data structure, a HashSet might be more performant. How many entries do we expect in this list?

That's hard for me to estimate. What do the others think? The currently defined default is an empty list.
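A minimal sketch of the HashSet idea, assuming the configured codes arrive as a List<String> and are copied once at construction time. FatalStatusCodeLookup and isFatal are hypothetical names, not part of the PR.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: copies the configured list into a set once, then answers lookups. */
final class FatalStatusCodeLookup {
    private final Set<String> fatalStatusCodes;

    FatalStatusCodeLookup(List<String> configuredCodes) {
        // HashSet gives expected O(1) contains(); duplicates in the input collapse silently.
        this.fatalStatusCodes = Collections.unmodifiableSet(new HashSet<>(configuredCodes));
    }

    boolean isFatal(String statusCodeName) {
        return fatalStatusCodes.contains(statusCodeName);
    }
}
```

With only a handful of entries the difference to List.contains is likely negligible, so this is mostly a question of intent: a set makes it explicit that order and duplicates do not matter.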

...rs/flagd/src/main/java/dev/openfeature/contrib/providers/flagd/resolver/rpc/RpcResolver.java

chrfwow · 2025-12-17T11:45:30Z

providers/flagd/src/main/java/dev/openfeature/contrib/providers/flagd/Config.java

+                    .map(String::trim)
+                    .collect(Collectors.toList()) : defaultValue;
+        } catch (Exception e) {
+            return defaultValue;


We should log an info/warn message when the env var value is invalid

Just for this method? Or the other ones too? I'd either leave it or add it in all cases to be consistent

Then we should add it everywhere, but in a different PR

Alright, sounds good. Should we create a new issue for this or is that overkill?
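As a rough sketch of what such a warning could look like for the list-valued variable: the example below assumes a comma-separated value and uses java.util.logging so that it stays self-contained (the provider itself may use a different logging facade). EnvListParser, parseList and fromEnv are hypothetical names and do not mirror the actual Config helpers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;
import java.util.stream.Collectors;

/** Hypothetical helper: parses a comma-separated env var value and warns when it is unusable. */
final class EnvListParser {
    private static final Logger LOG = Logger.getLogger(EnvListParser.class.getName());

    /** Parses rawValue; returns defaultValue when the variable is unset or yields no entries. */
    static List<String> parseList(String varName, String rawValue, List<String> defaultValue) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            return defaultValue; // unset or blank: silently use the default
        }
        List<String> parsed = Arrays.stream(rawValue.split(","))
                .map(String::trim)
                .filter(entry -> !entry.isEmpty())
                .collect(Collectors.toList());
        if (parsed.isEmpty()) {
            // The variable was set, but nothing usable came out of it: tell the user.
            LOG.warning("Ignoring invalid value for " + varName + ": '" + rawValue + "', using default");
            return defaultValue;
        }
        return parsed;
    }

    /** Convenience wrapper that reads the value from the process environment. */
    static List<String> fromEnv(String varName, List<String> defaultValue) {
        return parseList(varName, System.getenv(varName), defaultValue);
    }
}
```

Separating parseList from fromEnv keeps the parsing testable without touching the real environment.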

chrfwow · 2025-12-17T11:46:38Z

providers/flagd/src/main/java/dev/openfeature/contrib/providers/flagd/FlagdOptions.java

+     * Defaults to empty list
+     */
+    @Builder.Default
+    private List<String> fatalStatusCodes = fallBackToEnvOrDefaultList(Config.FATAL_STATUS_CODES_ENV_VAR_NAME, List.of());


Do we really want to retry on every error code by default? How is this handled in our other SDKs?

No, you're right, I'll rephrase it to "for which the provider transitions into fatal mode upon first connection". The general retry policy is defined here and is the same for all SDKs afaik
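The "fatal only upon first connection" rule described here can be sketched as a small decision helper. This is an illustration under assumptions, not the PR's implementation: FirstConnectionFatalPolicy, Outcome, onConnected and onStreamError are hypothetical names, and the status codes are assumed to be compared by name as plain strings.

```java
import java.util.List;

/** Hypothetical policy: a configured code is fatal only before the first successful connection. */
final class FirstConnectionFatalPolicy {
    enum Outcome { RETRY, FATAL }

    private final List<String> fatalStatusCodes;
    private boolean connectedOnce = false;

    FirstConnectionFatalPolicy(List<String> fatalStatusCodes) {
        this.fatalStatusCodes = fatalStatusCodes;
    }

    /** Call once a connection has been established successfully. */
    void onConnected() {
        connectedOnce = true;
    }

    /** Decides how to react to a stream error with the given status code name. */
    Outcome onStreamError(String statusCodeName) {
        if (!connectedOnce && fatalStatusCodes.contains(statusCodeName)) {
            return Outcome.FATAL; // give up instead of retrying forever
        }
        return Outcome.RETRY; // the normal retry policy applies
    }
}
```

With the default empty list, onStreamError never returns FATAL, so the existing retry behaviour is unchanged unless the option is set.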

chrfwow · 2025-12-17T11:48:25Z

providers/flagd/src/main/java/dev/openfeature/contrib/providers/flagd/FlagdProvider.java

-                    if (syncResources.getPreviousEvent() != ProviderEvent.PROVIDER_ERROR) {
-                        onError();
-                        syncResources.setPreviousEvent(ProviderEvent.PROVIDER_ERROR);
+                case PROVIDER_STALE:


Please update the Javadoc above the switch; we now use the STALE state.

Will do once we agree on a final implementation

guidobrei · 2025-12-17T12:53:05Z

I'm not too happy with how the fatal error is communicated through the different components (received at SyncStreamQueueSource -> FlagStore -> InProcessResolver -> FlagdProvider...)

This is an implication of our provider design, and there isn't really anything we can do about that (in this PR).

github-actions bot assigned beeme1mr, Kavindu-Dodan, thisthat and toddbaert Nov 10, 2025

github-actions bot requested review from Kavindu-Dodan, beeme1mr, thisthat and toddbaert November 10, 2025 14:11

aepfli reviewed Nov 10, 2025

...lagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/config/ConfigSteps.java (outdated, resolved)

aepfli reviewed Nov 10, 2025

...ers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/ProviderSteps.java (outdated, resolved)

aepfli reviewed Nov 10, 2025

providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/Utils.java (outdated, resolved)

aepfli reviewed Nov 10, 2025

providers/flagd/src/main/java/dev/openfeature/contrib/providers/flagd/FlagdOptions.java (outdated, resolved)

aepfli mentioned this pull request Nov 11, 2025

feat: add missing steps for config and improve wording open-feature/flagd-testbed#311

Merged

leakonvalinka force-pushed the fix/flagd-infinite-connection-retries branch from 6f89ff0 to 057751b Compare November 12, 2025 10:01

leakonvalinka added 2 commits November 20, 2025 12:48

fix(flagd): no retry for certain error codes, implement test steps

8b7f574

Signed-off-by: lea konvalinka <lea.konvalinka@dynatrace.com>

attempt to handle fatal error

f0a1db2

Signed-off-by: lea konvalinka <lea.konvalinka@dynatrace.com>

leakonvalinka force-pushed the fix/flagd-infinite-connection-retries branch from f7f1d97 to f0a1db2 Compare November 20, 2025 12:22

leakonvalinka added 5 commits November 24, 2025 10:49

fix(flagd): update testbed + step, fix event

654c8da

Signed-off-by: lea konvalinka <lea.konvalinka@dynatrace.com>

adjust rpc resolver

07195a7

Signed-off-by: lea konvalinka <lea.konvalinka@dynatrace.com>

Merge branch 'main' into fix/flagd-infinite-connection-retries

ccf5120

fix e2e tests

e6d4057

Signed-off-by: Konvalinka <lea.konvalinka@dynatrace.com>

Merge branch 'main' into fix/flagd-infinite-connection-retries

75392e6

Signed-off-by: lea konvalinka <lea.konvalinka@dynatrace.com>

leakonvalinka changed the title ~~fix(flagd): no retry for certain error codes, implement test steps~~ feat(flagd): introduce fatalStatusCodes option Dec 17, 2025

leakonvalinka added 2 commits December 17, 2025 10:50

clean up

95a880c

Signed-off-by: Konvalinka <lea.konvalinka@dynatrace.com>

fatal only on first connection

45a9822

Signed-off-by: Konvalinka <lea.konvalinka@dynatrace.com>

leakonvalinka marked this pull request as ready for review December 17, 2025 10:34

leakonvalinka requested a review from a team as a code owner December 17, 2025 10:34

remove exclusion of sync e2e test tag

e50aa7f

Signed-off-by: Konvalinka <lea.konvalinka@dynatrace.com>

chrfwow reviewed Dec 17, 2025


8 participants
