@poikilotherm
Contributor

@poikilotherm poikilotherm commented Jul 18, 2025

What this PR does / why we need it:

Enable setting all the database settings in one go, using an atomic database operation.

Which issue(s) this PR closes:

Special notes for your reviewer:
Beyond the bulk operation itself, implementing this properly required a lot more work. Unfortunately, that makes this PR much larger than anticipated, but hopefully it's worth it. (I suppose DB opts are not going away anytime soon...)

  1. I had to migrate all non-aligned database setting names, reshaping them into a form that makes input validation in the code easier. It also makes them easier for sysadmins to use, as there are no longer three different naming schemes. (This affects the infamous BuiltinUsers.KEY and the limits for tabular file ingests.)
  2. None of the Admin API endpoints for settings did any input validation. You could basically do whatever you wanted, even accidentally. This has been mitigated.
  3. The way the Setting class was set up impeded performance and did not enforce any constraints to prevent duplicate settings. (A Flyway script took care of that, but the constraint was not reflected in the code at all, not even in a comment.)
  4. The endpoint to retrieve all settings as JSON was faulty: it did not take localization into account.
  5. Most of the code around DB settings was untested. For the parts I added, I wrote tests to make sure things work as intended.
  6. Archive settings had to be moved to the enum. See 83589ba and Iqss/7140 google cloud archiver #7292 (comment).
  • Decide whether or not to document in the Native API docs which (two) settings are supporting localization (within this PR).
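
To make the name validation from item 2 more concrete, here is a minimal sketch of a case-sensitive lookup against an enum. It is not the code from this PR: `DemoKey` is a small stand-in for `SettingsServiceBean.Key`, `parse` and `SettingNameCheck` are hypothetical names, and the leading-colon convention is assumed from the setting names quoted in this thread.

```java
import java.util.Arrays;
import java.util.Optional;

// Stand-in for SettingsServiceBean.Key; the real enum has many more entries.
enum DemoKey {
    PublicInstall,
    TabularIngestSizeLimit,
    GoogleCloudBucket,
    GoogleCloudProject;

    // Assumption: the database name is the constant name with a leading colon.
    String dbName() {
        return ":" + name();
    }

    // Case-sensitive lookup; empty for any name that is not declared above.
    static Optional<DemoKey> parse(String name) {
        return Arrays.stream(values())
                .filter(k -> k.dbName().equals(name))
                .findFirst();
    }
}

final class SettingNameCheck {
    public static void main(String[] args) {
        String[] candidates = {":PublicInstall", ":publicInstall", "BuiltinUsers.KEY"};
        for (String candidate : candidates) {
            // An API endpoint could answer with HTTP 400 for rejected names.
            String verdict = DemoKey.parse(candidate).isPresent() ? "accepted" : "rejected";
            System.out.println(candidate + " -> " + verdict);
        }
    }
}
```

With a check like this in front of every write, a misspelled or legacy name is refused at the API boundary instead of ending up as a stray row in the settings table.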

Suggestions on how to test this:

  • Run the included tests
  • Roundtrip! Get all settings as JSON and put them back in.
  • Apply the DB migrations to a DB dump from a real instance using the Flyway Maven Plugin, to verify them before deploying
  • Use the script in configbaker to try a real use case scenario
  • Test the archive settings mentioned in the release note (some will be hard to test, unfortunately, because we may not have access to systems like DuraCloud, etc.) See 83589ba and Iqss/7140 google cloud archiver #7292 (comment) .

Please note: the Jenkins build / test run will always fail until gdcc/dataverse-ansible#399 is merged! You'll need to execute the tests locally, e.g. using a containerized approach, which also helps with trying the apply-db-settings.sh script. @donsizemore hooked us up with a dedicated Jenkins job for this PR. Check the status of this job rather than the normal one: https://jenkins.dataverse.org/job/IQSS-Dataverse-11639/

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Nope

Is there a release notes update needed for this change?:
Included

Additional documentation:

Preview docs here:

@poikilotherm poikilotherm added this to the 6.8 milestone Jul 18, 2025
@poikilotherm poikilotherm self-assigned this Jul 18, 2025
@github-actions github-actions bot added Component: Code Infrastructure formerly "Feature: Code Infrastructure" Component: Containers Anything related to cloudy Dataverse, shipped in containers. Component: JSF Involves modifying JSF (Jakarta Server Faces) code, which is being replaced with React. Feature: Installation Guide Type: Feature a feature request User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh labels Jul 18, 2025

@github-actions github-actions bot left a comment

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

shellcheck

📝 [shellcheck] reported by reviewdog 🐶
Double quote to prevent globbing and word splitting. SC2086

curl -s -X POST -d'[":guest","@anAuthUser"]' -H "Content-type:application/json" $ENDPOINT/dataverses/permissionsTestDv/groups/PTG/roleAssignees?key=$ROOT_KEY


📝 [shellcheck] reported by reviewdog 🐶
Double quote to prevent globbing and word splitting. SC2086

curl -s -X POST -d"$ASSIGNMENT" -H "Content-type:application/json" $ENDPOINT/dataverses/permissionsTestDv/assignments/?key=$ROOT_KEY


📝 [shellcheck] reported by reviewdog 🐶
Double quote to prevent globbing and word splitting. SC2086

echo @anAuthUser $AN_AUTH_USER_KEY


📝 [shellcheck] reported by reviewdog 🐶
Double quote to prevent globbing and word splitting. SC2086

echo @anotherAuthUser $ANOTHER_AUTH_USER_KEY


📝 [shellcheck] reported by reviewdog 🐶
Double quote to prevent globbing and word splitting. SC2086

curl -si $ENDPOINT/dataverses/permissionsTestDv?key=$AN_AUTH_USER_KEY | head -n 1


📝 [shellcheck] reported by reviewdog 🐶
Double quote to prevent globbing and word splitting. SC2086

curl -si $ENDPOINT/dataverses/permissionsTestDv?key=$ANOTHER_AUTH_USER_KEY | head -n 1


📝 [shellcheck] reported by reviewdog 🐶
Double quote to prevent globbing and word splitting. SC2086

curl -si -X POST -d@assignment.json -H "Content-type:application/json" $ENDPOINT/dataverses/permissionsTestDv/assignments/?key=$ANOTHER_AUTH_USER_KEY | head -n 1


📝 [shellcheck] reported by reviewdog 🐶
Double quote to prevent globbing and word splitting. SC2086

curl -si -X POST -d@assignment.json -H "Content-type:application/json" $ENDPOINT/dataverses/permissionsTestDv/assignments/?key=$AN_AUTH_USER_KEY | head -n 1

@poikilotherm poikilotherm marked this pull request as ready for review July 21, 2025 09:30
@poikilotherm poikilotherm requested a review from pdurbin as a code owner July 21, 2025 09:30
@poikilotherm poikilotherm marked this pull request as draft July 21, 2025 09:30
@poikilotherm poikilotherm removed the request for review from pdurbin July 21, 2025 09:31

@sekmiller
Contributor

@poikilotherm the AdminIT$SettingsAPI#testSettingsRoundTrip fails if you have a bad setting name in your table. In my case I had put in a manual update for ":publicInstall" instead of ":PublicInstall". When I deleted the offending setting name from my local table the test passed.

@sekmiller
Contributor

@poikilotherm deployment in Jenkins is failing. Here is one of the errors I'm seeing: TASK [dataverse : set user management quesadilla] ******************************
fatal: [localhost]: FAILED! => {"changed": false, "connection": "close", "content_length": "66", "content_type": "application/json;charset=UTF-8", "elapsed": 0, "json": {"message": "The name of the setting is invalid.", "status": "ERROR"}, "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request", "redirected": false, "server": "Payara Server 6.2025.3 #badassfish", "status": 400, "url": "http://localhost:8080/api/admin/settings/BuiltinUsers.KEY", "x_frame_options": "SAMEORIGIN", "x_powered_by": "Servlet/6.0 JSP/3.1 (Payara Server 6.2025.3 #badassfish Java/Red Hat, Inc./17)"}

@poikilotherm
Contributor Author

poikilotherm commented Nov 14, 2025

@sekmiller thanks for looking into this!

In my case I had put in a manual update for ":publicInstall" instead of ":PublicInstall". When I deleted the offending setting name from my local table the test passed.

Is there anything you'd like me to do? Some migration, a startup check or similar to make things fail early?
No offense, but still: one of the points of this PR is to avoid such rogue setting names in the database going forward... 😉

deployment in Jenkins is failing.

Are we talking about this pipeline or that one? The latter is expected to fail, as an update to dataverse-ansible is necessary. Don set up a distinct job for this PR that already has the patch applied, see the first link.

@sekmiller
Contributor

@poikilotherm Yes, that's my concern: a real installation running an upgrade will fail because of a rogue setting name. Is there anything we can do about that? (Also, the failing test didn't give me anything in the response to help me fix it. I had to find the cause myself.)

@sekmiller
Contributor

It's the continuous integration PR merge job that is failing the Jenkins build: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-11654/41/display/redirect

@pdurbin
Member

pdurbin commented Nov 14, 2025

@sekmiller this is the one to look at for this PR: https://jenkins.dataverse.org/job/IQSS-Dataverse-11639/

@donsizemore set it up special for @poikilotherm

All tests are passing: https://jenkins.dataverse.org/job/IQSS-Dataverse-11639/30/testReport/

You can safely ignore https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-11654/41/display/redirect and https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-11654/ more generally, which will always fail.

We should definitely keep an eye on https://jenkins.dataverse.org/job/IQSS-dataverse-develop/ after merging but it should "just work" (famous last words).

…ter migrations #11654

Introduced a Flyway callback to clean up entries in the `setting` table with unknown keys post-migration. Updated `StartupFlywayMigrator` to register this callback.
@poikilotherm
Contributor Author

poikilotherm commented Nov 14, 2025

@sekmiller I added a Java-based Flyway callback in 886a9cf.

This will take all the settings defined in the SettingsServiceBean.Key enum and check the settings in the database against it. Any invalid settings will be automatically purged from the database on every deployment.

Going forward, this means we can drop settings from the enum and they will be cleaned up automatically. I went this way because you can't delete an invalid setting using the API...

Let me know if you think this'll work. Good catch finding this edge case - didn't think about it upfront!
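
The core of that cleanup boils down to a set difference between the names in the database and the names the enum knows. The following is only a sketch of that step under stated assumptions: `UnknownSettingFinder` and `findUnknown` are hypothetical names, the known names are passed in as plain strings instead of being read from `SettingsServiceBean.Key`, and no Flyway or JDBC code is involved.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class UnknownSettingFinder {

    // Returns every name found in the database that is not a known setting name.
    // A TreeSet keeps the result sorted, so log output stays stable between runs.
    static Set<String> findUnknown(Set<String> knownNames, List<String> namesInDatabase) {
        Set<String> unknown = new TreeSet<>(namesInDatabase);
        unknown.removeAll(knownNames);
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> known = Set.of(":PublicInstall", ":GoogleCloudBucket", ":GoogleCloudProject");
        List<String> inDatabase = List.of(":PublicInstall", ":publicInstall", "BuiltinUsers.KEY");

        // A real callback would delete these rows after migration; here they are only listed.
        System.out.println("Would purge: " + findUnknown(known, inDatabase));
    }
}
```

Because the comparison is exact, a name that differs only in case (such as ":publicInstall") counts as unknown, which matches the behavior reported earlier in this thread.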

@qqmyers
Member

qqmyers commented Nov 14, 2025

FWIW: There are settings like

private static final String GOOGLECLOUD_BUCKET = ":GoogleCloudBucket";
private static final String GOOGLECLOUD_PROJECT = ":GoogleCloudProject";
not in the SettingsServiceBean - the original idea being that things like archivers would eventually be packaged separately and manage their own settings.

@poikilotherm
Contributor Author

FWIW: There are settings like

private static final String GOOGLECLOUD_BUCKET = ":GoogleCloudBucket";
private static final String GOOGLECLOUD_PROJECT = ":GoogleCloudProject";
not in the SettingsServiceBean - the original idea being that things like archivers would eventually be packaged separately and manage their own settings.

For now, we added these settings to the enum. People should be safe.

If we extend the mechanism to discoverable settings, we can extend the Java-based Flyway callback, too (for example, by making it more selective in what it cleans up, or by making all discoverable settings known to it during cleanup).

@github-actions

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11639-db-opts-idempotency
ghcr.io/gdcc/configbaker:11639-db-opts-idempotency

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@sekmiller sekmiller merged commit 0b69f23 into develop Nov 20, 2025
24 of 26 checks passed
@github-project-automation github-project-automation bot moved this from QA ✅ to Merged 🚀 in IQSS Dataverse Project Nov 20, 2025
@scolapasta scolapasta moved this from Merged 🚀 to Done 🧹 in IQSS Dataverse Project Nov 20, 2025
@poikilotherm
Contributor Author

poikilotherm commented Nov 20, 2025

Thanks y'all @sekmiller @donsizemore @pdurbin @ofahimIQSS for bearing with me and your thorough work on this one! Glad it got merged! 😎🤗🤓🍻

@landreev
Contributor

Unfortunately, the flyway script V6.8.0.1.sql from this PR failed on my personal instance.
Specifically:

at org.flywaydb.core.internal.command.DbMigrate.doMigrateGroup(DbMigrate.java:391)
        ... 112 more
Caused by: org.postgresql.util.PSQLException: ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
  Where: SQL statement "INSERT INTO setting (name, content, lang)
        VALUES (':TabularIngestSizeLimit', json_object::TEXT, NULL)
        ON CONFLICT (name) WHERE lang IS NULL
            DO UPDATE SET content = EXCLUDED.content"
PL/pgSQL function inline_code_block line 65 at SQL statement

I had both :TabularIngestSizeLimit and :TabularIngestSizeLimit:xlsx in my settings. It worked once I deleted the latter.
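
The failing statement writes a single JSON value into `:TabularIngestSizeLimit`, which suggests the migration folds the format-specific limits into one setting. As an illustration only, here is a sketch of such a folding step. It is not the PL/pgSQL from V6.8.0.1.sql and does not reproduce the error above; `TabularLimitFolding` and `fold` are hypothetical names, and the `default` key for the unsuffixed setting is an assumption, since the thread does not show the resulting JSON layout.

```java
import java.util.Map;
import java.util.TreeMap;

final class TabularLimitFolding {
    static final String BASE = ":TabularIngestSizeLimit";

    // Collects the base setting and its ":<format>" variants into one map keyed by format.
    // Assumption: the unsuffixed setting is stored under the key "default".
    static Map<String, Long> fold(Map<String, String> rows) {
        Map<String, Long> limits = new TreeMap<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            String name = row.getKey();
            if (name.equals(BASE)) {
                limits.put("default", Long.parseLong(row.getValue().trim()));
            } else if (name.startsWith(BASE + ":")) {
                // Everything after the second colon is treated as the file format.
                String format = name.substring(BASE.length() + 1);
                limits.put(format, Long.parseLong(row.getValue().trim()));
            }
            // Rows for unrelated settings are ignored.
        }
        return limits;
    }

    public static void main(String[] args) {
        Map<String, String> rows = Map.of(
                ":TabularIngestSizeLimit", "2000000",
                ":TabularIngestSizeLimit:xlsx", "1000000",
                ":PublicInstall", "true");
        System.out.println(fold(rows));
    }
}
```

An installation with both the base and a format-specific limit, as described above, is exactly the input a migration test should cover.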

@poikilotherm poikilotherm deleted the 11639-db-opts-idempotency branch November 23, 2025 08:48
@landreev
Contributor

@poikilotherm @sekmiller
This really needs to be fixed. I'm sorry to be pushy, but we are very close to the 6.9 code freeze at this point.

poikilotherm added a commit that referenced this pull request Nov 24, 2025
…ration`

Introduced comprehensive tests for `V6_8_0_1__SettingsDataMigration` using Testcontainers and DBUnit. Includes scenarios for migrating settings to new formats, handling null and invalid values, and verifying JSON transformations.

This is a reproducer for an issue discovered after merging PR #11654 by @landreev. See also #11654 (comment)
@poikilotherm
Contributor Author

poikilotherm commented Nov 25, 2025

@landreev I have a reproducer going over here: #12002
I will keep working on a fix. Sorry we didn't catch this earlier. This is absolutely one of the reasons why I want to introduce migration testing with JUnit, DBUnit and Testcontainers (see PR). Happy to talk about this during tech hours, too.

@landreev
Contributor

@poikilotherm
Great, glad to hear this is on your radar.
The script is definitely failing on the copy of our prod. db as well. Our own prod. instance is not my concern, since I can simply convert these settings by hand if I need to. But it would be a problem for every other installation that has format-specific ingest limits.

poikilotherm added a commit that referenced this pull request Nov 27, 2025
… simplify access methods #11996

Replaced hard-coded workflow keys with structured enum-based keys in `TriggerType`. Updated `WorkflowServiceBean` and `SettingsServiceBean` to use consistent key resolution methods, improving readability and maintainability. Updated related database migration script to align with new key naming schema.

This adds the missing keys after introduction of naming restrictions in #11654. The config keys for the default workflows will no longer be cleansed from the database during deployment.
@pdurbin pdurbin mentioned this pull request Dec 4, 2025
Labels

Component: Code Infrastructure formerly "Feature: Code Infrastructure" Component: Containers Anything related to cloudy Dataverse, shipped in containers. Component: JSF Involves modifying JSF (Jakarta Server Faces) code, which is being replaced with React. Feature: Installation Guide FY26 Sprint 3 (2025-07-30 - 2025-08-13) FY26 Sprint 4 FY26 Sprint 4 (2025-08-13 - 2025-08-27) FY26 Sprint 5 FY26 Sprint 5 (2025-08-27 - 2025-09-10) FY26 Sprint 6 FY26 Sprint 6 (2025-09-10 - 2025-09-24) FY26 Sprint 7 FY26 Sprint 7 (2025-09-24 - 2025-10-08) FY26 Sprint 8 FY26 Sprint 8 (2025-10-08 - 2025-10-22) FY26 Sprint 9 FY26 Sprint 9 (2025-10-22 - 2025-11-05) FY26 Sprint 10 FY26 Sprint 10 (2025-11-05 - 2025-11-19) Size: 3 A percentage of a sprint. 2.1 hours. Type: Feature a feature request User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh

Projects

Status: Important
Status: Done 🧹

Development

Successfully merging this pull request may close these issues.

Feature Request: refactor and cleanup Settings Admin API to allow idempotent updates

10 participants