From f55353a39f1bcd0e147a81d2e7f38951e66df7de Mon Sep 17 00:00:00 2001 From: "t.jansen" Date: Tue, 16 Dec 2025 17:08:54 +0100 Subject: [PATCH 1/2] Feature Flags table split up to subsections for improved navigation on the page. #12005 --- .../source/installation/config.rst | 180 +++++++++++------- 1 file changed, 109 insertions(+), 71 deletions(-) diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst index dc2f389a7f3..bcfbb5db971 100644 --- a/doc/sphinx-guides/source/installation/config.rst +++ b/doc/sphinx-guides/source/installation/config.rst @@ -3842,83 +3842,121 @@ Certain features might be deactivated because they are experimental and/or opt-i please find all known feature flags below. Any of these flags can be activated using a boolean value (case-insensitive, one of "true", "1", "YES", "Y", "ON") for the setting. -.. list-table:: - :widths: 35 50 15 - :header-rows: 1 - :align: left - - * - Flag Name - - Description - - Default status - * - api-session-auth - - Enables API authentication via session cookie (JSESSIONID). **Caution: Enabling this feature flag exposes the installation to CSRF risks!** We expect this feature flag to be temporary (only used by frontend developers, see `#9063 `_) and for the feature to be removed in the future. - - ``Off`` - * - api-bearer-auth - - Enables API authentication via Bearer Token. - - ``Off`` - * - api-bearer-auth-provide-missing-claims - - Enables sending missing user claims in the request JSON provided during OIDC user registration, when these claims are not returned by the identity provider and are required for registration. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. 
**Caution: Enabling this feature flag exposes the installation to potential user impersonation issues.** - - ``Off`` - * - api-bearer-auth-handle-tos-acceptance-in-idp - - Specifies that Terms of Service acceptance is handled by the IdP, eliminating the need to include ToS acceptance boolean parameter (termsAccepted) in the OIDC user registration request body. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. - - ``Off`` - * - api-bearer-auth-use-builtin-user-on-id-match - - Allows the use of a built-in user account when an identity match is found during API bearer authentication. This feature enables automatic association of an incoming IdP identity with an existing built-in user account, bypassing the need for additional user registration steps. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. **Caution: Enabling this flag could result in impersonation risks if (and only if) used with a misconfigured IdP.** - - ``Off`` - * - api-bearer-auth-use-shib-user-on-id-match - - Allows the use of a Shibboleth user account when an identity match is found during API bearer authentication. This feature enables automatic association of an incoming IdP identity with an existing Shibboleth user account, bypassing the need for additional user registration steps. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. **Caution: Enabling this flag could result in impersonation risks if (and only if) used with a misconfigured IdP.** - - ``Off`` - * - api-bearer-auth-use-oauth-user-on-id-match - - Allows the use of an OAuth user account (GitHub, Google, or ORCID) when an identity match is found during API bearer authentication. This feature enables automatic association of an incoming IdP identity with an existing OAuth user account, bypassing the need for additional user registration steps. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. 
**Caution: Enabling this flag could result in impersonation risks if (and only if) used with a misconfigured IdP.** - - ``Off`` - * - avoid-expensive-solr-join - - Changes the way Solr queries are constructed for public content (published Collections, Datasets and Files). It removes a very expensive Solr join on all such documents, improving overall performance, especially for large instances under heavy load. Before this feature flag is enabled, the corresponding indexing feature (see next feature flag) must be turned on and a full reindex performed (otherwise public objects are not going to be shown in search results). See :doc:`/admin/solr-search-index`. - - ``Off`` - * - add-publicobject-solr-field - - Adds an extra boolean field `PublicObject_b:true` for public content (published Collections, Datasets and Files). Once reindexed with these fields, we can rely on it to remove a very expensive Solr join on all such documents in Solr queries, significantly improving overall performance (by enabling the feature flag above, `avoid-expensive-solr-join`). These two flags are separate so that an instance can reindex their holdings before enabling the optimization in searches, thus avoiding having their public objects temporarily disappear from search results while the reindexing is in progress. - - ``Off`` - * - reduce-solr-deletes - - Avoids deleting and recreating solr documents for dataset files when reindexing. - - ``Off`` - * - disable-return-to-author-reason - - Removes the reason field in the `Publish/Return To Author` dialog that was added as a required field in v6.2 and makes the reason an optional parameter in the :ref:`return-a-dataset` API call. - - ``Off`` - * - disable-dataset-thumbnail-autoselect - - Turns off automatic selection of a dataset thumbnail from image files in that dataset. When set to ``On``, a user can still manually pick a thumbnail image or upload a dedicated thumbnail image. 
- - ``Off`` - * - globus-use-experimental-async-framework - - Activates a new experimental implementation of Globus polling of ongoing remote data transfers that does not rely on the instance staying up continuously for the duration of the transfers and saves the state information about Globus upload requests in the database. Added in v6.4; extended in v6.6 to cover download transfers, in addition to uploads. Affects :ref:`:GlobusPollingInterval`. Note that the JVM option :ref:`dataverse.files.globus-monitoring-server` described above must also be enabled on one (and only one, in a multi-node installation) Dataverse instance. - - ``Off`` - * - index-harvested-metadata-source - - Index the nickname or the source name (See the optional ``sourceName`` field in :ref:`create-a-harvesting-client`) of the harvesting client as the "metadata source" of harvested datasets and files. If enabled, the Metadata Source facet will show separate groupings of the content harvested from different sources (by harvesting client nickname or source name) instead of the default behavior where there is one "Harvested" grouping for all harvested content. - - ``Off`` - * - enable-version-note - - Turns on the ability to add/view/edit/delete per-dataset-version notes intended to provide :ref:`provenance` information about why the dataset/version was created. - - ``Off`` - * - shibboleth-use-wayfinder - - This flag allows an instance to use Shibboleth with InCommon federation services. Our original Shibboleth implementation that relies on DiscoFeed can no longer be used since InCommon discontinued their old-style metadata feed. An alternative mechanism had to be implemented in order to use WayFinder service, their recommended replacements, instead. - - ``Off`` - * - shibboleth-use-localhost - - A Shibboleth-using Dataverse instance needs to make network calls to the locally-running ``shibd`` service. The default behavior is to use the address configured via the ``siteUrl`` setting. 
There are however situations (firewalls, etc.) where localhost would be preferable. - - ``Off`` - * - add-local-contexts-permission-check - - Adds a permission check to ensure that the user calling the /api/localcontexts/datasets/{id} API can edit the dataset with that id. This is currently the only use case - see https://github.com/gdcc/dataverse-external-vocab-support/tree/main/packages/local_contexts. The flag adds additional security to stop other uses, but would currently have to be used in conjunction with the api-session-auth feature flag (the security implications of which have not been fully investigated) to still allow adding Local Contexts metadata to a dataset. - - ``Off`` - * - enable-pid-failure-log - - Turns on creation of a monthly log file (logs/PIDFailures_.log) showing failed requests for dataset/file PIDs. Can be used directly or with scripts at https://github.com/gdcc/dataverse-recipes/python/pid_reports to alert admins. - - ``Off`` - * - role-assignment-history - - Turns on tracking/display of role assignments and revocations for collections, datasets, and files - - ``Off`` - * - only-update-datacite-when-needed - - Only contact DataCite to update a DOI after checking to see if DataCite has outdated information (for efficiency, lighter load on DataCite, especially when using file DOIs). - - ``Off`` +Unless stated otherwise, each flag defaults to ``Off``. **Note:** Feature flags can be set via any `supported MicroProfile Config API source`_, e.g. the environment variable ``DATAVERSE_FEATURE_XXX`` (e.g. ``DATAVERSE_FEATURE_API_SESSION_AUTH=1``). These environment variables can be set in your shell before starting Payara. If you are using :doc:`Docker for development `, you can set them in the `docker compose `_ file. To check the status of feature flags via API, see :ref:`list-all-feature-flags` in the API Guide.
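As an illustration of the naming convention in the note above, here is a minimal sketch of how a flag name could be turned into its environment variable and how the accepted "on" values could be recognized. The helpers ``feature_env_var`` and ``is_enabled`` are hypothetical names introduced for this example and are not part of Dataverse. The mapping assumes the usual MicroProfile Config rule for environment variables: replace every character that is not alphanumeric or ``_`` with ``_``, then upper-case.

```python
import re

# Values the text lists as activating a flag (compared case-insensitively).
TRUTHY = {"true", "1", "yes", "y", "on"}


def feature_env_var(flag: str) -> str:
    """Return the environment variable name for a feature flag.

    Hypothetical helper: builds the property name dataverse.feature.<flag>,
    replaces every character that is not alphanumeric or "_" with "_",
    then upper-cases the result.
    """
    prop = f"dataverse.feature.{flag}"
    return re.sub(r"[^A-Za-z0-9_]", "_", prop).upper()


def is_enabled(value: str) -> bool:
    """Return True if value is one of the accepted 'on' spellings."""
    return value.lower() in TRUTHY


print(feature_env_var("api-session-auth"))  # DATAVERSE_FEATURE_API_SESSION_AUTH
print(is_enabled("YES"), is_enabled("off"))  # True False
```

The resulting name can then be exported in the shell before starting Payara, as described in the note.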
+dataverse.feature.api-session-auth +++++++++++++++++++++++++++++++++++ + +Enables API authentication via session cookie (JSESSIONID). **Caution: Enabling this feature flag exposes the installation to CSRF risks!** We expect this feature flag to be temporary (only used by frontend developers, see `#9063 `_) and for the feature to be removed in the future. + +dataverse.feature.api-bearer-auth ++++++++++++++++++++++++++++++++++ + +Enables API authentication via Bearer Token. + +dataverse.feature.api-bearer-auth-provide-missing-claims +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +Enables sending missing user claims in the request JSON provided during OIDC user registration, when these claims are not returned by the identity provider and are required for registration. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. **Caution: Enabling this feature flag exposes the installation to potential user impersonation issues.** + +dataverse.feature.api-bearer-auth-handle-tos-acceptance-in-idp +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +Specifies that Terms of Service acceptance is handled by the IdP, eliminating the need to include ToS acceptance boolean parameter (termsAccepted) in the OIDC user registration request body. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. + +dataverse.feature.api-bearer-auth-use-builtin-user-on-id-match +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +Allows the use of a built-in user account when an identity match is found during API bearer authentication. This feature enables automatic association of an incoming IdP identity with an existing built-in user account, bypassing the need for additional user registration steps. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. 
**Caution: Enabling this flag could result in impersonation risks if (and only if) used with a misconfigured IdP.** + +dataverse.feature.api-bearer-auth-use-shib-user-on-id-match ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +Allows the use of a Shibboleth user account when an identity match is found during API bearer authentication. This feature enables automatic association of an incoming IdP identity with an existing Shibboleth user account, bypassing the need for additional user registration steps. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. **Caution: Enabling this flag could result in impersonation risks if (and only if) used with a misconfigured IdP.** + +dataverse.feature.api-bearer-auth-use-oauth-user-on-id-match +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +Allows the use of an OAuth user account (GitHub, Google, or ORCID) when an identity match is found during API bearer authentication. This feature enables automatic association of an incoming IdP identity with an existing OAuth user account, bypassing the need for additional user registration steps. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. **Caution: Enabling this flag could result in impersonation risks if (and only if) used with a misconfigured IdP.** + +dataverse.feature.avoid-expensive-solr-join ++++++++++++++++++++++++++++++++++++++++++++ + +Changes the way Solr queries are constructed for public content (published Collections, Datasets and Files). It removes a very expensive Solr join on all such documents, improving overall performance, especially for large instances under heavy load. Before this feature flag is enabled, the corresponding indexing feature (see next feature flag) must be turned on and a full reindex performed (otherwise public objects are not going to be shown in search results). See :doc:`/admin/solr-search-index`. 
+ +dataverse.feature.add-publicobject-solr-field ++++++++++++++++++++++++++++++++++++++++++++++ + +Adds an extra boolean field `PublicObject_b:true` for public content (published Collections, Datasets and Files). Once reindexed with these fields, we can rely on it to remove a very expensive Solr join on all such documents in Solr queries, significantly improving overall performance (by enabling the feature flag above, `avoid-expensive-solr-join`). These two flags are separate so that an instance can reindex their holdings before enabling the optimization in searches, thus avoiding having their public objects temporarily disappear from search results while the reindexing is in progress. + +dataverse.feature.reduce-solr-deletes ++++++++++++++++++++++++++++++++++++++ + +Avoids deleting and recreating solr documents for dataset files when reindexing. + +dataverse.feature.disable-return-to-author-reason ++++++++++++++++++++++++++++++++++++++++++++++++++ + +Removes the reason field in the `Publish/Return To Author` dialog that was added as a required field in v6.2 and makes the reason an optional parameter in the :ref:`return-a-dataset` API call. + +dataverse.feature.disable-dataset-thumbnail-autoselect +++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +Turns off automatic selection of a dataset thumbnail from image files in that dataset. When set to ``On``, a user can still manually pick a thumbnail image or upload a dedicated thumbnail image. + +dataverse.feature.globus-use-experimental-async-framework ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +Activates a new experimental implementation of Globus polling of ongoing remote data transfers that does not rely on the instance staying up continuously for the duration of the transfers and saves the state information about Globus upload requests in the database. Added in v6.4; extended in v6.6 to cover download transfers, in addition to uploads. Affects :ref:`:GlobusPollingInterval`. 
Note that the JVM option :ref:`dataverse.files.globus-monitoring-server` described above must also be enabled on one (and only one, in a multi-node installation) Dataverse instance. + +dataverse.feature.index-harvested-metadata-source ++++++++++++++++++++++++++++++++++++++++++++++++++ + +Index the nickname or the source name (See the optional ``sourceName`` field in :ref:`create-a-harvesting-client`) of the harvesting client as the "metadata source" of harvested datasets and files. If enabled, the Metadata Source facet will show separate groupings of the content harvested from different sources (by harvesting client nickname or source name) instead of the default behavior where there is one "Harvested" grouping for all harvested content. + +dataverse.feature.enable-version-note ++++++++++++++++++++++++++++++++++++++ + +Turns on the ability to add/view/edit/delete per-dataset-version notes intended to provide :ref:`provenance` information about why the dataset/version was created. + +dataverse.feature.shibboleth-use-wayfinder +++++++++++++++++++++++++++++++++++++++++++ + +This flag allows an instance to use Shibboleth with InCommon federation services. Our original Shibboleth implementation that relies on DiscoFeed can no longer be used since InCommon discontinued their old-style metadata feed. An alternative mechanism had to be implemented in order to use WayFinder service, their recommended replacements, instead. + +dataverse.feature.shibboleth-use-localhost +++++++++++++++++++++++++++++++++++++++++++ + +A Shibboleth-using Dataverse instance needs to make network calls to the locally-running ``shibd`` service. The default behavior is to use the address configured via the ``siteUrl`` setting. There are however situations (firewalls, etc.) where localhost would be preferable. 
+ +dataverse.feature.add-local-contexts-permission-check ++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +Adds a permission check to ensure that the user calling the /api/localcontexts/datasets/{id} API can edit the dataset with that id. This is currently the only use case - see https://github.com/gdcc/dataverse-external-vocab-support/tree/main/packages/local_contexts. The flag adds additional security to stop other uses, but would currently have to be used in conjunction with the api-session-auth feature flag (the security implications of which have not been fully investigated) to still allow adding Local Contexts metadata to a dataset. + +dataverse.feature.enable-pid-failure-log +++++++++++++++++++++++++++++++++++++++++ + +Turns on creation of a monthly log file (logs/PIDFailures_.log) showing failed requests for dataset/file PIDs. Can be used directly or with scripts at https://github.com/gdcc/dataverse-recipes/python/pid_reports to alert admins. + +dataverse.feature.role-assignment-history ++++++++++++++++++++++++++++++++++++++++++ + +Turns on tracking/display of role assignments and revocations for collections, datasets, and files + +dataverse.feature.only-update-datacite-when-needed +++++++++++++++++++++++++++++++++++++++++++++++++++ + +Only contact DataCite to update a DOI after checking to see if DataCite has outdated information (for efficiency, lighter load on DataCite, especially when using file DOIs). + + + + .. 
_:ApplicationServerSettings: Application Server Settings From 49bd83a685730b67647484089df42d17c2ca1f53 Mon Sep 17 00:00:00 2001 From: Philip Durbin Date: Thu, 8 Jan 2026 14:01:19 -0500 Subject: [PATCH 2/2] add refs (and use them) for feature flags, other minor tweaks #12005 --- .../source/admin/big-data-administration.rst | 12 ++--- doc/sphinx-guides/source/api/native-api.rst | 10 ++-- .../source/developers/big-data-support.rst | 2 +- .../source/developers/configuration.rst | 3 +- .../source/developers/globus-api.rst | 2 +- .../source/developers/performance.rst | 4 +- .../source/installation/config.rst | 46 ++++++++++++++++++- .../source/installation/shibboleth.rst | 2 +- 8 files changed, 61 insertions(+), 20 deletions(-) diff --git a/doc/sphinx-guides/source/admin/big-data-administration.rst b/doc/sphinx-guides/source/admin/big-data-administration.rst index b3c7e79c382..c4a98a6987a 100644 --- a/doc/sphinx-guides/source/admin/big-data-administration.rst +++ b/doc/sphinx-guides/source/admin/big-data-administration.rst @@ -206,7 +206,7 @@ Challenges: Users will need to be made aware of these limitations and the possibilities for managing them (e.g. by aggregating multiple files in a single, larger file, or storing smaller files in the base-store via the normal Dataverse upload UI). - There is currently `a bug `_ that won't allow users to transfer files from/to endpoints where they do not have permission to list the overall file tree (i.e. an institution manages /institution_name but the user only has access to /institution_name/my_dir.) Until that is fixed, a work-around is to first transfer data to an endpoint without this restriction. -- An alternative, experimental implementation of Globus polling of ongoing upload transfers was added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. 
While it is now the recommended option, it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`. +- An alternative, experimental implementation of Globus polling of ongoing upload transfers was added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. While it is now the recommended option, it is not enabled by default. See the :ref:`dataverse.feature.globus-use-experimental-async-framework` feature flag and the JVM option :ref:`dataverse.files.globus-monitoring-server`. More details of the setup required to enable Globus is described in the `Community Dataverse-Globus Setup and Configuration document `_ and the references therein. @@ -280,11 +280,11 @@ Scaling-related Configuration There are a broad range of options (that are not turned on by default) for improving how well Solr indexing and searching scales and for handling more files per dataset. Some of these are useful for all installations while others are related to specific use cases, or are mostly for emergency use (e.g. disabling facets). (see :ref:`database-settings`, :ref:`jvm-options`, and :ref:`feature-flags` for more details): -- dataverse.feature.add-publicobject-solr-field=true - specifically marks unrestricted content as public in Solr. See :ref:`feature-flags`. -- dataverse.feature.avoid-expensive-solr-join=true - this tells Dataverse to use the feature above to speed up searches. See :ref:`feature-flags`. -- dataverse.feature.reduce-solr-deletes=true - when Solr entries are being updated, this avoids an unnecessary step (deletion of existing entries) for entries that are being replaced. See :ref:`feature-flags`. 
-- dataverse.feature.disable-dataset-thumbnail-autoselect=true - by default, Dataverse scans through all files in a dataset to find one that can be used as a thumbnail, which is expensive for many files. This disables that behavior to improve performance. See :ref:`feature-flags`. -- dataverse.feature.only-update-datacite-when-needed=true - reduces the load on DataCite and reduces Dataverse failures related to that load, which is important when using file PIDs on Datasets with many files. See :ref:`feature-flags`. +- :ref:`dataverse.feature.add-publicobject-solr-field` =true - specifically marks unrestricted content as public in Solr. +- :ref:`dataverse.feature.avoid-expensive-solr-join` =true - this tells Dataverse to use the feature above to speed up searches. +- :ref:`dataverse.feature.reduce-solr-deletes` =true - when Solr entries are being updated, this avoids an unnecessary step (deletion of existing entries) for entries that are being replaced. +- :ref:`dataverse.feature.disable-dataset-thumbnail-autoselect` =true - by default, Dataverse scans through all files in a dataset to find one that can be used as a thumbnail, which is expensive for many files. This disables that behavior to improve performance. +- :ref:`dataverse.feature.only-update-datacite-when-needed` =true - reduces the load on DataCite and reduces Dataverse failures related to that load, which is important when using file PIDs on Datasets with many files. - :ref:`dataverse.solr.min-files-to-use-proxy` = - improve performance/lower memory requirements when indexing datasets with many files, suggested value is in the range 200 to 500 - :ref:`dataverse.solr.concurrency.max-async-indexes` = - limits the number of index operations running in parallel. 
The default is 4, larger values may improve performance (if the Solr instance is appropriately sized) - :ref:`:SolrFullTextIndexing` - false improves performance at the expense of not indexing file contents diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst index 790df985c1b..07d23611635 100644 --- a/doc/sphinx-guides/source/api/native-api.rst +++ b/doc/sphinx-guides/source/api/native-api.rst @@ -1709,7 +1709,7 @@ The fully expanded example above (without environment variables) looks like this The CSV response has column headers mirroring the JSON entries. They are internationalized (when internationalization is configured). -Note: This feature requires the "role-assignment-history" feature flag to be enabled (see :ref:`feature-flags`). +Note: This feature requires the :ref:`dataverse.feature.role-assignment-history` feature flag to be enabled. Datasets -------- @@ -3305,7 +3305,7 @@ The fully expanded example above (without environment variables) looks like this curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/:persistentId/returnToAuthor?persistentId=doi:10.5072/FK2/J8SJZB" -H "Content-type: application/json" -d @reason-for-return.json The review process can sometimes resemble a tennis match, with the authors submitting and resubmitting the dataset over and over until the curators are satisfied. Each time the curators send a "reason for return" via API, that reason is sent by email and is persisted into the database, stored at the dataset version level. -Note the reason is required, unless the `disable-return-to-author-reason` feature flag has been set (see :ref:`feature-flags`). Reason is a free text field and could be as simple as "The author would like to modify his dataset", "Files are missing", "Nothing to report" or "A curation report with comments and suggestions/instructions will follow in another email" that suits your situation. 
+Note the reason is required, unless the :ref:`dataverse.feature.disable-return-to-author-reason` feature flag has been set. Reason is a free text field and could be as simple as "The author would like to modify his dataset", "Files are missing", "Nothing to report" or "A curation report with comments and suggestions/instructions will follow in another email" that suits your situation. The :ref:`send-feedback-admin` Admin only API call may be useful as a way to move the conversation to email. However, note that these emails go to contacts (versus authors) and there is no database record of the email contents. (:ref:`dataverse.mail.cc-support-on-contact-email` will send a copy of these emails to the support email address which would provide a record.) The :ref:`send-feedback` API call may be useful as a way to move the conversation to email. However, note that these emails go to contacts (versus authors) and there is no database record of the email contents. (:ref:`dataverse.mail.cc-support-on-contact-email` will send a copy of these emails to the support email address which would provide a record.) @@ -4454,7 +4454,7 @@ The fully expanded example above (without environment variables) looks like this The CSV response has column headers mirroring the JSON entries. They are internationalized (when internationalization is configured). -Note: This feature requires the "role-assignment-history" feature flag to be enabled (see :ref:`feature-flags`). +Note: This feature requires the :ref:`dataverse.feature.role-assignment-history` feature flag to be enabled. Dataset Files Role Assignment History ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -4511,7 +4511,7 @@ The fully expanded example above (without environment variables) looks like this The CSV response for this call is the same as for the /api/datasets/{id}/assignments/history call above with the exception that definedOn will be a comma separated list of one or more file ids. 
-Note: This feature requires the "role-assignment-history" feature flag to be enabled (see :ref:`feature-flags`). +Note: This feature requires the :ref:`dataverse.feature.role-assignment-history` feature flag to be enabled. Update Dataset License ~~~~~~~~~~~~~~~~~~~~~~ @@ -7009,7 +7009,7 @@ To create a harvesting client you must supply a JSON file that describes the con The following optional fields are supported: -- ``sourceName``: When ``index-harvested-metadata-source`` is enabled (see :ref:`feature-flags`), sourceName will override the nickname in the Metadata Source facet. It can be used to group the content from many harvesting clients under the same name. +- ``sourceName``: When the :ref:`dataverse.feature.index-harvested-metadata-source` feature flag is enabled, sourceName will override the nickname in the Metadata Source facet. It can be used to group the content from many harvesting clients under the same name. - ``archiveDescription``: What the name suggests. If not supplied, will default to "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data." - ``set``: The OAI set on the remote server. If not supplied, will default to none, i.e., "harvest everything". (Note: see the note below on using sets when harvesting from DataCite; this is new as of v6.6). - ``style``: Defaults to "default" - a generic OAI archive. (Make sure to use "dataverse" when configuring harvesting from another Dataverse installation). diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst index b2724ce01e3..7077fdfcd19 100644 --- a/doc/sphinx-guides/source/developers/big-data-support.rst +++ b/doc/sphinx-guides/source/developers/big-data-support.rst @@ -198,4 +198,4 @@ An overview of the control and data transfer interactions between components was See also :ref:`Globus settings <:GlobusSettings>` and :ref:`globus-stores`. 
-An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`. +An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the :ref:`dataverse.feature.globus-use-experimental-async-framework` feature flag and the JVM option :ref:`dataverse.files.globus-monitoring-server`. diff --git a/doc/sphinx-guides/source/developers/configuration.rst b/doc/sphinx-guides/source/developers/configuration.rst index d342c28efc6..8b6ea9ae143 100644 --- a/doc/sphinx-guides/source/developers/configuration.rst +++ b/doc/sphinx-guides/source/developers/configuration.rst @@ -122,5 +122,4 @@ convenient usage of it anywhere in the codebase. When adding a flag, please add status, add some Javadocs about the flagged feature and add a ``@since`` tag to make it easier to identify when a flag has been introduced. -We want to maintain a list of all :ref:`feature flags ` in the :ref:`configuration guide `, -please add yours to the list. \ No newline at end of file +Add the feature flag to the list at :ref:`feature flags `. 
\ No newline at end of file diff --git a/doc/sphinx-guides/source/developers/globus-api.rst b/doc/sphinx-guides/source/developers/globus-api.rst index 43c237546be..eb0eb465315 100644 --- a/doc/sphinx-guides/source/developers/globus-api.rst +++ b/doc/sphinx-guides/source/developers/globus-api.rst @@ -185,7 +185,7 @@ As the transfer can take significant time and the API call is asynchronous, the Once the transfer completes, Dataverse will remove the write permission for the principal. -An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This new framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`) and the JVM option :ref:`dataverse.files.globus-monitoring-server`. +An alternative, experimental implementation of Globus polling of ongoing upload transfers has been added in v6.4. This new framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. Due to its experimental nature it is not enabled by default. See the :ref:`dataverse.feature.globus-use-experimental-async-framework` feature flag and the JVM option :ref:`dataverse.files.globus-monitoring-server`. Note that when using a managed endpoint that uses the Globus S3 Connector, the checksum should be correct as Dataverse can validate it. For file-based endpoints, the checksum should be included if available but Dataverse cannot verify it. 
diff --git a/doc/sphinx-guides/source/developers/performance.rst b/doc/sphinx-guides/source/developers/performance.rst index 6c864bec257..4a0d5bf0bf2 100644 --- a/doc/sphinx-guides/source/developers/performance.rst +++ b/doc/sphinx-guides/source/developers/performance.rst @@ -120,8 +120,8 @@ While in the past Solr performance hasn't been much of a concern, in recent year We are tracking performance problems in `#10469 `_. -In a meeting with a Solr expert on 2024-05-10 we were advised to avoid joins as much as possible. (It was acknowledged that many Solr users make use of joins because they have to, like we do, to keep some documents private.) Toward that end we have added two feature flags called ``avoid-expensive-solr-join`` and ``add-publicobject-solr-field`` as explained under :ref:`feature-flags`. It was confirmed experimentally that performing the join on all the public objects (published collections, datasets and files), i.e., the bulk of the content in the search index, was indeed very expensive, especially on a large instance the size of the IQSS prod. archive, especially under indexing load. We confirmed that it was in fact unnecessary and were able to replace it with a boolean field directly in the indexed documents, which is achieved by the two feature flags above. However, as of writing this, this mechanism should still be considered experimental. -Another flag, ``reduce-solr-deletes``, avoids deleting solr documents for files in a dataset prior to sending updates. It also eliminates several causes of orphan permission documents. This is expected to improve indexing performance to some extent and is a step towards avoiding unnecessary updates (i.e. when a doc would not change). +In a meeting with a Solr expert on 2024-05-10 we were advised to avoid joins as much as possible. (It was acknowledged that many Solr users make use of joins because they have to, like we do, to keep some documents private.) 
Toward that end we have added two feature flags called :ref:`dataverse.feature.avoid-expensive-solr-join` and :ref:`dataverse.feature.add-publicobject-solr-field`. It was confirmed experimentally that performing the join on all the public objects (published collections, datasets and files), i.e., the bulk of the content in the search index, was indeed very expensive, especially on a large instance the size of the IQSS prod. archive, especially under indexing load. We confirmed that it was in fact unnecessary and were able to replace it with a boolean field directly in the indexed documents, which is achieved by the two feature flags above. However, as of writing this, this mechanism should still be considered experimental. +Another feature flag, :ref:`dataverse.feature.reduce-solr-deletes`, avoids deleting Solr documents for files in a dataset prior to sending updates. It also eliminates several causes of orphan permission documents. This is expected to improve indexing performance to some extent and is a step towards avoiding unnecessary updates (i.e. when a doc would not change). Datasets with Large Numbers of Files or Versions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst index 70b9c2f21de..c70611588da 100644 --- a/doc/sphinx-guides/source/installation/config.rst +++ b/doc/sphinx-guides/source/installation/config.rst @@ -830,7 +830,7 @@ Bearer tokens are defined in `RFC 6750`_ and can be used as an alternative to AP .. _RFC 6750: https://tools.ietf.org/html/rfc6750 -To enable bearer tokens, you must install and configure Keycloak (for now, see :ref:`oidc-dev` in the Developer Guide) and enable ``api-bearer-auth`` under :ref:`feature-flags`. +To enable bearer tokens, you must install and configure Keycloak (for now, see :ref:`oidc-dev` in the Developer Guide) and enable the :ref:`dataverse.feature.api-bearer-auth` feature flag. 
You can test that bearer tokens are working by following the example under :ref:`bearer-tokens` in the API Guide. @@ -3710,7 +3710,7 @@ Can also be set via *MicroProfile Config API* sources, e.g. the environment vari dataverse.files.globus-monitoring-server ++++++++++++++++++++++++++++++++++++++++ -This setting is required in conjunction with the ``globus-use-experimental-async-framework`` feature flag (see :ref:`feature-flags`). Setting it to true designates the Dataverse instance to serve as the dedicated polling server. It is needed so that the new framework can be used in a multi-node installation. +This setting is required in conjunction with the :ref:`dataverse.feature.globus-use-experimental-async-framework` feature flag. Setting it to true designates the Dataverse instance to serve as the dedicated polling server. It is needed so that the new framework can be used in a multi-node installation. .. _dataverse.csl.common-styles: @@ -3874,106 +3874,148 @@ The default status, as long there is not any other information, is Off. To check the status of feature flags via API, see :ref:`list-all-feature-flags` in the API Guide. +.. _dataverse.feature.api-session-auth: + dataverse.feature.api-session-auth ++++++++++++++++++++++++++++++++++ Enables API authentication via session cookie (JSESSIONID). **Caution: Enabling this feature flag exposes the installation to CSRF risks!** We expect this feature flag to be temporary (only used by frontend developers, see `#9063 `_) and for the feature to be removed in the future. +.. _dataverse.feature.api-bearer-auth: + dataverse.feature.api-bearer-auth +++++++++++++++++++++++++++++++++ Enables API authentication via Bearer Token. +.. 
_dataverse.feature.api-bearer-auth-provide-missing-claims: + dataverse.feature.api-bearer-auth-provide-missing-claims ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Enables sending missing user claims in the request JSON provided during OIDC user registration, when these claims are not returned by the identity provider and are required for registration. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. **Caution: Enabling this feature flag exposes the installation to potential user impersonation issues.** +.. _dataverse.feature.api-bearer-auth-handle-tos-acceptance-in-idp: + dataverse.feature.api-bearer-auth-handle-tos-acceptance-in-idp ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Specifies that Terms of Service acceptance is handled by the IdP, eliminating the need to include ToS acceptance boolean parameter (termsAccepted) in the OIDC user registration request body. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. +.. _dataverse.feature.api-bearer-auth-use-builtin-user-on-id-match: + dataverse.feature.api-bearer-auth-use-builtin-user-on-id-match ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Allows the use of a built-in user account when an identity match is found during API bearer authentication. This feature enables automatic association of an incoming IdP identity with an existing built-in user account, bypassing the need for additional user registration steps. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. **Caution: Enabling this flag could result in impersonation risks if (and only if) used with a misconfigured IdP.** +.. _dataverse.feature.api-bearer-auth-use-shib-user-on-id-match: + dataverse.feature.api-bearer-auth-use-shib-user-on-id-match +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Allows the use of a Shibboleth user account when an identity match is found during API bearer authentication. 
This feature enables automatic association of an incoming IdP identity with an existing Shibboleth user account, bypassing the need for additional user registration steps. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. **Caution: Enabling this flag could result in impersonation risks if (and only if) used with a misconfigured IdP.** +.. _dataverse.feature.api-bearer-auth-use-oauth-user-on-id-match: + dataverse.feature.api-bearer-auth-use-oauth-user-on-id-match ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Allows the use of an OAuth user account (GitHub, Google, or ORCID) when an identity match is found during API bearer authentication. This feature enables automatic association of an incoming IdP identity with an existing OAuth user account, bypassing the need for additional user registration steps. This feature only works when the feature flag ``api-bearer-auth`` is also enabled. **Caution: Enabling this flag could result in impersonation risks if (and only if) used with a misconfigured IdP.** +.. _dataverse.feature.avoid-expensive-solr-join: + dataverse.feature.avoid-expensive-solr-join +++++++++++++++++++++++++++++++++++++++++++ Changes the way Solr queries are constructed for public content (published Collections, Datasets and Files). It removes a very expensive Solr join on all such documents, improving overall performance, especially for large instances under heavy load. Before this feature flag is enabled, the corresponding indexing feature (see next feature flag) must be turned on and a full reindex performed (otherwise public objects are not going to be shown in search results). See :doc:`/admin/solr-search-index`. +.. _dataverse.feature.add-publicobject-solr-field: + dataverse.feature.add-publicobject-solr-field +++++++++++++++++++++++++++++++++++++++++++++ Adds an extra boolean field `PublicObject_b:true` for public content (published Collections, Datasets and Files). 
Once reindexed with these fields, we can rely on it to remove a very expensive Solr join on all such documents in Solr queries, significantly improving overall performance (by enabling the feature flag above, `avoid-expensive-solr-join`). These two flags are separate so that an instance can reindex their holdings before enabling the optimization in searches, thus avoiding having their public objects temporarily disappear from search results while the reindexing is in progress. +.. _dataverse.feature.reduce-solr-deletes: + dataverse.feature.reduce-solr-deletes +++++++++++++++++++++++++++++++++++++ Avoids deleting and recreating solr documents for dataset files when reindexing. +.. _dataverse.feature.disable-return-to-author-reason: + dataverse.feature.disable-return-to-author-reason +++++++++++++++++++++++++++++++++++++++++++++++++ Removes the reason field in the `Publish/Return To Author` dialog that was added as a required field in v6.2 and makes the reason an optional parameter in the :ref:`return-a-dataset` API call. +.. _dataverse.feature.disable-dataset-thumbnail-autoselect: + dataverse.feature.disable-dataset-thumbnail-autoselect ++++++++++++++++++++++++++++++++++++++++++++++++++++++ Turns off automatic selection of a dataset thumbnail from image files in that dataset. When set to ``On``, a user can still manually pick a thumbnail image or upload a dedicated thumbnail image. +.. _dataverse.feature.globus-use-experimental-async-framework: + dataverse.feature.globus-use-experimental-async-framework +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Activates a new experimental implementation of Globus polling of ongoing remote data transfers that does not rely on the instance staying up continuously for the duration of the transfers and saves the state information about Globus upload requests in the database. Added in v6.4; extended in v6.6 to cover download transfers, in addition to uploads. Affects :ref:`:GlobusPollingInterval`. 
Note that the JVM option :ref:`dataverse.files.globus-monitoring-server` described above must also be enabled on one (and only one, in a multi-node installation) Dataverse instance. +.. _dataverse.feature.index-harvested-metadata-source: + dataverse.feature.index-harvested-metadata-source +++++++++++++++++++++++++++++++++++++++++++++++++ Index the nickname or the source name (See the optional ``sourceName`` field in :ref:`create-a-harvesting-client`) of the harvesting client as the "metadata source" of harvested datasets and files. If enabled, the Metadata Source facet will show separate groupings of the content harvested from different sources (by harvesting client nickname or source name) instead of the default behavior where there is one "Harvested" grouping for all harvested content. +.. _dataverse.feature.enable-version-note: + dataverse.feature.enable-version-note +++++++++++++++++++++++++++++++++++++ Turns on the ability to add/view/edit/delete per-dataset-version notes intended to provide :ref:`provenance` information about why the dataset/version was created. +.. _dataverse.feature.shibboleth-use-wayfinder: + dataverse.feature.shibboleth-use-wayfinder ++++++++++++++++++++++++++++++++++++++++++ This flag allows an instance to use Shibboleth with InCommon federation services. Our original Shibboleth implementation that relies on DiscoFeed can no longer be used since InCommon discontinued their old-style metadata feed. An alternative mechanism had to be implemented in order to use WayFinder service, their recommended replacements, instead. +.. _dataverse.feature.shibboleth-use-localhost: + dataverse.feature.shibboleth-use-localhost ++++++++++++++++++++++++++++++++++++++++++ A Shibboleth-using Dataverse instance needs to make network calls to the locally-running ``shibd`` service. The default behavior is to use the address configured via the ``siteUrl`` setting. There are however situations (firewalls, etc.) where localhost would be preferable. +.. 
_dataverse.feature.add-local-contexts-permission-check: + dataverse.feature.add-local-contexts-permission-check +++++++++++++++++++++++++++++++++++++++++++++++++++++ Adds a permission check to ensure that the user calling the /api/localcontexts/datasets/{id} API can edit the dataset with that id. This is currently the only use case - see https://github.com/gdcc/dataverse-external-vocab-support/tree/main/packages/local_contexts. The flag adds additional security to stop other uses, but would currently have to be used in conjunction with the api-session-auth feature flag (the security implications of which have not been fully investigated) to still allow adding Local Contexts metadata to a dataset. +.. _dataverse.feature.enable-pid-failure-log: + dataverse.feature.enable-pid-failure-log ++++++++++++++++++++++++++++++++++++++++ Turns on creation of a monthly log file (logs/PIDFailures_.log) showing failed requests for dataset/file PIDs. Can be used directly or with scripts at https://github.com/gdcc/dataverse-recipes/python/pid_reports to alert admins. +.. _dataverse.feature.role-assignment-history: + dataverse.feature.role-assignment-history +++++++++++++++++++++++++++++++++++++++++ Turns on tracking/display of role assignments and revocations for collections, datasets, and files +.. _dataverse.feature.only-update-datacite-when-needed: + dataverse.feature.only-update-datacite-when-needed ++++++++++++++++++++++++++++++++++++++++++++++++++ diff --git a/doc/sphinx-guides/source/installation/shibboleth.rst b/doc/sphinx-guides/source/installation/shibboleth.rst index b8706017e4f..6f6e6d91f75 100644 --- a/doc/sphinx-guides/source/installation/shibboleth.rst +++ b/doc/sphinx-guides/source/installation/shibboleth.rst @@ -159,7 +159,7 @@ Rather than or in addition to specifying individual Identity Providers (see :ref For example, in the United States, you would register your Dataverse installation with `InCommon `_. 
For a list of federations around the world, see `REFEDS (the Research and Education FEDerations group) `_. The details of how to register with an identity federation are out of scope for this document. -If you are planning to use InCommon, please note that ``shibd`` needs to be configured to use the new MDQ protocol and WayFinder `service `_ `announced `_ `by `_ InCommon. The sample ``shibboleth2.xml`` provided already contains commented-out sections pre-configured to work with this new InCommon framework. Please see https://spaces.at.internet2.edu/display/MDQ/how-to-configure-shib-sp-to-use-mdq and https://spaces.at.internet2.edu/display/federation/how-to-configure-service-to-use-wayfinder for more information. You will also need to set the feature flag ``dataverse.feature.shibboleth-use-wayfinder=true`` (see :ref:`feature-flags`). +If you are planning to use InCommon, please note that ``shibd`` needs to be configured to use the new MDQ protocol and WayFinder `service `_ `announced `_ `by `_ InCommon. The sample ``shibboleth2.xml`` provided already contains commented-out sections pre-configured to work with this new InCommon framework. Please see https://spaces.at.internet2.edu/display/MDQ/how-to-configure-shib-sp-to-use-mdq and https://spaces.at.internet2.edu/display/federation/how-to-configure-service-to-use-wayfinder for more information. You will also need to set the :ref:`dataverse.feature.shibboleth-use-wayfinder` feature flag to true. For a successful login to Dataverse, certain :ref:`shibboleth-attributes` must be released by the Identity Provider (IdP). Otherwise, in the federation context, users will have the frustrating experience of selecting their IdP in the list but then getting an error like ``Problem with Identity Provider – The SAML assertion for "eppn" was null``. We definitely want to prevent this! There's even some guidance about this problem in the User Guide under the heading :ref:`fix-shib-login` that links back here.