feat: Add /catalogs route for Federated STAC API Support #547
Conversation
|
Great feature, @jonhealy1, and one that I really missed. It would probably be good to add what you wrote in the README to the description of the PR. My 2c |
|
I'm not really sure what this adds that is not already possible with the existing specs. Can someone explain this a bit better? |
|
@m-mohr I left a response to your comment on #308. In summary, a STAC API generally represents one root Catalog. Users have asked for this extension to avoid running numerous separate API instances. Discoverability across numerous running API instances, representing multiple catalogs, would be practically impossible. While traversing child links is possible, it treats the API like a static file server. It is inefficient for discovery and technically infeasible for extensions. Applying server-side operations (like Aggregations, Sort, or Filter) to a recursive link crawl is functionally impossible without massive latency. |
|
Then please have a look at the STAC API - Children extension, it solves your issue of latency in a very similar way (just with a different name). |
|
@m-mohr Thanks for pointing us towards the Children extension. I think this is something that we will definitely want to add to this project in the near future. |
|
The PR says it implements federated STAC API support, but then it also says it only works on a single infrastructure. How is this federation working? If I have e.g. three STAC APIs already and I want to offer them via a single API, does this PR solve the issue? |
|
If I understand what you're saying, yes. You would have three routes, i.e. |
|
So if I have three instances that should remain separate instances on different machines, but we want an additional proxy that can query all three instances at a time, this PR doesn't provide a solution, right? You'd need to change them to a single instance? |
|
Correct. It would be interesting to explore creating something like that: something that could query API instances across multiple machines. It could be done asynchronously, and then the central API would gather the results. Pagination would be difficult. Sorting too, I guess. It could be done though. |
|
There are tools that implement this already. I just wanted to understand the scope of the PR, thanks. |
|
@m-mohr I think the reality for many users is that they do not want to run 3 separate API and database instances just to host 3 logical Catalogs. They would rather maintain and fund one infrastructure. While scaling can be handled via cloud infrastructure, the application architecture needs to support this consolidation. The /catalogs extension provides the necessary routing to host these multiple contexts within that single, cost-effective instance. |
**Related Issue(s):**

- None

**Description:**

#### Added

- Environment variable `VALIDATE_QUERYABLES` to enable/disable validation of queryables in search/filter requests. When set to `true`, search requests will be validated against the defined queryables, returning an error for any unsupported fields. Defaults to `false` for backward compatibility. [#532](#532)
- Environment variable `QUERYABLES_CACHE_TTL` to configure the TTL (in seconds) for caching queryables. Default is `1800` seconds (30 minutes) to balance performance and freshness of queryables data. [#532](#532)
- Added optional `/catalogs` route support to enable federated hierarchical catalog browsing and navigation. [#547](#547)
- Added DELETE `/catalogs/{catalog_id}/collections/{collection_id}` endpoint to support removing collections from catalogs. When a collection belongs to multiple catalogs, it removes only the specified catalog from the collection's parent_ids. When a collection belongs to only one catalog, the collection is deleted entirely. [#554](#554)
- Added `parent_ids` internal field to collections to support multi-catalog hierarchies. Collections can now belong to multiple catalogs, with parent catalog IDs stored in this field for efficient querying and management. [#554](#554)
- Added GET `/catalogs/{catalog_id}/children` endpoint implementing the STAC Children extension for efficient hierarchical catalog browsing. Supports type filtering (?type=Catalog|Collection), pagination, and returns numberReturned/numberMatched counts at the top level. [#558](#558)
- Implemented context-aware dynamic linking: catalogs use dynamic `rel="children"` links pointing to the `/catalogs/{id}/children` endpoint, and collections have context-dependent `rel="parent"` links (pointing to catalog when accessed via `/catalogs/{id}/collections/{id}`, or root when accessed via `/collections/{id}`). Catalog links are only injected in catalog context. This eliminates race conditions and ensures consistency with parent_ids relationships. [#559](#559)

#### Changed

- Have opensearch datetime, geometry and collections fields defined as constant strings [#553](#553)

#### Fixed

- Fix unawaited coroutine in `stac_fastapi.core.core`. [#551](#551)
- Parse `ES_TIMEOUT` environment variable as an integer. [#556](#556)
- Implemented "Smart Unlink" logic in delete_catalog: when cascade=False (default), collections are unlinked from the catalog and become root-level orphans if they have no other parents, rather than being deleted. When cascade=True, collections are deleted entirely. This prevents accidental data loss and supports poly-hierarchy scenarios where collections belong to multiple catalogs. [#557](#557)
- Fixed delete_catalog to use reverse lookup query on parent_ids field instead of fragile link parsing. This ensures all collections are found and updated correctly, preventing ghost relationships where collections remain tagged with deleted catalogs, especially in large catalogs or pagination scenarios. [#557](#557)

**PR Checklist:**

- [x] Code is formatted and linted (run `pre-commit run --all-files`)
- [x] Tests pass (run `make test`)
- [x] Documentation has been updated to reflect changes, if applicable
- [x] Changes are added to the changelog
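
The deletion rules described in the changelog entries above can be illustrated with a minimal sketch. It is not the project's implementation: the in-memory `Store` dictionary and the function `remove_collection_from_catalog` are hypothetical stand-ins, and only the names `delete_catalog`, `cascade`, and `parent_ids` come from the changelog. The `cascade=True` branch follows the changelog wording literally ("collections are deleted entirely"); how the real code treats a collection that also has other parents is an assumption here.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the collections index:
# maps a collection id to its list of parent catalog ids (parent_ids).
Store = Dict[str, List[str]]


def remove_collection_from_catalog(store: Store, catalog_id: str, collection_id: str) -> str:
    """Sketch of the DELETE /catalogs/{catalog_id}/collections/{collection_id} rule."""
    parents = store[collection_id]
    if catalog_id not in parents:
        raise KeyError(f"{collection_id} is not linked to catalog {catalog_id}")
    if len(parents) > 1:
        # The collection has other parents: only drop this catalog from parent_ids.
        parents.remove(catalog_id)
        return "unlinked"
    # This catalog was the only parent: the collection is deleted entirely.
    del store[collection_id]
    return "deleted"


def delete_catalog(store: Store, catalog_id: str, cascade: bool = False) -> List[str]:
    """Sketch of "Smart Unlink"; returns the ids of collections that were deleted."""
    # Reverse lookup on parent_ids instead of parsing links.
    children = [cid for cid, parents in store.items() if catalog_id in parents]
    deleted: List[str] = []
    for cid in children:
        if cascade:
            # Assumption: cascade deletes every child, as the changelog states.
            del store[cid]
            deleted.append(cid)
        else:
            # Unlink only; a collection left with no parents becomes a root-level orphan.
            store[cid].remove(catalog_id)
    return deleted
```

With the default `cascade=False`, no collection is removed from the store; a collection whose `parent_ids` list becomes empty simply remains reachable at the root level.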
Related Issue(s):
Description
This PR introduces the Catalogs Extension, enabling a federated "Hub and Spoke" architecture within `stac-fastapi`.
Currently, the API assumes a single Root Catalog containing a flat list of Collections. This works for simple deployments but becomes unwieldy for large-scale implementations aggregating multiple providers, missions, or projects. This change adds a `/catalogs` endpoint that acts as a Registry, allowing the API to serve multiple distinct sub-catalogs from a single infrastructure.
Key Features
- New Endpoints: Implements the full suite of hierarchical endpoints:
  - `GET /catalogs` (List all sub-catalogs)
  - `POST /catalogs` (Create new sub-catalog)
  - `DELETE /catalogs/{catalog_id}` (Delete a catalog; supports `?cascade=true` to delete child collections)
  - `GET /catalogs/{catalog_id}` (Sub-catalog Landing Page)
  - `GET /catalogs/{catalog_id}/collections` (Scoped collections)
  - `POST /catalogs/{catalog_id}/collections` (Create a new collection directly linked to a specific catalog)
  - `GET /catalogs/{catalog_id}/collections/{collection_id}` (Get one collection)
  - `GET /catalogs/{catalog_id}/collections/{collection_id}/items` (Scoped item search)
  - `GET /catalogs/{catalog_id}/collections/{collection_id}/items/{item_id}` (Get one item)
- Serialization: Updates Pydantic models and serializers to support `type: "Catalog"` objects within the API tree (previously restricted to Collections).
- Configuration: Controlled via the `ENABLE_CATALOGS_ROUTE` environment variable (default: `false`).
Storage Strategy (Non-Breaking)
To ensure zero breaking changes and avoid complex database migrations, this implementation stores `Catalog` objects within the existing `collections` index. Objects are distinguished by the `type` field (`type: "Catalog"` vs. `type: "Collection"`).
Architectural Alignment
This implementation follows the proposed STAC API Catalogs Endpoint Extension (Community Extension).
It addresses the "Data Silo" problem by allowing organizations to host distinct catalogs on a single API instance, rather than deploying separate containers for every project or provider.
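
To make the storage strategy described earlier more concrete, here is a minimal sketch of how one shared index could serve both the catalog list and the catalog-scoped collection list. The documents, ids, and helper functions are hypothetical. The `type` discriminator comes from the description above, while the `parent_ids` field name is borrowed from the changelog quoted earlier; how this PR actually scopes collections to a catalog is an assumption.

```python
from typing import Dict, List

# Hypothetical documents sitting side by side in one "collections" index.
INDEX: List[Dict] = [
    {"id": "provider-a", "type": "Catalog"},
    {"id": "provider-b", "type": "Catalog"},
    {"id": "collection-1", "type": "Collection", "parent_ids": ["provider-a", "provider-b"]},
    {"id": "collection-2", "type": "Collection", "parent_ids": ["provider-a"]},
    {"id": "collection-3", "type": "Collection"},  # no parent catalog
]


def list_catalogs(index: List[Dict]) -> List[Dict]:
    """Roughly what GET /catalogs needs: only documents typed as Catalog."""
    return [doc for doc in index if doc.get("type") == "Catalog"]


def list_scoped_collections(index: List[Dict], catalog_id: str) -> List[Dict]:
    """Roughly what GET /catalogs/{catalog_id}/collections needs:
    Collection documents whose parent_ids include the catalog."""
    return [
        doc
        for doc in index
        if doc.get("type") == "Collection" and catalog_id in doc.get("parent_ids", [])
    ]
```

Because both kinds of document share one index, existing Collection documents need no migration; a backend would express the same two filters as queries on the `type` field rather than as list comprehensions.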
Changes
- `stac_fastapi/core/extensions/catalogs.py`: Added the main extension logic and router.
- `stac_fastapi/core/models/`: Added `Catalog` Pydantic models.
- `stac_fastapi/elasticsearch/database_logic.py`: Added CRUD logic filtering by `type: "Catalog"`.
- `tests/`: Added comprehensive test suite (`test_catalogs.py`) covering CRUD operations and hierarchical navigation.
PR Checklist:
- Code is formatted and linted (run `pre-commit run --all-files`)
- Tests pass (run `make test`)