A Python-based implementation of the Apache Iceberg REST Catalog API.
- Standard Iceberg REST API: Supports namespaces, tables, and config endpoints.
- Multi-Catalog Support: Run multiple isolated catalogs on the same server using URL prefixes.
- Pluggable Storage: Comes with SQLite storage for metadata, extensible to other DBs.
- Pluggable I/O: Supports S3 and local filesystem via PyIceberg's FileIO.
- Custom Metadata: Extension endpoints to store arbitrary JSON metadata.
- Extensible Auth: Interface for custom authentication logic.
- Python 3.9+
- Docker (optional)
The catalog can be configured via Environment Variables (global) or Catalog Properties (per-catalog).
| Variable | Description | Default |
|---|---|---|
| `CATALOG_WAREHOUSE` | Base path for table data (e.g. `s3://my-bucket/warehouse` or `/tmp/warehouse`) | `/tmp/warehouse` |
| `CATALOG_PORT` | Port to run the server on | `8000` |
| `S3_ENDPOINT_URL` | Global S3 endpoint URL (for MinIO/S3-compatible storage) | None |
| `AWS_ACCESS_KEY_ID` | Global AWS access key | None |
| `AWS_SECRET_ACCESS_KEY` | Global AWS secret key | None |
| `AWS_REGION` | Global AWS region | None |
You can configure I/O settings per catalog using the `/v1/{catalog_name}/config/properties` endpoint. These properties override environment variables.
| Property | Description | Example |
|---|---|---|
| `s3.endpoint` | S3 endpoint URL | `http://minio:9000` |
| `s3.access-key-id` | AWS access key | `minioadmin` |
| `s3.secret-access-key` | AWS secret key | `minioadmin` |
| `s3.region` | AWS region | `us-east-1` |
| `warehouse` | Warehouse location (overrides the env var) | `s3://warehouse/my_cat` |
Example: Setting properties via API

```bash
curl -X POST http://localhost:8000/v1/my_catalog/config/properties \
  -H "Content-Type: application/json" \
  -d '{
    "s3.endpoint": "http://minio:9000",
    "s3.access-key-id": "minioadmin",
    "s3.secret-access-key": "minioadmin"
  }'
```

Important: If you want the server to use a specific warehouse location for a catalog (e.g. for `create_table`), you must set the `warehouse` property via this API. Passing it to `pyiceberg.load_catalog` only configures the client, not the server.
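The same call can be made from Python's standard library. The helper below only builds the request object, so it runs without a live server; send it with `urllib.request.urlopen(req)` once the catalog is up. The catalog name `my_catalog` is just an example.

```python
import json
import urllib.request

def build_properties_request(base_url, catalog_name, props):
    """Build the POST request that sets per-catalog config properties."""
    url = f"{base_url}/v1/{catalog_name}/config/properties"
    return urllib.request.Request(
        url,
        data=json.dumps(props).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_properties_request(
    "http://localhost:8000",
    "my_catalog",
    {
        "s3.endpoint": "http://minio:9000",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
        "warehouse": "s3://warehouse/my_cat",
    },
)
# With the server running: urllib.request.urlopen(req)
```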
1. Install dependencies:

   ```bash
   pip install -e .
   ```

2. Run the server:

   ```bash
   export CATALOG_WAREHOUSE=/tmp/warehouse
   uvicorn main:app --reload --host 0.0.0.0 --port 8000
   ```

   The server will start at http://127.0.0.1:8000.
You can run the pre-built image `alexmerced/iceberg-catalog`:

```bash
docker run -p 8000:8000 -e CATALOG_WAREHOUSE=/tmp/warehouse alexmerced/iceberg-catalog
```

To mount a local directory for the warehouse (persistence):

```bash
docker run -p 8000:8000 \
  -v $(pwd)/warehouse:/tmp/warehouse \
  -e CATALOG_WAREHOUSE=/tmp/warehouse \
  alexmerced/iceberg-catalog
```

To run the catalog with MinIO (S3-compatible storage) using Docker Compose:
1. Create `docker-compose.yml` (provided in the repo).
2. Run:

   ```bash
   docker-compose up
   ```

This starts:

- Catalog: http://localhost:8000
- MinIO Console: http://localhost:9001 (User: `minioadmin`, Pass: `minioadmin`)
- MinIO API: http://localhost:9000
- mc: automatically creates a `warehouse` bucket.
You can connect to this catalog using `pyiceberg`. The URL prefix determines the catalog name.

Example: Connecting to a catalog named `my_team_catalog`

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "my_team_catalog",
    **{
        "uri": "http://127.0.0.1:8000",
        "prefix": "my_team_catalog",  # IMPORTANT: matches the URL prefix
        "warehouse": "s3://my-bucket/warehouse",
    }
)

# Create a namespace
catalog.create_namespace("analytics")

# Create a table
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType

schema = Schema(NestedField(1, "data", StringType(), required=True))
catalog.create_table("analytics.logs", schema=schema)
```

Example: Using MinIO (S3)
If running via Docker Compose with MinIO:
```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "minio_catalog",
    **{
        "uri": "http://localhost:8000",
        "prefix": "minio_catalog",
        "warehouse": "s3://warehouse/minio_catalog",
        "s3.endpoint": "http://localhost:9000",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    }
)

# Create namespace and table
catalog.create_namespace("test_ns")
# ...
```

This catalog is designed to be easily extended.
By default, metadata is stored in SQLite. To use Postgres, MySQL, or others:

1. Open `catalog/storage.py`.
2. Create a new class inheriting from `CatalogStorage`.
3. Implement the abstract methods (`create_namespace`, `get_table`, etc.).
4. In `catalog/api/routes.py`, update the `get_storage` dependency to return your new storage class.
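As a runnable illustration of the shape of such a backend, here is a toy in-memory implementation. Since the real `CatalogStorage` base class isn't reproduced here, a stand-in ABC is defined with two of the methods named above; the exact signatures are assumptions.

```python
from abc import ABC, abstractmethod

# Stand-in for the real base class in catalog/storage.py; method names
# come from the steps above, but the signatures are assumptions.
class CatalogStorage(ABC):
    @abstractmethod
    def create_namespace(self, namespace: str) -> None: ...

    @abstractmethod
    def get_table(self, namespace: str, table: str) -> dict: ...

class InMemoryStorage(CatalogStorage):
    """Toy backend: dicts instead of SQLite, same interface shape."""

    def __init__(self):
        self._namespaces = set()
        self._tables = {}  # (namespace, table) -> table metadata dict

    def create_namespace(self, namespace: str) -> None:
        self._namespaces.add(namespace)

    def get_table(self, namespace: str, table: str) -> dict:
        return self._tables[(namespace, table)]

storage = InMemoryStorage()
storage.create_namespace("analytics")
storage._tables[("analytics", "logs")] = {"location": "/tmp/warehouse/logs"}
print(storage.get_table("analytics", "logs")["location"])  # /tmp/warehouse/logs
```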
For example, a Postgres-backed skeleton:

```python
# catalog/storage.py
class PostgresStorage(CatalogStorage):
    def __init__(self, connection_string):
        # Set up a SQLAlchemy engine with Postgres
        pass
    # ... implement methods ...
```

By default, no authentication is enforced. To add auth (e.g., OAuth2, Basic Auth):
- Open
catalog/auth.py. - Create a new class inheriting from
AuthenticationProvider. - Implement
get_user(request: Request). - In
catalog/api/routes.py, update theget_authdependency.
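For Basic Auth specifically, most of the provider's work is decoding the `Authorization` header. The helper below shows that decoding on its own, with no server or framework needed; the credentials are example values only.

```python
import base64

def user_from_basic_auth(authorization_header: str) -> str:
    """Extract the user id from an 'Authorization: Basic <token>' header."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    user_id, _, _password = base64.b64decode(token).decode("utf-8").partition(":")
    # A real provider would also verify the password against a user store.
    return user_id

header = "Basic " + base64.b64encode(b"minioadmin:minioadmin").decode("ascii")
print(user_from_basic_auth(header))  # minioadmin
```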
```python
# catalog/auth.py
class BasicAuthProvider(AuthenticationProvider):
    def get_user(self, request: Request) -> str:
        # Check the Authorization header
        return user_id
```

The catalog exposes endpoints to store arbitrary JSON metadata associated with a catalog.
- `GET /v1/{catalog_name}/ext/metadata/{key}`
- `POST /v1/{catalog_name}/ext/metadata/{key}`
- `DELETE /v1/{catalog_name}/ext/metadata/{key}`
This is useful for storing UI configurations, tags, or other auxiliary data not covered by the Iceberg spec.
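For instance, a set of UI tags could be stored under a key of your choosing. The snippet below builds the POST request with the standard library; the key `ui-tags`, the catalog name, and the payload are made up for illustration, and the request is sent with `urllib.request.urlopen` against a running server.

```python
import json
import urllib.request

catalog = "my_catalog"  # example catalog name
payload = {"tags": ["finance", "daily"], "owner": "data-eng"}  # arbitrary JSON

req = urllib.request.Request(
    f"http://localhost:8000/v1/{catalog}/ext/metadata/ui-tags",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With the server running: urllib.request.urlopen(req)
```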
- `GET /v1/{prefix}/config`
- `GET /v1/{prefix}/namespaces`
- `POST /v1/{prefix}/namespaces`
- `GET /v1/{prefix}/namespaces/{namespace}`
- `DELETE /v1/{prefix}/namespaces/{namespace}`
- `POST /v1/{prefix}/namespaces/{namespace}/properties`
- `GET /v1/{prefix}/namespaces/{namespace}/tables`
- `POST /v1/{prefix}/namespaces/{namespace}/tables`
- `GET /v1/{prefix}/namespaces/{namespace}/tables/{table}`
- `DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}`
- `POST /v1/{prefix}/tables/rename`
- `GET /v1/{prefix}/ext/metadata/{key}`
- `POST /v1/{prefix}/ext/metadata/{key}`
- `DELETE /v1/{prefix}/ext/metadata/{key}`