Python Iceberg REST Catalog

A Python-based implementation of the Apache Iceberg REST Catalog API.

Features

- Standard Iceberg REST API: Supports namespaces, tables, and config endpoints.
- Multi-Catalog Support: Run multiple isolated catalogs on the same server using URL prefixes.
- Pluggable Storage: Ships with SQLite storage for metadata and can be extended to other databases.
- Pluggable I/O: Supports S3 and the local filesystem via PyIceberg's FileIO.
- Custom Metadata: Extension endpoints for storing arbitrary JSON metadata.
- Extensible Auth: Interface for custom authentication logic.

Getting Started

Prerequisites

- Python 3.9+
- Docker (optional)

Configuration

The catalog can be configured via Environment Variables (global) or Catalog Properties (per-catalog).

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `CATALOG_WAREHOUSE` | Base path for table data (e.g. `s3://my-bucket/warehouse` or `/tmp/warehouse`) | `/tmp/warehouse` |
| `CATALOG_PORT` | Port to run the server on | `8000` |
| `S3_ENDPOINT_URL` | Global S3 endpoint URL (for MinIO or other S3-compatible stores) | `None` |
| `AWS_ACCESS_KEY_ID` | Global AWS access key | `None` |
| `AWS_SECRET_ACCESS_KEY` | Global AWS secret key | `None` |
| `AWS_REGION` | Global AWS region | `None` |
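
As an illustration of how these variables and their defaults fit together, here is a minimal sketch that reads them into a dictionary. The function name `load_settings` and the dictionary keys are hypothetical and not taken from the project's code.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the global settings listed above, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "warehouse": env.get("CATALOG_WAREHOUSE", "/tmp/warehouse"),
        "port": int(env.get("CATALOG_PORT", "8000")),
        # The S3 settings have no default, so they stay None when unset.
        "s3_endpoint_url": env.get("S3_ENDPOINT_URL"),
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY"),
        "aws_region": env.get("AWS_REGION"),
    }


# Passing an explicit mapping makes the function easy to try without
# touching the real process environment.
settings = load_settings({"CATALOG_PORT": "9000"})
```

Calling `load_settings()` with no argument reads from `os.environ` instead.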

Catalog Properties (Per-Catalog IO Config)

You can configure IO settings per catalog using the /v1/{catalog_name}/config/properties endpoint. These properties override environment variables.

| Property | Description | Example |
| --- | --- | --- |
| `s3.endpoint` | S3 endpoint URL | `http://minio:9000` |
| `s3.access-key-id` | AWS access key | `minioadmin` |
| `s3.secret-access-key` | AWS secret key | `minioadmin` |
| `s3.region` | AWS region | `us-east-1` |
| `warehouse` | Warehouse location (overrides the env var) | `s3://warehouse/my_cat` |
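
The override rule can be sketched as a simple dictionary merge: start from the environment values, then let any per-catalog property replace them. The mapping between variable names and property keys below is an assumption inferred from the two tables, and `effective_io_config` is a hypothetical helper, not the server's actual code.

```python
# Assumed correspondence between global variables and per-catalog property keys.
ENV_TO_PROPERTY = {
    "S3_ENDPOINT_URL": "s3.endpoint",
    "AWS_ACCESS_KEY_ID": "s3.access-key-id",
    "AWS_SECRET_ACCESS_KEY": "s3.secret-access-key",
    "AWS_REGION": "s3.region",
    "CATALOG_WAREHOUSE": "warehouse",
}


def effective_io_config(env: dict, catalog_properties: dict) -> dict:
    """Start from the environment values, then let catalog properties win."""
    config = {
        prop: env[var] for var, prop in ENV_TO_PROPERTY.items() if var in env
    }
    config.update(catalog_properties)
    return config


env = {"S3_ENDPOINT_URL": "https://s3.example.com", "CATALOG_WAREHOUSE": "/tmp/warehouse"}
props = {"s3.endpoint": "http://minio:9000", "warehouse": "s3://warehouse/my_cat"}
config = effective_io_config(env, props)
```

With these inputs, both `s3.endpoint` and `warehouse` come from the catalog properties; a catalog with no properties falls back to the environment values.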

Example: Setting properties via API

curl -X POST http://localhost:8000/v1/my_catalog/config/properties \
  -H "Content-Type: application/json" \
  -d '{
    "s3.endpoint": "http://minio:9000",
    "s3.access-key-id": "minioadmin",
    "s3.secret-access-key": "minioadmin"
  }'

Important: If you want the server to use a specific warehouse location for a catalog (e.g. for create_table), you must set the warehouse property via this API. Passing it to pyiceberg.load_catalog only configures the client, not the server.
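
For scripts that already run in Python, the same call can be prepared with the standard library. This is a sketch under the assumption that the endpoint accepts a flat JSON object, as in the curl example; `build_set_properties_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_set_properties_request(
    base_url: str, catalog_name: str, properties: dict
) -> urllib.request.Request:
    """Build (but do not send) the POST that stores per-catalog properties."""
    url = f"{base_url.rstrip('/')}/v1/{catalog_name}/config/properties"
    return urllib.request.Request(
        url,
        data=json.dumps(properties).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Set the server-side warehouse location for the catalog "my_catalog".
request = build_set_properties_request(
    "http://localhost:8000",
    "my_catalog",
    {"warehouse": "s3://warehouse/my_cat"},
)
# To send it against a running server: urllib.request.urlopen(request)
```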

Running Locally

Install Dependencies:
```
pip install -e .
```

Run the Server:

export CATALOG_WAREHOUSE=/tmp/warehouse
uvicorn main:app --reload --host 0.0.0.0 --port 8000

The server will start at http://127.0.0.1:8000.

Running via Docker

You can run the pre-built image alexmerced/iceberg-catalog:

docker run -p 8000:8000 -e CATALOG_WAREHOUSE=/tmp/warehouse alexmerced/iceberg-catalog

To mount a local directory for the warehouse (persistence):

docker run -p 8000:8000 \
  -v $(pwd)/warehouse:/tmp/warehouse \
  -e CATALOG_WAREHOUSE=/tmp/warehouse \
  alexmerced/iceberg-catalog

Running with MinIO (Docker Compose)

To run the catalog with MinIO (S3-compatible storage) using Docker Compose:

1. Use the docker-compose.yml file provided in the repo.
2. Start the services:
```
docker-compose up
```
This starts:
- Catalog: http://localhost:8000
- MinIO Console: http://localhost:9001 (User: minioadmin, Pass: minioadmin)
- MinIO API: http://localhost:9000
- mc: Automatically creates a warehouse bucket.

Usage with PyIceberg

You can connect to this catalog using pyiceberg. The URL prefix determines the catalog name.

Example: Connecting to a catalog named my_team_catalog

from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType

catalog = load_catalog(
    "my_team_catalog",
    **{
        "uri": "http://127.0.0.1:8000",
        "prefix": "my_team_catalog",  # IMPORTANT: Matches the URL prefix
        "warehouse": "s3://my-bucket/warehouse",
    }
)

# Create a namespace
catalog.create_namespace("analytics")

# Create a table with a single required string column
schema = Schema(NestedField(1, "data", StringType(), required=True))
catalog.create_table("analytics.logs", schema=schema)
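
To make the role of the prefix concrete, the sketch below builds request URLs following the `/v1/{prefix}/...` layout listed under API Endpoints. `catalog_url` is a hypothetical helper for illustration; PyIceberg builds these paths internally.

```python
from urllib.parse import quote


def catalog_url(uri: str, prefix: str, *segments: str) -> str:
    """Join the server URI, the catalog prefix, and any further path segments."""
    base = f"{uri.rstrip('/')}/v1/{quote(prefix, safe='')}"
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base}/{path}" if path else base


namespaces_url = catalog_url("http://127.0.0.1:8000", "my_team_catalog", "namespaces")
table_url = catalog_url(
    "http://127.0.0.1:8000",
    "my_team_catalog",
    "namespaces", "analytics", "tables", "logs",
)
```

Changing only the prefix argument points the same calls at a different, isolated catalog on the same server.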

Example: Using MinIO (S3)

If running via Docker Compose with MinIO:

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "minio_catalog",
    **{
        "uri": "http://localhost:8000",
        "prefix": "minio_catalog",
        "warehouse": "s3://warehouse/minio_catalog",
        "s3.endpoint": "http://localhost:9000",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    }
)

# Create namespace and table
catalog.create_namespace("test_ns")
# ...

Extensibility

This catalog is designed to be easily extended.

1. Storage Backend (Database)

By default, metadata is stored in SQLite. To use Postgres, MySQL, or others:

1. Open catalog/storage.py.
2. Create a new class inheriting from CatalogStorage.
3. Implement the abstract methods (create_namespace, get_table, etc.).
4. In catalog/api/routes.py, update the get_storage dependency to return your new storage class.

# catalog/storage.py
class PostgresStorage(CatalogStorage):
    def __init__(self, connection_string):
        # Setup SQLAlchemy engine with Postgres
        pass
    # ... implement methods ...
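
The subclassing pattern can be tried out without a database. The sketch below defines a stand-in `CatalogStorage` with only the two methods named above and an in-memory implementation of it. The method signatures are assumptions; the real base class in catalog/storage.py may take different arguments and declares more methods.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class CatalogStorage(ABC):
    """Stand-in for the project's base class (method signatures are assumed)."""

    @abstractmethod
    def create_namespace(self, catalog: str, namespace: str, properties: dict) -> None:
        ...

    @abstractmethod
    def get_table(self, catalog: str, namespace: str, table: str) -> Optional[dict]:
        ...


class InMemoryStorage(CatalogStorage):
    """Keeps everything in dictionaries; handy for tests, not for production."""

    def __init__(self) -> None:
        self.namespaces: Dict[Tuple[str, str], dict] = {}
        self.tables: Dict[Tuple[str, str, str], dict] = {}

    def create_namespace(self, catalog: str, namespace: str, properties: dict) -> None:
        key = (catalog, namespace)
        if key in self.namespaces:
            raise ValueError(f"Namespace already exists: {namespace}")
        self.namespaces[key] = dict(properties)

    def get_table(self, catalog: str, namespace: str, table: str) -> Optional[dict]:
        # Returns None when the table is unknown.
        return self.tables.get((catalog, namespace, table))


storage = InMemoryStorage()
storage.create_namespace("my_catalog", "analytics", {"owner": "data-team"})
```

Keying every record by catalog name keeps the catalogs isolated from each other, which mirrors the multi-catalog feature described earlier.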

2. Authentication

By default, no authentication is enforced. To add Auth (e.g., OAuth2, Basic Auth):

1. Open catalog/auth.py.
2. Create a new class inheriting from AuthenticationProvider.
3. Implement get_user(request: Request).
4. In catalog/api/routes.py, update the get_auth dependency to return your new provider.

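# catalog/auth.py
class BasicAuthProvider(AuthenticationProvider):
    def get_user(self, request: Request) -> str:
        # Read the Authorization header sent by the client
        header = request.headers.get("Authorization")
        if not header:
            raise PermissionError("Missing Authorization header")
        # Decode and verify the credentials here, then return the user's ID
        raise NotImplementedError("credential check not implemented")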
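
The header check itself can be written with the standard library alone. The following is a minimal sketch of decoding an HTTP Basic `Authorization` header; `parse_basic_auth` is a hypothetical helper, and verifying the password against a user store is deliberately left out.

```python
import base64
from typing import Optional, Tuple


def parse_basic_auth(header_value: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (username, password) from a Basic auth header, or None if invalid."""
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 as well as bytes that are not valid UTF-8.
        return None
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password


header = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
credentials = parse_basic_auth(header)
```

A `get_user` implementation could call this helper on the request's `Authorization` header and reject the request when it returns `None`.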

3. Custom Metadata

The catalog exposes endpoints to store arbitrary JSON metadata associated with a catalog.

GET /v1/{catalog_name}/ext/metadata/{key}
POST /v1/{catalog_name}/ext/metadata/{key}
DELETE /v1/{catalog_name}/ext/metadata/{key}

This is useful for storing UI configurations, tags, or other auxiliary data not covered by the Iceberg spec.
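
As a rough sketch, the three calls can be prepared from Python as follows. The request body shape (any JSON value) is an assumption based on the "arbitrary JSON" description, and `metadata_request` and the `ui_config` key are hypothetical names.

```python
import json
import urllib.request
from typing import Any


def metadata_request(
    base_url: str, catalog_name: str, key: str, method: str = "GET", value: Any = None
) -> urllib.request.Request:
    """Build (but do not send) a request against the metadata extension endpoint."""
    url = f"{base_url.rstrip('/')}/v1/{catalog_name}/ext/metadata/{key}"
    data = None
    headers = {}
    if method == "POST":
        # Only POST carries a body: the JSON value to store under the key.
        data = json.dumps(value).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


base = "http://localhost:8000"
store = metadata_request(base, "my_catalog", "ui_config", "POST", {"theme": "dark"})
fetch = metadata_request(base, "my_catalog", "ui_config")
remove = metadata_request(base, "my_catalog", "ui_config", "DELETE")
# Each request can be sent to a running server with urllib.request.urlopen(...)
```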

API Endpoints

Standard Iceberg

GET /v1/{prefix}/config
GET /v1/{prefix}/namespaces
POST /v1/{prefix}/namespaces
GET /v1/{prefix}/namespaces/{namespace}
DELETE /v1/{prefix}/namespaces/{namespace}
POST /v1/{prefix}/namespaces/{namespace}/properties
GET /v1/{prefix}/namespaces/{namespace}/tables
POST /v1/{prefix}/namespaces/{namespace}/tables
GET /v1/{prefix}/namespaces/{namespace}/tables/{table}
DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}
POST /v1/{prefix}/tables/rename

Extensions

GET /v1/{prefix}/ext/metadata/{key}
POST /v1/{prefix}/ext/metadata/{key}
DELETE /v1/{prefix}/ext/metadata/{key}
