AlexMercedCoder/iceberg-catalog

Python Iceberg REST Catalog

A Python-based implementation of the Apache Iceberg REST Catalog API.

Features

  • Standard Iceberg REST API: Supports namespaces, tables, and config endpoints.
  • Multi-Catalog Support: Run multiple isolated catalogs on the same server using URL prefixes.
  • Pluggable Storage: Comes with SQLite storage for metadata, extensible to other DBs.
  • Pluggable I/O: Supports S3 and local filesystem via PyIceberg's FileIO.
  • Custom Metadata: Extension endpoints to store arbitrary JSON metadata.
  • Extensible Auth: Interface for custom authentication logic.

Getting Started

Prerequisites

  • Python 3.9+
  • Docker (optional)

Configuration

The catalog can be configured via Environment Variables (global) or Catalog Properties (per-catalog).

Environment Variables

| Variable | Description | Default |
|---|---|---|
| CATALOG_WAREHOUSE | Base path for table data (e.g. s3://my-bucket/warehouse or /tmp/warehouse) | /tmp/warehouse |
| CATALOG_PORT | Port to run the server on | 8000 |
| S3_ENDPOINT_URL | Global S3 Endpoint URL (for MinIO/S3-compatible) | None |
| AWS_ACCESS_KEY_ID | Global AWS Access Key | None |
| AWS_SECRET_ACCESS_KEY | Global AWS Secret Key | None |
| AWS_REGION | Global AWS Region | None |
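As a rough illustration of how these variables and defaults fit together, here is a minimal sketch that reads them into a settings object. The `CatalogSettings` class and `load_settings` function are hypothetical names used only for this example; the server's actual configuration code may look different.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CatalogSettings:
    """Hypothetical container for the global settings documented above."""
    warehouse: str
    port: int
    s3_endpoint_url: Optional[str]
    aws_access_key_id: Optional[str]
    aws_secret_access_key: Optional[str]
    aws_region: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> CatalogSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return CatalogSettings(
        warehouse=env.get("CATALOG_WAREHOUSE", "/tmp/warehouse"),
        port=int(env.get("CATALOG_PORT", "8000")),
        s3_endpoint_url=env.get("S3_ENDPOINT_URL"),
        aws_access_key_id=env.get("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=env.get("AWS_SECRET_ACCESS_KEY"),
        aws_region=env.get("AWS_REGION"),
    )


# An empty environment yields the defaults from the table.
defaults = load_settings({})
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real process environment.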

Catalog Properties (Per-Catalog IO Config)

You can configure IO settings per catalog using the /v1/{catalog_name}/config/properties endpoint. These properties override environment variables.

| Property | Description | Example |
|---|---|---|
| s3.endpoint | S3 Endpoint URL | http://minio:9000 |
| s3.access-key-id | AWS Access Key | minioadmin |
| s3.secret-access-key | AWS Secret Key | minioadmin |
| s3.region | AWS Region | us-east-1 |
| warehouse | Warehouse Location (overrides env var) | s3://warehouse/my_cat |
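The override rule can be pictured as a simple lookup: use the per-catalog property when it is set, otherwise fall back to the matching environment variable. The sketch below assumes the pairing of property names to environment variables suggested by the two tables; `PROPERTY_TO_ENV` and `resolve_io_config` are hypothetical names, and the server may resolve settings differently.

```python
from typing import Dict, Mapping

# Assumed pairing of per-catalog properties with global environment variables.
# The two tables suggest this mapping, but it is not confirmed by the source.
PROPERTY_TO_ENV = {
    "s3.endpoint": "S3_ENDPOINT_URL",
    "s3.access-key-id": "AWS_ACCESS_KEY_ID",
    "s3.secret-access-key": "AWS_SECRET_ACCESS_KEY",
    "s3.region": "AWS_REGION",
    "warehouse": "CATALOG_WAREHOUSE",
}


def resolve_io_config(
    env: Mapping[str, str], catalog_props: Mapping[str, str]
) -> Dict[str, str]:
    """Return effective settings: catalog properties win over environment variables."""
    resolved: Dict[str, str] = {}
    for prop, env_name in PROPERTY_TO_ENV.items():
        if prop in catalog_props:
            resolved[prop] = catalog_props[prop]
        elif env_name in env:
            resolved[prop] = env[env_name]
    return resolved


effective = resolve_io_config(
    {"S3_ENDPOINT_URL": "http://global:9000", "AWS_REGION": "us-east-1"},
    {"s3.endpoint": "http://minio:9000"},
)
```

Here the catalog-level `s3.endpoint` replaces the global endpoint, while the region still comes from the environment.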

Example: Setting properties via API

curl -X POST http://localhost:8000/v1/my_catalog/config/properties \
  -H "Content-Type: application/json" \
  -d '{
    "s3.endpoint": "http://minio:9000",
    "s3.access-key-id": "minioadmin",
    "s3.secret-access-key": "minioadmin"
  }'

Important: If you want the server to use a specific warehouse location for a catalog (e.g. for create_table), you must set the warehouse property via this API. Passing it to pyiceberg.load_catalog only configures the client, not the server.
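For readers scripting this step in Python instead of curl, the following sketch builds the same POST request with the standard library and includes the warehouse property mentioned in the note. The helper name `build_properties_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Mapping


def build_properties_request(
    base_url: str, catalog_name: str, properties: Mapping[str, str]
) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/{catalog_name}/config/properties."""
    url = f"{base_url.rstrip('/')}/v1/{catalog_name}/config/properties"
    body = json.dumps(dict(properties)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_properties_request(
    "http://localhost:8000",
    "my_catalog",
    {
        "s3.endpoint": "http://minio:9000",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
        "warehouse": "s3://warehouse/my_cat",
    },
)
# With the server running, send it with: urllib.request.urlopen(request)
```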

Running Locally

  1. Install Dependencies:

    pip install -e .
  2. Run the Server:

    export CATALOG_WAREHOUSE=/tmp/warehouse
    uvicorn main:app --reload --host 0.0.0.0 --port 8000

    The server listens on all interfaces (0.0.0.0) and is reachable locally at http://127.0.0.1:8000.

Running via Docker

You can run the pre-built image alexmerced/iceberg-catalog:

docker run -p 8000:8000 -e CATALOG_WAREHOUSE=/tmp/warehouse alexmerced/iceberg-catalog

To mount a local directory for the warehouse (persistence):

docker run -p 8000:8000 \
  -v "$(pwd)/warehouse:/tmp/warehouse" \
  -e CATALOG_WAREHOUSE=/tmp/warehouse \
  alexmerced/iceberg-catalog

Running with MinIO (Docker Compose)

To run the catalog with MinIO (S3-compatible storage) using Docker Compose:

  1. Create docker-compose.yml, or use the one provided in the repo.
  2. Run:
    docker-compose up
    This starts the catalog server and a MinIO instance.

Usage with PyIceberg

You can connect to this catalog using pyiceberg. The URL prefix determines the catalog name.

Example: Connecting to a catalog named my_team_catalog

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "my_team_catalog",
    **{
        "uri": "http://127.0.0.1:8000",
        "prefix": "my_team_catalog", # IMPORTANT: Matches the URL prefix
        "warehouse": "s3://my-bucket/warehouse",
    }
)

# Create a namespace
catalog.create_namespace("analytics")

# Create a table
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType

schema = Schema(NestedField(1, "data", StringType(), required=True))
catalog.create_table("analytics.logs", schema=schema)
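Because the prefix selects the catalog, several isolated catalogs can share one server simply by varying that value. The sketch below builds the client properties for two catalogs on the same URI; `client_properties` and the catalog names are hypothetical, and the `load_catalog` calls are left as comments since they need pyiceberg and a running server.

```python
from typing import Dict, Optional


def client_properties(
    catalog_name: str, uri: str, warehouse: Optional[str] = None
) -> Dict[str, str]:
    """Build load_catalog properties; the prefix always equals the catalog name."""
    props = {"uri": uri, "prefix": catalog_name}
    if warehouse is not None:
        # Client-side only; the server's warehouse is set via the properties API.
        props["warehouse"] = warehouse
    return props


sales = client_properties("sales_catalog", "http://127.0.0.1:8000")
marketing = client_properties("marketing_catalog", "http://127.0.0.1:8000")

# With pyiceberg installed and the server running:
#   from pyiceberg.catalog import load_catalog
#   sales_catalog = load_catalog("sales_catalog", **sales)
#   marketing_catalog = load_catalog("marketing_catalog", **marketing)
```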

Example: Using MinIO (S3)

If running via Docker Compose with MinIO:

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "minio_catalog",
    **{
        "uri": "http://localhost:8000",
        "prefix": "minio_catalog",
        "warehouse": "s3://warehouse/minio_catalog",
        "s3.endpoint": "http://localhost:9000",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    }
)

# Create namespace and table
catalog.create_namespace("test_ns")
# ...

Extensibility

This catalog is designed to be easily extended.

1. Storage Backend (Database)

By default, metadata is stored in SQLite. To use Postgres, MySQL, or others:

  1. Open catalog/storage.py.
  2. Create a new class inheriting from CatalogStorage.
  3. Implement the abstract methods (create_namespace, get_table, etc.).
  4. In catalog/api/routes.py, update the get_storage dependency to return your new storage class.

# catalog/storage.py
class PostgresStorage(CatalogStorage):
    def __init__(self, connection_string):
        # Setup SQLAlchemy engine with Postgres
        pass
    # ... implement methods ...
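To make the subclassing pattern concrete without a database, here is a toy in-memory backend. The `CatalogStorage` class below is a stand-in defined only so the example runs on its own; the real base class in catalog/storage.py likely has more abstract methods and different signatures, so treat the parameters as assumptions.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class CatalogStorage(ABC):
    """Stand-in for the base class in catalog/storage.py; real signatures may differ."""

    @abstractmethod
    def create_namespace(
        self, catalog: str, namespace: str, properties: Dict[str, str]
    ) -> None: ...

    @abstractmethod
    def get_table(self, catalog: str, namespace: str, table: str) -> Optional[dict]: ...


class InMemoryStorage(CatalogStorage):
    """Toy backend that keeps everything in dictionaries, keyed by catalog name."""

    def __init__(self) -> None:
        self._namespaces: Dict[Tuple[str, str], Dict[str, str]] = {}
        self._tables: Dict[Tuple[str, str, str], dict] = {}

    def create_namespace(self, catalog, namespace, properties):
        key = (catalog, namespace)
        if key in self._namespaces:
            raise ValueError(f"Namespace already exists: {namespace}")
        self._namespaces[key] = dict(properties)

    def get_table(self, catalog, namespace, table):
        # Returns None when the table is unknown.
        return self._tables.get((catalog, namespace, table))


storage = InMemoryStorage()
storage.create_namespace("my_catalog", "analytics", {"owner": "data-team"})
```

Keying every lookup by catalog name is one way to keep catalogs isolated within a single backend.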

2. Authentication

By default, no authentication is enforced. To add Auth (e.g., OAuth2, Basic Auth):

  1. Open catalog/auth.py.
  2. Create a new class inheriting from AuthenticationProvider.
  3. Implement get_user(request: Request).
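  4. In catalog/api/routes.py, update the get_auth dependency to return your new provider.

# catalog/auth.py
class BasicAuthProvider(AuthenticationProvider):
    def get_user(self, request: Request) -> str:
        # Check the Authorization header and resolve the caller from it
        user_id = ...  # placeholder: set from the validated credentials
        return user_id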
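A fuller sketch of what a Basic Auth provider might do is shown below. It assumes the request object exposes a `headers` mapping (as FastAPI/Starlette requests do) and uses a stand-in `AuthenticationProvider` base class so the example is self-contained; the `StaticBasicAuthProvider` name, the in-memory user table, and the choice of `PermissionError` are all illustrative assumptions.

```python
import base64
import binascii
import hmac
from abc import ABC, abstractmethod
from types import SimpleNamespace
from typing import Any, Dict


class AuthenticationProvider(ABC):
    """Stand-in for the base class in catalog/auth.py; the real interface may differ."""

    @abstractmethod
    def get_user(self, request: Any) -> str: ...


class StaticBasicAuthProvider(AuthenticationProvider):
    """Checks HTTP Basic credentials against an in-memory user table."""

    def __init__(self, users: Dict[str, str]) -> None:
        self._users = users  # username -> password (demo only; store hashes in practice)

    def get_user(self, request: Any) -> str:
        header = request.headers.get("Authorization", "")
        scheme, _, encoded = header.partition(" ")
        if scheme.lower() != "basic" or not encoded:
            raise PermissionError("Missing Basic credentials")
        try:
            decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            raise PermissionError("Malformed Basic credentials") from None
        username, sep, password = decoded.partition(":")
        expected = self._users.get(username)
        # Constant-time comparison avoids leaking password contents through timing.
        if not sep or expected is None or not hmac.compare_digest(
            expected.encode("utf-8"), password.encode("utf-8")
        ):
            raise PermissionError("Invalid username or password")
        return username


# A fake request object is enough to exercise the provider.
token = base64.b64encode(b"alice:s3cret").decode("ascii")
fake_request = SimpleNamespace(headers={"Authorization": f"Basic {token}"})
provider = StaticBasicAuthProvider({"alice": "s3cret"})
user = provider.get_user(fake_request)
```

In a real deployment the exception would typically be translated into an HTTP 401 response by the web framework.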

3. Custom Metadata

The catalog exposes endpoints to store arbitrary JSON metadata associated with a catalog.

  • GET /v1/{catalog_name}/ext/metadata/{key}
  • POST /v1/{catalog_name}/ext/metadata/{key}
  • DELETE /v1/{catalog_name}/ext/metadata/{key}

This is useful for storing UI configurations, tags, or other auxiliary data not covered by the Iceberg spec.
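As an illustration of how a client might talk to these endpoints, the sketch below constructs requests with the standard library. The `metadata_request` helper is hypothetical, and it assumes the server accepts the JSON document as the raw POST body, which the source does not spell out. Requests are built but not sent.

```python
import json
import urllib.request
from typing import Any, Optional
from urllib.parse import quote


def metadata_request(
    base_url: str,
    catalog_name: str,
    key: str,
    method: str = "GET",
    value: Optional[Any] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request to /v1/{catalog_name}/ext/metadata/{key}."""
    if method not in {"GET", "POST", "DELETE"}:
        raise ValueError(f"Unsupported method: {method}")
    url = (
        f"{base_url.rstrip('/')}/v1/{quote(catalog_name, safe='')}"
        f"/ext/metadata/{quote(key, safe='')}"
    )
    data = None
    headers = {}
    if method == "POST":
        # Assumption: the JSON document is sent as the raw request body.
        data = json.dumps(value).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


save = metadata_request(
    "http://localhost:8000", "my_catalog", "ui-config", "POST",
    {"theme": "dark", "tags": ["finance"]},
)
fetch = metadata_request("http://localhost:8000", "my_catalog", "ui-config")
# With the server running, send either one with: urllib.request.urlopen(save)
```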

API Endpoints

Standard Iceberg

  • GET /v1/{prefix}/config
  • GET /v1/{prefix}/namespaces
  • POST /v1/{prefix}/namespaces
  • GET /v1/{prefix}/namespaces/{namespace}
  • DELETE /v1/{prefix}/namespaces/{namespace}
  • POST /v1/{prefix}/namespaces/{namespace}/properties
  • GET /v1/{prefix}/namespaces/{namespace}/tables
  • POST /v1/{prefix}/namespaces/{namespace}/tables
  • GET /v1/{prefix}/namespaces/{namespace}/tables/{table}
  • DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}
  • POST /v1/{prefix}/tables/rename
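When writing a small client or test harness, it can help to keep these routes in one table and fill in the path parameters on demand. The sketch below does that; the operation labels and the `endpoint` helper are hypothetical, while the methods and path templates are copied from the list above.

```python
from typing import Dict, Tuple
from urllib.parse import quote

# Operation labels are hypothetical; methods and paths come from the list above.
ROUTES: Dict[str, Tuple[str, str]] = {
    "get_config": ("GET", "/v1/{prefix}/config"),
    "list_namespaces": ("GET", "/v1/{prefix}/namespaces"),
    "create_namespace": ("POST", "/v1/{prefix}/namespaces"),
    "load_namespace": ("GET", "/v1/{prefix}/namespaces/{namespace}"),
    "drop_namespace": ("DELETE", "/v1/{prefix}/namespaces/{namespace}"),
    "update_namespace_properties": (
        "POST", "/v1/{prefix}/namespaces/{namespace}/properties"
    ),
    "list_tables": ("GET", "/v1/{prefix}/namespaces/{namespace}/tables"),
    "create_table": ("POST", "/v1/{prefix}/namespaces/{namespace}/tables"),
    "load_table": ("GET", "/v1/{prefix}/namespaces/{namespace}/tables/{table}"),
    "drop_table": ("DELETE", "/v1/{prefix}/namespaces/{namespace}/tables/{table}"),
    "rename_table": ("POST", "/v1/{prefix}/tables/rename"),
}


def endpoint(operation: str, **params: str) -> Tuple[str, str]:
    """Return (HTTP method, path) for an operation, URL-encoding each parameter."""
    method, template = ROUTES[operation]
    encoded = {name: quote(value, safe="") for name, value in params.items()}
    return method, template.format(**encoded)


load_logs = endpoint(
    "load_table", prefix="my_catalog", namespace="analytics", table="logs"
)
```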

Extensions

  • GET /v1/{prefix}/ext/metadata/{key}
  • POST /v1/{prefix}/ext/metadata/{key}
  • DELETE /v1/{prefix}/ext/metadata/{key}

About

A basic Python-based Iceberg catalog for testing and practice purposes. Feel free to fork and extend it.
