hyperfleet-sentinel

HyperFleet Sentinel Service - Kubernetes service that polls HyperFleet API, makes orchestration decisions, and publishes events. Features configurable max age intervals, horizontal sharding via SentinelConfig CRD, and broker abstraction (GCP Pub/Sub, RabbitMQ, Stub). Centralized reconciliation logic.

Development Setup

Prerequisites

  • Go 1.25 or later
  • Docker or Podman
  • Make

Getting Started

  1. Clone the repository:

    git clone https://github.com/openshift-hyperfleet/hyperfleet-sentinel.git
    cd hyperfleet-sentinel
  2. Generate the OpenAPI client:

    make generate

    This will:

    • Download the official OpenAPI spec from hyperfleet-api (main branch)
    • Generate Go client code in pkg/api/openapi/

    Neither the downloaded spec nor the generated client code is committed to git; both must be regenerated locally.

  3. Download dependencies:

    make download
  4. Build the binary:

    make build
  5. Run tests:

    make test

Common Make Targets

  • make help - Show all available make targets
  • make generate - Generate OpenAPI client from spec (Docker/Podman-based)
  • make build - Build the sentinel binary
  • make test - Run unit tests with coverage
  • make test-integration - Run integration tests (requires Docker/Podman)
  • make test-all - Run all tests (unit + integration)
  • make fmt - Format Go code
  • make lint - Run golangci-lint (requires golangci-lint installed)
  • make clean - Remove build artifacts and generated code

Testing

The project uses a hybrid testing approach:

  • Unit tests: Fast, isolated tests using mocks
  • Integration tests: End-to-end tests with real message brokers via testcontainers

# Run only unit tests (fast)
make test

# Run integration tests (requires Docker or Podman)
make test-integration

# Run all tests
make test-all

Integration tests automatically work with both Docker and Podman. For troubleshooting and advanced configuration, see docs/testcontainers.md.

For instructions on running Sentinel locally or on GKE, see docs/running-sentinel.md.

OpenAPI Client Generation

This project follows the rh-trex pattern for OpenAPI client generation. The OpenAPI specification is automatically downloaded from the official hyperfleet-api repository (main branch by default) during make generate.

The client is generated using Docker/Podman to ensure consistency across development environments.

To use a different branch or tag:

make generate OPENAPI_SPEC_REF=v1.0.0    # Use a specific tag
make generate OPENAPI_SPEC_REF=develop   # Use a branch

For detailed information about OpenAPI client generation, see openapi/README.md.

Configuration

The Sentinel service uses YAML-based configuration with environment variable overrides for sensitive data (broker credentials).

Configuration File

Create a configuration file based on the examples in the configs/ directory:

  • configs/gcp-pubsub-example.yaml - GCP Pub/Sub configuration
  • configs/rabbitmq-example.yaml - RabbitMQ configuration
  • configs/dev-example.yaml - Development configuration

Configuration Schema

Required Fields

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| resource_type | string | Resource to watch (clusters, nodepools) | clusters |
| hyperfleet_api.endpoint | string | HyperFleet API base URL (k8s service) | http://hyperfleet-api.hyperfleet-system.svc.cluster.local:8080 |

Optional Fields with Defaults

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| poll_interval | duration | 5s | How often to poll the API for resource updates |
| max_age_not_ready | duration | 10s | Max age interval for resources not ready |
| max_age_ready | duration | 30m | Max age interval for ready resources |
| hyperfleet_api.timeout | duration | 5s | Request timeout for API calls |
| resource_selector | array | [] | Label selectors for filtering resources (enables sharding) |
| message_data | map | {} | Template fields for CloudEvents data payload |

Resource Selector (Sharding)

The resource_selector field enables horizontal scaling by having multiple Sentinel instances watch different resource subsets:

resource_selector:
  - label: shard
    value: "1"
  - label: region
    value: us-east-1

If resource_selector is empty or omitted, Sentinel watches all resources. Multiple selectors use AND logic: a resource is watched only if all of its labels match.

For detailed instructions on deploying multiple Sentinel instances with different resource selectors, see docs/multi-instance-deployment.md.
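The AND semantics described above can be sketched in a few lines of Go. This is an illustration only: the Selector type and the matchesAll function are hypothetical names, not taken from the Sentinel code base, and the real service may apply the filter in the API query rather than in memory.

```go
package main

import "fmt"

// Selector mirrors one resource_selector entry (hypothetical type).
type Selector struct {
	Label string
	Value string
}

// matchesAll reports whether labels satisfy every selector (AND logic).
// An empty or nil selector list matches every resource.
func matchesAll(selectors []Selector, labels map[string]string) bool {
	for _, s := range selectors {
		if v, ok := labels[s.Label]; !ok || v != s.Value {
			return false
		}
	}
	return true
}

func main() {
	selectors := []Selector{
		{Label: "shard", Value: "1"},
		{Label: "region", Value: "us-east-1"},
	}
	both := map[string]string{"shard": "1", "region": "us-east-1"}
	onlyShard := map[string]string{"shard": "1"}

	fmt.Println(matchesAll(selectors, both))      // true: both labels match
	fmt.Println(matchesAll(selectors, onlyShard)) // false: region is missing
	fmt.Println(matchesAll(nil, onlyShard))       // true: no selectors, watch all
}
```

Extra labels on a resource do not affect the result; only the labels named in the selectors are checked.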

Message Data Templates

Define custom fields to include in CloudEvents using Go template syntax. Both .field and {{.field}} formats are supported:

message_data:
  resource_id: .id
  resource_type: .kind
  href: .href
  generation: .generation

Templates can reference any field from the Resource object returned by the API. The example above follows the ObjectReference pattern (id, kind, href) with generation for reconciliation tracking.
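To show how the two accepted spellings could be handled uniformly, here is a minimal sketch using Go's standard text/template package. The normalize and render helpers, and the use of a map as the resource, are assumptions made for illustration; Sentinel's actual rendering code may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// normalize wraps a bare ".field" expression in template delimiters so that
// both spellings go through the same parser (hypothetical helper).
func normalize(expr string) string {
	if strings.Contains(expr, "{{") {
		return expr
	}
	return "{{" + expr + "}}"
}

// render evaluates one message_data expression against a resource.
// A missing key is reported as an error instead of rendering "<no value>".
func render(expr string, resource map[string]any) (string, error) {
	tmpl, err := template.New("field").Option("missingkey=error").Parse(normalize(expr))
	if err != nil {
		return "", fmt.Errorf("invalid template %q: %w", expr, err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, resource); err != nil {
		return "", fmt.Errorf("rendering %q: %w", expr, err)
	}
	return buf.String(), nil
}

func main() {
	// Stand-in for a Resource object returned by the API.
	resource := map[string]any{"id": "abc123", "kind": "Cluster", "generation": 4}

	for _, expr := range []string{".id", "{{.kind}}", ".generation"} {
		out, err := render(expr, resource)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", expr, out)
	}
}
```

Parsing each expression once at startup, as the validation rules later in this document suggest, would surface a malformed template before any event is published.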

Broker Configuration

Broker configuration is managed by the hyperfleet-broker library. You can configure the broker in any of the following ways:

  1. broker.yaml file (see broker.yaml in project root for example)
  2. BROKER_CONFIG_FILE environment variable (path to your broker config file)
  3. Direct environment variables (listed below)

Environment Variables (Override broker.yaml)

RabbitMQ:

| Variable | Description | Example |
| --- | --- | --- |
| BROKER_RABBITMQ_URL | Complete connection URL | amqp://user:pass@localhost:5672/vhost |

Google Pub/Sub:

| Variable | Description | Example |
| --- | --- | --- |
| BROKER_GOOGLEPUBSUB_PROJECT_ID | GCP project ID | my-gcp-project |
| GOOGLE_APPLICATION_CREDENTIALS | Service account key path (optional, uses ADC if not set) | /path/to/key.json |

Topic Configuration:

| Variable | Description | Example |
| --- | --- | --- |
| BROKER_TOPIC | Topic name for publishing events | hyperfleet-dev-clusters |

The BROKER_TOPIC environment variable sets the full topic name where events will be published. When using Helm, the default topic is {namespace}-{resourceType} (e.g., hyperfleet-dev-clusters, hyperfleet-dev-nodepools). This enables isolation between different environments or tenants sharing the same broker. See Naming Strategy for details.
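The naming convention can be expressed as a small Go sketch. The defaultTopic and resolveTopic functions are hypothetical and only illustrate the {namespace}-{resourceType} pattern and the idea of an explicit BROKER_TOPIC value taking precedence; in practice the default is set by the Helm chart, not necessarily by code like this.

```go
package main

import (
	"fmt"
	"os"
)

// defaultTopic builds the Helm-style default topic name
// {namespace}-{resourceType} (hypothetical helper).
func defaultTopic(namespace, resourceType string) string {
	return namespace + "-" + resourceType
}

// resolveTopic prefers an explicit BROKER_TOPIC value and falls back to the
// default naming convention (assumed precedence, for illustration only).
func resolveTopic(namespace, resourceType string) string {
	if topic := os.Getenv("BROKER_TOPIC"); topic != "" {
		return topic
	}
	return defaultTopic(namespace, resourceType)
}

func main() {
	fmt.Println(defaultTopic("hyperfleet-dev", "clusters"))  // hyperfleet-dev-clusters
	fmt.Println(defaultTopic("hyperfleet-dev", "nodepools")) // hyperfleet-dev-nodepools
	fmt.Println(resolveTopic("hyperfleet-dev", "clusters"))  // depends on BROKER_TOPIC
}
```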

For detailed broker configuration options, see the hyperfleet-broker documentation.

Running Sentinel

For detailed instructions on running Sentinel locally or deploying to GKE, see docs/running-sentinel.md.

For Helm chart documentation and configuration options, see deployments/helm/sentinel/README.md.

Configuration Validation

The service validates configuration at startup and will fail fast on errors:

  • Required fields present: resource_type, hyperfleet_api.endpoint
  • Valid enums: resource_type must be clusters/nodepools
  • Valid durations: All interval fields must be positive
  • Valid templates: All message_data templates must be valid Go templates
  • Broker configuration: Managed by hyperfleet-broker library (see broker.yaml)
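The first three checks can be sketched in Go as follows. The Config struct, its field names, and the withDefaults and validate functions are assumptions made for this example and are not Sentinel's actual types; the defaults come from the Optional Fields table above. Template and broker validation are left out.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Config is a hypothetical in-memory form of the YAML configuration.
type Config struct {
	ResourceType   string
	APIEndpoint    string
	PollInterval   time.Duration
	MaxAgeNotReady time.Duration
	MaxAgeReady    time.Duration
	APITimeout     time.Duration
}

// withDefaults fills unset (zero) durations with the documented defaults.
// Note that this sketch cannot tell an explicit 0 from an omitted field.
func withDefaults(c Config) Config {
	if c.PollInterval == 0 {
		c.PollInterval = 5 * time.Second
	}
	if c.MaxAgeNotReady == 0 {
		c.MaxAgeNotReady = 10 * time.Second
	}
	if c.MaxAgeReady == 0 {
		c.MaxAgeReady = 30 * time.Minute
	}
	if c.APITimeout == 0 {
		c.APITimeout = 5 * time.Second
	}
	return c
}

// validate collects every problem so startup can fail fast with a full report.
func validate(c Config) error {
	var errs []error
	switch c.ResourceType {
	case "clusters", "nodepools":
	case "":
		errs = append(errs, errors.New("resource_type is required"))
	default:
		errs = append(errs, fmt.Errorf("resource_type %q must be clusters or nodepools", c.ResourceType))
	}
	if c.APIEndpoint == "" {
		errs = append(errs, errors.New("hyperfleet_api.endpoint is required"))
	}
	durations := []struct {
		name  string
		value time.Duration
	}{
		{"poll_interval", c.PollInterval},
		{"max_age_not_ready", c.MaxAgeNotReady},
		{"max_age_ready", c.MaxAgeReady},
		{"hyperfleet_api.timeout", c.APITimeout},
	}
	for _, d := range durations {
		if d.value <= 0 {
			errs = append(errs, fmt.Errorf("%s must be positive, got %s", d.name, d.value))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	good := withDefaults(Config{ResourceType: "clusters", APIEndpoint: "http://localhost:8000"})
	fmt.Println("minimal config error:", validate(good))

	bad := withDefaults(Config{ResourceType: "pods", PollInterval: -time.Second})
	fmt.Println("invalid config errors:")
	fmt.Println(validate(bad))
}
```

Collecting all errors with errors.Join, rather than returning on the first one, lets an operator fix a broken configuration in a single pass.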

Configuration Examples

Minimal Configuration

resource_type: clusters
hyperfleet_api:
  endpoint: http://localhost:8000

This configuration uses the default value for every optional field. Broker configuration is managed via broker.yaml or environment variables (see the Broker Configuration section).

Production Configuration with Sharding

resource_type: clusters
poll_interval: 5s
max_age_not_ready: 10s
max_age_ready: 30m

resource_selector:
  - label: shard
    value: "1"

hyperfleet_api:
  endpoint: http://hyperfleet-api.hyperfleet-system.svc.cluster.local:8080
  timeout: 30s

message_data:
  resource_id: .id
  resource_type: .kind
  href: .href
  generation: .generation

Observability

The Sentinel service exposes Prometheus metrics on port 8080 at /metrics for monitoring and alerting.

Metrics

Sentinel exposes six core metrics:

| Metric | Type | Description |
| --- | --- | --- |
| hyperfleet_sentinel_pending_resources | Gauge | Number of resources pending reconciliation |
| hyperfleet_sentinel_events_published_total | Counter | Total events published to message broker |
| hyperfleet_sentinel_resources_skipped_total | Counter | Resources skipped (preconditions not met) |
| hyperfleet_sentinel_poll_duration_seconds | Histogram | Duration of each polling cycle |
| hyperfleet_sentinel_api_errors_total | Counter | Errors when calling HyperFleet API |
| hyperfleet_sentinel_broker_errors_total | Counter | Errors when publishing to message broker |

All metrics include resource_type and resource_selector labels for filtering.

For detailed metric descriptions, example queries, and alerting rules, see docs/metrics.md.

Grafana Dashboard

A pre-built Grafana dashboard is available at deployments/dashboards/sentinel-metrics.json with 8 visualization panels covering all metrics.

To import:

  1. Navigate to Grafana → Dashboards → Import
  2. Upload deployments/dashboards/sentinel-metrics.json
  3. Select your Prometheus datasource

GKE Integration with Google Cloud Managed Prometheus

Sentinel integrates with Google Cloud Managed Prometheus (GMP) for automated metrics collection:

# Deploy with PodMonitoring enabled (default)
helm install sentinel ./deployments/helm/sentinel \
  --namespace hyperfleet-system \
  --create-namespace

# Verify metrics in Google Cloud Console
# Navigate to: Monitoring → Metrics Explorer
# Query: hyperfleet_sentinel_pending_resources

GMP automatically discovers the PodMonitoring resource and begins scraping metrics. No additional configuration is required.

Alerting

Configure alerts in Google Cloud Console → Monitoring → Alerting using the PromQL expressions provided in docs/metrics.md.

Recommended alerts:

  • SentinelHighPendingResources - High number of pending resources
  • SentinelAPIErrorRateHigh - High API error rate
  • SentinelBrokerErrorRateHigh - High broker error rate
  • SentinelSlowPolling - Slow polling cycles
  • SentinelNoEventsPublished - No events published despite pending resources
  • SentinelHighSkipRatio - High ratio of skipped resources
  • SentinelDown - Sentinel service is down

See docs/metrics.md for complete alerting rules documentation.

Accessing Metrics

Access metrics through Google Cloud Console:

  1. Navigate to Monitoring → Metrics Explorer
  2. Select resource type: Prometheus Target
  3. Query: hyperfleet_sentinel_pending_resources

Metrics are automatically collected by Google Cloud Managed Prometheus via the PodMonitoring resource.

Repository Access

All members of the hyperfleet team have write access to this repository.

Steps to Apply for Repository Access

If you're a team member and need access to this repository:

  1. Verify Organization Membership: Ensure you're a member of the openshift-hyperfleet organization
  2. Check Team Assignment: Confirm you're added to the hyperfleet team within the organization
  3. Repository Permissions: All hyperfleet team members automatically receive write access
  4. OWNERS File: Code reviews and approvals are managed through the OWNERS file

For access issues, contact a repository administrator or organization owner.
