HyperFleet Sentinel Service - Kubernetes service that polls HyperFleet API, makes orchestration decisions, and publishes events. Features configurable max age intervals, horizontal sharding via SentinelConfig CRD, and broker abstraction (GCP Pub/Sub, RabbitMQ, Stub). Centralized reconciliation logic.
- Go 1.25 or later
- Docker or Podman
- Make
1. Clone the repository:
     git clone https://github.com/openshift-hyperfleet/hyperfleet-sentinel.git
     cd hyperfleet-sentinel
2. Generate the OpenAPI client:
     make generate
   This will:
   - Download the official OpenAPI spec from hyperfleet-api (main branch)
   - Generate Go client code in pkg/api/openapi/
   Neither the downloaded spec nor the generated client code is committed to git; both must be regenerated locally.
3. Download dependencies:
     make download
4. Build the binary:
     make build
5. Run tests:
     make test
- make help - Show all available make targets
- make generate - Generate OpenAPI client from spec (Docker/Podman-based)
- make build - Build the sentinel binary
- make test - Run unit tests with coverage
- make test-integration - Run integration tests (requires Docker/Podman)
- make test-all - Run all tests (unit + integration)
- make fmt - Format Go code
- make lint - Run golangci-lint (requires golangci-lint installed)
- make clean - Remove build artifacts and generated code
The project uses a hybrid testing approach:
- Unit tests: Fast, isolated tests using mocks
- Integration tests: End-to-end tests with real message brokers via testcontainers
# Run only unit tests (fast)
make test
# Run integration tests (requires Docker or Podman)
make test-integration
# Run all tests
make test-all
Integration tests work with both Docker and Podman without extra setup. For troubleshooting and advanced configuration, see docs/testcontainers.md.
For instructions on running Sentinel locally or on GKE, see docs/running-sentinel.md.
This project follows the rh-trex pattern for OpenAPI client generation. The OpenAPI specification is automatically downloaded from the official hyperfleet-api repository (main branch by default) during make generate.
The client is generated using Docker/Podman to ensure consistency across development environments.
To use a different branch or tag:
make generate OPENAPI_SPEC_REF=v1.0.0 # Use a specific tag
make generate OPENAPI_SPEC_REF=develop # Use a branch
For detailed information about OpenAPI client generation, see openapi/README.md.
The Sentinel service uses YAML-based configuration with environment variable overrides for sensitive data (broker credentials).
Create a configuration file based on the examples in the configs/ directory:
- configs/gcp-pubsub-example.yaml - GCP Pub/Sub configuration
- configs/rabbitmq-example.yaml - RabbitMQ configuration
- configs/dev-example.yaml - Development configuration
| Field | Type | Description | Example |
|---|---|---|---|
| resource_type | string | Resource to watch (clusters, nodepools) | clusters |
| hyperfleet_api.endpoint | string | HyperFleet API base URL (k8s service) | http://hyperfleet-api.hyperfleet-system.svc.cluster.local:8080 |
| Field | Type | Default | Description |
|---|---|---|---|
| poll_interval | duration | 5s | How often to poll the API for resource updates |
| max_age_not_ready | duration | 10s | Max age interval for resources not ready |
| max_age_ready | duration | 30m | Max age interval for ready resources |
| hyperfleet_api.timeout | duration | 5s | Request timeout for API calls |
| resource_selector | array | [] | Label selectors for filtering resources (enables sharding) |
| message_data | map | {} | Template fields for CloudEvents data payload |
The resource_selector field enables horizontal scaling by having multiple Sentinel instances watch different resource subsets:
resource_selector:
  - label: shard
    value: "1"
  - label: region
    value: us-east-1
An empty or omitted resource_selector means watch all resources. Multiple selectors use AND logic (all labels must match).
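The AND semantics described above can be illustrated with a minimal Go sketch. The Selector type and matchesAll function are hypothetical names used only for illustration; whether Sentinel evaluates selectors itself or passes them to the HyperFleet API as a query is not stated here.

```go
package main

import "fmt"

// Selector mirrors one resource_selector entry (hypothetical type).
type Selector struct {
	Label string
	Value string
}

// matchesAll reports whether the resource labels satisfy every selector.
// An empty selector list matches every resource.
func matchesAll(selectors []Selector, labels map[string]string) bool {
	for _, s := range selectors {
		if v, ok := labels[s.Label]; !ok || v != s.Value {
			return false
		}
	}
	return true
}

func main() {
	selectors := []Selector{
		{Label: "shard", Value: "1"},
		{Label: "region", Value: "us-east-1"},
	}
	// Both labels match.
	fmt.Println(matchesAll(selectors, map[string]string{"shard": "1", "region": "us-east-1"}))
	// One label differs, so the resource is not selected.
	fmt.Println(matchesAll(selectors, map[string]string{"shard": "1", "region": "eu-west-1"}))
	// No selectors: watch everything.
	fmt.Println(matchesAll(nil, map[string]string{"shard": "2"}))
}
```

Giving each instance a different shard value partitions the resources so that no two instances watch the same subset.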
For detailed instructions on deploying multiple Sentinel instances with different resource selectors, see docs/multi-instance-deployment.md.
Define custom fields to include in CloudEvents using Go template syntax. Both .field and {{.field}} formats are supported:
message_data:
  resource_id: .id
  resource_type: .kind
  href: .href
  generation: .generation
Templates can reference any field from the Resource object returned by the API. The example above follows the ObjectReference pattern (id, kind, href) with generation for reconciliation tracking.
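To show how both template forms could resolve to the same value, here is a rough sketch using Go's text/template package. The normalize and render helpers, the map-based resource, and its sample values are assumptions made for illustration; Sentinel's actual implementation may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// normalize wraps a bare ".field" expression in {{ }} so both forms
// parse identically. This unification step is an assumption.
func normalize(expr string) string {
	expr = strings.TrimSpace(expr)
	if strings.Contains(expr, "{{") {
		return expr
	}
	return "{{" + expr + "}}"
}

// render evaluates one message_data template against a resource.
// Unknown fields produce an error instead of an empty value.
func render(expr string, resource map[string]any) (string, error) {
	tmpl, err := template.New("field").Option("missingkey=error").Parse(normalize(expr))
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, resource); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Hypothetical resource; real field values come from the HyperFleet API.
	resource := map[string]any{"id": "abc123", "kind": "Cluster", "href": "/example/href", "generation": 3}
	for _, expr := range []string{".id", "{{.kind}}", ".href", ".generation"} {
		out, err := render(expr, resource)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", expr, out)
	}
}
```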
Broker configuration is managed by the hyperfleet-broker library. You can configure the broker in one of three ways:
- broker.yaml file (see broker.yaml in project root for example)
- BROKER_CONFIG_FILE environment variable (path to your broker config file)
- Direct environment variables (listed below)
RabbitMQ:
| Variable | Description | Example |
|---|---|---|
| BROKER_RABBITMQ_URL | Complete connection URL | amqp://user:pass@localhost:5672/vhost |
Google Pub/Sub:
| Variable | Description | Example |
|---|---|---|
| BROKER_GOOGLEPUBSUB_PROJECT_ID | GCP project ID | my-gcp-project |
| GOOGLE_APPLICATION_CREDENTIALS | Service account key path (optional, uses ADC if not set) | /path/to/key.json |
Topic Configuration:
| Variable | Description | Example |
|---|---|---|
| BROKER_TOPIC | Topic name for publishing events | hyperfleet-dev-clusters |
The BROKER_TOPIC environment variable sets the full topic name where events will be published. When using Helm, the default topic is {namespace}-{resourceType} (e.g., hyperfleet-dev-clusters, hyperfleet-dev-nodepools). This enables isolation between different environments or tenants sharing the same broker. See Naming Strategy for details.
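The naming convention can be sketched in a few lines of Go. The defaultTopic and resolveTopic helpers are hypothetical, and the fallback from an unset BROKER_TOPIC to the {namespace}-{resourceType} form is an assumption: in the text above, that default is applied by the Helm chart, not necessarily by the binary.

```go
package main

import (
	"fmt"
	"os"
)

// defaultTopic builds the Helm-style default name: {namespace}-{resourceType}.
func defaultTopic(namespace, resourceType string) string {
	return namespace + "-" + resourceType
}

// resolveTopic prefers an explicit topic (for example the BROKER_TOPIC value)
// and otherwise falls back to the default name. The fallback is an assumption.
func resolveTopic(explicit, namespace, resourceType string) string {
	if explicit != "" {
		return explicit
	}
	return defaultTopic(namespace, resourceType)
}

func main() {
	topic := resolveTopic(os.Getenv("BROKER_TOPIC"), "hyperfleet-dev", "nodepools")
	fmt.Println("publishing to topic:", topic)
}
```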
For detailed broker configuration options, see the hyperfleet-broker documentation.
For detailed instructions on running Sentinel locally or deploying to GKE, see docs/running-sentinel.md.
For Helm chart documentation and configuration options, see deployments/helm/sentinel/README.md.
The service validates configuration at startup and will fail fast on errors:
- Required fields present: resource_type, hyperfleet_api.endpoint
- Valid enums: resource_type must be clusters/nodepools
- Valid durations: All interval fields must be positive
- Valid templates: All message_data templates must be valid Go templates
- Broker configuration: Managed by hyperfleet-broker library (see broker.yaml)
resource_type: clusters
hyperfleet_api:
  endpoint: http://localhost:8000
This uses all defaults. Broker configuration is managed via broker.yaml or environment variables (see Broker Configuration section).
resource_type: clusters
poll_interval: 5s
max_age_not_ready: 10s
max_age_ready: 30m
resource_selector:
  - label: shard
    value: "1"
hyperfleet_api:
  endpoint: http://hyperfleet-api.hyperfleet-system.svc.cluster.local:8080
  timeout: 30s
message_data:
  resource_id: .id
  resource_type: .kind
  href: .href
  generation: .generation
The Sentinel service exposes Prometheus metrics on port 8080 at /metrics for monitoring and alerting.
Sentinel provides 6 core metrics for comprehensive observability:
| Metric | Type | Description |
|---|---|---|
| hyperfleet_sentinel_pending_resources | Gauge | Number of resources pending reconciliation |
| hyperfleet_sentinel_events_published_total | Counter | Total events published to message broker |
| hyperfleet_sentinel_resources_skipped_total | Counter | Resources skipped (preconditions not met) |
| hyperfleet_sentinel_poll_duration_seconds | Histogram | Duration of each polling cycle |
| hyperfleet_sentinel_api_errors_total | Counter | Errors when calling HyperFleet API |
| hyperfleet_sentinel_broker_errors_total | Counter | Errors when publishing to message broker |
All metrics include resource_type and resource_selector labels for filtering.
For detailed metric descriptions, example queries, and alerting rules, see docs/metrics.md.
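For a quick local check without a Prometheus server, the text served at /metrics can be filtered for Sentinel's own samples. The sketch below is only an illustration: sentinelSamples is a hypothetical helper, and the sample body and its values are made up; in practice the body would come from an HTTP GET against port 8080.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sentinelSamples returns the sample lines whose metric name starts with
// "hyperfleet_sentinel_". Comment lines (# HELP, # TYPE) begin with "#"
// and therefore never match the prefix.
func sentinelSamples(body string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "hyperfleet_sentinel_") {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// Made-up sample in the Prometheus text exposition format.
	body := `# HELP hyperfleet_sentinel_pending_resources Number of resources pending reconciliation
# TYPE hyperfleet_sentinel_pending_resources gauge
hyperfleet_sentinel_pending_resources{resource_type="clusters",resource_selector="shard=1"} 4
go_goroutines 12
`
	for _, line := range sentinelSamples(body) {
		fmt.Println(line)
	}
}
```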
A pre-built Grafana dashboard is available at deployments/dashboards/sentinel-metrics.json with 8 visualization panels covering all metrics.
To import:
- Navigate to Grafana → Dashboards → Import
- Upload deployments/dashboards/sentinel-metrics.json
- Select your Prometheus datasource
Sentinel integrates with Google Cloud Managed Prometheus (GMP) for automated metrics collection:
# Deploy with PodMonitoring enabled (default)
helm install sentinel ./deployments/helm/sentinel \
--namespace hyperfleet-system \
--create-namespace
# Verify metrics in Google Cloud Console
# Navigate to: Monitoring → Metrics Explorer
# Query: hyperfleet_sentinel_pending_resources
GMP automatically discovers the PodMonitoring resource and begins scraping metrics. No additional configuration is required.
Configure alerts in Google Cloud Console → Monitoring → Alerting using the PromQL expressions provided in docs/metrics.md.
Recommended alerts:
- SentinelHighPendingResources - High number of pending resources
- SentinelAPIErrorRateHigh - High API error rate
- SentinelBrokerErrorRateHigh - High broker error rate
- SentinelSlowPolling - Slow polling cycles
- SentinelNoEventsPublished - No events published despite pending resources
- SentinelHighSkipRatio - High ratio of skipped resources
- SentinelDown - Sentinel service is down
See docs/metrics.md for complete alerting rules documentation.
Access metrics through Google Cloud Console:
- Navigate to Monitoring → Metrics Explorer
- Select resource type: Prometheus Target
- Query: hyperfleet_sentinel_pending_resources
Metrics are automatically collected by Google Cloud Managed Prometheus via the PodMonitoring resource.
All members of the hyperfleet team have write access to this repository.
If you're a team member and need access to this repository:
- Verify Organization Membership: Ensure you're a member of the openshift-hyperfleet organization
- Check Team Assignment: Confirm you're added to the hyperfleet team within the organization
- Repository Permissions: All hyperfleet team members automatically receive write access
- OWNERS File: Code reviews and approvals are managed through the OWNERS file
For access issues, contact a repository administrator or organization owner.