diff --git a/design/EP-476-enterprise-enablement.md b/design/EP-476-enterprise-enablement.md new file mode 100644 index 000000000..e1348b775 --- /dev/null +++ b/design/EP-476-enterprise-enablement.md @@ -0,0 +1,117 @@ +# EP-476: Enterprise Enablement + +* Issue: [#476](https://github.com/kagent-dev/kagent/issues/476) + +## Background + +This EP addresses enterprise deployment requirements for kagent, specifically authentication, multi-tenancy, and audit logging. These features are prerequisites for production deployment in regulated environments. + +## Motivation + +Currently kagent has limited support for enterprise deployment scenarios: + +1. **Authentication**: Only `UnsecureAuthenticator` is implemented +2. **Multi-tenancy**: Controller operates cluster-wide with no namespace isolation +3. **Audit logging**: Basic HTTP logging without compliance-ready audit trail + +### Goals + +1. Implement OAuth2/OIDC authentication provider +2. Add namespace-scoped controller mode for multi-tenancy +3. Add structured audit logging for compliance + +### Non-Goals + +- RBAC authorization (future work) +- Multi-cluster support (separate EP) +- Air-gapped installation (documentation only) + +## Implementation Details + +### OAuth2/OIDC Authentication + +New `OAuth2Authenticator` implementing `auth.AuthProvider`: + +```go +// go/internal/httpserver/auth/oauth2.go +type OAuth2Config struct { + IssuerURL string + ClientID string + Audience string + RequiredScopes []string + UserIDClaim string // default: "sub" + RolesClaim string // default: "roles" +} +``` + +Features: +- JWT validation with JWKS caching +- Configurable claims extraction +- Scope and audience validation +- Bearer token from header or query parameter + +### Namespace-Scoped Controller + +Add `watchedNamespaces` parameter to reconciler: + +```go +// go/internal/controller/reconciler/reconciler.go +type kagentReconciler struct { + // If empty, cluster-wide. If set, only these namespaces. + watchedNamespaces []string +} + +func (a *kagentReconciler) validateNamespaceIsolation(namespace string) error +``` + +All reconcile methods call `validateNamespaceIsolation()` before processing. + +### Structured Audit Logging + +Middleware that logs compliance-ready JSON: + +```go +// go/internal/httpserver/middleware.go +type AuditLogConfig struct { + Enabled bool + LogLevel int + IncludeHeaders []string +} +``` + +Logged fields: `request_id`, `timestamp`, `user`, `user_roles`, `namespace`, `action`, `status`, `duration_ms` + +### Helm Configuration + +```yaml +# values.yaml +controller: + watchNamespaces: [] # empty = cluster-wide + +pdb: + enabled: false + controller: + minAvailable: 1 + +metrics: + serviceMonitor: + enabled: false +``` + +### Test Plan + +- Unit tests for OAuth2 token validation (8 test cases) +- Unit tests for namespace isolation (15 test cases) +- Unit tests for audit middleware (11 test cases) +- Integration tests with mock OIDC server + +## Alternatives + +1. **Use existing auth middleware**: Rejected - kagent needs session-based auth for A2A protocol +2. **Namespace isolation via RBAC only**: Rejected - controller still needs to enforce boundaries +3. **External audit logging**: Considered - middleware approach is simpler and integrates with existing logging + +## Open Questions + +1. Should OAuth2 config be a CRD or Helm values? (Currently Helm values) +2. Integration with OpenShift OAuth server? 
(Future work) diff --git a/docs/openshift-deployment-guide.md b/docs/openshift-deployment-guide.md new file mode 100644 index 000000000..55fd49541 --- /dev/null +++ b/docs/openshift-deployment-guide.md @@ -0,0 +1,112 @@ +# OpenShift Deployment Guide + +This guide covers OpenShift-specific deployment considerations for kagent. + +## Prerequisites + +- OpenShift 4.12+ +- `oc` CLI configured +- Helm 3.x + +## Installation + +```bash +# Create namespace +oc new-project kagent + +# Install CRDs +helm install kagent-crds ./helm/kagent-crds/ -n kagent + +# Install kagent with OpenShift Route +helm install kagent ./helm/kagent/ -n kagent \ + --set providers.openAI.apiKey=$OPENAI_API_KEY \ + --set openshift.enabled=true +``` + +## Security Context Constraints + +kagent runs with `restricted-v2` SCC by default. No special SCCs required. + +For custom SCCs: + +```bash +# View current SCC +oc get pod -n kagent -o yaml | grep -A5 securityContext + +# Grant specific SCC (if needed) +oc adm policy add-scc-to-user anyuid -z kagent-controller -n kagent +``` + +## Routes + +The Helm chart creates an OpenShift Route when `openshift.enabled=true`: + +```yaml +# values.yaml +openshift: + enabled: true + route: + host: kagent.apps.example.com # optional + tls: + termination: edge +``` + +Access the UI: + +```bash +oc get route kagent-ui -n kagent -o jsonpath='{.spec.host}' +``` + +## Pod Security Standards + +kagent is compatible with PSS `restricted` profile: + +| Setting | Value | +|---------|-------| +| `runAsNonRoot` | true | +| `allowPrivilegeEscalation` | false | +| `capabilities.drop` | ALL | +| `seccompProfile` | RuntimeDefault | + +## High Availability + +```yaml +# values-ha.yaml +controller: + replicas: 2 + +ui: + replicas: 2 + +pdb: + enabled: true + controller: + minAvailable: 1 + ui: + minAvailable: 1 +``` + +```bash +helm upgrade kagent ./helm/kagent/ -n kagent -f values-ha.yaml +``` + +## Troubleshooting + +```bash +# Check pod status +oc get pods -n kagent + +# Check SCC violations +oc get events -n kagent | grep -i scc + +# View controller logs +oc logs -l app.kubernetes.io/component=controller -n kagent + +# Check Route status +oc describe route kagent-ui -n kagent +``` + +## See Also + +- [Installation Guide](https://kagent.dev/docs/kagent/introduction/installation) +- [Helm Chart README](../helm/README.md) diff --git a/go/internal/controller/reconciler/reconciler.go b/go/internal/controller/reconciler/reconciler.go index 258fd62a9..777936590 100644 --- a/go/internal/controller/reconciler/reconciler.go +++ b/go/internal/controller/reconciler/reconciler.go @@ -58,6 +58,11 @@ type kagentReconciler struct { defaultModelConfig types.NamespacedName + // watchedNamespaces contains the list of namespaces the controller is allowed to operate on. + // If empty, all namespaces are allowed (cluster-wide mode). + // If non-empty, only operations within these namespaces are permitted (namespace-scoped mode). + watchedNamespaces []string + // TODO: Remove this lock since we have a DB which we can batch anyway upsertLock sync.Mutex } @@ -67,16 +72,50 @@ func NewKagentReconciler( kube client.Client, dbClient database.Client, defaultModelConfig types.NamespacedName, + watchedNamespaces []string, ) KagentReconciler { return &kagentReconciler{ adkTranslator: translator, kube: kube, dbClient: dbClient, defaultModelConfig: defaultModelConfig, + watchedNamespaces: watchedNamespaces, + } +} + +// isNamespaceAllowed checks if the given namespace is allowed based on the watched namespaces configuration. 
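+// For example (hypothetical values): with watchedNamespaces = []string{"team-a", "team-b"}, only objects in those two namespaces are reconciled, while an empty slice keeps the cluster-wide behavior.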
+// Returns true if: +// - watchedNamespaces is empty (cluster-wide mode, all namespaces allowed) +// - namespace is in the watchedNamespaces list (namespace-scoped mode) +func (a *kagentReconciler) isNamespaceAllowed(namespace string) bool { + if len(a.watchedNamespaces) == 0 { + return true } + return slices.Contains(a.watchedNamespaces, namespace) +} + +// validateNamespaceIsolation checks if an operation in the given namespace is allowed. +// Returns an error if namespace isolation is violated. +func (a *kagentReconciler) validateNamespaceIsolation(namespace string) error { + if !a.isNamespaceAllowed(namespace) { + return fmt.Errorf("namespace %q is not in the list of watched namespaces; controller is in namespace-scoped mode watching: %v", namespace, a.watchedNamespaces) + } + return nil +} + +// IsNamespaceScopedMode returns true if the controller is operating in namespace-scoped mode +// (i.e., watching specific namespaces rather than all namespaces). +func (a *kagentReconciler) IsNamespaceScopedMode() bool { + return len(a.watchedNamespaces) > 0 } func (a *kagentReconciler) ReconcileKagentAgent(ctx context.Context, req ctrl.Request) error { + // Enforce namespace isolation in namespace-scoped mode + if err := a.validateNamespaceIsolation(req.Namespace); err != nil { + reconcileLog.Info("Skipping agent reconciliation due to namespace isolation", "agent", req.NamespacedName, "error", err) + return nil + } + // TODO(sbx0r): missing finalizer logic agent := &v1alpha2.Agent{} if err := a.kube.Get(ctx, req.NamespacedName, agent); err != nil { @@ -171,6 +210,12 @@ func (a *kagentReconciler) reconcileAgentStatus(ctx context.Context, agent *v1al } func (a *kagentReconciler) ReconcileKagentMCPService(ctx context.Context, req ctrl.Request) error { + // Enforce namespace isolation in namespace-scoped mode + if err := a.validateNamespaceIsolation(req.Namespace); err != nil { + reconcileLog.Info("Skipping MCP service reconciliation due to namespace isolation", "service", req.NamespacedName, "error", err) + return nil + } + service := &corev1.Service{} if err := a.kube.Get(ctx, req.NamespacedName, service); err != nil { if apierrors.IsNotFound(err) { @@ -214,6 +259,12 @@ type secretRef struct { } func (a *kagentReconciler) ReconcileKagentModelConfig(ctx context.Context, req ctrl.Request) error { + // Enforce namespace isolation in namespace-scoped mode + if err := a.validateNamespaceIsolation(req.Namespace); err != nil { + reconcileLog.Info("Skipping model config reconciliation due to namespace isolation", "modelConfig", req.NamespacedName, "error", err) + return nil + } + modelConfig := &v1alpha2.ModelConfig{} if err := a.kube.Get(ctx, req.NamespacedName, modelConfig); err != nil { if apierrors.IsNotFound(err) { @@ -337,6 +388,12 @@ func (a *kagentReconciler) reconcileModelConfigStatus(ctx context.Context, model } func (a *kagentReconciler) ReconcileKagentMCPServer(ctx context.Context, req ctrl.Request) error { + // Enforce namespace isolation in namespace-scoped mode + if err := a.validateNamespaceIsolation(req.Namespace); err != nil { + reconcileLog.Info("Skipping MCP server reconciliation due to namespace isolation", "mcpServer", req.NamespacedName, "error", err) + return nil + } + mcpServer := &v1alpha1.MCPServer{} if err := a.kube.Get(ctx, req.NamespacedName, mcpServer); err != nil { if apierrors.IsNotFound(err) { @@ -375,6 +432,12 @@ func (a *kagentReconciler) ReconcileKagentMCPServer(ctx context.Context, req ctr } func (a *kagentReconciler) ReconcileKagentRemoteMCPServer(ctx 
context.Context, req ctrl.Request) error { + // Enforce namespace isolation in namespace-scoped mode + if err := a.validateNamespaceIsolation(req.Namespace); err != nil { + reconcileLog.Info("Skipping remote MCP server reconciliation due to namespace isolation", "remoteMCPServer", req.NamespacedName, "error", err) + return nil + } + nns := req.NamespacedName serverRef := nns.String() l := reconcileLog.WithValues("remoteMCPServer", serverRef) diff --git a/go/internal/controller/reconciler/reconciler_test.go b/go/internal/controller/reconciler/reconciler_test.go index 4cd6bd5ea..27ba14a7f 100644 --- a/go/internal/controller/reconciler/reconciler_test.go +++ b/go/internal/controller/reconciler/reconciler_test.go @@ -207,3 +207,157 @@ func TestAgentIDConsistency(t *testing.T) { assert.Equal(t, storeID, deleteID) } + +// TestIsNamespaceAllowed tests the namespace isolation logic +func TestIsNamespaceAllowed(t *testing.T) { + tests := []struct { + name string + watchedNamespaces []string + namespace string + expected bool + }{ + { + name: "cluster-wide mode (empty watchedNamespaces) allows all namespaces", + watchedNamespaces: []string{}, + namespace: "any-namespace", + expected: true, + }, + { + name: "cluster-wide mode allows default namespace", + watchedNamespaces: []string{}, + namespace: "default", + expected: true, + }, + { + name: "namespace-scoped mode allows watched namespace", + watchedNamespaces: []string{"ns1", "ns2", "ns3"}, + namespace: "ns1", + expected: true, + }, + { + name: "namespace-scoped mode allows another watched namespace", + watchedNamespaces: []string{"ns1", "ns2", "ns3"}, + namespace: "ns3", + expected: true, + }, + { + name: "namespace-scoped mode blocks non-watched namespace", + watchedNamespaces: []string{"ns1", "ns2"}, + namespace: "ns3", + expected: false, + }, + { + name: "namespace-scoped mode blocks default namespace if not in list", + watchedNamespaces: []string{"ns1", "ns2"}, + namespace: "default", + expected: false, + }, + { + name: "single namespace mode allows only that namespace", + watchedNamespaces: []string{"production"}, + namespace: "production", + expected: true, + }, + { + name: "single namespace mode blocks other namespaces", + watchedNamespaces: []string{"production"}, + namespace: "staging", + expected: false, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + r := &kagentReconciler{ + watchedNamespaces: tt.watchedNamespaces, + } + result := r.isNamespaceAllowed(tt.namespace) + assert.Equal(t, tt.expected, result) + }) + } +} + +// TestValidateNamespaceIsolation tests the namespace isolation validation +func TestValidateNamespaceIsolation(t *testing.T) { + tests := []struct { + name string + watchedNamespaces []string + namespace string + expectError bool + }{ + { + name: "cluster-wide mode returns no error", + watchedNamespaces: []string{}, + namespace: "any-namespace", + expectError: false, + }, + { + name: "namespace-scoped mode allows watched namespace", + watchedNamespaces: []string{"allowed-ns"}, + namespace: "allowed-ns", + expectError: false, + }, + { + name: "namespace-scoped mode returns error for non-watched namespace", + watchedNamespaces: []string{"allowed-ns"}, + namespace: "blocked-ns", + expectError: true, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + r := &kagentReconciler{ + watchedNamespaces: tt.watchedNamespaces, + } + err := r.validateNamespaceIsolation(tt.namespace) + if tt.expectError { + assert.Error(t, err) + assert.Contains(t, err.Error(), 
"namespace-scoped mode") + assert.Contains(t, err.Error(), tt.namespace) + } else { + assert.NoError(t, err) + } + }) + } +} + +// TestIsNamespaceScopedMode tests the namespace scoped mode detection +func TestIsNamespaceScopedMode(t *testing.T) { + tests := []struct { + name string + watchedNamespaces []string + expected bool + }{ + { + name: "empty watchedNamespaces is not namespace-scoped", + watchedNamespaces: []string{}, + expected: false, + }, + { + name: "nil watchedNamespaces is not namespace-scoped", + watchedNamespaces: nil, + expected: false, + }, + { + name: "single namespace is namespace-scoped", + watchedNamespaces: []string{"ns1"}, + expected: true, + }, + { + name: "multiple namespaces is namespace-scoped", + watchedNamespaces: []string{"ns1", "ns2", "ns3"}, + expected: true, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + r := &kagentReconciler{ + watchedNamespaces: tt.watchedNamespaces, + } + result := r.IsNamespaceScopedMode() + assert.Equal(t, tt.expected, result) + }) + } +} diff --git a/go/internal/controller/translator/agent/security_context_test.go b/go/internal/controller/translator/agent/security_context_test.go index 71f1d5296..8f477d88e 100644 --- a/go/internal/controller/translator/agent/security_context_test.go +++ b/go/internal/controller/translator/agent/security_context_test.go @@ -349,3 +349,617 @@ func TestSecurityContext_WithSandbox(t *testing.T) { assert.True(t, *containerSecurityContext.Privileged, "Privileged should be true when sandbox is needed") assert.Equal(t, int64(1000), *containerSecurityContext.RunAsUser, "User-provided runAsUser should still be set") } + +// OpenShift SCC (Security Context Constraints) Test Scenarios +// These tests verify compatibility with OpenShift's SCC policies: +// - restricted: Most restrictive, requires specific UID/GID ranges +// - anyuid: Allows running as any user ID +// - privileged: Allows privileged containers + +// TestSecurityContext_OpenShiftSCC_Restricted tests that agents can run under +// OpenShift's restricted SCC which enforces: +// - RunAsNonRoot: true +// - Specific UID/GID ranges (typically allocated by the namespace) +// - No privilege escalation +// - Capabilities dropped to minimum +func TestSecurityContext_OpenShiftSCC_Restricted(t *testing.T) { + ctx := context.Background() + + // OpenShift restricted SCC requires specific security settings + // UID ranges are typically allocated per-namespace (e.g., 1000680000-1000689999) + agent := &v1alpha2.Agent{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-agent", + Namespace: "test", + }, + Spec: v1alpha2.AgentSpec{ + Type: v1alpha2.AgentType_Declarative, + Declarative: &v1alpha2.DeclarativeAgentSpec{ + SystemMessage: "Test agent", + ModelConfig: "test-model", + Deployment: &v1alpha2.DeclarativeDeploymentSpec{ + SharedDeploymentSpec: v1alpha2.SharedDeploymentSpec{ + PodSecurityContext: &corev1.PodSecurityContext{ + RunAsNonRoot: ptr.To(true), + FSGroup: ptr.To(int64(1000680000)), + SeccompProfile: &corev1.SeccompProfile{ + Type: corev1.SeccompProfileTypeRuntimeDefault, + }, + }, + SecurityContext: &corev1.SecurityContext{ + RunAsNonRoot: ptr.To(true), + AllowPrivilegeEscalation: ptr.To(false), + Capabilities: &corev1.Capabilities{ + Drop: []corev1.Capability{"ALL"}, + }, + SeccompProfile: &corev1.SeccompProfile{ + Type: corev1.SeccompProfileTypeRuntimeDefault, + }, + }, + }, + }, + }, + }, + } + + modelConfig := &v1alpha2.ModelConfig{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-model", + Namespace: "test", + }, + Spec: 
v1alpha2.ModelConfigSpec{ + Provider: "OpenAI", + Model: "gpt-4o", + }, + } + + scheme := schemev1.Scheme + err := v1alpha2.AddToScheme(scheme) + require.NoError(t, err) + + kubeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(agent, modelConfig). + Build() + + defaultModel := types.NamespacedName{ + Namespace: "test", + Name: "test-model", + } + translatorInstance := translator.NewAdkApiTranslator(kubeClient, defaultModel, nil) + + result, err := translatorInstance.TranslateAgent(ctx, agent) + require.NoError(t, err) + + var deployment *appsv1.Deployment + for _, obj := range result.Manifest { + if dep, ok := obj.(*appsv1.Deployment); ok { + deployment = dep + break + } + } + require.NotNil(t, deployment) + podTemplate := &deployment.Spec.Template + + // Verify pod security context meets restricted SCC requirements + podSecurityContext := podTemplate.Spec.SecurityContext + require.NotNil(t, podSecurityContext, "Pod securityContext should be set for restricted SCC") + assert.True(t, *podSecurityContext.RunAsNonRoot, "RunAsNonRoot must be true for restricted SCC") + assert.Equal(t, int64(1000680000), *podSecurityContext.FSGroup, "FSGroup should be in OpenShift allocated range") + require.NotNil(t, podSecurityContext.SeccompProfile, "SeccompProfile should be set") + assert.Equal(t, corev1.SeccompProfileTypeRuntimeDefault, podSecurityContext.SeccompProfile.Type) + + // Verify container security context meets restricted SCC requirements + containerSecurityContext := podTemplate.Spec.Containers[0].SecurityContext + require.NotNil(t, containerSecurityContext, "Container securityContext should be set for restricted SCC") + assert.True(t, *containerSecurityContext.RunAsNonRoot, "Container runAsNonRoot must be true") + assert.False(t, *containerSecurityContext.AllowPrivilegeEscalation, "AllowPrivilegeEscalation must be false") + require.NotNil(t, containerSecurityContext.Capabilities, "Capabilities should be set") + assert.Contains(t, containerSecurityContext.Capabilities.Drop, corev1.Capability("ALL"), "Should drop ALL capabilities") + assert.Empty(t, containerSecurityContext.Capabilities.Add, "No capabilities should be added for restricted SCC") +} + +// TestSecurityContext_OpenShiftSCC_Anyuid tests that agents can run under +// OpenShift's anyuid SCC which allows: +// - Running as any user ID (including root) +// - Still requires security best practices for containers +func TestSecurityContext_OpenShiftSCC_Anyuid(t *testing.T) { + ctx := context.Background() + + // Anyuid SCC allows running as specific UID without OpenShift range restrictions + agent := &v1alpha2.Agent{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-agent", + Namespace: "test", + }, + Spec: v1alpha2.AgentSpec{ + Type: v1alpha2.AgentType_Declarative, + Declarative: &v1alpha2.DeclarativeAgentSpec{ + SystemMessage: "Test agent", + ModelConfig: "test-model", + Deployment: &v1alpha2.DeclarativeDeploymentSpec{ + SharedDeploymentSpec: v1alpha2.SharedDeploymentSpec{ + PodSecurityContext: &corev1.PodSecurityContext{ + RunAsUser: ptr.To(int64(1000)), + RunAsGroup: ptr.To(int64(1000)), + FSGroup: ptr.To(int64(1000)), + }, + SecurityContext: &corev1.SecurityContext{ + RunAsUser: ptr.To(int64(1000)), + RunAsGroup: ptr.To(int64(1000)), + AllowPrivilegeEscalation: ptr.To(false), + Capabilities: &corev1.Capabilities{ + Drop: []corev1.Capability{"ALL"}, + }, + }, + }, + }, + }, + }, + } + + modelConfig := &v1alpha2.ModelConfig{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-model", + Namespace: "test", + }, + Spec: 
v1alpha2.ModelConfigSpec{ + Provider: "OpenAI", + Model: "gpt-4o", + }, + } + + scheme := schemev1.Scheme + err := v1alpha2.AddToScheme(scheme) + require.NoError(t, err) + + kubeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(agent, modelConfig). + Build() + + defaultModel := types.NamespacedName{ + Namespace: "test", + Name: "test-model", + } + translatorInstance := translator.NewAdkApiTranslator(kubeClient, defaultModel, nil) + + result, err := translatorInstance.TranslateAgent(ctx, agent) + require.NoError(t, err) + + var deployment *appsv1.Deployment + for _, obj := range result.Manifest { + if dep, ok := obj.(*appsv1.Deployment); ok { + deployment = dep + break + } + } + require.NotNil(t, deployment) + podTemplate := &deployment.Spec.Template + + // Verify pod security context for anyuid SCC + podSecurityContext := podTemplate.Spec.SecurityContext + require.NotNil(t, podSecurityContext) + assert.Equal(t, int64(1000), *podSecurityContext.RunAsUser, "Anyuid allows specific user ID") + assert.Equal(t, int64(1000), *podSecurityContext.RunAsGroup) + assert.Equal(t, int64(1000), *podSecurityContext.FSGroup) + + // Verify container security context for anyuid SCC + containerSecurityContext := podTemplate.Spec.Containers[0].SecurityContext + require.NotNil(t, containerSecurityContext) + assert.Equal(t, int64(1000), *containerSecurityContext.RunAsUser) + assert.False(t, *containerSecurityContext.AllowPrivilegeEscalation, "Should still prevent privilege escalation") +} + +// TestSecurityContext_OpenShiftSCC_Privileged tests agents that require +// privileged access, such as those using sandboxed code execution. +// Note: This SCC should only be used when absolutely necessary. +func TestSecurityContext_OpenShiftSCC_Privileged(t *testing.T) { + ctx := context.Background() + + // When skills are used, kagent requires privileged mode for sandbox + agent := &v1alpha2.Agent{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-agent", + Namespace: "test", + }, + Spec: v1alpha2.AgentSpec{ + Type: v1alpha2.AgentType_Declarative, + Skills: &v1alpha2.SkillForAgent{ + Refs: []string{"code-exec-skill:latest"}, + }, + Declarative: &v1alpha2.DeclarativeAgentSpec{ + SystemMessage: "Test agent", + ModelConfig: "test-model", + Deployment: &v1alpha2.DeclarativeDeploymentSpec{ + SharedDeploymentSpec: v1alpha2.SharedDeploymentSpec{ + PodSecurityContext: &corev1.PodSecurityContext{ + RunAsUser: ptr.To(int64(0)), + RunAsGroup: ptr.To(int64(0)), + }, + }, + }, + }, + }, + } + + modelConfig := &v1alpha2.ModelConfig{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-model", + Namespace: "test", + }, + Spec: v1alpha2.ModelConfigSpec{ + Provider: "OpenAI", + Model: "gpt-4o", + }, + } + + scheme := schemev1.Scheme + err := v1alpha2.AddToScheme(scheme) + require.NoError(t, err) + + kubeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(agent, modelConfig). 
+ Build() + + defaultModel := types.NamespacedName{ + Namespace: "test", + Name: "test-model", + } + translatorInstance := translator.NewAdkApiTranslator(kubeClient, defaultModel, nil) + + result, err := translatorInstance.TranslateAgent(ctx, agent) + require.NoError(t, err) + + var deployment *appsv1.Deployment + for _, obj := range result.Manifest { + if dep, ok := obj.(*appsv1.Deployment); ok { + deployment = dep + break + } + } + require.NotNil(t, deployment) + podTemplate := &deployment.Spec.Template + + // Verify privileged container for sandbox + containerSecurityContext := podTemplate.Spec.Containers[0].SecurityContext + require.NotNil(t, containerSecurityContext) + assert.True(t, *containerSecurityContext.Privileged, "Privileged must be true for sandbox execution") + + // Pod security context should still be respected + podSecurityContext := podTemplate.Spec.SecurityContext + require.NotNil(t, podSecurityContext) + assert.Equal(t, int64(0), *podSecurityContext.RunAsUser, "Should run as root when privileged") +} + +// Pod Security Standards (PSS) Test Scenarios +// These tests verify compatibility with Kubernetes Pod Security Standards: +// - restricted: Highly restricted, following current pod hardening best practices +// - baseline: Minimally restrictive, prevents known privilege escalations +// - privileged: Unrestricted policy + +// TestSecurityContext_PSS_RestrictedV2 tests that agents can run under the +// Pod Security Standards "restricted" profile (v2), which is the most secure +// and requires: +// - Running as non-root +// - Seccomp profile set to RuntimeDefault or Localhost +// - Dropping all capabilities +// - No privilege escalation +// - Read-only root filesystem (recommended) +func TestSecurityContext_PSS_RestrictedV2(t *testing.T) { + ctx := context.Background() + + // PSS restricted profile (v2) configuration + agent := &v1alpha2.Agent{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-agent", + Namespace: "test", + }, + Spec: v1alpha2.AgentSpec{ + Type: v1alpha2.AgentType_Declarative, + Declarative: &v1alpha2.DeclarativeAgentSpec{ + SystemMessage: "Test agent", + ModelConfig: "test-model", + Deployment: &v1alpha2.DeclarativeDeploymentSpec{ + SharedDeploymentSpec: v1alpha2.SharedDeploymentSpec{ + PodSecurityContext: &corev1.PodSecurityContext{ + RunAsNonRoot: ptr.To(true), + RunAsUser: ptr.To(int64(65534)), // nobody user + RunAsGroup: ptr.To(int64(65534)), + FSGroup: ptr.To(int64(65534)), + SeccompProfile: &corev1.SeccompProfile{ + Type: corev1.SeccompProfileTypeRuntimeDefault, + }, + }, + SecurityContext: &corev1.SecurityContext{ + RunAsNonRoot: ptr.To(true), + RunAsUser: ptr.To(int64(65534)), + RunAsGroup: ptr.To(int64(65534)), + AllowPrivilegeEscalation: ptr.To(false), + ReadOnlyRootFilesystem: ptr.To(true), + Capabilities: &corev1.Capabilities{ + Drop: []corev1.Capability{"ALL"}, + }, + SeccompProfile: &corev1.SeccompProfile{ + Type: corev1.SeccompProfileTypeRuntimeDefault, + }, + }, + }, + }, + }, + }, + } + + modelConfig := &v1alpha2.ModelConfig{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-model", + Namespace: "test", + }, + Spec: v1alpha2.ModelConfigSpec{ + Provider: "OpenAI", + Model: "gpt-4o", + }, + } + + scheme := schemev1.Scheme + err := v1alpha2.AddToScheme(scheme) + require.NoError(t, err) + + kubeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(agent, modelConfig). 
+ Build() + + defaultModel := types.NamespacedName{ + Namespace: "test", + Name: "test-model", + } + translatorInstance := translator.NewAdkApiTranslator(kubeClient, defaultModel, nil) + + result, err := translatorInstance.TranslateAgent(ctx, agent) + require.NoError(t, err) + + var deployment *appsv1.Deployment + for _, obj := range result.Manifest { + if dep, ok := obj.(*appsv1.Deployment); ok { + deployment = dep + break + } + } + require.NotNil(t, deployment) + podTemplate := &deployment.Spec.Template + + // Verify pod security context meets PSS restricted requirements + podSecurityContext := podTemplate.Spec.SecurityContext + require.NotNil(t, podSecurityContext, "Pod securityContext is required for PSS restricted") + assert.True(t, *podSecurityContext.RunAsNonRoot, "RunAsNonRoot must be true for PSS restricted") + assert.Equal(t, int64(65534), *podSecurityContext.RunAsUser, "Should run as nobody user") + require.NotNil(t, podSecurityContext.SeccompProfile, "SeccompProfile is required for PSS restricted") + assert.Equal(t, corev1.SeccompProfileTypeRuntimeDefault, podSecurityContext.SeccompProfile.Type) + + // Verify container security context meets PSS restricted requirements + containerSecurityContext := podTemplate.Spec.Containers[0].SecurityContext + require.NotNil(t, containerSecurityContext, "Container securityContext is required") + assert.True(t, *containerSecurityContext.RunAsNonRoot, "RunAsNonRoot must be true") + assert.False(t, *containerSecurityContext.AllowPrivilegeEscalation, "AllowPrivilegeEscalation must be false") + assert.True(t, *containerSecurityContext.ReadOnlyRootFilesystem, "ReadOnlyRootFilesystem should be true") + + // Verify capabilities are dropped + require.NotNil(t, containerSecurityContext.Capabilities, "Capabilities must be configured") + assert.Contains(t, containerSecurityContext.Capabilities.Drop, corev1.Capability("ALL"), "Must drop ALL capabilities") + assert.Empty(t, containerSecurityContext.Capabilities.Add, "No capabilities should be added for PSS restricted") + + // Verify seccomp profile at container level + require.NotNil(t, containerSecurityContext.SeccompProfile, "Container seccompProfile is required") + assert.Equal(t, corev1.SeccompProfileTypeRuntimeDefault, containerSecurityContext.SeccompProfile.Type) +} + +// TestSecurityContext_PSS_Baseline tests the Pod Security Standards "baseline" +// profile which prevents known privilege escalations but is more permissive. 
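+// Unlike the restricted profile, baseline does not require runAsNonRoot or an explicit seccomp profile, so this test only exercises privilege-escalation and capability settings.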
+func TestSecurityContext_PSS_Baseline(t *testing.T) { + ctx := context.Background() + + // PSS baseline profile - minimal restrictions + agent := &v1alpha2.Agent{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-agent", + Namespace: "test", + }, + Spec: v1alpha2.AgentSpec{ + Type: v1alpha2.AgentType_Declarative, + Declarative: &v1alpha2.DeclarativeAgentSpec{ + SystemMessage: "Test agent", + ModelConfig: "test-model", + Deployment: &v1alpha2.DeclarativeDeploymentSpec{ + SharedDeploymentSpec: v1alpha2.SharedDeploymentSpec{ + SecurityContext: &corev1.SecurityContext{ + AllowPrivilegeEscalation: ptr.To(false), + Capabilities: &corev1.Capabilities{ + Drop: []corev1.Capability{"ALL"}, + Add: []corev1.Capability{"NET_BIND_SERVICE"}, // Allowed in baseline + }, + }, + }, + }, + }, + }, + } + + modelConfig := &v1alpha2.ModelConfig{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-model", + Namespace: "test", + }, + Spec: v1alpha2.ModelConfigSpec{ + Provider: "OpenAI", + Model: "gpt-4o", + }, + } + + scheme := schemev1.Scheme + err := v1alpha2.AddToScheme(scheme) + require.NoError(t, err) + + kubeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(agent, modelConfig). + Build() + + defaultModel := types.NamespacedName{ + Namespace: "test", + Name: "test-model", + } + translatorInstance := translator.NewAdkApiTranslator(kubeClient, defaultModel, nil) + + result, err := translatorInstance.TranslateAgent(ctx, agent) + require.NoError(t, err) + + var deployment *appsv1.Deployment + for _, obj := range result.Manifest { + if dep, ok := obj.(*appsv1.Deployment); ok { + deployment = dep + break + } + } + require.NotNil(t, deployment) + podTemplate := &deployment.Spec.Template + + // Verify container security context meets PSS baseline requirements + containerSecurityContext := podTemplate.Spec.Containers[0].SecurityContext + require.NotNil(t, containerSecurityContext) + assert.False(t, *containerSecurityContext.AllowPrivilegeEscalation, "AllowPrivilegeEscalation should be false") + + // Baseline allows NET_BIND_SERVICE capability + require.NotNil(t, containerSecurityContext.Capabilities) + assert.Contains(t, containerSecurityContext.Capabilities.Add, corev1.Capability("NET_BIND_SERVICE"), "NET_BIND_SERVICE is allowed in baseline") +} + +// TestSecurityContext_PSS_RestrictedV2_WithVolumes tests PSS restricted compliance +// when the agent requires writable volumes (since ReadOnlyRootFilesystem is true). 
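+// The pattern exercised here is to back every writable path (such as /tmp and /cache) with an emptyDir volume so that the root filesystem itself can remain read-only.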
+func TestSecurityContext_PSS_RestrictedV2_WithVolumes(t *testing.T) { + ctx := context.Background() + + // PSS restricted with emptyDir volumes for writable paths + agent := &v1alpha2.Agent{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-agent", + Namespace: "test", + }, + Spec: v1alpha2.AgentSpec{ + Type: v1alpha2.AgentType_Declarative, + Declarative: &v1alpha2.DeclarativeAgentSpec{ + SystemMessage: "Test agent", + ModelConfig: "test-model", + Deployment: &v1alpha2.DeclarativeDeploymentSpec{ + SharedDeploymentSpec: v1alpha2.SharedDeploymentSpec{ + PodSecurityContext: &corev1.PodSecurityContext{ + RunAsNonRoot: ptr.To(true), + RunAsUser: ptr.To(int64(1000)), + FSGroup: ptr.To(int64(1000)), + SeccompProfile: &corev1.SeccompProfile{ + Type: corev1.SeccompProfileTypeRuntimeDefault, + }, + }, + SecurityContext: &corev1.SecurityContext{ + RunAsNonRoot: ptr.To(true), + AllowPrivilegeEscalation: ptr.To(false), + ReadOnlyRootFilesystem: ptr.To(true), + Capabilities: &corev1.Capabilities{ + Drop: []corev1.Capability{"ALL"}, + }, + SeccompProfile: &corev1.SeccompProfile{ + Type: corev1.SeccompProfileTypeRuntimeDefault, + }, + }, + Volumes: []corev1.Volume{ + { + Name: "tmp", + VolumeSource: corev1.VolumeSource{ + EmptyDir: &corev1.EmptyDirVolumeSource{}, + }, + }, + { + Name: "cache", + VolumeSource: corev1.VolumeSource{ + EmptyDir: &corev1.EmptyDirVolumeSource{}, + }, + }, + }, + VolumeMounts: []corev1.VolumeMount{ + { + Name: "tmp", + MountPath: "/tmp", + }, + { + Name: "cache", + MountPath: "/cache", + }, + }, + }, + }, + }, + }, + } + + modelConfig := &v1alpha2.ModelConfig{ + ObjectMeta: metav1.ObjectMeta{ + Name: "test-model", + Namespace: "test", + }, + Spec: v1alpha2.ModelConfigSpec{ + Provider: "OpenAI", + Model: "gpt-4o", + }, + } + + scheme := schemev1.Scheme + err := v1alpha2.AddToScheme(scheme) + require.NoError(t, err) + + kubeClient := fake.NewClientBuilder(). + WithScheme(scheme). + WithObjects(agent, modelConfig). 
+ Build() + + defaultModel := types.NamespacedName{ + Namespace: "test", + Name: "test-model", + } + translatorInstance := translator.NewAdkApiTranslator(kubeClient, defaultModel, nil) + + result, err := translatorInstance.TranslateAgent(ctx, agent) + require.NoError(t, err) + + var deployment *appsv1.Deployment + for _, obj := range result.Manifest { + if dep, ok := obj.(*appsv1.Deployment); ok { + deployment = dep + break + } + } + require.NotNil(t, deployment) + podTemplate := &deployment.Spec.Template + + // Verify PSS restricted compliance + containerSecurityContext := podTemplate.Spec.Containers[0].SecurityContext + require.NotNil(t, containerSecurityContext) + assert.True(t, *containerSecurityContext.ReadOnlyRootFilesystem, "ReadOnlyRootFilesystem should be true") + + // Verify writable volumes are present + volumeNames := make(map[string]bool) + for _, vol := range podTemplate.Spec.Volumes { + volumeNames[vol.Name] = true + } + assert.True(t, volumeNames["tmp"], "tmp volume should be present") + assert.True(t, volumeNames["cache"], "cache volume should be present") + + // Verify volume mounts + var container = podTemplate.Spec.Containers[0] + mountPaths := make(map[string]bool) + for _, mount := range container.VolumeMounts { + mountPaths[mount.MountPath] = true + } + assert.True(t, mountPaths["/tmp"], "/tmp mount should be present") + assert.True(t, mountPaths["/cache"], "/cache mount should be present") +} diff --git a/go/internal/httpserver/auth/oauth2.go b/go/internal/httpserver/auth/oauth2.go new file mode 100644 index 000000000..f27a57001 --- /dev/null +++ b/go/internal/httpserver/auth/oauth2.go @@ -0,0 +1,446 @@ +/* +Copyright 2025. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+*/ + +package auth + +import ( + "context" + "encoding/json" + "errors" + "fmt" + "net/http" + "net/url" + "strings" + "sync" + "time" + + "github.com/lestrrat-go/jwx/v2/jwk" + "github.com/lestrrat-go/jwx/v2/jwt" + + "github.com/kagent-dev/kagent/go/pkg/auth" +) + +var ( + // ErrMissingToken is returned when no bearer token is provided + ErrMissingToken = errors.New("missing bearer token") + // ErrInvalidToken is returned when the token is invalid or expired + ErrInvalidToken = errors.New("invalid or expired token") + // ErrTokenValidation is returned when token validation fails + ErrTokenValidation = errors.New("token validation failed") + // ErrDiscoveryFailed is returned when OIDC discovery fails + ErrDiscoveryFailed = errors.New("OIDC discovery failed") +) + +// OAuth2Config contains configuration for the OAuth2 authenticator +type OAuth2Config struct { + // IssuerURL is the OIDC issuer URL (e.g., https://accounts.google.com) + IssuerURL string + // ClientID is the OAuth2 client ID for token validation + ClientID string + // Audience is the expected audience claim in the token + Audience string + // RequiredScopes are the scopes that must be present in the token + RequiredScopes []string + // UserIDClaim is the claim to use as the user ID (default: "sub") + UserIDClaim string + // RolesClaim is the claim to use for roles (default: "roles") + RolesClaim string + // SkipIssuerValidation disables issuer validation (not recommended for production) + SkipIssuerValidation bool + // SkipExpiryValidation disables expiry validation (not recommended for production) + SkipExpiryValidation bool + // HTTPClient is an optional custom HTTP client for OIDC discovery + HTTPClient *http.Client + // JWKSCacheDuration is how long to cache the JWKS (default: 1 hour) + JWKSCacheDuration time.Duration +} + +// oidcDiscoveryDocument represents the OIDC discovery document +type oidcDiscoveryDocument struct { + Issuer string `json:"issuer"` + AuthorizationEndpoint string `json:"authorization_endpoint"` + TokenEndpoint string `json:"token_endpoint"` + JWKSURI string `json:"jwks_uri"` + UserInfoEndpoint string `json:"userinfo_endpoint,omitempty"` + ScopesSupported []string `json:"scopes_supported,omitempty"` + ClaimsSupported []string `json:"claims_supported,omitempty"` +} + +// OAuth2Authenticator implements the auth.AuthProvider interface for OAuth2/OIDC +type OAuth2Authenticator struct { + config OAuth2Config + httpClient *http.Client + + // JWKS cache + jwksMu sync.RWMutex + jwksCache jwk.Set + jwksCacheTime time.Time + + // Discovery cache + discoveryMu sync.RWMutex + discoveryCache *oidcDiscoveryDocument + discoveryCacheTime time.Time +} + +// OAuth2Session implements the auth.Session interface +type OAuth2Session struct { + principal auth.Principal + accessToken string + claims map[string]interface{} + expiresAt time.Time +} + +func (s *OAuth2Session) Principal() auth.Principal { + return s.principal +} + +// Claims returns the raw JWT claims +func (s *OAuth2Session) Claims() map[string]interface{} { + return s.claims +} + +// AccessToken returns the original access token +func (s *OAuth2Session) AccessToken() string { + return s.accessToken +} + +// ExpiresAt returns when the session expires +func (s *OAuth2Session) ExpiresAt() time.Time { + return s.expiresAt +} + +// NewOAuth2Authenticator creates a new OAuth2 authenticator +func NewOAuth2Authenticator(config OAuth2Config) (*OAuth2Authenticator, error) { + if config.IssuerURL == "" { + return nil, fmt.Errorf("issuer URL is required") + } + + // Set 
defaults + if config.UserIDClaim == "" { + config.UserIDClaim = "sub" + } + if config.RolesClaim == "" { + config.RolesClaim = "roles" + } + if config.JWKSCacheDuration == 0 { + config.JWKSCacheDuration = time.Hour + } + + httpClient := config.HTTPClient + if httpClient == nil { + httpClient = &http.Client{ + Timeout: 30 * time.Second, + } + } + + return &OAuth2Authenticator{ + config: config, + httpClient: httpClient, + }, nil +} + +// Authenticate validates the bearer token and returns a session +func (a *OAuth2Authenticator) Authenticate(ctx context.Context, reqHeaders http.Header, query url.Values) (auth.Session, error) { + // Extract bearer token from Authorization header + token := extractBearerToken(reqHeaders) + if token == "" { + // Also check query parameter for WebSocket connections + token = query.Get("access_token") + } + if token == "" { + return nil, ErrMissingToken + } + + // Validate the token + session, err := a.validateToken(ctx, token) + if err != nil { + return nil, fmt.Errorf("%w: %v", ErrInvalidToken, err) + } + + return session, nil +} + +// UpstreamAuth adds authentication to upstream requests +func (a *OAuth2Authenticator) UpstreamAuth(r *http.Request, session auth.Session, upstreamPrincipal auth.Principal) error { + if session == nil { + return nil + } + + // Forward the user ID in a header + if session.Principal().User.ID != "" { + r.Header.Set("X-User-Id", session.Principal().User.ID) + } + + // If we have an OAuth2Session, forward the access token + if oauth2Session, ok := session.(*OAuth2Session); ok && oauth2Session.accessToken != "" { + r.Header.Set("Authorization", "Bearer "+oauth2Session.accessToken) + } + + return nil +} + +// validateToken validates a JWT token and extracts claims +func (a *OAuth2Authenticator) validateToken(ctx context.Context, tokenString string) (*OAuth2Session, error) { + // Get JWKS for validation + keySet, err := a.getJWKS(ctx) + if err != nil { + return nil, fmt.Errorf("failed to get JWKS: %w", err) + } + + // Build validation options + validateOpts := []jwt.ValidateOption{ + jwt.WithContext(ctx), + } + + // If skipping expiry validation, reset default validators to avoid automatic exp check + if a.config.SkipExpiryValidation { + validateOpts = append(validateOpts, jwt.WithResetValidators(true)) + } + + if !a.config.SkipIssuerValidation && a.config.IssuerURL != "" { + validateOpts = append(validateOpts, jwt.WithIssuer(a.config.IssuerURL)) + } + + if a.config.Audience != "" { + validateOpts = append(validateOpts, jwt.WithAudience(a.config.Audience)) + } + + // Parse and validate the token + // Disable automatic validation during parsing so we can control it + parseOpts := []jwt.ParseOption{ + jwt.WithKeySet(keySet), + jwt.WithValidate(false), // We'll validate manually after parsing + } + + token, err := jwt.ParseString(tokenString, parseOpts...) 
+ if err != nil { + return nil, fmt.Errorf("failed to parse token: %w", err) + } + + // Validate the token + if err := jwt.Validate(token, validateOpts...); err != nil { + return nil, fmt.Errorf("token validation failed: %w", err) + } + + // Check required scopes if configured + if len(a.config.RequiredScopes) > 0 { + if err := a.validateScopes(token); err != nil { + return nil, err + } + } + + // Extract claims for session + claims := make(map[string]interface{}) + for iter := token.Iterate(ctx); iter.Next(ctx); { + pair := iter.Pair() + claims[pair.Key.(string)] = pair.Value + } + + // Extract user ID + userID := "" + if idClaim, ok := claims[a.config.UserIDClaim]; ok { + if id, ok := idClaim.(string); ok { + userID = id + } + } + + // Extract roles + var roles []string + if rolesClaim, ok := claims[a.config.RolesClaim]; ok { + switch r := rolesClaim.(type) { + case []interface{}: + for _, role := range r { + if s, ok := role.(string); ok { + roles = append(roles, s) + } + } + case []string: + roles = r + case string: + // Some providers return roles as a space-separated string + roles = strings.Fields(r) + } + } + + // Get expiration time + expiresAt := token.Expiration() + + return &OAuth2Session{ + principal: auth.Principal{ + User: auth.User{ + ID: userID, + Roles: roles, + }, + }, + accessToken: tokenString, + claims: claims, + expiresAt: expiresAt, + }, nil +} + +// validateScopes checks if the token has all required scopes +func (a *OAuth2Authenticator) validateScopes(token jwt.Token) error { + scopeClaim, ok := token.Get("scope") + if !ok { + // Try "scp" claim (used by some providers like Azure AD) + scopeClaim, ok = token.Get("scp") + } + + if !ok { + return fmt.Errorf("token missing scope claim") + } + + var tokenScopes []string + switch s := scopeClaim.(type) { + case string: + tokenScopes = strings.Fields(s) + case []interface{}: + for _, scope := range s { + if str, ok := scope.(string); ok { + tokenScopes = append(tokenScopes, str) + } + } + case []string: + tokenScopes = s + default: + return fmt.Errorf("unexpected scope claim type: %T", scopeClaim) + } + + scopeSet := make(map[string]bool) + for _, s := range tokenScopes { + scopeSet[s] = true + } + + for _, required := range a.config.RequiredScopes { + if !scopeSet[required] { + return fmt.Errorf("missing required scope: %s", required) + } + } + + return nil +} + +// getJWKS retrieves the JWKS, using cache when available +func (a *OAuth2Authenticator) getJWKS(ctx context.Context) (jwk.Set, error) { + a.jwksMu.RLock() + if a.jwksCache != nil && time.Since(a.jwksCacheTime) < a.config.JWKSCacheDuration { + defer a.jwksMu.RUnlock() + return a.jwksCache, nil + } + a.jwksMu.RUnlock() + + // Need to refresh cache + a.jwksMu.Lock() + defer a.jwksMu.Unlock() + + // Double-check after acquiring write lock + if a.jwksCache != nil && time.Since(a.jwksCacheTime) < a.config.JWKSCacheDuration { + return a.jwksCache, nil + } + + // Get JWKS URI from discovery + discovery, err := a.getDiscoveryDocument(ctx) + if err != nil { + return nil, err + } + + // Fetch JWKS + keySet, err := jwk.Fetch(ctx, discovery.JWKSURI, jwk.WithHTTPClient(a.httpClient)) + if err != nil { + return nil, fmt.Errorf("failed to fetch JWKS: %w", err) + } + + a.jwksCache = keySet + a.jwksCacheTime = time.Now() + + return keySet, nil +} + +// getDiscoveryDocument retrieves the OIDC discovery document +func (a *OAuth2Authenticator) getDiscoveryDocument(ctx context.Context) (*oidcDiscoveryDocument, error) { + a.discoveryMu.RLock() + if a.discoveryCache != nil && 
time.Since(a.discoveryCacheTime) < a.config.JWKSCacheDuration { + defer a.discoveryMu.RUnlock() + return a.discoveryCache, nil + } + a.discoveryMu.RUnlock() + + a.discoveryMu.Lock() + defer a.discoveryMu.Unlock() + + // Double-check after acquiring write lock + if a.discoveryCache != nil && time.Since(a.discoveryCacheTime) < a.config.JWKSCacheDuration { + return a.discoveryCache, nil + } + + // Build discovery URL + issuerURL := strings.TrimSuffix(a.config.IssuerURL, "/") + discoveryURL := issuerURL + "/.well-known/openid-configuration" + + req, err := http.NewRequestWithContext(ctx, http.MethodGet, discoveryURL, nil) + if err != nil { + return nil, fmt.Errorf("failed to create discovery request: %w", err) + } + + resp, err := a.httpClient.Do(req) + if err != nil { + return nil, fmt.Errorf("%w: %v", ErrDiscoveryFailed, err) + } + defer resp.Body.Close() + + if resp.StatusCode != http.StatusOK { + return nil, fmt.Errorf("%w: status %d", ErrDiscoveryFailed, resp.StatusCode) + } + + var discovery oidcDiscoveryDocument + if err := json.NewDecoder(resp.Body).Decode(&discovery); err != nil { + return nil, fmt.Errorf("failed to decode discovery document: %w", err) + } + + a.discoveryCache = &discovery + a.discoveryCacheTime = time.Now() + + return &discovery, nil +} + +// extractBearerToken extracts the bearer token from the Authorization header +func extractBearerToken(headers http.Header) string { + authHeader := headers.Get("Authorization") + if authHeader == "" { + return "" + } + + // Check for Bearer prefix (case-insensitive) + if len(authHeader) > 7 && strings.EqualFold(authHeader[:7], "bearer ") { + return authHeader[7:] + } + + return "" +} + +// ClearCache clears the JWKS and discovery caches +func (a *OAuth2Authenticator) ClearCache() { + a.jwksMu.Lock() + a.jwksCache = nil + a.jwksMu.Unlock() + + a.discoveryMu.Lock() + a.discoveryCache = nil + a.discoveryMu.Unlock() +} + +// Ensure OAuth2Authenticator implements auth.AuthProvider +var _ auth.AuthProvider = (*OAuth2Authenticator)(nil) diff --git a/go/internal/httpserver/auth/oauth2_test.go b/go/internal/httpserver/auth/oauth2_test.go new file mode 100644 index 000000000..180540777 --- /dev/null +++ b/go/internal/httpserver/auth/oauth2_test.go @@ -0,0 +1,736 @@ +/* +Copyright 2025. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+*/ + +package auth + +import ( + "context" + "crypto/rand" + "crypto/rsa" + "encoding/json" + "net/http" + "net/http/httptest" + "net/url" + "testing" + "time" + + "github.com/lestrrat-go/jwx/v2/jwa" + "github.com/lestrrat-go/jwx/v2/jwk" + "github.com/lestrrat-go/jwx/v2/jwt" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + + "github.com/kagent-dev/kagent/go/pkg/auth" +) + +// testOIDCServer creates a mock OIDC server for testing +type testOIDCServer struct { + server *httptest.Server + privateKey *rsa.PrivateKey + keyID string +} + +func newTestOIDCServer(t *testing.T) *testOIDCServer { + t.Helper() + + // Generate RSA key pair for signing + privateKey, err := rsa.GenerateKey(rand.Reader, 2048) + require.NoError(t, err) + + keyID := "test-key-1" + + // Create JWK from public key + publicJWK, err := jwk.FromRaw(privateKey.PublicKey) + require.NoError(t, err) + err = publicJWK.Set(jwk.KeyIDKey, keyID) + require.NoError(t, err) + err = publicJWK.Set(jwk.AlgorithmKey, jwa.RS256) + require.NoError(t, err) + err = publicJWK.Set(jwk.KeyUsageKey, "sig") + require.NoError(t, err) + + keySet := jwk.NewSet() + err = keySet.AddKey(publicJWK) + require.NoError(t, err) + + ts := &testOIDCServer{ + privateKey: privateKey, + keyID: keyID, + } + + // Create test server + mux := http.NewServeMux() + + // OIDC discovery endpoint + mux.HandleFunc("/.well-known/openid-configuration", func(w http.ResponseWriter, r *http.Request) { + discovery := map[string]interface{}{ + "issuer": ts.server.URL, + "authorization_endpoint": ts.server.URL + "/authorize", + "token_endpoint": ts.server.URL + "/token", + "jwks_uri": ts.server.URL + "/jwks", + "userinfo_endpoint": ts.server.URL + "/userinfo", + } + w.Header().Set("Content-Type", "application/json") + json.NewEncoder(w).Encode(discovery) + }) + + // JWKS endpoint + mux.HandleFunc("/jwks", func(w http.ResponseWriter, r *http.Request) { + w.Header().Set("Content-Type", "application/json") + json.NewEncoder(w).Encode(keySet) + }) + + ts.server = httptest.NewServer(mux) + + return ts +} + +func (ts *testOIDCServer) Close() { + ts.server.Close() +} + +func (ts *testOIDCServer) URL() string { + return ts.server.URL +} + +func (ts *testOIDCServer) createToken(t *testing.T, claims map[string]interface{}, expiry time.Duration) string { + t.Helper() + + builder := jwt.NewBuilder() + + // Set standard claims + builder.Issuer(ts.server.URL) + builder.IssuedAt(time.Now()) + builder.Expiration(time.Now().Add(expiry)) + + // Set custom claims + for k, v := range claims { + builder.Claim(k, v) + } + + token, err := builder.Build() + require.NoError(t, err) + + // Create signing key + privateJWK, err := jwk.FromRaw(ts.privateKey) + require.NoError(t, err) + err = privateJWK.Set(jwk.KeyIDKey, ts.keyID) + require.NoError(t, err) + err = privateJWK.Set(jwk.AlgorithmKey, jwa.RS256) + require.NoError(t, err) + + // Sign the token + signed, err := jwt.Sign(token, jwt.WithKey(jwa.RS256, privateJWK)) + require.NoError(t, err) + + return string(signed) +} + +func TestNewOAuth2Authenticator(t *testing.T) { + tests := []struct { + name string + config OAuth2Config + expectError bool + }{ + { + name: "valid config", + config: OAuth2Config{ + IssuerURL: "https://example.com", + ClientID: "test-client", + }, + expectError: false, + }, + { + name: "missing issuer URL", + config: OAuth2Config{ + ClientID: "test-client", + }, + expectError: true, + }, + { + name: "default claims are set", + config: OAuth2Config{ + IssuerURL: "https://example.com", + }, + expectError: 
false, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + auth, err := NewOAuth2Authenticator(tt.config) + if tt.expectError { + assert.Error(t, err) + assert.Nil(t, auth) + } else { + assert.NoError(t, err) + assert.NotNil(t, auth) + } + }) + } +} + +func TestOAuth2Authenticator_Authenticate(t *testing.T) { + ts := newTestOIDCServer(t) + defer ts.Close() + + authenticator, err := NewOAuth2Authenticator(OAuth2Config{ + IssuerURL: ts.URL(), + UserIDClaim: "sub", + RolesClaim: "roles", + }) + require.NoError(t, err) + + tests := []struct { + name string + setupHeaders func() http.Header + setupQuery func() url.Values + expectedUserID string + expectedRoles []string + expectError bool + errorContains string + }{ + { + name: "valid token in Authorization header", + setupHeaders: func() http.Header { + token := ts.createToken(t, map[string]interface{}{ + "sub": "user123", + "roles": []interface{}{"admin", "user"}, + }, time.Hour) + h := http.Header{} + h.Set("Authorization", "Bearer "+token) + return h + }, + setupQuery: func() url.Values { return url.Values{} }, + expectedUserID: "user123", + expectedRoles: []string{"admin", "user"}, + expectError: false, + }, + { + name: "valid token in query parameter", + setupHeaders: func() http.Header { + return http.Header{} + }, + setupQuery: func() url.Values { + token := ts.createToken(t, map[string]interface{}{ + "sub": "user456", + }, time.Hour) + q := url.Values{} + q.Set("access_token", token) + return q + }, + expectedUserID: "user456", + expectError: false, + }, + { + name: "missing token", + setupHeaders: func() http.Header { + return http.Header{} + }, + setupQuery: func() url.Values { return url.Values{} }, + expectError: true, + errorContains: "missing bearer token", + }, + { + name: "expired token", + setupHeaders: func() http.Header { + token := ts.createToken(t, map[string]interface{}{ + "sub": "user789", + }, -time.Hour) // Already expired + h := http.Header{} + h.Set("Authorization", "Bearer "+token) + return h + }, + setupQuery: func() url.Values { return url.Values{} }, + expectError: true, + errorContains: "invalid or expired token", + }, + { + name: "invalid token format", + setupHeaders: func() http.Header { + h := http.Header{} + h.Set("Authorization", "Bearer invalid-token") + return h + }, + setupQuery: func() url.Values { return url.Values{} }, + expectError: true, + errorContains: "invalid or expired token", + }, + { + name: "bearer prefix case insensitive", + setupHeaders: func() http.Header { + token := ts.createToken(t, map[string]interface{}{ + "sub": "user-case", + }, time.Hour) + h := http.Header{} + h.Set("Authorization", "BEARER "+token) + return h + }, + setupQuery: func() url.Values { return url.Values{} }, + expectedUserID: "user-case", + expectError: false, + }, + { + name: "roles as space-separated string", + setupHeaders: func() http.Header { + token := ts.createToken(t, map[string]interface{}{ + "sub": "user-roles", + "roles": "role1 role2 role3", + }, time.Hour) + h := http.Header{} + h.Set("Authorization", "Bearer "+token) + return h + }, + setupQuery: func() url.Values { return url.Values{} }, + expectedUserID: "user-roles", + expectedRoles: []string{"role1", "role2", "role3"}, + expectError: false, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + ctx := context.Background() + session, err := authenticator.Authenticate(ctx, tt.setupHeaders(), tt.setupQuery()) + + if tt.expectError { + assert.Error(t, err) + if tt.errorContains != "" { + 
assert.Contains(t, err.Error(), tt.errorContains) + } + return + } + + require.NoError(t, err) + require.NotNil(t, session) + + principal := session.Principal() + assert.Equal(t, tt.expectedUserID, principal.User.ID) + if len(tt.expectedRoles) > 0 { + assert.Equal(t, tt.expectedRoles, principal.User.Roles) + } + + // Verify it's an OAuth2Session + oauth2Session, ok := session.(*OAuth2Session) + require.True(t, ok) + assert.NotEmpty(t, oauth2Session.AccessToken()) + assert.NotNil(t, oauth2Session.Claims()) + }) + } +} + +func TestOAuth2Authenticator_RequiredScopes(t *testing.T) { + ts := newTestOIDCServer(t) + defer ts.Close() + + authenticator, err := NewOAuth2Authenticator(OAuth2Config{ + IssuerURL: ts.URL(), + RequiredScopes: []string{"read", "write"}, + }) + require.NoError(t, err) + + tests := []struct { + name string + scopes interface{} + expectError bool + }{ + { + name: "has all required scopes as string", + scopes: "read write delete", + expectError: false, + }, + { + name: "has all required scopes as array", + scopes: []interface{}{"read", "write", "admin"}, + expectError: false, + }, + { + name: "missing required scope", + scopes: "read", + expectError: true, + }, + { + name: "no scopes", + scopes: nil, + expectError: true, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + claims := map[string]interface{}{ + "sub": "user123", + } + if tt.scopes != nil { + claims["scope"] = tt.scopes + } + + token := ts.createToken(t, claims, time.Hour) + headers := http.Header{} + headers.Set("Authorization", "Bearer "+token) + + ctx := context.Background() + _, err := authenticator.Authenticate(ctx, headers, url.Values{}) + + if tt.expectError { + assert.Error(t, err) + } else { + assert.NoError(t, err) + } + }) + } +} + +func TestOAuth2Authenticator_AudienceValidation(t *testing.T) { + ts := newTestOIDCServer(t) + defer ts.Close() + + authenticator, err := NewOAuth2Authenticator(OAuth2Config{ + IssuerURL: ts.URL(), + Audience: "my-api", + }) + require.NoError(t, err) + + tests := []struct { + name string + audience interface{} + expectError bool + }{ + { + name: "matching audience", + audience: "my-api", + expectError: false, + }, + { + name: "audience in array", + audience: []interface{}{"other-api", "my-api"}, + expectError: false, + }, + { + name: "wrong audience", + audience: "wrong-api", + expectError: true, + }, + { + name: "missing audience", + audience: nil, + expectError: true, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + claims := map[string]interface{}{ + "sub": "user123", + } + if tt.audience != nil { + claims["aud"] = tt.audience + } + + token := ts.createToken(t, claims, time.Hour) + headers := http.Header{} + headers.Set("Authorization", "Bearer "+token) + + ctx := context.Background() + _, err := authenticator.Authenticate(ctx, headers, url.Values{}) + + if tt.expectError { + assert.Error(t, err) + } else { + assert.NoError(t, err) + } + }) + } +} + +func TestOAuth2Authenticator_UpstreamAuth(t *testing.T) { + authenticator, err := NewOAuth2Authenticator(OAuth2Config{ + IssuerURL: "https://example.com", + }) + require.NoError(t, err) + + tests := []struct { + name string + session auth.Session + expectedUserID string + expectedAuth string + }{ + { + name: "forwards user ID and token", + session: &OAuth2Session{ + principal: auth.Principal{ + User: auth.User{ + ID: "user123", + }, + }, + accessToken: "test-token", + }, + expectedUserID: "user123", + expectedAuth: "Bearer test-token", + }, + { + name: "nil session", 
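+ // A nil session is expected to be a no-op: neither the X-User-Id nor the Authorization header should be set.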
+ session: nil, + expectedUserID: "", + expectedAuth: "", + }, + { + name: "session without token", + session: &OAuth2Session{ + principal: auth.Principal{ + User: auth.User{ + ID: "user456", + }, + }, + }, + expectedUserID: "user456", + expectedAuth: "", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + req := httptest.NewRequest(http.MethodGet, "/test", nil) + + err := authenticator.UpstreamAuth(req, tt.session, auth.Principal{}) + require.NoError(t, err) + + if tt.expectedUserID != "" { + assert.Equal(t, tt.expectedUserID, req.Header.Get("X-User-Id")) + } else { + assert.Empty(t, req.Header.Get("X-User-Id")) + } + + if tt.expectedAuth != "" { + assert.Equal(t, tt.expectedAuth, req.Header.Get("Authorization")) + } else { + assert.Empty(t, req.Header.Get("Authorization")) + } + }) + } +} + +func TestExtractBearerToken(t *testing.T) { + tests := []struct { + name string + authorization string + expected string + }{ + { + name: "valid bearer token", + authorization: "Bearer abc123", + expected: "abc123", + }, + { + name: "bearer lowercase", + authorization: "bearer xyz789", + expected: "xyz789", + }, + { + name: "BEARER uppercase", + authorization: "BEARER TOKEN123", + expected: "TOKEN123", + }, + { + name: "empty header", + authorization: "", + expected: "", + }, + { + name: "basic auth", + authorization: "Basic dXNlcjpwYXNz", + expected: "", + }, + { + name: "no token after bearer", + authorization: "Bearer", + expected: "", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + headers := http.Header{} + if tt.authorization != "" { + headers.Set("Authorization", tt.authorization) + } + result := extractBearerToken(headers) + assert.Equal(t, tt.expected, result) + }) + } +} + +func TestOAuth2Session(t *testing.T) { + session := &OAuth2Session{ + principal: auth.Principal{ + User: auth.User{ + ID: "test-user", + Roles: []string{"admin"}, + }, + Agent: auth.Agent{ + ID: "agent-1", + }, + }, + accessToken: "test-token", + claims: map[string]interface{}{ + "sub": "test-user", + "email": "test@example.com", + }, + expiresAt: time.Now().Add(time.Hour), + } + + t.Run("Principal", func(t *testing.T) { + p := session.Principal() + assert.Equal(t, "test-user", p.User.ID) + assert.Equal(t, []string{"admin"}, p.User.Roles) + assert.Equal(t, "agent-1", p.Agent.ID) + }) + + t.Run("AccessToken", func(t *testing.T) { + assert.Equal(t, "test-token", session.AccessToken()) + }) + + t.Run("Claims", func(t *testing.T) { + claims := session.Claims() + assert.Equal(t, "test-user", claims["sub"]) + assert.Equal(t, "test@example.com", claims["email"]) + }) + + t.Run("ExpiresAt", func(t *testing.T) { + assert.False(t, session.ExpiresAt().IsZero()) + assert.True(t, session.ExpiresAt().After(time.Now())) + }) +} + +func TestOAuth2Authenticator_CacheClearing(t *testing.T) { + ts := newTestOIDCServer(t) + defer ts.Close() + + authenticator, err := NewOAuth2Authenticator(OAuth2Config{ + IssuerURL: ts.URL(), + JWKSCacheDuration: time.Hour, + }) + require.NoError(t, err) + + // Make a request to populate the cache + token := ts.createToken(t, map[string]interface{}{ + "sub": "user123", + }, time.Hour) + headers := http.Header{} + headers.Set("Authorization", "Bearer "+token) + + ctx := context.Background() + _, err = authenticator.Authenticate(ctx, headers, url.Values{}) + require.NoError(t, err) + + // Clear the cache + authenticator.ClearCache() + + // Verify cache is cleared by checking internal state + authenticator.jwksMu.RLock() + assert.Nil(t, 
authenticator.jwksCache) + authenticator.jwksMu.RUnlock() + + authenticator.discoveryMu.RLock() + assert.Nil(t, authenticator.discoveryCache) + authenticator.discoveryMu.RUnlock() + + // Should still work after cache is cleared (will refetch) + _, err = authenticator.Authenticate(ctx, headers, url.Values{}) + require.NoError(t, err) +} + +func TestOAuth2Authenticator_SkipValidation(t *testing.T) { + ts := newTestOIDCServer(t) + defer ts.Close() + + t.Run("skip expiry validation", func(t *testing.T) { + authenticator, err := NewOAuth2Authenticator(OAuth2Config{ + IssuerURL: ts.URL(), + SkipExpiryValidation: true, + }) + require.NoError(t, err) + + // Create an expired token + token := ts.createToken(t, map[string]interface{}{ + "sub": "user123", + }, -time.Hour) + + headers := http.Header{} + headers.Set("Authorization", "Bearer "+token) + + ctx := context.Background() + session, err := authenticator.Authenticate(ctx, headers, url.Values{}) + require.NoError(t, err) + assert.Equal(t, "user123", session.Principal().User.ID) + }) + + t.Run("skip issuer validation", func(t *testing.T) { + authenticator, err := NewOAuth2Authenticator(OAuth2Config{ + IssuerURL: ts.URL(), + SkipIssuerValidation: true, + }) + require.NoError(t, err) + + // The token issuer from ts.createToken matches ts.URL(), + // so this should work regardless + token := ts.createToken(t, map[string]interface{}{ + "sub": "user456", + }, time.Hour) + + headers := http.Header{} + headers.Set("Authorization", "Bearer "+token) + + ctx := context.Background() + session, err := authenticator.Authenticate(ctx, headers, url.Values{}) + require.NoError(t, err) + assert.Equal(t, "user456", session.Principal().User.ID) + }) +} + +func TestOAuth2Authenticator_CustomClaims(t *testing.T) { + ts := newTestOIDCServer(t) + defer ts.Close() + + authenticator, err := NewOAuth2Authenticator(OAuth2Config{ + IssuerURL: ts.URL(), + UserIDClaim: "email", + RolesClaim: "groups", + }) + require.NoError(t, err) + + token := ts.createToken(t, map[string]interface{}{ + "sub": "user123", + "email": "test@example.com", + "groups": []interface{}{"developers", "admins"}, + }, time.Hour) + + headers := http.Header{} + headers.Set("Authorization", "Bearer "+token) + + ctx := context.Background() + session, err := authenticator.Authenticate(ctx, headers, url.Values{}) + require.NoError(t, err) + + // Should use email as user ID instead of sub + assert.Equal(t, "test@example.com", session.Principal().User.ID) + // Should use groups as roles instead of roles + assert.Equal(t, []string{"developers", "admins"}, session.Principal().User.Roles) +} + +// Verify interface compliance +var _ auth.Session = (*OAuth2Session)(nil) +var _ auth.AuthProvider = (*OAuth2Authenticator)(nil) diff --git a/go/internal/httpserver/middleware.go b/go/internal/httpserver/middleware.go index 23170352d..bc25eac97 100644 --- a/go/internal/httpserver/middleware.go +++ b/go/internal/httpserver/middleware.go @@ -2,12 +2,163 @@ package httpserver import ( "net/http" + "os" + "regexp" + "strings" "time" + "github.com/google/uuid" "github.com/kagent-dev/kagent/go/internal/httpserver/handlers" + "github.com/kagent-dev/kagent/go/pkg/auth" ctrllog "sigs.k8s.io/controller-runtime/pkg/log" ) +// AuditLogConfig holds configuration for the audit logging middleware +type AuditLogConfig struct { + // Enabled controls whether audit logging is active + Enabled bool + // LogLevel controls the verbosity level (0=Info, 1+=Debug) + LogLevel int + // IncludeHeaders specifies which headers to include in audit logs 
(for compliance) + IncludeHeaders []string +} + +// DefaultAuditLogConfig returns the default audit logging configuration +func DefaultAuditLogConfig() AuditLogConfig { + enabled := os.Getenv("KAGENT_AUDIT_LOG_ENABLED") != "false" + return AuditLogConfig{ + Enabled: enabled, + LogLevel: 0, + IncludeHeaders: []string{}, + } +} + +// requestIDKey is the context key for request ID +type requestIDKey struct{} + +// namespacePattern matches namespace in API paths like /api/agents/{namespace}/{name} +var namespacePattern = regexp.MustCompile(`^/api/[^/]+/([^/]+)(?:/|$)`) + +// auditLoggingMiddleware creates a middleware for structured audit logging +// It logs compliance-ready audit trail with user, namespace, action, result, and duration +func auditLoggingMiddleware(config AuditLogConfig) func(http.Handler) http.Handler { + return func(next http.Handler) http.Handler { + return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + if !config.Enabled { + next.ServeHTTP(w, r) + return + } + + start := time.Now() + + // Generate or extract request ID for correlation + requestID := r.Header.Get("X-Request-ID") + if requestID == "" { + requestID = uuid.New().String() + } + + // Extract user information from auth session + userID := "anonymous" + userRoles := []string{} + if session, ok := auth.AuthSessionFrom(r.Context()); ok && session != nil { + principal := session.Principal() + if principal.User.ID != "" { + userID = principal.User.ID + } + userRoles = principal.User.Roles + } + + // Extract namespace from path or header + namespace := extractNamespace(r) + + // Build action string (HTTP method + path pattern) + action := r.Method + " " + r.URL.Path + + // Create audit logger with structured fields + auditLog := ctrllog.Log.WithName("audit").WithValues( + "request_id", requestID, + "timestamp", start.UTC().Format(time.RFC3339Nano), + "user", userID, + "user_roles", userRoles, + "namespace", namespace, + "action", action, + "method", r.Method, + "path", r.URL.Path, + "remote_addr", r.RemoteAddr, + "user_agent", r.Header.Get("User-Agent"), + ) + + // Include specified headers for compliance if configured + for _, header := range config.IncludeHeaders { + if val := r.Header.Get(header); val != "" { + auditLog = auditLog.WithValues("header_"+strings.ToLower(strings.ReplaceAll(header, "-", "_")), val) + } + } + + // Wrap response writer to capture status code + ww := newStatusResponseWriter(w) + + // Log request start at configured level + auditLog.V(config.LogLevel).Info("Audit: request started") + + // Serve the request + next.ServeHTTP(ww, r) + + // Calculate duration + duration := time.Since(start) + + // Determine result category for compliance + resultCategory := categorizeResult(ww.status) + + // Log request completion with full audit trail + auditLog.Info("Audit: request completed", + "status", ww.status, + "result", resultCategory, + "duration_ms", duration.Milliseconds(), + "duration", duration.String(), + ) + }) + } +} + +// extractNamespace extracts the namespace from the request path or headers +func extractNamespace(r *http.Request) string { + // First, try to extract from the URL path pattern + // Patterns like /api/agents/{namespace}/{name} or /api/sessions/agent/{namespace}/{name} + matches := namespacePattern.FindStringSubmatch(r.URL.Path) + if len(matches) > 1 { + return matches[1] + } + + // Try query parameter + if ns := r.URL.Query().Get("namespace"); ns != "" { + return ns + } + + // Try header + if ns := r.Header.Get("X-Namespace"); ns != "" { + return ns + } + + 
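// nothing matched in the path, query parameters, or headers; fall back to a sentinel value +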
return "unknown" +} + +// categorizeResult returns a human-readable result category for the status code +func categorizeResult(status int) string { + switch { + case status >= 200 && status < 300: + return "success" + case status >= 300 && status < 400: + return "redirect" + case status >= 400 && status < 500: + return "client_error" + case status >= 500: + return "server_error" + default: + return "unknown" + } +} + func loggingMiddleware(next http.Handler) http.Handler { return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { start := time.Now() diff --git a/go/internal/httpserver/middleware_audit_test.go b/go/internal/httpserver/middleware_audit_test.go new file mode 100644 index 000000000..6ef4e24c4 --- /dev/null +++ b/go/internal/httpserver/middleware_audit_test.go @@ -0,0 +1,357 @@ +package httpserver + +import ( + "net/http" + "net/http/httptest" + "testing" + + "github.com/kagent-dev/kagent/go/pkg/auth" +) + +// mockSession implements auth.Session for testing +type mockSession struct { + principal auth.Principal +} + +func (m *mockSession) Principal() auth.Principal { + return m.principal +} + +func TestAuditLoggingMiddleware_Disabled(t *testing.T) { + config := AuditLogConfig{ + Enabled: false, + } + + handlerCalled := false + handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + handlerCalled = true + w.WriteHeader(http.StatusOK) + }) + + middleware := auditLoggingMiddleware(config) + wrappedHandler := middleware(handler) + + req := httptest.NewRequest(http.MethodGet, "/api/agents/default/test", nil) + rec := httptest.NewRecorder() + + wrappedHandler.ServeHTTP(rec, req) + + if !handlerCalled { + t.Error("Expected handler to be called when audit logging is disabled") + } + if rec.Code != http.StatusOK { + t.Errorf("Expected status 200, got %d", rec.Code) + } +} + +func TestAuditLoggingMiddleware_Enabled(t *testing.T) { + config := AuditLogConfig{ + Enabled: true, + LogLevel: 0, + } + + handlerCalled := false + handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + handlerCalled = true + w.WriteHeader(http.StatusCreated) + }) + + middleware := auditLoggingMiddleware(config) + wrappedHandler := middleware(handler) + + req := httptest.NewRequest(http.MethodPost, "/api/agents/my-namespace/my-agent", nil) + rec := httptest.NewRecorder() + + wrappedHandler.ServeHTTP(rec, req) + + if !handlerCalled { + t.Error("Expected handler to be called when audit logging is enabled") + } + if rec.Code != http.StatusCreated { + t.Errorf("Expected status 201, got %d", rec.Code) + } +} + +func TestAuditLoggingMiddleware_WithAuthenticatedUser(t *testing.T) { + config := AuditLogConfig{ + Enabled: true, + LogLevel: 0, + } + + handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusOK) + }) + + middleware := auditLoggingMiddleware(config) + wrappedHandler := middleware(handler) + + // Create request with auth session in context + req := httptest.NewRequest(http.MethodGet, "/api/agents/test-ns/test-agent", nil) + session := &mockSession{ + principal: auth.Principal{ + User: auth.User{ + ID: "test-user-123", + Roles: []string{"admin", "user"}, + }, + }, + } + ctx := auth.AuthSessionTo(req.Context(), session) + req = req.WithContext(ctx) + + rec := httptest.NewRecorder() + wrappedHandler.ServeHTTP(rec, req) + + if rec.Code != http.StatusOK { + t.Errorf("Expected status 200, got %d", rec.Code) + } +} + +func TestAuditLoggingMiddleware_WithRequestID(t *testing.T) { + config := AuditLogConfig{ + Enabled: 
true, + LogLevel: 0, + } + + handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusOK) + }) + + middleware := auditLoggingMiddleware(config) + wrappedHandler := middleware(handler) + + req := httptest.NewRequest(http.MethodGet, "/api/agents", nil) + req.Header.Set("X-Request-ID", "custom-request-id-12345") + rec := httptest.NewRecorder() + + wrappedHandler.ServeHTTP(rec, req) + + if rec.Code != http.StatusOK { + t.Errorf("Expected status 200, got %d", rec.Code) + } +} + +func TestAuditLoggingMiddleware_WithIncludeHeaders(t *testing.T) { + config := AuditLogConfig{ + Enabled: true, + LogLevel: 0, + IncludeHeaders: []string{"X-Correlation-ID", "X-Trace-ID"}, + } + + handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusOK) + }) + + middleware := auditLoggingMiddleware(config) + wrappedHandler := middleware(handler) + + req := httptest.NewRequest(http.MethodGet, "/api/agents", nil) + req.Header.Set("X-Correlation-ID", "corr-12345") + req.Header.Set("X-Trace-ID", "trace-67890") + rec := httptest.NewRecorder() + + wrappedHandler.ServeHTTP(rec, req) + + if rec.Code != http.StatusOK { + t.Errorf("Expected status 200, got %d", rec.Code) + } +} + +func TestExtractNamespace(t *testing.T) { + tests := []struct { + name string + path string + query string + header string + expected string + }{ + { + name: "extract from path - agents", + path: "/api/agents/my-namespace/my-agent", + expected: "my-namespace", + }, + { + name: "extract from path - sessions", + path: "/api/sessions/agent/test-ns/test-agent", + expected: "agent", // The first segment after /api/sessions/ + }, + { + name: "extract from path - tools", + path: "/api/tools/production/my-tool", + expected: "production", + }, + { + name: "extract from query parameter", + path: "/api/agents", + query: "namespace=query-ns", + expected: "query-ns", + }, + { + name: "extract from header", + path: "/api/agents", + header: "header-ns", + expected: "header-ns", + }, + { + name: "fallback to unknown", + path: "/api/agents", + expected: "unknown", + }, + { + name: "path takes precedence over query", + path: "/api/agents/path-ns/agent", + query: "namespace=query-ns", + expected: "path-ns", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + url := tt.path + if tt.query != "" { + url += "?" 
+ tt.query + } + req := httptest.NewRequest(http.MethodGet, url, nil) + if tt.header != "" { + req.Header.Set("X-Namespace", tt.header) + } + + result := extractNamespace(req) + + if result != tt.expected { + t.Errorf("extractNamespace() = %q, want %q", result, tt.expected) + } + }) + } +} + +func TestCategorizeResult(t *testing.T) { + tests := []struct { + status int + expected string + }{ + {200, "success"}, + {201, "success"}, + {204, "success"}, + {299, "success"}, + {301, "redirect"}, + {302, "redirect"}, + {304, "redirect"}, + {400, "client_error"}, + {401, "client_error"}, + {403, "client_error"}, + {404, "client_error"}, + {422, "client_error"}, + {500, "server_error"}, + {502, "server_error"}, + {503, "server_error"}, + {100, "unknown"}, + {199, "unknown"}, + } + + for _, tt := range tests { + t.Run(http.StatusText(tt.status), func(t *testing.T) { + result := categorizeResult(tt.status) + if result != tt.expected { + t.Errorf("categorizeResult(%d) = %q, want %q", tt.status, result, tt.expected) + } + }) + } +} + +func TestDefaultAuditLogConfig(t *testing.T) { + // Test default config (without env var set) + config := DefaultAuditLogConfig() + + if !config.Enabled { + t.Error("Expected audit logging to be enabled by default") + } + if config.LogLevel != 0 { + t.Errorf("Expected LogLevel 0, got %d", config.LogLevel) + } + if len(config.IncludeHeaders) != 0 { + t.Errorf("Expected empty IncludeHeaders, got %v", config.IncludeHeaders) + } +} + +func TestAuditLoggingMiddleware_ErrorStatus(t *testing.T) { + config := AuditLogConfig{ + Enabled: true, + LogLevel: 0, + } + + handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusInternalServerError) + }) + + middleware := auditLoggingMiddleware(config) + wrappedHandler := middleware(handler) + + req := httptest.NewRequest(http.MethodGet, "/api/agents", nil) + rec := httptest.NewRecorder() + + wrappedHandler.ServeHTTP(rec, req) + + if rec.Code != http.StatusInternalServerError { + t.Errorf("Expected status 500, got %d", rec.Code) + } +} + +func TestAuditLoggingMiddleware_AnonymousUser(t *testing.T) { + config := AuditLogConfig{ + Enabled: true, + LogLevel: 0, + } + + handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusOK) + }) + + middleware := auditLoggingMiddleware(config) + wrappedHandler := middleware(handler) + + // Request without auth session + req := httptest.NewRequest(http.MethodGet, "/api/health", nil) + rec := httptest.NewRecorder() + + wrappedHandler.ServeHTTP(rec, req) + + if rec.Code != http.StatusOK { + t.Errorf("Expected status 200, got %d", rec.Code) + } +} + +// TestAuditLoggingMiddleware_AllHTTPMethods tests that all HTTP methods are logged +func TestAuditLoggingMiddleware_AllHTTPMethods(t *testing.T) { + config := AuditLogConfig{ + Enabled: true, + LogLevel: 0, + } + + methods := []string{ + http.MethodGet, + http.MethodPost, + http.MethodPut, + http.MethodDelete, + http.MethodPatch, + } + + for _, method := range methods { + t.Run(method, func(t *testing.T) { + handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusOK) + }) + + middleware := auditLoggingMiddleware(config) + wrappedHandler := middleware(handler) + + req := httptest.NewRequest(method, "/api/agents/ns/agent", nil) + rec := httptest.NewRecorder() + + wrappedHandler.ServeHTTP(rec, req) + + if rec.Code != http.StatusOK { + t.Errorf("Expected status 200 for %s, got %d", method, rec.Code) + } + }) + } +} diff --git 
a/go/internal/httpserver/server.go b/go/internal/httpserver/server.go index 0875cf025..6be6caa4a 100644 --- a/go/internal/httpserver/server.go +++ b/go/internal/httpserver/server.go @@ -226,6 +226,7 @@ func (s *HTTPServer) setupRoutes() { // Use middleware for common functionality s.router.Use(auth.AuthnMiddleware(s.authenticator)) + s.router.Use(auditLoggingMiddleware(DefaultAuditLogConfig())) s.router.Use(contentTypeMiddleware) s.router.Use(loggingMiddleware) s.router.Use(errorHandlerMiddleware) diff --git a/go/pkg/app/app.go b/go/pkg/app/app.go index 8ee3b9589..312d0392d 100644 --- a/go/pkg/app/app.go +++ b/go/pkg/app/app.go @@ -379,6 +379,7 @@ func Start(getExtensionConfig GetExtensionConfig) { mgr.GetClient(), dbClient, cfg.DefaultModelConfig, + watchNamespacesList, ) if err := (&controller.ServiceController{ diff --git a/helm/kagent/templates/pdb.yaml b/helm/kagent/templates/pdb.yaml new file mode 100644 index 000000000..dcb67d310 --- /dev/null +++ b/helm/kagent/templates/pdb.yaml @@ -0,0 +1,40 @@ +{{- if .Values.pdb.enabled }} +--- +apiVersion: policy/v1 +kind: PodDisruptionBudget +metadata: + name: {{ include "kagent.fullname" . }}-controller + namespace: {{ include "kagent.namespace" . }} + labels: + {{- include "kagent.controller.labels" . | nindent 4 }} +spec: + {{- if .Values.pdb.controller.minAvailable }} + minAvailable: {{ .Values.pdb.controller.minAvailable }} + {{- else if .Values.pdb.controller.maxUnavailable }} + maxUnavailable: {{ .Values.pdb.controller.maxUnavailable }} + {{- else }} + minAvailable: 1 + {{- end }} + selector: + matchLabels: + {{- include "kagent.controller.selectorLabels" . | nindent 6 }} +--- +apiVersion: policy/v1 +kind: PodDisruptionBudget +metadata: + name: {{ include "kagent.fullname" . }}-ui + namespace: {{ include "kagent.namespace" . }} + labels: + {{- include "kagent.ui.labels" . | nindent 4 }} +spec: + {{- if .Values.pdb.ui.minAvailable }} + minAvailable: {{ .Values.pdb.ui.minAvailable }} + {{- else if .Values.pdb.ui.maxUnavailable }} + maxUnavailable: {{ .Values.pdb.ui.maxUnavailable }} + {{- else }} + minAvailable: 1 + {{- end }} + selector: + matchLabels: + {{- include "kagent.ui.selectorLabels" . | nindent 6 }} +{{- end }} diff --git a/helm/kagent/templates/prometheusrule.yaml b/helm/kagent/templates/prometheusrule.yaml new file mode 100644 index 000000000..c59a6d96a --- /dev/null +++ b/helm/kagent/templates/prometheusrule.yaml @@ -0,0 +1,65 @@ +{{- if and .Values.metrics.enabled .Values.metrics.prometheusRule.enabled }} +apiVersion: monitoring.coreos.com/v1 +kind: PrometheusRule +metadata: + name: {{ include "kagent.fullname" . }}-controller + namespace: {{ include "kagent.namespace" . }} + labels: + {{- include "kagent.controller.labels" . | nindent 4 }} + {{- with .Values.metrics.prometheusRule.labels }} + {{- toYaml . | nindent 4 }} + {{- end }} +spec: + groups: + - name: kagent-controller + rules: + {{- if .Values.metrics.prometheusRule.defaultRules.controllerDown }} + - alert: KagentControllerDown + expr: absent(up{job="{{ include "kagent.fullname" . }}-controller"} == 1) + for: 5m + labels: + severity: critical + {{- with .Values.metrics.prometheusRule.alertLabels }} + {{- toYaml . | nindent 12 }} + {{- end }} + annotations: + summary: Kagent controller is down + description: "Kagent controller has been down for more than 5 minutes." + {{- end }} + {{- if .Values.metrics.prometheusRule.defaultRules.highErrorRate }} + - alert: KagentHighErrorRate + expr: | + ( + sum(rate(http_requests_total{job="{{ include "kagent.fullname" . 
}}-controller", status=~"5.."}[5m])) + / + sum(rate(http_requests_total{job="{{ include "kagent.fullname" . }}-controller"}[5m])) + ) > {{ .Values.metrics.prometheusRule.thresholds.errorRatePercent | default 0.05 }} + for: 5m + labels: + severity: warning + {{- with .Values.metrics.prometheusRule.alertLabels }} + {{- toYaml . | nindent 12 }} + {{- end }} + annotations: + summary: Kagent controller high error rate + description: "Kagent controller error rate is above {{ mul (.Values.metrics.prometheusRule.thresholds.errorRatePercent | default 0.05) 100 }}% for more than 5 minutes." + {{- end }} + {{- if .Values.metrics.prometheusRule.defaultRules.highLatency }} + - alert: KagentHighLatency + expr: | + histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{job="{{ include "kagent.fullname" . }}-controller"}[5m])) by (le)) + > {{ .Values.metrics.prometheusRule.thresholds.latencySeconds | default 5 }} + for: 5m + labels: + severity: warning + {{- with .Values.metrics.prometheusRule.alertLabels }} + {{- toYaml . | nindent 12 }} + {{- end }} + annotations: + summary: Kagent controller high latency + description: "Kagent controller 99th percentile latency is above {{ .Values.metrics.prometheusRule.thresholds.latencySeconds | default 5 }}s for more than 5 minutes." + {{- end }} + {{- with .Values.metrics.prometheusRule.additionalRules }} + {{- toYaml . | nindent 8 }} + {{- end }} +{{- end }} diff --git a/helm/kagent/templates/servicemonitor.yaml b/helm/kagent/templates/servicemonitor.yaml new file mode 100644 index 000000000..c8ac2083c --- /dev/null +++ b/helm/kagent/templates/servicemonitor.yaml @@ -0,0 +1,41 @@ +{{- if and .Values.metrics.enabled .Values.metrics.serviceMonitor.enabled }} +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + name: {{ include "kagent.fullname" . }}-controller + namespace: {{ include "kagent.namespace" . }} + labels: + {{- include "kagent.controller.labels" . | nindent 4 }} + {{- with .Values.metrics.serviceMonitor.labels }} + {{- toYaml . | nindent 4 }} + {{- end }} +spec: + selector: + matchLabels: + {{- include "kagent.controller.selectorLabels" . | nindent 6 }} + namespaceSelector: + matchNames: + - {{ include "kagent.namespace" . }} + endpoints: + - port: controller + {{- with .Values.metrics.serviceMonitor.interval }} + interval: {{ . }} + {{- end }} + {{- with .Values.metrics.serviceMonitor.scrapeTimeout }} + scrapeTimeout: {{ . }} + {{- end }} + path: {{ .Values.metrics.serviceMonitor.path | default "/metrics" }} + scheme: {{ .Values.metrics.serviceMonitor.scheme | default "http" }} + {{- with .Values.metrics.serviceMonitor.tlsConfig }} + tlsConfig: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.metrics.serviceMonitor.relabelings }} + relabelings: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.metrics.serviceMonitor.metricRelabelings }} + metricRelabelings: + {{- toYaml . 
| nindent 8 }} + {{- end }} +{{- end }} diff --git a/helm/kagent/tests/pdb_test.yaml b/helm/kagent/tests/pdb_test.yaml new file mode 100644 index 000000000..933e2ee83 --- /dev/null +++ b/helm/kagent/tests/pdb_test.yaml @@ -0,0 +1,158 @@ +suite: test pod disruption budgets +templates: + - pdb.yaml +tests: + - it: should not render PDBs when disabled + set: + pdb: + enabled: false + asserts: + - hasDocuments: + count: 0 + + - it: should render PDBs when enabled + set: + pdb: + enabled: true + controller: + minAvailable: 1 + ui: + minAvailable: 1 + asserts: + - hasDocuments: + count: 2 + + - it: should render controller PDB with correct name and labels + set: + pdb: + enabled: true + controller: + minAvailable: 1 + ui: + minAvailable: 1 + documentIndex: 0 + asserts: + - isKind: + of: PodDisruptionBudget + - equal: + path: metadata.name + value: RELEASE-NAME-controller + - equal: + path: metadata.labels["app.kubernetes.io/component"] + value: controller + + - it: should render ui PDB with correct name and labels + set: + pdb: + enabled: true + controller: + minAvailable: 1 + ui: + minAvailable: 1 + documentIndex: 1 + asserts: + - isKind: + of: PodDisruptionBudget + - equal: + path: metadata.name + value: RELEASE-NAME-ui + - equal: + path: metadata.labels["app.kubernetes.io/component"] + value: ui + + - it: should use minAvailable for controller when specified + set: + pdb: + enabled: true + controller: + minAvailable: 2 + ui: + minAvailable: 1 + documentIndex: 0 + asserts: + - equal: + path: spec.minAvailable + value: 2 + + - it: should use maxUnavailable for controller when specified + set: + pdb: + enabled: true + controller: + maxUnavailable: 1 + ui: + minAvailable: 1 + documentIndex: 0 + asserts: + - equal: + path: spec.maxUnavailable + value: 1 + + - it: should use minAvailable for ui when specified + set: + pdb: + enabled: true + controller: + minAvailable: 1 + ui: + minAvailable: 2 + documentIndex: 1 + asserts: + - equal: + path: spec.minAvailable + value: 2 + + - it: should use maxUnavailable for ui when specified + set: + pdb: + enabled: true + controller: + minAvailable: 1 + ui: + maxUnavailable: 1 + documentIndex: 1 + asserts: + - equal: + path: spec.maxUnavailable + value: 1 + + - it: should default to minAvailable 1 when neither is specified for controller + set: + pdb: + enabled: true + controller: {} + ui: + minAvailable: 1 + documentIndex: 0 + asserts: + - equal: + path: spec.minAvailable + value: 1 + + - it: should match controller selector labels + set: + pdb: + enabled: true + controller: + minAvailable: 1 + ui: + minAvailable: 1 + documentIndex: 0 + asserts: + - equal: + path: spec.selector.matchLabels["app.kubernetes.io/component"] + value: controller + + - it: should match ui selector labels + set: + pdb: + enabled: true + controller: + minAvailable: 1 + ui: + minAvailable: 1 + documentIndex: 1 + asserts: + - equal: + path: spec.selector.matchLabels["app.kubernetes.io/component"] + value: ui diff --git a/helm/kagent/values.yaml b/helm/kagent/values.yaml index 1bb8168e6..0aa26dd00 100644 --- a/helm/kagent/values.yaml +++ b/helm/kagent/values.yaml @@ -46,6 +46,26 @@ tolerations: [] # -- Node labels to match for `Pod` [scheduling](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/). 
nodeSelector: {} +# ============================================================================== +# HIGH AVAILABILITY CONFIGURATION +# ============================================================================== + +pdb: + # -- Enable PodDisruptionBudgets for controller and UI deployments + enabled: false + controller: + # -- Minimum number of controller pods that must be available during disruption + # Only one of minAvailable or maxUnavailable can be set + minAvailable: 1 + # -- Maximum number of controller pods that can be unavailable during disruption + # maxUnavailable: 1 + ui: + # -- Minimum number of UI pods that must be available during disruption + # Only one of minAvailable or maxUnavailable can be set + minAvailable: 1 + # -- Maximum number of UI pods that can be unavailable during disruption + # maxUnavailable: 1 + # ============================================================================== # DATABASE CONFIGURATION # ============================================================================== @@ -76,8 +96,11 @@ controller: maxBufSize: 1Mi # 1024 * 1024 initialBufSize: 4Ki # 4 * 1024 timeout: 600s # 600 seconds - # -- Namespaces the controller should watch. - # If empty, the controller will watch ALL available namespaces. + # -- Namespaces the controller should watch (namespace-scoped mode). + # When set, the controller enforces namespace isolation and only reconciles + # resources within the specified namespaces. This enables multi-tenant + # deployments where different tenants' resources are isolated. + # If empty, the controller watches ALL namespaces (cluster-wide mode). # @default -- [] (watches all available namespaces) watchNamespaces: [] # - watch-ns-1 @@ -380,3 +403,64 @@ otel: endpoint: http://host.docker.internal:4317 timeout: 15 insecure: true + +# ============================================================================== +# MONITORING CONFIGURATION (Prometheus Operator) +# ============================================================================== + +metrics: + # -- Enable metrics (controller exposes metrics by default) + enabled: true + + serviceMonitor: + # -- Enable ServiceMonitor for Prometheus Operator + enabled: false + # -- Additional labels for the ServiceMonitor + labels: {} + # -- Scrape interval (e.g., "30s") + # @default -- Prometheus default + interval: "" + # -- Scrape timeout (e.g., "10s") + # @default -- Prometheus default + scrapeTimeout: "" + # -- Metrics path + path: /metrics + # -- Scheme to use for scraping (http or https) + scheme: http + # -- TLS configuration for scraping + tlsConfig: {} + # -- RelabelConfigs to apply to samples before scraping + relabelings: [] + # -- MetricRelabelConfigs to apply to samples before ingestion + metricRelabelings: [] + + prometheusRule: + # -- Enable PrometheusRule for alerting + enabled: false + # -- Additional labels for the PrometheusRule + labels: {} + # -- Additional labels to add to all alerts + alertLabels: {} + # -- Enable/disable default alerting rules + defaultRules: + # -- Alert when controller is down + controllerDown: true + # -- Alert on high error rate + highErrorRate: true + # -- Alert on high latency + highLatency: true + # -- Thresholds for default alerts + thresholds: + # -- Error rate threshold (0.05 = 5%) + errorRatePercent: 0.05 + # -- Latency threshold in seconds (p99) + latencySeconds: 5 + # -- Additional custom alerting rules + additionalRules: [] + # - alert: CustomAlert + # expr: some_metric > 100 + # for: 5m + # labels: + # severity: warning + # annotations: + # 
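description: "Longer description of the custom alert" + #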
summary: Custom alert triggered