@kyungho-for-ops
Contributor

Overview

This PR adds a node-not-ready event type to khook, so that it can detect Kubernetes nodes that are not ready and trigger an agent response automatically.

What Changed

✨ New Features

  • Added node-not-ready event type to supported event configurations
  • Implemented dedicated node event processing logic (mapNodeEventType)
  • Enhanced event filtering to handle Node and Pod events separately

🔧 Technical Changes

  • API Changes: Extended EventConfiguration enum with node-not-ready type
  • Event Processing: Added mapNodeEventType function for node-specific events
  • Event Filtering: Modified event type mapping to distinguish Pod vs Node events
  • CRD Updates: Updated Kubernetes CRD schemas and Helm chart CRDs
  • Code Generation: Updated auto-generated deepcopy code
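
The Pod-versus-Node split described above might look roughly like the following. This is only a sketch under stated assumptions: the Event struct is a hypothetical stand-in for the Kubernetes core/v1 Event, mapEventType and mapPodEventType are placeholder names, and matching on the NodeNotReady reason is a guess at how mapNodeEventType decides, not the PR's actual code.

```go
package main

import "fmt"

// Event is a hypothetical, trimmed-down stand-in for a Kubernetes core/v1 Event.
type Event struct {
	Kind   string // kind of the involved object, e.g. "Pod" or "Node"
	Name   string // name of the involved object
	Reason string // short machine-readable reason for the event
}

// mapNodeEventType maps a Node event to a khook event type.
// Assumption: the "NodeNotReady" reason is what signals node-not-ready.
func mapNodeEventType(e Event) string {
	if e.Reason == "NodeNotReady" {
		return "node-not-ready"
	}
	return "" // not an event type this sketch knows about
}

// mapPodEventType is a placeholder for the existing Pod mapping,
// which this sketch does not try to reproduce.
func mapPodEventType(e Event) string {
	return ""
}

// mapEventType (hypothetical name) dispatches on the involved object's kind,
// so Node reasons are never interpreted as Pod reasons and vice versa.
func mapEventType(e Event) string {
	switch e.Kind {
	case "Node":
		return mapNodeEventType(e)
	case "Pod":
		return mapPodEventType(e)
	default:
		return ""
	}
}

func main() {
	e := Event{Kind: "Node", Name: "worker-1", Reason: "NodeNotReady"}
	fmt.Println(mapEventType(e))
}
```

Dispatching on the object kind first keeps the existing Pod path untouched, which is consistent with the PR's claim that the change is purely additive.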

📚 Documentation Updates

  • Updated README.md with node-not-ready event type documentation
  • Added comprehensive node recovery agent example configuration
  • Enhanced event types table with node monitoring capabilities

Files Modified

api/v1alpha2/hook_types.go                  # Added node-not-ready validation
internal/event/watcher.go                   # Implemented node event detection
README.md                                   # Updated docs and examples
config/crd/bases/kagent.dev_hooks.yaml      # Updated CRD schema
helm/khook-crds/crds/kagent.dev_hooks.yaml  # Updated Helm CRD
api/v1alpha2/zz_generated.deepcopy.go       # Auto-generated updates
.gitignore                                  # Excluded build scripts

Usage Example

apiVersion: kagent.dev/v1alpha2
kind: Hook
metadata:
  name: node-monitoring-hook
spec:
  eventConfigurations:
  - eventType: node-not-ready
    agentId: node-recovery-specialist
    prompt: |
      CRITICAL: Node {{.ResourceName}} is not ready at {{.EventTime}}.
      
      AUTONOMOUS MODE: Diagnose and resolve node issues immediately:
      • Check node conditions (Ready, MemoryPressure, DiskPressure, PIDPressure)
      • Analyze kubelet logs and system resources
      • Verify network connectivity and DNS resolution
      • Attempt node recovery procedures
      • If recovery fails, safely drain and replace node
      • Annotate fixed resources with: kagentFix=<timestamp>
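
The {{.ResourceName}} and {{.EventTime}} placeholders in the prompt follow Go text/template syntax. A minimal sketch of how such a prompt could be expanded is shown below, assuming the prompt is rendered with text/template; promptData and renderPrompt are hypothetical names, and only the two field names come from the example above.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
	"time"
)

// promptData is a hypothetical container for the template values.
// Only the field names ResourceName and EventTime appear in the example prompt.
type promptData struct {
	ResourceName string
	EventTime    string
}

// renderPrompt parses the prompt as a Go text/template and fills in the data.
func renderPrompt(tmpl string, data promptData) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := promptData{
		ResourceName: "worker-1", // illustrative node name
		EventTime:    time.Date(2025, 10, 20, 11, 29, 0, 0, time.UTC).Format(time.RFC3339),
	}
	out, err := renderPrompt("CRITICAL: Node {{.ResourceName}} is not ready at {{.EventTime}}.", data)
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Println(out)
}
```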

Testing

  • Existing Pod event processing functionality remains intact
  • Node event detection logic implemented and validated
  • CRD validation rules updated and tested
  • Backward compatibility maintained

Breaking Changes

None - This is a purely additive feature maintaining full backward compatibility.

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Collaborator

@antweiss antweiss left a comment

Looks fine to me. Just need to remove the redundant .sh file in helm requirements.lock and sign off the commit

.gitignore Outdated
# Generated files
*.pb.go
zz_generated.*.go
zz_generated.*.gobuild-multiarch.sh
Collaborator

Why is this needed?

Contributor Author

It isn't necessary.
I'll delete it.

@antweiss
Collaborator

Thank you for this! Also - if you have any thoughts on how to make new event support more streamlined - LMK.

@antweiss
Collaborator

antweiss commented Oct 8, 2025

hi @kyungho-for-ops - waiting for you to sign this off.

@kyungho-for-ops
Contributor Author

kyungho-for-ops commented Oct 15, 2025

Hi @antweiss

I've rebased the branch using git rebase HEAD~4 --signoff and force-pushed to fix all DCO issues. All commits should now be signed off correctly.

Could you please approve the pending workflow so the CI/CD checks (Docker Build, Test) can start? Once the checks pass, I would appreciate your final approving review for merging. Thank you for your patience!

Collaborator

@antweiss antweiss left a comment

The last commit (trigger final checks) isn't signed-off :))

Kyungho-Dable and others added 4 commits October 20, 2025 11:29
- Add 'node-not-ready' event type to EventConfiguration enum
- Implement mapNodeEventType function in event watcher
- Update event type filtering to handle Node events separately from Pod events
- Update CRD schemas to include node-not-ready event type
- Update documentation and examples with node monitoring capabilities
- Generate updated deepcopy code for API changes

This enables khook to monitor Kubernetes node readiness events and trigger
appropriate agent responses for node-level issues like kubelet failures,
network problems, or resource pressure.

Signed-off-by: Kyungho Kang <kyungho@dable.io>
Signed-off-by: kyungho-for-ops <kyungho1495@gmail.com>
Signed-off-by: Kyungho Kang <kyungho@dable.io>
Signed-off-by: kyungho-for-ops <kyungho1495@gmail.com>
Signed-off-by: Kyungho Kang <kyungho@dable.io>
Signed-off-by: Kyungho Kang <kyungho@dable.io>
@kyungho-for-ops kyungho-for-ops force-pushed the feature/add-node-monitoring branch from d48bd3c to b435184 Compare October 20, 2025 02:30
@kyungho-for-ops
Contributor Author

Sorry for the late response; I just got back last night. I've finished the changes.

@antweiss antweiss self-requested a review October 21, 2025 15:16
@antweiss antweiss merged commit 9c3ffe6 into kagent-dev:main Oct 21, 2025
3 checks passed
3 participants