A monitoring and management agent for SQD nodes running on Linux servers.
- Written in Go
- Runs as a systemd service on Linux systems
- Monitors multiple SQD nodes on each server
- Uses the SQD GraphQL API to get node information (APR, online status, jailed status, etc.)
- Uses custom commands to discover and manage SQD nodes
- Supports notifications via webhooks and Discord
- Exposes Prometheus metrics for monitoring
- Auto-updates to the latest version
To install from source:
- Clone the repository:
  git clone https://github.com/nodexeus/sqd-agent.git
  cd sqd-agent
- Build and install:
  make install
- Enable and start the service:
  sudo systemctl enable --now sqd-agent
To install from the Debian package:
- Download the latest .deb package from the releases page.
- Install the package:
  sudo dpkg -i sqd-agent_*.deb
- Enable and start the service:
  sudo systemctl enable --now sqd-agent
The configuration file is located at /etc/sqd-agent/config.yaml. Here's an example configuration:
# General settings
logLevel: "info"
monitorPeriod: "5m" # How often to check node status
actionPeriod: "6h" # How often to take action on unhealthy nodes
passiveMode: false # If true, only monitor and don't take actions
autoUpdate: true # Automatically update the agent when new versions are available

# Notification settings
notifications:
  enabled: true
  webhookEnabled: false
  webhookUrl: "https://example.com/webhook"
  discordEnabled: false
  discordWebhooks:
    - name: "alerts"
      url: "https://discord.com/api/webhooks/your-webhook-url"

# Prometheus metrics settings
prometheus:
  enabled: true
  port: 9090
  path: "/metrics"

# GraphQL API settings
graphql:
  endpoint: "https://your-graphql-endpoint.com"

# Custom commands
commands:
  discoverNodes: "apptainer instance list -j | jq -r '.instances[]| .instance'"
  getNodePeerID: "bv node run address"
  restartNode: "bv node restart"
  getNodeStatus: "bv node status"

When Prometheus metrics are enabled, the agent exposes the following metrics on the configured port and path:
- sqd_node_apr: Annual Percentage Rate (APR) of the SQD node
- sqd_node_jailed: Whether the SQD node is jailed (1) or not (0)
- sqd_node_online: Whether the SQD node is online (1) or not (0)
- sqd_node_local_status: Local status of the SQD node (1 = running, 0 = not running)
- sqd_node_healthy: Whether the SQD node is healthy (1) or not (0)
- sqd_node_last_restart_timestamp: Timestamp of the last restart attempt for the SQD node
A node is considered healthy if all of the following conditions are met:
- Local status is "running"
- Network status is "online"
- Not jailed
- APR is greater than 0
If any of these conditions is not met, the node is considered unhealthy and is restarted according to the configured action period (unless passiveMode is enabled, in which case the agent only monitors).
Development requirements:
- Go 1.16 or later
- Make

Build the agent:
make build

Run the tests:
make test

Build the Debian package:
make deb
Package repository hosting is graciously provided by Cloudsmith. Cloudsmith is the only fully hosted, cloud-native, universal package management solution that enables your organization to create, store and share packages in any format, to any place, with total confidence.