OpsBuddy is a microservices monitoring platform that provides real-time health monitoring, AI-assisted log analysis, and automated incident response with suggested quick fixes.
OpsBuddy consists of several interconnected microservices:
- Real-time Health Checks: Continuous monitoring of service endpoints
- Downtime Tracking: Automatic detection and recording of service failures
- Log Aggregation: Centralized log collection via gRPC SDK
- Performance Metrics: Response time and availability monitoring
- Intelligent Quick Fixes: Gemini AI analyzes logs to suggest actionable solutions
- Pattern Recognition: Identifies common failure patterns (DB timeouts, memory issues, etc.)
- Context-Aware Suggestions: Uses service descriptions for targeted recommendations
- Email Notifications: Instant alerts for service down/up events
- Rich Analysis: Includes AI-generated summaries and quick fixes
- Customizable Templates: Professional email formatting with actionable insights
- Multi-language SDKs: TypeScript/Node.js and Go SDK support
- Easy Integration: Simple gRPC-based log ingestion
- Demo Applications: Working and failing apps for testing
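The pattern recognition feature listed above can be pictured as a small rule table applied to each log line. The sketch below is only an illustration under assumptions: the pattern names, the regular expressions, and the functions `classifyLog` and `countPatterns` are hypothetical and are not taken from the OpsBuddy codebase, where the analysis is described as being done by Gemini AI.

```typescript
// Hypothetical failure categories; the real service's categories are not documented here.
type FailurePattern = "db_timeout" | "out_of_memory" | "connection_refused" | "unknown";

// Illustrative rules only. Order matters: the first matching rule wins.
const PATTERNS: Array<{ pattern: FailurePattern; regex: RegExp }> = [
  { pattern: "db_timeout", regex: /(database|db|query).*(timeout|timed out)/i },
  { pattern: "out_of_memory", regex: /out of memory|heap limit|\bOOM\b/i },
  { pattern: "connection_refused", regex: /ECONNREFUSED|connection refused/i },
];

// Classify a single log line into one of the known patterns.
function classifyLog(line: string): FailurePattern {
  for (const { pattern, regex } of PATTERNS) {
    if (regex.test(line)) return pattern;
  }
  return "unknown";
}

// Count how often each pattern occurs in a batch of log lines.
function countPatterns(lines: string[]): Map<FailurePattern, number> {
  const counts = new Map<FailurePattern, number>();
  for (const line of lines) {
    const p = classifyLog(line);
    counts.set(p, (counts.get(p) ?? 0) + 1);
  }
  return counts;
}
```

A pre-classification step like this could be used to give the AI analysis a hint about the dominant failure type, though whether OpsBuddy does so is an assumption.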
opsbuddy/
├── demo/ # Demo applications
│ ├── working-app/ # Healthy service example
│ └── failing-app/ # Failing service example
├── sdk/ # Client SDKs
│ ├── nodejs/ # TypeScript/Node.js SDK
│ └── go/ # Go SDK
├── services/
│ ├── ping-service/ # Health monitoring service
│ ├── notification-service/ # AI analysis & alerts
│ ├── log-consumer-service/ # Log processing
│ ├── log-ingestion-service/ # gRPC log ingestion
│ └── http/ # REST API service
├── ui/ # Frontend dashboard
├── scripts/ # Install Postgres extensions and create Kafka topics
├── docker-compose.yml # Infrastructure setup
└── README.md
import { OpsBuddySDK } from 'opsbuddy-sdk';
const sdk = new OpsBuddySDK({
serviceId: "my-service",
authToken: "your-token",
grpcEndpoint: "localhost:50051"
});
sdk.startIntercepting(); // Auto-capture console logs
Work in progress
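For the Node.js SDK, `startIntercepting()` is described as auto-capturing console logs. The SDK's internals are not shown in this README, so the following is a minimal sketch of one common way to do this: wrap the `console` methods, forward each message to a sink, and still call the original method. `interceptConsole` and `LogSink` are hypothetical names; in the real SDK the sink would presumably send the entry over gRPC.

```typescript
type Level = "log" | "warn" | "error";
type LogSink = (level: Level, message: string) => void;
type ConsoleFn = (...args: unknown[]) => void;

// Wraps console.log/warn/error so each call is also forwarded to `sink`.
// Returns a function that restores the original console methods.
// Note: the sink must not call console itself, or it would recurse.
function interceptConsole(sink: LogSink): () => void {
  const levels: Level[] = ["log", "warn", "error"];
  const originals = new Map<Level, ConsoleFn>();

  for (const level of levels) {
    const original: ConsoleFn = console[level];
    originals.set(level, original);
    console[level] = (...args: unknown[]) => {
      // Forward a stringified copy, then keep the normal console output.
      sink(level, args.map(String).join(" "));
      original.apply(console, args);
    };
  }

  return () => {
    for (const level of levels) {
      const original = originals.get(level);
      if (original) console[level] = original;
    }
  };
}
```

Usage, assuming an in-memory sink for demonstration:

```typescript
const seen: string[] = [];
const stop = interceptConsole((level, message) => seen.push(`${level}:${message}`));
console.log("user created", 42); // still printed, and recorded in `seen`
stop(); // console methods are back to normal
```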
- Service Down: Immediate alerts with AI analysis and quick fixes
- Service Recovery: Confirmation with downtime duration
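The recovery notification reports how long the service was down. A minimal sketch of computing and formatting that duration from two timestamps is shown below; `formatDowntime` is a hypothetical helper, and the output format is an assumption rather than the format used in OpsBuddy's emails.

```typescript
// Formats the time between a "down" and an "up" timestamp as e.g. "1h 0m 5s".
// Negative intervals (clock skew) are clamped to zero.
function formatDowntime(downAt: Date, upAt: Date): string {
  const totalSeconds = Math.max(0, Math.floor((upAt.getTime() - downAt.getTime()) / 1000));
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;

  const parts: string[] = [];
  if (hours > 0) parts.push(`${hours}h`);
  if (hours > 0 || minutes > 0) parts.push(`${minutes}m`);
  parts.push(`${seconds}s`);
  return parts.join(" ");
}
```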
- Ping service detects failure → Creates downtime record
- Kafka event triggers notification service
- AI analyzes last 20 logs + service context
- Email sent with summary + prioritized quick fixes
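The analysis step in the flow above takes the last 20 logs plus the service context. The sketch below shows how such an input might be assembled before it is handed to the AI model. The types `LogEntry` and `ServiceContext`, the function `buildAnalysisPrompt`, and the prompt wording are all assumptions for illustration; only the limit of 20 logs and the use of the service description come from this README.

```typescript
// Hypothetical shapes; the real schemas are not shown in this README.
interface LogEntry {
  timestamp: string;
  level: string;
  message: string;
}

interface ServiceContext {
  id: string;
  description: string;
}

const MAX_LOGS = 20; // the README states that the last 20 logs are analyzed

// Builds a plain-text prompt from the service context and the most recent logs.
// Assumes `logs` is ordered oldest-first.
function buildAnalysisPrompt(service: ServiceContext, logs: LogEntry[]): string {
  const recent = logs.slice(-MAX_LOGS);
  const logLines = recent.map(
    (l) => `[${l.timestamp}] ${l.level.toUpperCase()} ${l.message}`
  );
  return [
    `Service: ${service.id}`,
    `Description: ${service.description}`,
    `Recent logs (${recent.length}):`,
    ...logLines,
    "Summarize the likely cause and list quick fixes in priority order.",
  ].join("\n");
}
```

Keeping prompt construction in a pure function like this makes it straightforward to unit test without calling the AI service.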
- Batch Log Processing: Handles high-volume log ingestion
- Connection Pooling: Optimized database connections
- Async Processing: Non-blocking notification handling
- Graceful Degradation: Continues operation during component failures
- Horizontal Scaling: Stateless services support multiple instances
- Kafka Partitioning: Distributes load across consumers
- TimescaleDB: Optimized for time-series data at scale
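Batch log processing, as listed above, usually means buffering entries and writing them in groups instead of one at a time. The following is a minimal sketch of a size-based batcher; `LogBatcher` is a hypothetical class, and the actual batching strategy in the log services (for example, time-based flushing) is not documented here.

```typescript
// Buffers items and hands them to `flush` in groups of at most `maxSize`.
class LogBatcher<T> {
  private buffer: T[] = [];
  private readonly maxSize: number;
  private readonly flush: (batch: T[]) => void;

  constructor(maxSize: number, flush: (batch: T[]) => void) {
    if (maxSize < 1) throw new Error("maxSize must be at least 1");
    this.maxSize = maxSize;
    this.flush = flush;
  }

  // Adds one item; flushes automatically when the buffer is full.
  add(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxSize) this.flushNow();
  }

  // Flushes whatever is buffered, e.g. on shutdown or on a timer.
  flushNow(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.flush(batch);
  }
}
```

In a real consumer, `flush` would typically perform a single multi-row insert into the database, and `flushNow` would also be called on an interval so that a partially filled buffer does not wait indefinitely.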
Node.js proto generation
# in /sdk
protoc \
--plugin=./nodejs/node_modules/.bin/protoc-gen-ts_proto \
--ts_proto_out=./nodejs/src/proto \
--ts_proto_opt=esModuleInterop=true,outputServices=grpc-js \
-I ./proto \
./proto/ingestion.proto
Go Proto generation
# in /sdk/go
protoc \
--plugin=protoc-gen-ts=./node_modules/.bin/protoc-gen-ts \
--js_out=import_style=commonjs,binary:../ts \
--ts_out=../ts \
--proto_path=../proto \
../proto/ingestion.proto