[CRITICAL] Add integration tests for monitoring system #12

@emiperez95

Description

Testing Gap

No integration tests cover the complete monitoring pipeline, so system-level failures can reach production undetected.

Components Needing Integration Testing

1. End-to-End Tool Tracking

  • Agent execution → Hook triggers → Database writes → Metrics export → Prometheus scraping
  • Verify complete data flow accuracy

2. Database Consistency

  • Foreign key constraints between tables
  • Transaction isolation
  • Concurrent access handling
  • Data integrity under load

3. Prometheus Integration

  • Metric format validation
  • HTTP endpoint functionality
  • Registry management
  • Scraping reliability
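Metric format validation needs no extra dependencies: the Prometheus exposition-format documentation constrains metric names to `[a-zA-Z_:][a-zA-Z0-9_:]*` and label names to `[a-zA-Z_][a-zA-Z0-9_]*`. A minimal sketch of the `is_valid_metric_name` / `is_valid_label` helpers the tests below rely on:

```python
import re

# Naming rules from the Prometheus exposition-format documentation.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_valid_metric_name(name: str) -> bool:
    return METRIC_NAME_RE.fullmatch(name) is not None

def is_valid_label(name: str) -> bool:
    # Label *values* may be any UTF-8; only label names are constrained.
    return LABEL_NAME_RE.fullmatch(name) is not None
```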

Test Architecture

Integration Test Structure

tests/
├── integration/
│   ├── test_end_to_end_tracking.py       # Full workflow
│   ├── test_database_integration.py      # Multi-table operations
│   ├── test_prometheus_integration.py    # Metrics export
│   ├── test_concurrent_access.py         # Parallel execution
│   └── test_performance_integration.py   # Load testing
├── docker/
│   ├── test-prometheus.yml              # Test Prometheus
│   └── test-stack.yml                   # Full stack
└── fixtures/
    └── integration_scenarios.py         # Test scenarios

Test Scenarios

Scenario 1: Complete Agent Workflow

import requests

def test_complete_agent_workflow():
    # 1. Start prometheus exporter
    exporter = start_test_exporter()
    
    # 2. Simulate agent execution with tool usage
    mock_agent_execution([
        ('Read', {'file_path': 'test.py'}),
        ('Edit', {'content': 'new content'}),
        ('Bash', {'command': 'pytest'})
    ])
    
    # 3. Verify database entries
    assert_database_entries([
        'agent_invocations',
        'agent_tool_uses', 
        'sessions'
    ])
    
    # 4. Verify metrics endpoint
    response = requests.get('http://localhost:9091/metrics')
    assert 'agent_tool_usage_total' in response.text
    
    # 5. Verify metric accuracy
    assert_metric_value('agent_tool_usage_total{tool_name="Read"}', 1)
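`assert_metric_value` above is a hypothetical helper; one way to implement it is to parse the plain-text exposition payload directly. This sketch takes the scraped text as an explicit argument (a slightly different signature than the call above):

```python
def get_metric_value(payload: str, metric_with_labels: str) -> float:
    # Each sample in the text exposition format is "<name>{labels} <value>".
    # A prefix match is good enough for a sketch; a real helper should
    # match the full name + label set.
    for line in payload.splitlines():
        if line.startswith(metric_with_labels):
            return float(line.rsplit(" ", 1)[1])
    raise AssertionError(f"{metric_with_labels} not found in payload")

def assert_metric_value(payload: str, metric_with_labels: str, expected: float):
    assert get_metric_value(payload, metric_with_labels) == expected
```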

Scenario 2: Concurrent Agent Execution

def test_concurrent_agent_execution():
    # Simulate 5 agents running in parallel
    agents = []
    for i in range(5):
        agent = MockAgent(f'agent-{i}')
        agents.append(agent)
        agent.start()
    
    # Wait for completion
    for agent in agents:
        agent.join()
    
    # Verify data integrity
    assert_no_race_conditions()
    assert_session_consistency()
    assert_metric_accuracy()
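`MockAgent` can be as simple as a thread that pushes tool-use events through the same write path a real agent would. A self-contained sketch, with a queue standing in for the database (all names illustrative):

```python
import queue
import threading

events = queue.Queue()  # stand-in for the monitoring database

class MockAgent(threading.Thread):
    """Hypothetical agent: emits a fixed number of tool-use events."""

    def __init__(self, name, n_tools=3):
        super().__init__(name=name)
        self.n_tools = n_tools

    def run(self):
        for i in range(self.n_tools):
            events.put((self.name, f"tool-{i}"))  # one write per tool use

agents = [MockAgent(f"agent-{i}") for i in range(5)]
for agent in agents:
    agent.start()
for agent in agents:
    agent.join()
```

With five agents and three tools each, exactly fifteen events should arrive with none lost or duplicated; the real assertions would check the same invariant against the database.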

Scenario 3: Database Load Test

import time

def test_database_load():
    # Generate 10K invocations with 100K tool uses
    generate_load_data(
        sessions=100,
        invocations_per_session=100,
        tools_per_invocation=10
    )
    
    # Start collector
    collector = MetricsCollector(test_db)
    
    # Measure performance
    start = time.time()
    collector.collect_metrics()
    duration = time.time() - start
    
    # Performance assertions
    assert duration < 1.0  # Under 1 second
    assert memory_usage() < 100 * 1024 * 1024  # Under 100MB
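`memory_usage()` above is hypothetical; one stdlib way to enforce the memory budget is `tracemalloc`, which reports peak Python-level allocations for the measured span (the workload here is a stand-in, not the real collector):

```python
import tracemalloc

tracemalloc.start()
# Stand-in for collector.collect_metrics() running over loaded data.
rows = [{"tool_name": "Read", "invocation": i} for i in range(10_000)]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Same 100 MB budget as the scenario above.
assert peak < 100 * 1024 * 1024
```

Note that `tracemalloc` only sees allocations made by the Python allocator; memory held by C extensions (e.g. SQLite page caches) needs an RSS-based check instead.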

Docker Test Environment

# docker/test-stack.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./test-prometheus.yml:/etc/prometheus/prometheus.yml
  
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=test
  
  agent-exporter:
    build: .
    ports:
      - "9091:9091"
    volumes:
      - ./test_data:/app/logs
    command: python tools/prometheus_exporter.py --port 9091

Test Implementation

Database Integration Tests

import sqlite3

import pytest

class TestDatabaseIntegration:
    def test_foreign_key_constraints(self):
        # Insert agent_tool_uses without corresponding agent_invocation
        # Should fail with foreign key constraint
        with pytest.raises(sqlite3.IntegrityError):
            insert_tool_use_without_invocation()
    
    def test_transaction_rollback(self):
        # Start transaction, make changes, rollback
        # Verify no partial data persists
        pass
    
    def test_concurrent_writes(self):
        # Multiple threads writing simultaneously
        # Verify no deadlocks or data corruption
        pass
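The first two stubs above can be made concrete with an in-memory database. Two SQLite details matter: foreign-key enforcement is off by default and must be enabled per connection, and with the default `sqlite3` isolation level an `INSERT` opens an implicit transaction that `rollback()` discards. A sketch (table definitions are illustrative, not the project's full schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE agent_invocations (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE agent_tool_uses ("
    "  id INTEGER PRIMARY KEY,"
    "  invocation_id INTEGER NOT NULL REFERENCES agent_invocations(id))"
)

# Foreign keys: an orphaned tool-use row must be rejected.
fk_rejected = False
try:
    conn.execute("INSERT INTO agent_tool_uses (invocation_id) VALUES (999)")
except sqlite3.IntegrityError:
    fk_rejected = True

# Rollback: the uncommitted tool-use row must leave no partial data behind.
conn.execute("INSERT INTO agent_invocations (id) VALUES (1)")
conn.commit()
conn.execute("INSERT INTO agent_tool_uses (invocation_id) VALUES (1)")
conn.rollback()
tool_rows = conn.execute("SELECT COUNT(*) FROM agent_tool_uses").fetchone()[0]
```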

Prometheus Integration Tests

import requests

class TestPrometheusIntegration:
    def test_metrics_format_validation(self):
        # Scrape metrics endpoint
        response = requests.get('http://localhost:9091/metrics')
        
        # Parse with prometheus parser
        metrics = prometheus_parser.parse(response.text)
        
        # Validate metric names and labels
        for metric in metrics:
            assert is_valid_metric_name(metric.name)
            assert all(is_valid_label(label) for label in metric.labels)
    
    def test_metric_cardinality(self):
        # Generate high-cardinality scenario
        # Verify metrics don't exceed limits
        assert len(get_unique_label_combinations()) < 10000
    
    def test_prometheus_scraping(self):
        # Start prometheus with test config
        # Verify successful scraping
        # Check prometheus targets are healthy
        pass

Performance Integration Tests

def test_performance_under_load():
    # Test with realistic production data volumes
    scenarios = [
        (1000, 10000),     # 1K sessions, 10K invocations
        (5000, 50000),     # 5K sessions, 50K invocations
        (10000, 100000),   # 10K sessions, 100K invocations
    ]
    
    for sessions, invocations in scenarios:
        setup_test_data(sessions, invocations)
        
        # Measure collection performance
        collector = MetricsCollector(test_db)
        duration = time_collection(collector)
        
        # Assert performance requirements
        assert duration < (invocations / 10000)  # Linear scaling
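`time_collection` is a hypothetical helper; `time.perf_counter()` is the right clock for it (monotonic and high resolution, unlike `time.time()`). A sketch with a stub collector so it runs standalone:

```python
import time

def time_collection(collector) -> float:
    """Wall-clock a single collect_metrics() call."""
    start = time.perf_counter()  # monotonic, unaffected by clock changes
    collector.collect_metrics()
    return time.perf_counter() - start

class _StubCollector:
    """Stand-in for MetricsCollector: pretends to do 10 ms of work."""

    def collect_metrics(self):
        time.sleep(0.01)

duration = time_collection(_StubCollector())
```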

CI/CD Integration

# .github/workflows/integration-test.yml
name: Integration Tests
on: [push, pull_request]

jobs:
  integration:
    runs-on: ubuntu-latest
    services:
      prometheus:
        image: prom/prometheus
        ports:
          - 9090:9090
    
    steps:
      - uses: actions/checkout@v3
      - name: Setup test environment
        run: docker-compose -f docker/test-stack.yml up -d
      
      - name: Wait for services
        run: |
          wget --retry-connrefused --waitretry=5 --timeout=30 \
               http://localhost:9090/api/v1/targets
      
      - name: Run integration tests
        run: |
          pytest tests/integration/ -v \
                 --tb=short \
                 --durations=10

Test Data Management

import pytest

@pytest.fixture(scope='session')
def integration_database():
    # Create temporary database with realistic data
    db_path = create_test_database()
    populate_with_realistic_data(db_path)
    yield db_path
    cleanup_test_database(db_path)

def populate_with_realistic_data(db_path):
    # Insert representative data that matches production patterns
    # - Various agent types and execution patterns
    # - Realistic tool usage distributions
    # - Time-based data spanning multiple days
    pass
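One way `populate_with_realistic_data` could look: a skewed tool distribution via weighted sampling and timestamps spread over several days. The schema, weights, and the connection-based signature are illustrative for this sketch:

```python
import datetime
import random
import sqlite3

def populate_with_realistic_data(conn, days=3, invocations_per_day=50):
    # A seeded RNG makes fixture data deterministic, so failures reproduce.
    rng = random.Random(0)
    tools = ["Read", "Edit", "Bash", "Grep"]
    weights = [5, 3, 2, 1]  # skewed distribution, like real tool usage

    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_tool_uses ("
        "  id INTEGER PRIMARY KEY, tool_name TEXT, used_at TEXT)"
    )
    start = datetime.datetime(2024, 1, 1)
    for day in range(days):
        for _ in range(invocations_per_day):
            ts = start + datetime.timedelta(days=day, minutes=rng.randint(0, 1439))
            conn.execute(
                "INSERT INTO agent_tool_uses (tool_name, used_at) VALUES (?, ?)",
                (rng.choices(tools, weights=weights)[0], ts.isoformat()),
            )
    conn.commit()

conn = sqlite3.connect(":memory:")
populate_with_realistic_data(conn)
```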

Validation Criteria

  • End-to-end workflow tests passing
  • Database consistency verified under load
  • Prometheus integration working correctly
  • Concurrent access handling validated
  • Performance requirements met
  • Error scenarios handled gracefully
  • Docker test environment stable

Success Metrics

  • Test Coverage: End-to-end scenarios covered
  • Performance: 100K records processed in < 1 second
  • Reliability: 0 failures in 1000 test runs
  • Concurrency: No race conditions with 10 parallel agents

Effort Estimate

6 hours total

  • 2 hours: End-to-end workflow tests
  • 2 hours: Database integration tests
  • 1 hour: Prometheus integration tests
  • 1 hour: Docker environment and CI/CD

Metadata

Labels

  • bug: Something isn't working
  • integration: Integration testing
  • technical-debt: Code quality issues
