
Conversation


@nirmitparikh8 nirmitparikh8 commented Dec 19, 2025

🎯 What We're Trying to Achieve

Replace manual visual inspection of experiment results with an automated statistical regression detection system that can:

  • Automatically detect performance degradations in circuit breaker experiments
  • Provide boolean pass/fail decisions for CI/CD integration
  • Eliminate the need for manual chart analysis and subjective interpretation
  • Catch regressions early before they impact production systems

🔧 How We're Achieving It

Statistical Approach: Percentile-Based Control Charts

  • Baseline Collection: Collect 10+ historical "good" experiment runs for each experiment type
  • Percentile Analysis: Calculate configurable percentiles (currently 5th-95th) for key metrics:
    • Deviation from Target: |actual_rate - target_rate| / target_rate * 100
    • Raw Error Rate: Direct error percentage
    • Raw Rejection Rate: Direct rejection percentage
  • Control Limits: Use percentile bounds as "normal operating range"
  • Violation Detection: Flag an experiment when more than X% of its time windows fall outside the bounds (see the sketch after this list)
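
A minimal Ruby sketch of this approach, assuming per-window metric values have already been extracted from the experiment results; the method names and sample numbers are illustrative, not the actual API of the scripts in this PR.

# Illustrative only: percentile control limits and violation detection.

# Linear-interpolation percentile over a sample.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0) * (sorted.length - 1)
  lower, upper = sorted[rank.floor], sorted[rank.ceil]
  lower + (upper - lower) * (rank - rank.floor)
end

# Deviation from target: |actual_rate - target_rate| / target_rate * 100
def deviation_pct(actual_rate, target_rate)
  (actual_rate - target_rate).abs / target_rate * 100.0
end

# Per-window error rates (%) from 10 historical "good" runs (made-up numbers).
baseline_error_rates = [0.8, 0.9, 1.2, 1.8, 2.2, 2.4, 3.1, 3.9, 4.5, 4.7]
lower_bound = percentile(baseline_error_rates, 5)   # ~0.85%
upper_bound = percentile(baseline_error_rates, 95)  # ~4.61%

# Violation detection: fraction of current windows outside [p5, p95].
current_error_rates = [1.1, 2.0, 6.3, 5.8, 2.5]
violations = current_error_rates.count { |r| r < lower_bound || r > upper_bound }
violation_rate = violations.to_f / current_error_rates.length  # => 0.4 (40%)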

Automated Pipeline Components

  1. collect_baseline_data.rb: Automated baseline collection (runs experiments N times, organizes results)
  2. compute_baselines.rb: Statistical analysis (calculates percentiles from historical data)
  3. detect_regressions.rb: Main detection engine (compares current results vs baselines)
  4. regression_config.rb: Centralized, tunable configuration
  5. GitHub Actions Integration: Fully automated CI checks on every PR (a hypothetical end-to-end invocation is sketched below)
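
For orientation, a hypothetical end-to-end invocation of these scripts might look like the following; the arguments, paths, and exit-code handling are assumptions, not the actual CLI of each script.

# Hypothetical driver tying the pipeline stages together (arguments are guesses).
experiment = "circuit_breaker_baseline"  # hypothetical experiment name

# 1. Collect historical baseline runs (slow; typically done ahead of CI).
system("ruby collect_baseline_data.rb #{experiment} 10") or abort("baseline collection failed")

# 2. Compute percentile control limits from the collected runs.
system("ruby compute_baselines.rb #{experiment}") or abort("baseline computation failed")

# 3. Compare the current experiment results against the baselines;
#    a non-zero exit status tells CI a regression was detected.
exit(system("ruby detect_regressions.rb #{experiment}") ? 0 : 1)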

📊 Pipeline Flow & Math

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────┐
│   Historical    │    │   Statistical    │    │   Current Results   │
│ Baseline Runs   │───▶│   Analysis       │───▶│    Comparison       │
│ (10-15 runs)    │    │ (Percentiles)    │    │  (Pass/Fail)        │
└─────────────────┘    └──────────────────┘    └─────────────────────┘
         │                       │                        │
         │              ┌──────────▼────────────┐         │
         │              │ Control Limits:       │         │
         │              │ p5 = 0.79% errors     │         │
         │              │ p95 = 4.76% errors    │         │
         │              │ p5 = 0.0% rejected    │         │
         │              │ p95 = 51.47% rejected │         │
         │              └───────────────────────┘         │
         │                                                │
┌────────▼────────────────────────────────────────────────▼───┐
│               Regression Detection Logic:                   │
│  IF violation_rate > THRESHOLD → FAIL (Regression)          │
│  Current: 50-80% thresholds (very generous)                 │
└─────────────────────────────────────────────────────────────┘
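
The decision step reduces to comparing each metric's violation rate against its configured threshold. Below is a hedged Ruby sketch that reuses the constant names from regression_config.rb; the hash keys and helper method are hypothetical.

# Thresholds mirror the current values in regression_config.rb.
DEVIATION_VIOLATION_THRESHOLD = 0.8
ERROR_RATE_VIOLATION_THRESHOLD = 0.8
REJECTION_RATE_VIOLATION_THRESHOLD = 0.8

# violation_rates: fraction of windows outside the percentile bounds, per metric.
def regression?(violation_rates)
  violation_rates[:deviation] > DEVIATION_VIOLATION_THRESHOLD ||
    violation_rates[:error_rate] > ERROR_RATE_VIOLATION_THRESHOLD ||
    violation_rates[:rejection_rate] > REJECTION_RATE_VIOLATION_THRESHOLD
end

# Example: 85% of windows violated the error-rate bounds -> FAIL the CI check.
rates = { deviation: 0.10, error_rate: 0.85, rejection_rate: 0.05 }
if regression?(rates)
  puts "FAIL: regression detected"
  exit 1
else
  puts "PASS"
end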

🚧 Current State & Progress

✅ What's Complete

  • Full pipeline architecture implemented and tested
  • GitHub Actions integration - automatically runs on every PR
  • Dynamic percentile system - fully configurable percentile ranges
  • Comprehensive documentation - setup guides, troubleshooting, examples
  • Robust error handling - graceful failures, clear messaging

🔧 What We're Still Tuning

  1. Baseline Data Sufficiency: How many historical runs provide stable percentiles? (Currently requiring 10+)
  2. Optimal Percentile Ranges: 5th-95th vs 3rd-97th vs 10th-90th percentiles?
  3. Violation Thresholds: What % of windows can violate bounds before flagging regression?

🤖 Current Configuration (Intentionally Generous)

# Very generous settings to avoid blocking CI while we tune
LOWER_PERCENTILE = 5          # 5th percentile  
UPPER_PERCENTILE = 95         # 95th percentile
DEVIATION_VIOLATION_THRESHOLD = 0.8   # 80% of windows can violate (very high)
ERROR_RATE_VIOLATION_THRESHOLD = 0.8  # 80% of windows can violate (very high)  
REJECTION_RATE_VIOLATION_THRESHOLD = 0.8  # 80% of windows can violate (very high)

Why So Generous? I wanted to ship a baseline MVP for the team to iterate on before my internship ends, and I didn't have time to tune the thresholds more tightly.

📈 Next Steps

  1. Collect Production Baseline Data: Run collect_baseline_data.rb with 15 runs per experiment, spread across multiple weeks
  2. Analyze Natural Variation: Study percentile distributions to find optimal bounds
  3. Iterative Threshold Tuning: Gradually tighten violation thresholds (50% → 30% → 15%)
  4. Per-Experiment Calibration: Some experiments may need different sensitivity levels
  5. False Positive Monitoring: Track and eliminate unnecessary CI failures

🎯 Success Metrics

  • Zero False Negatives: Catch all real performance regressions
  • Minimal False Positives: <5% of PRs blocked unnecessarily
  • CI Integration: Seamless GitHub Actions workflow
  • Developer Experience: Clear, actionable regression reports

This PR establishes the foundation for data-driven circuit breaker regression detection. The generous initial configuration ensures we don't block development while we gather data to optimize the system.

@nirmitparikh8 nirmitparikh8 changed the base branch from main to pid-take-2 December 19, 2025 15:27
@nirmitparikh8 nirmitparikh8 changed the title from "Goodput analysis of experiments" to "Add Automated Circuit Breaker Regression Detection System" Dec 19, 2025
@nirmitparikh8 nirmitparikh8 force-pushed the goodput-analysis-of-experiments branch from be0d5c2 to 2d17fc3 December 19, 2025 17:30
@nirmitparikh8 (Contributor, Author) commented:

This is the file to tune configs. Let's iterate on this and make the configs more restrictive. Right now we allow 80 percent of the time windows to violate the error-rate bounds. We should figure out why these violation rates are so high when we compare a new CSV to the baseline and fix that. It could be something with the test itself or with how we check for regressions (the new system I built).

@nirmitparikh8 nirmitparikh8 force-pushed the goodput-analysis-of-experiments branch from b730d85 to 2d17fc3 December 19, 2025 17:39
@nirmitparikh8 nirmitparikh8 force-pushed the goodput-analysis-of-experiments branch from aeba5a1 to 2231b64 December 19, 2025 19:35