# A/B Testing
Test different implementations side-by-side in production. No feature flags, no external tools - just conditions and metrics. Conductor makes A/B testing a first-class citizen: test prompts, models, agents, entire workflows - anything.

## Simple A/B Test
Test two AI models:
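For illustration, a minimal sketch in plain TypeScript (not Conductor's actual API): `callModel` is a placeholder for your model client, and `hashToUnit` is the sticky-hash helper defined under Hash Implementation below.

```ts
// Split users between two models and tag the response with the chosen
// variant so downstream metrics can attribute results to it.
declare function callModel(model: string, prompt: string): Promise<string>;

async function handle(userId: string, prompt: string) {
  const model = hashToUnit(userId) < 0.5 ? 'model-a' : 'model-b';
  const text = await callModel(model, prompt);
  return { variant: model, text };
}
```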
## Traffic Splitting

### 50/50 Split
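A sketch of an even split, assuming `hashToUnit` (see Hash Implementation below) maps a user ID uniformly into [0, 1):

```ts
// Even split: one threshold at 0.5 gives each variant half of all users.
function split5050(userId: string): 'A' | 'B' {
  return hashToUnit(userId) < 0.5 ? 'A' : 'B';
}
```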
### 90/10 Split
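The same idea with a canary-style threshold (sketch, same assumptions):

```ts
// 90% of users stay on the control; 10% see the experiment.
function split9010(userId: string): 'control' | 'experiment' {
  return hashToUnit(userId) < 0.9 ? 'control' : 'experiment';
}
```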
### 33/33/33 Split (3 variants)
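For three variants, divide the unit interval into three buckets (sketch):

```ts
// Three equal buckets over [0, 1).
function split3way(userId: string): 'A' | 'B' | 'C' {
  const bucket = hashToUnit(userId);
  return bucket < 1 / 3 ? 'A' : bucket < 2 / 3 ? 'B' : 'C';
}
```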
### Dynamic Split (via KV)
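A sketch assuming a Cloudflare Workers KV binding named `AB_CONFIG` (the binding name and key are examples) that stores the split point as a string, so the ratio can change without a redeploy:

```ts
interface Env { AB_CONFIG: KVNamespace } // type from @cloudflare/workers-types

// Read the split point from KV; fall back to 50/50 if unset.
async function pickVariant(env: Env, userId: string): Promise<'A' | 'B'> {
  const raw = await env.AB_CONFIG.get('experiment-split'); // e.g. "0.25"
  const split = raw ? parseFloat(raw) : 0.5;
  return hashToUnit(userId) < split ? 'A' : 'B';
}
```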
## Sticky Sessions
Critical: Users must get the same variant every time.

### Bad (Random)
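What the random version looks like, and why it fails:

```ts
// Anti-pattern: a fresh coin flip on every request, so the same user
// bounces between variants and per-user metrics become meaningless.
function pickVariantBad(): 'A' | 'B' {
  return Math.random() < 0.5 ? 'A' : 'B';
}
```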
### Good (Sticky)
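The sticky version replaces the coin flip with a hash of a stable identifier:

```ts
// Deterministic: the same userId hashes to the same variant every time.
function pickVariantSticky(userId: string): 'A' | 'B' {
  return hashToUnit(userId) < 0.5 ? 'A' : 'B';
}
```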
### Hash Implementation
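One possible implementation, using FNV-1a (any stable string hash works; this choice is illustrative, dependency-free, and fast):

```ts
// FNV-1a 32-bit string hash mapped to [0, 1).
function hashToUnit(key: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime
  }
  return (h >>> 0) / 0x100000000; // unsigned 32-bit -> [0, 1)
}
```

Salt the key with an experiment name - for example `hashToUnit('my-test:' + userId)` - so different tests assign users independently instead of always splitting the same way.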
## What to Test
### 1. AI Models

Test different models:
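A sketch (model names and `callModel` are placeholders):

```ts
// Same prompt, different backend per variant. The experiment-specific
// salt keeps this split independent of other running tests.
declare function callModel(model: string, prompt: string): Promise<string>;

async function answerWithVariant(userId: string, prompt: string) {
  const model = hashToUnit('model-test:' + userId) < 0.5
    ? 'gpt-4o'           // variant A (example)
    : 'claude-sonnet-4'; // variant B (example)
  return { model, text: await callModel(model, prompt) };
}
```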
### 2. Prompts

Test different prompts (with Edgit versioning):
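A sketch of the shape this takes; `getPrompt` is a stand-in for a versioned prompt lookup, not Edgit's actual API:

```ts
// Pin each variant to a specific prompt version so results stay
// reproducible even as prompts keep evolving.
declare function getPrompt(name: string, version: string): Promise<string>;

async function buildPrompt(userId: string, question: string) {
  const version = hashToUnit('prompt-test:' + userId) < 0.5 ? 'v1.2.0' : 'v1.3.0';
  const template = await getPrompt('support-agent', version);
  return { version, prompt: template.replace('{{question}}', question) };
}
```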
### 3. Agent Implementations

Test different agent versions:
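A sketch: put both implementations behind one interface and let the split choose which runs (`agentV1` and `agentV2` are placeholders):

```ts
interface Agent { run(input: string): Promise<string> }
declare const agentV1: Agent; // current implementation
declare const agentV2: Agent; // candidate

function pickAgent(userId: string): { name: string; agent: Agent } {
  return hashToUnit('agent-test:' + userId) < 0.5
    ? { name: 'agent-v1', agent: agentV1 }
    : { name: 'agent-v2', agent: agentV2 };
}
```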
### 4. Entire Workflows

Test different ensemble implementations:
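A sketch comparing a single-pass pipeline against a multi-step ensemble at a cautious 10% rollout (both functions are placeholders):

```ts
declare function singlePass(input: string): Promise<string>;
declare function ensemble(input: string): Promise<string>;

async function runWorkflow(userId: string, input: string) {
  const useEnsemble = hashToUnit('workflow-test:' + userId) < 0.1; // 10%
  const output = await (useEnsemble ? ensemble(input) : singlePass(input));
  return { variant: useEnsemble ? 'ensemble' : 'single-pass', output };
}
```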
## Metrics Collection

Track variant performance:
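One way to structure it: record one row per request, keyed by experiment and variant, so latency, cost, and error rate can be compared later (`writeMetric` is a placeholder for whatever analytics sink you already use):

```ts
interface VariantMetric {
  experiment: string;
  variant: string;
  latencyMs: number;
  costUsd: number;
  success: boolean;
}
declare function writeMetric(m: VariantMetric): Promise<void>;

// Wrap any variant call so timing and outcome are always recorded.
async function measured<T>(experiment: string, variant: string,
                           costUsd: number, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    const result = await fn();
    await writeMetric({ experiment, variant, latencyMs: Date.now() - start,
                        costUsd, success: true });
    return result;
  } catch (err) {
    await writeMetric({ experiment, variant, latencyMs: Date.now() - start,
                        costUsd, success: false });
    throw err;
  }
}
```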
## Multivariate Testing

Test multiple variables simultaneously.
### 2×2 Test: Model × Prompt
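A sketch: hash with two different salts so the two factors are assigned independently, landing every user in one of four cells:

```ts
function assignCell(userId: string) {
  const model  = hashToUnit('factor-model:' + userId)  < 0.5 ? 'model-a'   : 'model-b';
  const prompt = hashToUnit('factor-prompt:' + userId) < 0.5 ? 'prompt-v1' : 'prompt-v2';
  return { model, prompt, cell: `${model}|${prompt}` };
}
```

With two independent 50/50 factors, each cell gets about 25% of traffic, so expect to need roughly four times the sample size of a single-variable test.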
## Analysis & Decision Making
### Statistical Significance
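A simple option is a two-proportion z-test on success rates (sketch; assumes both totals are non-zero):

```ts
// |z| > 1.96 corresponds to p < 0.05 for a two-sided test.
function zTest(successA: number, totalA: number,
               successB: number, totalB: number): number {
  const pA = successA / totalA;
  const pB = successB / totalB;
  const pooled = (successA + successB) / (totalA + totalB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  return (pA - pB) / se;
}
```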
### Auto-Promote Winner
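A sketch tying the pieces above together: once a variant is both significant and better, write the winner back to the KV key used by the dynamic split so it receives all traffic (reuses `Env` and `zTest` from the earlier sketches):

```ts
async function maybePromote(env: Env, s: {
  successA: number; totalA: number; successB: number; totalB: number;
}): Promise<void> {
  const enough = s.totalA >= 1000 && s.totalB >= 1000; // minimum sample size
  const z = zTest(s.successA, s.totalA, s.successB, s.totalB);
  if (!enough || Math.abs(z) < 1.96) return; // not conclusive yet
  const aWins = s.successA / s.totalA > s.successB / s.totalB;
  // split "1" routes every hash below the threshold to A; "0" routes all to B.
  await env.AB_CONFIG.put('experiment-split', aWins ? '1' : '0');
}
```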
## Best Practices
- Sticky Sessions - Use consistent hashing, not random
- Sample Size - Collect at least 1000 samples per variant
- Statistical Significance - Wait for p < 0.05 (or a Bayesian posterior above 95%)
- Monitor for Weeks - Run long enough to capture weekly usage patterns
- One Variable at a Time - Or use multivariate with large samples
- Track Costs - Some variants may be more expensive
- Monitor Failures - Track error rates per variant
- Document Results - Keep records of what worked

