- 2 agent versions × 3 prompt versions × 2 configs = 12 variants
- All running in production at the same time
- Each user gets a consistent experience
- Data-driven decisions on what actually works
## A/B Testing Basics

### Simple A/B Test
Test two prompt versions:

### Test Results
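A minimal sketch of such a test, assuming two hypothetical prompt versions: each request is randomly assigned a variant, and outcomes are tallied per variant so the two prompts can be compared.

```python
import random

# Hypothetical prompt versions under test (names are illustrative).
VARIANTS = ["prompt-v1.0.0", "prompt-v1.1.0"]

def assign_variant() -> str:
    """Randomly assign a request to one of the two variants (50/50)."""
    return random.choice(VARIANTS)

class Results:
    """Tally trials and successes per variant."""

    def __init__(self):
        self.trials = {v: 0 for v in VARIANTS}
        self.successes = {v: 0 for v in VARIANTS}

    def record(self, variant: str, success: bool) -> None:
        self.trials[variant] += 1
        self.successes[variant] += int(success)

    def success_rate(self, variant: str) -> float:
        return self.successes[variant] / max(self.trials[variant], 1)
```

Note that purely random assignment is only the starting point; the Sticky Sessions section below explains why per-user consistency matters in practice.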
### Multivariate Testing

Test multiple variables simultaneously.

#### 2×2 Test: Prompt × Model

#### 3×3 Test: Agent × Prompt × Config
analyzer v2.0.0 + prompt v1.0.0 + config v2.0.0 is the optimal combination.
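Enumerating the full grid and scoring each cell can be sketched as follows; the version lists are hypothetical, sized to reproduce the 2 agents × 3 prompts × 2 configs = 12 variants from the introduction.

```python
from itertools import product

# Hypothetical version lists: 2 agents × 3 prompts × 2 configs = 12 variants.
AGENTS = ["analyzer-v1.0.0", "analyzer-v2.0.0"]
PROMPTS = ["prompt-v1.0.0", "prompt-v1.1.0", "prompt-v2.0.0"]
CONFIGS = ["config-v1.0.0", "config-v2.0.0"]

def build_variants() -> list[dict]:
    """Enumerate every agent × prompt × config combination."""
    return [
        {"agent": a, "prompt": p, "config": c}
        for a, p, c in product(AGENTS, PROMPTS, CONFIGS)
    ]

def best_variant(scores: dict) -> tuple:
    """Return the (agent, prompt, config) combination with the best score."""
    return max(scores, key=scores.get)
```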
## Sticky Sessions

**Critical:** Users must get the same variant every time.

### Bad (Random)

### Good (Sticky)

### Implementation
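One way to implement stickiness is to hash a stable user ID into a bucket; this sketch assumes a per-request user ID is available.

```python
import hashlib

def sticky_variant(user_id: str, variants: list[str]) -> str:
    """Deterministically map a user to a variant.

    Hashing the user ID (instead of calling random()) guarantees the
    same user always lands in the same bucket, across requests and
    across process restarts.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```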
## Traffic Splitting

### 50/50 Split

### 90/10 Split

### 33/33/33 Split (3 variants)

### Dynamic Split (via KV)
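All of the splits above can be expressed as weight tables over the same sticky hash. A sketch, with illustrative weight values; in a dynamic setup the weight table would be fetched from KV at request time instead of hard-coded, so the split can change without a redeploy.

```python
import hashlib

def weighted_variant(user_id: str, weights: dict[str, float]) -> str:
    """Sticky, weighted assignment: hash the user into [0, 1) and walk
    the cumulative weight distribution."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    point = (int(digest, 16) % 10_000) / 10_000  # deterministic, ~uniform in [0, 1)
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return variant
    return variant  # guard against floating-point rounding at the tail

# Static splits are just different weight tables:
SPLIT_50_50 = {"control": 0.5, "treatment": 0.5}
SPLIT_90_10 = {"control": 0.9, "treatment": 0.1}
SPLIT_THIRDS = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
```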
## Metrics Collection

Track variant performance:

## Advanced: Sequential Testing

Don’t run forever. Stop when you have statistical significance.

### Bayesian A/B Test
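A Bayesian stopping rule can be sketched with stdlib tools: model each variant's success rate as a Beta posterior (a uniform Beta(1, 1) prior is assumed here) and estimate the probability that one variant beats the other by Monte Carlo.

```python
import random

def prob_b_beats_a(a_successes: int, a_trials: int,
                   b_successes: int, b_trials: int,
                   draws: int = 20_000) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    wins = 0
    for _ in range(draws):
        p_a = random.betavariate(1 + a_successes, 1 + a_trials - a_successes)
        p_b = random.betavariate(1 + b_successes, 1 + b_trials - b_successes)
        wins += p_b > p_a
    return wins / draws

def should_stop(p_b_beats_a: float, confidence: float = 0.95) -> bool:
    """Stop once either variant is a near-certain winner."""
    return p_b_beats_a >= confidence or p_b_beats_a <= 1 - confidence
```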
### Auto-Promote Winner
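Promotion can then be gated on the posterior probability of winning; a sketch where the 0.99 threshold and the minimum-sample gate are illustrative choices, not fixed rules.

```python
def promote_winner(current_default: str, candidate: str,
                   p_candidate_wins: float, samples_per_variant: int,
                   threshold: float = 0.99, min_samples: int = 1_000) -> str:
    """Switch the default variant only when the candidate is a
    near-certain winner AND enough traffic has been observed."""
    if samples_per_variant >= min_samples and p_candidate_wins >= threshold:
        return candidate  # promote: candidate becomes the new default
    return current_default  # otherwise keep the current default
```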
## Real-World Examples

### Example 1: Prompt Iteration

### Example 2: Model Selection

### Example 3: Agent Implementation
## Best Practices

### 1. Start with Small Traffic

### 2. Use Statistical Significance

### 3. Test One Thing at a Time

### 4. Monitor for Weeks, Not Hours

### 5. Consider Sample Size
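As a rough guide, the required sample size per variant for comparing two success rates can be estimated with the standard two-proportion formula; the defaults below assume a 5% significance level and 80% power.

```python
from math import ceil

def required_sample_size(baseline_rate: float, min_detectable_diff: float,
                         z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate samples needed per variant to detect a given absolute
    difference in success rate (two-sided test)."""
    p = baseline_rate
    n = 2 * (z_alpha + z_beta) ** 2 * p * (1 - p) / min_detectable_diff ** 2
    return ceil(n)
```

Detecting a 5-point lift from a 50% baseline needs roughly 1,600 samples per variant, and halving the detectable difference roughly quadruples the requirement, which is why low-traffic agents need weeks, not hours.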
## Next Steps

- **Deployment Strategies**: Canaries, progressive rollouts
- **Versioning Guide**: Master independent versioning
- **Rollback & Time Travel**: Emergency rollbacks
- **CLI Reference**: Complete command documentation

