## What is AI Provider Routing?
AI Provider Routing determines how Conductor connects to AI providers (OpenAI, Anthropic, Workers AI, Groq). The right routing mode can dramatically improve performance, reduce costs, and increase reliability.

Conductor supports three routing modes:

- `cloudflare` - Platform-native Workers AI with ultra-low latency
- `cloudflare-gateway` - AI Gateway with caching, analytics, and cost controls
- `direct` - Direct API calls to OpenAI, Anthropic, Groq, etc.
## Three Routing Modes
### 1. Cloudflare (Platform-Native)
For: Workers AI models running on Cloudflare's network

- ⚡ Ultra-fast - Sub-10ms latency to model
- 💰 Cost-effective - Cloudflare’s pricing (often free tier)
- 🔐 No API keys - Uses Workers AI binding
- 🌍 Edge execution - Runs closest to your users
- 📦 Smaller models - 7B-70B parameter range

Use when:
- Latency is critical (< 50ms cold start)
- Cost optimization for high-volume workloads
- Simple tasks (summarization, classification, extraction)
- No external API key management desired

Trade-offs:
- Only Workers AI models available
- Smaller context windows than GPT-4/Claude
- Less sophisticated reasoning for complex tasks
### 2. Cloudflare Gateway (Recommended)
For: OpenAI, Anthropic, and Groq through AI Gateway

- 🗄️ Persistent cache - Cache spans deployments and users
- 📊 Real-time analytics - Dashboard for costs, latency, errors
- 💵 Cost controls - Set spending limits and rate limits
- 🔄 Retry logic - Automatic retries with exponential backoff
- 🌐 Multi-provider - OpenAI, Anthropic, Groq, etc.

Use when:
- Always for production AI calls (unless using Workers AI)
- Need caching across requests and deployments
- Want visibility into AI spending and usage
- Multiple environments (dev, staging, prod)
- A/B testing different models or prompts

#### Persistent Caching
Cache survives deployments. First user pays, everyone else benefits. Can reduce costs by 90%+ for repeated queries.

#### Analytics Dashboard
Real-time metrics on:
- Cache hit rates
- Request volume
- Cost per model
- Latency percentiles
- Error rates by provider

#### Cost Controls
Set hard limits:
- Max spend per day/month
- Rate limits per user
- Alert thresholds
- Budget allocation by environment

#### A/B Testing
Split traffic between:
- Different models (GPT-4 vs Claude)
- Different prompts
- Different temperatures

Track which variant performs better in the gateway analytics dashboard.
### 3. Direct
For: Direct API calls bypassing AI Gateway

- 🎯 Direct connection - No intermediary
- 🆕 Latest features - Provider-specific capabilities
- 🔧 Full control - All provider parameters available
- ❌ No gateway benefits - No cache, analytics, or limits

Use when:
- Testing new provider features not yet in the gateway
- Provider-specific parameters needed
- Debugging provider-specific issues
- Very low request volume (caching not beneficial)

Trade-offs:
- Miss out on persistent caching
- No analytics or cost controls
- Manual retry logic needed
- Direct API keys required
## Configuration Examples
### Workers AI (Platform-Native)
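A minimal sketch of what `cloudflare` routing maps to at the Workers AI layer, assuming a standard Workers AI binding named `AI`; the handler shape is illustrative, not Conductor's exact configuration surface.

```typescript
// wrangler.toml needs a Workers AI binding:
//   [ai]
//   binding = "AI"
export interface Env {
  AI: Ai; // Workers AI binding -- no API key required
}

export default {
  async fetch(_req: Request, env: Env): Promise<Response> {
    // Executes at the edge, closest to the user
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "Classify this ticket as billing, bug, or feature request.",
    });
    return Response.json(result);
  },
};
```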
### OpenAI via Gateway
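A sketch using the official `openai` SDK pointed at an AI Gateway endpoint; `ACCOUNT_ID` and `GATEWAY_ID` are placeholders for your own values.

```typescript
import OpenAI from "openai";

// Point the SDK at AI Gateway instead of api.openai.com;
// the gateway adds caching, analytics, retries, and rate limits.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/openai",
});

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize this changelog." }],
});
console.log(completion.choices[0].message.content);
```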
### Anthropic via Gateway
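The same idea with the `@anthropic-ai/sdk` client; the gateway IDs are again placeholders.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  baseURL: "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/anthropic",
});

const message = await anthropic.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Summarize this changelog." }],
});
console.log(message.content);
```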
### Groq via Gateway
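Groq's API is OpenAI-compatible, so the same `openai` SDK can be reused against the gateway's `groq` provider path (IDs are placeholders).

```typescript
import OpenAI from "openai";

// Groq exposes an OpenAI-compatible API, so the openai SDK works as-is.
const groq = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/groq",
});

const completion = await groq.chat.completions.create({
  model: "llama-3.1-70b-versatile",
  messages: [{ role: "user", content: "Extract the action items." }],
});
```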
### Direct OpenAI (No Gateway)
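In `direct` mode the SDK keeps its default base URL, so nothing sits between you and the provider:

```typescript
import OpenAI from "openai";

// Default base URL -- no gateway, so no persistent cache, analytics,
// or configurable rate limits. Retry logic is on you.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
```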
## Multi-Model Strategies
### Cascade Pattern
Start with a fast/cheap model, escalating to a more powerful model only if needed.
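A sketch of the pattern, assuming hypothetical `callModel` and `isGoodEnough` helpers you would implement yourself (these are not Conductor APIs):

```typescript
// Hypothetical helpers -- not Conductor APIs.
declare function callModel(model: string, prompt: string): Promise<string>;
declare function isGoodEnough(answer: string): boolean;

async function cascade(prompt: string): Promise<string> {
  // Cheap model first: most requests should stop here.
  const draft = await callModel("gpt-4o-mini", prompt);
  if (isGoodEnough(draft)) return draft;

  // Escalate only the hard cases to the expensive model.
  return callModel("gpt-4o", prompt);
}
```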
### Load Balancing

Distribute requests across providers.
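One way to split traffic is weighted random selection; the providers and weights below are illustrative:

```typescript
// Weighted random pick across providers.
const providers = [
  { name: "openai", weight: 0.5 },
  { name: "anthropic", weight: 0.3 },
  { name: "groq", weight: 0.2 },
];

function pickProvider(): string {
  let r = Math.random();
  for (const p of providers) {
    r -= p.weight;
    if (r <= 0) return p.name;
  }
  return providers[providers.length - 1].name; // guard against float drift
}
```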
### Fallback Pattern

Try the primary provider, falling back to a secondary if it's unavailable.
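A sketch, again assuming a hypothetical `callModel` helper:

```typescript
// Hypothetical helper -- not a Conductor API.
declare function callModel(provider: string, prompt: string): Promise<string>;

async function withFallback(prompt: string): Promise<string> {
  try {
    return await callModel("openai", prompt); // primary
  } catch (err) {
    console.warn("Primary provider failed, falling back:", err);
    return callModel("anthropic", prompt); // secondary
  }
}
```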
### Cost Optimization

Use cheaper models for simple tasks and expensive models for complex work.
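A sketch of routing by task type; the task names and model choices are illustrative:

```typescript
type Task = "classify" | "summarize" | "reason";

// Cheap edge models for simple work; frontier models only where
// the reasoning demands it.
function modelFor(task: Task): string {
  switch (task) {
    case "classify":
    case "summarize":
      return "@cf/meta/llama-3.1-8b-instruct"; // Workers AI, cheapest
    case "reason":
      return "gpt-4o"; // reserve the expensive model for complex tasks
  }
}
```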
## Performance Comparison

### Latency
| Routing Mode | Cold Start | Warm | Cache Hit |
|---|---|---|---|
| cloudflare | < 50ms | < 10ms | < 5ms |
| cloudflare-gateway | < 100ms | 500-2000ms | < 5ms |
| direct | < 100ms | 500-2000ms | N/A |
### Cost (per 1K tokens)
| Provider | Model | Approx Cost |
|---|---|---|
| Workers AI | llama-3.1-8b | $0.001 |
| OpenAI | gpt-4o | $0.005 |
| OpenAI | gpt-4o-mini | $0.0002 |
| Anthropic | claude-3.5-sonnet | $0.003 |
| Groq | llama-3.1-70b | $0.0008 |

With gateway caching at a ~90% hit rate:
- Effective cost: ~10% of the base cost
- Example: gpt-4o drops from $0.005 to $0.0005 per 1K tokens
### Reliability
| Routing Mode | Caching | Retry | Analytics | Rate Limiting |
|---|---|---|---|---|
| cloudflare | KV only | Manual | Basic | Platform |
| cloudflare-gateway | Persistent | Automatic | Full | Configurable |
| direct | None | Manual | None | Provider |
## Best Practices
### 1. Default to AI Gateway

Use `cloudflare-gateway` for production AI calls unless you're on Workers AI; you get persistent caching, analytics, automatic retries, and cost controls without extra code.
### 2. Use Workers AI for Simple Tasks

Summarization, classification, and extraction rarely need a frontier model; `cloudflare` routing handles them faster and cheaper.
### 3. Configure Appropriate Cache TTL

Longer TTLs raise hit rates for stable prompts, while a TTL of 0 disables caching entirely; see the sketch below.
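One way to set a per-request TTL is AI Gateway's `cf-aig-cache-ttl` request header; the snippet assumes the `openai` SDK's `defaultHeaders` option and placeholder gateway IDs.

```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/openai",
  // Ask the gateway to cache matching responses for one hour.
  defaultHeaders: { "cf-aig-cache-ttl": "3600" },
});
```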
### 4. Monitor Gateway Analytics

Watch cache hit rates, latency percentiles, and cost per model in the dashboard so regressions surface before the bill does.
### 5. Use Environment-Specific Routing

Give dev, staging, and prod their own gateways so test traffic never skews production analytics or budgets; a sketch follows.
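A sketch of per-environment gateway selection; the `ENVIRONMENT` variable and gateway names are placeholders.

```typescript
// Placeholder environment variable and gateway names.
const gatewayId =
  process.env.ENVIRONMENT === "production" ? "prod-gateway" : "dev-gateway";

const baseURL = `https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/${gatewayId}/openai`;
```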
## Troubleshooting
### Gateway Not Caching

Symptom: Every request hits the provider API

Causes:
- Gateway not configured in wrangler.toml
- Cache TTL set to 0
- Requests have unique parameters (temperature varies)
- Using streaming (not cacheable)
### High Latency

Symptom: Requests taking > 5s

Causes:
- Using direct routing (no edge optimization)
- Cold start + large model
- No caching enabled
- Provider issues
### Rate Limit Errors

Symptom: 429 errors from the provider

Causes:
- Exceeding provider limits
- No rate limiting in gateway
- Burst traffic
### Cost Overruns

Symptom: Higher-than-expected AI costs

Causes:
- Low cache hit rate
- Using expensive models unnecessarily
- No spending limits

