What is Routing?

Routing determines how Conductor connects to AI providers (OpenAI, Anthropic, Workers AI, Groq). The right routing mode can dramatically improve performance, reduce costs, and increase reliability.

The three modes at a glance:
  • cloudflare - Platform-native Workers AI with ultra-low latency
  • cloudflare-gateway - AI Gateway with caching, analytics, and cost controls
  • direct - Direct API calls to OpenAI, Anthropic, Groq, etc.

Three Routing Modes

1. Cloudflare (Platform-Native)

For: Workers AI models running on Cloudflare’s network
- member: generate-summary
  type: Think
  config:
    provider: workers-ai
    model: "@cf/meta/llama-3.1-8b-instruct"
    routing: cloudflare  # Platform-native
    temperature: 0.7
Characteristics:
  • ⚡ Ultra-fast - Sub-10ms latency to model
  • 💰 Cost-effective - Cloudflare’s pricing (often free tier)
  • 🔐 No API keys - Uses Workers AI binding
  • 🌍 Edge execution - Runs closest to your users
  • 📦 Smaller models - 7B-70B parameter range
When to use:
  • Latency is critical (< 50ms cold start)
  • Cost optimization for high-volume workloads
  • Simple tasks (summarization, classification, extraction)
  • No external API key management desired
Limitations:
  • Only Workers AI models available
  • Smaller context windows than GPT-4/Claude
  • Less sophisticated reasoning for complex tasks

2. Cloudflare Gateway

For: OpenAI, Anthropic, Groq through AI Gateway
- member: generate-analysis
  type: Think
  config:
    provider: anthropic
    model: claude-3-5-sonnet-20241022
    routing: cloudflare-gateway  # Route through AI Gateway
    temperature: 0.7
Characteristics:
  • 🗄️ Persistent cache - Cache spans deployments and users
  • 📊 Real-time analytics - Dashboard for costs, latency, errors
  • 💵 Cost controls - Set spending limits and rate limits
  • 🔄 Retry logic - Automatic retries with exponential backoff
  • 🌐 Multi-provider - OpenAI, Anthropic, Groq, etc.
When to use:
  • Always for production AI calls (unless using Workers AI)
  • Need caching across requests and deployments
  • Want visibility into AI spending and usage
  • Multiple environments (dev, staging, prod)
  • A/B testing different models or prompts
Benefits:

Persistent cache - Cache survives deployments: the first user pays, everyone else benefits. Can reduce costs by 90%+ for repeated queries.

Real-time analytics - Metrics on:
  • Cache hit rates
  • Request volume
  • Cost per model
  • Latency percentiles
  • Error rates by provider

Cost controls - Set hard limits:
  • Max spend per day/month
  • Rate limits per user
  • Alert thresholds
  • Budget allocation by environment

A/B testing - Split traffic between:
  • Different models (GPT-4 vs Claude)
  • Different prompts
  • Different temperatures
Track which arm performs better in the gateway dashboard (see the sketch below).
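
A minimal sketch, reusing the expression interpolation from the Load Balancing pattern later on this page; the 50/50 split on input.requestId is an illustrative assumption, not a built-in gateway feature:

flow:
  - member: summarize-ab-test
    type: Think
    config:
      # Hypothetical 50/50 split keyed on the request ID
      provider: ${input.requestId % 2 === 0 ? 'openai' : 'anthropic'}
      model: ${input.requestId % 2 === 0 ? 'gpt-4o' : 'claude-3-5-sonnet-20241022'}
      routing: cloudflare-gateway  # Both arms show up in gateway analytics
      temperature: 0.7

Compare cost per model, latency percentiles, and error rates between the two arms in the dashboard.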

3. Direct

For: Direct API calls bypassing AI Gateway
- member: generate-code
  type: Think
  config:
    provider: openai
    model: o1-preview
    routing: direct  # Bypass gateway
    temperature: 1.0
Characteristics:
  • 🎯 Direct connection - No intermediary
  • 🆕 Latest features - Provider-specific capabilities
  • 🔧 Full control - All provider parameters available
  • ⚠️ No gateway benefits - No cache, analytics, or limits
When to use:
  • Testing new provider features not yet in gateway
  • Provider-specific parameters needed
  • Debugging provider-specific issues
  • Very low request volume (caching not beneficial)
Tradeoffs:
  • Miss out on persistent caching
  • No analytics or cost controls
  • Manual retry logic needed (see the sketch after this list)
  • Direct API keys required
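
Since direct routing loses the gateway's automatic retries, a flow-level retry block (the same construct used in the Fallback Pattern below) can stand in; a sketch, assuming the retry block behaves the same under direct routing:

flow:
  - member: generate-code
    type: Think
    config:
      provider: openai
      model: o1-preview
      routing: direct  # No gateway retry logic
    retry:
      maxAttempts: 3       # Assumed: flow-level retries apply regardless of routing mode
      onFailure: continue  # Let a later fallback member take over if all attempts fail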

Configuration Examples

Workers AI (Platform-Native)

name: edge-classification

flow:
  - member: classify-intent
    type: Think
    config:
      provider: workers-ai
      model: "@cf/meta/llama-3.1-8b-instruct"
      routing: cloudflare
      systemPrompt: "Classify user intent into: question, request, or complaint."
Setup required:
# wrangler.toml
[ai]
binding = "AI"

OpenAI via Gateway

name: generate-article

flow:
  - member: write-draft
    type: Think
    config:
      provider: openai
      model: gpt-4o
      routing: cloudflare-gateway
      temperature: 0.7
      maxTokens: 2000
Setup required:
# wrangler.toml
[[ai.gateway]]
id = "my-gateway"
cache_ttl = 3600
# Environment variable
export OPENAI_API_KEY="sk-..."

Anthropic via Gateway

name: analyze-sentiment

flow:
  - member: detect-sentiment
    type: Think
    config:
      provider: anthropic
      model: claude-3-5-sonnet-20241022
      routing: cloudflare-gateway
      temperature: 0.3
Setup required:
# Environment variable
export ANTHROPIC_API_KEY="sk-ant-..."

Groq via Gateway

name: fast-extraction

flow:
  - member: extract-entities
    type: Think
    config:
      provider: groq
      model: llama-3.1-70b-versatile
      routing: cloudflare-gateway  # Ultra-fast with caching
      temperature: 0.2
Setup required:
# Environment variable
export GROQ_API_KEY="gsk_..."

Direct OpenAI (No Gateway)

name: test-new-model

flow:
  - member: try-o1
    type: Think
    config:
      provider: openai
      model: o1-preview
      routing: direct  # Bypass gateway for testing
      temperature: 1.0

Multi-Model Strategies

Cascade Pattern

Start with fast/cheap model, escalate to powerful model if needed:
flow:
  # Try fast edge model first
  - member: quick-answer
    config:
      provider: workers-ai
      model: "@cf/meta/llama-3.1-8b-instruct"
      routing: cloudflare
    scoring:
      thresholds:
        minimum: 0.7
      onFailure: continue  # Don't abort if low quality

  # Escalate to powerful model if needed
  - member: detailed-answer
    condition: ${quick-answer.scoring.score < 0.7}
    config:
      provider: anthropic
      model: claude-3-5-sonnet-20241022
      routing: cloudflare-gateway

Load Balancing

Distribute requests across providers:
flow:
  - member: generate-summary
    config:
      # Alternate between providers
      provider: ${input.requestId % 2 === 0 ? 'openai' : 'anthropic'}
      model: ${input.requestId % 2 === 0 ? 'gpt-4o' : 'claude-3-5-sonnet-20241022'}
      routing: cloudflare-gateway

Fallback Pattern

Try primary provider, fallback to secondary if unavailable:
flow:
  - member: generate-content
    config:
      provider: openai
      model: gpt-4o
      routing: cloudflare-gateway
    retry:
      maxAttempts: 2
      onFailure: continue

  # Fallback if OpenAI fails
  - member: generate-content-fallback
    condition: ${!generate-content.success}
    config:
      provider: anthropic
      model: claude-3-5-sonnet-20241022
      routing: cloudflare-gateway

Cost Optimization

Use cheaper models for simple tasks, expensive for complex:
flow:
  # Classification with edge model (cheap)
  - member: classify-intent
    config:
      provider: workers-ai
      model: "@cf/meta/llama-3.1-8b-instruct"
      routing: cloudflare

  # Complex reasoning with flagship model (expensive)
  - member: generate-strategy
    condition: ${classify-intent.output.intent === 'complex'}
    config:
      provider: anthropic
      model: claude-3-5-sonnet-20241022
      routing: cloudflare-gateway

Performance Comparison

Latency

Routing Mode         Cold Start   Warm         Cache Hit
cloudflare           < 50ms       < 10ms       < 5ms
cloudflare-gateway   < 100ms      500-2000ms   < 5ms
direct               < 100ms      500-2000ms   N/A

Cost (per 1K tokens)

Provider     Model                Approx. Cost
Workers AI   llama-3.1-8b         $0.001
OpenAI       gpt-4o               $0.005
OpenAI       gpt-4o-mini          $0.0002
Anthropic    claude-3.5-sonnet    $0.003
Groq         llama-3.1-70b        $0.0008
With Gateway Cache (90% hit rate):
  • Effective cost ≈ (1 - hit rate) × base cost, i.e. 10% of the base cost
  • Example: gpt-4o drops from $0.005 to $0.0005 per 1K tokens

Reliability

Routing Mode         Caching      Retry       Analytics   Rate Limiting
cloudflare           KV only      Manual      Basic       Platform
cloudflare-gateway   Persistent   Automatic   Full        Configurable
direct               None         Manual      None        Provider
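
Because the cloudflare mode has no gateway cache ("KV only" above), the member-level cache block is its only caching layer; a sketch, assuming the cache block shown under Best Practices also applies to workers-ai members:

flow:
  - member: classify-intent
    type: Think
    config:
      provider: workers-ai
      model: "@cf/meta/llama-3.1-8b-instruct"
      routing: cloudflare
    cache:
      ttl: 300  # Assumed: KV-backed member cache; 5 minutes suits semi-stable inputs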

Best Practices

1. Default to AI Gateway

# ✅ Use gateway for production
config:
  routing: cloudflare-gateway

# ❌ Don't use direct without reason
config:
  routing: direct

2. Use Workers AI for Simple Tasks

# ✅ Edge model for classification
- member: classify-email
  config:
    provider: workers-ai
    routing: cloudflare

# ❌ Don't waste GPT-4 on simple tasks
- member: classify-email
  config:
    provider: openai
    model: gpt-4o  # Overkill

3. Configure Appropriate Cache TTL

# ✅ Long cache for stable queries
- member: summarize-docs
  config:
    routing: cloudflare-gateway
  cache:
    ttl: 86400  # 24 hours

# ✅ Short cache for dynamic data
- member: analyze-live-feed
  config:
    routing: cloudflare-gateway
  cache:
    ttl: 60  # 1 minute

4. Monitor Gateway Analytics

// Check dashboard regularly
// Cloudflare Dashboard -> AI Gateway
// - Cache hit rates (target: > 80%)
// - Error rates (target: < 1%)
// - Latency p95 (target: < 3s)
// - Cost trends

5. Use Environment-Specific Routing

flow:
  - member: generate-text
    config:
      # Development: use cheap edge models
      # Production: use powerful models via gateway
      provider: ${env.ENVIRONMENT === 'production' ? 'anthropic' : 'workers-ai'}
      model: ${env.ENVIRONMENT === 'production' ? 'claude-3-5-sonnet-20241022' : '@cf/meta/llama-3.1-8b-instruct'}
      routing: ${env.ENVIRONMENT === 'production' ? 'cloudflare-gateway' : 'cloudflare'}

Troubleshooting

Gateway Not Caching

Symptom: Every request hits the provider API.
Causes:
  1. Gateway not configured in wrangler.toml
  2. Cache TTL set to 0
  3. Requests have unique parameters (e.g. temperature varies per call; see the sketch after the fix)
  4. Using streaming (not cacheable)
Fix:
# wrangler.toml
[[ai.gateway]]
id = "my-gateway"
cache_ttl = 3600  # Enable caching
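
For cause 3, the cache only helps when repeated requests are identical; pinning sampling parameters keeps the cache key stable. A sketch (the fixed temperature is illustrative, on the assumption that the gateway keys its cache on the full request body):

- member: summarize-docs
  type: Think
  config:
    provider: openai
    model: gpt-4o
    routing: cloudflare-gateway
    temperature: 0  # Fixed value so identical prompts produce identical, cacheable requests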

High Latency

Symptom: Requests take > 5s.
Causes:
  1. Using direct routing (no edge optimization)
  2. Cold start + large model
  3. No caching enabled
  4. Provider issues
Fix:
# Use gateway for caching
config:
  routing: cloudflare-gateway
cache:
  ttl: 3600  # Enable KV cache too

Rate Limit Errors

Symptom: 429 errors from the provider.
Causes:
  1. Exceeding provider limits
  2. No rate limiting in gateway
  3. Burst traffic
Fix:
# Configure rate limits in gateway
[[ai.gateway]]
id = "my-gateway"
rate_limiting_requests_per_minute = 60

Cost Overruns

Symptom: Higher AI costs than expected.
Causes:
  1. Low cache hit rate
  2. Using expensive models unnecessarily
  3. No spending limits
Fix:
# Use edge models for simple tasks
- member: simple-task
  config:
    provider: workers-ai
    routing: cloudflare

# Set gateway spending limits via dashboard
# Cloudflare Dashboard -> AI Gateway -> Settings