Skip to main content

Overview

Provider routing enables dynamic selection of AI providers based on task requirements, availability, cost constraints, or performance metrics. Implement fallback strategies, load balancing, and intelligent routing to optimize your AI workflows.

Routing Strategies

1. Static Routing

Hardcode provider selection:
- member: generate-content
  type: Think
  config:
    provider: anthropic
    model: claude-3-5-sonnet-20241022
  input:
    prompt: ${input.prompt}
Use when:
  • Single provider meets all needs
  • Simplicity is priority
  • No fallback required

2. Environment-Based Routing

Select provider via environment variables:
- member: generate-content
  type: Think
  config:
    provider: ${env.AI_PROVIDER}
    model: ${env.AI_MODEL}
  input:
    prompt: ${input.prompt}
# Development
AI_PROVIDER=cloudflare
AI_MODEL=@cf/meta/llama-3.1-8b-instruct

# Production
AI_PROVIDER=anthropic
AI_MODEL=claude-3-5-sonnet-20241022
Use when:
  • Different providers per environment
  • Easy configuration changes
  • No code changes needed

3. Task-Based Routing

Route based on task characteristics:
- member: route-by-task
  type: Function
  config:
    handler: |-
      (input) => {
        const { taskType, complexity } = input;
        
        if (taskType === 'classification') {
          return {
            provider: 'cloudflare',
            model: '@cf/meta/llama-3.1-8b-instruct'
          };
        } else if (complexity === 'high') {
          return {
            provider: 'anthropic',
            model: 'claude-3-opus-20240229'
          };
        } else {
          return {
            provider: 'anthropic',
            model: 'claude-3-5-sonnet-20241022'
          };
        }
      }

- member: execute-with-route
  type: Think
  config:
    provider: ${route-by-task.output.provider}
    model: ${route-by-task.output.model}
  input:
    prompt: ${input.prompt}
Use when:
  • Multiple task types
  • Cost optimization needed
  • Performance varies by task

4. Fallback Routing

Try multiple providers in sequence:
flow:
  # Try primary provider
  - member: try-anthropic
    type: Think
    config:
      provider: anthropic
      model: claude-3-5-sonnet-20241022
    input:
      prompt: ${input.prompt}
  
  # Fallback to OpenAI if Anthropic fails
  - member: fallback-openai
    condition: ${try-anthropic.error}
    type: Think
    config:
      provider: openai
      model: gpt-4-turbo-preview
    input:
      prompt: ${input.prompt}
  
  # Final fallback to Cloudflare
  - member: fallback-cloudflare
    condition: ${fallback-openai.error}
    type: Think
    config:
      provider: cloudflare
      model: '@cf/meta/llama-3.1-8b-instruct'
    input:
      prompt: ${input.prompt}
  
  # Combine results
  - member: get-result
    type: Function
    config:
      handler: |-
        (input) => ({
          result: input.tryAnthropic?.output || 
                  input.fallbackOpenai?.output || 
                  input.fallbackCloudflare?.output,
          provider: input.tryAnthropic?.output ? 'anthropic' :
                   input.fallbackOpenai?.output ? 'openai' : 'cloudflare'
        })
Use when:
  • High availability required
  • Provider outages possible
  • Redundancy needed

5. Cost-Based Routing

Route based on budget constraints:
- member: check-budget
  type: Data
  config:
    type: kv
    operation: get
    key: 'monthly_ai_spend'

- member: route-by-cost
  type: Function
  config:
    handler: |-
      (input) => {
        const spent = input.checkBudget.value || 0;
        const budget = 1000; // $1000/month
        
        if (spent < budget * 0.8) {
          // Under 80% - use best model
          return {
            provider: 'anthropic',
            model: 'claude-3-5-sonnet-20241022'
          };
        } else if (spent < budget * 0.95) {
          // 80-95% - use cheaper model
          return {
            provider: 'anthropic',
            model: 'claude-3-haiku-20240307'
          };
        } else {
          // Near budget - use free model
          return {
            provider: 'cloudflare',
            model: '@cf/meta/llama-3.1-8b-instruct'
          };
        }
      }

- member: generate-with-budget
  type: Think
  config:
    provider: ${route-by-cost.output.provider}
    model: ${route-by-cost.output.model}
  input:
    prompt: ${input.prompt}
Use when:
  • Budget constraints exist
  • Cost control is priority
  • Usage varies monthly

6. Load Balancing

Distribute requests across providers:
- member: select-provider
  type: Function
  config:
    handler: |-
      (input) => {
        const providers = [
          { provider: 'anthropic', model: 'claude-3-5-sonnet-20241022', weight: 0.5 },
          { provider: 'openai', model: 'gpt-4-turbo-preview', weight: 0.3 },
          { provider: 'cloudflare', model: '@cf/meta/llama-3.1-8b-instruct', weight: 0.2 }
        ];
        
        const random = Math.random();
        let cumulative = 0;
        
        for (const p of providers) {
          cumulative += p.weight;
          if (random <= cumulative) {
            return p;
          }
        }
        
        return providers[0];
      }

- member: execute-balanced
  type: Think
  config:
    provider: ${select-provider.output.provider}
    model: ${select-provider.output.model}
  input:
    prompt: ${input.prompt}
Use when:
  • High volume requests
  • Multiple providers available
  • Rate limits are concern

7. Performance-Based Routing

Route based on latency requirements:
- member: route-by-latency
  type: Function
  config:
    handler: |-
      (input) => {
        const { maxLatency } = input;
        
        if (maxLatency < 1000) {
          // Fast: < 1 second
          return {
            provider: 'anthropic',
            model: 'claude-3-haiku-20240307'
          };
        } else if (maxLatency < 3000) {
          // Standard: 1-3 seconds
          return {
            provider: 'anthropic',
            model: 'claude-3-5-sonnet-20241022'
          };
        } else {
          // Complex: 3+ seconds OK
          return {
            provider: 'anthropic',
            model: 'claude-3-opus-20240229'
          };
        }
      }

- member: execute-with-latency
  type: Think
  config:
    provider: ${route-by-latency.output.provider}
    model: ${route-by-latency.output.model}
  input:
    prompt: ${input.prompt}
Use when:
  • SLAs define latency
  • Real-time vs batch processing
  • User-facing vs background

8. Feature-Based Routing

Route based on required capabilities:
- member: route-by-features
  type: Function
  config:
    handler: |-
      (input) => {
        const { needsVision, needsCode, needsLongContext } = input;
        
        if (needsVision) {
          return {
            provider: 'anthropic',
            model: 'claude-3-5-sonnet-20241022' // Supports vision
          };
        } else if (needsLongContext) {
          return {
            provider: 'openai',
            model: 'gpt-4-turbo-preview' // 128K context
          };
        } else if (needsCode) {
          return {
            provider: 'anthropic',
            model: 'claude-3-5-sonnet-20241022' // Best for code
          };
        } else {
          return {
            provider: 'cloudflare',
            model: '@cf/meta/llama-3.1-8b-instruct' // General
          };
        }
      }
Use when:
  • Different capabilities needed
  • Model-specific features required
  • Capabilities vary by provider

Advanced Patterns

Circuit Breaker

Temporarily disable failing providers:
- member: check-circuit
  type: Data
  config:
    type: kv
    operation: get
    key: 'circuit:anthropic'

- member: select-provider
  type: Function
  config:
    handler: |-
      (input) => {
        const circuit = input.checkCircuit.value;
        
        if (circuit?.open && Date.now() < circuit.resetAt) {
          // Circuit open - use alternative
          return { provider: 'openai', model: 'gpt-4-turbo-preview' };
        }
        
        // Circuit closed or reset - use primary
        return { provider: 'anthropic', model: 'claude-3-5-sonnet-20241022' };
      }

- member: execute-request
  type: Think
  config:
    provider: ${select-provider.output.provider}
    model: ${select-provider.output.model}
  input:
    prompt: ${input.prompt}

- member: update-circuit
  condition: ${execute-request.error}
  type: Data
  config:
    type: kv
    operation: put
    key: 'circuit:${select-provider.output.provider}'
    value:
      open: true
      resetAt: ${Date.now() + 60000}  # 1 minute

Retry with Backoff

Retry failed requests with different providers:
- member: execute-with-retry
  type: Function
  config:
    handler: |-
      async (input, context) => {
        const providers = [
          { provider: 'anthropic', model: 'claude-3-5-sonnet-20241022' },
          { provider: 'openai', model: 'gpt-4-turbo-preview' },
          { provider: 'cloudflare', model: '@cf/meta/llama-3.1-8b-instruct' }
        ];
        
        for (let i = 0; i < providers.length; i++) {
          try {
            const result = await executeAI(providers[i], input.prompt, context);
            return { result, provider: providers[i].provider, attempts: i + 1 };
          } catch (error) {
            if (i === providers.length - 1) throw error;
            await new Promise(r => setTimeout(r, 1000 * Math.pow(2, i)));
          }
        }
      }

A/B Testing

Test different providers/models:
- member: ab-test-provider
  type: Function
  config:
    handler: |-
      (input) => {
        const userId = input.userId;
        const hash = userId.split('').reduce((a, b) => a + b.charCodeAt(0), 0);
        const bucket = hash % 100;
        
        if (bucket < 50) {
          // Group A - Claude
          return {
            provider: 'anthropic',
            model: 'claude-3-5-sonnet-20241022',
            group: 'A'
          };
        } else {
          // Group B - GPT-4
          return {
            provider: 'openai',
            model: 'gpt-4-turbo-preview',
            group: 'B'
          };
        }
      }

- member: execute-ab-test
  type: Think
  config:
    provider: ${ab-test-provider.output.provider}
    model: ${ab-test-provider.output.model}
  input:
    prompt: ${input.prompt}

- member: log-ab-result
  type: Data
  config:
    type: d1
    operation: execute
    query: |-
      INSERT INTO ab_tests (user_id, group, provider, model, latency, tokens)
      VALUES (?, ?, ?, ?, ?, ?)
    params:
      - ${input.userId}
      - ${ab-test-provider.output.group}
      - ${ab-test-provider.output.provider}
      - ${ab-test-provider.output.model}
      - ${execute-ab-test.duration}
      - ${execute-ab-test.output.tokensUsed}

Monitoring and Metrics

Track routing effectiveness:
- member: log-provider-usage
  type: Data
  config:
    type: d1
    operation: execute
    query: |-
      INSERT INTO provider_metrics (
        provider, model, latency, tokens, cost, success, timestamp
      ) VALUES (?, ?, ?, ?, ?, ?, ?)
    params:
      - ${execute-request.provider}
      - ${execute-request.model}
      - ${execute-request.duration}
      - ${execute-request.tokensUsed}
      - ${execute-request.estimatedCost}
      - ${!execute-request.error}
      - ${Date.now()}

Best Practices

  1. Start simple - Use static routing initially
  2. Add fallbacks - Implement for production
  3. Monitor costs - Track spend per provider
  4. Test routing logic - Validate with different scenarios
  5. Use circuit breakers - Handle provider outages
  6. Implement timeouts - Prevent hanging requests
  7. Log routing decisions - Debug and optimize
  8. A/B test providers - Compare quality and cost
  9. Cache routing decisions - Avoid redundant logic
  10. Document strategy - Explain routing choices