Overview

Optimize your Conductor workflows for maximum performance, minimal latency, and cost efficiency. Learn caching strategies, parallel execution, model selection, and edge optimization techniques.

Performance Goals

  • Sub-50ms cold starts - Cloudflare Workers edge performance
  • Parallel execution - Concurrent operations when possible
  • Intelligent caching - Reduce redundant operations
  • Model optimization - Right model for the job
  • Efficient data access - Minimize database queries

Quick Wins

1. Enable Parallel Execution

# ❌ Slow - sequential (300ms)
flow:
  - member: fetch-1  # 100ms
  - member: fetch-2  # 100ms
  - member: fetch-3  # 100ms

# ✅ Fast - parallel (100ms)
flow:
  parallel:
    - member: fetch-1
    - member: fetch-2
    - member: fetch-3
Impact: 3x faster execution

2. Cache Aggressively

# ❌ Expensive - no cache
- member: fetch-data
  type: Fetch

# ✅ Cheap - cached
- member: fetch-data
  type: Fetch
  cache:
    ttl: 3600  # 1 hour
Impact: 95%+ cost reduction for repeated queries

3. Use Faster Models

# ❌ Slow - flagship model (~2s)
- member: classify
  config:
    model: gpt-4o

# ✅ Fast - mini model (~200ms)
- member: classify
  config:
    model: gpt-4o-mini
Impact: 10x faster, 97% cheaper

Caching Strategies

Member-Level Caching

- member: expensive-operation
  cache:
    ttl: 3600  # Cache for 1 hour
    key: "custom:${input.userId}:${input.type}"

AI Gateway Caching

config:
  provider: openai
  model: gpt-4o
  routing: cloudflare-gateway  # Persistent cache
  temperature: 0.1  # Lower temp = better cache hits

Database Query Caching

- member: fetch-user
  type: Data
  cache:
    ttl: 300  # Cache user data for 5 minutes
  config:
    storage: d1
    operation: query
    query: "SELECT * FROM users WHERE id = ?"

Cache Invalidation

flow:
  # Update data
  - member: update-user
    type: Data

  # Invalidate cache
  - member: clear-cache
    type: Data
    config:
      storage: kv
      operation: delete
      binding: CACHE
    input:
      key: "user:${input.userId}"

Parallel Execution

Parallel Data Fetching

parallel:
  - member: fetch-user
  - member: fetch-orders
  - member: fetch-products
  - member: fetch-analytics

# All complete in max(individual_times) instead of sum(individual_times)
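
The same principle applies when invoking ensembles programmatically: awaiting calls one at a time serializes them, while Promise.all runs them concurrently. A minimal sketch, assuming the executeEnsemble API shown under Track Execution Time below; the ensemble names and input shape are hypothetical:

// Hypothetical ensemble names and input shape; executeEnsemble is the API
// shown in the "Track Execution Time" example later on this page.
interface ConductorLike {
  executeEnsemble(name: string, input: unknown): Promise<unknown>;
}

async function loadDashboard(conductor: ConductorLike, userId: string) {
  // Sequential awaits would take roughly the sum of the three calls.
  // Promise.all takes roughly the slowest single call instead.
  const [user, orders, products] = await Promise.all([
    conductor.executeEnsemble('fetch-user', { userId }),
    conductor.executeEnsemble('fetch-orders', { userId }),
    conductor.executeEnsemble('fetch-products', { userId }),
  ]);
  return { user, orders, products };
}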

Parallel AI Calls

parallel:
  - member: classify-sentiment
  - member: extract-entities
  - member: summarize
  - member: translate

Nested Parallelism

flow:
  parallel:
    - member: group-1
      flow:
        parallel:
          - member: task-1a
          - member: task-1b

    - member: group-2
      flow:
        parallel:
          - member: task-2a
          - member: task-2b

Model Selection

By Task Complexity

# Simple classification - Mini
- member: classify
  config:
    model: gpt-4o-mini  # Fast, cheap

# Complex reasoning - Flagship
- member: analyze
  config:
    model: gpt-4o  # Slower, expensive

# Long-form writing - Sonnet
- member: write
  config:
    provider: anthropic
    model: claude-3-5-sonnet-20241022

By Latency Requirements

# Ultra-fast (< 100ms) - Workers AI
- member: quick-classify
  config:
    provider: workers-ai
    model: "@cf/meta/llama-3.1-8b-instruct"

# Fast (< 500ms) - Mini models
- member: fast-task
  config:
    model: gpt-4o-mini

# Quality (1-2s) - Flagship models
- member: quality-task
  config:
    model: gpt-4o

Cascade Pattern

flow:
  # Try fast model first
  - member: quick-attempt
    config:
      model: gpt-4o-mini
    scoring:
      thresholds:
        minimum: 0.7
      onFailure: continue

  # Escalate to better model if needed
  - member: quality-attempt
    condition: ${quick-attempt.scoring.score < 0.7}
    config:
      model: gpt-4o
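
Outside of YAML, the cascade is simply "run the cheap model, score the output, escalate only when the score misses the threshold". A sketch of that control flow with hypothetical callModel and score helpers (not Conductor APIs), mirroring the 0.7 threshold above:

// Hypothetical helpers, not Conductor APIs: supply your own model call and
// quality scorer. Mirrors the 0.7 threshold used in the YAML above.
type ModelCall = (model: string, text: string) => Promise<string>;
type Scorer = (result: string) => number;  // returns a 0-1 quality score

const MIN_SCORE = 0.7;

async function cascadeClassify(
  text: string,
  callModel: ModelCall,
  score: Scorer
): Promise<string> {
  // Try the fast, cheap model first.
  const quick = await callModel('gpt-4o-mini', text);
  if (score(quick) >= MIN_SCORE) return quick;

  // Escalate to the flagship model only for low-confidence results.
  return callModel('gpt-4o', text);
}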

Temperature Optimization

For Caching

# ✅ Better cache hit rate
config:
  temperature: 0.1  # More deterministic

# ❌ Poor cache hit rate
config:
  temperature: 0.9  # Less deterministic

By Use Case

# Factual tasks - Low temperature
- member: extract-facts
  config:
    temperature: 0.1

# Creative tasks - High temperature
- member: generate-ideas
  config:
    temperature: 0.9

# Balanced tasks - Medium temperature
- member: write-content
  config:
    temperature: 0.7

Database Optimization

Query Optimization

# ✅ Indexed query
query: "SELECT * FROM users WHERE id = ?"  # Fast - uses the primary-key index

# ❌ Full table scan
query: "SELECT * FROM users WHERE email LIKE '%@%'"  # Slow - leading wildcard defeats any index

Batch Operations

# ❌ Slow - individual inserts
- member: insert-item
  foreach: ${items}
  type: Data
  config:
    query: "INSERT INTO items VALUES (?)"

# ✅ Fast - batch insert
- member: insert-batch
  type: Data
  config:
    query: "INSERT INTO items VALUES ${items.map(() => '(?)').join(',')}"

Connection Pooling

Conductor automatically manages database connections efficiently.

Edge Optimization

Minimize Cold Starts

// ✅ Lightweight imports keep cold starts fast
import { Conductor } from '@ensemble-edge/conductor';

// ❌ Heavy imports increase cold start
import entire_ml_library from 'huge-package';
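
If a heavy dependency is only needed on rare code paths, a dynamic import() can defer loading and evaluating it until that path actually runs, which keeps startup light depending on your bundler setup. A hedged sketch reusing the hypothetical 'huge-package' from above:

// Hypothetical module and export: load the heavy dependency lazily, only on
// the code path that needs it, instead of at module top level.
async function runRareHeavyTask(input: string) {
  const { processHeavy } = await import('huge-package');
  return processHeavy(input);
}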

Use Workers AI

# ✅ Sub-50ms cold start
- member: classify
  config:
    provider: workers-ai
    routing: cloudflare

# ❌ Slower cold start
- member: classify
  config:
    provider: openai
    routing: direct

Regional Deployment

Cloudflare Workers run at the edge location closest to each user automatically, so compute is already near your traffic. Keep data stores and upstream services close to your users as well to minimize round trips.

Cost Optimization

Model Costs

# Cost per 1M tokens (input/output)
gpt-4o: $5.00 / $15.00
gpt-4o-mini: $0.15 / $0.60  # 97% cheaper
claude-3-5-sonnet: $3.00 / $15.00
workers-ai: ~$0.01  # Edge model
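
To turn these rates into a per-request figure, multiply each token count by its rate per million tokens. A quick sketch using the approximate prices above and illustrative token counts:

// Rough per-request cost from per-1M-token rates. Prices are the approximate
// figures listed above and subject to change; token counts are illustrative.
function requestCost(
  inputTokens: number,
  outputTokens: number,
  inputPerMillion: number,
  outputPerMillion: number
): number {
  return (inputTokens / 1_000_000) * inputPerMillion
       + (outputTokens / 1_000_000) * outputPerMillion;
}

// Illustrative request: 2,000 input tokens, 500 output tokens.
const gpt4oCost = requestCost(2_000, 500, 5.00, 15.00);  // ≈ $0.0175
const miniCost  = requestCost(2_000, 500, 0.15, 0.60);   // ≈ $0.0006 (~97% cheaper)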

Reduce Token Usage

# ✅ Concise prompt
prompt: "Classify sentiment: ${text}"

# ❌ Verbose prompt
prompt: |
  I would like you to analyze the sentiment of the following text.
  Please carefully read it and determine if it's positive or negative.
  Here is the text: ${text}

Cache Everything

- member: expensive-ai-call
  config:
    routing: cloudflare-gateway  # Automatic caching
  cache:
    ttl: 3600  # Member-level cache

Batch AI Requests

# Process multiple items in one prompt
prompt: |
  Classify sentiment for each:
  ${items.map((item, i) => `${i+1}. ${item}`).join('\n')}
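
In Worker code, the same batching means building one numbered prompt for N items and splitting the numbered response back into N results. A hedged sketch that assumes the model replies with one numbered line per item, which you should request in the prompt and validate:

// Build one prompt for many items. Assumes the model answers with one
// numbered line per item (e.g. "1. positive"); enforce this in the prompt
// and validate the parsed output.
function buildBatchPrompt(items: string[]): string {
  const numbered = items.map((item, i) => `${i + 1}. ${item}`).join('\n');
  return `Classify sentiment for each:\n${numbered}`;
}

function splitBatchResponse(response: string, count: number): string[] {
  const lines = response.split('\n').filter((line) => /^\s*\d+\./.test(line));
  // Strip the leading "N." from each line; missing lines become empty strings.
  return Array.from({ length: count }, (_, i) =>
    (lines[i] ?? '').replace(/^\s*\d+\.\s*/, '').trim());
}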

Monitoring Performance

Track Execution Time

const startTime = Date.now();
const result = await conductor.executeEnsemble('my-workflow', input);
const duration = Date.now() - startTime;

console.log(`Execution time: ${duration}ms`);

Member-Level Metrics

output:
  metrics:
    fetchDuration: ${fetch-data.duration}
    aiDuration: ${analyze.duration}
    totalDuration: ${execution.duration}

Cloudflare Analytics

Use Cloudflare Workers Analytics to track:
  • Request count
  • Response time (p50, p95, p99)
  • Error rate
  • Cache hit rate

Real-World Optimizations

Before Optimization

# Sequential, no caching, flagship model
# Cost: $0.50 per request, 5s latency

flow:
  - member: fetch-1  # 1s
  - member: fetch-2  # 1s
  - member: analyze  # 2s, gpt-4o
  - member: fetch-3  # 1s

After Optimization

# Parallel, cached, optimized model
# Cost: $0.02 per request, 2s latency

flow:
  parallel:
    - member: fetch-1  # 1s cold, ~0ms on cache hit
      cache: { ttl: 3600 }
    - member: fetch-2  # 1s cold, ~0ms on cache hit
      cache: { ttl: 3600 }
    - member: fetch-3  # 1s cold, ~0ms on cache hit
      cache: { ttl: 3600 }

  - member: analyze  # 2s, gpt-4o-mini
    config:
      model: gpt-4o-mini
      routing: cloudflare-gateway
Result: 25x cheaper, 2.5x faster

Benchmarking

import { describe, it } from 'vitest';
import { TestConductor } from '@ensemble-edge/conductor/testing';

describe('performance benchmarks', () => {
  it('should execute under 2 seconds', async () => {
    const conductor = await TestConductor.create();

    const start = performance.now();
    await conductor.executeEnsemble('my-workflow', input);
    const duration = performance.now() - start;

    expect(duration).toBeLessThan(2000);
  });

  it('should cache effectively', async () => {
    const conductor = await TestConductor.create();

    // First call
    const start1 = performance.now();
    await conductor.executeEnsemble('cached-workflow', input);
    const duration1 = performance.now() - start1;

    // Second call (cached)
    const start2 = performance.now();
    await conductor.executeEnsemble('cached-workflow', input);
    const duration2 = performance.now() - start2;

    expect(duration2).toBeLessThan(duration1 * 0.1);  // 10x faster
  });
});

Best Practices

  1. Parallelize independent operations - Use parallel: blocks
  2. Cache aggressively - Set appropriate TTLs
  3. Choose the right model - Don’t use flagship models for simple tasks
  4. Lower temperature - When determinism helps
  5. Batch operations - Reduce database round-trips
  6. Monitor metrics - Track performance over time
  7. Test with realistic data - Use production-like volumes
  8. Profile bottlenecks - Find and fix slowest operations
  9. Use edge compute - Cloudflare Workers for low latency
  10. Optimize prompts - Concise = faster + cheaper