Basic Usage
Configuration Options
Required Fields
Optional Fields
Provider Selection
OpenAI (GPT Models)
Fast, high-quality models with structured outputs.
- `gpt-4o` - Most capable, multimodal
- `gpt-4o-mini` - Fast, cost-effective (recommended)
- `o1-mini` - Advanced reasoning
- `gpt-4-turbo` - Previous generation
Pricing (per 1M tokens):
- `gpt-4o-mini`: $0.15 input / $0.60 output
- `gpt-4o`: $2.50 input / $10.00 output
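As a minimal sketch of calling an OpenAI model directly with the official `openai` SDK (the prompt is illustrative):

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini", // the cost-effective default recommended above
  messages: [{ role: "user", content: "Summarize: edge computing in one line." }],
});
console.log(response.choices[0].message.content);
```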
Anthropic (Claude Models)
Strong reasoning, long context, extended thinking.
- `claude-3-5-sonnet-20241022` - Most capable
- `claude-3-5-haiku-20241022` - Fast, cost-effective
- `claude-3-opus-20240229` - Previous generation
Pricing (per 1M tokens):
- `claude-3-5-haiku`: $0.80 input / $4.00 output
- `claude-3-5-sonnet`: $3.00 input / $15.00 output
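A comparable sketch with the official `@anthropic-ai/sdk`; note the Messages API requires `max_tokens`:

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const message = await anthropic.messages.create({
  model: "claude-3-5-haiku-20241022",
  max_tokens: 1024, // required by the Messages API
  messages: [{ role: "user", content: "Classify the sentiment: 'great docs!'" }],
});

const block = message.content[0]; // responses are a list of content blocks
if (block.type === "text") console.log(block.text);
```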
Cloudflare Workers AI
Edge-native models with a free tier.
- `@cf/meta/llama-3.1-8b-instruct` - Fast, general purpose
- `@cf/meta/llama-3.1-70b-instruct` - More capable
- `@cf/mistral/mistral-7b-instruct-v0.1` - Fast instruction following
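Workers AI models run through the Worker's AI binding. A minimal sketch, assuming a binding named `AI` configured in `wrangler.toml`:

```ts
// Requires @cloudflare/workers-types for the Ai binding type.
export default {
  async fetch(_req: Request, env: { AI: Ai }): Promise<Response> {
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "Explain edge computing in one sentence.",
    });
    return Response.json(result);
  },
};
```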
Groq
Ultra-fast inference with LPU acceleration.
- `llama-3.1-8b-instant` - Fastest (~200ms response)
- `llama-3.1-70b-versatile` - More capable
- `mixtral-8x7b-32768` - Long context window
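Groq exposes an OpenAI-compatible endpoint, so the `openai` SDK works with only a `baseURL` change (a sketch, assuming `GROQ_API_KEY` is set):

```ts
import OpenAI from "openai";

const groq = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1", // Groq's OpenAI-compatible endpoint
});

const res = await groq.chat.completions.create({
  model: "llama-3.1-8b-instant", // fastest option listed above
  messages: [{ role: "user", content: "One-line haiku about speed." }],
});
console.log(res.choices[0].message.content);
```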
Machine Learning Models
For ML inference (embeddings, image classification, object detection, vision), use Workers AI models via the `workers-ai` provider. A minimal embedding call is sketched after the list below.
See: Machine Learning for the complete guide, including:
- Text embeddings (7 models)
- Image classification
- Object detection
- Vision models
- Text classification
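For example, a minimal embedding call against a Workers AI embedding model (`@cf/baai/bge-base-en-v1.5`; the binding name `AI` is assumed):

```ts
export default {
  async fetch(_req: Request, env: { AI: Ai }): Promise<Response> {
    // Returns { shape, data } where data holds one vector per input string.
    const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: ["How do I deploy a Worker?"],
    });
    return Response.json({ dimensions: data[0].length, vector: data[0] });
  },
};
```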
System Prompts
Basic System Prompt
Structured Output Format
Role-Based Prompts
Few-Shot Prompts
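As a generic, provider-agnostic illustration (OpenAI-style messages assumed), a few-shot prompt seeds the conversation with worked examples before the real input:

```ts
const messages = [
  { role: "system", content: "Classify ticket priority as low, medium, or high. Reply with one word." },
  { role: "user", content: "The app crashes on launch for all users." },
  { role: "assistant", content: "high" }, // worked example 1
  { role: "user", content: "Typo on the About page." },
  { role: "assistant", content: "low" },  // worked example 2
  { role: "user", content: "CSV export is slow for large files." }, // the real query
] as const;
```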
Common Patterns
Sentiment Analysis
Classification
Entity Extraction
Text Summarization
Content Generation
Question Answering (RAG)
Translation
Structured Outputs
JSON Mode (OpenAI)
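With the `openai` SDK, JSON mode (`response_format: { type: "json_object" }`) guarantees syntactically valid JSON; the API requires the word "JSON" to appear somewhere in the prompt:

```ts
import OpenAI from "openai";

const openai = new OpenAI();
const res = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  response_format: { type: "json_object" }, // output is guaranteed to parse
  messages: [
    { role: "system", content: "Reply in JSON." }, // JSON mode requires mentioning JSON
    { role: "user", content: "Extract name and age from: 'Ada, 36'." },
  ],
});
console.log(JSON.parse(res.choices[0].message.content ?? "{}"));
```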
JSON Schema (OpenAI Structured Outputs)
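Structured Outputs go further and constrain the response to a JSON Schema:

```ts
import OpenAI from "openai";

const openai = new OpenAI();
const res = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "person",
      strict: true, // enforce the schema exactly
      schema: {
        type: "object",
        properties: { name: { type: "string" }, age: { type: "integer" } },
        required: ["name", "age"],
        additionalProperties: false,
      },
    },
  },
  messages: [{ role: "user", content: "Extract name and age from: 'Ada, 36'." }],
});
```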
Temperature Guide
Temperature controls randomness and creativity. Typical settings:
- 0.0-0.3 - deterministic, repeatable (classification, extraction)
- 0.4-0.7 - balanced (general tasks, summarization)
- 0.8-1.0+ - creative, varied (brainstorming, content generation)
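To see the effect, run the same prompt at two temperatures (a sketch using the `openai` SDK):

```ts
import OpenAI from "openai";

const openai = new OpenAI();
for (const temperature of [0.0, 0.9]) {
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    temperature, // 0.0 is near-deterministic; 0.9 samples more freely
    messages: [{ role: "user", content: "Name a color." }],
  });
  console.log(temperature, res.choices[0].message.content);
}
```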
Token Limits
Control output length and cost:
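A sketch of capping output with the `openai` SDK; the cap bounds the per-call output cost:

```ts
import OpenAI from "openai";

const openai = new OpenAI();
const res = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  max_tokens: 150, // hard cap on generated tokens (long outputs are truncated)
  messages: [{ role: "user", content: "Summarize this article: ..." }],
});
```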
Input Handling
Simple String Input
Multiple Fields
Messages Array (Conversations)
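Conversations use the familiar role-tagged messages shape (a provider-agnostic illustration):

```ts
const conversation = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
  { role: "assistant", content: "Paris." },
  { role: "user", content: "And roughly how many people live there?" }, // resolved via context
];
```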
Advanced Techniques
Chain of Thought
Encourage step-by-step reasoning:
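A sketch, assuming a hypothetical `askModel(prompt)` helper; instructing the model to reason before answering helps on multi-step problems:

```ts
declare function askModel(prompt: string): Promise<string>; // hypothetical helper

const answer = await askModel(
  `A crate holds 12 apples. A store has 3 crates and sells 9 apples.
How many apples remain?
Think through the problem step by step, then give only the final number on the last line.`
);
```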
Self-Consistency
Run multiple times and pick the most common answer:
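A sketch with the same hypothetical `askModel` helper, sampling several answers and taking a majority vote:

```ts
declare function askModel(prompt: string): Promise<string>; // hypothetical; sample with temperature > 0

async function selfConsistent(prompt: string, runs = 5): Promise<string> {
  const answers = await Promise.all(
    Array.from({ length: runs }, () => askModel(prompt))
  );
  const counts = new Map<string, number>();
  for (const a of answers) {
    const key = a.trim();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // pick the answer that appears most often across the samples
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```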
Multi-Turn Conversations
Build context across operations:
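A sketch of accumulating history so each call sees prior turns (`askModel` is again hypothetical, here taking a messages array):

```ts
type Msg = { role: "system" | "user" | "assistant"; content: string };
declare function askModel(messages: Msg[]): Promise<string>; // hypothetical helper

const history: Msg[] = [];

async function chat(userText: string): Promise<string> {
  history.push({ role: "user", content: userText });
  const reply = await askModel(history); // the model sees the full conversation
  history.push({ role: "assistant", content: reply });
  return reply;
}
```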
Cost Optimization
1. Use Cheaper Models
2. Aggressive Caching
3. Lower Temperature for Cache Hits
4. Limit Token Usage
5. Use Workers AI Free Tier
6. Track AI Costs with Telemetry
Emit token usage to Analytics Engine for cost tracking and billing:
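A sketch using the Analytics Engine `writeDataPoint()` API; the binding name `AI_METRICS` is an assumption you would configure in `wrangler.toml`:

```ts
interface Env {
  AI_METRICS: AnalyticsEngineDataset; // binding name assumed
}

function recordUsage(env: Env, provider: string, model: string, tokensIn: number, tokensOut: number): void {
  env.AI_METRICS.writeDataPoint({
    blobs: [provider, model],       // low-cardinality dimensions for grouping
    doubles: [tokensIn, tokensOut], // numeric metrics to aggregate into cost
    indexes: [model],               // sampling index
  });
}
```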
Performance Tips
Use Workers AI for Speed
Use Groq for Fast Inference
Parallel Operations
Run multiple AI operations in parallel:
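Independent calls should not run sequentially; `Promise.all` issues them concurrently (hypothetical `askModel` helper again):

```ts
declare function askModel(prompt: string): Promise<string>; // hypothetical helper

const text = "Customer feedback goes here.";
const [sentiment, summary, entities] = await Promise.all([
  askModel(`Sentiment of: ${text}`),
  askModel(`Summarize: ${text}`),
  askModel(`List the entities in: ${text}`),
]); // total latency is roughly the slowest call, not the sum
```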
Error Handling
Retry on Failure
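A self-contained retry wrapper with exponential backoff:

```ts
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, 2 ** i * 500)); // 500ms, 1s, 2s...
    }
  }
  throw lastErr; // all attempts failed
}
```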
Fallback Operation
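A sketch that degrades to a cheaper or local model when the primary fails (both helpers are hypothetical):

```ts
declare function askPrimary(prompt: string): Promise<string>;  // e.g. a hosted frontier model
declare function askFallback(prompt: string): Promise<string>; // e.g. a Workers AI model

async function robustAsk(prompt: string): Promise<string> {
  try {
    return await askPrimary(prompt);
  } catch {
    return askFallback(prompt); // serve a degraded answer instead of failing the request
  }
}
```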
Handle Rate Limits
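HTTP 429 responses usually carry a `Retry-After` header (in seconds); honor it before retrying:

```ts
async function fetchWithRateLimit(url: string, init: RequestInit): Promise<Response> {
  const res = await fetch(url, init);
  if (res.status === 429) {
    const waitSec = Number(res.headers.get("Retry-After") ?? "1");
    await new Promise((r) => setTimeout(r, waitSec * 1000));
    return fetch(url, init); // single retry after the requested pause
  }
  return res;
}
```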
Output Parsing
Think operations support schema-aware output mapping - when you define an output schema, the AI response is automatically mapped to your schema field names, making outputs intuitive to use in ensembles.
Schema-Aware Output (Recommended)
Define your output schema and access results using your field names:
- Schema defines `output: { greeting: string }` → the AI response maps to the `greeting` field
- If the AI returns valid JSON, all fields are spread to the top level
- Metadata (`model`, `provider`, `tokensUsed`) is available via `_meta`
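A hypothetical sketch of that shape (the `think` helper and its config syntax are assumptions, not the library's confirmed API):

```ts
declare function think(config: unknown): Promise<any>; // hypothetical helper

const result = await think({
  input: "Say hello to Ada",
  output: { greeting: "string" }, // schema field (syntax assumed)
});

console.log(result.greeting);         // response mapped onto your schema field
console.log(result._meta.tokensUsed); // metadata rides along under _meta
```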
Text Output (Simple)
For simple text responses without a schema:
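A hypothetical sketch (the `think` helper and the text field name are assumptions):

```ts
declare function think(config: unknown): Promise<any>; // hypothetical helper

const result = await think({ input: "Write a haiku about the edge" });
console.log(result.response);    // raw model text (field name assumed)
console.log(result._meta.model); // metadata is still attached
```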
JSON Output (Structured)
When the AI returns JSON, fields are automatically available at the top level:
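The spreading behavior can be pictured as a parse-then-merge step (illustrative, with a hypothetical `askModel` helper):

```ts
declare function askModel(prompt: string): Promise<string>; // hypothetical helper

const raw = await askModel('Reply in JSON with keys "sentiment" and "confidence".');

let output: Record<string, unknown>;
try {
  output = JSON.parse(raw); // valid JSON: every field usable at the top level
} catch {
  output = { response: raw }; // not JSON: fall back to plain text (field name assumed)
}
```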
Output Metadata
All think operations include metadata in the `_meta` field:
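Based on the fields named above, the metadata has roughly this shape (illustrative, not a contract):

```ts
interface ThinkMeta {
  model: string;      // e.g. "gpt-4o-mini"
  provider: string;   // e.g. "openai"
  tokensUsed: number; // total tokens consumed by the call
}
```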

