think operation using the workers-ai provider.
Overview
Workers AI provides serverless GPU inference for ML models:
- Free tier: 10,000 requests/day
- Latency: Runs at edge, near your users
- Provider: Use workers-ai in think operation
- Binding: Requires [ai] binding in wrangler.toml
- Text Embeddings (7 models)
- Image Classification (1 model)
- Object Detection (1 model)
- Image-to-Text (2 models)
- Vision Models (2 multimodal LLMs)
- Text Classification (2 models)
Configuration
wrangler.toml
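In wrangler.toml the binding is typically declared as an [ai] table with binding = "AI". The sketch below shows how that binding might then appear inside the Worker. The AiBinding and Env interfaces are local stand-ins (the real types ship with @cloudflare/workers-types), and requireAi is a hypothetical helper, not part of any library.

```typescript
// Local stand-in for the Workers AI binding type.
interface AiBinding {
  run(model: string, input: Record<string, unknown>): Promise<unknown>;
}

// Env as the Worker sees it when wrangler.toml contains an [ai] table
// with binding = "AI". The name "AI" is whatever the config chooses.
interface Env {
  AI?: AiBinding;
}

// Hypothetical guard: fail early with a readable message if the
// binding was not configured.
function requireAi(env: Env): AiBinding {
  if (!env.AI || typeof env.AI.run !== "function") {
    throw new Error(
      'Workers AI binding "AI" not found; check the [ai] section of wrangler.toml'
    );
  }
  return env.AI;
}
```

Calling requireAi(env) at the top of a handler turns a missing binding into one clear error instead of a later "undefined is not a function".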
Environment Variable
Set CONDUCTOR_AI_PROVIDER=workers-ai or configure per-agent.
Text Embeddings
Convert text into vector representations for semantic search, RAG, clustering, and similarity tasks.
Available Models
English Models (BGE):
- @cf/baai/bge-small-en-v1.5 - 384 dimensions, fastest
- @cf/baai/bge-base-en-v1.5 - 768 dimensions, balanced
- @cf/baai/bge-large-en-v1.5 - 1024 dimensions, most accurate
Multilingual and Language-Specific Models:
- @cf/baai/bge-m3 - 1024 dimensions, 100+ languages, multi-vector retrieval
- @cf/google/embeddinggemma-300m - From Gemma 3, 100+ languages
- @cf/pfnet/plamo-embedding-1b - Japanese text
- @cf/qwen/qwen3-embedding-0.6b - Chinese/multilingual
Generate Embeddings
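A minimal sketch of calling an embedding model directly through the Workers AI binding, assuming the BGE models accept { text: string[] } and return { shape, data }. The AiBinding interface is a local stand-in for the real binding type, and checkEmbeddings and embed are hypothetical helpers. The dimension table copies the values listed in this section.

```typescript
// Response shape assumed for the BGE embedding models.
interface EmbeddingResponse {
  shape: number[];  // [number of texts, dimensions]
  data: number[][]; // one vector per input text
}

// Local stand-in for the Workers AI binding.
interface AiBinding {
  run(model: string, input: { text: string[] }): Promise<EmbeddingResponse>;
}

// Dimensions as listed in this section.
const EMBEDDING_DIMS: Record<string, number> = {
  "@cf/baai/bge-small-en-v1.5": 384,
  "@cf/baai/bge-base-en-v1.5": 768,
  "@cf/baai/bge-large-en-v1.5": 1024,
  "@cf/baai/bge-m3": 1024,
};

// Hypothetical helper: verify vector count and dimensions before use.
function checkEmbeddings(
  model: string,
  res: EmbeddingResponse,
  expectedCount: number
): number[][] {
  if (res.data.length !== expectedCount) {
    throw new Error(`expected ${expectedCount} vectors, got ${res.data.length}`);
  }
  const dims = EMBEDDING_DIMS[model];
  for (const vector of res.data) {
    if (dims !== undefined && vector.length !== dims) {
      throw new Error(`expected ${dims} dimensions, got ${vector.length}`);
    }
  }
  return res.data;
}

// Embed a batch of texts in one call and validate the result.
async function embed(ai: AiBinding, model: string, texts: string[]): Promise<number[][]> {
  const res = await ai.run(model, { text: texts });
  return checkEmbeddings(model, res, texts.length);
}
```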
Store in Vectorize
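A sketch of preparing embeddings for a Vectorize upsert, assuming records of the form { id, values, metadata }. The Doc type, the VectorIndex interface, and both helper names are illustrative assumptions; only the upsert call mirrors the Vectorize binding.

```typescript
// Record shape assumed for a Vectorize upsert.
interface VectorRecord {
  id: string;
  values: number[];
  metadata?: Record<string, string | number | boolean>;
}

// Hypothetical document type for this example.
interface Doc {
  id: string;
  text: string;
}

// Pair each document with its embedding, keeping the text as metadata.
function toVectorRecords(docs: Doc[], vectors: number[][]): VectorRecord[] {
  if (docs.length !== vectors.length) {
    throw new Error("docs and vectors must align one-to-one");
  }
  return docs.map((doc, i) => ({
    id: doc.id,
    values: vectors[i],
    metadata: { text: doc.text },
  }));
}

// Minimal stand-in for a Vectorize index binding.
interface VectorIndex {
  upsert(records: VectorRecord[]): Promise<unknown>;
}

// Store the records and report how many were written.
async function storeEmbeddings(
  index: VectorIndex,
  docs: Doc[],
  vectors: number[][]
): Promise<number> {
  const records = toVectorRecords(docs, vectors);
  await index.upsert(records);
  return records.length;
}
```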
Semantic Search
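A vector index such as Vectorize performs nearest-neighbor ranking on the server side. To make the idea concrete, here is an in-memory sketch of the same ranking using cosine similarity. It is an illustration of the logic only, not how Vectorize is implemented, and the Stored and Match types are assumptions for this example.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimensions");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: define similarity as 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Stored { id: string; values: number[]; }
interface Match { id: string; score: number; }

// Rank stored vectors against the query embedding and keep the best k.
function topK(query: number[], stored: Stored[], k: number): Match[] {
  return stored
    .map((s) => ({ id: s.id, score: cosineSimilarity(query, s.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In practice the query text is embedded with the same model used at indexing time, since vectors from different models are not comparable.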
Choosing an Embedding Model
Use bge-small-en-v1.5 when:
- Speed is critical
- Low latency required
- English-only content
- Cost-sensitive (fewer dimensions = cheaper storage)
Use bge-base-en-v1.5 when:
- Balanced performance needed
- General-purpose embeddings
- English content with some multilingual
Use bge-large-en-v1.5 when:
- Maximum accuracy required
- Complex semantic understanding
- Willing to trade speed for quality
Use bge-m3 when:
- Multilingual content (100+ languages)
- Need multi-vector retrieval
- Cross-language search
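The selection guidance above can be sketched as a small function. The option names (multilingual, priority) are assumptions made for this example, and the mapping simply mirrors the criteria listed in this section.

```typescript
type Priority = "speed" | "balanced" | "accuracy";

interface EmbeddingNeeds {
  multilingual: boolean; // content spans multiple languages
  priority: Priority;    // what matters most for English-only content
}

// Map requirements to one of the embedding models listed in this section.
function chooseEmbeddingModel(needs: EmbeddingNeeds): string {
  if (needs.multilingual) return "@cf/baai/bge-m3";
  switch (needs.priority) {
    case "speed":
      return "@cf/baai/bge-small-en-v1.5";
    case "accuracy":
      return "@cf/baai/bge-large-en-v1.5";
    default:
      return "@cf/baai/bge-base-en-v1.5";
  }
}
```

Switching models later means re-embedding stored content, because dimensions and vector spaces differ between models.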
Image Classification
Classify images into categories using ResNet-50.
Model
- @cf/microsoft/resnet-50 - 1000 ImageNet classes
Classify Image
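A sketch of classifying an image through the Workers AI binding, assuming the model accepts { image: number[] } (raw bytes as an integer array) and returns a list of { label, score } predictions. ClassifierBinding is a local stand-in type, and topPrediction and its 0.5 default threshold are illustrative choices.

```typescript
interface Prediction {
  label: string;
  score: number;
}

// Local stand-in for the Workers AI binding, narrowed to classification.
interface ClassifierBinding {
  run(model: string, input: { image: number[] }): Promise<Prediction[]>;
}

// Return the highest-scoring prediction, or null if nothing clears the threshold.
function topPrediction(preds: Prediction[], minScore = 0.5): Prediction | null {
  let best: Prediction | null = null;
  for (const p of preds) {
    if (best === null || p.score > best.score) best = p;
  }
  return best !== null && best.score >= minScore ? best : null;
}

// Classify raw image bytes and keep only a confident top label.
async function classifyImage(
  ai: ClassifierBinding,
  bytes: Uint8Array,
  minScore = 0.5
): Promise<Prediction | null> {
  const preds = await ai.run("@cf/microsoft/resnet-50", { image: Array.from(bytes) });
  return topPrediction(preds, minScore);
}
```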
Use Cases
Content Moderation:
Object Detection
Detect objects in images with bounding boxes and class labels.
Model
- @cf/facebook/detr-resnet-50 - Detection Transformer
Detect Objects
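A sketch of post-processing detection results, assuming each detection has the shape { label, score, box: { xmin, ymin, xmax, ymax } }. The countObjects helper and its 0.7 default threshold are assumptions chosen for illustration.

```typescript
interface Detection {
  label: string;
  score: number;
  box: { xmin: number; ymin: number; xmax: number; ymax: number };
}

// Count confident detections per label, ignoring low-score boxes.
function countObjects(detections: Detection[], minScore = 0.7): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const d of detections) {
    if (d.score < minScore) continue;
    counts[d.label] = (counts[d.label] || 0) + 1;
  }
  return counts;
}
```

The threshold matters: detection models tend to emit many low-confidence boxes, so counting without a cutoff usually overstates what is in the image.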
Use Cases
Count Objects:
Image-to-Text
Generate text descriptions or answers from images.
Models
- @cf/llava-hf/llava-1.5-7b-hf - Vision Q&A and captioning
- @cf/unum/uform-gen2-qwen-500m - Lightweight image-to-text
Generate Caption
Image Q&A
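Captioning and image Q&A can share one request shape: the same image with a different prompt. The sketch below assumes the image-to-text models accept { image, prompt, max_tokens } and return { description }. The binding interface is a local stand-in, and the default prompt and token limit are illustrative values.

```typescript
interface ImageToTextInput {
  image: number[]; // raw image bytes as integers 0-255
  prompt: string;
  max_tokens: number;
}

interface ImageToTextOutput {
  description: string;
}

// Local stand-in for the Workers AI binding, narrowed to image-to-text.
interface ImageToTextBinding {
  run(model: string, input: ImageToTextInput): Promise<ImageToTextOutput>;
}

// Build a request: a question turns captioning into Q&A.
function buildImageToTextInput(
  bytes: Uint8Array,
  question?: string,
  maxTokens = 256
): ImageToTextInput {
  const trimmed = question ? question.trim() : "";
  const prompt = trimmed.length > 0 ? trimmed : "Describe this image in one sentence.";
  return { image: Array.from(bytes), prompt, max_tokens: maxTokens };
}

// Caption an image, or answer a question about it when one is given.
async function describeImage(
  ai: ImageToTextBinding,
  bytes: Uint8Array,
  question?: string
): Promise<string> {
  const out = await ai.run("@cf/llava-hf/llava-1.5-7b-hf", buildImageToTextInput(bytes, question));
  return out.description;
}
```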
Use Cases
Accessibility:
Vision Models (Multimodal LLMs)
Advanced vision understanding using multimodal language models.
Models
- @cf/meta/llama-3.2-11b-vision-instruct - Llama with vision
- @cf/google/gemma-3-12b-it - Gemma with image support
Visual Reasoning
Document OCR
Visual Q&A with Context
Use Cases
Invoice Processing:
Text Classification & Reranking
Classify text or rerank search results for better relevance.
Models
Reranking:
- @cf/baai/bge-reranker-base - Semantic similarity scoring
Classification:
- @cf/huggingface/distilbert-sst-2-int8 - Positive/negative classification
Rerank Search Results
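A sketch of building a rerank request and applying the result, assuming the reranker accepts { query, contexts: [{ text }] } and returns { response: [{ id, score }] } where id is the index of the context in the request. Both helper names are hypothetical.

```typescript
interface RerankInput {
  query: string;
  contexts: { text: string }[];
}

interface RerankResponse {
  response: { id: number; score: number }[];
}

interface RankedDoc {
  text: string;
  score: number;
}

// Wrap candidate documents in the request shape assumed above.
function buildRerankInput(query: string, docs: string[]): RerankInput {
  return { query, contexts: docs.map((text) => ({ text })) };
}

// Reorder the original documents by reranker score and keep the best topK.
function applyRerank(docs: string[], res: RerankResponse, topK: number): RankedDoc[] {
  return res.response
    .filter((r) => r.id >= 0 && r.id < docs.length) // drop out-of-range ids
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((r) => ({ text: docs[r.id], score: r.score }));
}
```

A common pattern is to retrieve a generous candidate set from the vector index (say 20 to 50 results) and rerank down to the handful actually shown, since the reranker scores query-document pairs more precisely than vector distance.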
Sentiment Analysis
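A sketch of reading a sentiment result, assuming the classifier returns a list of { label, score } entries such as POSITIVE and NEGATIVE. The sentimentOf helper is hypothetical and simply picks the higher-scoring label.

```typescript
interface LabelScore {
  label: string;
  score: number;
}

// Pick the label with the highest score from a classification result.
function sentimentOf(scores: LabelScore[]): LabelScore {
  if (scores.length === 0) throw new Error("empty classification result");
  return scores.reduce((best, s) => (s.score > best.score ? s : best));
}
```

For borderline inputs, where the two scores are close, treating the result as neutral may be more useful than forcing a positive or negative label.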
Complete Examples
Semantic Search with Reranking
Image Upload Pipeline
Visual Document Processing
Best Practices
Model Selection
Embeddings:
- English-only → bge-base-en-v1.5
- Multilingual → bge-m3
- Speed critical → bge-small-en-v1.5
- Max accuracy → bge-large-en-v1.5
Vision:
- Simple classification → resnet-50
- Object detection → detr-resnet-50
- Image Q&A → llava-1.5-7b-hf
- Complex reasoning → llama-3.2-vision or gemma-3
Caching
Workers AI responses can be cached:
Error Handling
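One way to combine response caching with basic error handling is a small wrapper around the model call. This is a sketch under stated assumptions: the cache is a plain in-memory Map (which lives only as long as the Worker isolate), the AiBinding interface is a local stand-in, and cacheKey, stableStringify, and runCached are hypothetical helpers.

```typescript
// Serialize a value with object keys sorted, so equal inputs give equal keys.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k]));
    return "{" + parts.join(",") + "}";
  }
  return String(JSON.stringify(value));
}

// Cache key derived from the model name and its input.
function cacheKey(model: string, input: Record<string, unknown>): string {
  return model + ":" + stableStringify(input);
}

// Local stand-in for the Workers AI binding.
interface AiBinding {
  run(model: string, input: Record<string, unknown>): Promise<unknown>;
}

type RunResult =
  | { ok: true; value: unknown; cached: boolean }
  | { ok: false; error: string };

// Return a cached response when available; otherwise call the model and
// convert failures into a result object instead of throwing.
async function runCached(
  ai: AiBinding,
  cache: Map<string, unknown>,
  model: string,
  input: Record<string, unknown>
): Promise<RunResult> {
  const key = cacheKey(model, input);
  if (cache.has(key)) return { ok: true, value: cache.get(key), cached: true };
  try {
    const value = await ai.run(model, input);
    cache.set(key, value);
    return { ok: true, value, cached: false };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Only successful responses are cached here, so a transient failure does not get replayed on the next request.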
Performance Tips
- Batch requests when possible
- Use smaller models for simple tasks
- Cache embeddings for repeated queries
- Parallelize independent operations
- Choose appropriate dimensions (smaller = faster + cheaper storage)
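Batching and parallelizing, as suggested in the tips above, can be sketched with a chunking helper and Promise.all. The embedBatch callback is a placeholder for whatever function sends one batch to the model, and the batch size is left to the caller because limits vary by model.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed all texts by sending the batches in parallel and
// concatenating the results in the original order.
async function embedInBatches(
  texts: string[],
  batchSize: number,
  embedBatch: (batch: string[]) => Promise<number[][]>
): Promise<number[][]> {
  const results = await Promise.all(chunk(texts, batchSize).map((b) => embedBatch(b)));
  return ([] as number[][]).concat(...results);
}
```

Promise.all preserves input order, so the returned vectors line up with the input texts even when batches finish out of order.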
Limitations
Free Tier:
- 10,000 requests/day
- Rate limits apply
Images:
- Max size varies by model
- Supported formats: JPEG, PNG, WebP
- Must be accessible URLs or base64
Models:
- Some models may be in beta
- Check Cloudflare Workers AI docs for latest
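Since only JPEG, PNG, and WebP are listed as supported, it can help to check the format before sending an image to a model. A minimal sketch that inspects the file's leading signature bytes follows; sniffImageFormat is a hypothetical helper, and it checks only the header, not whether the rest of the file is valid.

```typescript
type ImageFormat = "jpeg" | "png" | "webp";

// Identify an image by its magic bytes; returns null for anything else.
function sniffImageFormat(bytes: Uint8Array): ImageFormat | null {
  const startsWith = (sig: number[], offset = 0): boolean =>
    sig.every((b, i) => bytes[offset + i] === b);

  // JPEG: FF D8 FF
  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
  // PNG: 89 'P' 'N' 'G' CR LF 1A LF
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  // WebP: 'RIFF' <4-byte size> 'WEBP'
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return "webp";
  }
  return null;
}
```

Rejecting unsupported uploads up front avoids spending a request on input the model cannot read.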

