Built-in: framework-level agent. You can configure it but cannot modify its source.
Overview
The RAG agent provides manual Retrieval-Augmented Generation using Cloudflare AI for embeddings and Cloudflare Vectorize for vector storage. This is a low-level agent for building custom RAG pipelines where you control indexing and search operations.
For fully managed RAG with automatic document processing, see AutoRAG in the Starter Kit.
Key Features:
- Manual control over indexing and search operations
- Cloudflare AI embeddings using @cf/baai/bge-base-en-v1.5 (default)
- Automatic chunking with semantic, fixed, or recursive strategies
- Optional reranking with cross-encoder models for better relevance
- Multi-tenant isolation via namespaces
- Batch processing for efficient large-scale indexing (100 texts per embedding batch, 1000 vectors per upsert)
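The batch limits above can be pictured as a simple splitting step. This is an illustrative sketch of the batching behavior, not the agent's internal code:

```typescript
// Split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// 250 chunks → three embedding calls (100 + 100 + 50),
// and a single upsert call, since 250 ≤ 1000 vectors.
const chunks = Array.from({ length: 250 }, (_, i) => `chunk-${i}`)
const embeddingBatches = toBatches(chunks, 100)
```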
Required Bindings
Add these to your wrangler.toml:
# Required: Cloudflare AI binding
[ai]
binding = "AI"
# Required: Vectorize index binding
[[vectorize]]
binding = "VECTORIZE"
index_name = "my-rag-index"
Create the Vectorize index:
wrangler vectorize create my-rag-index --dimensions 768 --metric cosine
The default embedding model @cf/baai/bge-base-en-v1.5 produces 768-dimensional vectors. Match your index dimensions accordingly.
Operations
The RAG agent supports two primary operations:
1. Index Operation
Index documents into the vector store with automatic chunking and embedding.
agents:
- name: index-doc
agent: rag
config:
operation: index
embeddingModel: "@cf/baai/bge-base-en-v1.5"
chunkStrategy: semantic
chunkSize: 512
overlap: 50
input:
content: ${input.document}
id: ${input.documentId}
metadata:
source: ${input.source}
timestamp: ${input.timestamp}
Configuration:
config:
operation: index
embeddingModel: string # Model (default: @cf/baai/bge-base-en-v1.5)
chunkStrategy: string # "semantic" | "fixed" | "recursive" (default: semantic)
chunkSize: number # Target chunk size in characters (default: 512)
overlap: number # Overlap between chunks (default: 50)
namespace: string # Optional namespace for multi-tenant isolation
input:
content: string # Text to index (required)
id: string # Document ID (required)
source: string # Optional source identifier
metadata: object # Additional metadata to store
Output:
{
indexed: number; // Number of documents indexed
chunks: number; // Number of chunks created
embeddingModel: string; // Model used for embeddings
chunkStrategy: string; // Strategy used for chunking
}
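The fixed strategy's chunkSize/overlap interaction can be sketched as follows. This is a hypothetical helper to illustrate the parameters; the agent's actual chunker is internal:

```typescript
// Produce fixed-size character chunks where each chunk repeats the last
// `overlap` characters of the previous one, so text cut at a chunk
// boundary still appears whole in at least one chunk.
function fixedChunks(text: string, chunkSize = 512, overlap = 50): string[] {
  const chunks: string[] = []
  const step = chunkSize - overlap // advance by chunkSize minus the overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize))
    if (start + chunkSize >= text.length) break // last chunk reached the end
  }
  return chunks
}
```

With the defaults, a 1000-character document yields three chunks, each sharing 50 characters with its neighbor.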
2. Search Operation
Search the vector store using semantic similarity.
agents:
- name: search
agent: rag
config:
operation: search
topK: 5
rerank: true
rerankModel: "@cf/baai/bge-reranker-base"
input:
query: ${input.question}
Configuration:
config:
operation: search
embeddingModel: string # Model (default: @cf/baai/bge-base-en-v1.5)
topK: number # Number of results (default: 5)
rerank: boolean # Enable cross-encoder reranking (default: false)
rerankModel: string # Reranker model (default: @cf/baai/bge-reranker-base)
namespace: string # Optional namespace filter
input:
query: string # Search query (required)
filter: object # Metadata filters
topK: number # Override default topK
rerank: boolean # Override default rerank setting
Output:
{
results: Array<{
id: string; // Vector ID
score: number; // Similarity score (0-1)
content: string; // Original content chunk
metadata: object; // Stored metadata
}>;
count: number; // Total number of results
reranked: boolean; // Whether results were reranked
}
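The score field reflects the index's cosine metric. Conceptually it is the cosine similarity between the query embedding and the stored chunk embedding (computed server-side by Vectorize; this sketch is illustrative only):

```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
// For typical text embeddings the value lands in roughly the 0–1 range,
// with higher meaning more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    magA += a[i] * a[i]
    magB += b[i] * b[i]
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB))
}
```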
Complete RAG Pipeline
Build a complete question-answering system with manual RAG:
ensemble: rag-qa
agents:
# 1. Search for relevant docs
- name: search
agent: rag
config:
operation: search
topK: 5
rerank: true
input:
query: ${input.question}
filter:
source: documentation
# 2. Generate answer using context
- name: answer
operation: think
config:
provider: anthropic
model: claude-sonnet-4
prompt: |
Answer this question using ONLY the following context.
If the answer isn't in the context, say "I don't know."
Context:
${search.output.results.map(r => r.content).join('\n\n')}
Question: ${input.question}
# 3. Format response with sources
- name: format-response
operation: code
config:
script: scripts/format-response
input:
answer: ${answer.output}
results: ${search.output.results}
output:
response: ${format-response.output}
// scripts/format-response.ts
import type { AgentExecutionContext } from '@ensemble-edge/conductor'
export default function formatResponse(context: AgentExecutionContext) {
const { answer, results } = context.input
return {
answer: answer,
sources: results.map((r: any) => ({
text: r.content.substring(0, 200),
metadata: r.metadata,
score: r.score
}))
}
}
Advanced Patterns
Hybrid Search
Combine vector search with keyword search for better results:
agents:
# Vector search
- name: vector-search
agent: rag
config:
operation: search
topK: 10
input:
query: ${input.question}
# Keyword search (D1)
- name: keyword-search
operation: data
config:
backend: d1
binding: DB
operation: query
sql: |
SELECT * FROM documents
WHERE content LIKE ?
LIMIT 10
params: ['%${input.question}%']
# Combine and rerank results
- name: combine
operation: code
config:
script: scripts/combine-results
input:
vectorResults: ${vector-search.output.results}
keywordResults: ${keyword-search.output}
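One common way to merge the two lists is Reciprocal Rank Fusion. The referenced scripts/combine-results is not shown above, so this is an assumed sketch of what it might do:

```typescript
interface Ranked { id: string; content: string }

// Reciprocal Rank Fusion: score each document by the sum of 1/(k + rank)
// across all lists, so items ranked highly by either search rise to the top.
// k = 60 is the conventional smoothing constant.
function rrfMerge(lists: Ranked[][], k = 60): Ranked[] {
  const scores = new Map<string, { item: Ranked; score: number }>()
  for (const list of lists) {
    list.forEach((item, rank) => {
      const entry = scores.get(item.id) ?? { item, score: 0 }
      entry.score += 1 / (k + rank + 1)
      scores.set(item.id, entry)
    })
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map(e => e.item)
}
```

Inside the script, you would call rrfMerge([vectorResults, keywordResults]) on the two inputs and return the top slice.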
Multi-Query RAG
Generate multiple search queries for better coverage:
agents:
# Generate multiple search queries
- name: generate-queries
operation: think
config:
provider: anthropic
model: claude-sonnet-4
prompt: |
Generate 3 different search queries for: ${input.question}
Return as JSON array.
# Search with each query
- name: search-1
agent: rag
config:
operation: search
topK: 5
input:
query: ${generate-queries.output[0]}
- name: search-2
agent: rag
config:
operation: search
topK: 5
input:
query: ${generate-queries.output[1]}
- name: search-3
agent: rag
config:
operation: search
topK: 5
input:
query: ${generate-queries.output[2]}
# Deduplicate and combine
- name: combine
operation: code
config:
script: scripts/combine-search-results
input:
results1: ${search-1.output.results}
results2: ${search-2.output.results}
results3: ${search-3.output.results}
// scripts/combine-search-results.ts
import type { AgentExecutionContext } from '@ensemble-edge/conductor'
export default function combineSearchResults(context: AgentExecutionContext) {
const { results1, results2, results3 } = context.input
const all = [...results1, ...results2, ...results3]
// Deduplicate by ID
const unique = [...new Map(all.map((r: any) => [r.id, r])).values()]
return { results: unique.slice(0, 10) }
}
Metadata Filtering
Use metadata filters to narrow search scope:
agents:
- name: search
agent: rag
config:
operation: search
topK: 5
namespace: production
input:
query: ${input.question}
filter:
category: ${input.category}
published_after: ${input.date}
author: ${input.author}
Incremental Indexing
Index documents in batches with custom chunking:
agents:
- name: chunk
operation: code
config:
script: scripts/chunk-document
input:
document: ${input.document}
chunkSize: 1000
- name: index-chunks
agent: rag
config:
operation: index
namespace: ${input.tenant_id}
input:
content: ${chunk.output.chunks.map(c => c.text).join('\n')}
id: ${input.document_id}
metadata:
document_id: ${input.document_id}
chunk_count: ${chunk.output.chunks.length}
// scripts/chunk-document.ts
import type { AgentExecutionContext } from '@ensemble-edge/conductor'
export default function chunkDocument(context: AgentExecutionContext) {
const { document, chunkSize } = context.input
const chunks = []
for (let i = 0; i < document.length; i += chunkSize) {
chunks.push({
text: document.substring(i, i + chunkSize),
index: i / chunkSize
})
}
return { chunks }
}
Best Practices
1. Chunk Documents Intelligently
# Good: Semantic chunks with overlap
chunkStrategy: semantic
chunkSize: 512
overlap: 50
# Avoid: Large chunks without overlap
chunkStrategy: fixed
chunkSize: 5000
overlap: 0
2. Add Rich Metadata
metadata:
source: url or file path
title: document title
author: content creator
published: ISO date string
category: classification
tags: [tag1, tag2]
3. Use Namespaces for Isolation
# Separate by tenant, version, or document type
namespace: tenant-${input.tenant_id}
namespace: docs-v2
namespace: support-tickets
4. Enable Reranking for Quality
config:
topK: 10 # Get more initial results
rerank: true # Re-score with cross-encoder
rerankModel: "@cf/baai/bge-reranker-base"
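Conceptually, reranking re-scores the initial topK candidates with the cross-encoder and reorders them before returning the final list. A sketch, assuming the reranker yields one relevance score per (query, chunk) pair in result order:

```typescript
interface SearchResult { id: string; content: string; score: number }

// Replace each vector-similarity score with the cross-encoder relevance
// score for the same position, then re-sort descending.
function applyRerank(results: SearchResult[], rerankScores: number[]): SearchResult[] {
  return results
    .map((r, i) => ({ ...r, score: rerankScores[i] }))
    .sort((a, b) => b.score - a.score)
}
```

Fetching more initial results (topK: 10) gives the reranker a wider pool to promote from, which is why the two settings are paired.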
5. Cache Search Results
agents:
- name: search
agent: rag
cache:
ttl: 3600
key: search-${hash(input.query)}
Common Use Cases
Documentation Q&A
ensemble: docs-qa
agents:
- name: search-docs
agent: rag
config:
operation: search
namespace: documentation
topK: 3
rerank: true
input:
query: ${input.question}
- name: answer
operation: think
config:
provider: anthropic
model: claude-sonnet-4
prompt: |
Answer using these docs:
${search-docs.output.results.map(r => r.content).join('\n\n')}
Question: ${input.question}
Customer Support Assistant
ensemble: support-assistant
agents:
- name: search-tickets
agent: rag
config:
operation: search
topK: 5
namespace: support-tickets
input:
query: ${input.issue}
filter:
status: resolved
sentiment: positive
- name: suggest-solution
operation: think
config:
provider: anthropic
model: claude-sonnet-4
prompt: |
Suggest a solution based on these similar resolved tickets:
${search-tickets.output.results.map(r => r.content).join('\n\n')}
Current issue: ${input.issue}
Content Recommendations
ensemble: recommend-articles
agents:
- name: search-similar
agent: rag
config:
operation: search
topK: 5
input:
query: ${input.current_article}
filter:
category: ${input.category}
- name: format
operation: code
config:
script: scripts/format-recommendations
input:
results: ${search-similar.output.results}
// scripts/format-recommendations.ts
import type { AgentExecutionContext } from '@ensemble-edge/conductor'
export default function formatRecommendations(context: AgentExecutionContext) {
const { results } = context.input
return {
recommendations: results.map((r: any) => ({
title: r.metadata.title,
url: r.metadata.url,
relevance: r.score
}))
}
}
Performance Optimization
Limit topK for Speed
topK: 5 # Sufficient for most use cases
Use Metadata Filters
# Reduce search space dramatically
filter:
category: ${input.category}
published_after: "2024-01-01"
Partition with Namespaces
# Smaller search space = faster queries
namespace: ${input.tenant_id}
Cache Aggressively
cache:
ttl: 3600
key: search-${hash(input.query)}-${input.namespace}
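Here hash() is assumed to be a framework-provided helper. If you need to build an equivalent key yourself, a small non-cryptographic hash such as FNV-1a gives a stable, compact key (illustrative sketch):

```typescript
// FNV-1a 32-bit hash: deterministic and fast; fine for cache-key
// bucketing, not for security purposes.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16)
}

// e.g. cache key: `search-${fnv1a(query)}-${namespace}`
```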
Limitations
- Max document size: 8000 tokens per chunk
- Max topK: 100 results
- Metadata size: 10KB per document
- Namespace limit: 1000 per account
- Embedding model: Currently limited to Cloudflare AI models
Manual RAG vs AutoRAG
| Feature | Manual RAG (This Agent) | AutoRAG (Starter Kit) |
|---|---|---|
| Control | Full control over indexing & search | Fully managed |
| Document Processing | Manual chunking & indexing | Automatic file processing |
| Use Case | Custom pipelines | Quick deployment |
| Configuration | Low-level operations | High-level automation |
| Best For | Advanced users | Getting started |
For most use cases, start with AutoRAG and migrate to manual RAG when you need fine-grained control.
Next Steps