Built-in: a framework-level agent. You can configure it, but you cannot modify its source.

Overview

The RAG agent provides manual Retrieval-Augmented Generation using Cloudflare AI for embeddings and Cloudflare Vectorize for vector storage. This is a low-level agent for building custom RAG pipelines where you control indexing and search operations.
For fully managed RAG with automatic document processing, see AutoRAG in the Starter Kit.
Key Features:
  • Manual control over indexing and search operations
  • Cloudflare AI embeddings using @cf/baai/bge-base-en-v1.5 (default)
  • Automatic chunking with semantic, fixed, or recursive strategies
  • Optional reranking with cross-encoder models for better relevance
  • Multi-tenant isolation via namespaces
  • Batch processing for efficient large-scale indexing (100 texts per embedding batch, 1000 vectors per upsert)
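The batch limits above can be illustrated with a small helper (a hypothetical `batch` utility for this sketch; the agent applies equivalent batching internally):

```typescript
// Batch limits used during indexing; the RAG agent batches work like this
// internally, so you rarely need to do it yourself.
const EMBED_BATCH = 100    // texts per embedding call
const UPSERT_BATCH = 1000  // vectors per Vectorize upsert

// Split an array into consecutive slices of at most `size` items.
function batch<T>(items: T[], size: number): T[][] {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}
```

For example, indexing 250 chunks produces three embedding batches of 100, 100, and 50 texts.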

Required Bindings

Add these to your wrangler.toml:
# Required: Cloudflare AI binding
[ai]
binding = "AI"

# Required: Vectorize index binding
[[vectorize]]
binding = "VECTORIZE"
index_name = "my-rag-index"
Create the Vectorize index:
wrangler vectorize create my-rag-index --dimensions 768 --metric cosine
The default embedding model @cf/baai/bge-base-en-v1.5 produces 768-dimensional vectors. Match your index dimensions accordingly.

Operations

The RAG agent supports two primary operations:

1. Index Operation

Index documents into the vector store with automatic chunking and embedding.
agents:
  - name: index-doc
    agent: rag
    config:
      operation: index
      embeddingModel: "@cf/baai/bge-base-en-v1.5"
      chunkStrategy: semantic
      chunkSize: 512
      overlap: 50
    input:
      content: ${input.document}
      id: ${input.documentId}
      metadata:
        source: ${input.source}
        timestamp: ${input.timestamp}
Configuration:
config:
  operation: index
  embeddingModel: string    # Model (default: @cf/baai/bge-base-en-v1.5)
  chunkStrategy: string     # "semantic" | "fixed" | "recursive" (default: semantic)
  chunkSize: number         # Target chunk size in characters (default: 512)
  overlap: number           # Overlap between chunks (default: 50)
  namespace: string         # Optional namespace for multi-tenant isolation

input:
  content: string           # Text to index (required)
  id: string               # Document ID (required)
  source: string           # Optional source identifier
  metadata: object         # Additional metadata to store
Output:
{
  indexed: number;           // Number of documents indexed
  chunks: number;            // Number of chunks created
  embeddingModel: string;    // Model used for embeddings
  chunkStrategy: string;     // Strategy used for chunking
}
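As a rough sanity check on the `chunks` count, fixed-size chunking with overlap advances by `chunkSize - overlap` characters per chunk. The semantic strategy will vary with content, but for the fixed strategy you can estimate the count (a sketch, not part of the agent's API):

```typescript
// Estimate chunk count for fixed-size chunking with overlap.
// Each chunk after the first advances by (chunkSize - overlap) characters.
function estimateChunks(textLength: number, chunkSize: number, overlap: number): number {
  if (textLength <= chunkSize) return 1
  const stride = chunkSize - overlap
  return Math.ceil((textLength - overlap) / stride)
}
```

With the defaults (`chunkSize: 512`, `overlap: 50`), a 2,000-character document yields 5 chunks.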

2. Search Operation

Search the vector store using semantic similarity.
agents:
  - name: search
    agent: rag
    config:
      operation: search
      topK: 5
      rerank: true
      rerankModel: "@cf/baai/bge-reranker-base"
    input:
      query: ${input.question}
Configuration:
config:
  operation: search
  embeddingModel: string    # Model (default: @cf/baai/bge-base-en-v1.5)
  topK: number             # Number of results (default: 5)
  rerank: boolean          # Enable cross-encoder reranking (default: false)
  rerankModel: string      # Reranker model (default: @cf/baai/bge-reranker-base)
  namespace: string        # Optional namespace filter

input:
  query: string            # Search query (required)
  filter: object           # Metadata filters
  topK: number            # Override default topK
  rerank: boolean         # Override default rerank setting
Output:
{
  results: Array<{
    id: string;              // Vector ID
    score: number;           // Similarity score (0-1)
    content: string;         // Original content chunk
    metadata: object;        // Stored metadata
  }>;
  count: number;             // Total number of results
  reranked: boolean;         // Whether results were reranked
}
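Scores are similarities in the 0-1 range, so in practice you may want to drop weak matches before passing results to an LLM. A minimal post-processing sketch (the 0.7 threshold is an assumption to tune per corpus; this is not a built-in option of the search operation):

```typescript
// Shape of each entry in search output `results` (see Output above).
interface SearchResult {
  id: string
  score: number
  content: string
  metadata: Record<string, unknown>
}

// Keep only results above a similarity threshold before prompting an LLM.
function filterByScore(results: SearchResult[], minScore = 0.7): SearchResult[] {
  return results.filter(r => r.score >= minScore)
}
```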

Complete RAG Pipeline

Build a complete question-answering system with manual RAG:
ensemble: rag-qa

agents:
  # 1. Search for relevant docs
  - name: search
    agent: rag
    config:
      operation: search
      topK: 5
      rerank: true
      filter:
        source: documentation
    input:
      query: ${input.question}

  # 2. Generate answer using context
  - name: answer
    operation: think
    config:
      provider: anthropic
      model: claude-sonnet-4
      prompt: |
        Answer this question using ONLY the following context.
        If the answer isn't in the context, say "I don't know."

        Context:
        ${search.output.results.map(r => r.content).join('\n\n')}

        Question: ${input.question}

  # 3. Format response with sources
  - name: format-response
    operation: code
    config:
      script: scripts/format-response
    input:
      answer: ${answer.output}
      results: ${search.output.results}

output:
  response: ${format-response.output}
// scripts/format-response.ts
import type { AgentExecutionContext } from '@ensemble-edge/conductor'

export default function formatResponse(context: AgentExecutionContext) {
  const { answer, results } = context.input
  return {
    answer: answer,
    sources: results.map((r: any) => ({
      text: r.content.substring(0, 200),
      metadata: r.metadata,
      score: r.score
    }))
  }
}

Advanced Patterns

Hybrid Search

Combine vector search with keyword search for better results:
agents:
  # Vector search
  - name: vector-search
    agent: rag
    config:
      operation: search
      topK: 10
    input:
      query: ${input.question}

  # Keyword search (D1)
  - name: keyword-search
    operation: data
    config:
      backend: d1
      binding: DB
      operation: query
      sql: |
        SELECT * FROM documents
        WHERE content LIKE ?
        LIMIT 10
      params: ['%${input.question}%']

  # Combine and rerank results
  - name: combine
    operation: code
    config:
      script: scripts/combine-results
    input:
      vectorResults: ${vector-search.output.results}
      keywordResults: ${keyword-search.output}
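The `scripts/combine-results` script is not shown above; one possible implementation merges the two ranked lists with reciprocal rank fusion (RRF). This sketch assumes the D1 rows carry an `id` column matching the vector IDs, and replaces the `AgentExecutionContext` import with a minimal local type so it stays self-contained:

```typescript
// scripts/combine-results.ts (sketch) - merge vector and keyword results.
// Minimal local stand-in for AgentExecutionContext.
type Ctx = {
  input: {
    vectorResults: { id: string }[]
    keywordResults: { id: string }[]
  }
}

// Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank(d)).
// k = 60 is the conventional constant from the original RRF formulation.
function rrf(lists: { id: string }[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>()
  for (const list of lists) {
    list.forEach((item, rank) => {
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + rank + 1))
    })
  }
  return scores
}

export default function combineResults(context: Ctx) {
  const { vectorResults, keywordResults } = context.input
  const scores = rrf([vectorResults, keywordResults])
  // Keep the first object seen per ID (vector results take precedence).
  const byId = new Map<string, { id: string }>()
  for (const r of [...vectorResults, ...keywordResults]) {
    if (!byId.has(r.id)) byId.set(r.id, r)
  }
  const ranked = [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => byId.get(id)!)
  return { results: ranked.slice(0, 10) }
}
```

Documents that appear in both lists rise to the top, which is why RRF is a common default for hybrid search.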

Multi-Query RAG

Generate multiple search queries for better coverage:
agents:
  # Generate multiple search queries
  - name: generate-queries
    operation: think
    config:
      provider: anthropic
      model: claude-sonnet-4
      prompt: |
        Generate 3 different search queries for: ${input.question}
        Return as JSON array.

  # Search with each query
  - name: search-1
    agent: rag
    config:
      operation: search
      topK: 5
    input:
      query: ${generate-queries.output[0]}

  - name: search-2
    agent: rag
    config:
      operation: search
      topK: 5
    input:
      query: ${generate-queries.output[1]}

  - name: search-3
    agent: rag
    config:
      operation: search
      topK: 5
    input:
      query: ${generate-queries.output[2]}

  # Deduplicate and combine
  - name: combine
    operation: code
    config:
      script: scripts/combine-search-results
    input:
      results1: ${search-1.output.results}
      results2: ${search-2.output.results}
      results3: ${search-3.output.results}
// scripts/combine-search-results.ts
import type { AgentExecutionContext } from '@ensemble-edge/conductor'

export default function combineSearchResults(context: AgentExecutionContext) {
  const { results1, results2, results3 } = context.input
  const all = [...results1, ...results2, ...results3]

  // Deduplicate by ID
  const unique = [...new Map(all.map((r: any) => [r.id, r])).values()]

  return { results: unique.slice(0, 10) }
}

Filtered Search with Metadata

Use metadata filters to narrow search scope:
agents:
  - name: search
    agent: rag
    config:
      operation: search
      topK: 5
      namespace: production
    input:
      query: ${input.question}
      filter:
        category: ${input.category}
        published_after: ${input.date}
        author: ${input.author}

Incremental Indexing

Index documents in batches with custom chunking:
agents:
  - name: chunk
    operation: code
    config:
      script: scripts/chunk-document
    input:
      document: ${input.document}
      chunkSize: 1000

  - name: index-chunks
    agent: rag
    config:
      operation: index
      namespace: ${input.tenant_id}
    input:
      content: ${chunk.output.chunks}
      id: ${input.document_id}
      metadata:
        document_id: ${input.document_id}
        chunk_index: ${chunk.output.chunks.index}
// scripts/chunk-document.ts
import type { AgentExecutionContext } from '@ensemble-edge/conductor'

export default function chunkDocument(context: AgentExecutionContext) {
  const { document, chunkSize } = context.input
  const chunks = []

  for (let i = 0; i < document.length; i += chunkSize) {
    chunks.push({
      text: document.substring(i, i + chunkSize),
      index: i / chunkSize
    })
  }

  return { chunks }
}

Best Practices

1. Chunk Documents Intelligently

# Good: Semantic chunks with overlap
chunkStrategy: semantic
chunkSize: 512
overlap: 50

# Avoid: Large chunks without overlap
chunkStrategy: fixed
chunkSize: 5000
overlap: 0
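The value of overlap is that a sentence split at a chunk boundary still appears whole in at least one chunk. An illustrative sketch of fixed chunking with overlap (not the agent's implementation, which also offers semantic and recursive strategies):

```typescript
// Fixed-size chunking with overlap: each chunk shares `overlap` trailing
// characters with the next chunk's leading characters.
function chunkWithOverlap(text: string, chunkSize = 512, overlap = 50): string[] {
  const chunks: string[] = []
  const stride = chunkSize - overlap
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize))
    if (start + chunkSize >= text.length) break
  }
  return chunks
}
```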

2. Add Rich Metadata

metadata:
  source: url or file path
  title: document title
  author: content creator
  published: ISO date string
  category: classification
  tags: [tag1, tag2]

3. Use Namespaces for Isolation

# Separate by tenant, version, or document type
namespace: tenant-${input.tenant_id}
namespace: docs-v2
namespace: support-tickets

4. Enable Reranking for Quality

config:
  topK: 10              # Get more initial results
  rerank: true          # Re-score with cross-encoder
  rerankModel: "@cf/baai/bge-reranker-base"

5. Cache Search Results

agents:
  - name: search
    agent: rag
    cache:
      ttl: 3600
      key: search-${hash(input.query)}

Common Use Cases

Documentation Q&A

ensemble: docs-qa

agents:
  - name: search-docs
    agent: rag
    config:
      operation: search
      namespace: documentation
      topK: 3
      rerank: true
    input:
      query: ${input.question}

  - name: answer
    operation: think
    config:
      provider: anthropic
      model: claude-sonnet-4
      prompt: |
        Answer using these docs:
        ${search-docs.output.results.map(r => r.content).join('\n\n')}

        Question: ${input.question}

Customer Support Assistant

ensemble: support-assistant

agents:
  - name: search-tickets
    agent: rag
    config:
      operation: search
      topK: 5
      namespace: support-tickets
    input:
      query: ${input.issue}
      filter:
        status: resolved
        sentiment: positive

  - name: suggest-solution
    operation: think
    config:
      provider: anthropic
      model: claude-sonnet-4
      prompt: |
        Suggest a solution based on these similar resolved tickets:
        ${search-tickets.output.results.map(r => r.content).join('\n\n')}

        Current issue: ${input.issue}

Content Recommendations

ensemble: recommend-articles

agents:
  - name: search-similar
    agent: rag
    config:
      operation: search
      topK: 5
    input:
      query: ${input.current_article}
      filter:
        category: ${input.category}

  - name: format
    operation: code
    config:
      script: scripts/format-recommendations
    input:
      results: ${search-similar.output.results}
// scripts/format-recommendations.ts
import type { AgentExecutionContext } from '@ensemble-edge/conductor'

export default function formatRecommendations(context: AgentExecutionContext) {
  const { results } = context.input
  return {
    recommendations: results.map((r: any) => ({
      title: r.metadata.title,
      url: r.metadata.url,
      relevance: r.score
    }))
  }
}

Performance Tips

Limit topK for Speed
topK: 5  # Sufficient for most use cases
Use Metadata Filters
# Reduce search space dramatically
filter:
  category: ${input.category}
  published_after: "2024-01-01"
Partition with Namespaces
# Smaller search space = faster queries
namespace: ${input.tenant_id}
Cache Aggressively
cache:
  ttl: 3600
  key: search-${hash(input.query)}-${input.namespace}

Limitations

  • Max document size: 8000 tokens per chunk
  • Max topK: 100 results
  • Metadata size: 10KB per document
  • Namespace limit: 1000 per account
  • Embedding model: Currently limited to Cloudflare AI models

Manual RAG vs AutoRAG

| Feature | Manual RAG (This Agent) | AutoRAG (Starter Kit) |
| --- | --- | --- |
| Control | Full control over indexing & search | Fully managed |
| Document Processing | Manual chunking & indexing | Automatic file processing |
| Use Case | Custom pipelines | Quick deployment |
| Configuration | Low-level operations | High-level automation |
| Best For | Advanced users | Getting started |
For most use cases, consider starting with AutoRAG and migrating to manual RAG when you need fine-grained control.

Next Steps