Starter Kit - Ships with your template. You own it - modify freely.

Overview

AutoRAG is Cloudflare’s fully managed RAG (Retrieval-Augmented Generation) service that provides zero-configuration document retrieval with automatic R2 bucket integration. Unlike the built-in RAG agent, which requires manual vector operations, AutoRAG handles everything automatically:
  • Automatic document ingestion from R2 buckets
  • Automatic chunking with configurable size and overlap
  • Automatic embedding generation via Workers AI
  • Automatic indexing in Vectorize
  • Continuous monitoring and updates
  • Multi-format support: PDFs, images, text, HTML, CSV, and more
This is the easiest way to do RAG on Cloudflare - just point to an R2 bucket and start querying!

AutoRAG vs Built-in RAG

Feature       | AutoRAG                              | Built-in RAG
Setup         | Zero-config (point to R2)            | Manual vector operations
Ingestion     | Automatic from R2                    | Manual via API
Chunking      | Automatic                            | Manual
Embeddings    | Automatic                            | Manual generation
Monitoring    | Built-in                             | DIY
Updates       | Continuous                           | Manual re-indexing
Use Case      | Document libraries, knowledge bases  | Custom workflows, fine-grained control
Configuration | Instance name only                   | Full vector operations
Choose AutoRAG when:
  • You want zero-config RAG
  • Documents are in R2 buckets
  • You need automatic updates
  • Simplicity is priority
Choose Built-in RAG when:
  • You need custom vector operations
  • You want fine-grained control
  • Documents come from multiple sources
  • You need custom chunking logic

Prerequisites

Before using the AutoRAG agent, you must set up an AutoRAG instance in the Cloudflare dashboard:
  1. Go to Cloudflare Dashboard → Workers & Pages → AutoRAG
  2. Create a new AutoRAG instance
  3. Connect it to your R2 bucket
  4. Configure chunking settings (optional)
  5. Add the instance to your wrangler.toml:
[[autorag]]
binding = "MY_AUTORAG"
instance_name = "my-autorag-instance"

Quick Start

Basic Usage (Answer Mode)

flow:
  - name: search-docs
    agent: autorag
    input:
      query: "What is the refund policy?"
    config:
      instance: "my-autorag"
      mode: answer
      topK: 5

Search-Only Mode

flow:
  - name: search-docs
    agent: autorag
    input:
      query: "pricing information"
    config:
      instance: "my-autorag"
      mode: results
      topK: 10

Input Schema

Field | Type    | Required | Description
query | string  | Yes      | Query text to search for
topK  | integer | No       | Override number of results (optional)

Example Input

{
  "query": "What are the system requirements?",
  "topK": 5
}

Output Schema

The output format depends on the mode configuration:

Answer Mode (mode: answer)

Field              | Type    | Description
answer             | string  | AI-generated answer grounded in documents
sources            | array   | Source documents used for answer
sources[].content  | string  | Document content
sources[].score    | number  | Relevance score (0-1)
sources[].metadata | object  | Document metadata
sources[].id       | string  | Document ID
query              | string  | Original query
count              | integer | Number of sources

Results Mode (mode: results)

Field              | Type    | Description
results            | array   | Raw search results
results[].content  | string  | Document content
results[].score    | number  | Relevance score (0-1)
results[].metadata | object  | Document metadata
results[].id       | string  | Document ID
context            | string  | Combined context string for LLM use
count              | integer | Number of results
query              | string  | Original query

Configuration

Required Configuration

Field    | Type   | Required | Description
instance | string | Yes      | AutoRAG instance name (configured in wrangler.toml)

Optional Configuration

Field        | Type    | Default | Description
mode         | string  | answer  | Return format: answer (AI-generated) or results (raw search)
topK         | integer | -       | Number of results to retrieve
rewriteQuery | boolean | false   | Enable query rewriting for better retrieval

Mode Options

answer mode:
  • Returns AI-generated response grounded in documents
  • Best for end-user Q&A
  • Includes source citations
  • Uses LLM to synthesize answer
results mode:
  • Returns raw search results without generation
  • Best for custom processing
  • Includes context string for LLM pipelines
  • No LLM cost for retrieval

Configuration Example

config:
  instance: "my-autorag"
  mode: answer
  topK: 5
  rewriteQuery: true

Examples

Example 1: AI-Generated Answer

Get an AI-generated answer grounded in your documents.
flow:
  - name: answer-question
    agent: autorag
    input:
      query: "What is the company's refund policy?"
    config:
      instance: "my-autorag"
      mode: answer
      topK: 5
Output:
{
  "answer": "Based on the documentation, the refund policy allows returns within 30 days of purchase for a full refund. Items must be in original condition with tags attached. Refunds are processed within 5-7 business days.",
  "sources": [
    {
      "content": "Refund Policy: Customers may return items within 30 days...",
      "score": 0.92,
      "id": "doc-123",
      "metadata": {
        "file": "policies.pdf",
        "page": 5
      }
    }
  ],
  "query": "What is the company's refund policy?",
  "count": 1
}

Example 2: Raw Search Results

Get raw search results for custom processing.
flow:
  - name: search-pricing
    agent: autorag
    input:
      query: "pricing tiers"
    config:
      instance: "my-autorag"
      mode: results
      topK: 10

  - name: custom-processing
    agent: process-results
    input:
      results: ${search-pricing.output.results}
Output:
{
  "results": [
    {
      "content": "Enterprise tier: $500/month for unlimited users...",
      "score": 0.88,
      "id": "pricing-doc",
      "metadata": {
        "file": "pricing.pdf"
      }
    }
  ],
  "context": "[1] Source: pricing-doc\nEnterprise tier: $500/month...",
  "count": 10,
  "query": "pricing tiers"
}

Example 3: Query Rewriting

Enable query rewriting for better retrieval with conversational queries.
flow:
  - name: search-with-rewrite
    agent: autorag
    input:
      query: "how much does it cost?"
    config:
      instance: "my-autorag"
      mode: answer
      topK: 5
      rewriteQuery: true
With rewriting enabled, AutoRAG can rephrase a conversational query like “how much does it cost?” into terms closer to the document vocabulary (for example, “pricing information”) before retrieval, which improves document matching.

Example 4: Dynamic Top-K

Override the number of results at runtime.
flow:
  - name: flexible-search
    agent: autorag
    input:
      query: ${input.query}
      topK: ${input.resultCount}
    config:
      instance: "my-autorag"
      mode: results

Example 5: RAG Pipeline with Custom Response

Combine AutoRAG results with custom LLM processing.
flow:
  - name: retrieve-context
    agent: autorag
    input:
      query: ${input.question}
    config:
      instance: "my-autorag"
      mode: results
      topK: 5

  - name: generate-answer
    agent: custom-llm
    input:
      question: ${input.question}
      context: ${retrieve-context.output.context}
      sources: ${retrieve-context.output.results}

Example 6: Fallback Chain

Try AutoRAG first, fall back to web search if no results.
flow:
  - name: search-docs
    agent: autorag
    input:
      query: ${input.query}
    config:
      instance: "my-autorag"
      mode: answer
      topK: 3

  - name: web-search
    condition: ${search-docs.output.count === 0}
    agent: web-search
    input:
      query: ${input.query}

output:
  answer: ${search-docs.output.count > 0 ? search-docs.output.answer : web-search.output.answer}
  source: ${search-docs.output.count > 0 ? 'internal' : 'web'}

Best Practices

1. Choose the Right Mode

  • Use answer mode for end-user Q&A
  • Use results mode when building custom pipelines
  • Use results mode to save LLM costs if you don’t need generation

2. Optimize Top-K

  • Start with topK: 5 for most use cases
  • Increase to 10-20 for comprehensive searches
  • Decrease to 1-3 for precise answers
  • Remember: More results = higher latency + cost (see the sketch below)
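A minimal sketch of this trade-off using the same flow syntax as the earlier examples; the step names and queries are illustrative only:
flow:
  # Precise lookup: a small topK keeps the answer focused and cheap
  - name: precise-lookup
    agent: autorag
    input:
      query: "What is the enterprise plan price?"
    config:
      instance: "my-autorag"
      mode: answer
      topK: 3

  # Broad research: a larger topK trades latency and cost for coverage
  - name: broad-research
    agent: autorag
    input:
      query: "security and compliance documentation"
    config:
      instance: "my-autorag"
      mode: results
      topK: 15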

3. Enable Query Rewriting Strategically

  • Enable for conversational queries (“how do I…”, “what is…”)
  • Disable for precise searches (product IDs, exact terms)
  • Adds slight latency but improves recall

4. Monitor Source Quality

flow:
  - name: search
    agent: autorag
    input:
      query: ${input.query}
    config:
      instance: "my-autorag"
      mode: answer

  - name: check-quality
    condition: ${search.output.sources[0].score < 0.7}
    agent: log-low-quality
    input:
      query: ${input.query}
      score: ${search.output.sources[0].score}

5. Cache Results

AutoRAG queries can be expensive. Cache when possible:
flow:
  - name: search
    agent: autorag
    input:
      query: ${input.query}
    config:
      instance: "my-autorag"
      mode: answer
    cache:
      ttl: 3600
      key: "autorag-${input.query}"

Troubleshooting

No Results Returned

Problem: count: 0 in the output
Solutions:
  • Check if R2 bucket has documents
  • Verify AutoRAG instance is processing documents
  • Try broader query terms
  • Enable rewriteQuery: true (see the retry sketch below)
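A sketch of that last point, reusing the condition syntax from Example 6. It assumes the flow can run a second autorag step conditionally; the step names are illustrative:
flow:
  - name: first-pass
    agent: autorag
    input:
      query: ${input.query}
    config:
      instance: "my-autorag"
      mode: answer
      topK: 5

  # Retry with query rewriting only when the first pass finds nothing
  - name: retry-with-rewrite
    condition: ${first-pass.output.count === 0}
    agent: autorag
    input:
      query: ${input.query}
    config:
      instance: "my-autorag"
      mode: answer
      topK: 5
      rewriteQuery: true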

Low Relevance Scores

Problem: score < 0.5 for all results
Solutions:
  • Improve document quality and formatting
  • Adjust chunking settings in Cloudflare dashboard
  • Rephrase query to match document language
  • Increase topK to get more candidates

Instance Not Found

Problem: “AutoRAG instance not found”
Solutions:
  • Verify instance name in wrangler.toml
  • Check binding name matches config
  • Ensure AutoRAG instance is deployed

Slow Queries

Problem: High latency on queries
Solutions:
  • Reduce topK value
  • Disable rewriteQuery if not needed
  • Use mode: results instead of answer
  • Add caching for common queries (see the combined sketch below)
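A sketch that combines these mitigations in a single step, reusing the cache block from Best Practice 5; how much each change helps depends on your documents and traffic:
flow:
  # Raw results, a small topK, and a one-hour cache to cut latency and cost
  - name: fast-search
    agent: autorag
    input:
      query: ${input.query}
    config:
      instance: "my-autorag"
      mode: results
      topK: 3
    cache:
      ttl: 3600
      key: "autorag-${input.query}"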