Starter Kit - Ships with your template. You own it - modify freely.
## Overview

The Robots.txt ensemble generates a standards-compliant robots.txt file for controlling search engine crawler behavior. It provides a flexible, configurable approach to managing bot access, with support for:

- Blocking all crawlers, or allowing them with exceptions
- Custom path restrictions (e.g., `/api/*`, `/admin/*`)
- Crawl delay configuration
- Sitemap reference
- CDN/browser caching (24-hour cache by default)
The ensemble serves robots.txt at the /robots.txt endpoint with proper HTTP cache headers for optimal performance.
## Endpoint

**Route:** `GET /robots.txt`

**Response Type:** `text/plain`

**Cache Headers:**

```text
Cache-Control: public, max-age=86400, stale-while-revalidate=3600
```

- 24-hour cache duration (`max-age=86400`)
- 1-hour stale-while-revalidate window
## Configuration Options

### disallowAll

- **Type:** boolean
- **Default:** `false`
- **Description:** When `true`, blocks all search engine crawlers from accessing any part of your site.

**Generated Output:**

```text
User-agent: *
Disallow: /
```
### disallowPaths

- **Type:** array of strings
- **Default:** `["/api/*", "/admin/*", "/_*"]`
- **Description:** List of path patterns to disallow. Supports wildcards (`*`). Only applied when `disallowAll` is `false`.

**Input:**

```yaml
input:
  disallowPaths:
    - /api/*
    - /admin/*
    - /private/*
    - /_*
```
**Generated Output:**

```text
User-agent: *
Allow: /
Disallow: /api/*
Disallow: /admin/*
Disallow: /private/*
Disallow: /_*
```
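The `*` wildcard follows the widely adopted REP extension: it matches any sequence of characters, and patterns are anchored at the start of the URL path. A minimal sketch of that matching logic in Python (illustrative only, not part of the ensemble), using the default patterns from this template:

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Check whether a robots.txt path pattern matches a URL path.

    '*' matches any sequence of characters; patterns are anchored at
    the start of the path, and a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

# The default disallow patterns from this ensemble:
assert rule_matches("/api/*", "/api/users")          # blocked
assert not rule_matches("/api/*", "/apidocs")        # prefix only, not blocked
assert rule_matches("/_*", "/_internal/data")        # blocked
assert not rule_matches("/admin/*", "/blog/admin-tips")  # anchored at start
```

Note that only some crawlers implement wildcards; simpler bots treat each `Disallow` value as a plain path prefix.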
### crawlDelay

- **Type:** number (seconds)
- **Default:** `null` (no delay)
- **Description:** Requests that crawlers wait this many seconds between successive requests, which helps reduce server load. A 10-second delay caps a compliant bot at 86400 / 10 = 8,640 requests per day. Support varies: Bing and Yandex honor `Crawl-delay`, but Googlebot ignores it.

**Generated Output:**

```text
User-agent: *
Allow: /
Crawl-delay: 10
```
### sitemap

- **Type:** string (URL)
- **Default:** `https://example.com/sitemap.xml`
- **Description:** URL of your XML sitemap. Search engines use it to discover all pages on your site.

**Input:**

```yaml
input:
  sitemap: https://mysite.com/sitemap.xml
```

**Generated Output:**

```text
User-agent: *
Allow: /
Sitemap: https://mysite.com/sitemap.xml
```
## Customization Examples

### Example 1: Development Site (Block All)

Block all crawlers during development or staging:
```yaml
name: robots
description: Robots.txt for search engine crawlers

trigger:
  - type: http
    path: /robots.txt
    methods: [GET]
    public: true
    httpCache:
      public: true
      maxAge: 86400
      staleWhileRevalidate: 3600

agents:
  - name: generate-robots
    operation: html
    config:
      templateEngine: liquid
      contentType: text/plain
      template: |
        User-agent: *
        Disallow: /

flow:
  - agent: generate-robots
    input:
      disallowAll: true  # Block everything

input:
  disallowAll:
    type: boolean
    required: false
    default: true  # Changed to true
```
### Example 2: Production Site with Protected Paths

Allow crawlers but protect sensitive paths:

```yaml
input:
  disallowAll:
    type: boolean
    required: false
    default: false
  disallowPaths:
    type: array
    required: false
    default:
      - /api/*
      - /admin/*
      - /dashboard/*
      - /auth/*
      - /_*
  crawlDelay:
    type: number
    required: false
    default: 2  # 2-second delay
  sitemap:
    type: string
    required: false
    default: https://yoursite.com/sitemap.xml
```
### Example 3: Public Site with Minimal Restrictions

Allow most content; only block internal paths:

```yaml
input:
  disallowAll:
    type: boolean
    required: false
    default: false
  disallowPaths:
    type: array
    required: false
    default:
      - /_*  # Only block internal paths
  crawlDelay:
    type: number
    required: false
    default: null  # No delay
  sitemap:
    type: string
    required: false
    default: https://yoursite.com/sitemap.xml
```
### Example 4: Aggressive Crawler Throttling

Slow down aggressive bots:

```yaml
input:
  disallowAll:
    type: boolean
    required: false
    default: false
  disallowPaths:
    type: array
    required: false
    default:
      - /api/*
      - /admin/*
  crawlDelay:
    type: number
    required: false
    default: 30  # 30-second delay between requests
  sitemap:
    type: string
    required: false
    default: https://yoursite.com/sitemap.xml
```
## Full Ensemble YAML

```yaml
name: robots
description: Robots.txt for search engine crawlers

trigger:
  - type: http
    path: /robots.txt
    methods: [GET]
    public: true
    # HTTP cache headers for CDN/browser caching
    # robots.txt rarely changes - cache for 24 hours
    httpCache:
      public: true
      maxAge: 86400  # 24 hours
      staleWhileRevalidate: 3600  # Serve stale for 1 hour while revalidating
    responses:
      html:
        enabled: false
      json:
        enabled: false

agents:
  - name: generate-robots
    operation: html
    config:
      templateEngine: liquid
      contentType: text/plain
      template: |
        User-agent: *
        {% if disallowAll %}
        Disallow: /
        {% else %}
        Allow: /
        {% if disallowPaths %}
        {% for path in disallowPaths %}
        Disallow: {{path}}
        {% endfor %}
        {% endif %}
        {% endif %}
        {% if crawlDelay %}
        Crawl-delay: {{crawlDelay}}
        {% endif %}
        {% if sitemap %}
        Sitemap: {{sitemap}}
        {% endif %}

flow:
  - agent: generate-robots
    input:
      disallowAll: ${input.disallowAll}
      disallowPaths: ${input.disallowPaths}
      crawlDelay: ${input.crawlDelay}
      sitemap: ${input.sitemap}

# Default configuration
input:
  disallowAll:
    type: boolean
    required: false
    default: false
  disallowPaths:
    type: array
    required: false
    default:
      - /api/*
      - /admin/*
      - /_*
  crawlDelay:
    type: number
    required: false
    default: null
  sitemap:
    type: string
    required: false
    default: https://example.com/sitemap.xml

output:
  robots: ${generate-robots.output}
```
## Testing Your Configuration

### Test Locally

```shell
# Start local dev server
ensemble conductor dev

# Visit http://localhost:8787/robots.txt
curl http://localhost:8787/robots.txt
```
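For an automated check, Python's standard `urllib.robotparser` can parse the body you get back from `curl`. It implements the original Robots Exclusion Protocol, so it does not understand the `*` wildcard patterns above, but it works well for literal rules such as the block-all output from Example 1. A quick sketch (the body is hard-coded here rather than fetched from the dev server):

```python
from urllib.robotparser import RobotFileParser

# The "block all" body generated when disallowAll is true (Example 1).
robots_body = """User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_body.splitlines())

# Every path should be off-limits to every crawler.
assert not parser.can_fetch("Googlebot", "http://localhost:8787/")
assert not parser.can_fetch("*", "http://localhost:8787/any/page")
```

Swapping the hard-coded string for the live response body turns this into a simple deployment smoke test.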
### Validate with Google

After deploying, use the robots.txt report in Google Search Console to validate your configuration (Google retired its standalone robots.txt Tester).
## Common Scenarios

**Scenario 1: Verify API paths are blocked**

```shell
curl https://yoursite.com/robots.txt | grep "Disallow: /api/"
```

**Scenario 2: Check the sitemap reference**

```shell
curl https://yoursite.com/robots.txt | grep "Sitemap:"
```

**Scenario 3: Verify cache headers**

```shell
curl -I https://yoursite.com/robots.txt | grep -i cache-control
```
## Best Practices

1. **Update the sitemap URL**: Replace `https://example.com/sitemap.xml` with your actual sitemap URL.
2. **Review the default disallow paths**: Customize the `disallowPaths` array to match your site structure.
3. **Consider a crawl delay**: Set `crawlDelay` only if you are experiencing heavy bot traffic.
4. **Test before deploying**: Always test changes locally first.
5. **Monitor crawler behavior**: Use Google Search Console to track how bots interact with your site.
6. **Keep it simple**: Only disallow what's necessary; over-blocking can hurt SEO.
**Related:** Sitemap Generator - Generate XML sitemaps for search engines to discover your content.