## Overview

The `scrape` built-in member performs web scraping with a three-tier fallback strategy: fast (Cloudflare Browser Rendering), slow (Browserless), and direct HTML parsing.
```yaml
- member: scrape-page
  type: Function
  config:
    builtin: scrape
    url: https://example.com
    format: markdown
```
## Configuration

| Option     | Description                                                 |
|------------|-------------------------------------------------------------|
| `format`   | Output format: `markdown`, `html`, or `text`                |
| `strategy` | Scraping strategy: `auto` (default), `fast`, `slow`, or `html` |
| `selector` | CSS selector for content extraction                         |
| `waitFor`  | CSS selector to wait for before scraping                    |
| `timeout`  | Maximum scrape time, in milliseconds                        |
## Strategies

### Auto (Default)

Tries each strategy in order until one succeeds:

1. **Fast** - Cloudflare Browser Rendering (fastest)
2. **Slow** - Browserless with a full browser (slower but more reliable)
3. **HTML** - direct HTML parsing (final fallback)
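Since `auto` is the default, it applies whenever `strategy` is omitted, but it can also be set explicitly. A sketch (the member name `auto-scrape` is illustrative):

```yaml
- member: auto-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    strategy: auto # equivalent to leaving strategy unset
```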
### Fast

Uses the Cloudflare Browser Rendering API:

```yaml
- member: fast-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    strategy: fast
```
### Slow

Uses Browserless for JavaScript-heavy sites:

```yaml
- member: slow-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    strategy: slow
    waitFor: '.content-loaded'
```
### HTML

Direct HTML parsing (no JavaScript execution):

```yaml
- member: html-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    strategy: html
    selector: 'article.content'
```
## Output Formats

### Markdown

Clean markdown output:

```yaml
- member: to-markdown
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    format: markdown
```

Output:

```markdown
# Page Title

Content paragraph...

## Section Heading

More content...
```
### HTML

Preserved HTML structure:

```yaml
- member: to-html
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    format: html
```
### Text

Plain text only:

```yaml
- member: to-text
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    format: text
```
## Examples

### Basic Scraping

```yaml
flow:
  - member: scrape-article
    type: Function
    config:
      builtin: scrape
      url: https://blog.example.com/article-123
      format: markdown

output:
  content: ${scrape-article.output.content}
  title: ${scrape-article.output.title}
```
### With Selector

```yaml
- member: scrape-content
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    selector: 'main.article-content'
    format: markdown
```
### Wait for Dynamic Content

```yaml
- member: scrape-spa
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    strategy: slow
    waitFor: '[data-content-loaded="true"]'
    timeout: 60000
```
### Batch Scraping

```yaml
flow:
  - foreach: ${input.urls}
    as: url
    do:
      - member: scrape-page
        type: Function
        config:
          builtin: scrape
          url: ${url}
          format: markdown
        cache:
          enabled: true
          ttl: 3600000 # Cache for 1 hour
          key: ${url}
```
## Output

```typescript
interface ScrapeOutput {
  content: string;                    // scraped content in the requested format
  title?: string;                     // page title, if one was found
  strategy: 'fast' | 'slow' | 'html'; // the strategy that produced the result
  duration: number;                   // time taken to scrape
  cached?: boolean;                   // true if served from cache
}
```
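As an illustration of consuming this shape, the sketch below distinguishes cached results from freshly scraped ones. The `summarize` helper is hypothetical, not part of the built-in:

```typescript
// ScrapeOutput repeated from above so the example is self-contained.
interface ScrapeOutput {
  content: string;
  title?: string;
  strategy: 'fast' | 'slow' | 'html';
  duration: number;
  cached?: boolean;
}

// Hypothetical helper: builds a one-line summary of a scrape result,
// reporting the cache as the source when `cached` is set.
function summarize(result: ScrapeOutput): string {
  const source = result.cached ? 'cache' : result.strategy;
  const name = result.title ?? '(untitled)';
  return `${name}: ${result.content.length} chars via ${source} in ${result.duration}ms`;
}
```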
## Error Handling

```yaml
- member: safe-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
  retry:
    maxAttempts: 3
    backoff: exponential
```
Error codes:

- `SCRAPE_TIMEOUT` - exceeded the configured timeout
- `SCRAPE_FAILED` - all strategies failed
- `INVALID_URL` - malformed URL
- `NETWORK_ERROR` - connection failed
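When deciding whether a retry is worthwhile, transient codes can be separated from permanent ones. A minimal sketch; the `shouldRetry` helper and its transient/permanent split are assumptions, not part of the built-in:

```typescript
// Hypothetical helper: maps a scrape error code to a retry decision.
// SCRAPE_TIMEOUT and NETWORK_ERROR are treated as transient (a retry may
// succeed); INVALID_URL and SCRAPE_FAILED as permanent (a retry will not help).
function shouldRetry(code: string): boolean {
  switch (code) {
    case 'SCRAPE_TIMEOUT':
    case 'NETWORK_ERROR':
      return true;
    case 'INVALID_URL':
    case 'SCRAPE_FAILED':
    default:
      return false;
  }
}
```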
## Best Practices

- **Use the `auto` strategy** - let it choose the best method
- **Cache results** - avoid redundant scraping
- **Set reasonable timeouts** - prevent hanging flows
- **Handle errors** - sites may be down
- **Respect robots.txt** - be a good citizen
- **Rate limit** - don't overwhelm servers
- **Use selectors** - extract only the content you need
- **Test with a variety of sites** - page structures differ
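The rate-limiting advice above can be sketched as follows. This is a minimal illustration, not part of the built-in: `scrapeSequentially` and its `scrape` callback are hypothetical placeholders for whatever invokes the scrape member:

```typescript
// Minimal rate-limiting sketch: scrape URLs one at a time with a fixed
// delay between requests so the target server is not overwhelmed.
async function scrapeSequentially(
  urls: string[],
  delayMs: number,
  scrape: (url: string) => Promise<string>,
): Promise<string[]> {
  const results: string[] = [];
  for (let i = 0; i < urls.length; i++) {
    results.push(await scrape(urls[i]));
    // Pause between requests; no need to wait after the final one.
    if (i < urls.length - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}
```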