Overview

The scrape built-in member fetches and extracts web page content using a three-tier fallback strategy: fast (Cloudflare Browser Rendering), slow (Browserless), and direct HTML parsing. A minimal configuration:
- member: scrape-page
  type: Function
  config:
    builtin: scrape
    url: https://example.com
    format: markdown

Configuration

url (string, required)
  The URL to scrape.

format (string, default: "markdown")
  Output format: markdown, html, or text.

strategy (string, default: "auto")
  Scraping strategy: auto, fast, slow, or html.

selector (string, optional)
  CSS selector for content extraction.

waitFor (string, optional)
  CSS selector to wait for before scraping.

timeout (number, default: 30000)
  Timeout in milliseconds.
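
Putting the options together, a fully specified member might look like this (a sketch; the member name and values are illustrative):

- member: scrape-docs
  type: Function
  config:
    builtin: scrape
    url: https://example.com/docs
    format: markdown       # markdown | html | text
    strategy: auto         # auto | fast | slow | html
    selector: 'main'       # extract only the main element
    waitFor: '#app-ready'  # wait for this selector before scraping
    timeout: 45000         # give up after 45 seconds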

Strategies

Auto (Default)

The auto strategy tries each tier in order until one succeeds:
  1. Fast - Cloudflare Browser Rendering (fastest)
  2. Slow - Browserless with full browser (slower but reliable)
  3. HTML - Direct HTML parsing (fallback)
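
Because auto can fall back, the strategy field on the result (see Output below) reports which tier actually succeeded. A minimal sketch, with an illustrative member name:

- member: auto-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    strategy: auto  # optional; auto is the default

Downstream steps can read ${auto-scrape.output.strategy} if, for example, html results need extra cleanup.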

Fast

Uses the Cloudflare Browser Rendering API:
- member: fast-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    strategy: fast

Slow

Uses Browserless for JavaScript-heavy sites:
- member: slow-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    strategy: slow
    waitFor: '.content-loaded'

HTML

Direct HTML parsing (no JavaScript):
- member: html-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    strategy: html
    selector: 'article.content'

Output Formats

Markdown

Clean markdown output:
- member: to-markdown
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    format: markdown
Example output:
# Page Title

Content paragraph...

## Section Heading

More content...

HTML

Preserves the HTML structure:
- member: to-html
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    format: html

Text

Plain text only:
- member: to-text
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    format: text

Examples

Basic Scraping

flow:
  - member: scrape-article
    type: Function
    config:
      builtin: scrape
      url: https://blog.example.com/article-123
      format: markdown

output:
  content: ${scrape-article.output.content}
  title: ${scrape-article.output.title}

With Selector

- member: scrape-content
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    selector: 'main.article-content'
    format: markdown

Wait for Dynamic Content

- member: scrape-spa
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    strategy: slow
    waitFor: '[data-content-loaded="true"]'
    timeout: 60000

Batch Scraping

flow:
  - foreach: ${input.urls}
    as: url
    do:
      - member: scrape-page
        type: Function
        config:
          builtin: scrape
          url: ${url}
          format: markdown
    cache:
      enabled: true
      ttl: 3600000  # Cache 1 hour
      key: ${url}

Output

interface ScrapeOutput {
  content: string;                      // Scraped content in the requested format
  title?: string;                       // Page title, when one can be extracted
  strategy: 'fast' | 'slow' | 'html';   // Strategy that ultimately succeeded
  duration: number;                     // Scrape duration in milliseconds
  cached?: boolean;                     // True when served from cache
}
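
In a flow, these fields are read from the member's output. A sketch reusing the scrape-article member from Basic Scraping above (durationMs is just an illustrative name for the flow's own output key):

output:
  content: ${scrape-article.output.content}
  title: ${scrape-article.output.title}
  strategy: ${scrape-article.output.strategy}    # which tier succeeded
  durationMs: ${scrape-article.output.duration}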

Error Handling

- member: safe-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
  retry:
    maxAttempts: 3
    backoff: exponential
Error codes:
  - SCRAPE_TIMEOUT - Exceeded the configured timeout
  - SCRAPE_FAILED - All strategies failed
  - INVALID_URL - Malformed URL
  - NETWORK_ERROR - Connection failed
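
As a mitigation sketch using only the options shown above: a generous timeout reduces SCRAPE_TIMEOUT failures on slow sites, while exponential-backoff retries absorb transient NETWORK_ERROR failures (the member name is illustrative):

- member: resilient-scrape
  type: Function
  config:
    builtin: scrape
    url: ${input.url}
    timeout: 60000          # avoid SCRAPE_TIMEOUT on slow pages
  retry:
    maxAttempts: 3
    backoff: exponential    # smooths over transient NETWORK_ERROR failures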

Best Practices

  1. Use the auto strategy - Let the scraper pick the best method
  2. Cache results - Avoid re-scraping the same pages (see Batch Scraping above)
  3. Set reasonable timeouts - Prevent flows from hanging on slow sites
  4. Handle errors - Sites may be down; configure retries
  5. Respect robots.txt - Be a good citizen
  6. Rate limit - Don't overwhelm target servers
  7. Use selectors - Extract only the content you need
  8. Test with various sites - Page structures differ widely