# Batch scraping multiple URLs
You can now batch scrape multiple URLs at the same time. The batch scrape methods take a list of URLs and optional parameters as arguments. The params argument lets you specify additional options for the batch scrape job, such as the output formats.
## How it works

Batch scraping works much like the /crawl endpoint: you can either start the batch and wait for completion, or start it and handle completion yourself.

- `batchScrape` (JS) / `batch_scrape` (Python): starts a batch job and waits for it to complete, returning the results.
- `startBatchScrape` (JS) / `start_batch_scrape` (Python): starts a batch job and returns the job ID so you can poll or use webhooks.
## Usage

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Start a batch job and get its ID back immediately
start = firecrawl.start_batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
], formats=["markdown"])  # returns id

# Or start a batch job and wait for it to complete
job = firecrawl.batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
], formats=["markdown"], poll_interval=2, wait_timeout=120)

print(job.status, job.completed, job.total)
```
## Response

Calling `batchScrape`/`batch_scrape` returns the full results when the batch completes.
```json
{
  "status": "completed",
  "total": 36,
  "completed": 36,
  "creditsUsed": 36,
  "expiresAt": "2024-00-00T00:00:00.000Z",
  "next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
  "data": [
    {
      "markdown": "[Firecrawl Docs home page!...",
      "html": "<!DOCTYPE html><html lang=\"en\" class=\"js-focus-visible lg:[--scroll-mt:9.5rem]\" data-js-focus-visible=\"\">...",
      "metadata": {
        "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
        "language": "en",
        "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
        "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
        "ogLocaleAlternate": [],
        "statusCode": 200
      }
    },
    ...
  ]
}
```
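When a response includes a `next` URL, the remaining pages of results can be fetched by following it until no `next` field is present. A minimal sketch of that loop, where `fetch_page` is a hypothetical helper standing in for an authenticated GET against the API:

```python
def collect_all_data(first_page, fetch_page):
    """Accumulate `data` across paginated batch responses.

    `first_page` is the initial response dict; `fetch_page(url)`
    should GET the `next` URL (with your API key) and return the
    parsed JSON of the following page.
    """
    data = list(first_page.get("data", []))
    next_url = first_page.get("next")
    while next_url:
        page = fetch_page(next_url)
        data.extend(page.get("data", []))
        next_url = page.get("next")
    return data
```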
Calling `startBatchScrape`/`start_batch_scrape` returns a job ID that you can track via `getBatchScrapeStatus`/`get_batch_scrape_status`, the `/batch/scrape/{id}` API endpoint, or webhooks. Job results are available via the API for 24 hours after completion. After this period, you can still view your batch scrape history and results in the activity logs.
```json
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
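If you track the job yourself after `start_batch_scrape`, the status check becomes a simple poll loop. A generic sketch of that pattern, where `get_status` stands in for a zero-argument call such as `lambda: firecrawl.get_batch_scrape_status(job.id)` (this is not SDK code, just the polling logic):

```python
import time

def poll_until_done(get_status, interval=2, timeout=120):
    """Call `get_status()` every `interval` seconds until the job
    reports a terminal status or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("batch scrape did not finish in time")
```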
You can also use the batch scrape endpoint to extract structured data from pages. This is useful when you want to extract the same structured data from a list of URLs.
```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Scrape multiple websites:
batch_scrape_result = firecrawl.batch_scrape(
    ['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'],
    formats=[{
        'type': 'json',
        'prompt': 'Extract the title and description from the page.',
        'schema': {
            'type': 'object',
            'properties': {
                'title': {'type': 'string'},
                'description': {'type': 'string'}
            },
            'required': ['title', 'description']
        }
    }]
)
print(batch_scrape_result)

# Or, you can use the start method:
batch_scrape_job = firecrawl.start_batch_scrape(
    ['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'],
    formats=[{
        'type': 'json',
        'prompt': 'Extract the title and description from the page.',
        'schema': {
            'type': 'object',
            'properties': {
                'title': {'type': 'string'},
                'description': {'type': 'string'}
            },
            'required': ['title', 'description']
        }
    }]
)
print(batch_scrape_job)

# You can then use the job ID to check the status of the batch scrape:
batch_scrape_status = firecrawl.get_batch_scrape_status(batch_scrape_job.id)
print(batch_scrape_status)
```
## Response

`batchScrape`/`batch_scrape` returns full results:
```json
{
  "status": "completed",
  "total": 36,
  "completed": 36,
  "creditsUsed": 36,
  "expiresAt": "2024-00-00T00:00:00.000Z",
  "next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
  "data": [
    {
      "json": {
        "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
        "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot."
      }
    },
    ...
  ]
}
```
`startBatchScrape`/`start_batch_scrape` returns a job ID:

```json
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
## Batch scrape with webhooks
You can configure webhooks to receive real-time notifications as each URL in your batch is scraped. This allows you to process results immediately instead of waiting for the entire batch to complete.
```bash
curl -X POST https://api.firecrawl.dev/v2/batch/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "webhook": {
      "url": "https://your-domain.com/webhook",
      "metadata": {
        "any_key": "any_value"
      },
      "events": ["started", "page", "completed"]
    }
  }'
```
### Quick Reference

Event Types:

- `batch_scrape.started` - when the batch scrape begins
- `batch_scrape.page` - for each URL successfully scraped
- `batch_scrape.completed` - when all URLs are processed
- `batch_scrape.failed` - if the batch scrape encounters an error
Basic Payload:

```json
{
  "success": true,
  "type": "batch_scrape.page",
  "id": "batch-job-id",
  "data": [...], // Page data for 'page' events
  "metadata": {}, // Your custom metadata
  "error": null
}
```
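A common way to consume these payloads is to dispatch on the `type` field. A minimal sketch of a webhook handler doing that; the branches are placeholders for your own processing logic:

```python
import json

def handle_webhook(raw_body: bytes) -> str:
    """Parse a webhook payload and dispatch on its event type.

    Returns the event type so callers can log or test it.
    """
    payload = json.loads(raw_body)
    event = payload.get("type", "")
    if event == "batch_scrape.page":
        for page in payload.get("data", []):
            pass  # process each scraped page as it arrives
    elif event == "batch_scrape.completed":
        pass  # finalize: all URLs are done
    elif event == "batch_scrape.failed":
        pass  # inspect payload.get("error") and alert
    return event
```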
### Security: Verifying Webhook Signatures
Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and hasn't been tampered with.
How it works:

1. Get your webhook secret from the Advanced tab of your account settings.
2. Extract the signature from the `X-Firecrawl-Signature` header.
3. Compute the HMAC-SHA256 of the raw request body using your secret.
4. Compare it with the signature header using a timing-safe function.
Never process a webhook without verifying its signature first. The `X-Firecrawl-Signature` header contains the signature in the format `sha256=abc123def456...`.
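The verification steps above can be sketched with the Python standard library (the secret and body shown in the usage are illustrative, not real credentials):

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check an X-Firecrawl-Signature header of the form
    'sha256=<hex digest>' against HMAC-SHA256 of the raw body."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Timing-safe comparison prevents signature guessing via response timing.
    return hmac.compare_digest(signature_header, f"sha256={expected}")
```

Note that the comparison runs over the raw request bytes, before any JSON parsing: re-serializing the body can change whitespace or key order and break the digest.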
For complete implementation examples in JavaScript and Python, see the Webhook Security documentation.
### Full Documentation
For comprehensive webhook documentation including detailed event payloads, advanced configuration, and troubleshooting, see the Webhooks documentation.