# Batch scraping multiple URLs
You can now batch scrape multiple URLs at the same time. The batch scrape methods take a list of URLs and optional parameters as arguments. The params argument lets you specify additional options for the batch scrape job, such as the output formats.
## How it works

Batch scraping works much like the /crawl endpoint: you can either start the batch and wait for completion, or start it and handle completion yourself.

- `batchScrape` (JS) / `batch_scrape` (Python): starts a batch job and waits for it to complete, returning the results.
- `startBatchScrape` (JS) / `start_batch_scrape` (Python): starts a batch job and returns the job ID so you can poll or use webhooks.
## Usage

```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Start a batch job and get its ID back immediately
start = firecrawl.start_batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
], formats=["markdown"])  # returns id

# Or start a batch job and wait for it to complete
job = firecrawl.batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
], formats=["markdown"], poll_interval=2, wait_timeout=120)

print(job.status, job.completed, job.total)
```
## Response

Calling `batchScrape`/`batch_scrape` returns the full results when the batch completes.
```json
{
  "status": "completed",
  "total": 36,
  "completed": 36,
  "creditsUsed": 36,
  "expiresAt": "2024-00-00T00:00:00.000Z",
  "next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
  "data": [
    {
      "markdown": "[Firecrawl Docs home page!...",
      "html": "<!DOCTYPE html><html lang=\"en\" class=\"js-focus-visible lg:[--scroll-mt:9.5rem]\" data-js-focus-visible=\"\">...",
      "metadata": {
        "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
        "language": "en",
        "sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
        "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
        "ogLocaleAlternate": [],
        "statusCode": 200
      }
    },
    ...
  ]
}
```
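When a response includes a `next` URL, the remaining pages of results can be fetched by following it until no `next` field is present. A minimal sketch of that loop, where `fetch_page` is a hypothetical helper standing in for an authenticated GET against the API:

```python
def collect_all_data(first_page, fetch_page):
    """Accumulate `data` across paginated batch responses.

    `first_page` is the initial response dict; `fetch_page(url)`
    should GET the `next` URL (with your API key) and return the
    parsed JSON of the following page.
    """
    data = list(first_page.get("data", []))
    next_url = first_page.get("next")
    while next_url:
        page = fetch_page(next_url)
        data.extend(page.get("data", []))
        next_url = page.get("next")
    return data
```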
Calling `startBatchScrape`/`start_batch_scrape` returns a job ID that you can track via `getBatchScrapeStatus`/`get_batch_scrape_status`, the `/batch/scrape/{id}` API endpoint, or webhooks. Job results are available via the API for 24 hours after completion. After this period, you can still view your batch scrape history and results in the activity logs.
```json
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
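If you track the job yourself after `start_batch_scrape`, the status check becomes a simple poll loop. A generic sketch of that pattern, where `get_status` stands in for a zero-argument call such as `lambda: firecrawl.get_batch_scrape_status(job.id)` (this is not SDK code, just the polling logic):

```python
import time

def poll_until_done(get_status, interval=2, timeout=120):
    """Call `get_status()` every `interval` seconds until the job
    reports a terminal status or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("batch scrape did not finish in time")
```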
You can also use the batch scrape endpoint to extract structured data from pages. This is useful when you want to extract the same structured data from a list of URLs.
```python
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Scrape multiple websites:
batch_scrape_result = firecrawl.batch_scrape(
    ['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'],
    formats=[{
        'type': 'json',
        'prompt': 'Extract the title and description from the page.',
        'schema': {
            'type': 'object',
            'properties': {
                'title': {'type': 'string'},
                'description': {'type': 'string'}
            },
            'required': ['title', 'description']
        }
    }]
)
print(batch_scrape_result)

# Or, you can use the start method:
batch_scrape_job = firecrawl.start_batch_scrape(
    ['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'],
    formats=[{
        'type': 'json',
        'prompt': 'Extract the title and description from the page.',
        'schema': {
            'type': 'object',
            'properties': {
                'title': {'type': 'string'},
                'description': {'type': 'string'}
            },
            'required': ['title', 'description']
        }
    }]
)
print(batch_scrape_job)

# You can then use the job ID to check the status of the batch scrape:
batch_scrape_status = firecrawl.get_batch_scrape_status(batch_scrape_job.id)
print(batch_scrape_status)
```
## Response

`batchScrape`/`batch_scrape` returns full results:
```json
{
  "status": "completed",
  "total": 36,
  "completed": 36,
  "creditsUsed": 36,
  "expiresAt": "2024-00-00T00:00:00.000Z",
  "next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
  "data": [
    {
      "json": {
        "title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
        "description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot."
      }
    },
    ...
  ]
}
```
`startBatchScrape`/`start_batch_scrape` returns a job ID:

```json
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
## Batch scrape with webhooks
You can configure webhooks to receive real-time notifications as each URL in your batch is scraped. This allows you to process results immediately instead of waiting for the entire batch to complete.
```bash
curl -X POST https://api.firecrawl.dev/v2/batch/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "webhook": {
      "url": "https://your-domain.com/webhook",
      "metadata": {
        "any_key": "any_value"
      },
      "events": ["started", "page", "completed"]
    }
  }'
```
### Quick Reference

Event Types:

- `batch_scrape.started` - when the batch scrape begins
- `batch_scrape.page` - for each URL successfully scraped
- `batch_scrape.completed` - when all URLs are processed
- `batch_scrape.failed` - if the batch scrape encounters an error
Basic Payload:

```json
{
  "success": true,
  "type": "batch_scrape.page",
  "id": "batch-job-id",
  "data": [...], // Page data for 'page' events
  "metadata": {}, // Your custom metadata
  "error": null
}
```
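A common way to consume these payloads is to dispatch on the `type` field. A minimal sketch of a webhook handler doing that; the branches are placeholders for your own processing logic:

```python
import json

def handle_webhook(raw_body: bytes) -> str:
    """Parse a webhook payload and dispatch on its event type.

    Returns the event type so callers can log or test it.
    """
    payload = json.loads(raw_body)
    event = payload.get("type", "")
    if event == "batch_scrape.page":
        for page in payload.get("data", []):
            pass  # process each scraped page as it arrives
    elif event == "batch_scrape.completed":
        pass  # finalize: all URLs are done
    elif event == "batch_scrape.failed":
        pass  # inspect payload.get("error") and alert
    return event
```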
### Security: Verifying Webhook Signatures
Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and hasn't been tampered with.
How it works:

1. Get your webhook secret from the Advanced tab of your account settings.
2. Extract the signature from the `X-Firecrawl-Signature` header.
3. Compute the HMAC-SHA256 of the raw request body using your secret.
4. Compare it with the signature header using a timing-safe function.
Never process a webhook without verifying its signature first. The `X-Firecrawl-Signature` header contains the signature in the format `sha256=abc123def456...`.
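The verification steps above can be sketched with the Python standard library (the secret and body shown in the usage are illustrative, not real credentials):

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check an X-Firecrawl-Signature header of the form
    'sha256=<hex digest>' against HMAC-SHA256 of the raw body."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Timing-safe comparison prevents signature guessing via response timing.
    return hmac.compare_digest(signature_header, f"sha256={expected}")
```

Note that the comparison runs over the raw request bytes, before any JSON parsing: re-serializing the body can change whitespace or key order and break the digest.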
For complete implementation examples in JavaScript and Python, see the Webhook Security documentation.
### Full Documentation
For comprehensive webhook documentation including detailed event payloads, advanced configuration, and troubleshooting, see the Webhooks documentation.