Callbacks, Errors & Best Practices
Callback Details
The callback webhook receives an update for each crawled page and a final summary when the job is complete. Each callback includes:
For individual page results:
jobId: The unique identifier of the crawler job.
url: The URL that was crawled.
status: The status of the crawl attempt.
credits: The number of credits used for this request.
currentDepth: The current depth level of the crawl.
currentCost: The total cost of the job so far.
failReason: (if failed) The reason for the failure.
responseStream: The actual content of the crawled page (for successful requests).
Example individual page result:
{
  "url": "https://zillow.com/homedetails/55-Eleanor-St-APT-17-Chelsea-MA-02150/63440284_zpid/",
  "jobId": "3283eab9-f0f9-4f5d-b081-43339643b8c2",
  "status": "finished",
  "credits": 1,
  "attempts": 0,
  "startUrl": "https://www.zillow.com/homes/44269_rid/",
  "currentCost": 26,
  "currentDepth": 2,
  "response": {
    "statusCode": 200,
    "headers": {
      "x-powered-by": "Express",
      "access-control-allow-origin": "undefined",
      "access-control-allow-headers": "Origin, X-Requested-With, Content-Type, Accept",
      "access-control-allow-methods": "HEAD,GET,POST,DELETE,OPTIONS,PUT",
      "access-control-allow-credentials": "true",
      "x-robots-tag": "none",
      "content-type": "text/html; charset=utf-8",
      "sa-final-url": "https://www.zillow.com/homedetails/5-Lambert-St-5-Roxbury-MA-02119/2069696289_zpid/",
      "sa-statuscode": "200",
      "sa-credit-cost": "1",
      "sa-proxy-hash": "undefined",
      "etag": "W/\"fe810-G9Mn1GIfG51ph9p18EVHwluw8Xo\"",
      "vary": "Accept-Encoding",
      "date": "Thu, 31 Jul 2025 14:00:57 GMT",
      "connection": "keep-alive",
      "keep-alive": "timeout=5",
      "transfer-encoding": "chunked"
    },
    "body": ...,
    "credits": 1
  }
}

For the final job summary:
jobId: The unique identifier of the crawler job.
jobState: The final state of the job.
jobCost: The total cost of the job.
completed: Array of successfully crawled URLs with their individual costs.
cancelled: Array of cancelled URLs.
failed: Array of failed URLs with their failure reasons.
crawlBudget: The total crawl budget that was set.
Example final job summary result:
{
  "jobId": "e25c4cf9-b521-4f97-8e0d-ff2220756a76",
  "jobState": "finished",
  "jobCost": 42,
  "completed": [
    {
      "url": "https://www.zillow.com/homedetails/1-E-Pier-13th-ST-DOCK-B-Boston-MA-02129/2055047115_zpid/",
      "cost": 1
    },
    {
      "url": "https://www.zillow.com/homes/44269_rid/",
      "cost": 1
    },
    {
      "url": "https://www.zillow.com/homedetails/17-19-Beech-Glen-St-Roxbury-MA-02119/452661181_zpid/",
      "cost": 1
    },
    {
      "url": "https://www.zillow.com/homedetails/230-232-Washington-St-2-Boston-MA-02108/2084793714_zpid/",
      "cost": 1
    },
    ...
  ],
  "cancelled": [],
  "failed": [
    {
      "url": "https://www.zillow.com/homedetails/49-Prince-St-2-Boston-MA-02130/59131187_zpid/",
      "failReason": "<ERROR MESSAGE>"
    },
    {
      "url": "https://www.zillow.com/homedetails/55-Devon-St-APT-6-Dorchester-MA-02121/71497287_zpid/",
      "failReason": "<ERROR MESSAGE>"
    }
  ],
  "crawlBudget": 55
}

Response Details
Each successful crawl response includes:
HTTP status code.
Response headers.
Response body.
Number of credits used.
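For example, a handler can pull these fields straight out of an individual page callback. A minimal sketch in Python, assuming `payload` is the parsed JSON body shaped like the page-result example above:

```python
# `payload` is assumed to be the parsed JSON of an individual page callback,
# shaped like the example shown earlier in this guide.
def summarize_page_result(payload: dict) -> None:
    response = payload.get("response", {})
    print("Crawled URL: ", payload.get("url"))
    print("Status code: ", response.get("statusCode"))                         # HTTP status code
    print("Content-Type:", response.get("headers", {}).get("content-type"))    # response headers
    print("Body length: ", len(response.get("body", "")))                      # response body
    print("Credits used:", response.get("credits"))                            # credits for this request
```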
Error Handling
The crawler implements several error handling mechanisms:
Failed Requests:
Failed requests don't cost any API credits.
Failed requests are included in the final summary with their failure reasons (see the sketch below).
The crawler moves forward with other URLs even if some requests fail.
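If you want to act on those failures, for example to log them or queue retries, you can read them out of the final summary payload. A minimal sketch, assuming `summary` is the parsed JSON of the final job summary shown above:

```python
# `summary` is assumed to be the parsed final job summary (see the example above).
def report_failures(summary: dict) -> list[str]:
    failed_urls = []
    for entry in summary.get("failed", []):
        print(f"Failed: {entry['url']} ({entry['failReason']})")
        failed_urls.append(entry["url"])
    # These can then be fed into a retry queue or a follow-up crawl job.
    return failed_urls
```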
Duplicate URLs:
The crawler automatically detects and skips duplicate URLs. This helps prevent unnecessary credit usage and infinite loops.
Budget Exceeded:
If a single URL's cost exceeds the crawl budget, the job will fail to start.
If the cumulative cost exceeds the budget during crawling, the job will stop gracefully.
A final summary will be sent with all successfully crawled URLs.
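To detect this case programmatically, one rough option is to compare jobCost against crawlBudget in the final summary. This is a heuristic sketch only; the guide does not define a dedicated "budget exceeded" job state:

```python
def stopped_on_budget(summary: dict) -> bool:
    # Heuristic: if the accumulated cost reached the configured budget,
    # the crawler most likely stopped early rather than running out of URLs.
    return summary.get("jobCost", 0) >= summary.get("crawlBudget", float("inf"))
```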
Webhook Failures:
The system will retry failed webhook deliveries.
Webhook timeouts are handled gracefully.
Failed webhook deliveries don't affect the crawling process.
Best Practices
URL Regex Pattern:
Make sure your regex pattern is specific enough to only match the URLs you want to crawl.
Consider extracting both full URLs and relative URLs.
Test your regex pattern with tools like regex101 before starting a large crawl job (or script a quick local check, as sketched below).
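A quick local check is easy to script. A minimal sketch in Python; the pattern and sample URLs are illustrations only, not values prescribed by this guide:

```python
import re

# Hypothetical pattern: match Zillow property-detail pages, absolute or relative.
pattern = re.compile(r"(?:https?://www\.zillow\.com)?/homedetails/[^\"'\s]+")

samples = [
    "https://www.zillow.com/homedetails/55-Eleanor-St-APT-17-Chelsea-MA-02150/63440284_zpid/",
    "/homedetails/17-19-Beech-Glen-St-Roxbury-MA-02119/452661181_zpid/",
    "https://www.zillow.com/homes/44269_rid/",  # listing page: should NOT match
]

for url in samples:
    print(f"{'MATCH   ' if pattern.search(url) else 'no match'} {url}")
```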
Crawl Budget vs Max Depth:
Use crawl_budget when you want to control costs.
Use max_depth when you want to control how deep the crawler goes.
Consider using both to ensure you don't exceed your budget while maintaining depth control (a combined configuration sketch follows the API Parameters item below).
API Parameters:
Use country_code to specify the country for the requests.
Add any other ScraperAPI parameters as needed in the api_params object.
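Putting the previous two items together, a job configuration might look roughly like the following. This is a sketch only: crawl_budget, max_depth, country_code, and api_params are the parameter names used in this guide, but the exact request schema and endpoint are not shown here, so the surrounding structure is an assumption.

```python
# Illustrative crawl-job configuration (not the exact request schema).
job_config = {
    "crawl_budget": 55,        # cap the total credit spend for the whole job
    "max_depth": 2,            # cap how many link levels deep the crawler goes
    "api_params": {
        "country_code": "us",  # route requests through a specific country
        # ...any other ScraperAPI parameters can be added here as needed
    },
}
```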
Callback Webhook:
Ensure your webhook endpoint can handle the expected load.
Implement proper error handling for failed requests.
Store the results as they come in, as the final summary might be delayed.
Handle both individual page results and the final summary (see the sketch after this list).
Implement proper timeout handling (default timeout is 2 hours).
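A minimal receiving endpoint could look like the following. This is a sketch, assuming Flask and assuming that individual page callbacks carry a response object while the final summary carries jobState, as in the examples above; adapt the storage and retry handling to your own stack.

```python
from flask import Flask, request

app = Flask(__name__)

# In-memory store for illustration; use a database in practice so results
# survive restarts and duplicate webhook deliveries can be de-duplicated.
page_results: dict[tuple[str, str], dict] = {}
job_summaries: dict[str, dict] = {}

@app.route("/crawler-callback", methods=["POST"])
def crawler_callback():
    payload = request.get_json(force=True)
    job_id = payload.get("jobId")

    if "jobState" in payload:
        # Final job summary: completed / cancelled / failed URLs and total cost.
        job_summaries[job_id] = payload
    else:
        # Individual page result: store it as it arrives, since the final
        # summary might be delayed. Keying by (jobId, url) makes retried
        # webhook deliveries harmless.
        page_results[(job_id, payload.get("url"))] = payload

    # Return quickly with a 2xx so the delivery isn't treated as failed.
    return "", 204
```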