How to use

Learn Async scraping with ScraperAPI in Python: job submission, status polling, and result retrieval. Includes POST requests, metadata, and response handling.

A simple example showing how to submit a job for scraping and receive a status endpoint URL through which you can poll for the status (and later the result) of your scraping job:

import requests

r = requests.post(url = 'https://async.scraperapi.com/jobs', json={ 'apiKey': 'xxxxxx', 'url': 'https://example.com' })
print(r.text)

You can also send POST requests to the Async scraper by using the parameter “method”: “POST”. Here is an example on how to make a POST request to the Async scraper:

import requests

r = requests.post(url = 'https://async.scraperapi.com/jobs', json={ 'apiKey': 'xxxxxx', 'url': 'https://example.com', 'method': 'POST', 'body': 'var1=value1&var2=value2' })
print(r.text)

Response:

{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}

Note the statusUrl field in the response. That is a personal URL to retrieve the status and results of your scraping job. Invoking that endpoint provides you with the status first:

import requests

r = requests.get(url = 'https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953')
print(r.text)

Response:

{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}

You can include a meta object in your request to store custom data (your own request ID for example), which will be returned in the response as well.

Once your job is finished, the response will change and will contain the results of your scraping job:

{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"finished",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com",
"response":{
"headers":{
"date":"Thu, 14 Apr 2022 11:10:44 GMT",
"content-type":"text/html; charset=utf-8",
"content-length":"1256",
"connection":"close",
"x-powered-by":"Express",
"access-control-allow-origin":"undefined","access-control-allow-headers":"Origin, X-Requested-With, Content-Type, Accept",
"access-control-allow-methods":"HEAD,GET,POST,DELETE,OPTIONS,PUT",
"access-control-allow-credentials":"true",
"x-robots-tag":"none",
"sa-final-url":"https://example.com/",
"sa-statuscode":"200",
"etag":"W/\"4e8-Sjzo7hHgkd15I/TYxuW15B7HwEc\"",
"vary":"Accept-Encoding"
},
"body":"<!doctype html>\n<html>\n<head>\n	<title>Example Domain</title>\n\n	<meta charset=\"utf-8\" />\n	<meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n	<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n	<style type=\"text/css\">\n	body {\n		background-color: #f0f0f2;\n		margin: 0;\n		padding: 0;\n		font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n		\n	}\n	div {\n		width: 600px;\n		margin: 5em auto;\n		padding: 2em;\n		background-color: #fdfdff;\n		border-radius: 0.5em;\n		box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n	}\n	a:link, a:visited {\n		color: #38488f;\n		text-decoration: none;\n	}\n	@media (max-width: 700px) {\n		div {\n			margin: 0 auto;\n			width: auto;\n		}\n	}\n	</style>	\n</head>\n\n<body>\n<div>\n	<h1>Example Domain</h1>\n	<p>This domain is for use in illustrative examples in documents. You may use this\n	domain in literature without prior coordination or asking for permission.</p>\n	<p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
"statusCode":200
}
}

Please note that the response for an Async job is stored for up to 72 hours (24hrs guaranteed) or until you retrieve the results, whichever comes first. If you do not get the response in due time, it will be deleted from our side and you will have to send another request for the same job.

If callbacks are used and the results are successfully delivered, we automatically delete the results.

Last updated 9 months ago

Was this helpful?