How to use

Submit an async job

A simple example showing how to submit a job for scraping and receive a status endpoint URL through which you can poll for the status (and later the result) of your scraping job:

require 'net/http'
require 'json'

uri = URI('https://async.scraperapi.com/jobs')

website_content = Net::HTTP.post(uri, { "apiKey" => "xxxxxx", "url" => "https://example.com"}.to_json, "Content-Type" => "application/json")
print(website_content.body)

You can also send POST requests to the Async scraper now. To make this possible, you will need to use the parameter “method”: “POST”. Here is an example on how to make a POST request to the Async scraper:

require 'net/http'
require 'json'

uri = URI('https://async.scraperapi.com/jobs')
req = Net::HTTP::Post.new(uri)
req.content_type = 'application/json'

# The object won't be serialized exactly like this
# req.body = '{"apiKey": "xxxxxx", "url": "https://example.com", "method": "POST", "body": "var1=value1&var2=value2"}'
req.body = {
  'apiKey' => 'xxxxxx',
  'url' => 'https://example.com',
  'method' => 'POST',
  'body' => 'var1=value1&var2=value2'
}.to_json

req_options = {
  use_ssl: uri.scheme == 'https'
}
res = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
  http.request(req)
end

Response:

{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}

Note the statusUrl field in the response. That is a personal URL to retrieve the status and results of your scraping job. Invoking that endpoint provides you with the status first:

require 'net/http'
require 'json'

uri = URI('https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953')

website_content = Net::HTTP.get(uri)
print(website_content)

Response:

{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}

Once your job is finished, the response will change and will contain the results of your scraping job:

{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"finished",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com",
"response":{
"headers":{
"date":"Thu, 14 Apr 2022 11:10:44 GMT",
"content-type":"text/html; charset=utf-8",
"content-length":"1256",
"connection":"close",
"x-powered-by":"Express",
"access-control-allow-origin":"undefined","access-control-allow-headers":"Origin, X-Requested-With, Content-Type, Accept",
"access-control-allow-methods":"HEAD,GET,POST,DELETE,OPTIONS,PUT",
"access-control-allow-credentials":"true",
"x-robots-tag":"none",
"sa-final-url":"https://example.com/",
"sa-statuscode":"200",
"etag":"W/\"4e8-Sjzo7hHgkd15I/TYxuW15B7HwEc\"",
"vary":"Accept-Encoding"
},
"body":"<!doctype html>\n<html>\n<head>\n	<title>Example Domain</title>\n\n	<meta charset=\"utf-8\" />\n	<meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n	<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n	<style type=\"text/css\">\n	body {\n		background-color: #f0f0f2;\n		margin: 0;\n		padding: 0;\n		font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n		\n	}\n	div {\n		width: 600px;\n		margin: 5em auto;\n		padding: 2em;\n		background-color: #fdfdff;\n		border-radius: 0.5em;\n		box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n	}\n	a:link, a:visited {\n		color: #38488f;\n		text-decoration: none;\n	}\n	@media (max-width: 700px) {\n		div {\n			margin: 0 auto;\n			width: auto;\n		}\n	}\n	</style>	\n</head>\n\n<body>\n<div>\n	<h1>Example Domain</h1>\n	<p>This domain is for use in illustrative examples in documents. You may use this\n	domain in literature without prior coordination or asking for permission.</p>\n	<p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
"statusCode":200
}
}

Please note that the response of an Async job is stored for up to 4 hours or until you retrieve the results, whichever comes first. If you do not get the response in due time, it will be deleted from our side and you will have to send another request for the same job.

Last updated