How to use

Submit an async job

A simple example showing how to submit a job for scraping and receive a status endpoint URL through which you can poll for the status (and later the result) of your scraping job:

try {
URL asyncApiUrl = new URL("https://async.scraperapi.com/jobs");
HttpURLConnection connection = (HttpURLConnection) asyncApiUrl.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type", "application/json");
connection.setDoOutput(true);
try (OutputStream outputStream = connection.getOutputStream()) {
outputStream.write("{\"apiKey\": \"xxxxxx\", \"url\": \"https://example.com\"}".getBytes(StandardCharsets.UTF_8));
}

int responseCode = connection.getResponseCode();

if (responseCode == HttpURLConnection.HTTP_OK) {
try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
String readLine = null;
StringBuffer response = new StringBuffer();

while ((readLine = in.readLine()) != null) {
	response.append(readLine);
}

System.out.println(response);
}
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}

Response:

{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}

Note the statusUrl field in the response. That is a personal URL to retrieve the status and results of your scraping job. Invoking that endpoint provides you with the status first:

try {
URL asyncApiUrl = new URL("https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953");
HttpURLConnection connection = (HttpURLConnection) asyncApiUrl.openConnection();
connection.setRequestMethod("GET");

int responseCode = connection.getResponseCode();

if (responseCode == HttpURLConnection.HTTP_OK) {
try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
String readLine = null;
StringBuffer response = new StringBuffer();

while ((readLine = in.readLine()) != null) {
	response.append(readLine);
}

System.out.println(response);
}
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}

Response:

{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"running",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com"
}

Once your job is finished, the response will change and will contain the results of your scraping job:

{
"id":"0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"status":"finished",
"statusUrl":"https://async.scraperapi.com/jobs/0962a8e0-5f1a-4e14-bf8c-5efcc18f0953",
"url":"https://example.com",
"response":{
"headers":{
"date":"Thu, 14 Apr 2022 11:10:44 GMT",
"content-type":"text/html; charset=utf-8",
"content-length":"1256",
"connection":"close",
"x-powered-by":"Express",
"access-control-allow-origin":"undefined","access-control-allow-headers":"Origin, X-Requested-With, Content-Type, Accept",
"access-control-allow-methods":"HEAD,GET,POST,DELETE,OPTIONS,PUT",
"access-control-allow-credentials":"true",
"x-robots-tag":"none",
"sa-final-url":"https://example.com/",
"sa-statuscode":"200",
"etag":"W/\"4e8-Sjzo7hHgkd15I/TYxuW15B7HwEc\"",
"vary":"Accept-Encoding"
},
"body":"<!doctype html>\n<html>\n<head>\n	<title>Example Domain</title>\n\n	<meta charset=\"utf-8\" />\n	<meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n	<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n	<style type=\"text/css\">\n	body {\n		background-color: #f0f0f2;\n		margin: 0;\n		padding: 0;\n		font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n		\n	}\n	div {\n		width: 600px;\n		margin: 5em auto;\n		padding: 2em;\n		background-color: #fdfdff;\n		border-radius: 0.5em;\n		box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n	}\n	a:link, a:visited {\n		color: #38488f;\n		text-decoration: none;\n	}\n	@media (max-width: 700px) {\n		div {\n			margin: 0 auto;\n			width: auto;\n		}\n	}\n	</style>	\n</head>\n\n<body>\n<div>\n	<h1>Example Domain</h1>\n	<p>This domain is for use in illustrative examples in documents. You may use this\n	domain in literature without prior coordination or asking for permission.</p>\n	<p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
"statusCode":200
}
}

Please note that the response of an Async job is stored for up to 4 hours or until you retrieve the results, whichever comes first. If you do not get the response in due time, it will be deleted from our side and you will have to send another request for the same job.

Last updated