Learn how to make requests using ScraperAPI. Sign up for a free trial to get 5,000 free API credits.
Using ScraperAPI is easy. Just send the URL you would like to scrape to the API along with your API key, and the API will return the HTML response for that URL.
ScraperAPI uses API keys to authenticate requests. To use the API you need to sign up for an account and include your unique API key in every request.
If you haven’t signed up for an account yet, then sign up for a free trial here with 5,000 free API credits!
You can use the API to scrape web pages, API endpoints, images, documents, PDFs, or other files just as you would any other URL. Note: there is a 2MB limit per request.
There are five ways in which you can send GET requests to ScraperAPI:
Via our Async Scraper service http://async.scraperapi.com
Via our API endpoint http://api.scraperapi.com?
Via one of our SDKs (only available for some programming languages)
Via our proxy port http://scraperapi:APIKEY@proxy-server.scraperapi.com:8001
Via our Structured Data service https://api.scraperapi.com/structured/
Choose whichever option best suits your scraping requirements.
Important note: regardless of how you invoke the service, we highly recommend you set a 70-second timeout in your application to get the best possible success rates, especially for hard-to-scrape domains.
ScraperAPI exposes a single API endpoint for you to send GET requests to. Simply send a GET request to http://api.scraperapi.com with two query string parameters and the API will return the HTML response for that URL:
api_key, which contains your API key, and
url, which contains the URL you would like to scrape
You should format your requests to the API endpoint as follows:
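For instance, a minimal request in Python might look like this sketch (APIKEY is a placeholder for your own key, and https://httpbin.org/ip is just a sample target):

```python
import requests

payload = {
    "api_key": "APIKEY",              # your ScraperAPI key
    "url": "https://httpbin.org/ip",  # the page you want to scrape
}

response = requests.get("http://api.scraperapi.com/", params=payload)
print(response.text)  # the HTML returned by the target URL
```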
To enable other API functionality when sending a request to the API endpoint, simply add the appropriate query parameters to the end of the ScraperAPI URL. For example, if you want to enable JavaScript rendering with a request, then add render=true to the request. To use two or more parameters, simply separate them with the “&” sign.
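As a sketch, enabling JavaScript rendering and adding a second parameter just means appending more query parameters to the same request (country_code is shown only as an illustration of combining parameters):

```python
import requests

payload = {
    "api_key": "APIKEY",
    "url": "https://httpbin.org/ip",
    "render": "true",        # enable JavaScript rendering
    "country_code": "us",    # a second parameter, joined with "&" in the URL
}

response = requests.get("http://api.scraperapi.com/", params=payload)
print(response.text)
```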
To ensure a higher level of successful requests when using our scraper, we’ve built a new product, Async Scraper. Rather than making requests to our endpoint and waiting for the response, this endpoint lets you submit a scraping job, and you can later collect the data from it using our status endpoint.
Scraping websites can be a difficult process; it takes numerous steps and significant effort to get through some sites’ protection, which sometimes proves difficult within the timeout constraints of synchronous APIs. The Async Scraper will keep working on your requested URLs until we have achieved a 100% success rate (when applicable), then return the data to you.
Async Scraping is the recommended way to scrape pages when success rate on difficult sites is more important to you than response time (e.g. you need a set of data periodically).
Submit an async job
A simple example showing how to submit a job for scraping and receive a status endpoint URL through which you can poll for the status (and later the result) of your scraping job:
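A minimal sketch in Python, assuming the job-submission endpoint lives at https://async.scraperapi.com/jobs and accepts the API key in an apiKey field (check the exact field names against your dashboard docs):

```python
import requests

# Sketch: submit an async scraping job (endpoint path and field names assumed).
job = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": "APIKEY",             # your ScraperAPI key
        "url": "https://example.com/",  # the page you want scraped
    },
).json()

print(job["statusUrl"])  # poll this URL for the job status and, later, the result
```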
You can also send POST requests to the Async Scraper by using the parameter “method”: “POST”. Here is an example of how to make a POST request to the Async Scraper:
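A sketch of the same submission, this time asking the Async Scraper to issue a POST request to the target URL (same assumptions about the endpoint and field names as above):

```python
import requests

# Sketch: "method": "POST" makes the scraper send a POST request to the target.
job = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": "APIKEY",
        "url": "https://example.com/api/endpoint",  # hypothetical POST target
        "method": "POST",
    },
).json()
```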
Response:
Note the statusUrl field in the response. That is a personal URL to retrieve the status and results of your scraping job. Invoking that endpoint provides you with the status first:
Response:
You can include a meta object in your request to store custom data (your own request ID, for example), which will be returned in the response as well.
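As a sketch, the meta object simply rides along with the rest of the job payload (field names other than meta are assumptions carried over from the earlier examples):

```python
import requests

# Sketch: attach custom data via "meta"; it is echoed back in the job's responses.
job = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": "APIKEY",
        "url": "https://example.com/",
        "meta": {"request_id": "my-internal-id-123"},  # hypothetical custom data
    },
).json()
```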
Once your job is finished, the response will change and will contain the results of your scraping job:
Please note that the response for an Async job is stored for up to 72 hours (24 hours guaranteed) or until you retrieve the results, whichever comes first. If you do not retrieve the results in time, they will be deleted from our side and you will have to send another request for the same job.
If callbacks are used and the results are successfully delivered, we automatically delete the results.
Using a status URL is a great way to test the API or get started quickly, but some customer environments may require more robust solutions, so we implemented callbacks. Currently only webhook callbacks are supported, but we are planning to introduce more over time (e.g. direct database callbacks, AWS S3, etc.).
An example of using a webhook callback:
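A sketch of a submission that registers a webhook callback. The callback object with type and url fields is an assumption based on the description above, and https://yourcompany.com/scraperapi is a placeholder:

```python
import requests

# Sketch: register a webhook so the finished result is pushed to your endpoint
# instead of (or in addition to) being polled via the status URL.
job = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": "APIKEY",
        "url": "https://example.com/",
        "callback": {
            "type": "webhook",                            # assumed callback type field
            "url": "https://yourcompany.com/scraperapi",  # your publicly reachable endpoint
        },
    },
).json()
```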
Using a callback, you don’t need to use the status URL (although you still can) to fetch the status and results of the job. Once the job is finished, the provided webhook URL will be invoked by our system with the same content the status URL provides.
Just replace the https://yourcompany.com/scraperapi URL with your preferred endpoint. You can even add basic auth to the URL in the following format: https://user:pass@yourcompany.com/scraperapi
By default, we'll call the webhook URL you provide for successful requests. If you'd like to receive data on failed attempts too, you will have to include the expectsUnsuccessReport: true parameter in your request structure.
An example of using callbacks that report on the failed attempts as well:
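A sketch of the same callback setup with failure reporting enabled (same assumptions as the previous example):

```python
import requests

# Sketch: expectsUnsuccessReport=true asks the service to call the webhook even
# when all scraping attempts for the job have failed.
job = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": "APIKEY",
        "url": "https://example.com/",
        "callback": {
            "type": "webhook",
            "url": "https://yourcompany.com/scraperapi",
        },
        "expectsUnsuccessReport": True,
    },
).json()
```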
Response:
Note: The system will try to invoke the webhook URL 3 times, then it cancels the job. So please make sure that the webhook URL is reachable from the public internet and is capable of handling the traffic that you need.
Hint: Webhook.site is a free online service to test webhooks without requiring you to build a complex infrastructure.
You can use the usual API parameters the same way you’d use them with our synchronous API. These parameters should go into an apiParams object inside the POST data, e.g.:
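A sketch of a job that forwards synchronous-API parameters through apiParams (render and country_code are shown purely as illustrations of parameters you might pass):

```python
import requests

# Sketch: the usual API parameters go into an "apiParams" object in the POST body.
job = requests.post(
    "https://async.scraperapi.com/jobs",
    json={
        "apiKey": "APIKEY",
        "url": "https://example.com/",
        "apiParams": {
            "render": True,         # JavaScript rendering
            "country_code": "us",   # geotargeting, shown as an illustration
        },
    },
).json()
```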
We have created a separate endpoint that accepts an array of URLs instead of just one to initiate scraping of multiple URLs at the same time: https://async.scraperapi.com/batchjobs. The API is almost the same as the single endpoint, but we expect an array of strings in the urls field instead of a string in url.
As a response you’ll also get an array of the same response that you get using our single job endpoint:
We recommend sending a maximum of 50,000 URLs in one batch job.
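A sketch of a batch submission, assuming the same apiKey field as in the single-job examples:

```python
import requests

# Sketch: submit several URLs at once; the response is an array of job objects.
jobs = requests.post(
    "https://async.scraperapi.com/batchjobs",
    json={
        "apiKey": "APIKEY",
        "urls": [
            "https://example.com/page-1",
            "https://example.com/page-2",
        ],
    },
).json()

for job in jobs:
    print(job["statusUrl"])  # one status URL per submitted URL
```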
The responses returned by the Async API for binary requests require you to decode the data, as they are encoded using Base64 encoding. This allows the binary data to be sent as a text string, which can then be decoded back into its original form when you want to use it.
Example request and decoding of the response:
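A sketch of requesting a binary file and decoding the result; the exact location of the Base64-encoded body in the finished-job payload is an assumption here, so adjust the path to match the response you actually receive:

```python
import base64
import requests

# Sketch: scrape a PDF through the Async Scraper, then decode the Base64 body.
job = requests.post(
    "https://async.scraperapi.com/jobs",
    json={"apiKey": "APIKEY", "url": "https://example.com/report.pdf"},
).json()

# ... poll job["statusUrl"] until the job reports it has finished ...
result = requests.get(job["statusUrl"]).json()

# Assumption: the encoded body lives at result["response"]["body"].
pdf_bytes = base64.b64decode(result["response"]["body"])
with open("report.pdf", "wb") as f:
    f.write(pdf_bytes)
```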
To make it even easier to get structured content, we created custom endpoints within our API that provide a shorthand method of retrieving content from supported domains. This method is ideal for users that need to receive structured data in JSON or CSV.
Amazon Endpoints:
Ebay Endpoints:
Google Endpoints:
Redfin Endpoints:
Walmart Endpoints:
This endpoint will retrieve product data from an Amazon product page and transform it into usable JSON. It also provides links to all variants of the product (if any).
API_KEY (required)
User's normal API Key
ASIN (required)
Amazon Standard Identification Number. Please note that ASINs are market specific (TLD). You can usually find the ASIN in the URL of an Amazon product (example: B07FTKQ97Q)
TLD
Amazon market to be scraped.
Valid values include:
com (amazon.com)
co.uk (amazon.co.uk)
ca (amazon.ca)
de (amazon.de)
es (amazon.es)
fr (amazon.fr)
it (amazon.it)
co.jp (amazon.co.jp)
in (amazon.in)
cn (amazon.cn)
com.sg (amazon.com.sg)
com.mx (amazon.com.mx)
ae (amazon.ae)
com.br (amazon.com.br)
nl (amazon.nl)
com.au (amazon.com.au)
com.tr (amazon.com.tr)
sa (amazon.sa)
se (amazon.se)
pl (amazon.pl)
COUNTRY
Valid values are two-letter country codes for which we offer Geo Targeting (e.g. “au”, “es”, “it”, etc.).
Where an Amazon domain needs to be scraped from another country (e.g. scraping amazon.com from Canada to get Canadian shipping information), both TLD and COUNTRY parameters must be specified.
OUTPUT_FORMAT
For structured data methods we offer CSV and JSON output. JSON is the default if the parameter is not added. Options:
csv
json
ZIP Code Targeting
To find out more about ZIP Code targeting, please follow this link
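Putting the parameters above together, a request to the Amazon product endpoint might look like the sketch below. The amazon/product path and the lower-case query parameter names are assumptions based on the https://api.scraperapi.com/structured/ pattern described earlier; B07FTKQ97Q is the sample ASIN from the parameter table.

```python
import requests

payload = {
    "api_key": "APIKEY",
    "asin": "B07FTKQ97Q",     # sample ASIN
    "tld": "com",             # amazon.com
    "country": "us",          # optional geotargeting
    "output_format": "json",  # or "csv"
}

# Assumption: the Amazon product endpoint is .../structured/amazon/product.
response = requests.get(
    "https://api.scraperapi.com/structured/amazon/product", params=payload
)
print(response.json())
```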