How to use 📚
To help you get started with the new DataPipeline API endpoints, review the following examples. They will guide you through creating, managing, and integrating your projects and jobs programmatically.
- Choose the endpoint for your desired action
- Add the necessary parameters
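Every call follows the same basic shape. The template below is illustrative; the endpoint and payload are placeholders to fill in with the values from the examples that follow:
curl -X POST --data '{ ... }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/<endpoint>?api_key=xxxxxx'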
Create a New Project
Example:
curl -X POST --data '{ "projectInput": {"type": "list", "list":
["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"] } }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'
Output:
{
"id": 522,
"name": "Project created at 2024-05-10T16:04:28.263Z",
"schedulingEnabled": true,
"scrapingInterval": "weekly",
"createdAt": "2024-05-10T16:04:28.306Z",
"projectType": "urls",
"projectInput": {
"type": "list",
"list": [
"https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
]
},
"projectOutput": {
"type": "save"
},
"notificationConfig": {
"notifyOnSuccess": "never",
"notifyOnFailure": "with_every_run"
}
}
Except for the id and createdAt fields, everything can be specified at project creation.
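For instance, a creation request that sets several of these fields up front might look like this (an illustrative sketch; the name, interval, and notification values are arbitrary choices, reusing fields that appear in the examples below):
curl -X POST --data '{ "name": "My Amazon project",
"schedulingEnabled": true,
"scrapingInterval": "daily",
"projectInput": {"type": "list", "list":
["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"] },
"notificationConfig": {"notifyOnSuccess": "never", "notifyOnFailure": "with_every_run"} }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'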
Specifying Parameters
Let's see what a request looks like when sending parameters along with it.
Example:
curl -X POST --data '{ "projectInput": {"type": "list", "list":
["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"] },
"apiParams": {"premium": true} }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'
Output:
{
"id": 522,
"name": "Project created at 2024-05-10T16:04:28.263Z",
"schedulingEnabled": true,
"scrapingInterval": "weekly",
"createdAt": "2024-05-10T16:04:28.306Z",
"scheduledAt": "2024-05-12T19:00:00.211Z",
"projectType": "urls",
"apiParams": {
"premium": "true"
},
"projectInput": {
"type": "list",
"list": [
"https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
]
},
"projectOutput": {
"type": "save"
},
"notificationConfig": {
"notifyOnSuccess": "never",
"notifyOnFailure": "with_every_run"
}
}
Updating Parameters Of An Existing Project
Example:
curl -X PATCH --data '{ "apiParams": {"premium": true} }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>?api_key=xxxxxx'
Output:
{
"id": 522,
"name": "Project created at 2024-05-10T16:04:28.263Z",
"schedulingEnabled": true,
"scrapingInterval": "weekly",
"createdAt": "2024-05-10T16:04:28.306Z",
"scheduledAt": "2024-05-12T11:12:17.211Z",
"projectType": "urls",
"apiParams": {
"premium": "true"
},
"projectInput": {
"type": "list",
"list": [
"https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
]
},
"projectOutput": {
"type": "save"
},
"notificationConfig": {
"notifyOnSuccess": "never",
"notifyOnFailure": "with_every_run"
}
}
Specifying Project Type
Similarly to the DataPipeline GUI, you can specify a project type (e.g. Google Search) here as well. If not specified, projectType defaults to urls (simple URL scraping).
Example:
curl -X POST --data '{ "name": "Google search project",
"projectInput": {"type": "list", "list": ["iPhone", "Android"] },
"projectType": "google_search"}' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'
Output:
{
"id": 525,
"name": "Google search project",
"schedulingEnabled": true,
"schedulingInterval": "weekly",
"createdAt": "2024-05-14T14:34:02.784Z",
"projectType": "google_search",
"projectInput": {
"type": "list",
"list": [
"iPhone",
"Android"
]
},
"notificationConfig": {
"notifyOnSuccess": "never",
"notifyOnFailure": "with_every_run"
}
}
Specifying Project Input
Choose the source for providing lists of URLs, search terms, and other relevant data for your project:
- A simple list (this is the default)
- Webhook input: with every run, the system downloads the contents from the webhook, letting you update the URL list, ASINs, etc. without interacting with ScraperAPI.
Example:
curl -v -X POST --data '{ "projectInput": {"type": "webhook_input", "url": "<https://the.url.where.your.list.is>" } }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'
Output:
{
"id": 524,
"name": "Project created at 2024-05-10T16:04:28.263Z",
"schedulingEnabled": true,
"scrapingInterval": "weekly",
"createdAt": "2024-06-26T12:42:31.654Z",
"scheduledAt": "2024-06-26T12:42:31.652Z",
"projectType": "urls",
"projectInput": {
"url": "https://webhook.site/d10de6ed-68fb-4874-9c0f-3c32bab2e1ef",
"type": "webhook_input"
},
"notificationConfig": {
"notifyOnSuccess": "never",
"notifyOnFailure": "with_every_run"
}
}
Changing Output Format For SDE Project Types
The default output format is JSON, but for SDE project types you can opt for CSV if it better suits your needs.
Example:
curl -v -X PATCH --data '{ "apiParams": {"outputFormat": "csv"} }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>?api_key=xxxxxx'
Output:
{
"id": 525,
"name": "Google search project",
"schedulingEnabled": true,
"schedulingInterval": "weekly",
"createdAt": "2024-06-26T10:26:40.840Z",
"projectType": "google_search",
"apiParams": {
"outputFormat": "csv"
},
"projectInput": {
"type": "list",
"list": [
"iPhone",
"Android"
]
},
"notificationConfig": {
"notifyOnSuccess": "never",
"notifyOnFailure": "with_every_run"
}
}
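Since apiParams can also be supplied in the project-creation request (see Specifying Parameters above), the output format can presumably be chosen at creation time as well. A sketch, not verified against the API:
curl -X POST --data '{ "name": "Google search project",
"projectInput": {"type": "list", "list": ["iPhone", "Android"] },
"projectType": "google_search",
"apiParams": {"outputFormat": "csv"} }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'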
Changing Scheduling Interval
Example:
curl -X PATCH --data '{ "scrapingInterval": "daily" }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>?api_key=xxxxxx'
Output:
{
"id": 522,
"name": "Project created at 2024-05-10T16:04:28.263Z",
"schedulingEnabled": true,
"scrapingInterval": "daily",
"createdAt": "2024-05-10T16:04:28.306Z",
"scheduledAt": "2024-05-10T19:00:00.211Z",
"projectType": "urls",
"projectInput": {
"type": "list",
"list": [
"https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
]
},
"projectOutput": {
"type": "save"
},
"notificationConfig": {
"notifyOnSuccess": "never",
"notifyOnFailure": "with_every_run"
}
}
Creating a Project With a Custom Cron Schedule
Cron is a time-based job scheduler. It allows users to schedule jobs to run periodically at fixed times, dates, or intervals. A cron expression consists of five fields that define when a job should execute:
Minute (0-59)
Hour (0-23)
Day of the month (1-31)
Month (1-12 or JAN-DEC)
Day of the week (0-7 or SUN-SAT, where 0 and 7 represent Sunday)
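Assuming DataPipeline accepts any standard five-field expression (as the example below suggests), here are a few common schedules for reference:
0 */6 * * *      # every six hours, on the hour
30 2 * * *       # every day at 02:30
0 9 * * MON-FRI  # 09:00 on weekdays
0 0 1 * *        # midnight on the first day of each month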
For example, a cron expression "0 0 * * 1" means the job will run at midnight (00:00) every Monday (1 represents Monday in cron):
curl -X POST --data '{ "name": "Cron project",
"schedulingEnabled": true,
"projectInput": {"type": "list", "list":
["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"] },
"scrapingInterval": "0 0 * * 1" }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'
Output:
{
"id": 522,
"name": "Project created at 2024-05-10T16:04:28.306Z",
"schedulingEnabled": true,
"scrapingInterval": "0 0 * * 1",
"createdAt": "2024-05-10T16:04:28.306Z",
"projectType": "urls",
"projectInput": {
"type": "list",
"list": [
"https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
]
},
"notificationConfig": {
"notifyOnSuccess": "never",
"notifyOnFailure": "with_every_run"
}
}
Rescheduling The Project's Next Run Date
This works even with schedulingEnabled set to false, ensuring the project runs once. To set a specific next run date:
curl -X PATCH --data '{ "scheduledAt": "2024-05-07T11:12:17.211Z" }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>?api_key=xxxxxx'
To trigger a run immediately, pass "now":
curl -v -X PATCH --data '{ "scheduledAt": "now" }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>?api_key=xxxxxx'
Configuring Webhook Output
Set the webhook output
Example:
curl -X POST --data '{ "projectInput": {"type": "list", "list":
["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"] },
"webhookOutput": {"url": "https://webhook.site/b58e4798-0bf8-44c8-ac2b-f416fffddd39",
"webhookEncoding": "multipart_form_data_encoding"} }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'
Output:
{
"id": 522,
"name": "Project created at 2024-05-10T16:04:28.263Z",
"schedulingEnabled": true,
"schedulingInterval": "weekly",
"createdAt": "2024-05-10T16:04:28.306Z",
"projectType": "urls",
"projectInput": {
"type": "list",
"list": [
"https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
]
},
"projectOutput": {
"type": "webhook"
},
"notificationConfig": {
"notifyOnSuccess": "never",
"notifyOnFailure": "with_every_run"
}
}
Unset the webhook output
curl -X PATCH --data '{"webhookOutput": null}' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>?api_key=xxxxxx'
Configuring Notifications
Define when you would like to receive email notifications for completed jobs. This can be configured at project creation or edited later on. Both notifyOnSuccess and notifyOnFailure have to be specified; the values used in this guide are never, weekly, and with_every_run.
Example (on project creation):
curl -X POST --data '{ "name": "Google search project",
"projectInput": {"type": "list", "list": ["iPhone", "Android"] },
"projectType": "google_search",
"notificationConfig": {"notifyOnSuccess": "weekly", "notifyOnFailure": "weekly"}}' \
-H 'Content-Type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'
Output:
{
"id": 524,
"name": "Google search project",
"schedulingEnabled": true,
"scrapingInterval": "weekly",
"createdAt": "2024-05-10T21:04:28.306Z",
"scheduledAt": "2024-05-10T21:04:28.306Z",
"projectType": "google_search",
"projectInput": {
"type": "list",
"list": [
"iPhone",
"Android"
]
},
"notificationConfig": {
"notifyOnSuccess": "weekly",
"notifyOnFailure": "weekly"
}
}
Example (updating an existing project):
curl -X PATCH --data '{ "name": "Google search project",
"projectInput": {"type": "list", "list": ["iPhone", "Android"] },
"projectType": "google_search",
"notificationConfig": {"notifyOnSuccess": "weekly", "notifyOnFailure": "weekly"}}' \
-H 'Content-Type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/524?api_key=xxxxxx'
Output:
{
"id": 524,
"name": "Google search project",
"schedulingEnabled": true,
"scrapingInterval": "weekly",
"createdAt": "2024-05-10T21:04:28.306Z",
"scheduledAt": "2024-05-10T21:04:28.306Z",
"projectType": "google_search",
"projectInput": {
"type": "list",
"list": [
"iPhone",
"Android"
]
},
"notificationConfig": {
"notifyOnSuccess": "weekly",
"notifyOnFailure": "weekly"
}
}
Check The List of Jobs For a Project
You can list a project's jobs with this endpoint:
curl 'https://datapipeline.scraperapi.com/api/projects/<projectId>/jobs?api_key=xxxxxx'
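The full response isn't reproduced here, but judging from the job object shown in the cancellation example below, it is a list of job entries along these lines (illustrative only; the status value is an assumption):
[
{
"id": "6f933838-ec57-43cd-8e49-d8652716ddf8",
"projectId": 522,
"status": "running",
...
}
]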
Cancel a Job Within a Project
Canceling a job within a project lets you halt specific jobs that are no longer needed.
Example:
curl -X DELETE 'https://datapipeline.scraperapi.com/api/projects/522/jobs/6f933838-ec57-43cd-8e49-d8652716ddf8?api_key=xxxxx'
Once you delete the job, the system will respond with ok if the command was successful. If we now look up that same project, we'll see the job status is cancelled:
{
"id": "6f933838-ec57-43cd-8e49-d8652716ddf8",
"projectId": 522,
"status": "cancelled",
"createdAt": "2024-05-10T16:04:28.306Z",
"finishedAt": "2024-05-10T20:04:28.306Z",
"successfulTasks": 0,
"failedTasks": 0,
"inProgressTasks": 0,
"inProgressTasksWithAttempts": 0,
"cancelledTasks": 0,
"credits": 0,
"resultUrl": "https://datapipeline.scraperapi.com/jobs/6f933838-ec57-43cd-8e49-d8652716ddf8/result",
"errorReportUrl": "https://datapipeline.scraperapi.com/jobs/6f933838-ec57-43cd-8e49-d8652716ddf8/errorreport"
}
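In an automated workflow you may want to poll until a job reaches a terminal state before fetching resultUrl. A minimal shell sketch, assuming jq is installed, that the jobs endpoint returns a JSON array of objects like the one above, and that cancelled/finished are the terminal status values (the exact names are assumptions):
JOBS_URL='https://datapipeline.scraperapi.com/api/projects/522/jobs?api_key=xxxxxx'
JOB_ID='6f933838-ec57-43cd-8e49-d8652716ddf8'
while :; do
# pick this job out of the project's job list and read its status
status=$(curl -s "$JOBS_URL" | jq -r --arg id "$JOB_ID" '.[] | select(.id == $id) | .status')
echo "status: $status"
case "$status" in cancelled|finished) break ;; esac
sleep 60  # poll once a minute
done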
Archiving a Project
To archive an existing project, just send the following request:
Example:
curl -X DELETE 'https://datapipeline.scraperapi.com/api/projects/525?api_key=xxxxxxxx'
When you try to look up that project now, the system will return Project not found.
These changes are also visible in the Dashboard under the DataPipeline Projects list.
Before: [screenshot of the DataPipeline Projects list before archiving]
After: [screenshot of the DataPipeline Projects list after archiving]
Error Messages
// Spelling mistakes in the request body:
{
"name": "BAD_REQUEST",
"message": "Invalid project configuration: /projectType: must be equal to one of the allowed values",
"status": 400,
"errors": []
}
// Project does not exist:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Error</title>
</head>
<body>
<pre>NOT_FOUND: Project 624 of user 231516 is not found<br> at new Exception (/app/node_modules/@tsed/exceptions/lib/cjs/core/Exception.js:55:15)<br> at new ClientException (/app/node_modules/@tsed/exceptions/lib/cjs/core/ClientException.js:8:9)<br> at new NotFound (/app/node_modules/@tsed/exceptions/lib/cjs/clientErrors/NotFound.js:8:9)<br> at HostedScraperExceptionFilter.catch (/app/build/filters/HostedScraperException.filter.js:16:19)<br> at PlatformExceptions.catch (/app/node_modules/@tsed/platform-exceptions/lib/cjs/services/PlatformExceptions.js:35:48)<br> at /app/node_modules/@tsed/platform-express/lib/cjs/components/PlatformExpress.js:74:56<br> at Layer.handle_error (/app/node_modules/express/lib/router/layer.js:71:5)<br> at trim_prefix (/app/node_modules/express/lib/router/index.js:326:13)<br> at /app/node_modules/express/lib/router/index.js:286:9<br> at Function.process_params (/app/node_modules/express/lib/router/index.js:346:12)</pre>
</body>
</html>
// Job not found:
{
"name": "NOT_FOUND",
"message": "Resource \"/api/projects/524/jobs/a4ed0080-1083-4889-9ad0-adee0503e542\" not found",
"status": 404,
"errors": []
}