How to use

The following examples show how to create, manage, and integrate DataPipeline projects and jobs programmatically through the API endpoints.

For each action, choose the matching endpoint and add the necessary parameters.

Create a New Project

Scheduling is enabled by default with a weekly interval. To disable it, add "schedulingEnabled": false to the request body.

Example:

curl -X POST --data '{ "projectInput": {"type": "list", "list":
["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"] } }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'

Output:

{
    "id": 522,
    "name": "Project created at 2024-05-10T16:04:28.263Z",
    "schedulingEnabled": true,
    "scrapingInterval": "weekly",
    "createdAt": "2024-05-10T16:04:28.306Z",
    "projectType": "urls",
    "projectInput": {
        "type": "list",
        "list": [
            "https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
        ]
    },
    "projectOutput": {
        "type": "save"
    },
    "notificationConfig": {
        "notifyOnSuccess": "never",
        "notifyOnFailure": "with_every_run"
    }
}

Except for the id and createdAt fields, every field in this output can be specified when you create the project.
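You can also build this request in code. The following Python sketch uses only the standard library and reproduces the curl example above with the same endpoint, header, and body. The function name build_create_project_request is hypothetical, and the sketch only builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Endpoint taken from the curl example above.
PROJECTS_URL = "https://datapipeline.scraperapi.com/api/projects"


def build_create_project_request(api_key: str, urls: List[str]) -> urllib.request.Request:
    """Build (without sending) the POST request that creates a URL-list project."""
    body = {"projectInput": {"type": "list", "list": list(urls)}}
    # urlencode escapes any special characters in the key.
    query = urllib.parse.urlencode({"api_key": api_key})
    return urllib.request.Request(
        f"{PROJECTS_URL}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


req = build_create_project_request(
    "xxxxxx",  # replace with your API key
    ["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"],
)
```

To send the request, pass it to urllib.request.urlopen(req) and decode the response with json.load. The result should resemble the output above.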

Specifying Parameters

To set additional scraping parameters, include them in the apiParams object. This request creates a project with premium set to true:

Example:

curl -X POST --data '{ "projectInput": {"type": "list", "list":
["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"] },
"apiParams": {"premium": true} }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/?api_key=xxxxxx'

Output:

{
    "id": 522,
    "name": "Project created at 2024-05-10T16:04:28.263Z",
    "schedulingEnabled": true,
    "scrapingInterval": "weekly",
    "createdAt": "2024-05-10T16:04:28.306Z",
    "scheduledAt": "2024-05-12T19:00:00.211Z",
    "projectType": "urls",
    "apiParams": {
        "premium": "true"
    },
    "projectInput": {
        "type": "list",
        "list": [
            "https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
        ]
    },
    "projectOutput": {
        "type": "save"
    },
    "notificationConfig": {
        "notifyOnSuccess": "never",
        "notifyOnFailure": "with_every_run"
    }
}
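When a request involves several optional fields, you can add each one only when you set it. Anything you leave out then keeps the server default. This Python sketch shows one way to do that. build_project_body is a hypothetical helper, and it uses only fields that appear in the examples on this page.

```python
from typing import Any, Dict, List, Optional


def build_project_body(
    urls: List[str],
    name: Optional[str] = None,
    api_params: Optional[Dict[str, Any]] = None,
    scheduling_enabled: Optional[bool] = None,
) -> Dict[str, Any]:
    """Assemble a project-creation body, leaving out every field that was not given.

    Omitted fields fall back to the server defaults described on this page
    (for example, scheduling enabled with a weekly interval).
    """
    body: Dict[str, Any] = {"projectInput": {"type": "list", "list": list(urls)}}
    if name is not None:
        body["name"] = name
    if api_params:
        body["apiParams"] = dict(api_params)
    if scheduling_enabled is not None:
        body["schedulingEnabled"] = scheduling_enabled
    return body


# Same body as the curl example above; json.dumps() it for the --data payload.
premium_body = build_project_body(
    ["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"],
    api_params={"premium": True},
)
```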

Updating Parameters Of An Existing Project

Example:

curl -X PATCH --data '{ "apiParams": {"premium": true} }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>/?api_key=xxxxxx'

Output:

{
    "id": 522,
    "name": "Project created at 2024-05-10T16:04:28.263Z",
    "schedulingEnabled": true,
    "scrapingInterval": "weekly",
    "createdAt": "2024-05-10T16:04:28.306Z",
    "scheduledAt": "2024-05-12T11:12:17.211Z",
    "projectType": "urls",
    "apiParams": {
        "premium": "true"
    },
    "projectInput": {
        "type": "list",
        "list": [
            "https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
        ]
    },
    "projectOutput": {
        "type": "save"
    },
    "notificationConfig": {
        "notifyOnSuccess": "never",
        "notifyOnFailure": "with_every_run"
    }
}
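In code, an update is the same kind of request, but it uses the PATCH method and puts the project ID in the path. This Python sketch uses a hypothetical helper, build_update_project_request, and sends only the fields you want to change. Because only the body differs, the same helper should also fit the other PATCH examples on this page, such as changing outputFormat, scrapingInterval, or scheduledAt. The URL omits the trailing slash, as in the scheduling-interval example. Both forms appear in this document.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

PROJECTS_URL = "https://datapipeline.scraperapi.com/api/projects"


def build_update_project_request(
    api_key: str, project_id: int, changes: Dict[str, Any]
) -> urllib.request.Request:
    """Build (without sending) a PATCH request that updates only the given fields."""
    query = urllib.parse.urlencode({"api_key": api_key})
    return urllib.request.Request(
        f"{PROJECTS_URL}/{project_id}?{query}",
        data=json.dumps(changes).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="PATCH",
    )


# Equivalent of the curl example above, for project 522.
req = build_update_project_request("xxxxxx", 522, {"apiParams": {"premium": True}})
```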

Specifying Project Type

As in the DataPipeline GUI, you can specify a project type (e.g. Google Search) through the API. If you omit projectType, it defaults to urls (simple URL scraping).

Example:

curl -X POST --data '{ "name": "Google search project",
"projectInput": {"type": "list", "list": ["iPhone", "Android"] },
"projectType": "google_search"}' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'

Output:

{
    "id": 525,
    "name": "Google search project",
    "schedulingEnabled": true,
    "scrapingInterval": "weekly",
    "createdAt": "2024-05-14T14:34:02.784Z",
    "projectType": "google_search",
    "projectInput": {
        "type": "list",
        "list": [
            "iPhone",
            "Android"
        ]
    },
    "notificationConfig": {
        "notifyOnSuccess": "never",
        "notifyOnFailure": "with_every_run"
    }
}
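The API interprets the list items according to the project type: URLs for urls, search terms for google_search. Passing the type explicitly next to the items can make this clearer. In this Python sketch, build_typed_project_body is a hypothetical helper. Leaving project_type as None omits the field, so the API applies its urls default.

```python
from typing import Any, Dict, List, Optional


def build_typed_project_body(
    items: List[str], project_type: Optional[str] = None, name: Optional[str] = None
) -> Dict[str, Any]:
    """Build a project body whose items are URLs or search terms, depending on the type.

    When project_type is None the field is omitted and the API defaults to "urls".
    """
    body: Dict[str, Any] = {"projectInput": {"type": "list", "list": list(items)}}
    if project_type is not None:
        body["projectType"] = project_type
    if name is not None:
        body["name"] = name
    return body


# Same body as the Google search curl example above.
search_body = build_typed_project_body(
    ["iPhone", "Android"], project_type="google_search", name="Google search project"
)
```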

Specifying Project Input

Choose how to provide the list of URLs, search terms, or other input data for your project:

- A simple list (the default).

- Webhook input: on every run, the system downloads the contents of the webhook URL, so you can update the list of URLs, ASINs, etc. without interacting with ScraperAPI.

Example:

curl -v -X POST --data '{ "projectInput": {"type": "webhook_input", "url": "<https://the.url.where.your.list.is>" } }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'

Output:

{
    "id": 524,
    "name": "Project created at 2024-05-10T16:04:28.263Z",
    "schedulingEnabled": true,
    "scrapingInterval": "weekly",
    "createdAt": "2024-06-26T12:42:31.654Z",
    "scheduledAt": "2024-06-26T12:42:31.652Z",
    "projectType": "urls",
    "projectInput": {
        "url": "https://webhook.site/d10de6ed-68fb-4874-9c0f-3c32bab2e1ef",
        "type": "webhook_input"
    },
    "notificationConfig": {
        "notifyOnSuccess": "never",
        "notifyOnFailure": "with_every_run"
    }
}
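A webhook-input project body only needs the input type and the URL of your list. The Python sketch below builds that body. The helper name is hypothetical, and the http(s) check is an assumption of this sketch rather than a documented API rule. This page does not describe what format the list at that URL must have.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def build_webhook_input_body(list_url: str) -> Dict[str, Any]:
    """Build a project body whose input list is downloaded from list_url on every run."""
    parsed = urlparse(list_url)
    # Sanity check (assumption): the system must be able to fetch this URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {list_url!r}")
    return {"projectInput": {"type": "webhook_input", "url": list_url}}
```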

Changing Output Format For SDE Project Types

For SDE project types, the default output format is JSON. If CSV suits your needs better, set outputFormat to csv in apiParams:

Example:

curl -v -X PATCH --data '{ "apiParams": {"outputFormat": "csv"} }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>/?api_key=xxxxxx'

Output:

{
    "id": 525,
    "name": "Google search project",
    "schedulingEnabled": true,
    "scrapingInterval": "weekly",
    "createdAt": "2024-06-26T10:26:40.840Z",
    "projectType": "google_search",
    "apiParams": {
        "outputFormat": "csv"
    },
    "projectInput": {
        "type": "list",
        "list": [
            "iPhone",
            "Android"
        ]
    },
    "notificationConfig": {
        "notifyOnSuccess": "never",
        "notifyOnFailure": "with_every_run"
    }
}

Changing Scheduling Interval

Example:

curl -X PATCH --data '{ "scrapingInterval": "daily" }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>?api_key=xxxxxx'

Output:

{
    "id": 522,
    "name": "Project created at 2024-05-10T16:04:28.263Z",
    "schedulingEnabled": true,
    "scrapingInterval": "daily",
    "createdAt": "2024-05-10T16:04:28.306Z",
    "scheduledAt": "2024-05-10T19:00:00.211Z",
    "projectType": "urls",
    "projectInput": {
        "type": "list",
        "list": [
            "https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
        ]
    },
    "projectOutput": {
        "type": "save"
    },
    "notificationConfig": {
        "notifyOnSuccess": "never",
        "notifyOnFailure": "with_every_run"
    }
}

Creating a Project With a Custom Cron Schedule

Cron is a time-based job scheduler. It allows users to schedule jobs to run periodically at fixed times, dates, or intervals. A cron expression consists of five fields that define when a job should execute:

Minute (0-59)
Hour (0-23)
Day of the month (1-31)
Month (1-12 or JAN-DEC)
Day of the week (0-7 or SUN-SAT, where 0 and 7 represent Sunday)

For example, the cron expression "0 0 * * 1" runs the job at midnight (00:00) every Monday (1 represents Monday). Pass it as the scrapingInterval value:

curl -X POST --data '{ "name": "Cron project",
"schedulingEnabled": true,
"projectInput": {"type": "list", "list":
["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"] },
"scrapingInterval": "0 0 * * 1" }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects?api_key=xxxxxx'

Output:

{
    "id": 522,
    "name": "Cron project",
    "schedulingEnabled": true,
    "scrapingInterval": "0 0 * * 1",
    "createdAt": "2024-05-10T16:04:28.306Z",
    "projectType": "urls",
    "projectInput": {
        "type": "list",
        "list": [
            "https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
        ]
    },
    "notificationConfig": {
        "notifyOnSuccess": "never",
        "notifyOnFailure": "with_every_run"
    }
}
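You may want to catch mistakes locally before sending a cron expression as scrapingInterval. The following Python sketch checks the five fields against the ranges listed above. It is a client-side illustration, not the validator DataPipeline uses, and the server may accept or reject expressions differently. For example, this sketch does not handle special strings such as @daily.

```python
from typing import Dict, List, Tuple

# (low, high) bounds for the five fields, in the order listed above.
FIELD_RANGES: List[Tuple[int, int]] = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]
MONTH_NAMES: Dict[str, int] = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}
DAY_NAMES: Dict[str, int] = {
    name: number
    for number, name in enumerate(["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"])
}


def _to_int(token: str, names: Dict[str, int]) -> int:
    """Convert a number or a three-letter name (JAN, MON, ...) to an integer."""
    if token.upper() in names:
        return names[token.upper()]
    if not token.isdigit():
        raise ValueError(f"bad value {token!r}")
    return int(token)


def validate_cron(expr: str) -> None:
    """Raise ValueError unless expr is a five-field cron expression within the ranges above.

    Handles *, single values, ranges (a-b), lists (a,b) and steps (*/n, a-b/n).
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    for index, field in enumerate(fields):
        low, high = FIELD_RANGES[index]
        names = MONTH_NAMES if index == 3 else DAY_NAMES if index == 4 else {}
        for part in field.split(","):
            base, slash, step = part.partition("/")
            if slash and (not step.isdigit() or int(step) == 0):
                raise ValueError(f"bad step in {part!r}")
            if base == "*":
                continue
            start, dash, end = base.partition("-")
            values = [_to_int(start, names)]
            if dash:
                values.append(_to_int(end, names))
            if any(value < low or value > high for value in values):
                raise ValueError(f"{part!r} is outside {low}-{high}")
            if len(values) == 2 and values[0] > values[1]:
                raise ValueError(f"range {part!r} is reversed")


validate_cron("0 0 * * 1")  # the Monday-midnight example above passes
```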

Rescheduling The Project's Next Run Date

This also works when schedulingEnabled is false. In that case, the project runs once at the scheduled time.

curl -X PATCH --data '{ "scheduledAt": "2024-05-07T11:12:17.211Z" }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>?api_key=xxxxxx'

To run the project as soon as possible, set scheduledAt to now:

curl -v -X PATCH --data '{ "scheduledAt": "now" }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>?api_key=xxxxxx'
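The scheduledAt values in these examples are UTC timestamps in ISO 8601 format, with millisecond precision and a trailing Z. Assuming the API accepts timestamps in that same shape, this Python sketch produces one from a timezone-aware datetime. format_scheduled_at is a hypothetical helper. The literal string now needs no formatting.

```python
from datetime import datetime, timedelta, timezone


def format_scheduled_at(moment: datetime) -> str:
    """Format an aware datetime like the scheduledAt examples, e.g. 2024-05-07T11:12:17.211Z."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime so the UTC conversion is unambiguous")
    utc = moment.astimezone(timezone.utc)
    # For UTC, isoformat() ends with +00:00; the examples use Z instead.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# Body for running the project two hours from now.
body = {"scheduledAt": format_scheduled_at(datetime.now(timezone.utc) + timedelta(hours=2))}
```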

Configuring Webhook Output

If you are using webhook.site, the content must be encoded as multipart/form-data, which is why the request below includes the extra webhookEncoding parameter.

Set the webhook output

Example:

curl -X POST --data '{ "projectInput": {"type": "list", "list":
["https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"] },
"webhookOutput": {"url": "https://webhook.site/b58e4798-0bf8-44c8-ac2b-f416fffddd39",
"webhookEncoding": "multipart_form_data_encoding"} }' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/?api_key=xxxxxx'

Output:

{
    "id": 522,
    "name": "Project created at 2024-05-10T16:04:28.263Z",
    "schedulingEnabled": true,
    "scrapingInterval": "weekly",
    "createdAt": "2024-05-10T16:04:28.306Z",
    "projectType": "urls",
    "projectInput": {
        "type": "list",
        "list": [
            "https://www.amazon.com/AmazonBasics-3-Button-Wired-Computer-1-Pack/dp/B005EJH6RW/"
        ]
    },
    "projectOutput": {
        "type": "webhook"
    },
    "notificationConfig": {
        "notifyOnSuccess": "never",
        "notifyOnFailure": "with_every_run"
    }
}

Unset the webhook output

curl -X PATCH --data '{"webhookOutput": null}' \
-H 'content-type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/<projectId>/?api_key=xxxxxx'
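Setting and unsetting the webhook output differ only in the webhookOutput value. The Python sketch below, with a hypothetical helper webhook_output_body, builds either body. It adds webhookEncoding only when multipart is requested. That is an assumption, because this page shows the parameter only for webhook.site and does not state the default encoding.

```python
import json
from typing import Any, Dict, Optional


def webhook_output_body(url: Optional[str], multipart: bool = False) -> Dict[str, Any]:
    """Body for setting (url given) or unsetting (url=None) the webhook output."""
    if url is None:
        # json.dumps turns None into null, matching {"webhookOutput": null}.
        return {"webhookOutput": None}
    output: Dict[str, Any] = {"url": url}
    if multipart:
        # Needed for receivers such as webhook.site that expect multipart/form-data.
        output["webhookEncoding"] = "multipart_form_data_encoding"
    return {"webhookOutput": output}


unset_payload = json.dumps(webhook_output_body(None))
```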

Configuring Notifications

Define when you want to receive email notifications about completed jobs. You can set this when creating a project or edit it later. Both notifyOnSuccess and notifyOnFailure must be specified.

Example (on project creation):

curl -X POST --data '{ "name": "Google search project",
"projectInput": {"type": "list", "list": ["iPhone", "Android"] },
"projectType": "google_search",
"notificationConfig": {"notifyOnSuccess": "weekly", "notifyOnFailure": "weekly"}}' \
-H 'Content-Type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/?api_key=xxxxxx'

Output:

{
    "id": 524,
    "name": "Google search project",
    "schedulingEnabled": true,
    "scrapingInterval": "weekly",
    "createdAt": "2024-05-10T21:04:28.306Z",
    "scheduledAt": "2024-05-10T21:04:28.306Z",
    "projectType": "google_search",
    "projectInput": {
        "type": "list",
        "list": [
            "iPhone",
            "Android"
        ]
    },
    "notificationConfig": {
        "notifyOnSuccess": "weekly",
        "notifyOnFailure": "weekly"
    }
}

Example (updating existing project):

curl -X PATCH --data '{ "name": "Google search project",
"projectInput": {"type": "list", "list": ["iPhone", "Android"] },
"projectType": "google_search",
"notificationConfig": {"notifyOnSuccess": "weekly", "notifyOnFailure": "weekly"}}' \
-H 'Content-Type: application/json' \
'https://datapipeline.scraperapi.com/api/projects/524/?api_key=xxxxxx'

Output:

{
    "id": 524,
    "name": "Google search project",
    "schedulingEnabled": true,
    "scrapingInterval": "weekly",
    "createdAt": "2024-05-10T21:04:28.306Z",
    "scheduledAt": "2024-05-10T21:04:28.306Z",
    "projectType": "google_search",
    "projectInput": {
        "type": "list",
        "list": [
            "iPhone",
            "Android"
        ]
    },
    "notificationConfig": {
        "notifyOnSuccess": "weekly",
        "notifyOnFailure": "weekly"
    }
}
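Both settings are required, so a helper that takes both as mandatory arguments prevents you from sending only one of them. This Python sketch uses a hypothetical helper. KNOWN_VALUES lists only the values that appear on this page (never, with_every_run, weekly). The API may accept others, so extend the set if needed.

```python
from typing import Dict

# Only the values that appear on this page; the API may accept others.
KNOWN_VALUES = {"never", "with_every_run", "weekly"}


def notification_config(on_success: str, on_failure: str) -> Dict[str, Dict[str, str]]:
    """Return a notificationConfig body that always includes both required settings."""
    for value in (on_success, on_failure):
        if value not in KNOWN_VALUES:
            raise ValueError(f"{value!r} is not one of {sorted(KNOWN_VALUES)}")
    return {"notificationConfig": {"notifyOnSuccess": on_success, "notifyOnFailure": on_failure}}


weekly_body = notification_config("weekly", "weekly")
```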

Check The List of Jobs For a Project

To list a project's jobs, use this endpoint:

curl 'https://datapipeline.scraperapi.com/api/projects/<projectId>/jobs?api_key=xxxxxx'
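In code, listing jobs is a plain GET request. This Python sketch uses a hypothetical helper, build_list_jobs_request, to build it with the standard library. This page does not show the response format of this endpoint.

```python
import urllib.parse
import urllib.request

PROJECTS_URL = "https://datapipeline.scraperapi.com/api/projects"


def build_list_jobs_request(api_key: str, project_id: int) -> urllib.request.Request:
    """Build (without sending) the GET request that lists a project's jobs."""
    query = urllib.parse.urlencode({"api_key": api_key})
    # A Request without a body defaults to the GET method.
    return urllib.request.Request(f"{PROJECTS_URL}/{project_id}/jobs?{query}")


req = build_list_jobs_request("xxxxxx", 522)
```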

Cancel a Job Within a Project

Cancel a job to stop it when it is no longer needed. Send a DELETE request to the job's URL:

Example:

curl -X DELETE 'https://datapipeline.scraperapi.com/api/projects/522/jobs/6f933838-ec57-43cd-8e49-d8652716ddf8?api_key=xxxxx'

If the request succeeds, the API responds with ok.

If you then look up that job again, its status is cancelled:

{
    "id": "6f933838-ec57-43cd-8e49-d8652716ddf8",
    "projectId": 522,
    "status": "cancelled",
    "createdAt": "2024-05-10T16:04:28.306Z",
    "finishedAt": "2024-05-10T20:04:28.306Z",
    "successfulTasks": 0,
    "failedTasks": 0,
    "inProgressTasks": 0,
    "inProgressTasksWithAttempts": 0,
    "cancelledTasks": 0,
    "credits": 0,
    "resultUrl": "https://datapipeline.scraperapi.com/jobs/6f933838-ec57-43cd-8e49-d8652716ddf8/result",
    "errorReportUrl": "https://datapipeline.scraperapi.com/jobs/6f933838-ec57-43cd-8e49-d8652716ddf8/errorreport"
}
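You can script both the cancel request and the status check. In this Python sketch, build_cancel_job_request and is_cancelled are hypothetical helpers. is_cancelled reads the status field of a job object like the one above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

PROJECTS_URL = "https://datapipeline.scraperapi.com/api/projects"


def build_cancel_job_request(api_key: str, project_id: int, job_id: str) -> urllib.request.Request:
    """Build (without sending) the DELETE request that cancels one job."""
    query = urllib.parse.urlencode({"api_key": api_key})
    return urllib.request.Request(
        f"{PROJECTS_URL}/{project_id}/jobs/{job_id}?{query}", method="DELETE"
    )


def is_cancelled(job: Dict[str, Any]) -> bool:
    """True if a job object (as returned above) reports the cancelled status."""
    return job.get("status") == "cancelled"


req = build_cancel_job_request("xxxxx", 522, "6f933838-ec57-43cd-8e49-d8652716ddf8")
job = json.loads('{"id": "6f933838-ec57-43cd-8e49-d8652716ddf8", "projectId": 522, "status": "cancelled"}')
```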

Archiving a Project

To archive an existing project, send a DELETE request to the project's URL:

Example:

curl -X DELETE 'https://datapipeline.scraperapi.com/api/projects/525?api_key=xxxxxxxx'

If you now try to look up that project, the API returns a Project not found error.

The archived project also disappears from the DataPipeline Projects list on the Dashboard. For example, a list of 7 projects drops to 6.
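Archiving is a DELETE request on the project URL. This Python sketch uses a hypothetical helper, build_archive_project_request, to build that request without sending it.

```python
import urllib.parse
import urllib.request

PROJECTS_URL = "https://datapipeline.scraperapi.com/api/projects"


def build_archive_project_request(api_key: str, project_id: int) -> urllib.request.Request:
    """Build (without sending) the DELETE request that archives a project."""
    query = urllib.parse.urlencode({"api_key": api_key})
    return urllib.request.Request(f"{PROJECTS_URL}/{project_id}?{query}", method="DELETE")


req = build_archive_project_request("xxxxxxxx", 525)
```

After the request is sent, fetching the same project should fail with the NOT_FOUND error shown under Error Messages. With urllib, an HTTP error status surfaces as urllib.error.HTTPError.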

Error Messages

// Invalid value in the request body (for example, a misspelled projectType):
{
    "name": "BAD_REQUEST",
    "message": "Invalid project configuration: /projectType: must be equal to one of the allowed values",
    "status": 400,
    "errors": []
}

// Project does not exist:
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="utf-8">
    <title>Error</title>
</head>

<body>
    <pre>NOT_FOUND: Project 624 of user 231516 is not found<br> &nbsp; &nbsp;at new Exception (/app/node_modules/@tsed/exceptions/lib/cjs/core/Exception.js:55:15)<br> &nbsp; &nbsp;at new ClientException (/app/node_modules/@tsed/exceptions/lib/cjs/core/ClientException.js:8:9)<br> &nbsp; &nbsp;at new NotFound (/app/node_modules/@tsed/exceptions/lib/cjs/clientErrors/NotFound.js:8:9)<br> &nbsp; &nbsp;at HostedScraperExceptionFilter.catch (/app/build/filters/HostedScraperException.filter.js:16:19)<br> &nbsp; &nbsp;at PlatformExceptions.catch (/app/node_modules/@tsed/platform-exceptions/lib/cjs/services/PlatformExceptions.js:35:48)<br> &nbsp; &nbsp;at /app/node_modules/@tsed/platform-express/lib/cjs/components/PlatformExpress.js:74:56<br> &nbsp; &nbsp;at Layer.handle_error (/app/node_modules/express/lib/router/layer.js:71:5)<br> &nbsp; &nbsp;at trim_prefix (/app/node_modules/express/lib/router/index.js:326:13)<br> &nbsp; &nbsp;at /app/node_modules/express/lib/router/index.js:286:9<br> &nbsp; &nbsp;at Function.process_params (/app/node_modules/express/lib/router/index.js:346:12)</pre>
</body>

</html>

// Job not found:
{
    "name": "NOT_FOUND",
    "message": "Resource \"/api/projects/524/jobs/a4ed0080-1083-4889-9ad0-adee0503e542\" not found",
    "status": 404,
    "errors": []
}
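Errors come back in two shapes: JSON objects for most errors, and an HTML page when the project does not exist. Client code may need to handle both. This Python sketch is a hypothetical helper that extracts the error name and message from either shape. HTML responses do not include the status code in the body, so the helper returns None for it, and you should read the code from the HTTP response itself (for example, HTTPError.code in urllib).

```python
import json
import re
from html import unescape
from typing import Optional, Tuple


def parse_error(body: str) -> Tuple[str, str, Optional[int]]:
    """Return (name, message, status) from an error response body.

    JSON errors carry name/message/status fields; the project-not-found
    response is an HTML page, so fall back to the first line inside <pre>.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict) and "name" in data:
        return data["name"], data.get("message", ""), data.get("status")
    match = re.search(r"<pre>(.*?)</pre>", body, re.DOTALL)
    text = unescape(match.group(1)) if match else body
    # The first <br>-separated line holds "NAME: message"; the rest is a stack trace.
    first_line = re.split(r"<br\s*/?>", text)[0].strip()
    name, _, message = first_line.partition(": ")
    return name, message, None
```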
