# Crawler UI

### Crawler Workspace

You can access the Crawler UI from the navigation menu on the left after logging in to your account.

<div align="left"><figure><img src="/files/EpfVfPUyWmh2tgAgV26b" alt="" width="173"><figcaption></figcaption></figure></div>

This section is the starting point for creating a new crawler, and it’s also where all your **active** and **inactive** crawlers will be listed. You can quickly review each crawler’s details and status, making it easy to track what’s running and manage what needs attention.

<figure><img src="/files/CTihY6mTMzZPeQTFwmiF" alt=""><figcaption></figcaption></figure>

### Crawler Setup

To launch a new Crawler, start by selecting the "**New Crawler**" button in the Crawler Workspace. This will redirect you to the Crawler Setup page, where you specify the crawler’s parameters and behavior.

At the top of the setup page, specify the **URL to Crawl**, which defines the starting point for your crawler, and optionally give the crawler a name.

<figure><img src="/files/lPe9sNzokem6bfi13WaD" alt=""><figcaption></figcaption></figure>

#### Crawler Options

The Crawler Options view gives you full control over how your crawler behaves. Here you can set the link depth, define which paths to **include** and/or **exclude**, choose a callback URL, set cost limits, specify how often the crawler should run, and decide the output format. This section is where you shape the actual crawling logic.

<figure><img src="/files/vEVhkhYdyYCWEo9SwNBw" alt=""><figcaption></figcaption></figure>

| Field                       | Field Type | Description                                                                                                                                                                                                                                                                                                                                                                                                  |
| --------------------------- | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Link depth**              | REQUIRED   | Controls how deep the crawler goes. On the **Free Plan**, `link depth` is capped at 1 (the seed URL and its direct links). Paid plans can configure deeper crawls up to the supported maximum.                                                                                                                                                                                                               |
| **Include path**            | REQUIRED   | Regex pattern for URLs to include. The crawler will only follow URLs that match. Use `.*` to crawl all pages on the site. Tools like [regex101](https://regex101.com/) can help you debug your pattern (see the sketch below this table).                                                                                                                                                                    |
| **Exclude path**            | OPTIONAL   | A regex pattern for URLs the crawler should skip. Any URL matching this pattern won’t be crawled.                                                                                                                                                                                                                                                                                                            |
| **Callback URL**            | OPTIONAL   | Specify a webhook URL that should be called after the crawler completes. The crawler will send the results to this endpoint.                                                                                                                                                                                                                                                                                 |
| **Maximum cost in credits** | REQUIRED   | Sets a maximum number of credits the crawler may use. The job stops once this limit is reached to avoid unexpected costs.                                                                                                                                                                                                                                                                                    |
| **Crawling frequency**      | REQUIRED   | <p>Defines the crawler’s run schedule. <strong>Available options:</strong></p><p><br>- Run crawler job once (now)<br>- Crawler scheduling disabled (the crawler will not run; only the crawler config will be created)<br>- Hourly<br>- Daily<br>- Weekly<br>- Monthly<br><br>On the <strong>Free Plan</strong>, crawlers can only be run <strong>once</strong>; scheduled runs are <strong>unavailable</strong>.</p> |
| **Output format**           | REQUIRED   | Choose how results are returned. **Raw HTML** is the default. JSON/CSV are available for Amazon, Google, eBay, Redfin, and Walmart. **Markdown/Text** outputs are LLM-friendly and ideal for model training.                                                                                                                                                                                                 |
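
The include and exclude fields accept standard regular expressions. The sketch below is illustrative only (the pattern values are hypothetical, and partial-match semantics are an assumption about the crawler's internals); it shows how a pair of patterns would partition a site's URLs, and you can test your own patterns the same way before saving the crawler:

```python
import re

# Hypothetical patterns: crawl only blog pages, but skip tag archives.
INCLUDE_PATH = r"/blog/.*"      # value entered in "Include path"
EXCLUDE_PATH = r"/blog/tag/.*"  # value entered in "Exclude path"

urls = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/tag/python",
    "https://example.com/pricing",
]

for url in urls:
    # Assumption: a URL is followed if it matches the include pattern
    # and does not match the exclude pattern.
    followed = re.search(INCLUDE_PATH, url) and not re.search(EXCLUDE_PATH, url)
    print(f"{url} -> {'crawled' if followed else 'skipped'}")
```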

{% hint style="warning" %}
**Free Plan limitations**

* **Link depth:** limited to `1` (seed URL plus direct links).
* **Scheduling:** recurring schedules (hourly/daily/weekly/monthly) are not available.

*Attempting to exceed these limits returns a `403` error. Upgrade to a paid plan to run deeper crawls and enable scheduling.*
{% endhint %}
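
If you set a **Callback URL**, the endpoint you point it at must accept the crawler's request once the job completes. Below is a minimal receiver sketch using Flask; the route path is our own, and the payload schema is an assumption since it isn't specified on this page:

```python
from flask import Flask, request

app = Flask(__name__)

# Hypothetical endpoint registered as the crawler's Callback URL,
# e.g. https://your-server.example.com/crawler-callback
@app.route("/crawler-callback", methods=["POST"])
def crawler_callback():
    payload = request.get_json(silent=True)  # payload shape is an assumption
    app.logger.info("Crawler run completed: %s", payload)
    return "", 200  # acknowledge receipt

if __name__ == "__main__":
    app.run(port=8000)
```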

#### Notification Preferences

This section allows you to choose how frequently you want to be notified when a crawler run completes. You can opt out entirely (**Never**), receive an alert after each job (**With every run**), or get **Daily/Weekly** summaries that aggregate multiple runs into a single email.

<figure><img src="/files/tIlpM974Mm47fjb73Dkv" alt=""><figcaption></figcaption></figure>

#### Advanced Options

The **Advanced Options** section exposes request-level parameters available in the API. Here you can disable redirect-following, enable retries for 404 responses (useful if the target domain is known for returning 'fake' 404s), or turn on JavaScript rendering for JS-heavy domains. **Advanced bypassing** adds extra anti-bot bypass logic (use it only if necessary). You can also route traffic through residential or mobile IPs, specify a session for session-based crawling, and set a `country_code` for geo-targeting. **Device type** lets you simulate a Desktop or Mobile device.

<figure><img src="/files/sztTOkx85pEAlqF25THP" alt=""><figcaption></figcaption></figure>

### Crawler Job Details

When you open an existing crawler from the Crawler Workspace, you’re taken to a page that shows an overview of the crawler along with its recent activity and configuration. At the top, you'll see the latest **Status**, **Created time**, **Avg. cost per job**, and **Total credits spent**. The table below lists all jobs associated with this crawler, including the job ID, last run timestamp, status (Done, Running, or Failed), and the target domain. You can sort the table or use the search bar to find specific jobs.

<figure><img src="/files/4Fpnn5guDbVhgHegUrT8" alt=""><figcaption></figcaption></figure>

Clicking a **Job ID** takes you to a results page that shows an overview of the crawl. At the very top are the job's status, completion time, and the crawl summary (URLs crawled, credits used, and cancelled or failed items). Underneath that is a list of every URL processed in that job, including when it was crawled and its status.

<figure><img src="/files/OTsdUyAQrME7jmBOVwcS" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
**Note:** Crawled results are stored for up to 7 days. For scheduled crawls, new results replace the previously stored ones.
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.scraperapi.com/scraperapi-crawler-v2.0/crawler-ui.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
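
For example, a minimal query using Python's `requests` (the sample question is illustrative):

```python
import requests

PAGE_URL = "https://docs.scraperapi.com/scraperapi-crawler-v2.0/crawler-ui.md"
question = "What output formats does the crawler support for Walmart?"

# `requests` URL-encodes the `ask` parameter automatically.
response = requests.get(PAGE_URL, params={"ask": question})
print(response.text)  # a direct answer plus relevant excerpts and sources
```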
