Crawler UI
Crawler Workspace
You can access the Crawler UI from the navigation menu on the left after logging in with your account.

This section is the starting point for creating a new crawler, and it’s also where all your active and inactive crawlers will be listed. You can quickly review each crawler’s details and status, making it easy to track what’s running and manage what needs attention.

Crawler Setup
To launch a new Crawler, start by selecting the "New Crawler" button in the Crawler Workspace. This will redirect you to the Crawler Setup page, where you specify the crawler’s parameters and behavior.
At the top of the setup page, specify the URL to Crawl, which defines the starting point for your crawler, and optionally give it a name.

Crawler Options
The Crawler Options view gives you full control over how your crawler behaves. Here you can set the link depth, define which paths to include and/or exclude, choose a callback URL, set cost limits, specify how often the crawler should run, and decide the output format. This section is where you shape the actual crawling logic.

Link depth
REQUIRED
Controls how many levels of links the crawler follows from the starting URL. Deeper crawls cover more pages but use more credits.
Include path
REQUIRED
A regex pattern for URLs to include. The crawler only follows URLs that match this pattern. Use .* to crawl all pages on the site. Tools such as regex101 are helpful for building and debugging your pattern.
Exclude path
OPTIONAL
A regex pattern for URLs the crawler should skip. Any URL matching this pattern won’t be crawled.
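To see how the include and exclude patterns combine, the short sketch below checks a few sample URLs against one of each. The patterns and URLs are illustrative placeholders, and the crawler's own matching may differ in detail (for example, full-match versus substring matching).

```python
import re

# Illustrative patterns: include product pages, exclude anything under /blog/.
include_pattern = re.compile(r"https://example\.com/products/.*")
exclude_pattern = re.compile(r".*/blog/.*")

sample_urls = [
    "https://example.com/products/shoes",
    "https://example.com/blog/news",
    "https://example.com/about",
]

for url in sample_urls:
    # A URL is followed only if it matches the include pattern
    # and does not match the exclude pattern.
    followed = bool(include_pattern.search(url)) and not exclude_pattern.search(url)
    print(f"{url} -> {'crawl' if followed else 'skip'}")
```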
Callback URL
OPTIONAL
Specify a webhook URL that should be called after the crawler completes. The crawler will send the results to this endpoint.
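If you want to test the callback flow before pointing it at a production system, a tiny local receiver like the one below can log what arrives. This is a minimal sketch that assumes the results are delivered as an HTTP POST (possibly JSON); check the API reference for the exact payload format, and remember the URL must be publicly reachable for the crawler to call it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body sent to the callback URL.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)

        # Pretty-print JSON payloads; fall back to a byte count otherwise.
        try:
            print(json.dumps(json.loads(body), indent=2))
        except (ValueError, UnicodeDecodeError):
            print(f"Received {length} bytes (non-JSON payload)")

        # Acknowledge receipt so the sender does not retry.
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    # Expose this publicly (e.g. via a tunnel) and use that URL as the Callback URL.
    HTTPServer(("0.0.0.0", 8000), CallbackHandler).serve_forever()
```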
Maximum cost in credits
REQUIRED
Sets a maximum number of credits the crawler may use. The job stops once this limit is reached to avoid unexpected costs.
Crawling frequency
REQUIRED
Defines the crawler’s run schedule. Available options:
- Run crawler job once (now)
- Crawler scheduling disabled (the crawler will not run; only the crawler configuration is created)
- Hourly
- Daily
- Weekly
- Monthly
Output format
REQUIRED
Choose how results are returned. Raw HTML is the default. JSON/CSV are available for Amazon, Google, eBay, Redfin, and Walmart. Markdown/Text are LLM-friendly and ideal for model training.
Notification Preferences
This section allows you to choose how often you want to be notified when a crawler run completes. You can opt out entirely (Never), receive an alert after each job (With every run), or get Daily/Weekly summaries that aggregate multiple runs into a single email.

Advanced Options
The Advanced Options section exposes request-level parameters that are also available in the API. Here you can disable redirect-following, enable retries for 404 responses (useful if the target domain is known to return 'fake' 404s), or turn on JavaScript rendering for JS-heavy domains. Advanced bypassing adds extra anti-bot bypass logic; use it only when necessary. You can also route traffic through residential or mobile IPs, specify a session for session-based crawling, and set a country_code for geo-targeting. Device Type lets you simulate either a Desktop or a Mobile device.
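To give a rough sense of how these options translate into request-level parameters, the sketch below builds a hypothetical API call. The endpoint and almost all parameter names are placeholders invented for illustration (only country_code appears on this page), so the real names, values, and request shape should be taken from the API reference.

```python
import requests

# Illustrative only: the endpoint and most parameter names below are placeholders,
# not the actual API contract. country_code is the one name mentioned on this page.
API_ENDPOINT = "https://api.example.com/crawl"  # placeholder endpoint

params = {
    "url": "https://example.com",
    "render_js": True,               # placeholder: JavaScript rendering
    "follow_redirects": False,       # placeholder: disable redirect-following
    "retry_404": True,               # placeholder: retry 'fake' 404 responses
    "proxy_type": "residential",     # placeholder: residential/mobile IP routing
    "session": "my-session-1",       # placeholder: session-based crawling
    "country_code": "us",            # geo-targeting, as described above
    "device_type": "desktop",        # placeholder: Desktop/Mobile simulation
}

response = requests.post(API_ENDPOINT, json=params, timeout=60)
print(response.status_code)
```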

Crawler Job Details
When you open an existing crawler from the Crawler Workspace, you’re taken to a page that shows an overview of the crawler along with its recent activity and configuration. At the top, you'll see the latest Status, Created time, Avg. cost per job, and Total credits spent. The table below shows all jobs associated with this crawler. It includes the job ID, last run timestamp, status (Done, Running or Failed) and the target domain. You can sort the table or use the search bar to find specific jobs.

Clicking on a Job ID takes you to a results page that shows an overview of the crawl. At the very top you'll see the job's status, completion time, and the crawl summary (URLs crawled, Credits used, and cancelled or failed items). Underneath that is a list of every URL processed in the job, including when it was crawled and its status.

Note: Crawled results are stored for up to 7 days. For scheduled crawls, new results replace the previously stored ones.