# ScraperAPI LlamaIndex Integration

With the `llama-index-tools-scraperapi` package, you can equip your agents with the ability to:

* Scrape any public webpage
* Extract structured data from leading platforms
* Power intelligent, data-rich pipelines

All through a **single, unified tool interface**.

Supported platforms include **Amazon, Google, eBay, Walmart, and Redfin** — making it easier than ever to build agents that interact with real-world data at scale.

### What’s New in 1.0.0

Version 1.0.0 marks the first production-ready release of the integration, introducing a suite of purpose-built tools:

#### Available Tools

* **`scrape`**\
  Fetch and return fully rendered HTML or text from any public URL.
* **Amazon**\
  Extract rich product data including title, price, ratings, availability, and ASIN.
* **Google**\
  Retrieve structured search results with titles, links, and snippets.
* **eBay**\
  Access listing data such as item names, pricing, condition, and seller info.
* **Walmart**\
  Pull structured product data from search results and product pages.
* **Redfin**\
  Gather real estate insights including property details, pricing, and listing status.
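Under the hood, the platform tools call ScraperAPI's structured-data endpoints. As a rough, stdlib-only sketch of the request shape — the `structured/amazon/product` path and the `api_key`/`asin`/`country` parameter names follow ScraperAPI's structured-data API, but treat them here as assumptions and check the API docs for the authoritative list:

```python
from urllib.parse import urlencode

# Base URL for ScraperAPI's structured-data API (assumed endpoint layout).
STRUCTURED_BASE = "https://api.scraperapi.com/structured"

def amazon_product_url(api_key: str, asin: str, country: str = "us") -> str:
    """Build the request URL for the Amazon product endpoint (hypothetical helper)."""
    query = urlencode({"api_key": api_key, "asin": asin, "country": country})
    return f"{STRUCTURED_BASE}/amazon/product?{query}"

url = amazon_product_url("YOUR_API_KEY", "B0FFMRH228")
```

The integration's tools wrap requests like this for you; the sketch is only meant to show what kind of call each tool ultimately makes.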

### Package & Resources

* **PyPI:** `llama-index-tools-scraperapi`
* **GitHub:** github.com/scraperapi/llama-index-tools-scraperapi
* **MCP Server:** <https://mcp.scraperapi.com> (optional hosted approach; no installation needed)

### Quick Start

#### Installation

```bash
pip install llama-index-tools-scraperapi
```

#### Basic Usage

Spin up a LlamaIndex agent with ScraperAPI tools in just a few lines:

```python
import asyncio, os
from llama_index.tools.scraperapi import ScraperAPIToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

async def main():
    scraper_tool = ScraperAPIToolSpec(
        api_key=os.environ["SCRAPERAPI_API_KEY"],
    )

    agent = FunctionAgent(
        tools=scraper_tool.to_tool_list(),
        llm=OpenAI(model="gpt-4.1"),
    )

    response = await agent.run(
        "Scrape https://example.com and summarize the content"
    )

    print(response)

asyncio.run(main())
```

***

### MCP Server Integration (No Package Required)

Prefer a hosted approach? You can connect directly to ScraperAPI via the **remote MCP server**, no additional package needed.

#### Prerequisites

```bash
pip install llama-index llama-index-tools-mcp llama-index-llms-openai
```

#### Example

```python
import asyncio, os
from llama_index.core.agent.workflow import FunctionAgent, AgentWorkflow
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

async def main():
    scraperapi_key = os.environ["SCRAPERAPI_KEY"]
    mcp_url = os.environ.get("MCP_URL", "https://mcp.scraperapi.com/mcp")

    mcp_client = BasicMCPClient(
        mcp_url,
        headers={"Authorization": f"Bearer {scraperapi_key}"},
    )

    mcp_tool_spec = McpToolSpec(client=mcp_client)
    tools = await mcp_tool_spec.to_tool_list_async()

    agent = FunctionAgent(
        name="scraper_agent",
        tools=tools,
        llm=OpenAI(model="gpt-4o")
    )

    workflow = AgentWorkflow(agents=[agent])

    response = await workflow.run(
        user_msg="Look up the Amazon product page for ASIN B0FFMRH228"
    )

    print(response)

asyncio.run(main())
```

***

### Requirements

* ScraperAPI API key (`SCRAPERAPI_API_KEY` or `SCRAPERAPI_KEY`)
* Python 3.8+
* LLM provider API key (e.g., `OPENAI_API_KEY`)
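Note that the two examples above read the key from different environment variables (`SCRAPERAPI_API_KEY` and `SCRAPERAPI_KEY`). A small guard that accepts either name and fails fast with a clear message might look like this (a convenience sketch, not part of the package):

```python
import os

def resolve_scraperapi_key(env=None):
    """Return the ScraperAPI key from either env var name used in the examples."""
    env = os.environ if env is None else env
    for name in ("SCRAPERAPI_API_KEY", "SCRAPERAPI_KEY"):
        value = env.get(name)
        if value:
            return value
    raise RuntimeError(
        "Set SCRAPERAPI_API_KEY (or SCRAPERAPI_KEY) before running the agent."
    )
```

Calling this once at startup gives a readable error instead of a `KeyError` deep inside the agent setup.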

### Why This Matters

This integration unlocks a key capability for modern AI systems:\
**reliable, structured access to live web data.**

Whether you're building research agents, e-commerce intelligence tools, or real estate pipelines, ScraperAPI + LlamaIndex gives your models the context they need to act intelligently in the real world.

<a href="https://docs.scraperapi.com/integrations/llm-integrations/llamaindex-integration" class="button primary" data-icon="file-lines">LlamaIndex integration docs</a>


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.scraperapi.com/resources/release-notes/april-2026/scraperapi-llamaindex-integration.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
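Since the question travels as a query parameter, it must be URL-encoded. A minimal stdlib helper for building the query URL (the page URL is taken from the example above; the helper name is just for illustration):

```python
from urllib.parse import urlencode

PAGE_URL = (
    "https://docs.scraperapi.com/resources/release-notes/"
    "april-2026/scraperapi-llamaindex-integration.md"
)

def ask_url(question: str) -> str:
    # URL-encode the natural-language question into the `ask` query parameter.
    return f"{PAGE_URL}?{urlencode({'ask': question})}"

url = ask_url("Which platforms are supported?")
```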
