# LlamaIndex Integration

### Overview

The LlamaIndex Tools integration connects ScraperAPI directly to LlamaIndex agents, so LLM pipelines can execute scraping tasks. With it, your LlamaIndex agent can scrape web pages and pull structured data from Amazon, Google, eBay, Walmart, and Redfin.

{% hint style="success" %}
Useful links:

* **PyPI package:** [`llama-index-tools-scraperapi`](https://pypi.org/project/llama-index-tools-scraperapi/)
* **GitHub repo:** <https://github.com/scraperapi/llama-index-tools-scraperapi>
{% endhint %}

### Installation

```bash
pip install llama-index-tools-scraperapi
```

### Using the LlamaIndex ToolSpec

**Prerequisites:**

* ScraperAPI Account
* OpenAI Account (used in the example)
* Python 3.10+ installed

```python
import asyncio
import os
from llama_index.tools.scraperapi import ScraperAPIToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

async def main():
    scraper_tool = ScraperAPIToolSpec(
        api_key=os.environ["SCRAPERAPI_API_KEY"],
    )
    agent = FunctionAgent(
        tools=scraper_tool.to_tool_list(),
        llm=OpenAI(model="gpt-4.1"),
    )
    response = await agent.run(
        "Scrape https://example.com and summarize the content"
    )
    print(response)

asyncio.run(main())
```
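The example above reads `SCRAPERAPI_API_KEY` with `os.environ[...]`, which raises a bare `KeyError` if the variable is unset. A small guard gives a clearer failure message; the `require_env` helper below is illustrative, not part of the package:

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, or fail with a readable error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this example.")
    return value

os.environ["SCRAPERAPI_API_KEY"] = "demo-key"  # simulate a configured key
api_key = require_env("SCRAPERAPI_API_KEY")
```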

**Example Response**

```
The content of the page at "https://example.com" is a placeholder text indicating that
the domain is intended for use in documentation examples without requiring permission. 
It advises against using it in operational contexts. 
More information can be found at [iana.org/domains/example](https://iana.org/domains/example).
```

### Available Tools

| Tool      | Description                                                                       | Example usage                                                           |
| --------- | --------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| `scrape`  | Return the raw HTML content of any URL.                                           | `"Scrape https://example.com"`                                          |
| `amazon`  | Extract structured data from an Amazon product page (title, price, images, etc.). | `"Look up the Amazon product page for ASIN B0FFMRH228"`                 |
| `google`  | Perform a Google Search and return organic results (title, link, snippet).        | `"Search Google for 'best web scraping tools 2026'"`                    |
| `ebay`    | Search eBay listings and extract product information.                             | `"Find the top three eBay listings for 'vintage headphones'"`           |
| `walmart` | Extract product data from Walmart pages.                                          | `"Search Walmart for 'wireless headphones' and list the top 3 results"` |
| `redfin`  | Extract real‑estate data from Redfin listings.                                    | `"Get the details of 123 Main St from Redfin"`                          |

{% hint style="warning" %}
Each of these tools uses ScraperAPI on the backend. Requests count toward your ScraperAPI credit pool, and domains with anti-bot protection may consume more API credits.
{% endhint %}
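Each of these tools wraps a ScraperAPI HTTP request behind the scenes. As a rough sketch, the `scrape` tool corresponds to ScraperAPI's standard scraping endpoint; the exact parameters the ToolSpec sends are an assumption here, but building the equivalent raw request URL looks like this:

```python
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"  # placeholder; use your real ScraperAPI key

# Equivalent raw request for the `scrape` tool: fetch a page through ScraperAPI.
params = urlencode({"api_key": API_KEY, "url": "https://example.com"})
request_url = f"https://api.scraperapi.com/?{params}"
# Sending a GET to request_url (e.g. with requests.get) returns the page HTML.
```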

### MCP Server Integration

If you prefer not to install the tool package, you can access the same tools through our MCP server instead. The MCP server exposes these tools over a streamable HTTP API, allowing you to use them without installing an additional Python package.

*Check out the* [*MCP Server*](https://docs.scraperapi.com/integrations/llm-integrations/mcp-server) *section for more info on the MCP server itself.*

**Prerequisites:**

```bash
pip install llama-index llama-index-tools-mcp llama-index-llms-openai
```

**Example**

```python
"""
Example: Using ScraperAPI MCP Server with LlamaIndex

Prerequisites:
    pip install llama-index llama-index-tools-mcp llama-index-llms-openai

Usage:
    export SCRAPERAPI_KEY="your-scraperapi-key"
    export OPENAI_API_KEY="your-openai-api-key"
    python examples/llamaindex_example.py
"""

import asyncio
import os

from llama_index.core.agent.workflow import FunctionAgent, AgentWorkflow
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec


async def main():
    scraperapi_key = os.environ["SCRAPERAPI_KEY"]
    mcp_url = os.environ.get("MCP_URL", "https://mcp.scraperapi.com/mcp")

    # Connect to the ScraperAPI MCP server over Streamable HTTP
    mcp_client = BasicMCPClient(
        mcp_url,
        headers={"Authorization": f"Bearer {scraperapi_key}"},
    )

    # Load all MCP tools as LlamaIndex tools
    mcp_tool_spec = McpToolSpec(client=mcp_client)
    tools = await mcp_tool_spec.to_tool_list_async()

    print(f"Loaded {len(tools)} tools:")
    for tool in tools:
        print(f"  - {tool.metadata.name}: {tool.metadata.description[:80]}...")

    # Create an agent with the MCP tools
    llm = OpenAI(model="gpt-4o")
    agent = FunctionAgent(
        name="scraper_agent",
        description="Agent that uses ScraperAPI tools to search and scrape the web",
        tools=tools,
        llm=llm,
    )
    workflow = AgentWorkflow(agents=[agent])

    # Example queries
    queries = [
        "Search Google for 'best web scraping tools 2026' and summarize the top results",
        "Look up the Amazon product page for ASIN B0FFMRH228",
        "Search Walmart for 'wireless headphones' and list the top 3 results with prices",
    ]

    query = queries[1]
    print(f"\n{'='*60}")
    print(f"Query: {query}")
    print(f"{'='*60}\n")

    response = await workflow.run(user_msg=query)
    print(f"\nAgent response:\n{response}")


if __name__ == "__main__":
    asyncio.run(main())
```

**Example Response**

```
Loaded 22 tools:
  - scrape: Scrape the full content of any web page via ScraperAPI. Returns the page body in...
  - crawler_job_start: Start an asynchronous crawl that discovers and scrapes pages by following links ...
  - crawler_job_status: Check the progress of a running or completed crawler job started with crawler_jo...
  - crawler_job_delete: Cancel a running crawler job and permanently delete its data. This action is irr...
  - google_search: Search Google and get structured results (organic links, snippets, knowledge pan...
  - google_jobs: Search Google Jobs and get structured job listing results (title, company, locat...
  - google_news: Search Google News and get structured news article results (title, source, date,...
  - google_shopping: Search Google Shopping and get structured product results (title, price, seller,...
  - google_maps_search: Search Google Maps and get structured place results (name, address, rating, phon...
  - amazon_product: Get structured details for a single Amazon product by its ASIN (title, price, ra...
  - amazon_search: Search Amazon by keyword and get structured product results (title, ASIN, price,...
  - amazon_offers: Get all seller offers for an Amazon product by ASIN (seller name, price, conditi...
  - walmart_search: Search Walmart by keyword and get structured product results (title, price, rati...
  - walmart_product: Get structured details for a single Walmart product by its product ID (title, pr...
  - walmart_category: Browse products in a Walmart category by its category ID. Returns parsed JSON wi...
  - walmart_review: Get structured customer reviews for a Walmart product by its product ID (rating,...
  - ebay_search: Search eBay by keyword and get structured listing results (title, price, conditi...
  - ebay_product: Get structured details for a single eBay listing by its item ID (title, price, c...
  - redfin_for_sale: Get structured data for a Redfin property listing that is for sale (price, beds,...
  - redfin_for_rent: Get structured data for a Redfin rental property listing (rent price, beds, bath...
  - redfin_search: Search Redfin property listings and get structured results (address, price, beds...
  - redfin_agent: Get structured data for a Redfin real estate agent profile (name, brokerage, rat...

============================================================
Query: Look up the Amazon product page for ASIN B0FFMRH228
============================================================


Agent response:
Here is the information for the Amazon product with ASIN B0FFMRH228:

### Product Details
- **Name:** Vivimeng USA Sweatshirt Hoodie for Men American Flag Sweater Women Casual Long Sleeve Pullover Unisex USA Sweater Hooded
- **Brand:** [Visit the Vivimeng Store](https://www.amazon.com/stores/DailyNewLook/page/E3CC6E5E-0DB5-4E66-B8B9-9E585AB6B159?lp_asin=B0FS1J95KY&ref_=ast_bln&store_ref=bl_ast_dp_brandlogo_sto)
- **Price:** $24.99
- **Department:** Womens
- **Date First Available:** September 22, 2025
- **Best Sellers Rank:**
  - #478,166 in Clothing, Shoes & Jewelry
  - #540 in Women's Novelty Hoodies
  - #7,094 in Men's Novelty Hoodies
  - #304,756 in Women's Fashion
- **Customer Reviews:** 3.8 out of 5 stars (33 ratings)
- **Availability:** Only 2 left in stock - order soon.
- **Ships From:** Amazon
- **Sold By:** [Pinrong-](https://www.amazon.com/gp/help/seller/at-a-glance.html/ref=dp_merchant_link?ie=UTF8&seller=AEPY0S5X9FKQY&asin=B0FFMRH228&ref_=dp_merchant_link&isAmazonFulfilled=1)

### Description
- **Fabric Type:** 100% Polyester
- **Care Instructions:** Hand Wash Only
- **Origin:** Imported
- **Closure Type:** Pull On
- **Features:**
  - Show your patriotic spirit with the American Flag Sweater Hoodie, featuring a striking USA print.
  - Crafted from high-quality materials for exceptional comfort and durability.
  - Relaxed fit with a kangaroo pocket, easy to pair with jeans, shorts, or skirts.
  - Versatile for any occasion, ideal for layering or wearing on its own.

### Images
![Product Image](https://m.media-amazon.com/images/I/41Aj3isz70L._AC_.jpg)

### Reviews
- **5 Stars:** "Patriotic Sweater" - My dad loves the color RED and is very patriotic. He needs some more sweaters without holes. This is a nice bright red, soft, and big enough for him to wear.
- **2 Stars:** "Not impressed. Will not buy again." - These sweatshirts do not run true to size. Run very small.
- **4 Stars:** "Just found by searching" - It does shrink if you put it in the dryer.

For more details, you can visit the [Amazon product page](https://www.amazon.com/dp/B0FFMRH228).
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.scraperapi.com/integrations/llm-integrations/llamaindex-integration.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
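A minimal sketch of building such a request URL with the standard library (the question text below is illustrative):

```python
from urllib.parse import urlencode

BASE = "https://docs.scraperapi.com/integrations/llm-integrations/llamaindex-integration.md"
question = "Which ScraperAPI tools does the LlamaIndex integration expose?"

# Append the question as a URL-encoded `ask` query parameter.
ask_url = f"{BASE}?{urlencode({'ask': question})}"
# An HTTP GET on ask_url returns a direct answer plus relevant excerpts.
```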
