LLM Output Formats

Learn how to use ScraperAPI’s output_format parameter in Python to generate clean, structured text and markdown responses suited to training large language models.

Training LLMs well requires large amounts of high-quality, unbiased data. Plenty of relevant public data exists, but it is often too noisy and too large to use as-is. ScraperAPI addresses this by gathering data at scale and cleaning it, removing irrelevant or duplicate content. The result is a structured response that can be used to train LLMs effectively. To enable it, add output_format=text or output_format=markdown to the request. Here are some examples:

  • API REQUEST

import requests

# Synchronous endpoint: the cleaned page is returned in the response body
url = "https://api.scraperapi.com/"
params = {
    "api_key": "API_KEY",  # replace with your API key
    "output_format": "OUTPUT_FORMAT",  # "text" or "markdown"
    "url": "https://www.google.com/search?q=Carglass+GmbH+Halle+%28Saale%29+%28Stadtbezirk+Nord%29&hl=en"
}
response = requests.get(url, params=params)
print(response.text)
  • ASYNC REQUEST

import requests

# Asynchronous endpoint: submits a scraping job
url = "https://async.scraperapi.com/jobs"
data = {
    "apiKey": "API_KEY",  # replace with your API key
    "apiParams": {
        "output_format": "OUTPUT_FORMAT"  # "text" or "markdown"
    },
    "url": "https://www.google.com/search?q=Carglass+GmbH+Halle+%28Saale%29+%28Stadtbezirk+Nord%29&hl=en"
}
# json= serializes the body and sets Content-Type: application/json
response = requests.post(url, json=data)
print(response.text)
  • PROXY MODE - COMING SOON!
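
The two request shapes above differ only in where output_format sits: as a top-level query parameter for the API request, and inside apiParams for the async request. The following is a minimal sketch of helpers that build both and reject any value other than the two named on this page. The function names, the ALLOWED_FORMATS constant, and the example.com target are illustrative assumptions, not part of ScraperAPI's client or documentation.

```python
from urllib.parse import urlencode

# The two output formats named in the text above.
ALLOWED_FORMATS = {"text", "markdown"}


def _check_format(output_format):
    """Raise ValueError unless output_format is one of the documented values."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"output_format must be one of {sorted(ALLOWED_FORMATS)}, "
            f"got {output_format!r}"
        )


def build_sync_params(api_key, target_url, output_format):
    """Query parameters for a GET to https://api.scraperapi.com/ (hypothetical helper)."""
    _check_format(output_format)
    return {
        "api_key": api_key,
        "output_format": output_format,
        "url": target_url,
    }


def build_async_payload(api_key, target_url, output_format):
    """JSON body for a POST to https://async.scraperapi.com/jobs (hypothetical helper)."""
    _check_format(output_format)
    return {
        "apiKey": api_key,
        "apiParams": {"output_format": output_format},
        "url": target_url,
    }


if __name__ == "__main__":
    # Build the request URL without sending anything over the network.
    params = build_sync_params("API_KEY", "https://example.com/", "markdown")
    print("https://api.scraperapi.com/?" + urlencode(params))
```

The returned dictionaries can be passed as params= to requests.get or as json= to requests.post, mirroring the examples above. Validating the format locally catches a leftover placeholder such as OUTPUT_FORMAT before a request is spent on it.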

Sample Response

Markdown:

Text:
