LLM Output Formats

Training LLMs well requires large amounts of high-quality, unbiased data. Plenty of relevant public data exists, but it is often too noisy and too large to use as-is. Our solution gathers data at scale and cleans it by removing irrelevant or duplicate content. The result is a structured response, in plain text or Markdown, that can be used to train LLMs effectively. To enable it, add the parameter output_format=text or output_format=markdown to your request. In the examples below, replace API_KEY with your API key and OUTPUT_FORMAT with either text or markdown:

  • API REQUEST

curl --request GET \
  --url 'https://api.scraperapi.com?api_key=API_KEY&output_format=OUTPUT_FORMAT&url=https://www.amazon.com/dp/B07V1PHM66'
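
The same GET request can be assembled programmatically. The following is a minimal Python sketch using only the standard library, not an official client. The endpoint and the parameter names (api_key, output_format, url) come from the curl example above; the helper names build_request_url and fetch, the 70-second timeout, and the UTF-8 decoding are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameter names are taken from the curl example above.
API_ENDPOINT = "https://api.scraperapi.com"
# The two output_format values named in this section.
SUPPORTED_FORMATS = ("text", "markdown")


def build_request_url(api_key: str, target_url: str, output_format: str) -> str:
    """Return the GET URL for a request with the given output_format.

    Hypothetical helper: it only assembles the query string.
    """
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"output_format must be one of {SUPPORTED_FORMATS}")
    # urlencode percent-encodes the target URL so that its own characters
    # cannot be mistaken for parameters of the API request.
    query = urlencode(
        {"api_key": api_key, "output_format": output_format, "url": target_url}
    )
    return f"{API_ENDPOINT}?{query}"


def fetch(api_key: str, target_url: str, output_format: str, timeout: float = 70.0) -> str:
    """Send the request and return the response body as a string.

    Assumptions: the body is UTF-8 text, and 70 seconds is a reasonable
    timeout. Neither is stated in this section.
    """
    request_url = build_request_url(api_key, target_url, output_format)
    with urlopen(request_url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch("API_KEY", "https://www.amazon.com/dp/B07V1PHM66", "markdown") would request the Markdown version of the page, given a valid key in place of API_KEY. Rejecting unknown format values before sending is a design choice of this sketch, not documented API behavior.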

  • ASYNC REQUEST

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
        "apiKey": "API_KEY",
        "url": "https://www.amazon.com/dp/B07V1PHM66",
        "apiParams": {
          "output_format": "OUTPUT_FORMAT"
        }
      }' \
  "https://async.scraperapi.com/jobs"
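
The async job submission can be sketched the same way. This is a minimal Python example using only the standard library, under stated assumptions: the endpoint and the JSON fields (apiKey, url, apiParams.output_format) are copied from the curl example above, while the helper names build_job_payload and submit_job, the 30-second timeout, and the expectation that the service replies with a JSON body are assumptions not confirmed by this section.

```python
import json
from urllib.request import Request, urlopen

# Endpoint taken from the curl example above.
ASYNC_ENDPOINT = "https://async.scraperapi.com/jobs"
# The two output_format values named in this section.
SUPPORTED_FORMATS = ("text", "markdown")


def build_job_payload(api_key: str, target_url: str, output_format: str) -> dict:
    """Return the JSON payload for an async job (hypothetical helper)."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"output_format must be one of {SUPPORTED_FORMATS}")
    # Same structure as the -d body in the curl example:
    # output_format is nested under apiParams.
    return {
        "apiKey": api_key,
        "url": target_url,
        "apiParams": {"output_format": output_format},
    }


def submit_job(api_key: str, target_url: str, output_format: str, timeout: float = 30.0) -> dict:
    """POST the payload and return the parsed reply.

    Assumptions: the reply is a JSON object encoded as UTF-8, and
    30 seconds is a reasonable timeout.
    """
    body = json.dumps(build_job_payload(api_key, target_url, output_format)).encode("utf-8")
    request = Request(
        ASYNC_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping payload construction separate from the network call makes it possible to inspect or log the exact body before any job is submitted.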

  • PROXY MODE - COMING SOON!

Sample Response

Markdown
Text
