LLM Output Formats
Learn how to use ScraperAPI’s output parameter in PHP to generate clean, structured text and markdown responses ideal for training large language models.
Training LLMs properly requires a large amount of high-quality, unbiased data. Plenty of relevant public data exists, but it is often too noisy and too large to use as-is. Our solution gathers data at scale and cleans it by removing irrelevant or duplicate content. The result is a structured response that can be used to train LLMs effectively. To enable it, add the parameter output_format=text or output_format=markdown to your request. Here are some examples:
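As a minimal sketch of how the parameter might be attached, the helper below assembles the request URL with PHP's http_build_query(). The name buildRequestUrl() is hypothetical and not part of any ScraperAPI SDK. Percent-encoding the target URL is an assumption about good practice rather than a requirement stated on this page; it keeps the target's own &hl=en from being read as a separate API parameter.

```php
<?php
/**
 * Build a request URL that asks for an LLM-friendly output format.
 * buildRequestUrl() is a hypothetical helper, not part of any ScraperAPI SDK.
 */
function buildRequestUrl(string $apiKey, string $targetUrl, string $outputFormat): string
{
    // Only the two values named on this page are accepted here.
    $allowed = ['text', 'markdown'];
    if (!in_array($outputFormat, $allowed, true)) {
        throw new InvalidArgumentException("output_format must be 'text' or 'markdown'");
    }

    // http_build_query() percent-encodes every value, so the "&hl=en" inside
    // the target URL stays part of the "url" parameter.
    $query = http_build_query([
        'api_key'       => $apiKey,
        'output_format' => $outputFormat,
        'url'           => $targetUrl,
    ], '', '&');

    return 'http://api.scraperapi.com?' . $query;
}

// API_KEY is a placeholder, as in the examples on this page.
$targetUrl = 'https://www.google.com/search?q=Carglass+GmbH+Halle+%28Saale%29+%28Stadtbezirk+Nord%29&hl=en';
$requestUrl = buildRequestUrl('API_KEY', $targetUrl, 'markdown');
echo $requestUrl, PHP_EOL;
```

The resulting string can be passed to curl_setopt($ch, CURLOPT_URL, $requestUrl) in place of a hand-written URL.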
API REQUEST
<?php
$url = "http://api.scraperapi.com?api_key=API_KEY&output_format=OUTPUT_FORMAT&url=https://www.google.com/search?q=Carglass+GmbH+Halle+%28Saale%29+%28Stadtbezirk+Nord%29&hl=en";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
ASYNC REQUEST
<?php
$url = "https://async.scraperapi.com/jobs";
$headers = [
"Content-Type: application/json"
];
$data = [
"apiKey" => "API_KEY",
"apiParams" => [
"output_format" => "OUTPUT_FORMAT"
],
"url" => "https://www.google.com/search?q=Carglass+GmbH+Halle+%28Saale%29+%28Stadtbezirk+Nord%29&hl=en"
];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($ch);
echo $response;
curl_close($ch);
?>
PROXY MODE - COMING SOON!
Sample Response
Markdown:
Text:
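The request examples on this page print whatever curl_exec() returns, which is false when the transfer fails. As a hedged sketch, the helpers below check the result before it is used as training data. checkResponse() and fetchFormatted() are hypothetical names, the code assumes PHP 8.0 or later for the string|false type, and treating only HTTP 200 as success is an assumption rather than documented ScraperAPI behavior.

```php
<?php
/**
 * Validate the outcome of a cURL call and return the body.
 * Hypothetical helper; the "200 means success" rule is an assumption.
 */
function checkResponse(string|false $body, int $httpStatus, string $curlError): string
{
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . $curlError);
    }
    if ($httpStatus !== 200) {
        throw new RuntimeException('Unexpected HTTP status: ' . $httpStatus);
    }
    if (trim($body) === '') {
        throw new RuntimeException('Empty response body');
    }
    return $body;
}

/**
 * Fetch a text or markdown response and fail loudly on errors.
 * $requestUrl is expected to already carry api_key, output_format and url.
 */
function fetchFormatted(string $requestUrl): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $requestUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    $error = curl_error($ch);
    curl_close($ch);

    return checkResponse($body, $status, $error);
}
```

A call such as $markdown = fetchFormatted($url); then either yields a non-empty string or throws a RuntimeException that the caller can log or retry.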

