# Llama.cpp

## Completion API
This section demonstrates how to use the Completion API with Llama.cpp.
The Completion API generates text continuations based on a given prompt. It supports both non-streaming and streaming modes.
> **Tip:** Use streaming mode when you want to start processing or displaying the generated text as soon as possible, especially for long outputs or interactive applications.
### Completion without streaming
This example shows how to perform a basic text completion request in non-streaming mode. The response will be returned once the entire generation is complete.
```php
use Partitech\PhpMistral\Clients\LlamaCpp\LlamaCppClient;

$llamacppUrl    = getenv('LLAMACPP_URL');
$llamacppApiKey = getenv('LLAMACPP_API_KEY');

$client = new LlamaCppClient(apiKey: $llamacppApiKey, url: $llamacppUrl);

$params = [
    'temperature' => 0.7,  // Controls randomness (higher = more creative)
    'top_p'       => 1,    // Nucleus sampling (1 = consider all tokens)
    'max_tokens'  => 1000, // Maximum number of tokens to generate
    'seed'        => 15,   // Set a seed for reproducibility
];

try {
    $chatResponse = $client->completion(
        prompt: 'The ingredients that make up dijon mayonnaise are ',
        params: $params
    );
    print_r($chatResponse->getMessage()); // Generated text
    print_r($chatResponse->getUsage());   // Token usage statistics
} catch (\Throwable $e) {
    echo $e->getMessage();
    exit(1);
}
```
> **Note:** The `seed` parameter allows you to reproduce the same output for the same input, which is useful for debugging or testing scenarios.
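As a quick illustration, here is a minimal sketch of such a reproducibility check, reusing `$client` and `$params` from the example above. It assumes the server applies `seed` deterministically, which depends on your Llama.cpp build and configuration.

```php
// Reproducibility sketch: two identical requests with the same fixed seed.
// Assumes the server honors the 'seed' parameter deterministically.
$prompt = 'The ingredients that make up dijon mayonnaise are ';

$first  = $client->completion(prompt: $prompt, params: $params);
$second = $client->completion(prompt: $prompt, params: $params);

// With a fixed seed, both runs should return identical text.
var_dump($first->getMessage() === $second->getMessage()); // bool(true)
```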
### Completion with streaming
This example demonstrates how to use streaming mode. The generated text is sent incrementally (in chunks) as soon as it is available, making it ideal for real-time applications like chatbots.
```php
use Partitech\PhpMistral\Clients\LlamaCpp\LlamaCppClient;

$llamacppUrl    = getenv('LLAMACPP_URL');
$llamacppApiKey = getenv('LLAMACPP_API_KEY');

$client = new LlamaCppClient(apiKey: $llamacppApiKey, url: $llamacppUrl);

$params = [
    'temperature' => 0.7,
    'top_p'       => 1,
    'max_tokens'  => 1000,
    'seed'        => 15,
];

try {
    foreach ($client->completion(
        prompt: 'Explain step by step how to make dijon mayonnaise ',
        params: $params,
        stream: true
    ) as $chunk) {
        echo $chunk->getChunk(); // Output each chunk as it arrives
    }
} catch (\Throwable $e) {
    echo $e->getMessage();
    exit(1);
}
```
> **Tip:** You can buffer or process each chunk individually, depending on your application's needs (e.g., displaying it progressively on a web page).
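For instance, here is a minimal sketch of that pattern, reusing `$client` and `$params` from the streaming example above. The `flush()` call is an assumption about your setup: whether it pushes output immediately depends on your SAPI and output-buffering configuration.

```php
// Stream progressively while also keeping the full text.
$buffer = '';
foreach ($client->completion(
    prompt: 'Explain step by step how to make dijon mayonnaise ',
    params: $params,
    stream: true
) as $chunk) {
    $piece   = $chunk->getChunk();
    $buffer .= $piece; // accumulate the complete output
    echo $piece;       // display each chunk as it arrives
    flush();           // push output immediately (SAPI-dependent)
}

// $buffer now contains the entire generated text.
```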
### Common Parameters
| Parameter | Type | Description |
|---|---|---|
| `temperature` | float | Controls the randomness of the output. Lower values make the output more deterministic. |
| `top_p` | float | Enables nucleus sampling; only tokens within the top cumulative probability `p` are considered. |
| `max_tokens` | int | Maximum number of tokens to generate. |
| `seed` | int | Fixes the random seed for reproducibility. |
| `stream` | bool | Enables streaming mode, returning the output in chunks. |
> **Warning:** Setting `max_tokens` too high may lead to long generation times or higher resource consumption. Adjust it according to your use case.
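As a rough guide, here is a sketch of a more conservative parameter set. The specific values are illustrative assumptions, not recommendations from the library; tune them for your model and workload.

```php
// More deterministic, bounded-cost settings: low temperature, a tighter
// nucleus, a capped token budget, and a fixed seed (values are examples).
$conservativeParams = [
    'temperature' => 0.2, // less randomness, more deterministic output
    'top_p'       => 0.9, // sample only from the top 90% probability mass
    'max_tokens'  => 256, // cap generation time and resource usage
    'seed'        => 42,  // reproducible results
];
```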
### Example Outputs
- **Non-streaming:**

  ```text
  The ingredients that make up dijon mayonnaise are egg yolks, Dijon mustard, lemon juice, vinegar, and vegetable oil.
  ```

- **Streaming:**

  ```text
  Explain step by step how to make dijon mayonnaise
  1. Gather your ingredients: egg yolks, Dijon mustard, lemon juice, vinegar, and vegetable oil.
  2. In a bowl, whisk together the egg yolks and Dijon mustard until combined.
  3. Slowly drizzle in the oil while continuously whisking to create an emulsion...
  ```
> **Important:** Ensure your Llama.cpp server is properly configured with the desired model and supports the completion endpoint.