Llama.cpp

Completion API

This section demonstrates how to use the Completion API with Llama.cpp.

The Completion API generates text continuations based on a given prompt. It supports both non-streaming and streaming modes.


Completion without streaming

This example shows how to perform a basic text completion request in non-streaming mode. The response will be returned once the entire generation is complete.

use Partitech\PhpMistral\Clients\LlamaCpp\LlamaCppClient;

$llamacppUrl = getenv('LLAMACPP_URL');
$llamacppApiKey = getenv('LLAMACPP_API_KEY');

$client = new LlamaCppClient(apiKey: $llamacppApiKey, url: $llamacppUrl);

$params = [
    'temperature' => 0.7,   // Controls randomness (higher = more creative)
    'top_p' => 1,           // Nucleus sampling (1 = consider all tokens)
    'max_tokens' => 1000,   // Maximum number of tokens to generate
    'seed' => 15,           // Set a seed for reproducibility
];

try {
    $chatResponse = $client->completion(
        prompt: 'The ingredients that make up dijon mayonnaise are ',
        params: $params
    );

    print_r($chatResponse->getMessage()); // Generated text
    print_r($chatResponse->getUsage());   // Token usage statistics
    
} catch (\Throwable $e) {
    echo $e->getMessage();
    exit(1);
}
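The example above reads its configuration with `getenv()`, which returns `false` when a variable is not set. A minimal sketch of a guard that fails early with a clear message follows; the helper name `requireEnv` is hypothetical and not part of the library.

```php
<?php

/**
 * Hypothetical helper (not part of PhpMistral): return the value of an
 * environment variable, or throw if it is unset or empty.
 */
function requireEnv(string $name): string
{
    $value = getenv($name);
    if ($value === false || $value === '') {
        throw new RuntimeException("Environment variable {$name} is not set.");
    }
    return $value;
}
```

It could then replace the direct calls, for example `$llamacppUrl = requireEnv('LLAMACPP_URL');`, so that a missing setting is reported before any request is made.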

Completion with streaming

This example demonstrates how to use streaming mode. The generated text is sent incrementally (in chunks) as soon as it is available, making it ideal for real-time applications like chatbots.

use Partitech\PhpMistral\Clients\LlamaCpp\LlamaCppClient;

$llamacppUrl = getenv('LLAMACPP_URL');
$llamacppApiKey = getenv('LLAMACPP_API_KEY');

$client = new LlamaCppClient(apiKey: $llamacppApiKey, url: $llamacppUrl);

$params = [
    'temperature' => 0.7,
    'top_p' => 1,
    'max_tokens' => 1000,
    'seed' => 15,
];

try {
    foreach ($client->completion(
        prompt: 'Explain step by step how to make dijon mayonnaise ',
        params: $params,
        stream: true
    ) as $chunk) {
        echo $chunk->getChunk(); // Output each chunk as it arrives
    }
} catch (\Throwable $e) {
    echo $e->getMessage();
    exit(1);
}
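When the full text is also needed after streaming (for logging or caching, say), the chunks can be accumulated while they are printed. The sketch below assumes only that each streamed item exposes `getChunk()`, as in the example above. `collectStream` and `fakeStream` are hypothetical names; the stand-in generator replaces the real client so the snippet runs on its own.

```php
<?php

/**
 * Hypothetical helper: optionally echo each chunk as it arrives,
 * and return the concatenated text once the stream ends.
 *
 * @param iterable<object> $stream Items exposing getChunk()
 */
function collectStream(iterable $stream, bool $output = true): string
{
    $text = '';
    foreach ($stream as $chunk) {
        $piece = (string) $chunk->getChunk();
        if ($output) {
            echo $piece;
            flush(); // push the piece to the terminal or browser right away
        }
        $text .= $piece;
    }
    return $text;
}

// Stand-in for $client->completion(..., stream: true), for demonstration only.
function fakeStream(array $pieces): Generator
{
    foreach ($pieces as $piece) {
        yield new class($piece) {
            public function __construct(private string $piece) {}
            public function getChunk(): string { return $this->piece; }
        };
    }
}

$full = collectStream(fakeStream(['1. Gather ', 'your ', 'ingredients.']), output: false);
```

With the real client, the first argument would be the iterable returned by `$client->completion(...)` with `stream: true`.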

Common Parameters

| Parameter | Type | Description |
|---|---|---|
| temperature | float | Controls the randomness of the output. Lower values make the output more deterministic. |
| top_p | float | Nucleus sampling: restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches p (1 = consider all tokens). |
| max_tokens | int | Maximum number of tokens to generate. |
| seed | int | Fixes the random seed for reproducibility. |
| stream | bool | Enables streaming mode, returning the output in chunks. Passed as a named argument to completion(), not inside $params. |
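To make `temperature` and `top_p` concrete, the sketch below applies both to a toy distribution in plain PHP. It illustrates the general sampling technique, not Llama.cpp's actual implementation; the function names, tokens, and logits are invented for illustration.

```php
<?php

/** Softmax over logits divided by the temperature (which must be > 0). */
function softmaxWithTemperature(array $logits, float $temperature): array
{
    $scaled = array_map(fn (float $l): float => $l / $temperature, $logits);
    $max = max($scaled); // subtract the max for numerical stability
    $exp = array_map(fn (float $s): float => exp($s - $max), $scaled);
    $sum = array_sum($exp);
    return array_map(fn (float $e): float => $e / $sum, $exp);
}

/**
 * Keep the smallest set of most likely tokens whose cumulative
 * probability reaches $topP, then renormalise the survivors.
 */
function nucleusFilter(array $probs, float $topP): array
{
    arsort($probs); // highest probability first, keys preserved
    $kept = [];
    $cumulative = 0.0;
    foreach ($probs as $token => $p) {
        $kept[$token] = $p;
        $cumulative += $p;
        if ($cumulative >= $topP) {
            break;
        }
    }
    $sum = array_sum($kept);
    return array_map(fn (float $p): float => $p / $sum, $kept);
}

// Toy logits for three candidate next tokens (made-up values).
$logits = ['mustard' => 2.0, 'oil' => 1.0, 'sugar' => -1.0];

$cool = softmaxWithTemperature($logits, 0.5); // sharper: favours 'mustard'
$warm = softmaxWithTemperature($logits, 1.5); // flatter: more spread out
$nucleus = nucleusFilter($warm, 0.9);         // drops the unlikely tail
```

A lower temperature concentrates probability on the most likely token, while a `top_p` below 1 removes the least likely tokens from consideration before sampling.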

Example Outputs

  • Non-Streaming:
The ingredients that make up dijon mayonnaise are egg yolks, Dijon mustard, lemon juice, vinegar, and vegetable oil.
  • Streaming:
Explain step by step how to make dijon mayonnaise 
1. Gather your ingredients: egg yolks, Dijon mustard, lemon juice, vinegar, and vegetable oil.
2. In a bowl, whisk together the egg yolks and Dijon mustard until combined.
3. Slowly drizzle in the oil while continuously whisking to create an emulsion...