Llama.cpp

Metrics

Retrieve server metrics from the Llama.cpp backend. This endpoint provides insight into server status, resource usage, and performance statistics.


Code

use Partitech\PhpMistral\Clients\LlamaCpp\LlamaCppClient;

// Composer autoloader (adjust the path to your project layout)
require_once __DIR__ . '/vendor/autoload.php';

// Read the server URL and API key from the environment
$llamacppUrl = getenv('LLAMACPP_URL');
$llamacppApiKey = getenv('LLAMACPP_API_KEY');

$client = new LlamaCppClient(apiKey: $llamacppApiKey, url: $llamacppUrl);

try {
    $response = $client->metrics();
    print_r($response); // Display the full metrics data
} catch (\Throwable $e) {
    // Report the failure and exit with a non-zero status
    echo $e->getMessage() . PHP_EOL;
    exit(1);
}
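
Run the script after exporting LLAMACPP_URL (and LLAMACPP_API_KEY if your server requires authentication); metrics() returns the metrics payload as an array, which print_r() dumps in the format shown below.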

Example Output

Array
(
    [model_loaded] => true
    [model_name] => llama-2-7b
    [threads] => 8
    [ctx_size] => 4096
    [gpu_layers] => 35
    [embeddings] => true
    [completion_tokens] => 1024
    [embedding_tokens] => 512
    [total_requests] => 123
    [memory_usage_mb] => 5120
)
Field              Description
model_loaded       Whether a model is currently loaded on the server.
model_name         The name of the loaded model.
threads            Number of threads allocated for inference.
ctx_size           Maximum context window size (in tokens).
gpu_layers         Number of layers offloaded to the GPU (if applicable).
embeddings         Whether embedding generation is supported.
completion_tokens  Total number of tokens processed for completions.
embedding_tokens   Total number of tokens processed for embeddings.
total_requests     Total number of API requests handled by the server.
memory_usage_mb    Approximate memory usage in megabytes.
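
The returned array can be used directly for health checks. The snippet below is a minimal sketch that assumes the array shape shown above; actual field names may vary with your server build and library version.

$metrics = $client->metrics();

// Fail fast if no model is loaded (field names assumed from the example above)
if (empty($metrics['model_loaded'])) {
    throw new RuntimeException('No model is loaded on the llama.cpp server.');
}

printf(
    "Model %s: ctx=%d tokens, %d GPU layers, ~%d MB RAM\n",
    $metrics['model_name'],
    $metrics['ctx_size'],
    $metrics['gpu_layers'],
    $metrics['memory_usage_mb']
);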

Use Cases

  • Monitoring: Track resource consumption and performance trends over time (see the polling sketch after this list).
  • Scaling decisions: Adjust infrastructure based on token throughput or memory usage.
  • Debugging: Validate that the correct model is loaded and server parameters are configured as expected.
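
To illustrate the monitoring use case, the loop below polls metrics() at a fixed interval and appends a few fields to a log file. The 60-second interval, log path, and selected fields are illustrative assumptions, not library defaults.

$logFile = '/tmp/llamacpp-metrics.log';

while (true) {
    try {
        $metrics = $client->metrics();
        // Record a timestamped line with a few illustrative fields
        $line = sprintf(
            "%s requests=%d completion_tokens=%d memory_mb=%d\n",
            date('c'),
            $metrics['total_requests'] ?? 0,
            $metrics['completion_tokens'] ?? 0,
            $metrics['memory_usage_mb'] ?? 0
        );
        file_put_contents($logFile, $line, FILE_APPEND);
    } catch (\Throwable $e) {
        // Log the failure but keep polling
        file_put_contents($logFile, date('c') . ' ERROR ' . $e->getMessage() . "\n", FILE_APPEND);
    }
    sleep(60); // Assumed polling interval
}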