Llama.cpp Embeddings API

The Embeddings API allows you to convert input text into high-dimensional vector representations (embeddings). These embeddings are useful for various tasks such as semantic search, clustering, recommendation systems, or as input for other machine learning models.

Example: Generating Embeddings

use Partitech\PhpMistral\Clients\LlamaCpp\LlamaCppClient;
use Partitech\PhpMistral\Exceptions\MistralClientException;

$llamacppUrl = getenv('LLAMACPP_URL');
$llamacppApiKey = getenv('LLAMACPP_API_KEY');

$client = new LlamaCppClient(apiKey: $llamacppApiKey, url: $llamacppUrl);

// Define the inputs for which you want embeddings
$inputs = [
    "What is the best French cheese?",
];

try {
    // Request embeddings for the given inputs
    $embeddingsBatchResponse = $client->embeddings($inputs);

    print_r($embeddingsBatchResponse); // Full response, including metadata and embeddings

} catch (MistralClientException $e) {
    echo $e->getMessage();
    exit(1);
}

Example Output

Array
(
    [model] => llama-2-7b
    [object] => list
    [usage] => Array
        (
            [prompt_tokens] => 9
            [total_tokens] => 9
        )
    [data] => Array
        (
            [0] => Array
                (
                    [embedding] => Array
                        (
                            [0] => 0.026056783273816
                            [1] => -0.052360784262419
                            ...
                            [383] => 0.074204355478287
                        )
                    [index] => 0
                    [object] => embedding
                )
        )
)

model: The model used to generate the embeddings.
usage: Token counts for the input (helpful for monitoring usage and cost).
data: One entry per input. Each entry holds the embedding, a list of floating-point numbers representing the input in a high-dimensional vector space, and the index of the input it belongs to.
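
As a minimal sketch, assuming the response is a plain PHP array shaped like the example output above, the vectors can be pulled out of data and aligned with the inputs as follows. The helper name extractEmbeddings and the sample values are hypothetical and are not part of the library.

```php
<?php

/**
 * Return the embedding vectors from a response array, keyed by input index.
 * Hypothetical helper: assumes the array layout shown in the example output.
 *
 * @return array<int, float[]>
 */
function extractEmbeddings(array $response): array
{
    $vectors = [];
    foreach ($response['data'] ?? [] as $item) {
        $vectors[$item['index']] = $item['embedding'];
    }
    ksort($vectors); // keep vectors aligned with the order of the inputs

    return $vectors;
}

// Made-up sample with 3-dimensional vectors, for illustration only.
$sample = [
    'model' => 'example-model',
    'object' => 'list',
    'data' => [
        ['embedding' => [0.1, 0.2, 0.3], 'index' => 1, 'object' => 'embedding'],
        ['embedding' => [0.4, 0.5, 0.6], 'index' => 0, 'object' => 'embedding'],
    ],
];

$vectors = extractEmbeddings($sample);
echo count($vectors), " vectors of ", count($vectors[0]), " dimensions\n";
```

Counting the elements of one vector, as in the last line, is also a quick way to confirm the embedding size a given model produces.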

Important

Embedding vectors typically have between 384 and 4096 dimensions, depending on the model. Always check the model documentation for the exact size.

Batch Embeddings

The embeddings method supports multiple inputs at once. You can pass an array of strings to generate embeddings for each.

$inputs = [
    "What is the best French cheese?",
    "How to make Dijon mayonnaise?",
];

$embeddingsBatchResponse = $client->embeddings($inputs);

Tip

Sending several inputs in one request reduces the number of API calls, which usually improves throughput.
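
For large input sets, one possible approach is to split the inputs into fixed-size chunks and send one request per chunk. The sketch below is illustrative only: embedInBatches is a hypothetical helper, the default batch size of 32 is an arbitrary assumption, and a stand-in closure replaces the real client so the example runs on its own.

```php
<?php

/**
 * Embed $inputs in chunks of $batchSize, calling $embed once per chunk.
 * $embed is any callable that takes a list of strings and returns one
 * vector per string (hypothetical contract, not a library API).
 */
function embedInBatches(callable $embed, array $inputs, int $batchSize = 32): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $vectors = [];
    foreach (array_chunk($inputs, $batchSize) as $chunk) {
        foreach ($embed($chunk) as $vector) {
            $vectors[] = $vector; // preserves the order of the inputs
        }
    }

    return $vectors;
}

// Stand-in embedder for demonstration: counts calls and returns dummy vectors.
$calls = 0;
$fakeEmbed = function (array $texts) use (&$calls): array {
    $calls++;
    return array_map(fn (string $t): array => [(float) strlen($t)], $texts);
};

$vectors = embedInBatches($fakeEmbed, ['a', 'bb', 'ccc', 'dddd', 'eeeee'], 2);
echo count($vectors), " vectors from ", $calls, " calls\n";
```

In real use, the callable would wrap a call to $client->embeddings() and return the vectors found under data in the response. A suitable batch size depends on the server configuration and the model's context size.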

Use Cases

Semantic search: Compare embeddings using cosine similarity to find similar texts.
Clustering: Group similar texts together based on their embeddings.
Recommendation systems: Recommend content based on embedding proximity.
Dimensionality reduction: Visualize data with techniques like PCA or t-SNE.
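
The semantic search case can be sketched as follows: compute the cosine similarity between a query vector and each document vector, then sort by score. The function name cosineSimilarity, the document IDs, and the 3-dimensional vectors are made up for illustration; real embeddings come from the data field of the response.

```php
<?php

/**
 * Cosine similarity between two list-style vectors of equal length.
 */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b) || count($a) === 0) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // similarity is undefined for a zero vector; treat it as no match
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Made-up vectors standing in for real embeddings.
$query = [1.0, 0.0, 0.0];
$documents = [
    'doc-a' => [0.0, 1.0, 0.0],
    'doc-b' => [2.0, 0.0, 0.0],
    'doc-c' => [1.0, 1.0, 0.0],
];

$scores = [];
foreach ($documents as $id => $vector) {
    $scores[$id] = cosineSimilarity($query, $vector);
}
arsort($scores); // highest similarity first

print_r(array_keys($scores)); // doc-b, doc-c, doc-a
```

Cosine similarity ignores vector length, which is why doc-b scores 1.0 even though it is twice as long as the query. For large collections, a vector index or database is usually a better fit than this linear scan.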

Error Handling

The embeddings() method can throw a MistralClientException if:

The server is unreachable.
The model does not support embeddings.
The input is invalid.

Always wrap your calls in a try-catch block.

try {
    $embeddingsBatchResponse = $client->embeddings($inputs);
} catch (MistralClientException $e) {
    // Handle error gracefully
}

Caution

Ensure that your Llama.cpp server is running a model that supports embedding generation. Not all models support this feature.
