Mistral PHP Client

Hugging Face datasets

Search

The Search API enables full-text search within a Hugging Face dataset repository. This is particularly useful when looking for specific samples (e.g., rows containing certain keywords) across large datasets.

Tip

Use this method to query specific terms (e.g., keywords like "love", "error", "classification") within a dataset split, reducing the need to process the entire dataset manually.

Code

use Partitech\PhpMistral\Clients\HuggingFace\HuggingFaceDatasetClient;
use Partitech\PhpMistral\MistralClientException;

$apiKey = getenv('HF_TOKEN');  // Hugging Face API token

$client = new HuggingFaceDatasetClient(apiKey: (string) $apiKey);

try {
    // Perform a search query for the keyword 'love' in the 'cornell-movie-review-data/rotten_tomatoes' dataset
    $searchResult = $client->search(
        dataset: 'cornell-movie-review-data/rotten_tomatoes', // Dataset name
        split: 'train',                                       // Dataset split (e.g., train, test)
        config: 'default',                                    // Dataset config (if multi-config)
        query: 'love'                                         // Search query (keyword)
    );

    print_r($searchResult);  // Output search results

} catch (MistralClientException $e) {
    echo $e->getMessage(), PHP_EOL;  // Report the error (e.g., dataset not searchable)
}

Result

Array
(
    [features] => Array
        (
            [0] => Array ( [feature_idx] => 0 [name] => text [type] => Array ( [dtype] => string [_type] => Value ) )
            [1] => Array ( [feature_idx] => 1 [name] => label [type] => Array ( [names] => Array ( [0] => neg [1] => pos ) [_type] => ClassLabel ) )
        )
    [rows] => Array
        (
            [0] => Array ( [row_idx] => 6248 [row] => Array ( [text] => who needs love like this ? [label] => 0 ) )
            [1] => Array ( [row_idx] => 1015 [row] => Array ( [text] => if you love motown music , you'll love this documentary . [label] => 1 ) )
            [2] => Array ( [row_idx] => 1042 [row] => Array ( [text] => it's a lovely film with lovely performances by buy and accorsi . [label] => 1 ) )
        )
    [num_rows_total] => 234
    [num_rows_per_page] => 100
    [partial] => 
)

features: The dataset schema (columns and types).
rows: Matching rows with:
- row_idx: Global row index in the dataset.
- row: Data fields (e.g., text, label).
num_rows_total: Total number of matching rows for the query.
num_rows_per_page: Number of rows returned per page (pagination).
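partial: Boolean indicating whether the result set is partial (true = partial, false = full). print_r prints true as 1 and false as an empty value, which is why the field appears blank in the output above.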
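
As a minimal sketch of reading this structure, the snippet below walks a hard-coded array shaped like the output above (so it runs without network access) and resolves each numeric label to its class name through the names list in features. It also works out how many pages the 234 matches span at 100 rows per page. The helper name labelNames is hypothetical and not part of the client.

```php
<?php
// Sample array mirroring the documented search result (only the three rows shown above).
$searchResult = [
    'features' => [
        ['feature_idx' => 0, 'name' => 'text', 'type' => ['dtype' => 'string', '_type' => 'Value']],
        ['feature_idx' => 1, 'name' => 'label', 'type' => ['names' => ['neg', 'pos'], '_type' => 'ClassLabel']],
    ],
    'rows' => [
        ['row_idx' => 6248, 'row' => ['text' => 'who needs love like this ?', 'label' => 0]],
        ['row_idx' => 1015, 'row' => ['text' => "if you love motown music , you'll love this documentary .", 'label' => 1]],
        ['row_idx' => 1042, 'row' => ['text' => "it's a lovely film with lovely performances by buy and accorsi .", 'label' => 1]],
    ],
    'num_rows_total' => 234,
    'num_rows_per_page' => 100,
    'partial' => false,
];

// Hypothetical helper: return the class names of a ClassLabel column, or [] if there is none.
function labelNames(array $features, string $column): array
{
    foreach ($features as $feature) {
        if ($feature['name'] === $column && ($feature['type']['_type'] ?? '') === 'ClassLabel') {
            return $feature['type']['names'];
        }
    }
    return [];
}

$names = labelNames($searchResult['features'], 'label');

// Build one readable line per matching row: index, label name, text.
$lines = [];
foreach ($searchResult['rows'] as $item) {
    $label = $names[$item['row']['label']] ?? 'unknown';
    $lines[] = sprintf('#%d [%s] %s', $item['row_idx'], $label, $item['row']['text']);
}
echo implode(PHP_EOL, $lines), PHP_EOL;

// Number of pages needed to cover every match at the reported page size.
$pages = (int) ceil($searchResult['num_rows_total'] / $searchResult['num_rows_per_page']);
echo "Pages: {$pages}", PHP_EOL;
```

How to request the later pages is not shown on this page, so the sketch stops at the arithmetic.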

Parameters

| Parameter | Description |
|-----------|-------------|
| `dataset` | The dataset name (e.g., `user/dataset`). |
| `split`   | The dataset split (e.g., `train`, `test`, `validation`). |
| `config`  | Dataset configuration (for multi-config datasets). |
| `query`   | The text query to search for within dataset rows. |

Use Cases

Keyword filtering: Find rows containing specific words or phrases.
Data exploration: Quickly sample relevant entries in large datasets.
Debugging: Search for specific patterns or errors within datasets.
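
For keyword filtering, note that the sample result for the query 'love' also includes a row that only contains "lovely". If whole-word matches are needed, the returned rows can be filtered locally. The sketch below does that with a word-boundary regular expression on rows shaped like the rows entry of the result; filterWholeWord is a hypothetical helper, not part of the client.

```php
<?php
// Rows shaped like the 'rows' entry of the documented search result.
$rows = [
    ['row_idx' => 6248, 'row' => ['text' => 'who needs love like this ?', 'label' => 0]],
    ['row_idx' => 1015, 'row' => ['text' => "if you love motown music , you'll love this documentary .", 'label' => 1]],
    ['row_idx' => 1042, 'row' => ['text' => "it's a lovely film with lovely performances by buy and accorsi .", 'label' => 1]],
];

// Hypothetical helper: keep only rows whose $column contains $word as a whole word.
function filterWholeWord(array $rows, string $word, string $column = 'text'): array
{
    // \b anchors the match at word boundaries; preg_quote escapes regex metacharacters.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/iu';

    return array_values(array_filter(
        $rows,
        fn (array $item): bool => preg_match($pattern, (string) ($item['row'][$column] ?? '')) === 1
    ));
}

$exact = filterWholeWord($rows, 'love');

foreach ($exact as $item) {
    echo $item['row_idx'], ': ', $item['row']['text'], PHP_EOL;
}
```

The row containing only "lovely" is dropped, while the two rows with the standalone word "love" are kept.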

Common Pitfalls

Warning

Search works only for datasets whose dataset viewer has search enabled. Use the Is Valid API to check before querying.

Keep queries simple: plain keywords work best.
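
The client method for the Is Valid API is not shown on this page, so the sketch below works on an already-decoded response instead of making a call. It assumes the response is an array with a boolean search key, which is how the Hugging Face dataset viewer's is-valid endpoint is understood to respond; treat that shape as an assumption and verify it against the API. The helper supportsSearch and the example arrays are hypothetical.

```php
<?php
// Hypothetical helper: decide whether search can be used, given a decoded Is Valid response.
// Assumption: the response carries a boolean 'search' key; a missing key counts as "not supported".
function supportsSearch(array $isValidResponse): bool
{
    return ($isValidResponse['search'] ?? false) === true;
}

// Example responses, made up for illustration (not real API output).
$searchable    = ['preview' => true, 'viewer' => true, 'search' => true];
$notSearchable = ['preview' => true, 'viewer' => false, 'search' => false];

var_dump(supportsSearch($searchable));
var_dump(supportsSearch($notSearchable));
var_dump(supportsSearch([]));
```

Guarding the search call with a check like this avoids relying on the exception path for datasets that are simply not searchable.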

Tip

Combine this with the Rows or First Row APIs to explore a dataset without processing it in full.
