Mistral PHP Client

Hugging Face datasets

Splits

The Splits API retrieves the list of available splits (e.g., train, test, validation) for a given Hugging Face dataset repository. This is useful to understand the dataset partitioning before processing or analyzing it.

Tip

Some datasets contain multiple configurations and splits. This method helps you discover how the dataset is structured.

Code

use Partitech\PhpMistral\Clients\HuggingFace\HuggingFaceDatasetClient;
use Partitech\PhpMistral\MistralClientException;

$apiKey = getenv('HF_TOKEN');  // Hugging Face API token

$client = new HuggingFaceDatasetClient(apiKey: (string) $apiKey);

try {
    // Get the splits for the 'google/civil_comments' dataset
    $splits = $client->splits('google/civil_comments');

    print_r($splits);  // Output dataset splits

} catch (MistralClientException $e) {
    print_r($e);
}

Result

Array
(
    [splits] => Array
        (
            [0] => Array ( [dataset] => google/civil_comments [config] => default [split] => train )
            [1] => Array ( [dataset] => google/civil_comments [config] => default [split] => validation )
            [2] => Array ( [dataset] => google/civil_comments [config] => default [split] => test )
        )
    [pending] => Array ( )
    [failed] => Array ( )
)

splits: Lists all available splits for each dataset configuration.
pending: Lists splits that are being processed or pending generation.
failed: Lists splits that failed during processing (e.g., incompatible file formats).

Returned Fields

Field	Description
`dataset`	Name of the dataset.
`config`	Configuration name (if the dataset has multiple configs).
`split`	Name of the split (e.g., `train`, `test`, `validation`).

Use Cases

Dataset exploration: Discover the available splits and configurations before loading data.
Pipeline automation: Dynamically load or process different splits (e.g., only train).
Error handling: Detect pending or failed splits to avoid processing issues.

Common Pitfalls

Warning

Some datasets may not have standard splits (e.g., only train or custom names). Always check the splits before assuming their names.

Pending or failed splits indicate that viewer features (like search or filtering) may not work properly for those splits.

Tip

Use this method before invoking Rows, Search, or First Row APIs to ensure you pass valid split names.

Prompt FLow

¶Splits

¶Code

¶Result

¶Returned Fields

¶Use Cases