Hugging Face datasets

List by author

The List Datasets API retrieves the datasets published by a specific Hugging Face user or organization. The method supports pagination and sorting, and it can return either basic metadata or full dataset details.


Code

use Partitech\PhpMistral\Clients\HuggingFace\HuggingFaceDatasetClient;
use Partitech\PhpMistral\MistralClientException;

$apiKey = getenv('HF_TOKEN');       // Hugging Face API token
$datasetUser = getenv('HF_USER');   // Hugging Face username or organization

$client = new HuggingFaceDatasetClient(apiKey: (string) $apiKey);

try {
    // List datasets by author with detailed metadata
    $datasets = $client->listDatasets(
        author: $datasetUser,      // Author username or organization
        limit: 5,                  // Limit results (pagination)
        sort: 'lastModified',      // Sort by last modification date
        direction: -1,             // Direction: -1 (descending), 1 (ascending)
        full: true                 // Retrieve full dataset metadata (set to false for basic info)
    );

    print_r($datasets);  // Output dataset metadata

} catch (MistralClientException $e) {
    echo $e->getMessage();  // Report the error without dumping the whole exception object
}

Result

Array
(
    [0] => Array
        (
            [id] => Bourdin/test3
            [author] => Bourdin
            [cardData] => Array
                (
                    [language] => Array ( [0] => en )
                    [license] => cc0-1.0
                    [task_categories] => Array ( [0] => text-classification )
                    [task_ids] => Array ( [0] => multi-label-classification )
                    [dataset_info] => Array
                        (
                            [features] => Array
                                (
                                    [0] => Array ( [name] => text [dtype] => string )
                                    [1] => Array ( [name] => toxicity [dtype] => float32 )
                                    ...
                                )
                            [splits] => Array
                                (
                                    [0] => Array ( [name] => train [num_examples] => 1804874 )
                                    [1] => Array ( [name] => validation [num_examples] => 97320 )
                                    ...
                                )
                        )
                )
            [lastModified] => 2025-04-24T15:16:05.000Z
            [description] => Dataset Card for "civil_comments" ...
        )
)

Parameters

Parameter   Description
author      The username or organization name on Hugging Face.
limit       Maximum number of datasets to retrieve (pagination).
sort        Field to sort by (lastModified, createdAt, downloads, etc.).
direction   Sorting direction: -1 (descending) or 1 (ascending).
full        Whether to retrieve full dataset metadata (set to false for basic metadata only).
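
To make the parameters more concrete, here is a minimal sketch of how they could translate into a query string. It assumes the client forwards its arguments to the public Hub endpoint https://huggingface.co/api/datasets; the source does not confirm this, and buildDatasetListUrl is a hypothetical helper, not part of the library.

```php
<?php
// Hypothetical helper: builds the URL a dataset listing request could use.
// Assumption: the arguments map one-to-one onto Hub query parameters.
function buildDatasetListUrl(
    string $author,
    int $limit = 5,
    string $sort = 'lastModified',
    int $direction = -1,
    bool $full = false
): string {
    $query = [
        'author'    => $author,
        'limit'     => $limit,
        'sort'      => $sort,
        'direction' => $direction,
    ];

    if ($full) {
        // http_build_query() would encode boolean true as "1",
        // so the literal string "true" is passed instead.
        $query['full'] = 'true';
    }

    return 'https://huggingface.co/api/datasets?' . http_build_query($query, '', '&');
}

echo buildDatasetListUrl('Bourdin', 5, 'lastModified', -1, true), PHP_EOL;
// prints https://huggingface.co/api/datasets?author=Bourdin&limit=5&sort=lastModified&direction=-1&full=true
```

Leaving full at its default omits the parameter entirely, which corresponds to the basic metadata mode described above.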

Dataset Metadata (Full Mode)

When full is set to true, each dataset entry includes:

  • id: Dataset identifier (e.g., user/dataset).
  • author: Dataset author.
  • cardData: Dataset card metadata (language, license, tags, task categories, etc.).
  • cardData.dataset_info: Features, splits, download size, etc. (nested inside cardData, as in the result above).
  • lastModified: Last update timestamp.
  • description: Dataset description (shortened if too long).
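
The fields listed here can be pulled out of a full-mode entry with plain array access. The sketch below assumes the entry has the same shape as the sample result, with dataset_info nested under cardData; summarizeDataset is a hypothetical helper, and real dataset cards may omit or structure these keys differently, which is why every lookup falls back to a default.

```php
<?php
// Hypothetical helper: flattens one full-mode entry into a short summary.
function summarizeDataset(array $dataset): array
{
    // dataset_info sits inside cardData in the sample result.
    $info = $dataset['cardData']['dataset_info'] ?? [];

    // Map each split name to its number of examples.
    $splits = [];
    foreach ($info['splits'] ?? [] as $split) {
        $splits[$split['name']] = $split['num_examples'] ?? null;
    }

    return [
        'id'           => $dataset['id'] ?? null,
        'license'      => $dataset['cardData']['license'] ?? null,
        // Map each feature name to its dtype.
        'features'     => array_column($info['features'] ?? [], 'dtype', 'name'),
        'splits'       => $splits,
        'lastModified' => $dataset['lastModified'] ?? null,
    ];
}

// Sample entry trimmed from the result shown earlier.
$sample = [
    'id' => 'Bourdin/test3',
    'cardData' => [
        'license' => 'cc0-1.0',
        'dataset_info' => [
            'features' => [
                ['name' => 'text', 'dtype' => 'string'],
                ['name' => 'toxicity', 'dtype' => 'float32'],
            ],
            'splits' => [
                ['name' => 'train', 'num_examples' => 1804874],
                ['name' => 'validation', 'num_examples' => 97320],
            ],
        ],
    ],
    'lastModified' => '2025-04-24T15:16:05.000Z',
];

print_r(summarizeDataset($sample));
```

An entry fetched in basic mode has no cardData, so the same helper returns null for the license and empty arrays for features and splits instead of raising warnings.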

Use Cases

  • Dataset discovery: List and filter datasets for a specific user or organization.
  • Metadata inspection: Retrieve detailed information about datasets (features, splits, licenses).
  • Monitoring: Track dataset updates (using lastModified).
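
As an illustration of the monitoring use case, the following sketch filters an already retrieved list by lastModified. It works on a plain array rather than calling the API; modifiedSince is a hypothetical helper, and the second entry in the sample data is made up for the comparison.

```php
<?php
// Hypothetical helper: keeps only the datasets modified after a given point in time.
function modifiedSince(array $datasets, DateTimeImmutable $since): array
{
    return array_values(array_filter(
        $datasets,
        static function (array $dataset) use ($since): bool {
            if (empty($dataset['lastModified'])) {
                return false; // No timestamp available, so nothing to compare.
            }
            // The timestamps are ISO 8601 strings, which DateTimeImmutable can parse.
            return new DateTimeImmutable($dataset['lastModified']) > $since;
        }
    ));
}

$datasets = [
    ['id' => 'Bourdin/test3', 'lastModified' => '2025-04-24T15:16:05.000Z'],
    ['id' => 'example/older', 'lastModified' => '2024-01-10T08:00:00.000Z'], // made-up entry
];

$recent = modifiedSince($datasets, new DateTimeImmutable('2025-01-01T00:00:00Z'));
echo implode(', ', array_column($recent, 'id')), PHP_EOL;
// prints Bourdin/test3
```

Storing the timestamp of the previous check and passing it as $since on the next run would be one simple way to detect updates between two runs.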

Common Pitfalls