Browse a list of machine learning datasets
Supports Parquet, CSV, JSONL, and XLS
Display trending datasets and spaces
Create a large, deduplicated dataset for LLM pre-training
Upload files to a Hugging Face repository
Convert and PR models to Safetensors
Sign in to receive news on the iPhone app
Explore and edit JSON datasets
Manage and analyze datasets with AI tools
Browse and view Hugging Face datasets from a collection
Create a Reddit dataset
Build datasets using natural language
Datasets is a platform for browsing, managing, and using machine learning datasets. It provides a centralized repository where users can explore datasets, filter them by specific criteria, and download them for use in their projects. Whether you're working on data analysis, model training, or research, Datasets simplifies finding the right data for your needs.
• Diverse Dataset Collection: Access a wide range of datasets across different domains, including computer vision, natural language processing, and more.
• Search and Filter: Easily search for datasets using keywords, tags, or categories to find relevant data quickly.
• Dataset Details: View detailed information about each dataset, including descriptions, formats, and usage instructions.
• Download Options: Download datasets in various formats such as CSV, JSON, or ZIP.
• Version Control: Track different versions of datasets and access previous releases if needed.
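The Search and Filter feature above can be pictured as a keyword match combined with a tag match. The sketch below is only an illustration on a small in-memory catalog: the `DatasetRecord` class, the `search` function, and the sample entries are all hypothetical and do not reflect how Datasets itself is implemented or queried.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry; the field names are illustrative only."""
    name: str
    description: str
    tags: set = field(default_factory=set)


def search(catalog, keyword="", required_tags=()):
    """Return records whose name or description contains the keyword
    (case-insensitive) and that carry every one of the required tags."""
    needle = keyword.lower()
    wanted = set(required_tags)
    return [
        rec for rec in catalog
        if (needle in rec.name.lower() or needle in rec.description.lower())
        and wanted <= rec.tags
    ]


# Made-up sample entries, one per domain mentioned in the feature list.
catalog = [
    DatasetRecord("street-signs", "Labeled photos of road signs",
                  {"computer-vision", "images"}),
    DatasetRecord("movie-reviews", "Short reviews with sentiment labels",
                  {"nlp", "text"}),
    DatasetRecord("sensor-logs", "Hourly temperature readings",
                  {"time-series", "tabular"}),
]

matches = search(catalog, keyword="reviews", required_tags=["nlp"])
print([rec.name for rec in matches])
```

An empty keyword matches every record, so calling `search(catalog, required_tags=["tabular"])` filters by tag alone.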
What types of datasets are available on Datasets?
Datasets offers a wide variety of datasets, including but not limited to computer vision, natural language processing, time-series data, and structured datasets for tabular analysis.
Do I need to register to use Datasets?
Yes, registration is required to access and download datasets from the platform. Registration helps track usage and ensure compliance with dataset licenses.
How do I cite a dataset from Datasets in my research?
Each dataset provides citation instructions and, where available, a DOI (Digital Object Identifier). Always follow the citation guidelines provided with the dataset to ensure proper attribution.