Browse a list of machine learning datasets
Explore recent datasets from Hugging Face Hub
Explore and manage datasets for machine learning
Browse and view Hugging Face datasets from a collection
Speech Corpus Creation Tool
Create a large, deduplicated dataset for LLM pre-training
Search and find similar datasets
Organize and process datasets using AI
Organize and process datasets for AI models
Explore datasets on a Nomic Atlas map
Supports Parquet, CSV, JSONL, and XLS
Datasets is a platform for browsing, managing, and using machine learning datasets. It provides a centralized repository where users can explore datasets, filter them by specific criteria, and download them for use in their projects. Whether you're working on data analysis, model training, or research, Datasets simplifies finding the right data for your needs.
• Diverse Dataset Collection: Access a wide range of datasets across different domains, including computer vision, natural language processing, and more.
• Search and Filter: Easily search for datasets using keywords, tags, or categories to find relevant data quickly.
• Dataset Details: View detailed information about each dataset, including descriptions, formats, and usage instructions.
• Download Options: Download datasets in various formats such as CSV, JSON, or ZIP.
• Version Control: Track different versions of datasets and access previous releases if needed.
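The search-and-filter behavior listed above can be pictured with a small sketch. This is not the platform's actual API, which the page does not describe. The `DatasetEntry` record, the `search` function, and the sample catalog are all hypothetical and exist only to show how keyword, category, and tag filters might combine.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    # Hypothetical catalog record; the field names are illustrative only.
    name: str
    category: str
    tags: set[str] = field(default_factory=set)
    formats: tuple[str, ...] = ()


def search(catalog, keyword="", category=None, tags=()):
    """Return entries whose name contains the keyword, whose category
    matches (when one is given), and that carry every requested tag."""
    keyword = keyword.lower()
    wanted = set(tags)
    return [
        entry
        for entry in catalog
        if keyword in entry.name.lower()
        and (category is None or entry.category == category)
        and wanted <= entry.tags
    ]


# Made-up sample data, not real datasets from the platform.
catalog = [
    DatasetEntry("street-signs", "computer-vision", {"images", "classification"}, ("ZIP",)),
    DatasetEntry("movie-reviews", "nlp", {"text", "classification"}, ("CSV", "JSON")),
    DatasetEntry("energy-load", "time-series", {"tabular"}, ("CSV",)),
]

matches = search(catalog, category="nlp", tags=["classification"])
print([entry.name for entry in matches])
```

In this sketch every filter is optional and the filters are combined with a logical AND, so adding a criterion can only narrow the result set.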
What types of datasets are available on Datasets?
Datasets offers a wide variety of datasets, including but not limited to computer vision, natural language processing, time-series data, and structured datasets for tabular analysis.
Do I need to register to use Datasets?
Yes, registration is required to access and download datasets from the platform. This helps in tracking usage and ensuring compliance with dataset licenses.
How do I cite a dataset from Datasets in my research?
Each dataset provides citation instructions or a DOI (Digital Object Identifier) if available. Always follow the citation guidelines provided with the dataset to ensure proper attribution.