List of French datasets poorly referenced on the Hub
Jeux de données en français mal référencés sur le Hub ("French datasets poorly referenced on the Hub") is a curated list of French datasets that are hard to find on the Hugging Face Hub. It highlights datasets that are valuable but have been overlooked because of insufficient documentation or low visibility. The list spans several domains, including natural language processing (NLP), computer vision, and data science applications. Its goal is to give researchers and developers high-quality French-language datasets for their projects and research.
What types of datasets are included in Jeux de données en français mal référencés sur le Hub?
The collection includes a variety of datasets, such as text corpora, image datasets, and structured data, all primarily in French.
Why is this collection useful for researchers?
It provides easy access to French datasets that are often difficult to find, saving time and effort for researchers working with French data.
How can I contribute a dataset to this collection?
You can submit your dataset through the platform's submission process, which usually involves a form or a pull request on the repository. The submission is then reviewed for inclusion.