View and submit LLM evaluations
Merge machine learning models using a YAML configuration file
Pergel: A Unified Benchmark for Evaluating Turkish LLMs
Display benchmark results
Convert a Stable Diffusion XL checkpoint to Diffusers and open a PR
Compare code model performance on benchmarks
Find recent, highly liked Hugging Face models
Display leaderboard for earthquake intent classification models
Visualize model performance on function calling tasks
Create and manage ML pipelines with ZenML Dashboard
Create and upload a Hugging Face model card
Display LLM benchmark leaderboard and info
Search for model performance across languages and benchmarks
The Hallucinations Leaderboard is a platform for benchmarking and evaluating large language models (LLMs). Users can view and submit evaluations of LLMs based on how accurately and coherently they respond. The leaderboard focuses specifically on hallucinations: instances where a model produces incorrect or nonsensical information. It helps researchers and developers identify models that minimize hallucinations while maintaining high-quality outputs.
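For programmatic access, leaderboard Spaces on Hugging Face commonly publish their per-model results in a companion dataset repository. The sketch below assumes such a repository exists under the hypothetical name "hallucinations-leaderboard/results"; the actual repository name and file layout should be taken from the Space itself.

```python
# Minimal sketch: browsing leaderboard result files from a companion dataset repo.
# The repo id "hallucinations-leaderboard/results" is an assumption, not confirmed
# by the leaderboard's documentation.
import json

from huggingface_hub import HfApi, hf_hub_download

RESULTS_REPO = "hallucinations-leaderboard/results"  # hypothetical results repo

api = HfApi()

# List per-model result files stored in the dataset repository.
json_files = [
    f for f in api.list_repo_files(RESULTS_REPO, repo_type="dataset")
    if f.endswith(".json")
]
print(f"Found {len(json_files)} result files")

# Download and inspect the first result file (structure varies by leaderboard).
if json_files:
    path = hf_hub_download(RESULTS_REPO, json_files[0], repo_type="dataset")
    with open(path) as fh:
        print(json.dumps(json.load(fh), indent=2)[:500])
```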
What is the purpose of the Hallucinations Leaderboard?
The purpose of the Hallucinations Leaderboard is to provide a centralized platform for evaluating and comparing large language models based on their ability to minimize hallucinations while generating high-quality outputs.
Do I need technical expertise to use the Hallucinations Leaderboard?
No, the leaderboard is designed to be user-friendly. While technical expertise may be helpful for interpreting results, the platform is accessible to anyone interested in understanding LLM performance.
Can I submit my own evaluations to the leaderboard?
Yes, the Hallucinations Leaderboard offers a submission interface for users to contribute their own evaluations. Ensure your evaluations adhere to the platform's guidelines for consistency and accuracy.
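Before submitting, it can help to verify that your model meets the usual requirements of Hugging Face leaderboards, which typically expect a public Hub repository with safetensors weights. The check below is a sketch under that assumption; the leaderboard's own submission page remains the authoritative source for its criteria.

```python
# Minimal pre-submission sanity check. The requirements checked here (public Hub
# repo, safetensors weights) are assumptions based on common leaderboard rules,
# not the Hallucinations Leaderboard's documented guidelines.
from huggingface_hub import model_info
from huggingface_hub.utils import RepositoryNotFoundError


def can_submit(model_id: str) -> bool:
    try:
        info = model_info(model_id)
    except RepositoryNotFoundError:
        print(f"{model_id}: not found on the Hub (or private)")
        return False
    # Check that the repo ships safetensors weights.
    has_weights = any(f.rfilename.endswith(".safetensors") for f in info.siblings)
    print(f"{model_id}: found on the Hub, safetensors weights: {has_weights}")
    return has_weights


# Example usage with a public model.
can_submit("mistralai/Mistral-7B-v0.1")
```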