View and submit LLM evaluations
Browse and evaluate ML tasks in MLIP Arena
Display leaderboard for earthquake intent classification models
View NSQL Scores for Models
Persian Text Embedding Benchmark
Submit deepfake detection models for evaluation
Browse and filter machine learning models by category and modality
View and compare language model evaluations
Measure execution times of BERT models using WebGPU and WASM
Multilingual Text Embedding Model Pruner
Measure BERT model performance using WASM and WebGPU
Browse and submit evaluations for CaselawQA benchmarks
Benchmark AI models by comparison
The Hallucinations Leaderboard is a platform for benchmarking and evaluating large language models (LLMs). Users can view and submit evaluations that score models on how accurately and coherently they respond. The leaderboard focuses specifically on hallucinations, instances where a model produces incorrect or nonsensical information, and helps researchers and developers identify models that minimize hallucinations while maintaining high-quality outputs.
What is the purpose of the Hallucinations Leaderboard?
The purpose of the Hallucinations Leaderboard is to provide a centralized platform for evaluating and comparing large language models based on their ability to minimize hallucinations while generating high-quality outputs.
Do I need technical expertise to use the Hallucinations Leaderboard?
No, the leaderboard is designed to be user-friendly. While technical expertise may be helpful for interpreting results, the platform is accessible to anyone interested in understanding LLM performance.
Can I submit my own evaluations to the leaderboard?
Yes, the Hallucinations Leaderboard offers a submission interface for users to contribute their own evaluations. Ensure your evaluations adhere to the platform's guidelines for consistency and accuracy.
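If you prefer to work with results outside the web interface, leaderboard-style Spaces on Hugging Face often publish their evaluation records in a companion dataset repository. The snippet below is a minimal sketch of pulling and skimming such records with huggingface_hub; the repository ID, file layout, and field names are assumptions for illustration, not the leaderboard's documented API, so check the leaderboard's own submission and about pages for the actual locations.

```python
# Sketch: download assumed leaderboard result files and print per-model scores.
# The repo ID below is hypothetical; replace it with the real results dataset.
import json
import pathlib

from huggingface_hub import snapshot_download

RESULTS_REPO = "example-org/hallucinations-leaderboard-results"  # hypothetical

# Download a local copy of the (assumed) results dataset repository.
local_dir = snapshot_download(repo_id=RESULTS_REPO, repo_type="dataset")

# Result files are assumed here to be one JSON record per evaluated model;
# adjust the glob pattern and key names to match the actual layout.
for path in sorted(pathlib.Path(local_dir).rglob("*.json")):
    record = json.loads(path.read_text())
    model = record.get("model_name", path.stem)
    score = record.get("hallucination_score")
    print(f"{model}: {score}")
```

A script like this is only a convenience for offline comparison; submissions themselves still go through the leaderboard's own submission interface and must follow its guidelines.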