View and submit LLM evaluations
Calculate memory needed to train AI models
GIFT-Eval: A Benchmark for General Time Series Forecasting
Run benchmarks on prediction models
Request model evaluation on COCO val 2017 dataset
Display genomic embedding leaderboard
Display model benchmark results
View NSQL Scores for Models
Pergel: A Unified Benchmark for Evaluating Turkish LLMs
Browse and filter ML model leaderboard data
Calculate VRAM requirements for LLMs
Submit deepfake detection models for evaluation
Evaluate AI-generated results for accuracy
The Hallucinations Leaderboard is a platform for benchmarking and evaluating large language models (LLMs). It lets users view and submit evaluations of LLMs based on how accurately and coherently they respond. The leaderboard focuses specifically on hallucinations: instances where a model produces incorrect or nonsensical information. It helps researchers and developers identify models that minimize hallucinations while maintaining high-quality outputs.
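To make the "view" side concrete, here is a minimal sketch of how one might rank models once leaderboard results have been exported to a file. The file name, the "model" and "hallucination_rate" columns, and the assumption that lower values are better are all illustrative; the platform's actual export format may differ.

```python
# Hypothetical example: rank models from an exported leaderboard CSV.
# Assumes columns "model" and "hallucination_rate" (lower is better).
import pandas as pd

def rank_models(csv_path: str, top_n: int = 10) -> pd.DataFrame:
    """Load exported leaderboard results and return the models
    with the lowest hallucination rate."""
    df = pd.read_csv(csv_path)
    ranked = df.sort_values("hallucination_rate", ascending=True)
    return ranked.head(top_n)[["model", "hallucination_rate"]]

if __name__ == "__main__":
    print(rank_models("leaderboard_results.csv"))
```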
What is the purpose of the Hallucinations Leaderboard?
The purpose of the Hallucinations Leaderboard is to provide a centralized platform for evaluating and comparing large language models based on their ability to minimize hallucinations while generating high-quality outputs.
Do I need technical expertise to use the Hallucinations Leaderboard?
No, the leaderboard is designed to be user-friendly. While technical expertise may be helpful for interpreting results, the platform is accessible to anyone interested in understanding LLM performance.
Can I submit my own evaluations to the leaderboard?
Yes, the Hallucinations Leaderboard offers a submission interface for users to contribute their own evaluations. Ensure your evaluations adhere to the platform's guidelines for consistency and accuracy.
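Before submitting through the web interface, it can help to verify that the model is publicly reachable on the Hugging Face Hub and to pin the exact revision that was evaluated. The sketch below shows one way to do this; the field names in the returned record (model, revision, precision) are assumptions for illustration, not the platform's exact submission schema.

```python
# Hedged sketch: sanity-check a model and build a submission record
# with typical leaderboard fields before filling in the web form.
from huggingface_hub import HfApi

def prepare_submission(model_id: str, revision: str = "main",
                       precision: str = "float16") -> dict:
    """Confirm the model exists on the Hugging Face Hub and return a
    record with the fields a leaderboard submission commonly needs."""
    api = HfApi()
    info = api.model_info(model_id, revision=revision)  # raises if the repo is missing
    return {
        "model": model_id,
        "revision": info.sha,  # pin the exact commit that was evaluated
        "precision": precision,
    }

if __name__ == "__main__":
    print(prepare_submission("mistralai/Mistral-7B-v0.1"))
```

Pinning the commit hash rather than a branch name keeps the evaluation reproducible even if the model repository is later updated.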