Track, rank and evaluate open LLMs and chatbots
Browse and filter machine learning models by category and modality
GIFT-Eval: A Benchmark for General Time Series Forecasting
Evaluate LLM over-refusal rates with OR-Bench
Test and evaluate embedding models with MTEB Arena
Compare LLM performance across benchmarks
Predict customer churn based on input details
Explore GenAI model efficiency on ML.ENERGY leaderboard
Calculate VRAM requirements for LLMs (see the sketch after this list)
Explain GPU usage for model training
Evaluate open LLMs in the languages of LATAM and Spain
Display genomic embedding leaderboard
Benchmark LLM accuracy and translation quality across languages
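One of the tools above calculates VRAM requirements for LLMs. As a rough illustration of the arithmetic such a calculator performs, here is a minimal sketch; the function name, the default of 2 bytes per parameter (fp16/bf16 weights), and the flat 20% overhead factor are assumptions of this example, not the tool's actual formula.

```python
def estimate_vram_gb(params_billions: float,
                     bytes_per_param: int = 2,   # fp16/bf16 weights (assumed)
                     overhead: float = 1.2) -> float:
    """Rough inference-time VRAM estimate: model weights plus a flat
    overhead factor for activations and KV cache (both assumptions)."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return weight_bytes * overhead / 1024**3


# Example: a 7B-parameter model in fp16 comes out to roughly 15-16 GB.
print(f"{estimate_vram_gb(7):.1f} GB")
```

Real calculators also account for quantization (e.g. 4-bit weights), sequence length, and batch size, which this sketch folds into the single overhead factor.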
The Open LLM Leaderboard is a tool for tracking, ranking, and evaluating open-source Large Language Models (LLMs) and chatbots. It provides a transparent, standardized platform for comparing models on common benchmarks and metrics, helping developers, researchers, and users make informed decisions. By focusing on performance, efficiency, and capabilities, the Leaderboard serves as a go-to resource for tracking progress in the field of LLMs.
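The leaderboard's table can also be pulled programmatically. Below is a minimal sketch using the `datasets` library; the dataset ID `open-llm-leaderboard/contents` is an assumption based on how the leaderboard has published its results in the past, so verify the current location on the Space itself before relying on it.

```python
from datasets import load_dataset

# Dataset ID is an assumption; check the leaderboard Space for the
# current location of its published results.
ds = load_dataset("open-llm-leaderboard/contents", split="train")
df = ds.to_pandas()

# Inspect the available columns (model name, per-benchmark scores, average)
# rather than hard-coding column names, which may change between versions.
print(df.columns.tolist())
print(df.head())
```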
What metrics are used to rank LLMs? The Leaderboard uses a variety of metrics, including performance benchmarks, speed, memory usage, and task-specific accuracy, to give a holistic evaluation of each model (a toy aggregation example appears after these questions).
Can I compare custom or non-listed models? Yes, the platform lets users submit models for evaluation, giving researchers and developers working on niche or lesser-known LLMs a way to benchmark them against listed models.
How often is the Leaderboard updated? The Leaderboard is updated regularly to reflect new releases and improvements in existing models, ensuring users always have access to the latest information.
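To make the "holistic evaluation" idea above concrete, here is a toy aggregation in the spirit of the Leaderboard's average score. The unweighted mean and the example numbers are assumptions of this sketch, not the Leaderboard's exact normalization.

```python
def average_score(scores: dict[str, float]) -> float:
    """Unweighted mean over per-benchmark scores on a 0-100 scale.
    The real leaderboard may normalize or weight tasks differently (assumption)."""
    return sum(scores.values()) / len(scores)


# Hypothetical per-benchmark scores for one model.
model_scores = {"MMLU": 70.2, "GSM8K": 55.1, "ARC": 64.8}
print(f"Average: {average_score(model_scores):.1f}")
```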