Benchmark LLMs in accuracy and translation across languages
The European Leaderboard is a benchmarking tool designed to evaluate and compare Large Language Models (LLMs) across European languages. It focuses on assessing models based on their accuracy and translation capabilities in multiple languages, providing a comprehensive overview of their performance in diverse linguistic contexts.
The European Leaderboard offers the following features:
• Multilingual Support: Evaluates models across a wide range of European languages.
• Accuracy Benchmarking: Measures models' performance in understanding and generating text accurately.
• Translation Capabilities: Assesses how well models translate text between European languages.
• Detailed Results: Provides in-depth analysis and rankings of model performance.
• Filtering Options: Allows users to filter results by specific languages or model types.
• Consistent Evaluation: Ensures fair and consistent benchmarking across all models.
Using the European Leaderboard is straightforward; the answers below cover the most common questions about how it works.
What languages are supported by the European Leaderboard?
The European Leaderboard supports a wide range of European languages, including English, Spanish, French, German, Italian, Portuguese, Dutch, and many others.
How are models ranked on the leaderboard?
Models are ranked based on their performance in both accuracy and translation tasks. The rankings are determined by a combination of scores from these evaluations.
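The score-combination step described above can be sketched in a few lines. This is a hypothetical illustration, not the leaderboard's actual scoring code: the field names, scores, and equal 50/50 weighting are all assumptions.

```python
# Hypothetical sketch: combining per-track scores into one ranking score.
# Field names, values, and the equal weighting are assumptions for illustration.
results = [
    {"model": "model-a", "accuracy": 0.81, "translation": 0.74},
    {"model": "model-b", "accuracy": 0.77, "translation": 0.86},
    {"model": "model-c", "accuracy": 0.69, "translation": 0.70},
]

ACCURACY_WEIGHT = 0.5
TRANSLATION_WEIGHT = 0.5

def combined_score(entry):
    """Weighted average of the accuracy and translation tracks."""
    return (ACCURACY_WEIGHT * entry["accuracy"]
            + TRANSLATION_WEIGHT * entry["translation"])

# Higher combined score ranks first.
ranked = sorted(results, key=combined_score, reverse=True)
for rank, entry in enumerate(ranked, start=1):
    print(f"{rank}. {entry['model']}: {combined_score(entry):.3f}")
```

With these sample numbers, model-b leads despite lower accuracy because its translation score lifts the combined average; changing the weights would change the ordering.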
Can I customize the evaluation criteria?
Yes, users can filter results by specific languages or model types to focus on particular aspects of performance.
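The filtering described above amounts to selecting leaderboard rows that match the chosen criteria. A minimal sketch, assuming a simple row schema (the field names and sample rows are hypothetical, not the leaderboard's actual data model):

```python
# Hypothetical sketch of filtering leaderboard rows by language and model type.
# The row schema and sample values are assumptions for illustration.
rows = [
    {"model": "model-a", "type": "chat", "language": "German", "score": 0.81},
    {"model": "model-b", "type": "base", "language": "German", "score": 0.77},
    {"model": "model-c", "type": "chat", "language": "French", "score": 0.84},
]

def filter_rows(rows, language=None, model_type=None):
    """Keep only rows matching the requested language and/or model type.

    A filter left as None is ignored, so each criterion is optional.
    """
    out = []
    for row in rows:
        if language is not None and row["language"] != language:
            continue
        if model_type is not None and row["type"] != model_type:
            continue
        out.append(row)
    return out

# Narrow the view to chat models evaluated on German.
german_chat = filter_rows(rows, language="German", model_type="chat")
```

Each filter is optional, so the same helper covers "all German results", "all chat models", or both at once.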
How often is the leaderboard updated?
The leaderboard is regularly updated to include new models and improvements in existing ones, ensuring the most current benchmarking data is available.