Benchmark LLMs in accuracy and translation across languages
The European Leaderboard is a benchmarking tool for evaluating and comparing Large Language Models (LLMs) across European languages. It assesses models on their accuracy and translation capabilities in multiple languages, giving an overview of their performance across those languages.
The European Leaderboard offers the following features:
• Multilingual Support: Evaluates models across a wide range of European languages.
• Accuracy Benchmarking: Measures models' performance in understanding and generating text accurately.
• Translation Capabilities: Assesses how well models translate text between European languages.
• Detailed Results: Provides in-depth analysis and rankings of model performance.
• Filtering Options: Allows users to filter results by specific languages or model types.
• Consistent Evaluation: Ensures fair and consistent benchmarking across all models.
Using the European Leaderboard is straightforward:
What languages are supported by the European Leaderboard?
The European Leaderboard supports a wide range of European languages, including English, Spanish, French, German, Italian, Portuguese, Dutch, and many others.
How are models ranked on the leaderboard?
Models are ranked based on their performance in both accuracy and translation tasks. The rankings are determined by a combination of scores from these evaluations.
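The page does not say how the accuracy and translation scores are combined. As a minimal sketch only, assuming a simple weighted average (the `ModelScore` class, the `accuracy_weight` parameter, the model names, and all score values below are hypothetical and not taken from the leaderboard), a combined ranking could look like this:

```python
from dataclasses import dataclass


@dataclass
class ModelScore:
    name: str
    accuracy: float     # hypothetical accuracy score in [0, 1]
    translation: float  # hypothetical translation score in [0, 1]


def combined_score(model: ModelScore, accuracy_weight: float = 0.5) -> float:
    """Weighted average of the two scores (an assumed formula, not the leaderboard's)."""
    return accuracy_weight * model.accuracy + (1 - accuracy_weight) * model.translation


def rank_models(models: list[ModelScore], accuracy_weight: float = 0.5) -> list[ModelScore]:
    """Return models sorted from highest to lowest combined score."""
    return sorted(models, key=lambda m: combined_score(m, accuracy_weight), reverse=True)


# Made-up example data for illustration only.
models = [
    ModelScore("model-a", accuracy=0.80, translation=0.60),
    ModelScore("model-b", accuracy=0.70, translation=0.90),
    ModelScore("model-c", accuracy=0.75, translation=0.75),
]

ranking = [m.name for m in rank_models(models)]
print(ranking)
```

Changing `accuracy_weight` shifts the ordering toward one task or the other, which is why the actual weighting used by a leaderboard matters when comparing ranks.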
Can I customize the evaluation criteria?
Users can filter results by specific languages or model types to focus on particular aspects of performance. Filtering changes which results are shown, not how models are scored.
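To illustrate what filtering by language or model type amounts to, here is a small sketch over an exported results table. The row layout, the field names (`model`, `language`, `model_type`, `score`), and the values are all assumptions made for the example; the leaderboard's real data format is not described on this page.

```python
from typing import Optional

# Made-up rows standing in for leaderboard results.
rows = [
    {"model": "model-a", "language": "German", "model_type": "instruct", "score": 0.71},
    {"model": "model-a", "language": "French", "model_type": "instruct", "score": 0.68},
    {"model": "model-b", "language": "German", "model_type": "base", "score": 0.64},
    {"model": "model-c", "language": "German", "model_type": "instruct", "score": 0.77},
]


def filter_results(
    rows: list[dict],
    language: Optional[str] = None,
    model_type: Optional[str] = None,
) -> list[dict]:
    """Keep rows matching every filter that is set; a filter left as None is ignored."""
    return [
        row
        for row in rows
        if (language is None or row["language"] == language)
        and (model_type is None or row["model_type"] == model_type)
    ]


german_instruct = filter_results(rows, language="German", model_type="instruct")
print([row["model"] for row in german_instruct])
```

The filters only select a subset of rows; the scores themselves are left unchanged.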
How often is the leaderboard updated?
The leaderboard is regularly updated to include new models and improvements in existing ones, ensuring the most current benchmarking data is available.