Display and submit language model evaluations
Compare LLM performance across benchmarks
Convert and upload model files for Stable Diffusion
Teach, test, evaluate language models with MTEB Arena
Upload a machine learning model to Hugging Face Hub
Convert PyTorch models to waifu2x-ios format
Optimize and train foundation models using IBM's FMS
Display leaderboard for earthquake intent classification models
Evaluate adversarial robustness using generative models
Evaluate and submit AI model results for Frugal AI Challenge
GIFT-Eval: A Benchmark for General Time Series Forecasting
View NSQL Scores for Models
Measure over-refusal in LLMs using OR-Bench
Leaderboard is a platform for model benchmarking that lets users display and submit language model evaluations. It serves as a centralized hub where researchers and developers can compare the performance of different language models across tasks and metrics. By providing a transparent, standardized environment, Leaderboard facilitates innovation and collaboration in the field of AI.
• Customizable Metrics: Evaluate models on multiple criteria such as accuracy, F1-score, ROUGE, and more (see the metric sketch after this list).
• Real-Time Tracking: Stay updated with the latest submissions and benchmarking results.
• Model Comparison: Directly compare performance across different models and tasks.
• Filtering and Sorting: Easily filter models by task type, model size, or submission date.
• Submission Interface: Seamlessly submit your own model evaluations for inclusion on the leaderboard.
• Version Control: Track improvements in model performance over time with version history.
• Shareable Results: Generate and share links to specific model comparisons or benchmarking results.
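As a rough illustration of the metrics named in the feature list, the sketch below computes accuracy, macro-F1, and ROUGE-L locally with scikit-learn and the rouge-score package. The metric set and output format that Leaderboard actually expects are not specified here, so treat this as a generic pre-submission check rather than the platform's own evaluation code.

```python
# Illustrative sketch only: computing headline metrics locally before submission.
# The metric set and report format Leaderboard expects are assumptions here.
from sklearn.metrics import accuracy_score, f1_score
from rouge_score import rouge_scorer  # pip install rouge-score

# Classification-style metrics on toy labels.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print("accuracy:", accuracy_score(y_true, y_pred))
print("f1_macro:", f1_score(y_true, y_pred, average="macro"))

# Generation-style metric (ROUGE-L) on a toy reference/prediction pair.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge = scorer.score("the cat sat on the mat", "a cat sat on the mat")
print("rougeL:", rouge["rougeL"].fmeasure)
```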
How do I submit my model to the Leaderboard?
To submit your model, navigate to the submission interface, provide the required evaluation data, and follow the step-by-step instructions. Ensure your data meets the specified format and metrics requirements.
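For a sense of what "the specified format" might look like, here is a minimal sketch of assembling evaluation results as a JSON file. The field names (model_name, task, metrics, and so on) are illustrative assumptions, not Leaderboard's actual schema; check the submission interface for the real requirements.

```python
# Hypothetical submission payload -- field names are illustrative assumptions,
# not Leaderboard's documented schema.
import json

submission = {
    "model_name": "my-org/my-model",              # hypothetical model identifier
    "task": "text-classification",                # hypothetical task label
    "metrics": {"accuracy": 0.912, "f1_macro": 0.894},
    "submitted_at": "2024-01-01",
}

with open("submission.json", "w") as f:
    json.dump(submission, f, indent=2)
```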
What types of models can I benchmark?
Leaderboard supports a wide range of language models, including but not limited to transformer-based models, RNNs, and traditional machine learning models.
Can I compare models across different tasks or metrics?
Yes, Leaderboard allows you to filter and compare models based on specific tasks or metrics, enabling detailed performance analysis.
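If you export or collect results for offline analysis, a small pandas sketch like the one below mirrors the filtering, sorting, and cross-task comparison described above. The scores and column names are made up for illustration; real values would come from the leaderboard itself.

```python
# Sketch only: comparing results locally with pandas. The data below is
# fabricated for illustration, not actual leaderboard scores.
import pandas as pd

results = pd.DataFrame([
    {"model": "model-a", "task": "summarization", "metric": "rougeL", "score": 0.41},
    {"model": "model-b", "task": "summarization", "metric": "rougeL", "score": 0.38},
    {"model": "model-a", "task": "classification", "metric": "f1_macro", "score": 0.89},
    {"model": "model-b", "task": "classification", "metric": "f1_macro", "score": 0.91},
])

# Filter to a single task and rank models by score.
summarization = results[results["task"] == "summarization"]
print(summarization.sort_values("score", ascending=False))

# Pivot to a model-by-task view for side-by-side comparison.
print(results.pivot_table(index="model", columns="task", values="score"))
```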