• Display and submit language model evaluations
• Merge LoRA adapters with a base model
• Compare code model performance on benchmarks
• Pergel: A Unified Benchmark for Evaluating Turkish LLMs
• Rank machines based on LLaMA 7B v2 benchmark results
• Compare and rank LLMs using benchmark scores
• Visualize model performance on function calling tasks
• Generate leaderboard comparing DNA models
• Browse and submit LLM evaluations
• Explore and visualize diverse models
• Measure BERT model performance using WASM and WebGPU
• Leaderboard of information retrieval models in French
• Calculate memory needed to train AI models
Leaderboard is a platform for model benchmarking that lets users display and submit language model evaluations. It serves as a centralized hub where researchers and developers can compare the performance of different language models across tasks and metrics. By providing a transparent, standardized environment, Leaderboard facilitates innovation and collaboration in the field of AI.
• Customizable Metrics: Evaluate models against multiple criteria such as accuracy, F1-score, ROUGE score, and more (a sketch of computing these metrics locally follows this list).
• Real-Time Tracking: Stay updated with the latest submissions and benchmarking results.
• Model Comparison: Directly compare performance across different models and tasks.
• Filtering and Sorting: Easily filter models by task type, model size, or submission date.
• Submission Interface: Seamlessly submit your own model evaluations for inclusion on the leaderboard.
• Version Control: Track improvements in model performance over time with version history.
• Shareable Results: Generate and share links to specific model comparisons or benchmarking results.
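As a rough illustration of the metrics named in the feature list, the sketch below shows how scores might be computed locally before a submission. It assumes the scikit-learn and Hugging Face evaluate packages and made-up predictions; the exact metric set and data layout that Leaderboard expects may differ.

```python
# Minimal sketch: computing common evaluation metrics locally before submission.
# Assumes scikit-learn and the Hugging Face `evaluate` package are installed;
# the metrics and data layout that Leaderboard actually expects may differ.
from sklearn.metrics import accuracy_score, f1_score
import evaluate

# Hypothetical classification outputs (label IDs).
y_true = [0, 1, 1, 2, 0]
y_pred = [0, 1, 2, 2, 0]

accuracy = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Hypothetical summarization outputs for ROUGE.
predictions = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]
rouge = evaluate.load("rouge")
rouge_scores = rouge.compute(predictions=predictions, references=references)

print({"accuracy": accuracy, "macro_f1": macro_f1, **rouge_scores})
```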
How do I submit my model to the Leaderboard?
To submit your model, navigate to the submission interface, provide the required evaluation data, and follow the step-by-step instructions. Ensure your data meets the specified format and metrics requirements.
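The authoritative schema is defined by the submission interface itself; the snippet below is only a hypothetical illustration of what "evaluation data in the specified format" might look like, and every field name in it is assumed rather than taken from Leaderboard.

```python
# Hypothetical example only: the real submission schema is defined by the
# Leaderboard submission interface, and every field name below is assumed.
import json

submission = {
    "model_name": "my-org/my-model-v1",   # assumed field
    "model_type": "transformer",          # assumed field
    "task": "text-classification",        # assumed field
    "metrics": {                          # assumed metric names
        "accuracy": 0.912,
        "f1_macro": 0.887,
    },
    "submitted_by": "example-user",       # assumed field
}

# Serialize for upload through the submission interface.
print(json.dumps(submission, indent=2))
```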
What types of models can I benchmark?
Leaderboard supports a wide range of language models, including but not limited to transformer-based models, RNNs, and traditional machine learning models.
Can I compare models across different tasks or metrics?
Yes, Leaderboard allows you to filter and compare models based on specific tasks or metrics, enabling detailed performance analysis.
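To illustrate the kind of filtering and comparison described above, here is a small pandas sketch over a made-up results table; the column names and scores are illustrative and do not reflect Leaderboard's actual export format.

```python
# Illustrative sketch of filtering and ranking leaderboard-style results with pandas.
# The table, column names, and scores below are made up for demonstration.
import pandas as pd

results = pd.DataFrame(
    [
        {"model": "model-a", "task": "summarization", "metric": "rougeL", "score": 0.41},
        {"model": "model-b", "task": "summarization", "metric": "rougeL", "score": 0.38},
        {"model": "model-a", "task": "classification", "metric": "f1_macro", "score": 0.90},
        {"model": "model-c", "task": "classification", "metric": "f1_macro", "score": 0.93},
    ]
)

# Filter to a single task and metric, then rank models by score.
mask = (results["task"] == "classification") & (results["metric"] == "f1_macro")
ranked = results[mask].sort_values("score", ascending=False)
print(ranked[["model", "score"]].to_string(index=False))
```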