Display and submit language model evaluations
Leaderboard is a platform for model benchmarking that lets users display and submit language model evaluations. It serves as a centralized hub where researchers and developers can compare the performance of different language models across tasks and metrics. By providing a transparent, standardized environment, Leaderboard makes it easier to collaborate and track progress in AI.
• Customizable Metrics: Evaluate models based on multiple criteria such as accuracy, F1-score, ROUGE score, and more (see the sketch after this list).
• Real-Time Tracking: Stay updated with the latest submissions and benchmarking results.
• Model Comparison: Directly compare performance across different models and tasks.
• Filtering and Sorting: Easily filter models by task type, model size, or submission date.
• Submission Interface: Seamlessly submit your own model evaluations for inclusion on the leaderboard.
• Version Control: Track improvements in model performance over time with version history.
• Shareable Results: Generate and share links to specific model comparisons or benchmarking results.
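Because the platform's own evaluation pipeline is not documented here, the following is only a minimal sketch of how metrics like accuracy, F1, and ROUGE are commonly computed locally before a submission, using scikit-learn and the rouge-score package. The sample data and metric choices are illustrative assumptions, not the Leaderboard's pipeline.

```python
# Minimal sketch, assuming scikit-learn and rouge-score are installed.
# The data below is made up; the Leaderboard defines its own required metrics.
from sklearn.metrics import accuracy_score, f1_score
from rouge_score import rouge_scorer

# Classification-style predictions (hypothetical labels).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print("accuracy:", accuracy_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))

# Generation-style output scored with ROUGE-L (hypothetical strings).
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
scores = scorer.score("the cat sat on the mat", "a cat was sitting on the mat")
print("rougeL:", scores["rougeL"].fmeasure)
```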
How do I submit my model to the Leaderboard?
To submit your model, navigate to the submission interface, provide the required evaluation data, and follow the step-by-step instructions. Ensure your data meets the specified format and metrics requirements.
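The required format is defined by the submission interface itself; as a rough illustration, the sketch below assembles and sanity-checks a hypothetical evaluation-results file. Every field name here is an assumption made for the example, not the platform's actual schema.

```python
import json

# Hypothetical submission record; every field name is illustrative only,
# since the real format is specified by the Leaderboard's submission interface.
submission = {
    "model_name": "my-org/my-model-v1",
    "task": "text-classification",
    "metrics": {"accuracy": 0.912, "f1": 0.897},
    "submitted_by": "my-org",
}

# Basic sanity check before uploading: metrics should be numbers in [0, 1].
for name, value in submission["metrics"].items():
    if not (isinstance(value, (int, float)) and 0.0 <= value <= 1.0):
        raise ValueError(f"metric {name!r} is out of range: {value}")

with open("submission.json", "w") as f:
    json.dump(submission, f, indent=2)
```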
What types of models can I benchmark?
Leaderboard supports a wide range of language models, including but not limited to transformer-based models, RNNs, and traditional machine learning models.
Can I compare models across different tasks or metrics?
Yes, Leaderboard allows you to filter and compare models based on specific tasks or metrics, enabling detailed performance analysis.
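As a loose illustration of that kind of filtering, the sketch below compares models on one task with pandas. The column names and scores are hypothetical and are not taken from the Leaderboard's real export format.

```python
import pandas as pd

# Hypothetical leaderboard export; column names and scores are assumptions.
df = pd.DataFrame([
    {"model": "model-a", "task": "summarization", "rougeL": 0.41, "size_b": 7},
    {"model": "model-b", "task": "summarization", "rougeL": 0.38, "size_b": 13},
    {"model": "model-c", "task": "classification", "f1": 0.91, "size_b": 7},
])

# Filter to one task, then rank models by the metric of interest.
summarization = df[df["task"] == "summarization"].sort_values("rougeL", ascending=False)
print(summarization[["model", "rougeL", "size_b"]])
```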