View and submit machine learning model evaluations
View LLM Performance Leaderboard
Upload ML model to Hugging Face Hub
Request model evaluation on COCO val 2017 dataset
Benchmark AI models by comparison
Convert Stable Diffusion checkpoint to Diffusers and open a PR
Upload a machine learning model to Hugging Face Hub
Browse and filter ML model leaderboard data
View and submit LLM benchmark evaluations
Convert a Stable Diffusion XL checkpoint to Diffusers and open a PR
Evaluate LLM over-refusal rates with OR-Bench
Pergel: A Unified Benchmark for Evaluating Turkish LLMs
The LLM Safety Leaderboard is a platform for evaluating and comparing the safety performance of large language models (LLMs). It offers a community-driven space where users submit evaluations of machine learning models, focusing on adherence to safety guidelines and ethical standards. The leaderboard gives developers, researchers, and users a transparent way to assess and improve the safety of AI models.
1. What makes the LLM Safety Leaderboard unique?
The leaderboard's focus on safety metrics and its community-driven submissions set it apart from other model benchmarking tools. It prioritizes ethical AI development and user participation.
2. Can anyone submit a model evaluation?
Yes, any user can submit evaluations, provided they meet the platform's guidelines and quality standards. This ensures diverse and reliable data.
3. How are models ranked on the leaderboard?
Models are ranked based on aggregated safety metrics, including user submissions and automated evaluations. Rankings are updated in real-time as new data is added.
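The aggregation step above can be sketched in Python. This is a minimal illustration, not the leaderboard's actual implementation: the metric names, weights, and submission schema are all hypothetical assumptions chosen for the example.

```python
# Hypothetical sketch of leaderboard-style ranking. Metric names,
# weights, and the submission format are illustrative assumptions,
# not the platform's real schema.

def rank_models(evaluations, weights=None):
    """Aggregate per-model safety scores and rank models descending.

    `evaluations` maps model name -> list of submissions, where each
    submission is a dict of metric name -> score in [0, 1].
    """
    # Assumed example weights; a real leaderboard would define its own.
    weights = weights or {"toxicity_avoidance": 0.5, "refusal_accuracy": 0.5}

    def aggregate(submissions):
        # Average each metric across submissions, then apply weights.
        per_metric = {}
        for sub in submissions:
            for metric, score in sub.items():
                per_metric.setdefault(metric, []).append(score)
        return sum(
            weights.get(metric, 0.0) * sum(scores) / len(scores)
            for metric, scores in per_metric.items()
        )

    scored = {model: aggregate(subs) for model, subs in evaluations.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Example: new submissions simply extend a model's list, so re-running
# rank_models reflects them immediately (the "real-time" update).
submissions = {
    "model-a": [{"toxicity_avoidance": 0.9, "refusal_accuracy": 0.8}],
    "model-b": [{"toxicity_avoidance": 0.7, "refusal_accuracy": 0.95},
                {"toxicity_avoidance": 0.75, "refusal_accuracy": 0.9}],
}
ranking = rank_models(submissions)
print(ranking[0][0])  # highest-ranked model under these assumed weights
```

Because each submission is just appended and the aggregate is recomputed on read, rankings stay current without any batch rebuild step.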