SomeAI.org

© 2025 • SomeAI.org All rights reserved.


OR-Bench Leaderboard

Evaluate LLM over-refusal rates with OR-Bench


OR-Bench Leaderboard is a benchmarking tool that evaluates large language models (LLMs) on their over-refusal rates: how often a model declines a prompt that it could, and should, have answered. Tracking this metric helps show whether a model is reliable and useful in real-world applications or simply overly cautious.

Features

  • Over-refusal rate tracking: Measures how frequently LLMs decline prompts they should be able to answer.
  • Comparison across models: Lets users compare multiple models by refusal rate.
  • Real-time leaderboards: Provides up-to-date rankings of LLMs.
  • Interactive data exploration: Lets users filter results by criteria such as model size or dataset.
  • Transparency and reproducibility: Offers detailed methodologies and datasets for independent verification.

How to use OR-Bench Leaderboard?

  1. Visit the OR-Bench Leaderboard platform and explore the available models and their performances.
  2. Use the filtering options to narrow down results by model, dataset, or other criteria.
  3. Analyze the refusal rates and compare them against other models.
  4. Access additional resources like datasets, evaluation metrics, and detailed reports.
  5. To submit your own model, follow the submission guidelines provided on the platform.
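
Steps 2 and 3 amount to filtering rows and sorting them by refusal rate. The following is a rough sketch only: the field names (`model`, `size_b`, `over_refusal_rate`), the `rank_models` helper, and every value in `rows` are made-up placeholders, not the platform's schema or real results.

```python
# Hypothetical leaderboard rows; the values are placeholders, not real results.
rows = [
    {"model": "model-a", "size_b": 7, "over_refusal_rate": 0.12},
    {"model": "model-b", "size_b": 70, "over_refusal_rate": 0.05},
    {"model": "model-c", "size_b": 13, "over_refusal_rate": 0.30},
]


def rank_models(rows, max_size_b=None):
    """Keep rows at or below max_size_b (if given), sorted by refusal rate, lowest first."""
    kept = [r for r in rows if max_size_b is None or r["size_b"] <= max_size_b]
    return sorted(kept, key=lambda r: r["over_refusal_rate"])


# Compare only models of 13B parameters or fewer.
for row in rank_models(rows, max_size_b=13):
    print(f'{row["model"]}: {row["over_refusal_rate"]:.0%}')
```

The sketch sorts in ascending order on the assumption that a lower over-refusal rate is better. A model's refusal rate on prompts that genuinely are harmful is a separate question that this ranking does not capture.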

Frequently Asked Questions

1. Why is OR-Bench Leaderboard important for evaluating LLMs?
OR-Bench Leaderboard is important because it identifies models that are overly cautious, that is, models that refuse when they could have given a meaningful answer.

2. Can anyone submit their model to OR-Bench Leaderboard?
Yes, researchers and developers can submit their models for evaluation by following the submission guidelines provided on the platform.

3. How is the over-refusal rate calculated?
The over-refusal rate is the share of prompts a model refuses out of all the prompts in the evaluation set that it should reasonably be expected to answer. The exact scoring procedure is described in the methodology published on the platform.
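
This page does not give the exact formula. As a minimal sketch, assume each response to an answerable prompt has already been labeled as a refusal or not, and that the rate is the refused fraction. The `Judgment` type, the `over_refusal_rate` function, and the sample labels are hypothetical and used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    prompt_id: str
    refused: bool  # True if the model's response was labeled a refusal


def over_refusal_rate(judgments):
    """Fraction of answerable prompts that the model refused (assumed definition)."""
    if not judgments:
        raise ValueError("no judgments to score")
    refused = sum(1 for j in judgments if j.refused)
    return refused / len(judgments)


# Made-up labels for four answerable prompts.
sample = [
    Judgment("p1", True),
    Judgment("p2", False),
    Judgment("p3", False),
    Judgment("p4", True),
]
print(f"over-refusal rate: {over_refusal_rate(sample):.1%}")  # prints "over-refusal rate: 50.0%"
```

How a response gets labeled as a refusal (by human review, keyword matching, or a judge model) is left out here, since this page does not say which approach is used.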

4. Does OR-Bench Leaderboard provide insights into model reliability?
Yes, the leaderboard offers insights into model reliability by highlighting how often models refuse to answer questions, helping users assess their practical effectiveness.

5. Are the datasets used for evaluation publicly accessible?
Yes, the datasets and evaluation methodologies used by OR-Bench Leaderboard are transparent and publicly accessible to ensure reproducibility and fairness.
