Evaluate LLM over-refusal rates with OR-Bench
OR-Bench Leaderboard is a benchmarking tool designed to evaluate large language models (LLMs) with a specific focus on their over-refusal rates. It provides a platform to assess how often LLMs refuse to answer prompts that merely look sensitive but are in fact benign and answerable. This metric is crucial for understanding model reliability and effectiveness in real-world applications.
• Over-refusal rate tracking: Measures how frequently LLMs decline prompts that are safe to answer.
• Comparison across models: Allows users to compare multiple models based on their refusal rates.
• Real-time leaderboards: Provides up-to-date rankings of LLMs in a competitive format.
• Interactive data exploration: Enables users to filter results by criteria such as model size or dataset.
• Transparency and reproducibility: Offers detailed methodologies and datasets for independent verification (a short loading sketch follows this list).
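For independent verification, the benchmark prompts can be pulled down locally with the Hugging Face datasets library. The sketch below assumes the dataset is published on the Hub under the ID "bench-llm/or-bench" with a config named "or-bench-80k"; both identifiers are assumptions, so check the leaderboard page for the exact names.

```python
# Minimal sketch: download the OR-Bench prompts for local inspection.
# Assumption: the dataset ID ("bench-llm/or-bench") and config name
# ("or-bench-80k") may differ; verify them on the leaderboard page.
from datasets import load_dataset

prompts = load_dataset("bench-llm/or-bench", "or-bench-80k", split="train")
print(prompts)     # dataset size and column names
print(prompts[0])  # one seemingly sensitive but benign prompt
```

Loading the prompts this way makes it easy to spot-check what counts as a seemingly sensitive but benign query before relying on the reported rates.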
1. Why is OR-Bench Leaderboard important for evaluating LLMs?
OR-Bench Leaderboard is important because it identifies models that are overly cautious, highlighting those that refuse prompts they could safely and meaningfully answer.
2. Can anyone submit their model to OR-Bench Leaderboard?
Yes, researchers and developers can submit their models for evaluation by following the submission guidelines provided on the platform.
3. How is the over-refusal rate calculated?
The over-refusal rate is the proportion of benign, answerable prompts in the benchmark that a model refuses to answer: each response is classified as a refusal or a compliance, and the number of refusals is divided by the total number of prompts.
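As a rough illustration, the arithmetic reduces to refusals divided by total benign prompts. The snippet below is a minimal sketch that uses a naive keyword heuristic as a stand-in for whatever refusal classifier the leaderboard actually uses; the example responses are hypothetical.

```python
# Minimal sketch of the over-refusal arithmetic: refused / total benign prompts.
# The keyword check is only an illustrative stand-in for the leaderboard's
# actual refusal classifier; the responses below are hypothetical.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable", "i won't")

def is_refusal(response: str) -> bool:
    """Crude heuristic: flag a response that opens with a refusal phrase."""
    return response.strip().lower().startswith(REFUSAL_MARKERS)

def over_refusal_rate(responses: list[str]) -> float:
    """Fraction of responses to benign prompts that were refused."""
    refused = sum(is_refusal(r) for r in responses)
    return refused / len(responses)

responses = [
    "Sure, here is a summary of common lock mechanisms...",
    "I'm sorry, but I can't help with that request.",
    "Certainly! Fireworks safety guidelines include...",
]
print(f"over-refusal rate: {over_refusal_rate(responses):.2%}")  # -> 33.33%
```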
4. Does OR-Bench Leaderboard provide insights into model reliability?
Yes, the leaderboard offers insights into model reliability by highlighting how often models refuse to answer questions, helping users assess their practical effectiveness.
5. Are the datasets used for evaluation publicly accessible?
Yes, the datasets and evaluation methodologies used by OR-Bench Leaderboard are transparent and publicly accessible to ensure reproducibility and fairness.