Browse and submit evaluations for CaselawQA benchmarks
The CaselawQA leaderboard is a platform for tracking and comparing the performance of AI models on the CaselawQA benchmark. It lets researchers and practitioners evaluate their models and submit results, fostering collaboration and progress in legal AI. The leaderboard is a work in progress, with ongoing updates to improve its functionality and usability.
What is the CaselawQA benchmark?
The CaselawQA benchmark is a dataset and evaluation framework designed to assess the ability of AI models to answer legal questions based on case law.
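As a rough illustration of what evaluating a model against such a benchmark involves, the sketch below loads a question-answering dataset with the Hugging Face `datasets` library and scores simple exact-match accuracy. The dataset ID (`your-org/caselawqa`), split, and column names (`question`, `answer`) are placeholders, not the benchmark's official identifiers, and `answer_question` is a stand-in for your own model's inference call.

```python
from datasets import load_dataset

# Hypothetical dataset ID and column names -- replace them with the
# benchmark's actual Hugging Face repo and schema.
dataset = load_dataset("your-org/caselawqa", split="test")

def answer_question(question: str) -> str:
    # Stand-in for a real model call; a constant answer is just a
    # trivial baseline so the loop runs end to end.
    return "Yes"

correct = 0
for example in dataset:
    prediction = answer_question(example["question"])
    if prediction.strip().lower() == str(example["answer"]).strip().lower():
        correct += 1

print(f"Exact-match accuracy: {correct / len(dataset):.3f}")
```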
How do I submit my model's results?
To submit your model's results, use the submission interface on the CaselawQA leaderboard. Follow the provided instructions to upload your results in the required format.
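The required format is defined on the submission page itself; purely as an illustration, a predictions file might be packaged as JSON along the lines below. The field names (`model_name`, `predictions`, `id`, `answer`) and the output filename are hypothetical, not the leaderboard's actual schema.

```python
import json

# Hypothetical results payload -- check the leaderboard's submission
# instructions for the real field names and file format.
results = {
    "model_name": "my-org/my-legal-llm",  # placeholder model identifier
    "predictions": [
        {"id": "example-0001", "answer": "Yes"},
        {"id": "example-0002", "answer": "No"},
    ],
}

with open("caselawqa_results.json", "w") as f:
    json.dump(results, f, indent=2)
```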
Is the leaderboard open to everyone?
Yes, the leaderboard is open to all researchers and developers who want to evaluate their models on the CaselawQA benchmark. No special access is required.