Run benchmarks on prediction models
Track, rank and evaluate open LLMs and chatbots
Display genomic embedding leaderboard
Measure execution times of BERT models using WebGPU and WASM
Browse and filter ML model leaderboard data
Convert Stable Diffusion checkpoint to Diffusers and open a PR
Generate and view leaderboard for LLM evaluations
Display leaderboard for earthquake intent classification models
Load AI models and prepare your space
Browse and filter machine learning models by category and modality
Evaluate LLM over-refusal rates with OR-Bench
Evaluate code generation with diverse feedback types
Push an ML model to the Hugging Face Hub
The LLM Forecasting Leaderboard is a platform for benchmarking and comparing the performance of large language models (LLMs) on forecasting tasks. It provides a framework for evaluating these models across a variety of datasets, so researchers and practitioners can identify the top-performing models for their specific forecasting needs. By openly showcasing each model's prediction capabilities, the leaderboard promotes transparency and fosters innovation.
• Real-Time Benchmarking: Continuously updated rankings of LLMs based on their forecasting performance.
• Customizable Evaluation: Users can define specific metrics and datasets for tailored benchmarking (see the metric sketch after this list).
• Cross-Model Comparison: Directly compare the performance of multiple LLMs on the same tasks.
• Dataset Support: Access to a variety of pre-loaded datasets, including time series and trend-based data.
• Visualization Tools: Interactive charts and graphs to analyze performance differences.
• Model Version Tracking: Track improvements in model performance over time.
• Community Sharing: Share benchmarking results and insights with the broader AI community.
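Because the platform's metric API is not documented here, the following is only a minimal Python sketch of what a custom forecasting metric could look like: it computes sMAPE and MASE, two standard forecasting accuracy measures, for two hypothetical model forecasts on the same held-out series. All names and data are illustrative.

```python
import numpy as np

def smape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Symmetric mean absolute percentage error, in percent (0-200)."""
    diff = np.abs(y_pred - y_true)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    # Treat points where both actual and forecast are zero as zero error.
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * float(np.mean(ratio))

def mase(y_true: np.ndarray, y_pred: np.ndarray,
         y_train: np.ndarray, m: int = 1) -> float:
    """Mean absolute scaled error vs. a (seasonal-)naive baseline of period m."""
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y_pred - y_true)) / naive_mae)

# Illustrative data: past observations, held-out actuals, two model forecasts.
history = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119], dtype=float)
actuals = np.array([104, 118, 115, 126, 141], dtype=float)
forecast_a = np.array([110, 120, 118, 130, 138], dtype=float)
forecast_b = np.array([120, 119, 110, 140, 150], dtype=float)

for name, pred in [("model_a", forecast_a), ("model_b", forecast_b)]:
    print(name,
          "sMAPE:", round(smape(actuals, pred), 2),
          "MASE:", round(mase(actuals, pred, history), 3))
```

A MASE below 1.0 means the model beats the naive "repeat the last value" baseline, which makes it a convenient scale-free score for ranking models across datasets with very different magnitudes.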
What types of forecasting tasks can I benchmark?
The LLM Forecasting Leaderboard supports a wide range of forecasting tasks, including time series prediction, trend forecasting, and sequential data modeling. Users can also customize tasks based on specific needs.
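As one illustration of how a time series task might be posed to an LLM, here is a short Python sketch that serializes a numeric history into a text prompt and parses the model's reply back into numbers. The prompt template, horizon, and reply format are assumptions made for illustration; the leaderboard's actual task format is not specified in this section.

```python
# Hypothetical framing of a time-series prediction task as an LLM prompt.
history = [104.2, 107.9, 111.3, 109.8, 115.6, 119.1]
horizon = 3

prompt = (
    "You are a forecasting assistant. Given the observed values below, "
    f"predict the next {horizon} values.\n"
    "Observed: " + ", ".join(f"{v:.1f}" for v in history) + "\n"
    f"Respond with exactly {horizon} comma-separated numbers and nothing else."
)
print(prompt)

# The model's reply would then be parsed into floats for scoring.
reply = "121.4, 124.0, 126.7"  # placeholder for an actual model response
predictions = [float(x) for x in reply.split(",")]
print(predictions)
```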
How often are the rankings updated?
Rankings are updated in real-time as new models are added or existing models are re-evaluated. This ensures the leaderboard always reflects the latest advancements in LLM technology.
Can I use custom datasets for benchmarking?
Yes, the platform allows users to upload and use their own datasets for benchmarking. This feature is particularly useful for domain-specific forecasting tasks.
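The exact upload schema is not documented in this section; as one plausible minimal format, a custom dataset could be a small CSV with a timestamp, a target value, and a train/test split flag. The sketch below (the column names and the "split" flag are assumptions) shows how such a file would separate the context given to a model from the held-out values used for scoring.

```python
import csv
import io

# Hypothetical custom-dataset format: timestamp, target value, and a flag
# marking which rows are held out for evaluation.
raw = """timestamp,value,split
2024-01-01,231.5,train
2024-01-02,235.1,train
2024-01-03,240.8,train
2024-01-04,238.2,test
2024-01-05,243.9,test
"""

history, targets = [], []
for row in csv.DictReader(io.StringIO(raw)):
    (history if row["split"] == "train" else targets).append(float(row["value"]))

print("history:", history)  # context shown to the model
print("targets:", targets)  # held-out values used for scoring
```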