LLM Performance Leaderboard
The LLM Performance Leaderboard is a tool designed to evaluate and compare the performance of large language models (LLMs) across various tasks and datasets. It provides a comprehensive overview of model capabilities, helping users identify top-performing models for specific use cases. By benchmarking models, the leaderboard enables researchers and developers to make informed decisions about model selection and optimization.
• Performance Metrics: Detailed performance metrics across multiple benchmarks and datasets.
• Model Comparisons: Side-by-side comparisons of different LLMs, highlighting strengths and weaknesses.
• Customizable Benchmarks: Ability to filter results by specific tasks or datasets.
• Interactive Visualizations: Graphs and charts to simplify data interpretation.
• Real-Time Updates: Regular updates with the latest models and benchmark results.
• Community Insights: Access to expert analyses and community discussions on model performance.
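The filtering and comparison behavior described above can be sketched in a few lines of Python. The records, field names, and scores below are purely illustrative sample data, not the leaderboard's actual schema or results:

```python
# Hypothetical sketch of leaderboard-style filtering and ranking.
# The records, benchmark names, and scores are invented sample data,
# not the leaderboard's real schema or values.

results = [
    {"model": "model-a", "benchmark": "mmlu", "score": 0.71},
    {"model": "model-b", "benchmark": "mmlu", "score": 0.68},
    {"model": "model-a", "benchmark": "gsm8k", "score": 0.55},
    {"model": "model-c", "benchmark": "mmlu", "score": 0.74},
]

def rank_models(records, benchmark):
    """Keep only rows for one benchmark, sorted best-first by score."""
    filtered = [r for r in records if r["benchmark"] == benchmark]
    return sorted(filtered, key=lambda r: r["score"], reverse=True)

for row in rank_models(results, "mmlu"):
    print(f"{row['model']}: {row['score']:.2f}")
```

In the real leaderboard this filtering happens interactively in the UI; the sketch only shows the underlying idea of restricting results to a task and ranking models by score.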
What types of models are included in the leaderboard?
The leaderboard includes a wide range of LLMs, from open-source models to proprietary ones, covering various architectures and sizes.
How often are the results updated?
Results are updated regularly, typically when new models are released or when significant updates to existing benchmarks occur.
Can I contribute to the leaderboard?
Yes, contributions are welcome. Users can submit feedback, suggest new benchmarks, or participate in community discussions to enhance the platform.