Display leaderboard for earthquake intent classification models
Intent Leaderboard V12 is a model benchmarking tool for earthquake intent classification. It provides a leaderboard that ranks and evaluates models by how well they classify earthquake-related intents, so researchers and developers can compare models and identify the top performers.
• Real-Time Updates: The leaderboard is continuously updated to reflect the latest model performances.
• Customizable Filters: Users can filter results based on specific criteria, such as model type or evaluation metrics.
• Detailed Analytics: Provides in-depth insights into each model's strengths and weaknesses.
• Model Comparison: Enables side-by-side comparison of multiple models to identify superior performers.
• User Feedback Integration: Incorporates feedback from users to refine model rankings over time.
What does the Intent Leaderboard V12 display?
The leaderboard displays the performance of various models in classifying earthquake-related intents, ranked based on predetermined evaluation metrics.
How are models compared on the leaderboard?
Models are compared using standardized metrics such as accuracy, precision, recall, and F1-score, ensuring a fair and consistent evaluation process.
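As a rough illustration of the metrics named above, the following sketch computes accuracy and macro-averaged precision, recall, and F1-score for an intent classifier. The function name `classification_metrics` and the intent labels are hypothetical. Macro averaging is an assumption too, since the leaderboard does not say how it averages across classes.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the leaderboard's own averaging
    scheme is not documented in the text above.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical intent labels, used only to exercise the function.
y_true = ["report", "report", "help", "info"]
y_pred = ["report", "help", "help", "info"]
metrics = classification_metrics(y_true, y_pred)
```

Scoring every model with the same function on the same labeled test set is what makes the resulting ranking comparable across submissions.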
Can I customize the filters on the leaderboard?
Yes, users can apply custom filters to view results based on specific criteria like model architecture or datasets used, allowing for more tailored analysis.
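A minimal sketch of what such filtering might look like if leaderboard rows were available as plain records. The `Entry` fields, the `filter_and_rank` helper, and all row values below are hypothetical placeholders, not the tool's actual data model or real results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    # Hypothetical row layout; the real leaderboard schema is not documented here.
    model: str
    architecture: str
    dataset: str
    f1: float


def filter_and_rank(
    entries: List[Entry],
    architecture: Optional[str] = None,
    dataset: Optional[str] = None,
) -> List[Entry]:
    """Keep rows matching the given criteria, then sort by F1, best first."""
    selected = [
        e
        for e in entries
        if (architecture is None or e.architecture == architecture)
        and (dataset is None or e.dataset == dataset)
    ]
    return sorted(selected, key=lambda e: e.f1, reverse=True)


# Placeholder rows for illustration only; these are not real scores.
ENTRIES = [
    Entry("model-a", "transformer", "dataset-x", 0.81),
    Entry("model-b", "cnn", "dataset-x", 0.77),
    Entry("model-c", "transformer", "dataset-y", 0.85),
]

transformers_only = filter_and_rank(ENTRIES, architecture="transformer")
```

Leaving a criterion as `None` disables that filter, so the same helper covers both the unfiltered ranking and any combination of architecture and dataset.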