Display benchmark results
Convert PaddleOCR models to ONNX format
Find and download models from Hugging Face
Upload ML model to Hugging Face Hub
Rank machines based on LLaMA 7B v2 benchmark results
Text-To-Speech (TTS) Evaluation using objective metrics.
Generate and view leaderboard for LLM evaluations
Evaluate open LLMs in the languages of LATAM and Spain.
View LLM Performance Leaderboard
Pergel: A Unified Benchmark for Evaluating Turkish LLMs
Merge Lora adapters with a base model
Evaluate code generation with diverse feedback types
Predict customer churn based on input details
Redteaming Resistance Leaderboard is a benchmarking tool designed to evaluate the performance of AI models under adversarial attacks. It provides a platform to test and compare the resistance of different models to red teaming strategies, helping researchers and developers identify strengths and weaknesses in their systems.
• Leaderboard System: Displays rankings of models based on their resistance to adversarial attacks.
• Benchmarking Metrics: Provides detailed metrics on model performance under various red teaming scenarios.
• Customizable Attacks: Allows users to define and test specific types of adversarial inputs.
• Result Visualization: Offers graphical representations of benchmark results for easier analysis.
• Performance Tracking: Enables tracking of model improvements over time.
• Scenario Customization: Supports testing against real-world and hypothetical adversarial scenarios.
1. What does "red teaming" mean in this context?
Red teaming refers to the process of attacking a system (in this case, an AI model) to test its resistance and identify vulnerabilities.
2. How do I interpret the benchmark results?
Benchmark results show how well your model performs under adversarial conditions. Lower scores indicate weaker resistance, while higher scores suggest better robustness.
3. Can I test custom adversarial scenarios?
Yes, the leaderboard allows users to define and test custom adversarial scenarios, providing flexibility for specific use cases.