Persian Text Embedding Benchmark
The PTEB Leaderboard is a benchmarking platform for evaluating and comparing Persian text embedding models. It assesses how well these models handle Persian language tasks, which makes it useful to researchers and developers in the NLP community. Users can view and analyze each model's results across different metrics and datasets.
• Comprehensive Benchmarking: Evaluates models on multiple Persian language tasks and datasets.
• Model Comparison: Enables side-by-side comparison of different embedding models.
• Customizable Metrics: Supports a variety of evaluation metrics tailored for Persian text.
• Interactive Visualizations: Presents results in easy-to-understand charts and graphs.
• Regular Updates: Maintains up-to-date results as new models are released.
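As a rough illustration of what model comparison on a leaderboard involves, the sketch below averages per-task scores and sorts models by that average. This is an assumption about how such a ranking could be computed, not a description of how the PTEB Leaderboard actually aggregates results; the model names, task names, and scores are hypothetical placeholders.

```python
from statistics import mean

# Hypothetical per-task scores; placeholders only, not real PTEB results.
scores = {
    "model-a": {"classification": 0.70, "retrieval": 0.50, "sts": 0.60},
    "model-b": {"classification": 0.60, "retrieval": 0.70, "sts": 0.80},
    "model-c": {"classification": 0.80, "retrieval": 0.40, "sts": 0.30},
}


def rank_models(scores):
    """Return (model, average score) pairs sorted best first."""
    averages = {
        name: mean(task_scores.values())
        for name, task_scores in scores.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = rank_models(scores)
for position, (name, avg) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {avg:.3f}")
```

An unweighted mean treats every task equally. A real leaderboard may weight tasks differently or report per-task rankings alongside the overall one.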
What is the purpose of the PTEB Leaderboard?
The PTEB Leaderboard is designed to provide standardized benchmarks for Persian text embedding models, helping researchers and developers identify top-performing models for their specific use cases.
Can I add my own model to the leaderboard?
Yes, the PTEB Leaderboard allows submissions of new models. Visit the official documentation for guidelines on how to prepare and submit your model for evaluation.
How often are the benchmarks updated?
The benchmarks are updated regularly as new models are released and existing models are fine-tuned. Check the leaderboard for the latest results.