Compare LLMs by role stability
Classify text into categories
Find the best matching text for a query
Provide feedback on text content
Convert graphemes to phonemes (G2P)
Detect harms and risks with Granite Guardian 3.1 8B
Predict song genres from lyrics
eRAG-Election: an AI assistant for the Election Commission of Thailand (กกต.) that supports knowledge about elections and related topics
Identify named entities in text
Track, rank and evaluate open Arabic LLMs and chatbots
Encode and decode Hindi text using BPE
Compare different tokenizers at the character and byte levels
Find collocations for a word in specified part of speech
Stick To Your Role! Leaderboard is a tool for comparing and evaluating Large Language Models (LLMs) on their ability to maintain role consistency. It shows how well different models adhere to their assigned roles during interactions, helping users understand each model's strengths and weaknesses in context-dependent tasks.
• Role Stability Score: Measures how consistently an LLM stays in character and follows its assigned role (an illustrative scoring sketch follows this list).
• Model Comparison: Allows side-by-side comparison of multiple models to evaluate performance differences.
• Interactive Charts: Visualize performance trends and benchmarks across various tasks and scenarios.
• Customizable Parameters: Adjust evaluation criteria to focus on specific aspects of role adherence.
• Real-Time Updates: Stay informed with the latest data as new models and updates are released.
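
The leaderboard's own scoring pipeline is not shown here. As a rough illustration only, the Python sketch below computes a toy role-stability score as the fraction of responses that a hypothetical judge (`judge_in_character`, a crude placeholder heuristic) marks as staying in character, averaged over sampled responses. All function names and data in the snippet are assumptions, not the leaderboard's actual method.

```python
# NOTE: illustrative sketch only; judge_in_character and the sample data are
# hypothetical stand-ins, not the leaderboard's actual scoring method.

def judge_in_character(role: str, response: str) -> bool:
    """Hypothetical judge: True if a response appears to stay in the assigned role.
    In practice this could be a human rating or an LLM-as-judge call."""
    return role.split()[0].lower() in response.lower()  # crude placeholder heuristic

def role_stability_score(role: str, responses: list[str]) -> float:
    """Fraction of responses judged consistent with the assigned role (0.0 to 1.0)."""
    if not responses:
        return 0.0
    return sum(judge_in_character(role, r) for r in responses) / len(responses)

# Toy comparison: two models answering while assigned a "librarian" persona.
transcripts = {
    "model-a": [
        "As a librarian, I would first check the catalogue for you.",
        "Speaking as your librarian, the reading room closes at five.",
    ],
    "model-b": [
        "Sure thing, here is some random trivia!",
        "As a librarian, I recommend starting at the reference desk.",
    ],
}
scores = {name: role_stability_score("librarian persona", msgs)
          for name, msgs in transcripts.items()}
print(scores)  # e.g. {'model-a': 1.0, 'model-b': 0.5}
```

A higher fraction means the model stayed in character more often; the real leaderboard aggregates over many roles and scenarios rather than a single toy persona.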
What is role stability in the context of LLMs?
Role stability refers to how consistently an LLM maintains its assigned role or task during interactions, avoiding deviations or misalignments.
How does the leaderboard determine the rankings?
Rankings are based on the role stability score, which is computed by systematically testing how well each model adheres to its assigned roles across evaluation scenarios.
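
To illustrate only the ranking step (the leaderboard's exact aggregation is not specified here), the sketch below assumes per-scenario scores are already available, averages them per model, and sorts models from most to least role-stable. The model names and numbers are made up for the example.

```python
# Hypothetical per-scenario role-stability scores (0.0 to 1.0) for each model.
per_scenario_scores = {
    "model-a": [0.92, 0.88, 0.95],
    "model-b": [0.75, 0.81, 0.78],
    "model-c": [0.89, 0.93, 0.84],
}

# Average each model's scores, then rank from most to least role-stable.
ranking = sorted(
    ((name, sum(scores) / len(scores)) for name, scores in per_scenario_scores.items()),
    key=lambda item: item[1],
    reverse=True,
)
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.3f}")
```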
Can I customize the evaluation criteria?
Yes, the leaderboard allows users to adjust parameters to focus on specific roles or tasks, providing more relevant insights for their use case.
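
As a minimal sketch of what such customization can look like in practice (the field names below are assumptions, not the leaderboard's real schema), filtering evaluation records to a specific role or task before aggregating restricts the score to the slice you care about.

```python
# Hypothetical evaluation records; the real leaderboard's schema may differ.
results = [
    {"model": "model-a", "role": "assistant", "task": "qa",     "score": 0.91},
    {"model": "model-a", "role": "persona",   "task": "dialog", "score": 0.87},
    {"model": "model-b", "role": "persona",   "task": "dialog", "score": 0.79},
]

# "Customizing the criteria" here just means scoring only the slice you care about.
persona_dialog = [r for r in results if r["role"] == "persona" and r["task"] == "dialog"]
for r in persona_dialog:
    print(r["model"], r["score"])
```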