![](/_next/image?url=https%3A%2F%2Fsite-assets.plasmic.app%2F7f4b8d40206cf943746cf7462ca7590c.svg&w=1920&q=80)
Leaderboards
Expert-Driven Private Evaluations
Discover the SEAL LLM Leaderboards for precise and reliable LLM rankings, where leading large language models (LLMs) are evaluated using a rigorous methodology.
Developed by Scale’s Safety, Evaluations, and Alignment Lab (SEAL), these leaderboards utilize private datasets to guarantee fair and uncontaminated results. Regular updates ensure the leaderboard reflects the latest in AI advancements, making it an essential resource for understanding the performance and safety of top LLMs.
Private Datasets
Scale’s proprietary, private evaluation datasets can’t be gamed, ensuring unbiased and uncontaminated results.
Evolving Competition
We periodically update leaderboards with new datasets and models, fostering a dynamic, contest-like environment.
Expert Evaluations
Our evaluations are performed by thoroughly vetted experts using domain specific methodologies, ensuring the highest quality and credibility.
Learn more about our LLM evaluation methodology
Claude 3.5 Sonnet (October 2024)
Gemini 2.0 Pro Experimental (February 2025)
Gemini 2.0 Flash Thinking Experimental (January 2025)
Gemini 2.0 Flash Thinking (January 2025)
Gemini 2.0 Flash Thinking (December 2024)
Gemini 2.0 Flash Experimental (December 2024)
Llama 3.2 90B Vision Instruct
Gemini 2.0 Flash Experimental (December 2024)
Claude 3.5 Sonnet (October 2024)
Claude 3.5 Sonnet (June 2024)
ChatGPT-4o-latest (November 2024)
Gemini 1.5 Pro (August 27, 2024)
Gemini 2.0 Flash Experimental (December 2024)
Gemini 1.5 Pro (November 2024)
Gemini 1.5 Pro (August 27, 2024)
Gemini 1.5 Pro (November 2024)
Gemini 2.0 Flash Experimental (December 2024)
Gemini 2.0 Flash Experimental (December 2024)
Claude 3.5 Sonnet (June 2024)
Gemini 1.5 Pro (November 2024)
Gemini 1.5 Pro (November 2024)
Gemini 1.5 Pro (August 27, 2024)
Claude 3.5 Sonnet (June 2024)
Gemini 2.0 Flash Experimental (December 2024)
Gemini 1.5 Pro (November 2024)
Gemini 1.5 Pro (August 27, 2024)
Gemini 1.5 Pro (August 27, 2024)
Claude 3.5 Sonnet (June 2024)
Claude 3.5 Sonnet (June 2024)
Gemini 1.5 Pro (August 27, 2024)
Gemini 2.0 Flash Experimental (December 2024)
Deprecated (as of January 2025)
Claude 3.5 Sonnet (June 2024)
Deprecated (as of January 2025)
Gemini 2.0 Flash Experimental (December 2024)
Claude 3.5 Sonnet (June 2024)
Deprecated (as of January 2025)
Gemini 1.5 Pro (May 2024)
Claude 3.5 Sonnet (June 2024)
If you’d like to add your model to this leaderboard or a future version, please contact seal@scale.com. To ensure leaderboard integrity, we require that models can only be featured the FIRST TIME when an organization encounters the prompts.