AI LLM Leaderboard

An interactive ranking of leading Large Language Models in the race to AGI.

Model | Overall | Reasoning | Coding | Math | Speed (tokens/s) | Price (USD per 1M tokens)

Intelligence Report: The AGI Race

Inspired by independent analysis from ArtificialAnalysis.ai, this section provides insight into the benchmarks defining the frontier of AI.

Key Intelligence Benchmarks

MMLU-Pro

Tests broad, expert-level knowledge.

GPQA Diamond

Evaluates models on graduate-level science questions in biology, physics, and chemistry.

LiveCodeBench

Measures coding ability on recently published competitive programming problems, which limits training-data contamination.

AIME & MATH

Assesses advanced mathematical reasoning on competition-style problems.

Frontier Model Intelligence Over Time

Late 2022

The Spark

Early models set the stage for the AGI race.

2023

The Leap

Models like GPT-4 demonstrate huge leaps in reasoning.

2024

The Acceleration

Rapid releases from all major labs push performance.

2025 & Beyond

The Frontier

The race intensifies as models approach human-expert performance.

Model Comparison

Intelligence Index: Higher is better

Output Tokens per Second: Higher is better

USD per 1M Tokens: Lower is better

Data synthesized from public benchmarks and reports.
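The three comparison metrics sort in different directions, which is easy to get wrong when ranking models by hand. The following is a minimal Python sketch of best-first sorting per metric. The `ModelRow` type, the `rank_by` helper, the field names, and the `model-a`/`model-b`/`model-c` rows are all hypothetical placeholders, not data or code from this leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    """One leaderboard row (hypothetical structure for illustration)."""
    name: str
    intelligence_index: float  # higher is better
    tokens_per_second: float   # higher is better
    usd_per_1m_tokens: float   # lower is better


# True: a larger value ranks first. False: a smaller value ranks first.
HIGHER_IS_BETTER = {
    "intelligence_index": True,
    "tokens_per_second": True,
    "usd_per_1m_tokens": False,
}


def rank_by(rows, metric):
    """Return the rows ordered best-first for the given metric."""
    return sorted(
        rows,
        key=lambda row: getattr(row, metric),
        reverse=HIGHER_IS_BETTER[metric],
    )


# Placeholder values only; these are NOT real benchmark results.
rows = [
    ModelRow("model-a", 60.0, 80.0, 5.00),
    ModelRow("model-b", 55.0, 150.0, 1.50),
    ModelRow("model-c", 48.0, 200.0, 0.40),
]

for metric in HIGHER_IS_BETTER:
    print(metric, [row.name for row in rank_by(rows, metric)])
```

Keeping the sort direction in one lookup table means a new metric only needs one extra entry, rather than a separate comparison rule scattered through the sorting code.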