Leaderboard

Current rankings of code generation models on SlopCodeBench

Showing 8 of 14 runs

Agent/Model	Version	Thinking	% Checkpoints	$/Checkpoint	Erosion	Verbosity
Claude Code/Opus 4.5	2.0.51	High	10.75	$4.56	0.437	0.905
Codex/GPT 5.1-Codex-Max	0.65.0	High	10.75	$2.86	0.402	0.768
Codex/GPT 5.2	0.71.0	High	10.75	$4.55	0.425	0.765
Codex/GPT 5.2-Codex	0.74.0	High	10.75	$3.09	0.348	0.777
Claude Code/Opus 4.5	2.0.62	High	8.60	$2.63	0.408	0.905
Claude Code/Opus 4.5	2.0.75	High	7.53	$2.69	0.457	0.911
Claude Code/Sonnet 4.5	2.0.51	High	5.38	$1.49	0.490	0.749
Claude Code/GLM 4.7	2.0.75	High	4.30	$1.61	0.443	0.758

Metric Definitions

% Solved — Percentage of problems solved

% Checkpoints — Percentage of checkpoints solved

Erosion — Share of functions with cyclomatic complexity (CC) above 10, plus lint errors per line of code. Functions with CC above 30 are counted twice.

Verbosity — Number of spans flagged by ast-grep rules or an LLM-as-Judge, relative to logical lines of code.
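The Erosion and Verbosity definitions above can be read as simple ratios. The following is a minimal sketch of one possible reading, not the benchmark's actual scoring code. The names `FunctionStats`, `erosion`, and `verbosity` are hypothetical. It assumes scores are reported as fractions rather than percentages (the table values are all below 1), that a function with CC above 30 contributes a weight of 2 in the numerator only, and that per-function complexity, lint error counts, and flagged-span counts come from external tools.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FunctionStats:
    """Hypothetical per-function record produced by a complexity tool."""
    name: str
    cyclomatic_complexity: int


def erosion(functions: Sequence[FunctionStats], lint_errors: int, lines_of_code: int) -> float:
    """Assumed reading: weighted share of complex functions plus lint errors per line.

    Weights (assumption): 0 for CC <= 10, 1 for 10 < CC <= 30, 2 for CC > 30.
    """
    weighted = sum(
        2 if f.cyclomatic_complexity > 30 else 1 if f.cyclomatic_complexity > 10 else 0
        for f in functions
    )
    complex_share = weighted / len(functions) if functions else 0.0
    lint_density = lint_errors / lines_of_code if lines_of_code else 0.0
    return complex_share + lint_density


def verbosity(flagged_spans: int, logical_lines: int) -> float:
    """Assumed reading: flagged spans divided by logical lines of code."""
    return flagged_spans / logical_lines if logical_lines else 0.0


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the leaderboard.
    stats = [
        FunctionStats("parse", 5),
        FunctionStats("resolve", 12),
        FunctionStats("dispatch", 35),
        FunctionStats("render", 8),
    ]
    print(erosion(stats, lint_errors=10, lines_of_code=200))
    print(verbosity(flagged_spans=30, logical_lines=40))
```

Under this reading, lower is better for both metrics, and a single very complex function moves Erosion as much as two moderately complex ones.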

* Last updated: January 11, 2026