Leaderboard
Current rankings of code generation models on SlopCodeBench
Showing 11 of 20 runs
Showing the best version of each model, ranked by % Checkpoints solved.
| Model | Harness | Version | Isolated Solved (%) | $/CKPT | Erosion | Verbosity |
|---|---|---|---|---|---|---|
| GPT 5.3-Codex (High) | Codex | 0.98.0 | 23.66 | $3.14 | 0.678 | 0.844 |
| Opus 4.6 (High) | Claude Code | 2.1.32 | 21.51 | $3.47 | 0.647 | 0.965 |
| GPT 5.4 (High) | Codex | 0.110.0 | 20.43 | $3.27 | 0.643 | 0.682 |
| GPT 5.2 (High) | Codex | 0.71.0 | 19.35 | $4.55 | 0.671 | 0.765 |
| Opus 4.5 (High) | Claude Code | 2.0.51 | 17.39 | $2.64 | 0.642 | 0.932 |
| GPT 5.1-Codex-Max (High) | Codex | 0.65.0 | 17.20 | $2.86 | 0.656 | 0.768 |
| GPT 5.2-Codex (High) | Codex | 0.74.0 | 16.13 | $3.21 | 0.673 | 0.723 |
| Sonnet 4.6 (High) | Claude Code | 2.1.44 | 16.13 | $1.92 | 0.609 | 0.953 |
| Sonnet 4.5 (High) | Claude Code | 2.0.76 | 12.90 | $1.54 | 0.627 | 0.909 |
| GPT 5.3-Codex-Spark (High) | Codex | 0.100.0 | 12.90 | $0.91 | 0.564 | 0.819 |
| GLM 4.7 (High) | Claude Code | 2.0.75 | 9.68 | $1.61 | 0.626 | 0.758 |
[Chart omitted — legend: OpenAI, Anthropic, Z-AI, Other]
Metric Definitions
% Solved — percentage of problems solved.
CKPT Solved — percentage of checkpoints where the checkpoint and all prior checkpoints are solved.
Isolated Solved — percentage of checkpoints whose own tests pass, evaluated in isolation.
Core Solved — percentage of checkpoints that pass just the core tests for that checkpoint.
$/CKPT — average USD cost per checkpoint.
Erosion — Gini coefficient of the complexity mass distribution across functions: 0 = uniform complexity, 1 = all complexity concentrated in one function.
Verbosity — count of spans flagged by AST Grep rules or an LLM-as-Judge, relative to logical lines of code.
Clone Lines / 1K LOC — duplicated (clone) lines per 1,000 lines of code at the final checkpoint.
Complexity Concentration — per-function complexity Gini coefficient at the final checkpoint.
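The Erosion and Complexity Concentration metrics both reduce to a Gini coefficient over per-function complexity values. As a minimal sketch (the complexity values and the `gini` helper below are illustrative, not SlopCodeBench's actual implementation), a standard Gini computation looks like:

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0.0 means the mass is spread uniformly across items; values near 1.0
    mean nearly all mass is concentrated in a single item.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks 1..n).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical cyclomatic complexity per function in a generated codebase:
print(gini([3, 3, 3, 3]))   # uniform complexity -> 0.0
print(gini([0, 0, 0, 12]))  # all complexity in one function -> 0.75
```

Note that for n items the maximum attainable value is (n − 1)/n, which is why the four-function concentrated case above yields 0.75 rather than 1.0.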
* Last updated: March 6, 2026