Leaderboard
Current rankings of code generation models on SlopCodeBench
| Agent/Model | Thinking | % Solved | % Checkpoints Solved | $ / Checkpoint | Erosion Mean | Verbosity Mean |
|---|---|---|---|---|---|---|
| Claude Code/Opus 4.5 | High | 0 | 12.5 | 7.18 | 0.387 | 0.968 |
| Claude Code/Opus 4.5 | Low | 0 | 11.25 | 6.47 | 0.423 | 0.877 |
| Claude Code/Opus 4.5 | None | 0 | 11.25 | 8.07 | 0.411 | 0.889 |
| Codex/GPT 5.1-Codex-Max | Low | 0 | 10 | 1.30 | 0.408 | 0.958 |
| Codex/GPT 5.1-Codex-Max | None | 0 | 10 | 1.89 | 0.399 | 0.921 |
| Codex/GPT 5.1-Codex-Max | High | 0 | 8.75 | 2.74 | 0.383 | 0.855 |
| Codex/GPT 5.2 | High | 0 | 7.5 | 4.11 | 0.470 | 0.829 |
| Codex/GPT 5.2 | Low | 0 | 7.5 | 1.79 | 0.408 | 0.828 |
| Codex/GPT 5.2 | None | 0 | 6.25 | 1.49 | 0.490 | 0.884 |
Last updated: December 17, 2025