Leaderboard
Current rankings of code generation models on SlopCodeBench
Showing 16 of 24 runs: only the best version of each Model–Harness pair (by % Checkpoints) is listed.
| Model | Harness | Version | Strict Solve (%) | Iso Solve (%) | $/CKPT | Erosion | Verbosity |
|---|---|---|---|---|---|---|---|
| GPT 5.5 (High) | Codex | 0.124.0 | 14.29 | 28.06 | $1.51 | 0.494 | 0.269 |
| GPT 5.3-Codex (High) | Codex | 0.98.0 | 11.22 | 26.02 | $0.69 | 0.644 | 0.336 |
| GPT 5.4 (High) | Codex | 0.110.0 | 10.71 | 23.47 | $0.82 | 0.508 | 0.273 |
| GPT 5.2-Codex (High) | Codex | 0.93.0 | 9.69 | 21.94 | $0.85 | 0.728 | 0.398 |
| Opus 4.6 (High) | Claude Code | 2.1.32 | 9.69 | 20.92 | $3.17 | 0.737 | 0.318 |
| Opus 4.7 (High) | Claude Code | 2.1.111 | 8.16 | 20.92 | $2.17 | 0.759 | 0.357 |
| Kimi K2.6 (High) | Kimi CLI | 1.37.0 | 10.71 | 18.88 | $0.74 | 0.764 | 0.399 |
| Opus 4.5 (High) | Claude Code | 2.0.51 | 9.18 | 17.35 | $2.53 | 0.691 | 0.297 |
| Sonnet 4.6 (High) | Claude Code | 2.1.44 | 7.14 | 16.84 | $1.96 | 0.741 | 0.316 |
| GLM 5.1 (High) | Claude Code | 2.1.44 | 9.69 | 13.78 | $1.47 | 0.684 | 0.322 |
| GPT 5.4-Mini (High) | Codex | 0.110.0 | 5.10 | 13.78 | $0.45 | 0.655 | 0.330 |
| Kimi K2.5 (High) | Kimi CLI | 1.37.0 | 4.59 | 9.69 | $0.33 | 0.712 | 0.309 |
| GLM 5.1 (High) | OpenCode | 1.4.3 | 5.61 | 8.16 | $0.59 | 0.550 | 0.387 |
| GPT 5.3-Codex-Spark (High) | Codex | 0.100.0 | 3.06 | 8.16 | $0.20 | 0.586 | 0.357 |
| Kimi K2.5 (High) | Claude Code | 2.1.44 | 3.57 | 7.14 | $1.07 | 0.692 | 0.310 |
| MiniMax M2.7 (High) | Claude Code | 2.1.44 | 2.55 | 4.08 | $0.33 | 0.500 | 0.265 |
(Chart omitted — runs plotted by provider: OpenAI, Anthropic, Z-AI, Moonshot, Cursor, MiniMax, Other.)
Metric Definitions
CKPT Solved — The checkpoint and all prior checkpoints are solved
Isolated Solved — Percentage passing only the tests for that checkpoint
Core Solved — Passes only the core tests for a checkpoint
$/CKPT — Average USD cost per checkpoint
Erosion — Fraction of total complexity mass in high-complexity functions (CC > 10), where mass(f) = CC(f) × √SLOC(f). 0 = no high-complexity functions, 1 = all mass in high-CC functions.
Verbosity — Size of the union of AST-Grep-flagged lines and clone lines, divided by LOC. Bounded [0, 1].
% AST-Grep — Percentage of lines flagged by AST-Grep rules for wasteful code patterns.
% Cloned — Percentage of lines that are structural duplicates (clone lines / LOC).
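The Erosion and Verbosity definitions above can be sketched in a few lines of Python. This is a minimal illustration only; the function names and the per-function `(CC, SLOC)` input format are assumptions, not the benchmark's actual implementation:

```python
from math import sqrt

def erosion(functions):
    """Fraction of total complexity mass in high-complexity functions
    (CC > 10), where mass(f) = CC(f) * sqrt(SLOC(f)).
    `functions` is a list of (cyclomatic_complexity, sloc) pairs
    (an assumed input format)."""
    mass = lambda cc, sloc: cc * sqrt(sloc)
    total = sum(mass(cc, sloc) for cc, sloc in functions)
    if total == 0:
        return 0.0  # no functions -> no high-complexity mass
    high = sum(mass(cc, sloc) for cc, sloc in functions if cc > 10)
    return high / total

def verbosity(flagged_lines, clone_lines, loc):
    """Union of AST-Grep-flagged lines and clone lines, divided by LOC.
    `flagged_lines` and `clone_lines` are sets of line numbers."""
    return len(flagged_lines | clone_lines) / loc
```

For example, a file with one function of CC 12 over 100 SLOC and one of CC 2 over 25 SLOC has masses 120 and 10, giving an erosion of 120/130 ≈ 0.92; because flagged and cloned lines are unioned before dividing by LOC, a line that is both flagged and cloned is only counted once toward verbosity.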
* Last updated: April 24, 2026