SlopCodeBench

Community-driven benchmark for measuring code erosion under iterative specification refinement.

Overview

SlopCodeBench (SCBench) is a benchmark designed to evaluate coding agents the way real software actually gets built: through repeated requirement changes and extensions. Instead of treating the spec as a one-shot oracle, each task is a sequence of checkpoints: an agent implements an initial version, then extends its own solution multiple times as new requirements arrive. The v0.2 release includes 20 novel problems with 3-8 checkpoints each, evaluated in a black-box setting where only a CLI or API contract is given (no prescribed architecture, function signatures, or module boundaries), so early design decisions can meaningfully help or hurt later work.
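The checkpoint protocol described above can be sketched roughly as follows. This is a hypothetical illustration, not SCBench's actual harness: all names (`Checkpoint`, `run_task`, the agent callable) are invented for this sketch, and real checkpoints would run black-box tests against a CLI or API rather than an in-memory workspace.

```python
# Hypothetical sketch of a checkpoint-style evaluation loop.
# Names and types are illustrative, not the benchmark's real API.
from dataclasses import dataclass


@dataclass
class Checkpoint:
    spec: str     # requirements revealed (or changed) at this step
    tests: list   # black-box checks against the external contract only


def run_task(agent, checkpoints):
    """Run one task: the agent builds checkpoint 1, then extends its
    own prior solution for each later checkpoint in sequence."""
    workspace = {}   # agent-owned solution; no prescribed structure
    passed = []
    for cp in checkpoints:
        # The agent sees the new spec plus its own previous work,
        # so early design choices carry forward to later checkpoints.
        workspace = agent(cp.spec, workspace)
        passed.append(all(test(workspace) for test in cp.tests))
    return passed
```

The key property the sketch captures is that each checkpoint grades the cumulative solution, so a brittle early design can make later checkpoints harder even if the early ones passed.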

Supported by

Special thanks to Snorkel AI for supporting this work through the Open Benchmarks Grant.

DARPA · National Science Foundation · Snorkel AI