DeepSWE: an agentic software engineering (SWE) benchmark by Datacurve.
Open benchmark problems affecting this task set.
Problems with a fix underway upstream.
Problems fixed upstream and recorded here.
Health numbers count the affected task rows; a problem that affects the whole benchmark is counted only once.