ARC-AGI-2: A benchmark measuring abstract reasoning through visual grid puzzles requiring rule inference and generalization.
Open benchmark problems affecting this task set.
Problems with a fix underway upstream.
Problems fixed upstream and recorded here.
Health numbers count affected task rows; benchmark-wide problems count once.
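The counting rule above can be illustrated with a minimal sketch. It assumes each problem is a record listing the task rows it affects, plus a flag marking it as benchmark-wide. The `Problem` class, its field names, and the sample entries are hypothetical and are not taken from the tracker itself. The sketch also assumes that counts are summed per problem, with no deduplication of rows shared between problems.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    """Hypothetical record for one tracked benchmark problem."""
    title: str
    affected_task_rows: list[str] = field(default_factory=list)
    benchmark_wide: bool = False


def health_count(problem: Problem) -> int:
    """Contribution of a single problem to a health number."""
    if problem.benchmark_wide:
        # A benchmark-wide problem counts once, regardless of task rows.
        return 1
    # Otherwise each affected task row counts.
    return len(problem.affected_task_rows)


def health_number(problems: list[Problem]) -> int:
    """Sum of per-problem contributions (assumed: no cross-problem dedup)."""
    return sum(health_count(p) for p in problems)


# Made-up sample data, for illustration only.
problems = [
    Problem("example problem A", affected_task_rows=["row-1", "row-2", "row-3"]),
    Problem("example problem B", benchmark_wide=True),
]
print(health_number(problems))  # 3 task rows + 1 benchmark-wide problem = 4
```

If the tracker instead deduplicates task rows that appear under several problems, the sum would need to be replaced by a count over the union of rows.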