arc_agi_2

ARC-AGI-2: A benchmark measuring abstract reasoning through visual grid puzzles requiring rule inference and generalization.

defect now

Open benchmark problems affecting this task set.

fixing now

Problems with a fix underway upstream.

total fixed

Problems fixed upstream and recorded here.

Health numbers count affected task rows; benchmark-wide problems count once.
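The counting rule above can be read as follows. This is a minimal sketch under stated assumptions, not the board's actual implementation. The `Problem` record, the status names `"open"`, `"fixing"` and `"fixed"`, and the `health_count` helper are all hypothetical. It also assumes that a task row affected by two separate problems is counted once per problem.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    # Hypothetical record: a problem either lists the task rows it affects
    # or is flagged as benchmark-wide.
    status: str  # hypothetical status names: "open", "fixing", "fixed"
    task_rows: set[str] = field(default_factory=set)
    benchmark_wide: bool = False


def health_count(problems: list[Problem], status: str) -> int:
    """Sum affected task rows for task-level problems with the given status;
    a benchmark-wide problem adds exactly 1, however many rows it touches.
    Assumption: a row hit by two problems is counted once per problem."""
    total = 0
    for p in problems:
        if p.status != status:
            continue
        total += 1 if p.benchmark_wide else len(p.task_rows)
    return total


if __name__ == "__main__":
    problems = [
        Problem("open", {"t1", "t2"}),         # task-level: 2 rows
        Problem("open", benchmark_wide=True),  # benchmark-wide: counts once
        Problem("fixed", {"t3"}),              # task-level: 1 row
    ]
    print(health_count(problems, "open"))   # 3
    print(health_count(problems, "fixed"))  # 1
```

Counting a benchmark-wide problem once keeps it from dominating the totals just because it touches every row.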

