Waitlist open

Bounty hunting, for AI benchmarks.

HackerOne for RL Envs. Reviewers get paid to find defects in RL benchmarks: wrong oracles, broken verifiers, leaky tests. Teams get their datasets audited before they ship.

Join as a reviewer
Get your dataset audited

Waitlist only for now. One email when paid audits open, and you get first dibs. Researchers can also contribute free on the open track.

For reviewers

Get paid to find defects.

Read the verifier, prove what's broken, write it up. The first reviewer to pinpoint the real root cause takes the bounty. No ML PhD required. Bug bounty, code review, or competitive-judging instincts map cleanly.

01
Pick a task
Browse the open paid queue. Each task is a single broken-or-not call on a benchmark task: instruction, repo snapshot, and verifier.
02
Audit independently
Open it in VS Code with our devcontainer. Read the verifier, run it, and submit your verdict with markdown evidence: file, line, before/after.
03
Get paid
A validator confirms the root cause. The first reviewer to nail it takes the bounty. Payouts go out weekly via Wise.
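To make "broken verifier" concrete, here is a minimal, hypothetical sketch of the kind of defect a reviewer might write up, with a before/after. It is not taken from any real benchmark; the function names and the sample answer are invented for illustration, and real tasks will look different.

```python
# Hypothetical example: names and data are illustrative, not from a real task.

def broken_verifier(output: str, expected: str) -> bool:
    # Before: substring match, so any output that merely contains the
    # expected answer somewhere is accepted.
    return expected in output


def fixed_verifier(output: str, expected: str) -> bool:
    # After: compare the whole answer, ignoring surrounding whitespace only.
    return output.strip() == expected.strip()


# A wrong submission that lists several candidate answers.
wrong_submission = "41 42 43"
expected = "42"

print(broken_verifier(wrong_submission, expected))  # True: defect, wrong answer passes
print(fixed_verifier(wrong_submission, expected))   # False: wrong answer rejected
```

A write-up for a defect like this would point at the file and line of the substring check, show the wrong submission that passes, and propose the stricter comparison.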

Join the reviewer waitlist

For teams shipping benchmarks

Audit your dataset before you ship.

Whether it's a private dataset you're selling to a frontier lab or an open benchmark you're about to publish, we run it through the same crowd-plus-AI audit first, so the defects surface before your users hit them.

Catch defects pre-release

Surface broken verifiers and wrong oracles before launch. The payoff is highest on a freshly built benchmark, where the defect rate is highest.

Private or open

Selling private data to a lab, or publishing in the open? Works either way. For an open release, ship the audit report alongside the benchmark as a quality signal.

A report you can stand behind

Get a defect report with root causes and fixes you can cite at release, not a black-box score.

Talk to us about an audit

Get on the list before paid audits open.

Drop your email and we'll reach out the day before paid audits open, so the waitlist gets first dibs. Have a dataset to audit instead? Talk to us.
