HackerOne for RL Envs. Reviewers get paid to find defects in RL benchmarks: wrong oracles, broken verifiers, leaky tests. Teams get their datasets audited before they ship.
Waitlist only for now. One email when paid audits open, and you get first dibs. Researchers can also contribute free on the open track.
Read the verifier, prove what's broken, write it up. The first reviewer to pinpoint the real root cause takes the bounty. No ML PhD required. Instincts from bug bounties, code review, or competitive judging carry over directly.
Browse the open paid queue. Each item is a single broken-or-not call on one benchmark task: its instruction, repo snapshot, and verifier.
Open it in VS Code with our devcontainer. Read the verifier, run it, and submit your verdict with markdown evidence: file, line, before/after.
A validator confirms the root cause. The first reviewer to nail it takes the bounty. Payouts go out weekly via Wise.
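For a feel of what a write-up from the steps above could contain, here is a rough Python sketch. It assumes a submission format this page does not actually specify: the `Evidence` and `Verdict` names, their fields, the task ID, and the markdown layout are all hypothetical, chosen only to show the file, line, before/after shape of the evidence.

```python
import textwrap
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """One piece of evidence (hypothetical shape, not an official schema)."""
    file: str    # path inside the repo snapshot
    line: int    # line the finding points at
    before: str  # the code as it stands
    after: str   # the proposed fix


@dataclass
class Verdict:
    """A broken-or-not call on one benchmark task (hypothetical shape)."""
    task_id: str
    broken: bool
    root_cause: str
    evidence: list[Evidence] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the verdict as a short markdown report."""
        status = "broken" if self.broken else "not broken"
        lines = [
            f"## Verdict for {self.task_id}: {status}",
            "",
            f"Root cause: {self.root_cause}",
        ]
        for ev in self.evidence:
            lines += [
                "",
                f"### {ev.file}, line {ev.line}",
                "",
                "Before:",
                "",
                textwrap.indent(ev.before, "    "),
                "",
                "After:",
                "",
                textwrap.indent(ev.after, "    "),
            ]
        return "\n".join(lines)


# Illustrative values only; the task ID and file path are made up.
verdict = Verdict(
    task_id="example-task-001",
    broken=True,
    root_cause="Verifier compares floats with ==, so correct answers fail.",
    evidence=[
        Evidence(
            file="tests/verify.py",
            line=42,
            before="assert result == expected",
            after="assert math.isclose(result, expected)",
        )
    ],
)
report = verdict.to_markdown()
print(report)
```

The point of the structure is that every claim is pinned to a file and line, so a validator can check the root cause without re-deriving it.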
Whether it's a private dataset you're selling to a frontier lab or an open benchmark you're about to publish, we run it through the same crowd-plus-AI audit first, so the defects surface before your users hit them.
Surface broken verifiers and wrong oracles before launch. The payoff is highest on a freshly built benchmark, where the defect rate is highest.
Selling private data to a lab, or publishing in the open? Works either way. For an open release, ship the audit alongside it as a quality signal.
Get a defect report with root causes and fixes you can cite at release, not a black-box score.
Drop your email and we'll reach out the day before paid audits open. The waitlist gets first dibs. Have a dataset to audit instead? Talk to us.