Arc Intelligence

Make capability measurable — with real data-science discipline.

Who it is for

For people who can measure what others assume: benchmarking, agent and model diagnosis, judgement pipelines. You work close to frontier AI and use Arc's evaluation substrate (Lancet). You run a full professional data-science process end to end (collect, validate, preprocess, exploratory data analysis (EDA), pipeline, test, eval), often producing a corpus, benchmark, or dataset that resists gaming.

What it trains

Running a full data-science process: collect → validate → preprocess → EDA → pipeline → test → eval
Designing benchmarks that resist gaming
Diagnosing where a system actually fails
Working with Arc's evaluation substrate (Lancet)

Example missions

Design a benchmark for a capability that lacks one
Build an agent-failure diagnosis harness
Stand up a judgement pipeline with quality controls
Audit a system's claims against measured evidence

What you leave with

A benchmark, corpus, or dataset
A diagnosis others can trust
An evaluation pipeline

How a mission works

Arc shows you a few real projects it judges you ready for, and you choose the one that draws you. From there the work is mission-based and asynchronous: a clear brief in, a concrete artifact out. You investigate, decide, and return with evidence, and Arc evaluates the outcome, not the motion. Expect the start to be hard, with unfamiliar tools and an unfamiliar problem space; that crossing is the point.

What it is not

Not a course or a bootcamp — the work is real, and harder
Not employment, salary, a title, or a guaranteed role — a cultivation path, not a job
Not metric theatre

Selection

Recognised through real work, by invitation — not an application
Rigour under ambiguity; you measure honestly, including failure

Evaluation