Find what is real, what is failing, and what must be built before deployment. A short, low-risk way in: Arc reads an existing AI system the way it reads its own and tells you the truth about it.
WHAT IS REAL?
A demo tells you almost nothing.
A demo can be impressive and still be false; a pipeline can run and still be untrustworthy; an agent can finish a task and still be unsafe to delegate to.
Classical software is largely deterministic: the same input gives the same output, so you can test it to the edges. A language model is not. The same question can draw a different answer, and the failure does not announce itself. It hides inside a fluent, confident sentence, and almost no one, however capable, can spot it by reading the output alone; it has to be tested for.
Most AI systems fail quietly. They do not crash — they answer fluently, retrieve something plausible, pass a demo, and move through a workflow as if everything were fine. The Diagnostic does not take a working demo for proof: it inspects the substrate, the evaluation loop, the evidence trail, the agent boundary, and the governance structure — until you know what is real.
Your AI system may appear to work. Arc finds out whether that is true.
WHO IT'S FOR
A system that looks like it works.
You have something built — it demos, it runs — but you cannot prove it can be trusted in production. The Diagnostic is for any of these:
- An enterprise RAG or chatbot demo
- An agent or multi-step workflow
- An AI-coding workflow across your repos
- A document-intelligence system
- An internal knowledge assistant
- An AI product before launch
- A regulated or high-stakes AI workflow
Not yet built, and weighing whether to build? That calls for a Feasibility Study (in preparation): a judgement on whether to build, made before you spend on demos and tools. The Diagnostic reads what already exists; the Feasibility Study judges what does not yet exist.
THE SHAPE
Two to four weeks, seven readings.
- 01 System intake: Understand the workflow, users, data, models, tools, and the claims being made about the system.
- 02 Failure reproduction: Find where the system fails quietly, such as the confident wrong answer or the silent degradation.
- 03 Boundary mapping: Separate what is deterministic from what is non-deterministic, and decide where the line between them must be engineered.
- 04 Substrate review: Inspect documents, retrieval units, evidence, and traceability. Is the source material addressable and citable?
- 05 Evaluation-gap analysis: Review or design the eval set, the regression tests, and the domain oracle the judge actually needs.
- 06 Governance review: Permissions, human review, audit trail, and responsibility boundaries. Is trust built in by construction, or left to cleanup afterwards?
- 07 Deployment roadmap: What to fix, what to build, and what must not be deployed yet.
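To make readings 02 and 05 concrete, here is a minimal sketch of one kind of check: ask the same question repeatedly, then measure how often the answers agree with each other and with a reference answer. It is an illustration only, not Arc's method. The names `EvalCase`, `probe`, and `make_flaky_system` are hypothetical, and the simulated system stands in for whatever call reaches the real system under review.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    expected: str  # reference answer, assumed to come from a domain expert


def normalise(answer: str) -> str:
    """Lower-case and collapse whitespace so trivial variations compare equal."""
    return " ".join(answer.lower().split())


def probe(ask: Callable[[str], str], case: EvalCase, runs: int = 10) -> dict:
    """Ask the same question `runs` times and summarise the answers."""
    answers = [normalise(ask(case.question)) for _ in range(runs)]
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    correct = counts[normalise(case.expected)]
    return {
        "question": case.question,
        "distinct_answers": len(counts),   # more than 1 means the answer varies
        "agreement": top_count / runs,     # share of runs giving the top answer
        "accuracy": correct / runs,        # share of runs matching the reference
        "most_common": top_answer,
    }


def make_flaky_system(seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in for a real system: usually right, sometimes not."""
    rng = random.Random(seed)

    def ask(question: str) -> str:
        return "30 days" if rng.random() < 0.7 else "60 days"

    return ask


if __name__ == "__main__":
    case = EvalCase("What is the refund window?", "30 days")
    print(probe(make_flaky_system(), case, runs=20))
```

A single demo run of such a system would most likely show the right answer. Only the repeated probe shows that the answer varies, and that every variant is delivered with the same confidence. Exact string matching is a deliberate simplification here; a real eval set would need a comparison suited to the domain.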
WHAT YOU GET
A report you can act on.
- Failure map
- Risk register
- Architecture review
- Eval-gap report
- Governance boundary notes
- Recommended intervention plan
- Go / no-go / rebuild recommendation
APPLY
Apply for an AI System Diagnostic.
Engagements are selective and problem-led. Write to us with a concise account of the system and where you suspect it is failing. ArcSoft Pty Ltd signs and holds the commercial terms.