Find what is real, what is failing, and what must be built before deployment. A short, low-risk way in — Arc reads an existing AI system the way it reads its own, and tells you the truth about it.

Field Practice
Window: 2–4 weeks
Commitment: Low-risk · scoped
What you get: The truth, not a build

WHAT IS REAL?

A demo tells you almost nothing.

A demo can be impressive and still be false; a pipeline can run and still be untrustworthy; an agent can finish a task and still be unsafe to delegate to.

Classical software is, for the most part, deterministic: given the same input it behaves the same way, and you can test it to the edges. A language model is not: the same question can draw a different answer, and the failure does not announce itself. It hides inside a fluent, confident sentence, and almost no one, however capable, can see the gap by looking, because a wrong answer reads exactly like a right one.

Most AI systems fail quietly. They do not crash — they answer fluently, retrieve something plausible, pass a demo, and move through a workflow as if everything were fine. The Diagnostic does not take a working demo for proof: it inspects the substrate, the evaluation loop, the evidence trail, the agent boundary, and the governance structure — until you know what is real.

Your AI system may appear to work. Arc finds out whether it is true.

WHO IT'S FOR

A system that looks like it works.

You have something built — it demos, it runs — but you cannot prove it can be trusted in production. The Diagnostic is for any of these:

  • An enterprise RAG or chatbot demo
  • An agent or multi-step workflow
  • An AI-coding workflow across your repos
  • A document-intelligence system
  • An internal knowledge assistant
  • An AI product before launch
  • A regulated or high-stakes AI workflow

Not yet built, and weighing whether to build? That is a Feasibility Study: a judgement on whether to build, made before you spend on demos and tools. (In preparation.) The Diagnostic reads what already exists; the Feasibility Study judges what does not yet exist.

THE SHAPE

Two to four weeks, seven readings.

  1. System intake
     Understand the workflow, users, data, models, tools, and the claims being made about it.

  2. Failure reproduction
     Find where the system fails quietly: the confident wrong answer, the silent degradation.

  3. Boundary mapping
     Separate what is deterministic from what is non-deterministic, and where the line must be engineered.

  4. Substrate review
     Inspect documents, retrieval units, evidence, and traceability: is the ground addressable and citable?

  5. Evaluation-gap analysis
     Review or design the eval set, regression tests, and the domain-oracle the judge actually needs.

  6. Governance review
     Permissions, human review, audit trail, responsibility boundaries: trust by construction or cleanup?

  7. Deployment roadmap
     What to fix, what to build, and what must not be deployed yet.
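
To make the idea of a quiet failure concrete, here is a minimal sketch of one check that failure reproduction and evaluation-gap analysis might include: ask each question several times, then measure how often the answers agree with each other and with a known-good answer. It is an illustration under stated assumptions, not Arc's actual tooling. The names `consistency_report` and `make_flaky_system`, the two-question eval set, and the exact-match comparison are all hypothetical; a real system would be called through its own interface and judged by a domain-appropriate oracle.

```python
from collections import Counter
from typing import Callable, Dict, List


def consistency_report(
    ask: Callable[[str], str],
    eval_set: Dict[str, str],
    runs: int = 5,
) -> List[dict]:
    """Ask each question `runs` times; record agreement and correctness."""
    report = []
    for question, expected in eval_set.items():
        # Normalise lightly so trivial casing or spacing does not count as drift.
        answers = [ask(question).strip().lower() for _ in range(runs)]
        _, top_count = Counter(answers).most_common(1)[0]
        report.append({
            "question": question,
            # Share of runs that gave the most common answer.
            "agreement": top_count / runs,
            # Share of runs that matched the expected answer exactly.
            "correct_rate": answers.count(expected.strip().lower()) / runs,
            "distinct_answers": len(set(answers)),
        })
    return report


def make_flaky_system() -> Callable[[str], str]:
    """Hypothetical stand-in for a real system: stable on one question, unstable on the other."""
    calls = {"n": 0}

    def ask(question: str) -> str:
        calls["n"] += 1
        if question == "What is the notice period?":
            return "30 days"
        # Fluent either way, but the answer alternates between calls.
        return "14 days" if calls["n"] % 2 == 0 else "21 days"

    return ask


# Hypothetical eval set: question -> known-good answer.
eval_set = {
    "What is the notice period?": "30 days",
    "What is the refund window?": "14 days",
}

report = consistency_report(make_flaky_system(), eval_set, runs=4)
for row in report:
    print(row)
```

In this toy run the first question is answered the same way every time, while the second returns two different answers, each equally fluent. Neither run crashes or raises an error, which is the point: the instability only shows up when it is measured.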

WHAT YOU GET

A report you can act on.

  • Failure map
  • Risk register
  • Architecture review
  • Eval-gap report
  • Governance boundary notes
  • Recommended intervention plan
  • Go / no-go / rebuild recommendation

APPLY

Apply for an AI System Diagnostic.

Engagements are selective and problem-led. Write to us with a concise account of the system and where you suspect it is failing. ArcSoft Pty Ltd signs and holds the commercial terms.