Trusting AI-generated code: the harness, not the model
How to trust AI-generated code you didn't read line by line: trust the harness you built, not the model. By developers' own estimate, AI now assists about 42% of the code they commit.
Part of the guide: AI Agent Harnesses: A Field Guide
The last few releases of ralphctl were built, in part, by ralphctl
ralphctl is a CLI I wrote to run AI coding agents through a structured sprint. At some point I started using the released version to build the next one. I change something, point a dev build at the work, and let it run. The tool I ship for everyone else now ships itself.
The first time I noticed, it felt like a small magic trick. Then it felt like proof. I let it build itself because I trust it, and I only trust it because of how it's built.
Here's the uncomfortable part. I ship code I didn't read line by line. Not all of it, and not the parts that matter most (more on those later), but far more than I'd have admitted two years ago. And I sleep fine.
My trust moved. It came off the code the agent writes and went onto the harness I built around it. The model is the only non-deterministic line in an otherwise deterministic script. I trust the script.
You can't read it all, and "read it all" was never the plan
AI assistance now touches a lot of the code that reaches main. Sonar's 2026 survey puts AI-assisted code at around 42% of what developers commit, by the developers' own estimate. Of that same group, 96% say they don't fully trust that the output is correct, and only 48% always verify it before committing.[1]
That gap is the whole problem, and the right reading of it isn't "developers are lazy." Line-by-line review never scaled to this volume. Pretending it does just means the reading gets shallower while everyone tells themselves it didn't.
"Never trust it" and "just trust it" are both wrong
The two loudest answers are both dead ends. "Never trust AI code" throws away the leverage and doesn't survive contact with the volume. "Just trust it" is how unreviewed nonsense reaches production.
The smarter answer, the one I keep seeing from people who actually ship with agents, is "build guardrails, review the spec not the diff." That's right, and it's half the answer. It tells you to move trust to a harness. What it skips is how the harness earns that trust, and the fact that the harness lies too.
Trust didn't vanish. It moved.
Here's the reframe that made it click for me. The AI provider is just another unreliable I/O boundary, like a flaky network call to a service you don't control. You already know how to handle one of those. You don't trust a network call. You wrap it in timeouts, contracts, retries, and tests, and then you trust the wrapper.
Do the same to the model. Make each operation a deterministically callable unit: a script that calls the model through a clear interface and takes a structured result back. It hands that result to the next step and verifies it with executable checks. The model in the middle is stochastic. Everything around it is deterministic, inspectable, and yours. That wrapper is the harness.
I don't trust the AI. I trust the pipe I built around it.
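A minimal sketch of that wrapper in TypeScript. This is not ralphctl's code: `StepResult`, `ModelCall`, `callModelStep`, and the default budgets are hypothetical names and values chosen for illustration, and the model call is any async function you pass in rather than a real provider SDK.

```typescript
// Hypothetical contract for one step's structured result.
interface StepResult {
  status: "done" | "blocked";
  summary: string;
}

// Stand-in for a provider SDK call: prompt in, raw text out.
type ModelCall = (prompt: string) => Promise<string>;

// Contract check: anything off-schema is an error here, not a surprise downstream.
function parseStepResult(raw: string): StepResult {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("result is not a JSON object");
  }
  const { status, summary } = data as Record<string, unknown>;
  if (status !== "done" && status !== "blocked") {
    throw new Error(`unexpected status: ${String(status)}`);
  }
  if (typeof summary !== "string") {
    throw new Error("summary must be a string");
  }
  return { status: status as StepResult["status"], summary };
}

// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// The deterministic shell: bounded attempts, a timeout per attempt,
// and a validated result or a thrown error. Nothing in between.
async function callModelStep(
  call: ModelCall,
  prompt: string,
  opts = { attempts: 3, timeoutMs: 30_000 },
): Promise<StepResult> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      const raw = await withTimeout(call(prompt), opts.timeoutMs);
      return parseStepResult(raw);
    } catch (error) {
      lastError = error; // timeout, transport failure, or off-contract output
    }
  }
  throw lastError;
}
```

The caller never sees raw model text. It gets a `StepResult` that passed the contract, or an error it can handle like any other failed I/O call.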
The pipe in practice: gates, an evaluator, and signals
In ralphctl, a sprint is a sequence of deterministic steps, not one big "go" prompt. A verifyScript gate brackets every task: work starts from a verified-green baseline and has to end green (typecheck && lint && test), so "the agent finished" and "the code is still green" stay two separate facts. A generator model produces the work, a separate evaluator model grades it against explicit criteria, and structured signals pass between stages through a file-based signals.json contract instead of vibes.[2] Clear interfaces in, clear signals out, executable verification at each seam.
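The bracketing idea fits in a few lines. The sketch below is an illustration under assumptions, not ralphctl's implementation: `runGatedTask`, `GateOutcome`, and `Check` are hypothetical names, and the check is an injected function that would, in practice, shell out to something like `typecheck && lint && test`.

```typescript
// Outcome of one gated task. Names are illustrative, not ralphctl's.
type GateOutcome =
  | { kind: "baseline-red" }    // never started: the repo was already broken
  | { kind: "finished-red" }    // the agent finished, but the checks fail
  | { kind: "finished-green" }; // the agent finished and the checks pass

// Returns true when the project is green.
type Check = () => boolean;

function runGatedTask(verify: Check, runAgentTask: () => void): GateOutcome {
  // Gate 1: refuse to start from a red baseline, so a later failure
  // can be attributed to this task and not to inherited breakage.
  if (!verify()) return { kind: "baseline-red" };

  // The stochastic step. "It returned" says nothing about correctness.
  runAgentTask();

  // Gate 2: the same check decides whether the work counts.
  return verify() ? { kind: "finished-green" } : { kind: "finished-red" };
}
```

Keeping "finished" and "green" as separate outcomes is the point: the agent returning is recorded, but only the second check can promote the task.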
The model never gets to be the thing I trust. It gets to be one step I can re-run, inspect, and gate. That pipeline is one station in a bigger delivery harness that orchestrates the whole path to production, but the trust argument is the same at every step.
There's one more part no vendor's guardrails pitch can offer you: I built this one. I know its contracts and its failure modes because I wrote them. That's the difference between trusting a harness and trusting a black box that happens to be called a harness. You can borrow someone else's, but then the homework is understanding it well enough to know where it'll let you down. The source is on GitHub if you want the shape of it.
The harness lies too
This is the part the "just build guardrails" crowd skips, and it's the honest center of the whole thing: the scaffolding isn't infallible. Trust moved, it didn't disappear.
My own evaluator can be confidently wrong. A weaker evaluator model will sometimes tell a stronger generator to rewrite code that was already correct, and say so with complete conviction. I cap that retry loop on purpose (a bounded turn budget, and a red verdict never blocks the run), because a confident wrong "fix" is a real failure mode, not a hypothetical. Green can mean nothing. Red can be noise.
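A sketch of what a bounded, non-blocking review loop can look like. The shape is assumed for illustration: `generateWithReview`, `Verdict`, and `maxTurns` are hypothetical names, and both models are plain injected functions.

```typescript
interface Verdict {
  pass: boolean;
  feedback: string;
}

interface LoopResult {
  work: string;
  turnsUsed: number;
  lastVerdict: Verdict;
}

// `generate` receives the previous feedback (null on the first turn).
// `maxTurns` is the total budget of generate/evaluate rounds, at least 1.
function generateWithReview(
  generate: (feedback: string | null) => string,
  evaluate: (work: string) => Verdict,
  maxTurns: number,
): LoopResult {
  let work = generate(null);
  let verdict = evaluate(work);
  let turns = 1;

  // Retry only while the evaluator is red and budget remains.
  while (!verdict.pass && turns < maxTurns) {
    work = generate(verdict.feedback);
    verdict = evaluate(work);
    turns++;
  }

  // A red verdict is returned, not thrown: it is a signal to inspect,
  // and it does not block the rest of the run.
  return { work, turnsUsed: turns, lastVerdict: verdict };
}
```

With an evaluator that is always red, the loop stops after `maxTurns` rounds and hands back the last attempt with the red verdict attached, so a confidently wrong evaluator costs a few turns and never the whole run.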
Green is also necessary, not sufficient. The deterministic checks catch a whole class of failures in seconds, but some confidence only ever comes from manual steps I do myself. The clearest case for me is visual design. I'm a backend developer, not a designer, yet I can tell when something works and still looks wrong. An agent usually hands me a component that renders and passes its tests but needs a human eye before I'd show it to anyone, because no check encodes "this feels right." That's not an AI thing: every piece of software still needs a few manual passes for the final polish that no executable test captures. The harness gets me to a candidate. I get it to shippable.
So I don't trust the harness blindly either. I trust it the way I trust any system I built: knowing exactly where it's weak, and where it ends.
Where I still read every line
So what stays on the no-skim list, no matter how green the checks are? A few things, every single time.
Anything that costs money. Not just payment code in the obvious sense, but anything that drives spend, including how the app uses models: which model a path picks, how many times it calls, and the tests that pin that down. A broken feature fails loudly the first time someone runs it. A loop that quietly calls an expensive model ten times, where one call would do, fails on the invoice weeks later, and the harness reported green the whole way.
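One way to pin that down is a test that counts calls through a fake client. Everything named here is hypothetical (`CountingModel`, `summarizeTicket`, the model name `small-model`); the sketch only shows the shape of a spend assertion.

```typescript
// Counting fake that stands in for a real model client.
class CountingModel {
  calls: { model: string }[] = [];

  complete(model: string, prompt: string): string {
    this.calls.push({ model });
    return `summary of: ${prompt}`;
  }
}

// Hypothetical code path under test.
function summarizeTicket(client: CountingModel, ticket: string): string {
  return client.complete("small-model", ticket);
}

// Fails if the path made more calls than budgeted, or used a model
// that is not on the allow-list for this path.
function assertSpend(client: CountingModel, maxCalls: number, allowedModels: string[]): void {
  if (client.calls.length > maxCalls) {
    throw new Error(`expected at most ${maxCalls} model call(s), saw ${client.calls.length}`);
  }
  for (const call of client.calls) {
    if (allowedModels.indexOf(call.model) === -1) {
      throw new Error(`unexpected model: ${call.model}`);
    }
  }
}

const client = new CountingModel();
summarizeTicket(client, "login fails on Safari");
assertSpend(client, 1, ["small-model"]);
```

A test like this turns "quietly calls the expensive model ten times" from an invoice surprise into a red check. It is still worth reading by hand, because the budget in the assertion is itself a decision someone has to own.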
Security and authorization. When I'm not a hundred percent sure an auth path actually works, I check it myself: how roles are evaluated, how the identity provider is wired (Keycloak or otherwise), how security is configured across the app. This is where "the code runs" and "the code is correct" sit furthest apart, and no verifyScript catches "this works perfectly and authorizes the wrong person."
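Part of what I look for when reading an auth path is whether the denials are pinned down, not just the happy path. A small sketch, assuming a made-up role table: `Role`, `Action`, `permissions`, and `isAllowed` are illustrative and have nothing to do with Keycloak's API.

```typescript
type Role = "viewer" | "editor" | "admin";
type Action = "read" | "write" | "delete";

// Hypothetical role table. Anything not listed is denied.
const permissions: Record<Role, readonly Action[]> = {
  viewer: ["read"],
  editor: ["read", "write"],
  admin: ["read", "write", "delete"],
};

// Deny by default: unknown roles, empty role lists, and inherited
// object keys such as "constructor" all fall through to false.
function isAllowed(roles: readonly string[], action: Action): boolean {
  return roles.some(
    (role) =>
      Object.prototype.hasOwnProperty.call(permissions, role) &&
      permissions[role as Role].indexOf(action) !== -1,
  );
}

// Negative cases are the ones a green happy-path test never exercises.
const shouldDeny: [string[], Action][] = [
  [[], "read"],
  [["viewer"], "delete"],
  [["editor"], "delete"],
  [["superuser"], "read"],
  [["constructor"], "read"],
];
for (const [roles, action] of shouldDeny) {
  if (isAllowed(roles, action)) {
    throw new Error(`${JSON.stringify(roles)} must not be allowed to ${action}`);
  }
}
```

Tests like these narrow the gap but don't close it. They only cover the denials someone thought to write down, which is why the path still gets read.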
Anything irreversible, data migrations above all. Those get tested locally more than once, but testing isn't really the point. Before a migration reaches production I want to know what's hot: what could fail, how big the blast radius is, and what the plan is when it does. If I've read it, I can write tests for the scary cases up front and walk in prepared, instead of reverse-engineering a 2am incident.
The agent can often handle all three. I read them anyway, to keep my own model of the system intact, so when something breaks I already know why, and I still know what to expect from my own application. The day I can't say how authorization flows through my app, or what a migration will do under load, is the day I've outsourced something I shouldn't have.
The skill now is knowing which layer to distrust
The new discipline is knowing which layer to distrust, and owning the one you trust. Wrap the stochastic step in a deterministic shell. Build that shell, or, if you borrow one, understand it well enough to name its weak spots.
The models will keep getting better. The wrapping is what stays. I trust ralphctl enough to let it build itself, and I distrust it exactly where I know it's blind. That's not a contradiction. That's the job now.
Footnotes
[1] Sonar, Sonar Data Reveals Critical "Verification Gap" in AI Coding: 96% Don't Fully Trust Output, Yet Only 48% Verify It (2026 State of Code Developer Survey; fieldwork Oct 2025, 1,149 developers per the full report). Source of the 96% / 48% trust-versus-verify split and the 42% AI-assisted share-of-commits figure.
[2] ralphctl behaviour claims here are drawn from the project's CHANGELOG and npm package (MIT): the file-based signals.json contract, the generator-evaluator defaults (Claude Opus 4.8 generator, OpenAI Codex / GPT-5.5 evaluator), and the bounded, non-blocking evaluator retry.
Frequently asked questions
Should you read every line of AI-generated code before merging?
Not all of it, and reading everything stops scaling once an agent writes most of your code. The workable rule is to read every line of the high-blast-radius parts (auth, security, data migrations, anything that costs money), trust executable verification for the rest, and keep a few manual confirmation steps for the final shippable-to-a-real-user polish.
Why does trust belong in the harness instead of the model?
The harness is everything around the model: the scripts, interfaces, signals, and checks that turn a stochastic model call into a repeatable, verifiable pipeline. Those parts are deterministic, inspectable, and yours, so they can earn trust the way any system you built does. The model stays one step you can re-run, inspect, and gate.
How can you trust code an AI wrote?
You don't trust the model. You wrap it like any unreliable I/O boundary: a deterministic step calls the model through a clear interface, takes a structured result, hands it to the next step, and verifies it with executable checks. You trust the deterministic shell, not the stochastic core.