What Governance Means When AI Writes The Code

June 4, 2026

I've been in security long enough to remember when finding the problem was the hard part. You had to look in the right place, build the right rule, tune out the noise.

That problem is solved. AI finds everything now.

My team uses Claude Code. The speed at which you can generate working code today is real, and it keeps getting better.

But the question underneath every AI-generated fix is whether we can trust it. Working and governed are not the same thing.

When a developer writes a fix manually, there's a chain of accountability behind it. They know the policy and the reason behind the change. They can defend it in an audit. The PR review is a checkpoint, not a guessing game.

When AI writes the fix, none of that chain exists by default. 

The code might look right and run right. But does it align to your security standards? If you apply it twice, does it do the same thing twice? Is there a record of why the change was made?

Most of the time, the answer is no.

That's not a criticism of the tools. Supplying that chain is not their job. Their job is generation. Governance is a different problem entirely.

Every team evaluating AI for code remediation should be asking the same five questions.

  1. Is the output idempotent? Apply the fix twice and see what happens. Idempotency is not a nice-to-have. It's the baseline for trusting automation in production.
  2. Is every fix tied to the policy that triggered it, or is it a best guess based on the prompt?
  3. Does it align to a standard? Not approximately. Exactly. Your auditor will ask.
  4. Can it be reproduced? If you run the same scenario tomorrow, do you get the same fix?
  5. Is there an audit trail that holds up in a real review? Not a commit message. A record of what was found, what policy applied, what changed, and why.
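
Questions 1 and 4 can be checked mechanically. Here is a minimal sketch in Python, not a description of how any particular tool works. It assumes a resource is represented as a plain dictionary, and the names `enforce_bucket_encryption`, `append_encryption_rule`, `is_idempotent`, and `fingerprint` are hypothetical, invented for illustration.

```python
import copy
import hashlib
import json


def enforce_bucket_encryption(resource: dict) -> dict:
    """Hypothetical fix: return a copy of the resource with encryption enabled."""
    fixed = copy.deepcopy(resource)
    # Set the desired end state instead of toggling or appending,
    # so a second application has nothing left to change.
    fixed["encryption"] = {"enabled": True, "algorithm": "AES256"}
    return fixed


def append_encryption_rule(resource: dict) -> dict:
    """Hypothetical counterexample: appends a rule on every call."""
    fixed = copy.deepcopy(resource)
    fixed.setdefault("rules", []).append("encrypt")
    return fixed


def is_idempotent(fix, resource: dict) -> bool:
    """Question 1: applying the fix twice must equal applying it once."""
    once = fix(resource)
    twice = fix(once)
    return once == twice


def fingerprint(resource: dict) -> str:
    """Question 4: a stable hash of the output, for comparing separate runs."""
    # Canonical JSON (sorted keys, fixed separators) so that key order
    # does not change the hash.
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Assumed example input; the bucket name is made up.
bucket = {"name": "example-logs", "encryption": {"enabled": False}}

print(is_idempotent(enforce_bucket_encryption, bucket))  # True
print(is_idempotent(append_encryption_rule, bucket))     # False
```

In a real evaluation, the two fingerprints being compared would come from separate runs of whatever generates the fix, for example today's output against tomorrow's. Comparing them inside one process only proves the hash is stable.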

Most AI coding tools will fumble at least three of these. That's not a flaw. It's a scope issue. They weren't built for this. 

The layer above them was.

Last month we ran 15 production cloud scenarios through Gomboc. Forty-three merge-ready fixes. Twelve minutes. Seventeen dollars in tokens.

Some scenarios were straightforward. An S3 bucket without encryption. An SSH port open to the internet.

Some were not.

Three AWS accounts running independent CloudTrail configurations, each capturing the same management events into separate buckets. Two thousand and fifty dollars a month in unnecessary spend. The fix is obvious in theory. In practice, most teams know exactly what to do and still haven't done it. Gomboc wrote the fix, tested it, and made it ready to merge.

Every fix is idempotent, carries a policy reference, and generates an audit trail. This combination is what governance looks like when AI writes the code.
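
To make the audit-trail requirement concrete, here is a hedged sketch of what one record might hold, following the four elements named earlier: what was found, what policy applied, what changed, and why. This is not Gomboc's actual record format. The `AuditRecord` class, its field names, the policy identifier `SEC-NET-001`, and the CIDR values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    finding: str     # what was found
    policy_id: str   # what policy applied
    before: dict     # state before the change
    after: dict      # state after the change
    rationale: str   # why the change was made
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def changed_keys(self) -> list:
        """Top-level keys whose values differ between before and after."""
        keys = set(self.before) | set(self.after)
        return sorted(k for k in keys if self.before.get(k) != self.after.get(k))

    def to_json(self) -> str:
        """Serialize the record, including the computed list of changed keys."""
        payload = asdict(self)
        payload["changed_keys"] = self.changed_keys()
        return json.dumps(payload, sort_keys=True, indent=2)


record = AuditRecord(
    finding="SSH port 22 open to 0.0.0.0/0",
    policy_id="SEC-NET-001",  # hypothetical policy identifier
    before={"port": 22, "cidr": "0.0.0.0/0"},
    after={"port": 22, "cidr": "10.0.0.0/8"},  # assumed internal range
    rationale="Restrict SSH ingress to the internal network range.",
)
print(record.to_json())
```

The point of the structure is that a reviewer can answer "which policy?" and "what changed?" from the record alone, without reading the prompt or the commit message.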

AI coding tools are accelerating output. Developers are merging more code than ever. The bottleneck didn't disappear. It moved from writing to reviewing. And every team manually validating AI output before it ships is losing the speed advantage they bought the tool for.

Governance is not about slowing AI down. It's about making AI output trustworthy enough that you stop slowing it down manually every time it ships. 

We published the benchmark to show the numbers. The real point was to raise the bar. The era of trusting AI output on instinct is over. 

The scenarios are at https://github.com/Gomboc-AI/rattleback/tree/main/scenarios. The full methodology is at https://www.gomboc.ai/show-your-work. Run them. Check the work. Tell us where we're wrong. That's the point.