
Project Glasswing has made a lot of noise over the last couple of weeks, and honestly, it deserves the attention. It showed very clearly what AI is now capable of when it comes to finding vulnerabilities. Not just assisting, not just speeding things up, but actually operating at a level where it can uncover issues across complex systems in ways that start to rival human expertise. That’s not incremental progress. It’s a shift.
At the same time, none of this should feel surprising. If you’ve been watching how AI has been evolving, this was always the direction. More data, more scale, better pattern recognition, applied to one of the most pattern-heavy problems in engineering. Of course it’s going to get good at finding things. That part was inevitable.
Most of the conversation right now is focused on that capability. People are excited about how much more AI can find, how quickly it can scan large environments, and how it can surface issues that have been sitting unnoticed for years. All of that is true, and it matters. But it’s also only one part of the equation, and not even the hardest part.
The reality is that security teams have never really struggled with finding problems. They struggle with fixing them. Backlogs don’t exist because detection is weak. They exist because turning a finding into an actual fix is slow, manual, and often risky. Someone has to understand the issue, write the change, validate it, and then get it into production without breaking something else. That process is where things stall.
Glasswing makes the first half faster. It does not meaningfully change the second half. And that’s where things start to get uncomfortable, because if AI can now generate findings at scale, you don’t automatically become more secure. You just end up with more work. In many cases, a lot more work. If your remediation process stays the same, you’ve essentially built a faster way to accumulate backlog.
The natural next step everyone is already thinking about is simple. If AI can find issues, we’ll just use AI to fix them. Tools like Claude Code get pulled in, you feed them findings, and you expect clean fixes back. On paper, that closes the loop.
In reality, it gets messy fast.
Every fix costs tokens. Every iteration costs more tokens. You don’t get a clean answer the first time, so you re-prompt, tweak context, add constraints, try again. What should have been a quick fix turns into multiple cycles of generation and review. Now multiply that across hundreds or thousands of findings, and you start to see the problem. Costs go up, but more importantly, time and attention go up with it.
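The multiplication is easy to underestimate, so here is a back-of-envelope sketch. All of the numbers below (iterations per fix, tokens per iteration, price per token) are made-up assumptions for illustration, not measurements from any real tool or pricing page:

```python
# Back-of-envelope cost model for AI-generated fixes.
# All rates below are illustrative assumptions, not real pricing.

def fix_cost_usd(findings: int,
                 iterations_per_fix: float = 3.0,    # re-prompts until a fix passes review
                 tokens_per_iteration: int = 8_000,  # context + constraints + generated diff
                 usd_per_million_tokens: float = 15.0) -> float:
    """Estimate total token spend for remediating a batch of findings."""
    total_tokens = findings * iterations_per_fix * tokens_per_iteration
    return total_tokens / 1_000_000 * usd_per_million_tokens

# A single fix looks cheap in isolation...
print(round(fix_cost_usd(1), 2))      # 0.36
# ...but multiplied across a scanner-sized backlog it adds up fast.
print(round(fix_cost_usd(2_000), 2))  # 720.0
```

And dollars are the smaller half of the bill: every one of those iterations also consumes an engineer's review time, which the model above doesn't even count.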
And even after all that, you still don’t fully trust the output.
Because the issue isn’t just cost. It’s consistency. One run gives you something usable, the next gives you something slightly off. It might fix the vulnerability but ignore internal policy. It might work in isolation but not in your actual environment. So engineers still have to step in, validate, adjust, and sometimes rewrite entirely.
You didn’t remove the bottleneck. You just moved it.
So now you’re in this strange place where AI is finding more issues than ever, generating fixes for them, and yet teams are still stuck reviewing, debugging, and second-guessing those fixes at scale. The backlog doesn’t disappear. It just evolves.
This is where the gap shows up.
Most of these systems are probabilistic. They give you a likely answer, not a guaranteed one. That’s fine for writing code or brainstorming ideas. It doesn’t hold up when you’re making changes that need to be correct every single time, especially in production environments.
What’s missing is determinism.
The ability to take a finding and produce a fix that is consistent, policy-aligned, and actually ready to be applied without a human having to babysit the process. The same input should lead to the same output, every time. That’s not how most AI systems work today.
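To make the contrast concrete, here is a minimal sketch of what deterministic remediation looks like in the simplest possible case. The rule itself is hypothetical and deliberately trivial (upgrading an insecure hash call via a pattern rewrite); the point is the shape: a pure function of the input, with no sampling and no drift between runs.

```python
# A deterministic remediation is a pure function of the code and the rule:
# the same input always yields the same patch. This toy rule is
# hypothetical and for illustration only.

import re

INSECURE_HASH = re.compile(r"hashlib\.md5\(")

def remediate(source: str) -> str:
    """Apply the fix rule. No model, no sampling: identical output every run."""
    return INSECURE_HASH.sub("hashlib.sha256(", source)

snippet = "digest = hashlib.md5(data).hexdigest()\n"

# Run it twice -- the outputs are identical by construction.
assert remediate(snippet) == remediate(snippet)
print(remediate(snippet))  # digest = hashlib.sha256(data).hexdigest()
```

Real fixes are obviously far more complex than a one-line rewrite, but the property is the thing: a reviewer who approves the rule once can trust every application of it, which is exactly what re-reviewing probabilistic output at scale does not give you.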
Project Glasswing shows us that discovery is no longer the bottleneck. Tools like Claude Code show that generation is getting better. But neither solves the part that actually matters, which is executing fixes in a way that teams can trust without burning time, money, and attention in the process.
Because in the end, it’s not about how much you can find.
It’s about what actually gets fixed.


