There's a narrative circulating in tech right now: AI will replace software engineers. The evidence looks compelling. AI can generate a REST API in thirty seconds. It can refactor a messy module into something readable. It writes tests, documents functions, explains legacy codebases. The output is often genuinely good.
But I've been watching what happens when teams actually remove humans from the review process. The results are instructive.
The Problem With AI-Generated Code
AI code generators are trained on existing code. They know what correct code looks like — syntactically, stylistically, even idiomatically. What they don't know is your specific business context, the quirks of your production environment, or the implicit contracts your team maintains through code review.
I've seen AI-generated authentication logic that was technically correct and catastrophically wrong. The code passed every lint check and looked clean. But it used a timing-safe comparison in the wrong context, which created a side-channel leak. A human reviewer who knew the system's threat model caught it in ninety seconds.
AI doesn't know your threat model. It doesn't know which decisions are load-bearing and which are incidental. Code review is really about understanding intent — yours, and the code's.
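For readers who haven't met the term, a timing-safe comparison is one whose running time doesn't depend on where the two inputs differ. The sketch below is a minimal Python illustration of that idea only. It is not the code from the incident above, which isn't reproduced here, and the function names (`naive_equal`, `constant_time_equal`, `verify_tag`) are hypothetical. It assumes the secret being checked is an HMAC-SHA256 tag.

```python
import hashlib
import hmac


def naive_equal(expected: bytes, supplied: bytes) -> bool:
    # Ordinary equality may stop at the first differing byte, so its running
    # time can hint at how much of the secret an attacker has guessed.
    return expected == supplied


def constant_time_equal(expected: bytes, supplied: bytes) -> bool:
    # hmac.compare_digest is designed so that its running time does not
    # depend on the position of the first mismatch.
    return hmac.compare_digest(expected, supplied)


def verify_tag(secret_key: bytes, message: bytes, supplied_tag: bytes) -> bool:
    # Recompute the expected tag, then compare it with the timing-safe helper.
    expected_tag = hmac.new(secret_key, message, hashlib.sha256).digest()
    return constant_time_equal(expected_tag, supplied_tag)
```

Both helpers return the same boolean for the same inputs, which is why a linter or a unit test can't tell them apart. Whether the difference matters depends on who can observe the timing, and that is a threat-model question rather than a syntax question.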
What AI Review Gets Wrong
Current AI code review tools fall into a few predictable failure modes. The first is false confidence. AI is very good at sounding authoritative about incorrect conclusions. A junior developer who thinks they might be wrong will ask questions. An AI that is wrong will tell you the code is fine with complete certainty.
The second failure mode is context blindness. AI can read your diff, but it can't read your codebase the way a human who has worked in it for two years can. It doesn't know that this service is scheduled for deprecation, or that this variable name was deliberately chosen to match a regulatory requirement, or that this pattern exists because of a bug that cost the company $200,000 last year.
The third is a lack of systemic thinking. Code review isn't just about whether individual lines are correct. It's about whether the change fits the architecture, whether it creates dependencies that will cause problems later, and whether the error-handling strategy is consistent with the rest of the system.
The Harness Is Still Human
Here's what I think is actually happening: AI is becoming genuinely good at the mechanical parts of coding. The syntax, the boilerplate, the common patterns. What's not changing is the reasoning layer — the part where humans decide what to build, why to build it, and what tradeoffs to accept.
Code review is fundamentally a reasoning task, not a pattern-matching task. The reviewer is asking: does this change move the system toward or away from where it needs to be? Does it reflect good judgment about the unknowns?
That judgment takes context. It takes institutional knowledge. It takes understanding the people who wrote the code and what they were likely thinking. These are things that don't transfer to a model, no matter how large.
Key Takeaways
- AI code generators produce syntactically correct code that can be semantically wrong in subtle, dangerous ways
- AI review tools lack context about business logic, threat models, and team conventions
- Code review is a reasoning task — checking intent against implementation — not a pattern-matching exercise
- The teams getting the most from AI are using it to augment human review, not replace it
The honest answer is that AI changes what code review looks for, not whether you need a human doing it. You're still going to need someone in the loop who understands the system and is willing to be wrong. The machine just helps them look at more code in the same amount of time.