The ‘ship it fast’ myth dominating AI coding culture
A recognizable anti-pattern has spread across engineering teams adopting AI coding tools: generate barely passable code, open a massive pull request, skip thorough review, merge. Call it “slop and merge.” It treats LLMs as quantity machines, devices whose sole purpose is to maximize lines of code produced per hour, and the assumption is now widespread enough to shape how entire organizations build software.
The logic feels intuitive on the surface. AI tools accelerate output, faster output means faster shipping, faster shipping wins. So teams optimize for velocity, and quality becomes a casualty that shows up later as technical debt, production bugs, and review queues no one has the bandwidth to clear properly.
This is a category error. LLMs are flexible tools. The same model that spews out unreviewed boilerplate can be directed to interrogate that boilerplate, hunt down edge cases, and flag problems a time-pressured engineer would miss. The capability is identical — what changes is the intent behind the prompt.
The “ship it fast” framing also misreads what speed actually means at the team level. A bloated, unvetted PR that introduces three subtle bugs does not accelerate delivery. It creates a debt that comes due with interest: a production incident, a late-night rollback, a week of debugging work that erases whatever time the AI supposedly saved. Velocity measured at the moment of merge is not the same as velocity measured when working, reliable software reaches users.
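The arithmetic behind that claim can be sketched in a few lines. This is only an illustration: the function name, its parameters, and the numbers in the usage example are hypothetical and are not measurements from any team.

```python
def net_hours_saved(hours_saved_at_merge: float,
                    expected_defects: float,
                    rework_hours_per_defect: float) -> float:
    """Hours saved at merge minus the expected cost of later rework.

    All three inputs are assumptions the caller supplies; nothing here
    is measured automatically.
    """
    return hours_saved_at_merge - expected_defects * rework_hours_per_defect


if __name__ == "__main__":
    # Illustrative inputs only: 6 hours saved up front, three subtle bugs,
    # each costing roughly a working day to diagnose and fix.
    print(net_hours_saved(6.0, 3, 8.0))
```

With those made-up inputs the result is negative, which is the point of the paragraph above: time saved at merge can be smaller than the rework it causes.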
The practical consequences are already visible. Codebases are accumulating AI-generated code that nobody fully understands, reviewed too quickly to catch the assumptions baked into it. PR sizes have grown. Review quality has dropped. The tools designed to help engineers do better work are being used, systematically, to do more work — regardless of whether that work is good.
What most coverage gets wrong: LLMs are flexible, not fast-by-default
The dominant narrative around AI coding tools collapses into a single promise: ship faster. Benchmarks celebrate lines generated per minute. Blog posts count how many PRs a solo developer can open in a week. This framing treats LLMs as speed infrastructure, and that assumption is doing real damage to codebases.
Here is what that narrative misses: LLMs carry no intrinsic preference for speed or sloppiness. The model that generates unreviewed boilerplate in thirty seconds is the same model that can methodically stress-test an architecture, enumerate edge cases, or tear apart a pull request with the rigor of a senior engineer. The output reflects the intent of the person prompting, not some hardwired bias toward volume.
The “slop cannon” use case (spewing barely passable code, opening massive PRs, merging without vetting) is a human workflow choice dressed up as a technical constraint. Teams that build around it are not discovering what AI is good at. They are discovering what happens when you optimize a flexible tool for exactly one low-quality outcome.
The same flexibility cuts the other way. LLM agents deployed against a codebase for bug detection don’t find one or two issues — they surface so many that teams struggle to prioritize the queue. That is not a velocity story. That is a quality story, and it requires slower, more deliberate work to act on.
Framing matters because it shapes process design. Teams that treat AI as a pure velocity tool build workflows that skip review, compress testing, and defer architectural thinking. Teams that treat the same model as a rigorous thinking partner build workflows that use it to ask harder questions earlier. The code that comes out of those two environments is not remotely comparable, and the gap compounds over time.
The mainstream coverage is not wrong that AI can accelerate development. It is wrong to stop there, as if speed were the ceiling rather than one dial among many.
What ‘slower and better’ actually looks like in practice
The “slower and better” workflow starts before you accept a single line of AI-generated code. When an AI suggests an implementation, the first move isn’t to paste it in — it’s to ask the model to critique its own output. What are the edge cases this misses? What’s a simpler alternative approach? Where does this break under load or malformed input? That interrogation step alone filters out a significant share of the subtle bugs that make it into codebases when developers treat AI output as finished work.
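One way to make that interrogation step routine is to wrap every suggestion in a fixed critique request before it goes back to the model. The sketch below is a minimal example that only builds the prompt text. The helper name and the wording of the instructions are assumptions, and sending the prompt to a model is left to whatever tool the team already uses.

```python
# Hypothetical helper: none of these names come from a real library.
CRITIQUE_QUESTIONS = [
    "What edge cases does this implementation miss?",
    "Is there a simpler alternative approach?",
    "Where does this break under load or malformed input?",
]


def build_critique_prompt(proposed_code: str) -> str:
    """Wrap AI-suggested code in a request to critique it before acceptance."""
    numbered = "\n".join(
        f"{i}. {question}"
        for i, question in enumerate(CRITIQUE_QUESTIONS, start=1)
    )
    return (
        "Review the following proposed implementation critically. "
        "Do not rewrite it yet; answer each question first.\n\n"
        f"{numbered}\n\n"
        "Proposed implementation:\n"
        f"{proposed_code}\n"
    )


if __name__ == "__main__":
    # An illustrative suggestion that ignores malformed input.
    suggestion = "def parse_port(value):\n    return int(value)"
    print(build_critique_prompt(suggestion))
```

Keeping the questions in a shared constant means the whole team asks the same ones each time, and the answers land in the chat thread next to the code they concern.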
This is where LLMs actually earn their reputation as bug-finders. Projects like Mythos have demonstrated that when AI agents are pointed at a codebase with a mandate to find problems rather than produce output, they surface bugs faster and in higher volume than most human review cycles. The same capability exists inside a single developer’s workflow — you just have to ask for it explicitly instead of defaulting to generation mode.
Commit size changes in this workflow. Instead of opening a 400-line PR that an AI helped assemble in 20 minutes, the output is smaller, more deliberate commits where each piece of logic has been reasoned through out loud, often in the chat thread itself. The AI assists with documentation, explains the reasoning behind decisions, and flags where assumptions are being made. That reasoning becomes part of the record, not a private mental note that disappears when the ticket closes.
The mental model that makes this work: treat the AI as a senior reviewer and a rubber duck at the same time. It generates a proposal, then immediately challenges it. You stay in the loop not as a passive approver but as the engineer who decides which trade-offs matter. The AI handles the cognitive load of holding multiple approaches in memory simultaneously. You handle judgment.
This isn’t painfully slower. It’s slower in the way careful work is slower: the kind that doesn’t generate a 3 a.m. incident two weeks after merge.
Why this approach is harder — and why that’s the point
Choosing to slow down with AI costs something real: cognitive effort. Rubber-stamping generated output is easy. Evaluating it (questioning the architecture, stress-testing the logic, deciding what to keep and what to throw away) demands active judgment. Developers who do this work are effectively doing two jobs at once: writing software and auditing a collaborator that never gets tired, rarely volunteers its uncertainty, and will not tell you when it’s confidently wrong.
That difficulty is structural, not personal. Sprint velocity metrics reward merged pull requests, not the quality of what’s inside them. Delivery pressure pushes teams toward the fast-slop pattern because the feedback loop on shoddy code is slow. A bad abstraction merged in Q1 becomes a maintenance nightmare in Q3, long after the sprint retrospective has moved on. The incentive system doesn’t penalize the merge; it penalizes the engineer stuck fixing it months later.
This makes the quality-first approach a managerial and cultural problem as much as a technical one. A team can have excellent individual developers who still default to speed because that’s what gets rewarded at standup. Changing that pattern requires managers to explicitly value code review depth, treat AI output as a draft rather than a deliverable, and resist treating story-point throughput as a proxy for engineering health.
The developers resisting the speed temptation are building something that won’t show up in any sprint dashboard: a compounding advantage. Maintainable codebases are cheaper to extend and easier to audit. Bugs caught during generation cost little to fix; bugs caught in production cost many times more. Teams using LLMs to find bugs before shipping, running agents repeatedly against a codebase to surface issues, are banking that future reliability now, invisibly.
The difficulty is the point. Any team can generate code fast. Fewer can generate code that holds up. The gap between those two outcomes is where engineering judgment lives, and AI doesn’t close that gap — it just makes the consequences of ignoring it arrive faster.
The bigger picture: redefining what AI productivity means
The productivity conversation around AI coding tools runs on the wrong fuel. Teams measure success in pull requests merged, tickets closed, and lines of code shipped per sprint. Those metrics don’t just fail to capture quality; they actively punish it. A developer who uses an LLM to find and eliminate latent bugs before merging looks slower than a teammate who dumps unvetted, barely passable code into a massive PR and hits ship. The scoreboard rewards the worse outcome.
A more honest accounting of AI productivity would track defect rates post-deployment, time spent in code review, and how maintainable the codebase looks six months later. On those measures, the slower, more deliberate approach wins. LLM agents are demonstrably effective at bug detection — run them repeatedly against a codebase and they surface more issues than most human reviewers catch in a single pass. That capability is being wasted by teams who have decided AI is a slop cannon and nothing else.
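As a rough sketch of what that accounting could look like, the snippet below summarizes two of the signals named above, post-deployment defects and time spent in review, alongside the raw merge count. The record type and its field names are hypothetical. A real team would fill them from its own issue tracker and review tool, and maintainability six months on still needs a human judgment that this code does not attempt.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class MergedChange:
    """One merged pull request; both fields are hypothetical record fields."""
    review_minutes: float      # time reviewers spent on the change
    post_deploy_defects: int   # defects traced back to it after release


def quality_summary(changes: list[MergedChange]) -> dict[str, float]:
    """Report quality signals next to the usual volume count."""
    if not changes:
        raise ValueError("need at least one merged change")
    total_defects = sum(c.post_deploy_defects for c in changes)
    return {
        "changes_merged": float(len(changes)),
        "defects_per_change": total_defects / len(changes),
        "median_review_minutes": median(c.review_minutes for c in changes),
    }


if __name__ == "__main__":
    # Illustrative records only, not real project data.
    sprint = [
        MergedChange(review_minutes=10.0, post_deploy_defects=2),
        MergedChange(review_minutes=30.0, post_deploy_defects=0),
        MergedChange(review_minutes=50.0, post_deploy_defects=1),
    ]
    print(quality_summary(sprint))
```

Putting the merge count in the same summary as the defect rate is a deliberate choice: a dashboard that shows volume without these companions is the scoreboard the previous section criticizes.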
The tools themselves share some blame. Most AI coding assistants optimize for visible output — autocomplete fires, code appears, the illusion of velocity is maintained. Very few surface quality signals alongside generated code, and almost none integrate defect rates or review turnaround data back into how they present productivity to managers. The incentive architecture bakes the problem in.
Until organizations redefine what the numbers are actually measuring, the slop-and-ship pattern will remain the path of least resistance. It isn’t a technology problem — LLMs are flexible enough to support rigorous, high-quality development workflows right now. It’s a measurement problem dressed up as a capability conversation. Teams that shift their dashboards away from volume metrics and toward quality signals will find the technology performs very differently. The ones that don’t will keep shipping fast, keep accumulating debt, and keep blaming the AI when the codebase becomes unmaintainable. The tool didn’t make that choice. The metric did.