> Code becomes precious when it is the only place knowledge lives.
Reading AI code all day is _agonizing_. Just a horrible way to live, and it melts people's brains at the very moment you need them to be most capable.
Manual programming has this really productive and gratifying feedback loop, where you read the code, write the code, and fix it until it compiles/runs/does what you want. AI code not only does half that for you, but it makes the "click" at the end uninspiring because you're never sure if it's cheated a bit to get to that moment.
Trying to operate with AI-generated code as the only durable artifact of programming is a dead end for the industry. Charity points to (and correctly discards) architecture diagrams/specs as an interesting space to work in. My suspicion is that it's closer to the thing that's hand-written: prompts, markdown plans, and other nudges. Focus on the thing that you, as a human, produce, and that's the basis for both the core loop of "did the AI follow my instructions" and it's higher-leverage when you go to code review.
By the time you get to the PR, you've probably typed enough to Claude that you can regenerate the code, but the current industry default is to just throw away all those sessions and ship the code. That's backwards!
I think it's less about "break it down" and more about "let's communicate at the same altitude."
I wrote a (bait-titled) post about it: https://tern.sh/blog/stop-reading-prs/
305 files +15075 −13110
153 files +21934 −8698
125 files +28120 −2398
43 files +11188 −63
118 files +21564 −647
These are the largest (6 of 35) in the past 30 days. Added: 190079, removed: 39696 in the last 6 months, from one person.
Because if all your SWEs produce 5x more code, it also means they have to review 5x more code. But LLMs don't really help with code reviews. Then it becomes a Metcalfian paradox unless you just rubberstamp PRs, which is what is expected of you.
If you're being asked to rubber-stamp PRs, that's a management skill issue.
But it's also the exact sort of thing that LLMs are literally perfect for in my experience so there's really no excuse anymore. I've never seen Claude fail to turn a 5k PR into a well-decomposed Graphite stack.
I would, and all my training at Google told me to do that. But what I found after I left that comfortable box was that somehow this kind of practice is acceptable in the industry at large and you're expected to just Deal With It(tm). 5k lines isn't even high by what I've seen.
Worse, the "code review" tools that people have access to in GitHub make incremental review absolutely and totally unworkable. Messy merge commits full of "responding to code review" comments. Threads impossible to follow. Just bad tooling.
So a lot of shops, from what I've seen, are just yeeting it with very shallow reviews.
This is my observation pre agentic AI. LLMs just threw kerosene on that dumpster fire.
I don’t think this is possible in practice without leaning on the stability of the code base.
However, LLM outputs change with slight rewording of prompts and with each new model release. I could hand-write a test that says if x > 1, make sure y happens, but then what productivity was gained?
If your program is truly nothing more than that "if" statement then there is unlikely to be any productivity gain, just as there is no real productivity gain using a programming language over flipping toggle switches for something so simple. Programming languages would have never been invented if that bit of logic summed up the entirety of computer science. In the real world, the calculus starts to change when you are trying to solve bigger problems. A lot of solutions require way more code to implement than to describe the necessary properties of. That is where you can gain some huge productivity gains by being able to focus on declaring the properties over having to define the full implementation.
But, again, it is not a panacea. No such thing exists. Every abstraction brings its own set of tradeoffs. Your job is to find the tradeoffs you can accept for your unique circumstances. What others are doing is irrelevant to your situation, but it remains that others are doing things and it can be fun to learn about it.
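To make "declaring the properties" concrete, here is a minimal sketch in Python using only the standard library (no property-testing framework is assumed). The two properties of sorting (the output is ordered; the output is a permutation of the input) are checked against random inputs without saying anything about how the sort works. `check_sort_properties`, `candidate_sort`, and `cheating_sort` are hypothetical names for illustration.

```python
import random
from collections import Counter
from typing import Callable, List


def check_sort_properties(sort_fn: Callable[[List[int]], List[int]],
                          trials: int = 500, seed: int = 0) -> bool:
    """Return True if sort_fn satisfies two declared properties on random inputs:
    1. the output is in non-decreasing order;
    2. the output is a permutation of the input (same multiset of elements).
    Neither property says anything about *how* the sort is implemented."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 30))]
        ys = sort_fn(list(xs))  # pass a copy so in-place mutation can't hide bugs
        ordered = all(a <= b for a, b in zip(ys, ys[1:]))
        permutation = Counter(xs) == Counter(ys)
        if not (ordered and permutation):
            return False
    return True


def candidate_sort(xs: List[int]) -> List[int]:
    # Hypothetical "generated" implementation under test: a plain insertion sort.
    out: List[int] = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def cheating_sort(xs: List[int]) -> List[int]:
    # Ordered, but drops duplicates: it would pass an "is it sorted?" check alone,
    # and fails the permutation property.
    return sorted(set(xs))


if __name__ == "__main__":
    print(check_sort_properties(candidate_sort))  # True
    print(check_sort_properties(cheating_sort))   # False
```

A real project would more likely use a property-testing library such as Hypothesis; the point here is only that the property check stays the same size no matter how complicated the implementation behind `sort_fn` becomes.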
> A lot of solutions require way more code to implement than to describe the necessary properties of.
That's true to an extent; the additional code often defines the emergent and undiscovered properties of a system.
That's the cost of abstraction. Everything has tradeoffs.
I think it's very similar to dealing with large offshore teams. Every day you get a huge pile of code to review. It's really exhausting.
I prefer dealing with AI because at least it tends to follow rules once I write them down. Not so much with a lot of offshore guys. Same mistakes every day.
I guess my company needs to hire better offshore devs....
AI can give you 80-90% of the quality, but the feedback loop is hours or days, not weeks.
This means the inevitable iteration of "no make that a bit greener, move that there, that is the wrong style for this scene" can be done faster, which means cheaper.
https://github.com/gitsense/gsc-cli/blob/main/internal/cli/r...
Notice how the code block header attributes the model. The UUID can be traced to the conversation so everybody can tell exactly how the code came about. For this to work though, you need to use my chat app as it ensures you can't tamper with things if you are truly serious about AI code provenance.
I also have a much more human-focused method, which is part of my CLI tool.
https://github.com/gitsense/gsc-cli
I am currently looking at making pi (https://github.com/earendil-works/pi) support AI code provenance, but for now if you want a more structured way to capture what you have done in an agent session that can be used in code reviews and be carried forward as knowledge that lives inside your repository, I have
gsc lessons
The basic idea is, after you have finished chatting/working with the agent, you would work with it to identify lessons worth carrying forward. You can store your session if you want, but really, the lessons should be something that can help you review code better and to prevent future mistakes.
I have a real working example at
https://github.com/gitsense/smart-ripgrep
This is a fork of the BurntSushi/ripgrep repository. It shows how you can use lessons to learn from past design decisions.
First product compares the code to the prompts and highlights places the agent made decisions you weren't involved in: https://tern.sh/docs/tours/
It's not even that it's more fun. AI can spew endless slop boilerplate, but it simply can't boil an application down to its component essence in a way that makes it straightforward to maintain and bug-free.
At the very least apply it at a higher level: specification, proofs, anything but generating Rust/Java/C and then letting yourself or an agent babysit it.
depending on your industry, you might be able to ship half-slop and then fix some bugs downstream though
When starting on a new codebase, how do you make yourself into a helpful contributor as quickly as possible? I go straight for the humans and their human docs. What problem was the system originally built to solve? What was the original design, and what were its biggest problems? Who is currently using it? If you know these, reading the code is much easier because you can guess why things were done the way they are.
Also, this blog post has gotten popular: https://blog.gpkb.org/posts/just-send-me-the-prompt/
I think Charity is observing a very old problem and expecting the new technology to lead to a new solution of some kind. I doubt she thinks even the current generation of tools are the end of the AI software development story. She's not saying we'll drop design docs right into Claude code and walk away (design docs aren't complete either, that's why when you're ramping up you also have to talk to people, read old tickets and postmortems, etc.)
What she's observing is that, in prod, people don't like infra where it's hard to tell how it got into its current state, and so infra-as-code is what we do now. She's also observing that "it's hard to tell how it got into its current state" is the status quo with codebases, which other people have observed going back to "Programming as Theory Building" and earlier. And she's expecting that, analogous to infra, software development will somehow be done with tools focused on making "how the code got into its current state" clearer.
> When starting on a new codebase, how do you make yourself into a helpful contributor as quickly as possible? I go straight for the humans and their human docs. What problem was the system originally built to solve? What was the original design, and what were its biggest problems? Who is currently using it? If you know these, reading the code is much easier because you can guess why things were done the way they are.
This is the way but plenty of engineering teams don't have any human docs at all. Decisions are made in one engineer's head or in a chat that isn't saved. The spec was just a few notes in a ticket that was deleted during cleanup or lost when the team changed trackers. There's no map of the codebase or features, no ADRs, minimal observability. All you have is the code. You read the code to try and figure out what is going on then ping an engineer who made a recent commit to a specific area to ask if they remember why something was done the way it was. Someone makes a change and it breaks something on the other side of the codebase that they thought was totally unrelated, etc.
Even the most AI-positive teams prefer human discussion when things get that tough. Given enough time, things will "click" for humans. LLMs don't work that way.
Even a team of all-new unfamiliar devs forced to study an old codebase will eventually figure out what it was about and pick up tons of nuance the LLM cannot. This is the nature of writing. It exists in a time and place beyond the pure literal text. Humans live in this context and can get into the headspace of the original dev(s).
At least in my experience, that ideal has never existed. "Engineering" (and "re-engineering" and "re-re-engineering" in the agile worlds) was always what I was spending the majority of my time on. Coding was a medium for the engineering. By the time I finished the engineering, the code was either already finished, being discovered in the written code and then documented, or the code was "the fun part" reward for all the hard engineering work that led to it (and all the ugly spec documents it took to get there).
It's from this perspective that a lot of engineers feel strong negative sentiment towards AI.
There are always going to be some critical sections of code that one must consider carefully. These tend to be at the extreme ends of choice. Either there's only one way to do it and it probably sucks, or there are many ways to do it and staying optimal is very slippery for maintainers. Identifying and describing these critical sections is also the most important part of the documentation. This is precisely where LLMs fail to do a good job and where people curse the original devs for not writing docs.
But as well, the overall architecture is just as important. The code for this tends to look like the "boring" boilerplate. This is the skeleton of the codebase and LLMs can be bad at this too, haphazardly jamming together design patterns that clash. We're in luck that, usually, a framework or library will provide this code along with documentation to be copypasted verbatim. The rub is when the developers are having to shoehorn it onto existing code they will have to carefully craft some custom interfaces and document them very well.
So in the end, what's left for the LLM to do? When it does a good job, it's usually cribbing so heavily from existing solutions by humans you could have copied it yourself if you knew where it got it from. The LLM is automating copypasta, not deliberate coding. When it's bad, it's making a mess only suitable for a rough proof of concept, if it even works.
From the perspective of a diligent engineer trying to avoid technical debt and other incidents down the road, burning an extra couple of days to get it done right by hand isn't that big of a deal. LLMs become about as useful as a Google search. Assuming one does not work at a coding sweatshop, why not just use the Google AI summary 90% of the time? The agentic workflow doesn't look promising for a significant chunk of experienced engineers working on maintenance more often than new projects.
I've been saying this a lot lately. I agree completely. Most of the impressive AI stuff I see people talking about could have just been copied and pasted from open source repos in the past.
I think that was sort of a taboo in the past though, for several reasons. First off, most teams I've worked on were reluctant to take on the additional maintenance of unvetted open source code if there were other options. For some reason this reluctance has flown completely out the window with AI, even though the cost of code maintenance hasn't really decreased.
Secondly, there was always the question of code licenses when copying from an open source repo. Companies were reluctant to risk anything that could be viewed as code theft.
Now the code theft part is solved by training LLMs instead of just copying and pasting, which just feels like theft with extra steps to me
If there's one thing that seems unclear from AI proponents, it is the number of technologies they're using at the same time. After a few months on a project, enough should be in long-term memory, or in resources from your browser history, that anything requiring deep research is only a small part of each ticket. And that part would be important enough to get right.
AI needs more discipline, yes. But theoretically that discipline can be learned much easier than becoming a good engineer.
Think of it this way... 20 years ago, to write good, scalable C code - you needed to 1) either be a genius, or 2) dedicated to the craft.
You needed to learn dozens of tools like the back of your hand.
* ASan
* LSan
* UBSan
* TSan
* GDB
etc... God forbid if you needed to manually read DWARF files. Unless you're a pure genius, this is not feasible to master in a short amount of time. And in parallel, you need to learn how to design systems, too, otherwise, you're still not very good, and that's an almost completely orthogonal skillset.
Now, you simply need to be aware of the hazards in your language/framework, tell your LLM to test for them, have the infrastructure set up to see if they've adequately tested for those hazards, and maybe read the actual tests and implementation.
It is pretty easy to be able to read and understand Rust compared to debugging all the sorcery-like errors that come during Rust development... It is easy to see that you need a Loom test for certain scenarios, and to write a tool to detect if you did it.
Even if you're still working in C or Zig, it's far easier to know and detect when you need to use those tools than to learn to use them all individually.
It is not hard to learn to read SQL. Almost 50% of business professionals can. Python is barely harder. Rust can look like sorcery if you don't read a 50-page guide on how to read it, but that's a VERY small price to pay compared to spending ~10 years learning the craft painfully by trial and error.
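As a rough illustration of "write a tool to detect if you did it" for the Loom case mentioned above, here is a hypothetical Python check that flags Rust files which use `std::sync::atomic` but never call `loom::model`. The heuristic is an assumption: it is plain text matching, so it will miss projects that route atomics through a `cfg(loom)` shim or keep their Loom tests in separate files.

```python
import re
import sys
from pathlib import Path
from typing import Dict, List

# Heuristic patterns (assumptions): a file "needs" a Loom test if it touches std atomics,
# and it "has" one if it calls loom::model somewhere in the same file.
NEEDS_LOOM = re.compile(r"\bstd::sync::atomic\b")
HAS_LOOM = re.compile(r"\bloom::model\b")


def files_missing_loom_tests(sources: Dict[str, str]) -> List[str]:
    """Given {path: rust_source}, return the paths that use atomics with no loom::model call."""
    return sorted(path for path, text in sources.items()
                  if NEEDS_LOOM.search(text) and not HAS_LOOM.search(text))


def scan_tree(root: str) -> List[str]:
    """Read every .rs file under root and apply the same check."""
    sources = {str(p): p.read_text(encoding="utf-8", errors="replace")
               for p in Path(root).rglob("*.rs")}
    return files_missing_loom_tests(sources)


if __name__ == "__main__":
    missing = scan_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path in missing:
        print(f"{path}: uses std::sync::atomic but has no loom::model test")
    sys.exit(1 if missing else 0)  # non-zero exit lets CI fail the build
```

The check is deliberately dumb; the value is that once it exists, neither a human nor an agent has to remember the rule.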
I'm not sure how you get from "LLMs work in mysterious ways" to "So we need more discipline" to "everything is fine."
I agree that everything is fine. I just don't think this is the clear path and thought process.
Anyone who has the determination to get things to actually work, and takes a little bit of time to understand what makes them not, should be able to leverage LLMs to work wonders.
In my opinion, LLMs are going to make things far more complicated, because the cost of building something complicated is becoming almost free.
Engineering was always about discipline and getting things to work. But you needed a set of prerequisite skills to have much value. Most of those are gone now.
It is simply far easier than before. It does require discipline, yes. But discipline is cheap compared to ~10 years of trial by fire.
Are you referring to this part:
> I am not worried, at least in the near term, about AI creating massive, discontinuous returns on investment in the absence of engineering discipline. (Many will try, and it will be entertaining to watch.)
She's saying, "the amazing thing about LLMs isn't that they generate lots of code fast, so don't worry about people using LLMs for that taking over the industry"
She's making two points:
1. Before infra-as-code, people would be afraid to touch parts of production due to lost knowledge about how and why it got that way. Now that we have infra-as-code, you aren't allowed to change infra the old way (ad-hoc changes via dashboard/CLI), even if doing so would be faster and easier. Experienced SREs were required to abandon lots of their old skills with CLIs and dashboards and start working in a completely new way, because the knowledge captured in a terraform repo's commit history is so valuable.
2. In the past, the way code got written was through people making changes in ways that are specific to their current knowledge, the org's current problems, the current users, etc, some or all of which is not written down. Eventually, everyone is afraid to change certain things because they don't know or remember all the considerations that went into them (not just afraid to touch parts of the code, but afraid to delete seemingly-unused features, or migrate the schema, or whatever).
Charity is saying that problem 2 is a hidden/lost-knowledge problem like problem 1, and the amazing thing about LLMs is you have to write down all the knowledge you want them to have, which may lead to a better solution to the "lost knowledge" problem in software development, which would be so valuable that experienced software engineers have to abandon lots of old skills and start using it.
(Not only writing down all the knowledge you want the LLM to have, since they're flaky enough to ignore instructions and miss implications sometimes, but building test suites and tools and so on that adequately guide their solution. This is the "more discipline" she's referring to.)
I've been thinking about this a whole lot recently. So much of my intuition about software development is based on 25 years of accumulated experience on how long it will take to write different bits of code.
Should I add validation for this one edge-case which won't break everything but will make a little bit of a mess if someone hits it? If that's an extra couple of hours of code I might skip it. If it's one more prompt, why wouldn't I?
This new feature would be a lot easier to understand if there was a custom API explorer for it. There's no way I could justify investing in that... unless it's just 10 minutes with Codex, and it was: https://tools.simonwillison.net/datasette-extras-explorer#ur... (linked from the release notes https://docs.datasette.io/en/latest/changelog.html#extra-sup...)
That's just on the small scale. There are entire projects that I'd never previously have considered, because I don't need a custom SQLite SELECT query parsing library enough to justify spending a week or more building one. But now... https://github.com/simonw/sqlite-ast
People get VERY upset (and condescending) any time you suggest that being able to produce lines of code faster is a valuable thing. And sure, measuring output through "lines of code" is stupid.
But measuring "lines of verified code that deliver value" isn't stupid at all. That's the thing we can do faster now.
Look around you - Google is valuable because it hoovers up data to generate revenue from advertising and has minimal expenditures compared with its revenues. All those bets? Lol yeah, what about them?
Engineering for the sake of engineering has no value to the economy - aka it’s irrelevant. It’s the hard truth nobody wants to hear. There’s a limited set of things that can exist in the economy at any given moment in time - only those that provide value and can be sustained w.r.t. economics stay the course.
I think that's the adventure we're on now. If recreating something is low cost, what is the value in investing in designing it well in the first place? We can empirically discover issues and tell the AI to address them.
I certainly routinely find, in supervising what the LLM is writing, that it's making terrible internal design choices, and I correct them. Usually things one level up from code. "This will cache every image on the client and cause a huge amount of bloat. Change it to pull the image in real time from the server" kind of stuff. You do slowly build that up in the project documentation - "Never store unnecessary data on the client: we assume they are using low powered devices without substantial storage". But it takes time, and the road to discovering that empirically runs through a lot of unhappy users.
So I think there is still a lot of room for genuine engineering - that is, at the technical design level. Levels up from that - code structure etc - are much less clear. I am guessing that over time we will heavily optimise code written by AI for maintenance by AI. Which may be mostly about matching the context window to the code module size. Factoring something into 5 modules may be less of a good idea if it means the context window has to hold all of them for the LLM to work. But that is the path of discovery we are on, which history tells us is a 20-year journey.
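A minimal sketch of "matching the context window to the code module size", assuming a crude estimate of about 4 characters per token (a rule of thumb, not a real tokenizer). The function names and the `reserve_tokens` parameter are hypothetical.

```python
from typing import Dict, Iterable

# Rough rule of thumb (assumption): ~4 characters per token. A real check would use
# the tokenizer of the specific model being targeted.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate: ceiling of len(text) / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(modules: Dict[str, str], needed: Iterable[str],
                    window_tokens: int, reserve_tokens: int = 0) -> bool:
    """True if the modules an agent must hold at once fit in the window,
    leaving reserve_tokens free for instructions and the model's reply."""
    total = sum(estimate_tokens(modules[name]) for name in needed)
    return total + reserve_tokens <= window_tokens
```

With something like this, "should this be split into 5 modules?" turns into a measurable question: do the modules an agent has to read together for a typical change still fit, with room left over for the prompt and the answer?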
I would put that as my signature if HN supported that. I see a lot of systems being built where the whole point seems to be about the ritual, not anything valuable for the user.
Except measuring the value of an individual piece of code is still very difficult if not impossible.
I think some people care about understanding things they have to attach their names to. Many obviously don't care, but others do.
It was submitted by a seasoned user, who probably asked a frontier LLM. It still felt… wrong. I didn't understand it, and I wouldn't merge it without understanding it.
I also suspected it was wrong, in a way that would cause issues in the future.
So I reviewed it 4 different ways: (1) try to understand/improve it; (2) do it with better algorithms; (3) avoid it by fixing the issue upstream; (4) rewrite it from scratch probably just to match my brain.
I expected either (2) or (3) would be the answer. (2) didn't work; rather, it's the correct answer, but I'd need to redo the project from scratch to use it. (3) I really wanted to work, but it didn't.
So I got to a blend of (1) and (4). I'm still not entirely convinced, but now I understand the issue/solution. I obviously think my approach is better.
Still, I stripped both of comments and asked my LLM to review them.
The LLM came back and said the original one was clearly better. I explained why not, it then answered I was correct.
If I try it with comments, LLMs say mine is better, because I found a real issue (one that I pointed at in the original comment thread). But is it saying mine is better because I coerced it to say so?
I find it’s less common for me in ruby, even refactoring bad ruby. Sure I can remove lines but bad JS/React balloons so fast.
My current org values this and my direct boss constantly praises those of us that try to remove more lines than we add. Very refreshing.
What happened in 2025 was this: the economics of code production were turned upside down. Instead of being very hard, time-consuming, and expensive to generate code, it became effectively free and instant. Lines of code went from being treasured, reused, cared for and carefully curated, to being disposable and regenerable, practically overnight.
It's not so much as "the economics [...] were turned upside down", but that a manufacturing process that used to be strictly additive (akin to 3D printing) is now complemented by a subtractive process (akin to CNC milling). The "shape" that is demanded hasn't really changed, and nor has the human effort (as long as you care about achieving certain tolerances). You still have to "treasure, reuse, care for, and curate" your product to whatever degree the market demands.
Also, I disagree with:
> Lines of code are not the ideal artifact to review
What does "ideal" mean here? When I was growing up "show your work" was the rule for all examinations. Why? Because we're working to improve mental models and thought processes for the next generation, not just products we will release tomorrow.
> What does "ideal" mean here? When I was growing up "show your work" was the rule for all examinations. Why? Because we're working to improve mental models and thought processes for the next generation, not just products we will release tomorrow.
They're saying that the mental models and thought processes are incredibly important but that code is not the place for that work to live.
They’re important for discussion and brainstorming. They’re also important for sharing context before reviewing. But code is the only perfect representation in terms of semantics of what the computer will do.
You can have all the diagrams and all the prose you want, but they’re still ambiguous.
What I meant is that, insofar as some work has been produced with a human mind involved and where imperfect abstractions are used, one should not for whatever idealistic reasons push for reviewing the work at some coarser granularity than the details which are readily available. That's a way to foster and encourage mistakes, in both the work and in the mental model.
So when you say that code is not the place for that work to live (or more closely to the line I disagree with, that code is not an 'ideal' artifact to review), you are essentially purporting that there is a perfect abstraction that can generally be trusted, which I disagree is currently the case for an LLM spec versus produced code.
I bet that I know why!
I suspect that the stance they describe as one readers mistakenly took away from their previous article is in fact their stance. Otherwise, why dance around it so hard?
I don't think that's true. We treated code as permanent because we considered code to be the source of truth. Computers don't run documents, computers run code. If the requirements document contradicted the code, then the default was to assume that the requirements document was wrong.
You can't separate code from spec because the code is the spec.
> It was reasonable to be skeptical the first time
It's still reasonable to be skeptical. A few weeks ago a post was discussed here on HN [1] that asked:
> What would have to be true for us to ‘check English into the repository’ instead of code?
to which I replied:
> Code is already the cheapest path to working, correct software. LLMs do not change the calculus because figuring out what to make is the expensive part, not coding it up. Skipping code makes the specification of what to make even more expensive and throws away the tools that keep precision affordable. Programming in English would be more expensive than just using a programming language. [2]
[1] https://annievella.com/posts/finding-comfort-in-the-uncertai...
[2] https://www.slater.dev/2026/05/why-english-will-never-be-a-p...
> A sufficiently detailed specification is runnable code.
In a way I think LLMs will enable the dream of 4gl and "sufficiently smart compilers"[c].
LLMs aren't smart, but they are capable. Especially capable of translation and transformation.
I can certainly see them help move the abstraction horizon at which we work - so that rigid high level descriptions of the desired logic/process along with the process for quality testing - become the relevant curated artifacts - and the generated go/rust/java/python/etc code become incidental and mutable; subject to constant rewriting as part of the deployment of systems.
[c] You know, the ones that take naive C/C++ and produce executables that fully leverage RISC/EPIC platforms to be better than CISC. See also: Intel Itanium
1. What a C compiler was
2. What a C compiler looked like
3. What the C compiler had to do at runtime to pass gcc’s torture suite through some sort of collaborative iteration (compile, run, did it get stuck at some torture suite test or fail?)
Remove 1 and 2, or replace it with imperfect business logic, and you’re left with a system that is built to _only_ pass the tests you supply it, or in the most extreme case, print(“unit and functional tests pass!”)
Without more and better testing, hopefully more invariants stored in type systems that are easy to reason about, and more recording of the reasons why we change things, we get a more unstable system in practice. One where fewer people can work at once.
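One way to read "invariants stored in type systems" in Python, sketched with a frozen dataclass whose constructor rejects invalid values, so any function that accepts the type can rely on the invariant instead of re-checking it. This is a runtime check, not static typing; languages like Rust or Haskell can push more of it to compile time. `Percent` and `apply_discount` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percent:
    """A value guaranteed to lie in [0, 100] once constructed.
    frozen=True prevents later mutation from breaking the invariant."""
    value: float

    def __post_init__(self) -> None:
        # NaN also fails this comparison, so it is rejected too.
        if not (0.0 <= self.value <= 100.0):
            raise ValueError(f"Percent out of range: {self.value}")


def apply_discount(price_cents: int, discount: Percent) -> int:
    # The parameter type documents the invariant; no defensive range check needed here.
    return round(price_cents * (100.0 - discount.value) / 100.0)
```

The design choice is to validate once at the boundary, so a reviewer (human or agent) reading `apply_discount` does not have to reason about out-of-range discounts at all.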
Rather than dismissing humans for quality control, we should take an asymptotic approach, where humans verify less and less as more verifications are automated, but are never out of the loop. Get down to 1% of the things, then 0.1%, then 0.01% and so on.
Automate all the linting you can before the agent is allowed to make a PR, make sure it passes the tests, add custom linting for dumb AI-isms you’re sick of telling the agent not to do - yes, you can lint for that fallback & backcompat code you never asked for, you just have the agent generate a script that walks the AST and flags the problem by line and file, then put that in your pre-commit checks - the agent treats it like just another lint error. Now you never have to review for that thing again.
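The AST-walking lint described above might look something like this minimal sketch. The two patterns it flags (a broad `except` that silently swallows errors, and `getattr` with a default) are illustrative examples of "fallback" code chosen here; the comment does not say which AI-isms its author actually lints for. It prints `file:line: message` and exits non-zero, which is enough for a git pre-commit hook to block the commit.

```python
import ast
import sys
from typing import List, Tuple

Finding = Tuple[int, str]  # (line number, message)


def _is_broad(handler: ast.ExceptHandler) -> bool:
    # Bare `except:` or `except Exception:` / `except BaseException:`.
    if handler.type is None:
        return True
    return isinstance(handler.type, ast.Name) and handler.type.id in {"Exception", "BaseException"}


def _is_silent(body: List[ast.stmt]) -> bool:
    # Handler body is a single `pass`, bare `return`, or `return <constant>`.
    if len(body) != 1:
        return False
    stmt = body[0]
    if isinstance(stmt, ast.Pass):
        return True
    return isinstance(stmt, ast.Return) and (stmt.value is None or isinstance(stmt.value, ast.Constant))


def find_fallbacks(source: str) -> List[Finding]:
    """Walk the AST and return (line, message) for each suspicious fallback."""
    findings: List[Finding] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and _is_broad(node) and _is_silent(node.body):
            findings.append((node.lineno, "broad except that silently falls back"))
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
              and node.func.id == "getattr" and len(node.args) == 3):
            findings.append((node.lineno, "getattr() with a default value hides missing attributes"))
    return sorted(findings)


def main(paths: List[str]) -> int:
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line, msg in find_fallbacks(f.read()):
                print(f"{path}:{line}: {msg}")
                status = 1
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Because the output has the same shape as any other linter's, the agent can treat a finding like an ordinary lint error and fix it before a human ever sees the PR.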
But you still have value!
Even when you automated everything you can think of, there’s still tremendous value in human review. It’s your last chance to fully understand the implementation before it melds with the codebase. You also pick up more antipatterns to add to your automated reviewer (the automated reviewer is just a long prompt with an ever growing list of bullet points)
And the asymptotic nature of QC extends to observability and production. You can never really automate a loop directly from observability to code fixes. Even when the agent presents a fix for an unhandled exception in production - if it was bad data, should you clean it in a backfill? If a key business metric dropped off a cliff because of a bug, should you add an alert once you fix the bug?
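The asymptotic approach (review 1% of changes, then 0.1%, then 0.01%) needs some way to pick which changes a human sees. A minimal sketch, assuming each change has a stable string id: hash the id and compare it against the current review rate. `needs_human_review` is a hypothetical helper, not part of any tool mentioned in the thread.

```python
import hashlib


def needs_human_review(change_id: str, review_rate: float) -> bool:
    """Deterministically select roughly `review_rate` of changes for human review.
    Hashing the id (instead of calling random()) means the same change always
    gets the same decision, so the selection is reproducible and auditable."""
    if not 0.0 <= review_rate <= 1.0:
        raise ValueError("review_rate must be in [0, 1]")
    digest = hashlib.sha256(change_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform integer in [0, 2**64)
    # int-vs-float comparison is exact in Python, so a rate of 1.0 selects everything.
    return bucket < review_rate * 2**64
```

Because a change's hash bucket never moves, lowering the rate only removes changes from the review set: anything sampled at 0.1% would also have been sampled at 1%.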
AI coding speed and performance now outperform most humans. But AI still needs a human to command it. Yes, you can let an AI agent manage sub-agents, but a human is still at the top, as the manager who tells the AI what should be written.
So a human must give the commands and have the final say on when it's done.
Is laziness still a good virtue in AI era?
That is still an enormous virtue in the AI era. It is completely the opposite of what many AI-using programmers are doing, which is being lazy in the conventional sense, minimizing their individual energy expenditure at the price of increasing the overall energy expenditure.
Being big-picture lazy is a virtue. Being individually lazy is a vice.
If you buy that, then it follows that the more work you accomplish with AI, the "lazier" of a dev you are.
If AI writes the code and humans spend more time reviewing it, that might not be a bad thing, but when the AI code is good enough, people are going to view thorough reviews as optional. Then the job of a SWE will look very, very different than before since SWEs won't write much code or spend much time reviewing it. The IDE may go the way of the dodo. And maybe the focus will move to setting up the goals and tests that keep the AI coding team on task. Maybe SWEs will spend more time architecting since they're likely to know where projects are heading and won't want AI to rewrite things as goalposts legitimately move. Maybe more will be spent exploring: build it one way and another and another and compare and generate new ideas from the different approaches.
I have no better idea than anyone else, but I'd be heavily against the role going away and in favor of it evolving, like it's done many times before, though perhaps never as rapidly as it is right now.
> What happened in 2025 was this: the economics of code production were turned upside down. Instead of being very hard, time-consuming, and expensive to generate code, it became effectively free and instant. Lines of code went from being treasured, reused, cared for and carefully curated, to being disposable and regenerable, practically overnight.
A little further on, this is reinforced by:
> I am just barely old enough that my first job title was “System Administrator”. [...] I lived through the shift from handcrafted server pets to immutable infrastructure cattle.
What is happening now is nothing new, we have seen it many times before: a shift in technology which is bringing changes in the ecosystem, required skills and so on. This happened with stocking frames, steam engines [1], automobiles, servers, and now code. Just like before, many will be - and already are - harmed by this, but ultimately the world will adapt and accept the new paradigm.
[1] There's an infamous screenshot of a tweet being shared around, where someone suggests various names for writing code without AI, and someone else responds with "software engineering". Allow me to add my own contribution to this debate: codejamming.
> People do not want to wake up every day and log in to Slack and find the buttons and menus all subtly moved around. People do not want financial transactions that complete most of the time. Determinism is not going anywhere, my friends.
Well, I can't reconcile people not wanting things to move around, and wanting determinism, with the promises of acceleration made by AI. The way I see it, either AI makes "massive, discontinuous returns on investment" by way of changing things, or we get a sustainable rate of change; these seem like contradictory goals to me.
AI doesn't have anything resembling discipline.
And I’ve quickly realized that it’s also much easier to follow that premise.
Not only because agents obviously help with writing documentation, test cases, DX tools, and so on.
But also because it feels so much more rewarding to know that someone — even if it’s just a soulless agent — actually cares to read and use and follow these.
I have always been the guy on the team who would write the tools and documentation, and it’s always been a bit frustrating to know that only half the team would care to read and use and follow them, at best.
I now do documentation-driven development, and with very few exceptions I am committing code that is better written, better documented, easier to reason about and maintain, with less library overuse than I ever did as a senior lead with a small team, and I’m doing it for 1/4 the price, at 4x the speed.
But it’s not vibe coding. Discipline is critical, as is deep systems understanding.
> That question was answered decisively last November.
It's easy to forget that people said this exact thing about every model after GPT 3.5. This is a standard trick the industry uses to invalidate negative experiences with LLMs. 'You are prompting it wrong' becomes 'you are using Gemini, but you should use Claude', which then becomes 'well, all of your criticism is now irrelevant, because everything is fixed in this new version'.
This "discussion" about capabilities is set up to be asymmetrical and basically non-falsifiable.
I really don't know how I'm supposed to reply to stuff like this.
You undermine your own point when you misrepresent the situation like this. Real human mathematicians, including at least one Fields Medal winner, have validated and complimented the result.
The claim made by OpenAI has two necessary components. The first is that the conjecture had been disproven. (This is what had been verified by "real human mathematicians".) The second necessary part of their claim is that the work to disprove the conjecture was done mostly by their AI model rather than by people employed by OpenAI.
Funny thing is that even the explainer on OpenAI's own website points out the issue:
"This result does not show us all the times AI has claimed to have a proof of something and been wrong."
"I believe if the level and type of human expertise that is represented on this note had been assembled to find a counterexample to this conjecture a month ago, and those people put in similar amounts of time working on it than they did to reading and thinking about Chat GPT’s solution, the mathematicians would have found a counterexample."
[1] https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...
In general, most developers are going to find themselves fighting incentives that will color their opinions. AI isn't there yet, but if you base your whole worldview on a point on a graph rather than on the trajectory, you are in for a bad time.
The author makes the wrong assumption, though, that the majority of people doing engineering want to do even more engineering.
It’s my experience that most technology workers just want a high paycheck and some kind of association with being in tech and doing cool things.
Yeah, I can see how that is now mistaken for a definition of 'engineer' or 'hacker'.
I am sorry you never knew what engineering truly means.
Well, that's a loaded statement. I have yet to see a Claude session where Claude would tell me to hold off and make my prompts more disciplined.
So it does not demand more discipline.
It can, OTOH, build better from disciplined prompts, but people, too, build better software from better specs.
AI demands more of your hours cranking out the same discipline, due to the volume of stuff that needs to be verified.
Normally the term "more discipline" is understood as increased rigor, not simply more work at the same level of rigor.
Previously in my career, a junior making a mistake and being told why it was a mistake would learn from that and improve. The junior-chatbot duo will not. The junior will feed my comment into the chatbot, and the chatbot will superficially give the impression of having learned from the mistake and fix the code while introducing the same problem somewhere else, and the junior will have learned nothing.
This all requires that I review code as though it was created by some kind of cursed monkey's paw with an endless number of fingers that each grants a wish of the operator with a devastating caveat through idiotic interpretation.
If you give a toddler who can otherwise not operate a chainsaw -- for the simple reason that they don't have the strength to start it using the starter rope -- access to a robot who will turn on the chainsaw for them on command, you've created a problem which didn't exist before.
Such as the claim "AI demands more engineering discipline, not less" ... of skilled people, not irrelevant unskilled people.
I will probably be dealing with them in the future more than in the past because chatbots bypass the process by which unskilled people generally become skilled people.
If you could look clearly at the progress from 2020 to 2023, as someone like Gwern did, and from 2023 to 2024 with the invention of reasoning modes, then it was not that hard to understand what would happen in late 2025. Opus 4.5 was not a surprise to anyone who was actually paying attention.
But people (including the author) still mistake the current state for a stable state, and future gains for incremental ones. He says:
“I am not asserting that all code will eventually be AI-generated to spec, bypassing human understanding”
I AM asserting that, and it’s incredibly easy to do so.
The question of “when” is separate.
I agree that AI demands more engineering discipline, but it also demands more domain knowledge, purpose, and intent. Suddenly we can actually accomplish most of our goals a little faster. I can take on work I couldn't before.
Before I even begin getting disciplined about engineering, I need to ask: does this work actually make sense? Should I do it? If it's done... What do I think will change for my team or organization? Will it have practical results that move us in the right direction?
The better you get at asking that question, the less you'll find yourself prompting and planning and shoving PRs into the chute. It's still somewhat difficult to find important work in many places.
Still, engineering discipline is and always has been critical when going ahead with important work.
My gut feeling is that many of us simply aren't doing important work, and the discipline might be nice but is ultimately irrelevant. The sloppers are doing a faster version of something they always have, and much of it will be lost to time just like our pre-slop work has been.
I find LLMs aren't as helpful when applied to well-thought and intentional work towards very specific goals in complex domains. They're still helpful, but, the deeper you go and the more specific you get, the more they tend to deliver results you can't use. If you're on the rails they can be incredible. Diverging from the track and having exacting requirements, eh, it gets pretty hit or miss and you can spend a lot of time herding a digital cat. This certainly demands a lot more engineering discipline.
> AI pushes this premise beyond infrastructure and into application code itself. When rewriting is cheap, editing in place becomes risky. Mutation accumulates entropy. Replacement resets it.
I've always found verifying that some code works correctly much harder and more time-consuming than writing code. Replacing big chunks of code means much _more_ verification and validation.
When you see a bug and "fix the running thing", you only need to verify what you changed.
As I see it, the "infra as code" transition means going from more ad-hoc changes to fewer: predictable, auditable.
Using LLMs to replace mostly-working code with a new, nondeterministically different version seems entirely different to me. You've identified a problem in the code, great, let's rewrite it from spec. Now it fixed the problem! But did it introduce more problems? Hope you used enough tokens, I guess!
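One way to bound the "did it introduce more problems?" question is differential testing: run the old and the regenerated versions side by side and treat every behavioural difference outside the intended fix as a suspect. A minimal sketch, assuming both versions are pure functions over a small input domain that can be enumerated; `unexpected_changes` and the discount functions are hypothetical.

```python
from typing import Callable, Iterable, List


def unexpected_changes(
    old: Callable[[int], int],
    new: Callable[[int], int],
    inputs: Iterable[int],
    intended: Callable[[int], bool],
) -> List[int]:
    """Inputs where the rewrite changed behaviour outside the intended fix."""
    return [x for x in inputs if old(x) != new(x) and not intended(x)]


# Hypothetical bug report: an order of exactly 100 should get the
# 10% discount but does not.
def old_discount(total: int) -> int:
    return total // 10 if total > 100 else 0


# Regenerated "from spec": fixes 100, but also quietly discounts 50..99.
def new_discount(total: int) -> int:
    return total // 10 if total >= 50 else 0


regressions = unexpected_changes(
    old_discount, new_discount, range(0, 201), intended=lambda t: t == 100
)
print(len(regressions), regressions[:3])  # 50 [50, 51, 52]
```

This only catches differences on the inputs you enumerate, and it says nothing about whether the old behaviour was right; it just makes "what else changed?" a question you can answer instead of hope about.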
Writing software begins with a solid design that is defensible. If you don't have that, the AI will produce slop.
Once you're happy with the design, you need a solid plan. If you don't have that, the AI will produce slop.
Once you're happy with the plan, you can set the AI loose, but don't get too complacent! Anything that you missed in the previous phases could very well lead to slop (although likely localized).
And then, as your project matures and you gain more understanding of the space, you start to notice deficiencies in your model. This is where AI really shines: design and code changes to adapt to reality.
Guess who the author is.
> > The enthusiasts are not wrong. We are starting to see real, non-imaginary, discontinuous leaps in capabilities from teams that lean in hard to working with AI. And this does not feel like a normal technology cycle where you can wait for the dust to settle; teams that sit this out while competitors are hustling could be out of business before the dust settles. That’s a real, existential threat.
It’s not imaginary. It’s real. This time it’s different. And on a higher level, the FOMO is real. It’s not imaginary. It’s even existential.
Why do they all write the same as well? It’s so emphatic.
> The tech is cool, but as a thinking, feeling, breathing human who cares about other people, it can be hard to get excited about anything that so many people are this upset about. It’s also hard to get excited about something when so many of the loudest voices are out there talking gleefully about putting everyone permanently out of work, and so many artists and writers and people from developing nations are talking openly about the impact on them.
> Hold your desire to jump in and berate me here, I beg you. Like I said, I will deal with the ethics and morality of using AI in my very next post. Be honest, your attention span is no more up for reading a 10,000-word essay than mine is up for writing one. (Can we blame AI for that too?)
More Inevitability Soothsaying. All our feelings are crashing into Existential Threat Reality.
Was this article written by AI? It's certainly stupid enough!
- Schema validation with appropriate size limits on all relevant fields.
- Authentication.
- Access control.
- Backpressure management and rate limiting in case a (possibly malicious) user tries to perform too many computationally expensive actions in a short time.
- Ensuring that the actions of one user don't throttle another user who is connected to the same process/host, e.g. using async constructs to avoid freezing the main process.
- DDoS mitigation.
- Avoiding race conditions.
- Designing a good database schema, with well-chosen indexes and deterministic IDs/idempotency to avoid double-insertion scenarios. You don't want to be forced to rely on overly complex queries with a lot of joins. This doesn't scale well and is rarely necessary.
- Logging and error handling.
- Avoiding conflicts and accidental overwrite with old data when multiple users are editing different fields of the same resource concurrently.
- Efficient distribution of realtime messages.
- Scalability.
The list goes on and on... And every piece has to be implemented perfectly. This involves a huge number of carefully thought-out decisions.
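To give a sense of how much care even one bullet needs, here is a minimal sketch of the rate-limiting item using a per-user token bucket: expensive actions cost more tokens, and one user exhausting their budget does not throttle another. It assumes a single process (a multi-host deployment would need the buckets in shared storage) and is not thread-safe; `TokenBucket` and `PerUserLimiter` are hypothetical names.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Classic token bucket: bursts up to `capacity`, refills at `rate` tokens/sec."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.updated = now

    def allow(self, cost: float, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserLimiter:
    """One bucket per user, so a noisy user cannot drain anyone else's budget."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_id: str, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = self._buckets[user_id] = TokenBucket(self.capacity, self.rate, now)
        return bucket.allow(cost, now)


limiter = PerUserLimiter(capacity=5, rate=1)  # burst of 5, then 1 token/sec
print(limiter.allow("alice", cost=5, now=0.0))  # True: expensive action allowed once
print(limiter.allow("alice", cost=5, now=0.0))  # False: alice's budget is spent
print(limiter.allow("bob", cost=5, now=0.0))    # True: bob has his own bucket
print(limiter.allow("alice", cost=5, now=5.0))  # True: refilled after 5 s
```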
We are absolutely drowning in documentation and code that seems legit and the only recourse is to lean on AI to help process the sheer quantity of it. I have a feeling that the fallout from this phase of the industry is going to be an exotic form of technical debt that is remarkable mostly in its enormity.
LLMs are prolific and they love to add shit. Truly capable engineers are able to achieve more business outcomes with less code / fewer moving parts.
I'd simplify to "Truly capable engineers are able to achieve more positive outcomes" - half of what makes a capable, dependable engineer is knowing what outcomes are needed and making them happen.
> But there is a correlation between output and LoC.
That is less true today than it ever has been, due to LLMs.
i tend to find that the most productive teams make better decisions and work fewer hours. the quality of decisions is such a huge force multiplier that it renders actual hours worked almost an irrelevant variable.
in fact, i've sent over 20 variable rename prs and am now topping the leaderboard!
• sloc = Source Lines of Code
.. so I suppose nloc would mean Net LoC
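Assuming "net LoC" means lines added minus lines removed (the comment's own guess), it can be computed from `git diff --numstat`, which prints one `added<TAB>removed<TAB>path` line per file and `-` in place of counts for binary files. A minimal sketch; `net_loc` is a hypothetical helper and the sample input below is made up.

```python
from typing import Tuple


def net_loc(numstat: str) -> Tuple[int, int, int]:
    """Sum `git diff --numstat` output into (added, removed, net)."""
    added = removed = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # blank or malformed line
        a, r = parts[0], parts[1]
        if a == "-" or r == "-":
            continue  # binary files report "-" instead of line counts
        added += int(a)
        removed += int(r)
    return added, removed, added - removed


# Made-up sample in numstat format: added<TAB>removed<TAB>path
sample = "120\t30\tsrc/app.py\n0\t45\tsrc/legacy.py\n-\t-\tassets/logo.png\n"
print(net_loc(sample))  # (120, 75, 45)
```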
The library was a masterpiece of what-if-driven development. It was about 50k LoC, and it had 300k LoC of dependencies. It was a nightmare to modify. And no one wanted to take over maintenance, so people would submit PRs to the former employee when they did modify it.
I wanted to change something in the library to support a large migration I was in charge of. When I went digging it turned out that we were barely using any of the features in the 2 years since he’d finished it. I replaced the 50k LoC library and 300k LoC of dependencies with 300 lines in less time than it would have taken me to modify the library (a few days).
High-quality code and high-volume code are highly anti-correlated. Incidentally, low-quality code that is excessively long just so happens to be common complaint with AI-generated code.
How is that not efficient?
So overall increases productivity by a lot
Yes.
> That doesn't sound effective that sounds like digging ditches to fill them.
It sounds effective to me, like removing garbage from sidewalks so people can walk straight instead of walking around the trash.
> Every line of code removed is a line that was previously added.
Correct. Today I cleaned up
to and contributed various other negative lines of code in multiple areas.
> Every line of code removed is a line that was previously added.
Do you have any experience coding before LLMs?
But, if that original code had comments and traceability of each condition and return to a specific domain scenario, you would be doing a disservice by collapsing it to the one flat boolean expression. In that case, it may be better in its expanded form, and you should let an optimizing compiler do the collapsing.
If there were comments for each conditional, it should still be refactored as
Many years ago, "lines of code" was the classic example of nonsense management metrics. Today, there are somehow HN users who argue that lines of code is indeed a good metric and ask "But what if the code had comments?" as if they have never seen comments interleaved with code.
> In that case, it may be better in its expanded form, and you should let an optimizing compiler do the collapsing.
This is nonsense. This optimization is not about compiler optimization for efficiency. It's an optimization for human readability and maintainability.
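The code from this exchange was not preserved, so the following is a hypothetical stand-in showing both sides: a chain of early returns where each branch carries a comment, and the same logic as one boolean expression with the comments kept per clause. `Order` and its fields are invented for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Order:
    paid: bool
    in_stock: bool
    address_verified: bool


def can_ship_expanded(order: Order) -> bool:
    if not order.paid:
        return False  # rule: never ship unpaid orders
    if not order.in_stock:
        return False  # rule: wait for restock
    if not order.address_verified:
        return False  # rule: fraud check on the address
    return True


def can_ship_collapsed(order: Order) -> bool:
    return (
        order.paid  # rule: never ship unpaid orders
        and order.in_stock  # rule: wait for restock
        and order.address_verified  # rule: fraud check on the address
    )


# Both versions agree on all 8 combinations of the three flags.
assert all(
    can_ship_expanded(Order(*flags)) == can_ship_collapsed(Order(*flags))
    for flags in product([False, True], repeat=3)
)
```

Which form reads better is exactly what the thread disputes; the sketch only shows that per-condition comments survive the refactoring.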
Such branches could make sense if the conditions have to do with underlying domain concepts, but you expect the outcomes to be revised. It could just be a moment-in-time accident that they are all returning True right now.
This kind of tension is also where you often see indirection via configuration files or other auxiliary data structures. Or in the old days, things like bit fields instead of booleans, so that merging the conditions would encode different small integers to use as lookup table indices.
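The bit-field technique can be sketched as follows: each condition sets one bit, the resulting small integer indexes a lookup table, and each combination's outcome can be revised independently. The flag names and table contents are hypothetical.

```python
from typing import List

# One bit per condition (hypothetical names).
PAID = 0b001
IN_STOCK = 0b010
VERIFIED = 0b100


def flags(paid: bool, in_stock: bool, verified: bool) -> int:
    """Pack the three conditions into an index in the range 0..7."""
    return (PAID if paid else 0) | (IN_STOCK if in_stock else 0) | (VERIFIED if verified else 0)


# One slot per combination. Today only "all three" ships, but every
# combination has its own entry, so a rule change edits a single slot.
SHIP_TABLE: List[bool] = [False] * 8
SHIP_TABLE[PAID | IN_STOCK | VERIFIED] = True


def can_ship(paid: bool, in_stock: bool, verified: bool) -> bool:
    return SHIP_TABLE[flags(paid, in_stock, verified)]


print(can_ship(True, True, True))   # True
print(can_ship(True, False, True))  # False
```

If paid, verified backorders were later allowed to ship, that would be a one-line change, `SHIP_TABLE[PAID | VERIFIED] = True`, rather than a restructured chain of conditionals.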
Just because they were added doesn't mean they were needed, and even if the same person added and then removed them, it doesn't mean they are digging ditches to fill them.
The idea that "I would have written a shorter letter, but I did not have the time" also applies to code, and sometimes later you are blessed with more time than you had when implementing something under deadline pressure.
Huh? If LoC weren't needed then adding them was unnecessary and a waste of time. Someone who is known at an organization for removing unnecessary code screams inefficiency to me. It's paying one person to create a mess then another to clean it up.
My previous reply already addressed this?
I can't help but think you are being purposefully obtuse if you can't acknowledge the concept of developers creating known (and hopefully temporary) technical debt due to various forms of deadline related time pressure or changing requirements.
Perhaps they tackle non-code-editing tasks like architecture, design, mentoring and code review (think staff and principal tasks).
> Every line of code removed is a line that was previously added
Yes. This is not a failure. Code has a surprisingly short half-life.
What would you keep from this?
There's an obvious answer of course. And that is the direction that these effective senior engineers move towards.
Often the solution for creating the documents that feed these AI automations? Use AI. It's like an ouroboros: create docs using AI, then summarize and ingest them using AI, explained by AI.
The same thing is going to happen with code. Create thousands of lines of code using AI, then explain them using AI, etc.
1. Add tools to be able to do the job.
2. Work through the process using the agent interactively, many times.
3. Have an LLM read back through the session histories to create the full agent definition.
4. Run it and review.
It works fairly well, and does use LLMs for most of the steps.
It's okay, I'm sure the algorithm questions during the interview phase totally weeded out the fakers of systems knowledge right?
Words and code are so cheap they're meaningless, while human focus, attention, and understanding are so expensive and overloaded.
Funnily, at my company, the few engineers who did the majority of the work before AI still do the majority of the work now. By majority I mean tackling more issues, and tackling them better.
However there is a general verboseness and over engineering trend across the board.
What I noticed for myself is that saying no to something is becoming harder by the day, because why wouldn't you try something if it's so cheap to do?
That makes having the strength to say no almost a quality in and of itself.
That is true both for external pressure and internal pressure.
— Kurt von Hammerstein-Equord
These days it is so easy to appear hardworking (by volume of contributions) when you have LLMs. The difference is that the inept now can literally create orders of magnitude more than the careful, experienced engineer.
This is a common sentiment, and possibly motivated by the belief that writing "clean" code was right all along. It's even odds that an AI two major model versions smarter will be able to eliminate that debt or rewrite from specs derived from the codebase.
I've yet to have Opus 4.8 fail me with defensive, explicit code. Often it'll write code that is better than what I might have done. I imagine it would be a nightmare to go through one of the OOP debug chains with implicit error handling, but when every function has a runtime assertion that is basically the contract for how it is supposed to work, and exactly what to do if it encounters a corrupt state, things are just so much easier with AI.
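A minimal sketch of the style described, where a function states its contract as explicit runtime checks and says what happens on corrupt state. `apply_payment` and `CorruptStateError` are hypothetical. Preconditions use explicit `raise` because Python strips `assert` statements when run with `-O`; `assert` is kept only for a postcondition that acts as a sanity check.

```python
class CorruptStateError(RuntimeError):
    """Stored data violates an invariant; fail loudly rather than guess."""


def apply_payment(balance_cents: int, payment_cents: int) -> int:
    # Precondition on the caller's input.
    if payment_cents <= 0:
        raise ValueError(f"payment must be positive, got {payment_cents}")
    # Invariant on existing state: a negative stored balance means corruption.
    if balance_cents < 0:
        raise CorruptStateError(f"stored balance is negative: {balance_cents}")
    if payment_cents > balance_cents:
        raise ValueError("payment exceeds the outstanding balance")

    new_balance = balance_cents - payment_cents

    # Postcondition: the balance strictly decreases and never goes negative.
    assert 0 <= new_balance < balance_cents, "postcondition violated"
    return new_balance


print(apply_payment(1000, 250))  # 750
```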
I do agree with you on documentation. The amount we have has exploded in the post-AI world. Which is a little ironic, since the assertion is frankly what you'll need to know, not the 10 pages of prose the AI autogenerated in the shared Loop (Microsoft's terrible take on Confluence). It is what it is, though, and at least it's easier to meet EU compliance rules now, since those are more about the bureaucracy than actual security.
Until the bureaucrats start using AI to create a flood of new rules.
This.
I feel like it is going to become illegal for older engineers to retire until it's all cleaned up. When the debt catches up, it's going to be a civilisation-level problem, like asbestos or lead.
Perfect framing! I was leaning towards an "AI Chernobyl" analogy in my predictions, but I think "asbestos" or "fossil fuels" captures its nature much better. May I borrow it? I definitely see the harmful consequences of "AI" exceeding both, with the sad realization that its usefulness falls far short of either.
Considering the balance of my job is cleaning up technical debt, that's my own retirement I am worried about.
>When you can no longer pay off technical debt, you turn to technical bankruptcy: start over from scratch, greenfield.
What that looks like in, say, 2030, I can only guess. We saw this happen to a multinational shipping company due to cryptolocker, but I don't know if there's going to be any interest in replicating that. Certainly the AI barking in the ear of the CEO isn't going to recommend it.
>I predict what will happen is eventually the number of technical bankruptcies will rise
Outside of industries required for national security maybe.
> there will be some new movement to only build codebases that fit within human heads, no AI required.
I suspect that rational governments are probably going to require No AI clauses in their tenders, but that we will also get hundreds of high profile "Whoops we used AI in this contract when we shouldn't" scandals that come to nothing.
I think this is an important point. Software engineers always had the right instincts on how to approach AI for coding -- cautiously. Execs got too coked up on LinkedIn puff pieces from nobodies and adver-prophesizing CEOs selling their tokens and chips that they forced something unnatural upon their orgs.
Now what we see in the software dev space is incredible levels of malicious compliance ("you want slop, I'll give you slop").
This has not been my experience with my fellow engineers IRL on average, but I do feel like there is a significant contingent of us who are ready and raring to yield engineering in its entirety to the LLMs.
Also, I suspect that a lot of HN comments that generally state that AI has been useful for their work are actually not referring to programming work, but it's very easy to default interpret it that way. For example, I just recently saw a comment stating how AI can do 90% of their senior-level job, and when I looked into the guy's profile, he was actually a designer, which makes sense. But on first read I assumed the comment was referring to writing code.
Some teams are incredibly inefficient. And in the end the AI is perfectly capable of creating the textbox.
In all seriousness, though, I'm indeed curious about Anthropic's engineering practice, particularly how they achieve such a level of autonomy.
Anthropic's speed of development is about 20% AI and 80% technical debt. We all could be faster if we stopped trying to build things right.
Once Anthropic floats, I predict a screeching halt in feature development to fix all the technical debt that they're accumulating.
Chalk up yet another echo of the 1920s Gilded Age? Between all these economic spasms and the simultaneous tilting towards fascism, I think there is way too much historical rhyming going on right now...
> perfect formatting and at least superficial plausibility
Basically, with a library full of books that have nice covers, it's going to take time to discover that all those books are just filled with lorem ipsum. Before, they couldn't stand up a fake library.
The issue comes down to time and effort.
Worse if it’s a mixture of good content and lorem ipsum.