Software engineering and AI questions
I want to take a minute, amidst the Claude Opus 4.6 and GPT-5.3-Codex hysteria, to capture some of the currently-open questions about AI progress and its impact on software engineering. There’s uncertainty at lots of different levels about what the future will look like, and good arguments for very different outcomes.
This is partly for my reference (as these questions keep coming up it’s nice to have the obvious arguments in one place) and partly to document my predictions. I’m going to start at the high level and work my way down.
AGI and doom
It is possible that one or more of the frontier labs will soon crack recursive self-improvement and we will enter a hard takeoff. It is possible that some other advancement will lead to compounding growth in AI capabilities which will massively disrupt the economy and world order. It is possible this has already begun to happen in February 2026. It is possible these advancements will bring about the apocalypse.
On that front, I think the main question is whether scaling reinforcement learning will deliver a step change in model capabilities as scaling pre-training did before it. If that happens, the remaining concerns will be ones of hardware resources and politics (see below). But so far I’ve seen no evidence that this is happening (despite OpenAI dogwhistling about models-building-models).
Without a step change, I expect capability improvements to derive mainly from ‘peripheral’ areas: inference-time compute, orchestration techniques, vertical-specific reinforcement learning, and tooling for evals and context management. Most of these areas are so far relatively unexplored (deep investment has been unattractive while the gains keep getting overtaken by raw model improvements within six months).
Naturally the returns on ‘peripheral’ improvements will diminish over time, though model capability is clearly already sufficient to transform the nature of most knowledge work. That’s the conservative outcome!
Government and politics
We just got our first highly-publicised clash between a major AI lab and the U.S. government. Checking the script, we are way ahead of schedule! How is the U.S. government going to interfere (or not) in AI research? How quickly will AI research (and related hardware acquisition) become a politically salient issue?
I don’t have a lot to go on here – it seems like we are still very early despite the recent kerfuffle. I do expect the first major government intervention to be import/export related rather than focused on alignment, though.
Hardware supply
This is an area that seems neglected in a lot of analyses. Hardware shortages (whether memory or compute in their various forms) could quickly become the governor on further progress, especially if compounded by trade restrictions.
There also seems to be a nascent anti-datacenter political movement. Will this be a meaningful obstacle to bringing more compute capacity online? I’m not sure. I would love to read an expert analysis of how much additional capacity can be created nationally this year based on the relevant supply chains. At least then I would have a point to discount from.
Pricing and lock-in
Speaking of compute, the industry has been offering $200 ‘max’ plans for almost a year now. These provide much cheaper access than API pricing, though it’s not clear how heavily they are subsidized in aggregate.
There has been a break between Anthropic and OpenAI with regard to third-party tools: Anthropic recently disallowed tools which wrap Claude Code while OpenAI took the opposite position.
Although competition is fierce (and benefitting the consumer), there is clear tension between the race to capture market share and the need to increase switching costs. These companies cannot afford (literally) to end up as one commoditized option among many providers.
To avoid this, I think each company will increasingly attempt to ‘lock in’ users to their particular client (Claude Code or Codex). Whether via unique capabilities or smoke and mirrors, they must make users feel that their particular product has some ‘special sauce’ that isn’t available elsewhere. The latter is the cheap option: I expect to see lots of ‘baked in’ features this year that are really generic add-ons (e.g. orchestration).
SWE industry and layoffs
Current model coding abilities are very impressive and there is still lots of productivity juice to squeeze, even without big model-level improvements. Am I out of a job?
On one hand, a lot of software engineering tasks that used to be slow have been massively sped up. You could use this as a justification for laying off 40% of your employees. On the other hand, this could be a Jevons paradox situation, where the lower ‘cost’ of software results in demand so much larger that the net need for programmers grows.
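The Jevons argument is ultimately a claim about price elasticity, and the arithmetic is worth making explicit. A toy constant-elasticity sketch (the 5x cost drop and the 1.3 elasticity are made-up numbers for illustration, not estimates):

```python
def demand_after_price_drop(price_ratio: float, elasticity: float) -> float:
    """Relative demand under a constant-elasticity model, Q ∝ P^(-e).

    price_ratio: new price / old price (e.g. 0.2 for a 5x cost drop).
    elasticity:  price elasticity of demand (e > 1 is Jevons territory).
    """
    return price_ratio ** (-elasticity)

# Hypothetical numbers: software gets 5x cheaper, elasticity of 1.3.
q = demand_after_price_drop(1 / 5, 1.3)  # demand for software roughly 8x
spend = (1 / 5) * q                      # total spend still rises ~1.6x
```

If elasticity is below 1, the same formula predicts the layoffs outcome instead: demand grows less than the cost falls, and total spend on programming shrinks.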
My best guess is a hybrid situation: really big orgs will probably do some layoffs while companies outside tech go on a hiring spree (they can now afford to do things they couldn’t before). On the layoffs side, any CEO who wants to make this move now has cover to do it (and some will do it just to look like they’re doing something).
SWE seniority pipeline
What’s the difference between a senior software engineer and a junior software engineer? Mostly it’s judgement born of experience. A junior can write all sorts of good and bad code; they just don’t have the eye to tell one from the other.
Where does this judgement come from? Well, for the last fifty years the strategy has been to hire junior engineers and give them experiences which over time hone their judgement until they transition into a senior role. This upleveling happened freely because building software involved a lot of object-level work that even inexperienced programmers could contribute to with mentorship and review (so growth was a side effect of other valuable work).
Now, all of that is out the window. If you have a bunch of senior engineers then I don’t think any amount of additional junior engineers can make them net more productive. At the same time, junior engineers can crutch on AI tools in a way that inhibits the very judgement they’re supposed to be developing through intentional practice.
Naively it seems like these factors will combine to depress junior hiring, restrict the supply of seniors, and net-lower the average skill level of the industry.
SWE practices
I am partial to the ‘programming over time’ definition of software engineering which I conceptualize as two axes: time and codebase size. Most of the flame-warring between programmers online is a failure to recognize these quadrants and their various constraints.
The weekend hobbyist, the OSS maintainer, and the corporate cog all exist in unique environments that dictate what is useful and practical when it comes to source control and code review practices, build systems, testing, deployment, and monitoring.
Codebase size typically varies with complexity (it is hard to have a large simple codebase or a small complicated one) so the ‘size’ axis is also generally a ‘complexity’ axis.
Historically, size also varied with people: it took a large team to build (and maintain) a large codebase. Now that the cost of code generation has cratered, this model may need a separate ‘people scale’ axis, implying an entirely new set of practices that make sense in each low/high time/size space[0].
Those already exploring this new conceptual space have been focusing on code review. Does a human look at every AI PR before merge? Does another AI review it? Are there new, LLM-based CI protocols that can supplement or replace existing human-centric practices?
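To make those questions concrete, here is a toy triage policy for routing PRs to reviewers. Everything here is invented for illustration (the thresholds, the route names, the idea that sensitivity is a boolean); it is a sketch of the decision shape, not a recommendation:

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    lines_changed: int
    touches_sensitive_paths: bool  # e.g. auth/, billing/, infra/
    agent_authored: bool

def review_route(pr: PullRequest) -> str:
    """Decide who reviews a PR before merge (hypothetical policy)."""
    if pr.touches_sensitive_paths:
        return "human"                       # risk trumps everything
    if pr.agent_authored and pr.lines_changed > 200:
        return "human"                       # large AI diffs get human eyes
    if pr.agent_authored:
        return "llm-review-plus-spot-check"  # AI reviews AI; humans sample
    return "peer-review"                     # human PRs reviewed as before
```

Even this trivial policy surfaces the open questions: who sets the thresholds, who audits the spot checks, and what happens when agent-authored PRs dominate the queue.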
We are at a new frontier where well-accepted software engineering philosophy (e.g. code as a liability) can be challenged. What new tools and interfaces are needed to fully unlock the utility of the latest models? It’s hard to believe that multi-agent tmux sessions and UIs over git worktrees are the final form, though that’s where we are right now.
I think two overlooked areas are human coordination and responsibility. If code generation was previously throttling organizational progress (since coding is slow and SWEs are expensive), a sudden 2-10x speedup there doesn’t translate into that much more delivery; it makes something else the limiting factor!
Probably that ‘something else’ is coordination. I expect the gap between good and bad leadership to become significantly more apparent as the effectiveness of good leaders is multiplied.
The other critical and challenging function of human leaders is responsibility. As the ratio of robots to humans grows, do we concentrate the displaced responsibility in fewer people? It has to go somewhere; a computer can never be held accountable.
[0] While one person can now generate a large amount of code, I’ve yet to see evidence that they can maintain it. Probably some companies will inadvertently sabotage themselves in this way and quietly disappear (no one – founders, employees, investors, customers – will look good getting caught in that situation).