I decided to tack on “Spring 2026 edition” to my blog post because opinions on code gen AI don’t hold up well for long. The industry, the approach, and the tools change way too fast to hold on to an opinion for more than a quarter at a time.
Productivity might be way up one quarter and crash the next. AI-generated code can be venerated one week and completely trashed the next.
Current State of Tools
Just to summarize and solidify the state of “now”, let’s go through some of the tools that have shown up in the past quarter – either because they surged in popularity or because they were just invented.
Multi-agent setup
Claude Code and other tools have allowed us to run truly multi-agent workflows. In the past (like a year ago), you’d have to code up your own scripts to set up a multi-agent workflow. Now? Now it’s just another SKILL.md that tells Claude Code to split itself into multiple processes.
This has allowed us to:
- work on parallel independent tasks
- allow multiple agents to work and evaluate each other
- talk to the main agent while the background agents do their work
It’s huge and powerful and people are just getting started with it. I’d say that it’s still in its infancy as every person on Twitter seems to be exploring how to really use multi-agent workflows productively.
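As an illustration, a skill that triggers this kind of workflow might look like the following (the skill name and wording here are hypothetical – the exact subagent mechanics depend on your harness):

```markdown
# parallel-review

When asked to parallel-review a change:
1. Spawn three background subagents.
2. Agent 1 reviews correctness, agent 2 reviews style,
   and agent 3 runs the test suite.
3. Summarize each agent's findings for the user while
   the subagents keep working.
```

That is the whole trick: a markdown file describing how the main agent should split the work.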
The Agentic “harness”
We moved from calling everything a “wrapper”, a “manager”, or a “service” to calling it a “harness”. I keep hearing that word, and in this context the “agentic harness” is the layer between your machine and an LLM – the thing that lets the LLM interact with your system.
I think this is the place where most discovery and inventiveness happens. We now have tools like Codex (an agentic harness for GPT), OpenCode (an open harness for any model), Gemini CLI, and Claude Code.
Strangely enough, the harness seems to matter more than the model itself. Opus 4.6 in Claude Code and OpenCode might as well be a completely different tool.
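To make the idea of a harness concrete, here is a toy sketch of its core loop, with a stubbed-out function standing in for the real LLM API (the function names and the returned action format are illustrative, not any vendor’s actual interface):

```python
import subprocess

def fake_model(prompt: str) -> dict:
    """Stand-in for a real LLM call. A real harness would send the
    prompt (plus conversation history and tool definitions) to a
    model API and parse the tool call out of the response."""
    return {"tool": "shell", "command": "echo hello from the agent"}

def run_harness(user_request: str) -> str:
    """The core harness loop: ask the model what to do, execute the
    requested tool on the local system, and return the result."""
    action = fake_model(user_request)
    if action["tool"] == "shell":
        # The harness, not the model, is what actually touches the
        # machine - this is the layer the blog post is describing.
        result = subprocess.run(
            action["command"], shell=True,
            capture_output=True, text=True,
        )
        return result.stdout.strip()
    return "unknown tool"

print(run_harness("say hello"))
```

Everything interesting – permission prompts, sandboxing, context management, retries – lives in this layer, which is part of why the harness can matter as much as the model.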
Sandboxed agents
Lastly, with the rise of OpenClaw, we’re seeing more and more investment into sandboxed agents. I think this is going to be the future of agentic programming.
Sandboxed agents are “just” agentic harnesses with full unrestricted access to a “sandbox”. You’re essentially running --dangerously-skip-permissions in a place where an agent can’t do any harm. Whether that’s in the cloud or locally, it allows us a certain amount of “containerization” of agents.
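A minimal local version of this, assuming Docker and a container image you’ve built with your agent of choice preinstalled (`my-agent-sandbox` and the mount path are placeholders; the flag is the one mentioned above):

```shell
# Mount only the project directory into a throwaway container.
# The agent runs with permission checks skipped, but the blast
# radius is limited to what's visible under /workspace.
docker run --rm -it \
  -v "$PWD":/workspace \
  -w /workspace \
  my-agent-sandbox \
  claude --dangerously-skip-permissions
```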
SKILLs
In the past quarter/season, SKILLs were introduced, and I think they’re quite a revolutionary idea. Not because skills themselves are novel – they’re just prompts, and before Skills we’d all just have folders of prompts – but because they make prompts reusable. They become the “libraries” of an agentic harness system.
I have a full Skillbox of personal skills that I use literally every day, all day. Without them, I’d be stuck generating code the same way I did last year – by manually copying prompts from my ~/prompts folder.
A skill file is just a markdown file with instructions:
```markdown
# commit

When asked to commit, analyze staged changes,
draft a conventional commit message, and run
git commit. Never amend unless asked.
```

That’s it. That’s a skill. You invoke it with /commit and the agent knows what to do. I have a full collection of these in my Skillbox repo – skills for session tracking, roadmap management, code review, and more. They’re functions, libraries, adapters, whatever you want to call them, but they’re external methods that extend what your agent can do.
So…how’s it going?
With so many advancements, you might think that everything is just easy-peasy. On top of that, we recently got upgrades to Codex, Claude models, and so much more.
The big blow to AI usage has been the news that Amazon’s use of gen AI tools has led to quite a few issues, and GitHub has run into its own uptime problems as well. All of it is being linked to AI overuse and AI underdelivering on its promise.
So what’s the reality of the day-to-day AI usage?
As a daily driver
I’ve been coding much less and relying on AI much more for initial code and test generation. However, I’ve found it’s incredibly important to review the finished work and to treat the whole thing like a TDD process: the AI writes the tests and the initial code and ensures the implementation matches the description as closely as possible. Then I go in and review the code to see whether things were implemented in a fairly clean and reusable fashion.
Gen AI, in my experience, comes with real drawbacks. It likes to default to shortcuts no matter how much you prompt it, and it tends to misunderstand or skip requirements. If you treat the process with AI as iterative – even in the one-shot sense (where you one-shot implementations/fixes/refactors) – you’ll find yourself able to mitigate these issues and treat them as just another step of the process.
I tend to do a variation of starting the work and then iterating with AI or having the AI start the work and go back and forth with it. Unfortunately, even with this process, you can miss underlying issues and sometimes, you have to go all in with hands-on coding.
I’d call this AI tech bankruptcy. AI creates tech debt in a unique way – it’s instant and sometimes hard to spot, it can create problems and unforeseen bugs, and it accumulates over time.
Despite all of this, it feels like a new natural workflow. And the tools are improving. We have planning modes, and skills, and smarter models.
Large codebases
Now, on existing codebases, tools like Claude Code can do really well if they’re forced to stick to a specific architecture and set of rules. If you think of AI as a pattern-matching tool, you’ll find yourself with much better outcomes than if you ask Claude Code to build a new piece of architecture from scratch.
I had a great experience with Claude Code extending a smaller project I was building, my Cottage UI library. I had set up an extensive UI library scaffold for just one component – a button – with a Storybook setup, tests, an API I liked, etc. I never got around to implementing another component, but with Claude Code, I was able to build out a basic version of everything I needed. Now I can focus on the details and the look of the UI rather than the boilerplate. It was able to take my Button and essentially “copy” it to create all the basic elements of a UI library.
I’ve also had this experience in larger codebases, where it was able to take an existing pattern, like a database model or a service, understand what I was asking for, and create all of the logic I needed. Then I was able to go in, tweak a few things, and be done. I write heavily standardized tests, and AI is able to take those as a template for new tests. This avoids a lot of the AI tech debt that people associate with generated code.
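To illustrate what I mean by heavily standardized tests, here is a toy example in the arrange/act/assert shape (the `UserService` class and its methods are made up for this sketch, not taken from a real codebase). The structure is regular enough that an agent can pattern-match a new test for a new service almost mechanically:

```python
import unittest

# Hypothetical data-access layer; in a real codebase this would be
# one of your actual services. The point is the repeated structure.
class UserService:
    def __init__(self):
        self._db = {}

    def create(self, name):
        uid = len(self._db) + 1
        self._db[uid] = {"id": uid, "name": name}
        return self._db[uid]

    def get(self, uid):
        return self._db.get(uid)

class TestUserService(unittest.TestCase):
    """Every service test follows the same shape - arrange, act,
    assert - which is exactly what makes it copyable as a template."""

    def setUp(self):
        # Arrange: a fresh service per test.
        self.service = UserService()

    def test_create_returns_record_with_id(self):
        record = self.service.create("ada")      # act
        self.assertEqual(record["name"], "ada")  # assert
        self.assertIn("id", record)

    def test_get_returns_created_record(self):
        record = self.service.create("ada")
        self.assertEqual(self.service.get(record["id"]), record)
```

Hand an agent this file plus a new service, and the new test file practically writes itself along the same lines.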
Is it perfect? No, see my previous section. But the AI tooling is definitely improving and it works well on heavily-structured code.
Prototyping and side projects
Now, this is where things really shine. I’ve built a couple dozen AI-generated projects in the past few months – and documented 17 of them. I actually got to the point where I was starting to run out of side-project ideas.
There were quite a few tools I’d always wanted to build – a “start menu” for the terminal, a “Steam”-style client for terminal-based games, an ideas manager, a photo-sorting TUI, and many more. And I finally got to do it.
Prototypes for design, for new applications, etc. take very little time to develop. Want to integrate a new 3rd-party service? Point Claude Code at the docs and tell it what to build. Got an idea for a new UI for your existing backend? Point it at your backend docs and let it run. Prototypes have been commoditized to the point where they cost almost nothing to develop. Not a month, not a week, not even a day to get from an idea to a functional prototype that can also serve as the base application for the final product.
Tooling and time
A few things worth mentioning about AI tooling. Agentic harnesses and new tools have made huge strides. I’m actually using Codex. Couldn’t believe it myself. OpenCode is also very productive. I’d say that Claude Code still stands above the rest right now.
We also have sandboxed agents, but I think most people, like me, haven’t had a chance to try them out.
In the end, a big advantage of gen AI is the tooling. Without Claude Code, I don’t think many of us would be as productive as we are – and that includes CC pushing the competition to match its pace.
I’d say that gen AI coding currently takes just about the same amount of time as me doing the work myself. Sometimes more, sometimes less. For prototypes, way less.
But one thing Claude Code does well is multitasking, either by spinning up separate agents in different terminals or by using CC’s own agent management to work on multiple projects. This is truly where the time savings happen.
Having tools that can pick up “agent” definitions (md files with agent prompts) and various “skills” to customize each agent instance means you’ve got a pseudo-team. Or you can just call it multiple workflows. You’re able to have an instance of CC per project or feature, working autonomously or under your direct influence. Suddenly, each AI project still takes as long as hand coding, but you’re accomplishing 3×6 hours of work instead of 6 hours of work. Kind of – there’s still the tech debt.
Final verdict?
In short:
- the tools are improving and it’s making a difference
- code needs to be reviewed to stave off tech debt
- AI-generated prototypes are the bee’s knees
- unless you parallelize, AI-generated work takes the same amount of time as hand-coded solutions
Ask me again next quarter. I’ll probably have a different answer.