The age of thin clients and middle managers
Thankfully I didn't make a promise to eat my boot when I kept harping about how we shouldn't anthropomorphize AI agents -- because boy oh boy, I'd be full of boot.
Frontier LLMs in mid-2026 aren't the same as they were in 2025. Heck, with how quickly OpenAI and Anthropic are shipping new GPT-5s and Claude 4s, things aren't the same as they were a few months ago.
Unironically, AI is making me smarter
Someone recently wrote that AI is making them dumb. A few months ago, I felt the same way. But starting with GPT-5.3 Codex, AI has genuinely started to make me feel smarter.
Look -- I've never been the best at writing code. So when GPT-5.3 Codex came out, I got into the habit of asking it to review my code whenever I pushed PRs. To the surprise of absolutely nobody, I often got completely obliterated. "This is silently introducing a hidden failure mode" / "This should be rewritten using dependency injection" / "This entire section is redundant and should be ablated to tighten the architecture up".
It reminded me of some really brilliant Eastern European folks I've worked with in the past -- highly technical, analytical, and no-nonsense in their brutal honesty.
In the age of sycophantic AI, this is more than welcome. I find myself learning best when there's a sprinkling of shame and self-doubt.
Yeah, anthropomorphize the heck out of them
The moment it clicked for me: at one point I needed to build a quick feature without access to my machine -- just the Claude iOS app hooked up to my desktop via Dispatch. I told it to SSH into my dev workstation and instruct Codex to build the thing, review it, and send a PR I could remotely review using the GitHub app.
The feature was dead simple and didn't have risky failure modes, so I just let it churn. I specifically asked Claude to drive Codex because while Claude is generally pleasant to talk to and groks high-level product signals well, Codex still writes tighter code -- usually less smelly and stronger architecturally. For fun, I had it run everything in full auto.
In a few minutes, it came back with a PR so well written (it even used my tone and language from prior PRs) that I was deeply impressed.
And that was when it all clicked: for $200/month in AI costs I could assemble a very competent duo of AI agents I could lean on for a lot of things -- Claude at the helm with its wealth of memories from our previous sessions, and Codex doing the mechanical work of writing and auditing the code.
My typical new feature branch now looks like this: I give Claude the high-level goal and ask it to design a plan. Then I have it dispatch the work of writing the code and performing the static verification to Codex. With the prior context I gave it, I then have Claude run rigorous acceptance and end-to-end testing -- it has to prove to me that the feature works by actually running a real build in the pipeline and, if relevant, driving Playwright to show me screenshots and recordings of the feature working.
I then go to GitHub, scrutinize the code, make notes of things I want changed or fixed, and send it back to the duo. I make sure to scold them whenever they overstep the product decision boundary, which they seem to learn really quickly.
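For the curious, the loop above can be sketched as a tiny pipeline. This is only an illustration of the hand-offs, not how any of the tools work internally: the `Stage` type, the stage names, `run_feature`, and the `run_stage` callback are all hypothetical names I made up for the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    name: str
    owner: str  # "claude", "codex", or "human" (labels are illustrative)


# Hypothetical stage list mirroring the workflow described above.
FEATURE_PIPELINE = [
    Stage("design plan", "claude"),
    Stage("write code", "codex"),
    Stage("static verification", "codex"),
    Stage("acceptance and end-to-end tests", "claude"),
    Stage("code review", "human"),
]

# run_stage(stage, goal, open_notes) returns a list of review notes.
# Agent stages are expected to return an empty list.
RunStage = Callable[[Stage, str, list], list]


def run_feature(goal: str, run_stage: RunStage, max_rounds: int = 3) -> int:
    """Run every stage in order and repeat until the human review
    comes back with no notes. Returns the number of rounds used."""
    notes: list = []
    for round_number in range(1, max_rounds + 1):
        for stage in FEATURE_PIPELINE:
            result = run_stage(stage, goal, notes)
            if stage.owner == "human":
                # Only the human review decides what goes back to the duo.
                notes = result
        if not notes:
            return round_number
    raise RuntimeError("review notes still open after max_rounds")
```

The point of the sketch is that the human stage is the only one allowed to send work back: the agents plan, write, and verify, and my review notes are what start the next round.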
For even quicker tasks, I have Claude drive Cursor instead, using its Auto model selection (usually defaulting to either Composer 2 or GPT-5.3 Codex), but still have Codex do the rigorous code analysis.
If we're doing anything new-ish this time, I make sure to have Claude materialize a new skill to make the workflow we just performed repeatable, and write memories and handoff notes should I wish to revisit this later.
Tmux, Happy, and Attach
This workflow has gotten so productive for me that I've started exploring ways to break out of the Claude app-as-launchpad pattern. I found a combo that works really well for me: tmux running on my dev workstation, plus Attach to make pane management a bit more pleasant. Sometimes I drive Happy if I want to start multiple parallel sessions without going through tmux, but I find it a bit finicky, so on most days I fall back to tmux + Attach.
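The tmux half of that setup boils down to a detached session I can come back to from any device. Here's a minimal sketch of the commands involved, built as argument lists in Python so they're easy to inspect. The helper names, the session name, and `my-agent-cli` are placeholders I made up; the tmux subcommands (`new-session -d -s`, `send-keys -t`, `attach-session -t`) are standard tmux.

```python
def tmux_session_commands(session: str, agent_command: str) -> list:
    """Build the tmux commands that start a detached session and
    type a command into it. Nothing is executed here."""
    return [
        # -d: start detached, -s: name the session
        ["tmux", "new-session", "-d", "-s", session],
        # Send the command text to the session, then press Enter
        ["tmux", "send-keys", "-t", session, agent_command, "Enter"],
    ]


def tmux_attach_command(session: str) -> list:
    """Build the command that reattaches to a running session."""
    return ["tmux", "attach-session", "-t", session]


# Hypothetical names: "feature-x" and "my-agent-cli" are placeholders.
commands = tmux_session_commands("feature-x", "my-agent-cli")
attach = tmux_attach_command("feature-x")
```

To actually run them on a machine with tmux installed, each list can be passed to `subprocess.run(argv, check=True)`. Because the session is detached, the agent keeps working after I disconnect, and I reattach later from wherever I am.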
In fact, ironically, I'm now shipping and fixing more than when I was tethered to my desk the entire day. I'm shipping features and fixing bugs in a coffee shop, in my living room while watching Minecraft playthroughs, or heck, even during a commute.
Is this agent psychosis?
Good question. I'd like to think that, out of all the things my psychosis manifests in, agentic work is not one of them.
I genuinely believe that GPT-5.5 writes better code than I physically can, and that Opus 4.7 has gotten good enough to earn my trust in letting it be at the helm of things I'm shipping.
I'm getting a lot more done with much less toil, and much less mental overhead. It really does feel more like managing a team of highly competent software engineers than talking to a bunch of stochastic puppets.
It hits the sweet spot I've been looking for most of my career: being somewhere between a manager and a tech lead -- having the ability to train engineers and delegate work, while still having the opportunity to write code now and then.
We're all middle managers now. I've always had a distaste for the term "soft skills" -- it diminishes the very hard skill of interpersonal communication and management. And with agentic coding, I'd argue those skills are just as important as your coding "hard skills" if you want what you're shipping to land on target. The duo writes the code; my job is making sure they're writing the right code.
Everything runs on my dev workstation remotely now. I've even "downgraded" from a 15-inch M4 Max MacBook to a 13-inch M5 Air -- running the full software stack and writing everything by hand just doesn't make that much sense anymore, but having a duo (and occasional trio) of agents available basically 24/7 to ship stuff on a fully materialized stack makes a ton of sense.