ʕ☞ᴥ ☜ʔ Kix's blog

Fable is... okay

When Anthropic gave everyone access to its "Mythos-class" model Fable, I was very excited to see how it would fit into my workflow.

See, with all the song and dance about Mythos-class models being powerful to the point of being dangerous, I was expecting something at least twice as smart as Opus.

Instead, it just feels like Opus but without the memory of a goldfish.


How I use Opus in my regular workflows

Claude Opus 4.8 is my orchestrator. I've written about this workflow in detail before, but the short version is: Opus plans the change set, drives sub-agents (usually GPT via Codex) to implement and adversarially review, verifies the final result, then babysits the PR through CI. GPT writes the cleaner code -- it has better attention to detail than Claude or I do -- while Opus handles product context and the harness I've built around it (prompt adherence, recall, and prior memories).
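As a rough sketch only (this is not the actual harness; every name below is a hypothetical stand-in, and the model calls are plain Python callables rather than real Opus or Codex APIs), the plan / implement / review / verify loop described above looks something like this. CI babysitting is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical roles, not a real SDK: in the workflow above the planner and
# verifier would be the orchestrator model, the implementer and reviewer
# would be sub-agents.
Planner = Callable[[str], List[str]]            # issue -> planned changes
Implementer = Callable[[str, List[str]], str]   # (change, open blockers) -> patch
Reviewer = Callable[[str], List[str]]           # patch -> hard blockers, empty = pass
Verifier = Callable[[List[str]], bool]          # accepted patches -> final check


@dataclass
class RunResult:
    patches: List[str] = field(default_factory=list)
    unresolved: List[str] = field(default_factory=list)
    verified: bool = False


def orchestrate(issue: str, plan: Planner, implement: Implementer,
                review: Reviewer, verify: Verifier,
                max_rounds: int = 3) -> RunResult:
    """Plan, then loop implement -> adversarial review per change, then verify."""
    result = RunResult()
    for change in plan(issue):
        blockers: List[str] = []
        patch = ""
        for _ in range(max_rounds):
            patch = implement(change, blockers)  # feed prior blockers back in
            blockers = review(patch)
            if not blockers:
                break
        if blockers:
            result.unresolved.append(change)     # gave up after max_rounds
        else:
            result.patches.append(patch)
    # Only run the final verification if every planned change got through review.
    result.verified = not result.unresolved and verify(result.patches)
    return result


# Toy stand-ins so the sketch runs without any model calls.
def demo_plan(issue: str) -> List[str]:
    return [f"{issue}: step 1", f"{issue}: step 2"]


def demo_implement(change: str, blockers: List[str]) -> str:
    return f"patch for {change}" + (" (revised)" if blockers else "")


def demo_review(patch: str) -> List[str]:
    return [] if patch.endswith("(revised)") else ["missing test"]


def demo_verify(patches: List[str]) -> bool:
    return len(patches) == 2
```

Swapping the orchestrator model in this picture means changing who sits behind `plan` and `verify` while the loop itself stays the same, which is the experiment the rest of this post is about.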

It hasn't 2x-ed my personal velocity, but I am now shipping better-written code at the same pace. Opus has earned my trust over months of this. So when Fable dropped, the obvious question was: can it do this better?


So what if I swap Fable in as orchestrator?

For the past few days, I've replaced the top-level Opus orchestrator with Fable -- it runs the same workflows, same sequences, and does essentially the same thing as Opus.

In a lot of ways it is better, but not 2x-the-price better, and certainly not by as much as Anthropic's song and dance implies.

Where Fable excels:

  1. Better recall across a longer session. When I wanted to fix a cluster of issues instead of just one, it did better than Opus at remembering what the issues were, what's done (and what was implemented), what's left, and all the nuances of the different Opus vs. GPT back-and-forths across a long session. This is genuinely useful -- Opus tends to lose the thread on multi-issue sessions and I have to re-explain context, which breaks momentum.
  2. Conversational style and response. Fable is terser and more direct than Opus, wasting no tokens on fluff and niceties. I care about this more than most people, and Fable's default voice is closer to what I want out of the box -- fewer "Certainly!"-type openers, less padding.

...and that's about it.

Where it falls short (of my expectations):

  1. The code it writes is not meaningfully better than Opus's by any (subjective) measure. It still makes the same weird mistakes that Opus makes: smelly TypeScript, inline comments where I don't want them, unit tests that don't add any meaningful proof or validation.
  2. It is way too overconfident for its skill level. It really takes the ethos of "ask forgiveness, not permission" to its vector heart. I've observed it making risky changes without my explicit approval more often than Opus did, only to circle back and say "Okay, here's where things are at. I made a bad judgment call, no excuses." LLMs aren't software engineers, and Fable seems even more convinced otherwise. Granted, it is very good at realizing it made a mistake and what it should have done instead, but that part feels more performative than substantive.
  3. GPT doesn't find fewer hard blockers when reviewing Fable's work than when reviewing Opus's -- which I guess is just a general observation about how these models often work better in isolated contexts with targeted changes, regardless of which model proposed the plan. I was expecting Fable to write cleaner code or design tighter plans more often than Opus did. (To be fair, if you pit GPT against GPT, it'll also surface a bunch of blockers.)

The absolute hard blocker

After all is said and done, my summary is just that Fable is okay. It's not bad, but it's not as mindblowingly good as Anthropic wants it to appear. Sure, it would be nice to replace Opus with Fable full time in my workflow, but not at 2x the price.

And here's the nail in the coffin: Anthropic announced that all prompts and outputs for Mythos-class models are retained for 30 days for "trust and safety" purposes -- and zero data retention is not available. For a model that costs 2x what Opus does, I'm also supposed to just hand over my prompts and my codebase context for a month? This is just unacceptable.

Turns out, it's all about how strong the harness is around your models these days, more than just the raw model prowess. How well the model works within the workflows you've set is the key indicator of a model's success -- and Fable, for now, doesn't clear that bar.

#ai #anthropic #coding #musings #openai #rant