Claude Has a Goblin Problem, Too
OpenAI’s Charm Offensive, Anthropic’s Missteps, and the Narrative War at the AI Frontier.
Courtroom sketch courtesy OpenAI Images 2.0
In the small morning hours of April 10, a gangly twenty-year-old from a suburb just outside Houston tossed a Molotov cocktail at the gate of Sam Altman’s San Francisco home. Luckily, nobody was hurt. The suspect was arrested an hour later outside OpenAI’s headquarters, where he was attempting to break the glass doors with a chair. And while Silicon Valley took the time (valuable) out of its busy life to briefly mourn the security structures outside Altman’s $27 million abode, the attack clearly shook the OpenAI CEO. That evening, Altman posted a photo of his husband and toddler on his blog and, in the same post, wrote the sentence that explains the next three weeks of his life: I underestimated the power of words and narratives.
Three weeks later, we’re here: Altman is sitting beside his lawyers in a federal courthouse in Oakland while Elon Musk, on the witness stand, is asked whether his ambition was once to build “a robot army.” Jurors in voir dire had called Musk “a piece of garbage”; Altman, by comparison, is the reasonable adult founder being attacked by a bitter ex-cofounder. He says little. The frame does the work. Meanwhile, on April 23, the company shipped GPT-5.5, internally codenamed Spud, marketed across every paid tier as the most capable model in the world. At the launch press briefing, a reporter asked OpenAI’s chief research officer whether GPT-5.5 had capabilities comparable to Mythos — the new cybersecurity model that Anthropic, Claude’s maker, had announced two weeks earlier and restricted to twelve named launch partners. He accepted the comparison. The framing landed. Spud is for the people. Mythos is for the boys who go to Davos.
And then, on Tuesday, April 28, an X user named arb8020 posted a screenshot. It was a fragment from OpenAI’s open-source code repository, where the system prompt for Codex — the company’s coding agent — was visible to anyone who scrolled. One line, repeated twice: Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.
The post exploded. Five thousand likes within hours. Three and a half million views. The replies were a call to action. Let my boy talk about creatures. WIRED ran a piece the same day with the headline “OpenAI Really Wants Codex to Shut Up About Goblins.” And for the next forty-eight hours, X was a goblin convention.
The next day, OpenAI published a post titled “Where the goblins came from.” It is, in genre, an interpretability piece. It walks through how a reward signal trained for the company’s “Nerdy” personality started giving extra credit for creature metaphors, which then leaked into other modes through a feedback loop in which the model’s own outputs were folded back into its training data. The team filtered the data, retired the personality, audited an entire bestiary of tic words. (“Most uses of frog turned out to be legitimate” is an actual, real sentence.)
Sam Altman, on X: Feels like codex is having a ChatGPT moment. I meant a goblin moment, sorry. And then: artificial goblin intelligence achieved.
That’s a judo move. It’s called a harai goshi (払腰). The trick to hitting this move, and most moves, is to redirect the opponent’s balance first, so the very thing holding them up becomes the easiest lever to attack.
For five years, the explanatory work that goes into making sense of why large language models do what they do has been branded — fairly — as Anthropic’s wheelhouse. Constitutional AI. Mechanistic interpretability. The whole pitch, the entire reason a hedge fund or a defense department might prefer Claude to ChatGPT, was that the people who built it could explain, in disciplined and slightly intimidating prose, what was actually happening inside the machine. OpenAI was supposed to be the company that ships first and apologizes later. Anthropic was supposed to be the company that thinks carefully.
Narrative Judo: Illustration of Technique. OpenAI, 1961.
What just happened is that OpenAI took a serious, methodologically transparent piece about reward hacking — the kind of post Anthropic has been committing exhaustively to PDFs for five years — and made it into a viral moment about fun little goblins. Not just any viral story: a story that ordinary people, the kind who do not have a Substack about AI alignment, shared. Because the post was funny and a little weird. And better than, “AI is going to take your career and fuck your face.” And it made the company look human, and self-aware, and willing to publish a list of words that includes “raccoons” and “ogres.”
That very same week, OpenAI also: shipped a flagship model marketed for everyone; sat through a trial that made its CEO look like the only adult in a room with Elon Musk; framed its competitor’s most prestigious launch as Davos; and absorbed a literal Molotov attack with a blog post about family photographs.
The thing nobody is saying, because it is tonally awkward to say in week three of a company’s bad month, is that OpenAI is cooking and Anthropic is slightly stalling.
The substance hasn’t moved. Mythos remains a genuinely impressive piece of cybersecurity tooling — Mozilla used the preview to find and patch two hundred and seventy-one vulnerabilities in Firefox. Claude Opus 4.7 reportedly bested GPT-5.5 in seven of seven head-to-head comparisons in a recent Tom’s Guide review. The papers are still good. The work is still serious. The position, however, has shifted, and the shift is in territory that doesn’t show up on benchmark charts.
Anthropic’s brand, since founding, has been “we are the smart people who care deeply.” This works when the implicit comparison is OpenAI/xAI as reckless. It does not work when OpenAI is also doing the careful-adult things, and charmingly so, on the channels where ordinary people are paying attention. Care, as a brand asset, has the architecture of a comparison: its value is relational. Mirror the comparison and the asset flips. What was conscientiousness becomes condescension. What was rigor becomes the kid who reminds the teacher about homework; to frame it in terms of archetypes from the 1997 animated children’s television program Recess, you can go from a TJ to a Randall. Real quick.
You can see this happening in the smaller signals. On the same day Anthropic announced Project Glasswing (its restricted-access coalition for Mythos, with a named launch list that includes AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan, Microsoft, Nvidia, and Palo Alto Networks), a private Discord group focused on tracking unreleased AI models guessed the URL of the model and got in, using a leaked credential from a contractor at an unrelated AI staffing startup. The security press described the access path as “low-sophistication, high-impact.” There is, in this story, an unmistakable folkloric shape: the kids who picked the lock the grown-ups had so carefully put on the cabinet. Anthropic’s response (a contained investigation, a measured statement) was probably narratively mismatched to a story whose moral, in the popular telling, was that the so-called grown-ups were the problem.
Meanwhile, the Pentagon’s AI chief said publicly that reliance on a single model is “never a good thing,” confirmed expanded use of Google’s models, and the Anthropic-DoD adoption story that had defined a stretch of the spring (and Claude’s breakout adoption moment) quietly cooled. And at the same time, Altman started writing tweets that sound the way Anthropic CEO Dario Amodei used to sound. And the OpenAI corporate voice has clearly migrated or evolved. Now it’s methodologically transparent, intellectually curious, slightly self-deprecating, willing to share the investigation tooling. It is, in other words, the voice Anthropic spent years building. OpenAI didn’t outdo Anthropic at it. OpenAI just decided to also do it, on X. Slightly more playful and less intellectually performative-seeming.
The barbarians, nay, the goblins, are at the gates. Not in a literal sense, of course, but this week the goblins are occupying Claude’s position in the race. The same lesson that runs through OpenAI’s post-mortem (a behavior optimized hard enough to be useful starts leaking past its intended scope) is the brand lesson for Anthropic. For years, care was the optimization target. It was rewarded across every public surface. And now the optimization has broken containment into a space the competitor has just calmly started occupying.
There are exits. We are wrong slowly — public, voluntary epistemic humility, executed before someone else leaks the mistake. We do the unsexy work — the language of labor rather than virtue. We tell you when we’re wrong before you find out. These are not slogans; they are behavioral commitments with reputational risk attached, which is what makes them un-mirrorable. They are, all of them, things Anthropic could plausibly do, because they are extensions of what Anthropic actually does. The harder shift is venue. Long PDFs are reference material. Insular, even. The narrative game is being played in tweets, in the sentences read out loud at press briefings, in post-mortems written like prose.
So, what to do about Claude in lieu of a glass bottle IED? A hint: of the sordid goblin-like creatures also banned in Codex’s system prompt, we inexplicably find gremlins, raccoons, trolls, even ogres. When asked about it, one user got a response that they were all “creatures of mischief.” One caveat from OpenAI’s post, however: “Most uses of frog turned out to be legitimate.” Instructive. Most uses of “we care, we are thoughtful, we are curious” are very much still legitimate, too. But, like the goblins in ChatGPT, they are not alone.



