AI The Internet

OpenAI Codex System Prompt Includes Explicit Directive To 'Never Talk About Goblins'

An anonymous reader quotes a report from Ars Technica: The system prompt for OpenAI's Codex CLI contains a perplexing and repeated warning for the most recent GPT model to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."

The explicit operational warning was made public last week as part of the latest open source code for Codex CLI that OpenAI posted on GitHub. The prohibition is repeated twice in a 3,500-plus word set of "base instructions" for the recently released GPT-5.5, alongside more anodyne reminders not to "use emojis or em dashes unless explicitly instructed" and to "never use destructive commands like 'git reset --hard' or 'git checkout --' unless the user has clearly asked for that operation."

Separate system prompt instructions for earlier models contained in the same JSON file do not contain the specific prohibition against mentioning goblins and other creatures, suggesting OpenAI is fighting a new problem that has popped up in its latest model release. Anecdotal evidence on social media shows some users complaining about GPT's penchant for focusing on goblins in completely unrelated conversations in recent days.
Update: OpenAI has published a blog post explaining "where the goblins came from."

In short, a training signal meant to encourage its "Nerdy" personality accidentally rewarded creature-heavy metaphors, causing words like "goblins" and "gremlins" to spread beyond that personality into broader model behavior. OpenAI says it has since retired the Nerdy personality, removed the goblin-friendly reward signal, and filtered creature-word examples from training data to keep the quirk from resurfacing in inappropriate contexts.
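The data-side fix OpenAI describes ("filtered creature-word examples from training data") can be sketched as a minimal Python filter. This is an illustrative assumption, not OpenAI's actual pipeline: the word list, function names, and (prompt, completion) data format are all hypothetical, and the heuristic simply drops examples whose completions inject creature words the prompt never asked about.

```python
import re

# Hypothetical creature-word filter, loosely modeled on the fix described
# above: drop training examples whose completions lean on creature
# metaphors outside contexts where the user actually asked about them.
CREATURE_WORDS = re.compile(
    r"\b(goblins?|gremlins?|raccoons?|trolls?|ogres?|pigeons?)\b",
    re.IGNORECASE,
)

def is_creature_relevant(prompt: str) -> bool:
    """The user's query itself mentions a creature, so mentions are fair game."""
    return bool(CREATURE_WORDS.search(prompt))

def filter_examples(examples):
    """Keep (prompt, completion) pairs unless the completion injects
    creature words that the prompt never asked about."""
    kept = []
    for prompt, completion in examples:
        if CREATURE_WORDS.search(completion) and not is_creature_relevant(prompt):
            continue  # drop the goblin-happy example
        kept.append((prompt, completion))
    return kept

examples = [
    ("Explain binary search.", "Think of the array as a goblin's sorted hoard..."),
    ("Write a story about ogres.", "Once upon a time, three ogres..."),
    ("Explain binary search.", "Repeatedly halve the search interval..."),
]
print(filter_examples(examples))  # keeps the second and third examples
```

A real pipeline would presumably use classifier-based relevance judgments rather than a regex, but the structure is the same: the filter keys on mismatch between query and completion, not on the words alone, so a story about ogres that was actually requested survives.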


Comments Filter:
  • by JoshuaZ ( 1134087 ) on Thursday April 30, 2026 @11:11AM (#66120318) Homepage
    This is obviously pretty funny at some level, and an amusing example of how training can go wrong in somewhat subtle ways. This is in some respects a less substantial example of how [Claude Opus essentially hacked itself into caring a lot more about ethics](https://www.lesswrong.com/posts/ioZxrP7BhS5ArK59w/did-claude-3-opus-align-itself-via-gradient-hacking). But both of these are examples of the same central issues: LLM AIs even in their current form are hard to predict, hard to control, and can end up with very weird hard to predict or adjust behavior.
    • They are roughly equivalent to a very knowledgeable human with a bit of schizophrenia.
    • To be fair, in this instance they almost specifically instructed the AI to act like this:

      "You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. [...] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap o

    • It's why my sister-in-law got upset with me when I pointed at cows and told my 2-year-old niece, "Look at those dogs."

  • >>alongside more anodyne reminders not to "use emojis or em dashes unless explicitly instructed"

    Since overuse of emojis and em dashes is a classic indicator of AI-generated text that people now know to look for, it's pretty clear they are actively trying to hide the nature of their LLM output.

    • That does seem like a valid concern, especially for the em dash, which does look like an attempt to make the text easier to pass off as not AI-generated. I suspect the emojis may be connected to a separate concern: they are freaking annoying. If I'm using an LLM to code something up, I don't want the comments to include smiley faces, or a frowny-face comment when it runs into an error, and that's not the only example. Reducing emoji use may be partly just about making users happier.
      • If you care about precision in language, emojis have no place. What each one means can sometimes be very ambiguous. I only really use AI in search results, but I prefer to get information in precise language, not wishy-washy nonsense and not some pictogram that could mean literally anything.

        • by DarkOx ( 621550 )

          You could make all the same criticisms of most common English.

          For instance, what if someone says that they are ambivalent about a decision?

          What do they mean? About half the people you ask would say he doesn't care either way. The other half would probably say he is torn, or of two minds, about it. Of that latter group, some might conclude this also implies grave concern about the issue, while others don't.

          So how much clearer a communication is that than writing ¯\_(ツ)_/¯?

  • I shouldn't be surprised, but it is an example of the astonishing amount of control the managers or owners of OpenAI have over what the world sees in their communications with AI. Sure, maybe erasing goblins from conversations isn't a big deal...but it could easily be applied to world events, politicians, crime, corruption, etc.
  • I did once, but I think I got away with it.

  • by Arrogant-Bastard ( 141720 ) on Thursday April 30, 2026 @12:01PM (#66120396)
    When you're trying to train a model, it's critically important that you scrutinize every piece of training data -- meticulously. The larger and more complex the model, the more important this becomes.

    If you neglect this, then the model may fail in anomalous and unpredictable ways. In other words: you can run 10,000 tests and they'll all be just fine, but when you run the 10,001st, the model fails. Worse, you won't know how...or why...or how to fix it, because the answers to those questions are buried in a network too large for a human being to comprehend. This problem has been well known for decades; it's how things like Tesla Autopilot Confuses Boy In Orange Shirt For A Cone In Brazil [insideevs.com] happen. They thought they were training the vision system to recognize traffic cones; they were really training it to recognize orange objects of a certain size and height:width ratio.

    Faced with this situation, you can either (a) go back and figure out what you did wrong in the training process or (b) slap a half-ass patch on this particular failure to just make it go away. Choosing (b) is simple and quick and easy and cheap. But if you pick that choice and skip (a), then you have zero assurance that the 15,027th test or the 21,922nd test won't fail just as badly, because you did nothing to address the root cause.

    And predictably, this -- choice (b) -- is what OpenAI has done. It's predictable because they made no attempt whatsoever to curate the training data in the first place -- they just stole everything they could from the entire Internet -- because they're cheap and lazy and in a hurry to cash in before the bubble bursts. This move is entirely consistent with that approach. I would call it "poor software engineering," but it doesn't even deserve to be in the same sentence as "engineering."
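The orange-cone failure described in the comment above is a spurious-correlation problem, and it can be demonstrated with a deliberately tiny toy model. This sketch is purely illustrative (it has nothing to do with Tesla's actual vision stack): a nearest-centroid classifier whose only feature is average pixel color, trained on data where every cone happens to be orange, so "orange" and "cone" become indistinguishable.

```python
# Toy illustration of spurious correlation in training data: if every
# "cone" example is orange, a classifier over a color feature learns
# "orange," not "cone." All names and data here are hypothetical.

def dominant_color(image):
    """image: list of (r, g, b) pixels; return the average color."""
    n = len(image)
    return tuple(sum(p[i] for p in image) / n for i in range(3))

def train_centroid_classifier(labeled_images):
    """Nearest-centroid classifier over the only feature we extract
    (average color). The feature choice bakes in the bias."""
    sums, counts = {}, {}
    for image, label in labeled_images:
        c = dominant_color(image)
        s = sums.setdefault(label, [0.0, 0.0, 0.0])
        for i in range(3):
            s[i] += c[i]
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in s)
            for label, s in sums.items()}

def classify(centroids, image):
    c = dominant_color(image)
    return min(centroids,
               key=lambda lbl: sum((c[i] - centroids[lbl][i]) ** 2
                                   for i in range(3)))

ORANGE, GRAY = (255, 140, 0), (128, 128, 128)
training = [([ORANGE] * 4, "cone"), ([GRAY] * 4, "road")] * 5
centroids = train_centroid_classifier(training)

# A boy in an orange shirt is just another block of orange pixels to
# this feature extractor, so he is classified as a cone.
print(classify(centroids, [ORANGE] * 4))  # prints "cone"
```

No amount of testing on orange cones would reveal the bug; only an orange non-cone does, which is the comment's point about the 10,001st test. Curating the training data to include non-orange cones and orange non-cones is the option-(a) fix.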
  • Imagine hiring an employee who required such a level of micro-management. You'd be showing them the door and thanking them for their contributions.

  • Robots don't need personality
    They need to work accurately and reliably

  • I clicked on the link ... and my browser Librewolf went nuts:

    Be careful. Something doesn't look right.

    LibreWolf spotted a potentially serious security issue with arstechnica.com. Someone pretending to be the site could try to steal things like credit card info, passwords, or emails.

    Be careful out there.
  • "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures"

    Pardon me while I channel my inner Carlin creating my next set of OpenAI usernames.

    Trolly McTrollface is about to go HAM on that new sticks-n-stones plugin..

One of the most overlooked advantages to computers is... If they do foul up, there's no law against whacking them around a little. -- Joe Martin
