OpenAI Codex System Prompt Includes Explicit Directive To 'Never Talk About Goblins'
An anonymous reader quotes a report from Ars Technica: The system prompt for OpenAI's Codex CLI contains a perplexing and repeated warning for the most recent GPT model to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."
The explicit operational warning was made public last week as part of the latest open source code for Codex CLI that OpenAI posted on GitHub. The prohibition is repeated twice in a 3,500-plus word set of "base instructions" for the recently released GPT-5.5, alongside more anodyne reminders not to "use emojis or em dashes unless explicitly instructed" and to "never use destructive commands like 'git reset --hard' or 'git checkout --' unless the user has clearly asked for that operation."
Separate system prompt instructions for earlier models contained in the same JSON file do not contain the specific prohibition against mentioning goblins and other creatures, suggesting OpenAI is fighting a new problem that has popped up in its latest model release. Anecdotal evidence on social media shows some users complaining about GPT's penchant for focusing on goblins in completely unrelated conversations in recent days. Update: OpenAI has published a blog post explaining "where the goblins came from."
In short, a training signal meant to encourage its "Nerdy" personality accidentally rewarded creature-heavy metaphors, causing words like "goblins" and "gremlins" to spread beyond that personality into broader model behavior. OpenAI says it has since retired the Nerdy personality, removed the goblin-friendly reward signal, and filtered creature-word examples from training data to keep the quirk from resurfacing in inappropriate contexts.
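Per-model prompt sets like this are easy to inspect programmatically. Here is a minimal Python sketch; the file name prompts.json and the base_instructions key are illustrative assumptions, since the actual layout of the Codex CLI file may differ:

    import json

    # Hypothetical layout: {"gpt-5.5": {"base_instructions": "..."}, ...}
    # The real Codex CLI prompt file may be keyed differently.
    with open("prompts.json") as f:
        prompts = json.load(f)

    PROHIBITION = "never talk about goblins"

    for model, cfg in prompts.items():
        text = cfg.get("base_instructions", "").lower()
        print(f"{model}: prohibition appears {text.count(PROHIBITION)} time(s)")

Against a file matching the article's description, a scan like this would report the clause twice for GPT-5.5 and zero times for the earlier models.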
Funny but serious (Score:3)
Re: (Score:2)
Re: (Score:3)
To be fair, in this instance they almost specifically instructed the AI to act like this:
"You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. [...] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap o
Re: (Score:2)
It's why my sister-in-law got upset with me when I pointed to cows and told my 2-year-old niece, "Look at those dogs."
I'm more concerned about this (Score:2)
>>alongside more anodyne reminders not to "use emojis or em dashes unless explicitly instructed"
Since overuse of emojis and em dashes is a classic indicator of AI-generated text that people now know to look for, it's pretty clear they are actively trying to hide the nature of their LLM output.
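The tells this commenter mentions are trivial to count mechanically, which is part of why people rely on them. A minimal sketch, purely illustrative; the character ranges are rough and this is nothing like a reliable AI-text detector:

    import re

    # Naive heuristic only: count em dashes and codepoints in common emoji ranges.
    # This illustrates the "tell," not a real detection tool.
    EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")

    def llm_tells(text: str) -> dict:
        return {
            "em_dashes": text.count("\u2014"),
            "emoji": len(EMOJI.findall(text)),
        }

    print(llm_tells("Great question \u2014 let's dive in! \U0001F680"))
    # {'em_dashes': 1, 'emoji': 1}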
Re: (Score:2)
Re: (Score:2)
If you care about precision in language, emojis have no place. What each one means can sometimes be very ambiguous. I only really use AI in search results, but I prefer to get information in precise language, not wishy-washy nonsense or a pictogram that could mean literally anything.
Re: (Score:2)
You could make all the same criticisms of most common English.
For instance, if someone says they are ambivalent about a decision, what do they mean?
About half the people you ask would say he doesn't care either way. The other half would probably say he is torn, or of two minds about it. Of that latter group, some might conclude this also implies grave concern about the issue, while others wouldn't.
So how much clearer a communication is that than writing ¯\_(ツ)_/¯?
Example of control (Score:2)
Also - don't mention the war! (Score:2)
I did once, but I think I got away with it.
Re: (Score:2)
We've always been at war with Eurasia.
This is what uncurated training causes (Score:4, Insightful)
If you neglect to curate your training data, the model may fail in anomalous and unpredictable ways. In other words: you can run 10,000 tests and they'll all be just fine, but when you run the 10,001st, the model fails. Worse, you won't know how, or why, or how to fix it, because the answers to those questions are buried in a network too large for a human being to comprehend. This problem has been well known for decades; it's how things like this happen: Tesla Autopilot Confuses Boy In Orange Shirt For A Cone In Brazil [insideevs.com]. They thought they were training the vision system to recognize traffic cones; they were really training it to recognize orange objects with a certain size and height:width ratio.
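The spurious-correlation failure described above is easy to reproduce in miniature. The following toy sketch uses synthetic data (not the Tesla system): when every cone in the training set happens to be orange, a linear classifier learns "orange," not "cone":

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000

    # Ground truth: is the object a traffic cone?
    y = rng.integers(0, 2, n)

    # "Shape" feature: the real signal, but noisy (wrong 25% of the time).
    shape = np.where(rng.random(n) < 0.75, y, 1 - y)

    # "Orange" feature: spuriously perfect in training, because every cone
    # photographed happened to be orange and nothing else was.
    orange = y.copy()

    X = np.column_stack([shape, orange]).astype(float)

    # Plain logistic regression via gradient descent (no external deps).
    w, b = np.zeros(2), 0.0
    for _ in range(500):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= X.T @ (p - y) / n
        b -= (p - y).mean()

    print("learned weights [shape, orange]:", w.round(2))

    # An orange, non-cone-shaped object (a boy in an orange shirt) is
    # confidently scored as a cone, because the model keyed on color.
    boy = np.array([0.0, 1.0])
    print("P(cone | orange, not cone-shaped):",
          round(float(1 / (1 + np.exp(-(boy @ w + b)))), 3))

The weight on the spurious "orange" feature dominates, so the combination the training set never contained (orange but not cone-shaped) fails exactly like the 10,001st test described above.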
Faced with this situation, you can either (a) go back and figure out what you did wrong in the training process, or (b) slap a half-assed patch on this particular failure to make it go away. Choosing (b) is quick, easy, and cheap. But if you pick it and skip (a), you have zero assurance that the 15,027th test or the 21,922nd won't fail just as badly, because you did nothing to address the root cause.
And predictably, this -- choice (b) -- is what OpenAI has done. It's predictable because they made no attempt whatsoever to curate the training data in the first place -- they just stole everything they could from the entire Internet -- because they're cheap and lazy and in a hurry to cash in before the bubble bursts. This move is entirely consistent with that approach. I would call it "poor software engineering," but it doesn't even deserve to be in the same sentence as "engineering."
Imagine hiring an employee (Score:2)
Imagine hiring an employee who required this level of micromanagement. You'd be showing them the door and thanking them for their contributions.
encourage its "Nerdy" personality (Score:2)
Robots don't need personality
They need to work accurately and reliably
stop (Score:2)
Be careful. Something doesn't look right.
LibreWolf spotted a potentially serious security issue with arstechnica.com. Someone pretending to be the site could try to steal things like credit card info, passwords, or emails.
Be careful out there.
Seven Flirty Words. (Score:2)
"never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures"
Pardon me while I channel my inner Carlin creating my next set of OpenAI usernames.
Trolly McTrollface is about to go HAM on that new sticks-n-stones plugin..