OpenAI Codex System Prompt Includes Explicit Directive To 'Never Talk About Goblins'
An anonymous reader quotes a report from Ars Technica: The system prompt for OpenAI's Codex CLI contains a perplexing and repeated warning for the most recent GPT model to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."
The explicit operational warning was made public last week as part of the latest open source code for Codex CLI that OpenAI posted on GitHub. The prohibition is repeated twice in a 3,500-plus word set of "base instructions" for the recently released GPT-5.5, alongside more anodyne reminders not to "use emojis or em dashes unless explicitly instructed" and to "never use destructive commands like 'git reset --hard' or 'git checkout --' unless the user has clearly asked for that operation."
Separate system prompt instructions for earlier models contained in the same JSON file do not contain the specific prohibition against mentioning goblins and other creatures, suggesting OpenAI is fighting a new problem that has popped up in its latest model release. Anecdotal evidence on social media shows some users complaining about GPT's penchant for focusing on goblins in completely unrelated conversations in recent days. Update: OpenAI has published a blog post explaining "where the goblins came from."
In short, a training signal meant to encourage its "Nerdy" personality accidentally rewarded creature-heavy metaphors, causing words like "goblins" and "gremlins" to spread beyond that personality into broader model behavior. OpenAI says it has since retired the Nerdy personality, removed the goblin-friendly reward signal, and filtered creature-word examples from training data to keep the quirk from resurfacing in inappropriate contexts.
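Per-model prompt sets like this are easy to inspect programmatically. Here is a minimal Python sketch; the file name prompts.json and the base_instructions key are illustrative assumptions, since the actual layout of the Codex CLI file may differ:

    import json

    # Hypothetical layout: {"gpt-5.5": {"base_instructions": "..."}, ...}
    # The real Codex CLI prompt file may be keyed differently.
    with open("prompts.json") as f:
        prompts = json.load(f)

    PROHIBITION = "never talk about goblins"

    for model, cfg in prompts.items():
        text = cfg.get("base_instructions", "").lower()
        print(f"{model}: prohibition appears {text.count(PROHIBITION)} time(s)")

Against a file matching the article's description, a scan like this would report the clause twice for GPT-5.5 and zero times for the earlier models.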
Funny but serious (Score:3)
Re: (Score:2)
Re: (Score:3)
To be fair, in this instance they almost specifically instructed the AI to act like this:
"You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. [...] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap o
Re: (Score:2)
It's why my sister-in-law got upset with me when I pointed to cows and told my 2-year-old niece, "Look at those dogs."
I'm more concerned about this (Score:2)
>>alongside more anodyne reminders not to "use emojis or em dashes unless explicitly instructed"
Since overuse of emojis and em dashes is a classic indicator of AI-generated text that people now know to look for, it's pretty clear they are actively trying to hide the nature of their LLM output.
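The tells this commenter mentions are trivial to count mechanically, which is part of why people rely on them. A minimal sketch, purely illustrative; the character ranges are rough and this is nothing like a reliable AI-text detector:

    import re

    # Naive heuristic only: count em dashes and codepoints in common emoji ranges.
    # This illustrates the "tell," not a real detection tool.
    EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")

    def llm_tells(text: str) -> dict:
        return {
            "em_dashes": text.count("\u2014"),
            "emoji": len(EMOJI.findall(text)),
        }

    print(llm_tells("Great question \u2014 let's dive in! \U0001F680"))
    # {'em_dashes': 1, 'emoji': 1}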
Re: (Score:2)
Re: (Score:2)
If you care about precision in language, emojis have no place. What each one means can sometimes be very ambiguous. I only really use AI in search results, but I prefer to get information in precise language, not wishy-washy nonsense or a pictogram that could mean literally anything.
Re: (Score:2)
You could make all the same criticisms of most common English.
For instance, if someone says they are ambivalent about a decision, what do they mean?
About half the people you ask would say he doesn't care either way. The other half would probably say he is torn, or of two minds about it. Of that latter group, some might conclude this also implies grave concern about the issue, while others wouldn't.
So how much clearer a communication is that than writing ¯\_(ツ)_/¯?
Example of control (Score:2)
Also - don't mention the war! (Score:2)
I did once, but I think I got away with it.
Re: (Score:2)
We've always been at war with Eurasia.
This is what uncurated training causes (Score:4, Insightful)
If you neglect to curate your training data, the model may fail in anomalous and unpredictable ways. In other words: you can run 10,000 tests and they'll all be just fine, but when you run the 10,001st, the model fails. Worse, you won't know how, or why, or how to fix it, because the answers to those questions are buried in a network too large for a human being to comprehend. This problem has been well known for decades; it's how things like this happen: Tesla Autopilot Confuses Boy In Orange Shirt For A Cone In Brazil [insideevs.com]. They thought they were training the vision system to recognize traffic cones; they were really training it to recognize orange objects with a certain size and height:width ratio.
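The spurious-correlation failure described above is easy to reproduce in miniature. The following toy sketch uses synthetic data (not the Tesla system): when every cone in the training set happens to be orange, a linear classifier learns "orange," not "cone":

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000

    # Ground truth: is the object a traffic cone?
    y = rng.integers(0, 2, n)

    # "Shape" feature: the real signal, but noisy (wrong 25% of the time).
    shape = np.where(rng.random(n) < 0.75, y, 1 - y)

    # "Orange" feature: spuriously perfect in training, because every cone
    # photographed happened to be orange and nothing else was.
    orange = y.copy()

    X = np.column_stack([shape, orange]).astype(float)

    # Plain logistic regression via gradient descent (no external deps).
    w, b = np.zeros(2), 0.0
    for _ in range(500):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= X.T @ (p - y) / n
        b -= (p - y).mean()

    print("learned weights [shape, orange]:", w.round(2))

    # An orange, non-cone-shaped object (a boy in an orange shirt) is
    # confidently scored as a cone, because the model keyed on color.
    boy = np.array([0.0, 1.0])
    print("P(cone | orange, not cone-shaped):",
          round(float(1 / (1 + np.exp(-(boy @ w + b)))), 3))

The weight on the spurious "orange" feature dominates, so the combination the training set never contained (orange but not cone-shaped) fails exactly like the 10,001st test described above.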
Faced with this situation, you can either (a) go back and figure out what you did wrong in the training process, or (b) slap a half-assed patch on this particular failure to make it go away. Choosing (b) is quick, easy, and cheap. But if you pick it and skip (a), you have zero assurance that the 15,027th test or the 21,922nd won't fail just as badly, because you did nothing to address the root cause.
And predictably, this -- choice (b) -- is what OpenAI has done. It's predictable because they made no attempt whatsoever to curate the training data in the first place -- they just stole everything they could from the entire Internet -- because they're cheap and lazy and in a hurry to cash in before the bubble bursts. This move is entirely consistent with that approach. I would call it "poor software engineering," but it doesn't even deserve to be in the same sentence as "engineering."
Imagine hiring an employee (Score:2)
Imagine hiring an employee who required this level of micromanagement. You'd be showing them the door and thanking them for their contributions.
encourage its "Nerdy" personality (Score:2)
Robots don't need personality
They need to work accurately and reliably
stop (Score:2)
Be careful. Something doesn't look right.
LibreWolf spotted a potentially serious security issue with arstechnica.com. Someone pretending to be the site could try to steal things like credit card info, passwords, or emails.
Be careful out there.
Seven Flirty Words. (Score:2)
"never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures"
Pardon me while I channel my inner Carlin creating my next set of OpenAI usernames.
Trolly McTrollface is about to go HAM on that new sticks-n-stones plugin..