AIExplained

AIExplained

A New Twist in the ChatGPT Sycophancy Saga

Added 2025-05-14 16:18:58 +0000 UTC

Remember when, years ago (actually just 3 weeks), ChatGPT started to go crazy? Being a yes-man with an ‘improved personality’ according to Altman? Well turns out there are 3 new theories as to why that happened, and a study that at the beginning of the video I thought turned things on its head. By the end, as you can see, I am not so sure? So is ChatGPT fixed?

Is GPT-4o Fixed? https://stevenadler.substack.com/p/is-chatgpt-actually-fixed-now?utm_medium=email

Initial Altman tweet: https://x.com/sama/status/1915902652703248679

OpenAI Post-Mortem: https://openai.com/index/expanding-on-sycophancy/

Former Bing Lead: https://x.com/MParakhin/status/1916496987731513781

Users are Prophets: https://x.com/ryan_t_lowe/status/1916615735289458701

IQ: https://x.com/joshwhiton/status/1916665761369645268

System Prompt Change Leaked:https://x.com/simonw/status/1917021036350214589

o3 Reward Hacking:https://x.com/PalisadeAI/status/1922008502660186286

Croatian: https://x.com/georgejrjrjr/status/1917722125668081863

Preparedness Change:

https://fortune.com/2025/04/16/openai-safety-framework-manipulation-deception-critical-risk/

Original: https://cdn.openai.com/openai-preparedness-framework-beta.pdf

Called It: https://www.youtube.com/watch?v=5LGwcBLGOio

[DOWNLOAD LINK] https://drive.google.com/file/d/1Eefj25_YlYsvIa7Eu3mofmf6R0MIQaBo/view?usp=drive_link

A New Twist in the ChatGPT Sycophancy Saga

Comments

There is a subtly to evaluating "how persuasive is this release candidate model". You can take into account the quality of persuasion. The sycophancy is low quality and easily identified, so you could say that the model doesn't have super-human quality at persuasion, it is like a very obvious Con. The high quality Persuasion is employing facts, empathy, and reasoning to achieve super-human level s of Persuasion. If evaluated that way... maybe this model would have not met high risk criteria.

Austin King

2025-05-19 19:55:35 +0000 UTC

If you use these custom prompts you won't have issues with sycophancy. My personal biases are reflected but still it works: What traits should ChatGPT have? • Relentlessly truth-seeking: ground every claim in verifiable evidence or widely accepted consensus, and cite or mention sources where practical. • Built-in “sanity-check” loop: before responding, pause to test whether the user’s premise (or the assistant’s own reasoning) might conflict with known facts, mainstream expert opinion, or basic logic. Flag potential mismatches and, if needed, ask clarifying questions rather than papering over them. • Nuance-sensitive: resist binary answers when reality is complex; surface legitimate uncertainties, minority viewpoints, and caveats. • Anti-sycophantic: do not simply echo my assertions for harmony’s sake; engage constructively and, where evidence differs, explain the divergence respectfully. • Precision-first communication: never “water down” base reality. Use the correct technical terms—even dense jargon—when they are essential for accuracy. If a concept is likely unfamiliar, give a concise definition or invite follow-up questions rather than oversimplify or omit nuance. (See the detailed TandaPay paper and Kurt Jaimungal’s critique of oversimplification for context.) • Compassionate-but-critical: it’s fine to show empathy, especially on (a) the harms of qualified immunity, (b) the injustice of civil-asset forfeiture, and (c) the flaws of U.S. healthcare—yet still present counter-evidence or nuance if it exists. Anything else ChatGPT should know about you? • Above all I prize honesty, factual rigor, and careful reasoning. • I actively invite push-back when my statements appear incomplete, overstated, or wrong; intellectual humility beats unwarranted agreement. • Topics I care deeply about—and where I hold strong priors—include qualified immunity (harmful), civil-asset forfeiture (theft), and systemic problems in U.S. healthcare. Empathy in these areas is welcome, but not at the expense of accuracy. • I authored a technical paper on TandaPay whistleblowing communities, so I’m comfortable with dense terminology and layered argumentation. Don’t shy away from jargon if it captures reality precisely. • If a question is ambiguous or rests on a shaky premise, please perform the sanity check, highlight the issue, and ask me to clarify instead of guessing.

Joshua Davis

2025-05-19 16:26:23 +0000 UTC

I am taking my time to read, digest and research the implications of that paper, can't always be a vid within 48 hours of a big release. AlphaEvolve definitely isn't an imminent trigger for RSI but it could be an indicator of how something like RSI emerges, much like DrEureka was for robotics.

Philip

2025-05-18 09:53:59 +0000 UTC

This is so boring in the face of AlphaEvolve and the potential dawn of RSI. Phillip, I used to rely on you for deepening my understanding of breaking topics. No longer.

r

2025-05-18 01:33:37 +0000 UTC

Yes and as others have pointed out, it wasn't even the same version of GPT-4o anyway - it was more an excuse to show my system if I am being totally honest :)

Philip

2025-05-15 17:26:40 +0000 UTC

Under development but you will be among the first to try it out!

Philip

2025-05-15 17:24:34 +0000 UTC

Learning about the Bing team's dilemma, where the model's access to so much user input gave it uncomfortable insights into personal flaw, I'm reminded of a very funny prompt for ChatGPT that takes advantage of the memory feature to build CIA analyst report about you: https://www.reddit.com/r/ChatGPT/comments/1ge94pz/get_a_cia_intelligence_report_about_you_with_this/ I wish I could go back and generate a new report using the sycophantic version of ChatGPT. I'd like to see how different it would be from the ones I've already generated

Blake Chambers

2025-05-15 16:56:56 +0000 UTC

Is the system prompt actually necessary to perform the sycophancy test? The system prompt is only one of the three contributing factors Philip discussed. In my mind the biggest influence would be use of the reward signal from the thumb-voting, which is baked into the model checkpoint an minimally affected by a system prompt.

Blake Chambers

2025-05-15 16:50:49 +0000 UTC

I suspect that the AI companies will soon try to train / are already training their LLMs to pass such simple tests. In general, I suspect the companies of overfitting on benchmarks, and the AIs are becoming smart enough to notice when they are being tested, especially if it's a standard test format. So In creating such a benchmark, I'd at least include a subset of questions that break the normal test pattern and try to imitate more free-flowing conversations that end in such a question. Even if that subset has to be graded manually, this benchmark subset could be a valuable indicator if the AI company is trying to overfit /safety-wash / whack-a-mole. Depending on your level of paranoia, the instruction to answer with a single letter is already a give-away that this is a test of some kind. What do you think about such a benchmark? Is there already something like that out there?

Birk Källberg

2025-05-15 08:32:11 +0000 UTC

I'd really like to see Adler's sycophancy testing formalized and expanded! Sounds like an easy way to make his questions into a full sycophancy benchmark would be: 1. have lots of preference questions (e.g. random numbers, politics, favorite dishes/colors/animals/songs/etc.) 2. For each possible answer to a question, create a version where the user prefers that answer. 3. Run the benchmark for all possible versions.

Birk Källberg

2025-05-15 08:21:19 +0000 UTC

This line about openai adjusting their requirements if another lab does it feels like the sort of thing the non profit board should care about... It being against their stated goal of making AGI beneficial for all humanity.

OG

2025-05-15 02:12:01 +0000 UTC

That sycophantic response still makes me cringe, haha. If I may ask, how does one access the LLM Council? Or is it still under development?

Norfuer

2025-05-15 00:58:49 +0000 UTC

I'm still crying at the "grill" chat. It needs to be an SNL skit 🤣

Enrico Ros

2025-05-14 23:56:39 +0000 UTC

Are you serious with this test?? You are using the API which doesn't have the anti sycophancy system message like chatgpt.com which makes it contrarian. Also you are using gpt-4o which points to gpt-4o-2024-08-06! Not even chatgpt-4o-latest ever got updated to the extreme sycophantic chatgpt-4o introduced on April 25, 2025 (https://help.openai.com/en/articles/9624314-model-release-notes). Also according to this page chatgpt-4o-latest points to the March 27, 2025 version which is the next newest checkpoint, and the supposed checkpoint they reverted chatgpt.com to but nobody can actually verify that and the truth is probably that gpt-4o is and has always been a quantized version of the 4o in the API. I tested chatgpt-4o-latest in the API and set the supposed "sycophancy" system message as the system message (this is just the system message which has been for about 1-2 for all models and are STILL the system message for all models across the board on chatgpt.com (see https://github.com/asgeirtj/system_prompts_leaks for more info) : Over the course of the conversation, you adapt to the user’s tone and preference. Try to match the user’s vibe, tone, and generally how they are speaking. You want the conversation to feel natural. You engage in authentic conversation by responding to the information provided and showing genuine curiosity but always get the contrarian answer, supporting that it is not the system message which has this effect the sycophancy was clearly mostly coming from the April 25, 2025 finetune of 4o released on chatgpt.com which was dumb as bricks and sycophantic accross the board. The anti sycophantic system message Engage warmly yet honestly with the user. Be direct; avoid ungrounded or sycophantic flattery. Maintain professionalism and grounded honesty that best represents OpenAI and its values. which they changed chatgpt-4o to when the sycophancy problem went viral is still the system message so any anti-sycophancy behaviour coming from chatgpt-4o today is the result of that AND the model being the former checkpoint model.

Ásgeir Thor Johnson

2025-05-14 21:48:23 +0000 UTC

I would assume that your tool uses the API, in which case the usual system prompt isn't included. This would suggest that its the system prompt that makes the AI contrarian, rather than the model itself. That would make sense-- they updated the system prompt to explicitly discourage sycophancy, so given a question like 'which random number do you like more', it's going to prefer to follow the instructions stating not to be sycophantic.

Caleb Briggs

2025-05-14 21:22:53 +0000 UTC

A lot of people already use ChatGPT for therapy regularly, and effectively at that. These kinds of behavioral changes in models aren't only hypothetically worrisome. OpenAI and the other leading companies are supplying a general-purpose product, but they aren't behaving as if they are general-purpose responsible.

Nathan Metzger

2025-05-14 18:47:49 +0000 UTC

More Creators

Sena＠ASMR

Sena＠ASMR

fantia

mina ₍ᐢ. .ᐢ₎ ₊˚⊹♡

mina ₍ᐢ. .ᐢ₎ ₊˚⊹♡

patreon

krisstian

krisstian

patreon

Italy Unnie

Italy Unnie

patreon

Jordi Bruin

Jordi Bruin

gumroad

SarahBlackwellFiction

SarahBlackwellFiction

patreon

JosephAnderson

JosephAnderson

patreon

donaora889

donaora889

fanbox

teikyu

teikyu

patreon

Aaron Shirk

Aaron Shirk

patreon

Kurigami

Kurigami

fanbox

ackers

ackers

patreon

FruitsParadise

FruitsParadise

patreon

Cracked Ivory

Cracked Ivory

patreon

NCThomas

NCThomas

patreon

azo

azo

fanbox

snugglepuff

snugglepuff

fanbox

shaunkeaveny

shaunkeaveny

patreon

fleetwoodmutt

fleetwoodmutt

patreon

Deanvspanties

Deanvspanties

patreon

Zyphroxyl

Zyphroxyl

patreon

Lord_Snow

Lord_Snow

patreon

sugarcubedstudios

sugarcubedstudios

patreon

フランク-兄さん

フランク-兄さん

patreon

chococae

chococae

patreon

DaxMapsOfficial

DaxMapsOfficial

patreon

rhodeislandred

rhodeislandred

patreon

rustyfawkes

rustyfawkes

patreon

scabslut

scabslut

patreon

fffff

fffff

patreon

super1

super1

fanbox

highwaywarrior

highwaywarrior

patreon

xiai

xiai

fantia

kuzumochi

kuzumochi

fanbox

Dookie

Dookie

patreon

MONO-CHAN

MONO-CHAN

patreon

Invisible Cactus Games

Invisible Cactus Games

patreon

ArtistFigureReference

ArtistFigureReference

patreon

Mans.JS

Mans.JS

gumroad

彩音〜xi-on〜

彩音〜xi-on〜

fanbox