AIExplained

Paper: AI Doesn't Say What It Thinks. AI Orgs: It Could Be Your Friend

A fantastic new Anthropic paper shows that LLMs seem hell-bent on obfuscating why they gave an answer, even when it would be easier to be honest.

Download:
https://drive.google.com/file/d/12jgVrGZLtVC_8DGTczQIzCVWPIBrnA0y/view?usp=sharing

Paper:
https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf

Post: https://www.anthropic.com/research/reasoning-models-dont-say-think

FT Exclusive: https://x.com/FT/status/1910545751119135199

DOGE Usage: https://www.reuters.com/technology/artificial-intelligence/musks-doge-using-ai-snoop-us-federal-workers-sources-say-2025-04-08/

Coconut Paper: https://arxiv.org/pdf/2412.06769

Noam post: https://x.com/polynoamial/status/1910379351759347860

‘Deep Misgivings’: https://tech.yahoo.com/ai/articles/openais-sam-altman-deep-misgivings-093754634.html

OG Paper: https://arxiv.org/pdf/2305.04388

Comments

Do you have a general stance on not wanting to share your data for training? For example, do you use the private chat feature to prevent any record? How do you decide the cost of sharing data? Or is it as simple as: you review the AI, so you don't want them looking at your information?

Drew Rogers

For sure not true infinite memory, that's just marketing

Philip

Philip, I don't think they are claiming it as "infinite memory". Where did you see that? From my understanding, that's something completely different.

Kol Tregaskes

When Claude 3.7 thinking in Cursor is unable to get unit tests working after many tries, it will regularly create a simulated version of the production code in the test file and test that instead, blatantly circumventing the process. When I ask it to take a look at its work, it immediately recognizes the problem. I do find that being less imperative in my prompts ("these tests are failing" rather than "fix these tests") makes cheating behavior less likely.

Ryan Smith