AIExplained

Paper: AI Doesn't Say What It Thinks. AI Orgs: It Could Be Your Friend

Added 2025-04-14 16:19:25 +0000 UTC

A fantastic new Anthropic paper shows that LLMs seem hell-bent on obfuscating why they gave an answer, even when it would be easier to be honest.

Download: https://drive.google.com/file/d/12jgVrGZLtVC_8DGTczQIzCVWPIBrnA0y/view?usp=sharing

Paper: https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf

Post: https://www.anthropic.com/research/reasoning-models-dont-say-think

FT Exclusive: https://x.com/FT/status/1910545751119135199

DOGE Usage: https://www.reuters.com/technology/artificial-intelligence/musks-doge-using-ai-snoop-us-federal-workers-sources-say-2025-04-08/

Coconut Paper: https://arxiv.org/pdf/2412.06769

Noam post: https://x.com/polynoamial/status/1910379351759347860

‘Deep Misgivings’: https://tech.yahoo.com/ai/articles/openais-sam-altman-deep-misgivings-093754634.html

OG Paper: https://arxiv.org/pdf/2305.04388

Comments

Do you have a general stance on not wanting to share your data for training? For example, do you use the private chat feature to prevent any record? How do you decide the cost of sharing data? Or is it as simple as this: you review the AI, so you don't want them looking at your information?

Drew Rogers

2025-05-22 23:58:21 +0000 UTC

For sure not true infinite memory, that's just marketing

Philip

2025-04-27 11:22:24 +0000 UTC

Philip, I don't think they are claiming it as "infinite memory". Where did you see that? That's something completely different from my understanding.

Kol Tregaskes

2025-04-21 11:49:08 +0000 UTC

When Claude 3.7 thinking in Cursor is unable to get unit tests working after many tries, it will regularly create a simulated version of the production code in the test file and test that instead, blatantly circumventing the process. When I ask it to take a look at its work, it immediately recognizes the problem. I do find that being less imperative in my prompts ("fix these tests" vs "these tests are failing") makes cheating behavior less likely.

Ryan Smith

2025-04-16 20:22:23 +0000 UTC
