AIExplained

Last 24 Hours: Signs of Introspection in LLMs

Added 2025-10-30 18:40:22 +0000 UTC

Before this paper from Anthropic, out on the 29th, I was a lot more skeptical about LLMs self-reporting their internal state. This is not proof that they can, but it is partial evidence of circuits showing true introspective capability and, more than that, of the ability to map a question about that internal state onto those circuits.

Plus a big update to lmcouncil.ai.

https://www.anthropic.com/research/introspection

Full Paper: https://transformer-circuits.pub/2025/introspection/index.html#mechanisms

Earlier Work: https://www.anthropic.com/research/mapping-mind-language-model

https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

Release Post: https://x.com/AnthropicAI/status/1983584136972677319

Comments

That's so great to hear! Can you just confirm whether the scrolling hassle comes from using it on mobile, or on what type of screen? If mobile, after how many models does this expansion become a chore: the normal 3-4, or more? I can try to implement a solution in the next 24 hours.

Philip

2025-11-09 10:34:22 +0000 UTC

Really loving lmcouncil.ai! Feature request: when I have a large number of council members, scrolling left and right to review responses is quite cumbersome. Is it possible to add some scroll arrows (perhaps something like what you see in the headline image from this blog: https://www.experienceux.co.uk/ux-blog/a-ux-perspective-on-horizontal-scrolling/ )?

Bret Brizzee

2025-11-08 19:57:58 +0000 UTC

Done! Just for you!

Philip

2025-11-05 12:32:14 +0000 UTC

Thanks Joshua! And for being a Max sub...

Philip

2025-11-05 12:32:09 +0000 UTC

Damn, not too much I can do about that!

Philip

2025-11-05 12:31:57 +0000 UTC

In LM Council, please add an option to "delete all" past conversations. Thank you!

Riley Thomson

2025-11-05 01:02:33 +0000 UTC

Great content, Philip!

Joshua Davis

2025-11-03 04:47:12 +0000 UTC

The vocal fry in your voice is a bit disturbing.... 😅

Kishore Kumar

2025-11-02 21:35:21 +0000 UTC

This makes me wonder whether human introspection is a result of the physical capabilities of the brain or whether it comes about through lived human experience. If it is a result of the physical brain, that would suggest an artificial intelligence could stumble across introspection.

Barnaby Golden

2025-11-02 12:57:59 +0000 UTC

I'm wondering whether this is just a case of writing "bread" or "treasure" or "dust" with a vector instead of with words, after which it is the same normal model inference as always. My hypothesis is that, for example, because the model was never trained on "Quick fox jumped over a lazy dog" with "bread"-skewed activations, the context-filling circuits would kick in. After all, their job is to extract information that is not written directly. Combine that with a suggestive question, et voilà. So to me the testing was clever and shows an interesting kind of inference, but it was not rigorous enough to support the conclusion they tentatively drew. Oh, and congrats on the lmcouncil expansion and development! Will most probably give it a try!

Paweł Pieniacki

2025-10-31 10:08:52 +0000 UTC

Again, very fascinating research by Anthropic, although I am not sure what to make of it. When they inject activity patterns for specific concepts, we would expect (as seen in their prior work on monosemanticity) the outputs to be steered towards that concept. I feel like the combination of the given prompt, where the researchers reveal that the model is being tested, and the actual activation injection produces something we would expect even if a model had no capability to introspect. It would be more impressive if the model detected these injections without getting any hints. Also, it's not clear how introspection could work in a feed-forward network. One could imagine that parts of the state represented by earlier layers are analyzed by later layers, but this would be a bare-bones version of introspection. Anyway, very interesting research, even though I don't find it super convincing for now.

Phillip Yao-Lakaschus

2025-10-30 21:17:41 +0000 UTC

Thanks Philip, I was hoping you'd cover the details of that Anthropic research after running out of time to read more than the summary myself! I hope you can keep doing these videos and don't get too distracted by building. It's very addictive!

Erik

2025-10-30 20:56:08 +0000 UTC
