robertskmiles

Early Access Video: Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think...

Added 2021-05-22 20:01:00 +0000 UTC

The previous video explained why it's *possible* for trained models to end up with the wrong goals, even when we specify the goals perfectly. This video explains why it's *likely*.

https://youtu.be/IeWljQw3UgQ

Also, UPCOMING EVENT TODAY: we'll have another Live Watch Party in one hour (at 10pm UK time, 2pm Pacific, 5pm Eastern). We'll watch the new video at the same time here:

https://sync-tube.de/rooms/s0Plfqy4d

and afterwards chat about it on the Discord here:
https://discord.gg/2YxbNS5KUK

Comments

Oh btw, what I always wanted to say: I am currently training a natural neuronal network and $all.this seems dreadfully close to it. NNN comes to dinner with dirty hands. NNN is instructed to wash hands. NNN leaves room for half a second and claims completion. NNN is informed that this is preposterous and instructed to REALLY wash hands. NNN leaves room, short burst of water is heard, NNN returns with statement of success. NNN is informed that amount of water used is implausible for a cleaning action. NNN leaves room begrudgingly and, after audible cleaning, returns and claims success. Inspection reveals that hands are still dirty because water without soap is not cleaning, but merely wetting. NNN is reminded of initial instructions to clean hands, which include water AND soap. ... etc. Which kinda proves that we live in base reality and not in a simulation, because all code contains bugs. In a sim, at least one of the billion child-type agents would have found said bugs to get around washing hands to satisfy the reward function and get to dinner quicker. Maybe that is how they found out who to bring to the Oracle in The Matrix? Cheers! Chris :)

Chris K

2021-05-25 10:38:04 +0000 UTC

Yay!

Chris K

2021-05-25 08:34:01 +0000 UTC
