XaiJu
robertskmiles

patreon


Early Access Video: Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think...

The previous video explained why it's *possible* for trained models to end up with the wrong goals, even when we specify the goals perfectly. This video explains why it's *likely*.

https://youtu.be/IeWljQw3UgQ

Also, UPCOMING EVENT TODAY: we'll have another Live Watch Party in one hour (at 10pm UK time, 2pm Pacific, 5pm Eastern). We'll watch the new video at the same time here:

https://sync-tube.de/rooms/s0Plfqy4d 

and afterwards chat about it on the Discord here:
https://discord.gg/2YxbNS5KUK


Comments

Ohbtw: what I always wanted to say: I am currently training a natural neuronal network and $all.this seems dreadfully close to it.

NNN comes to dinner with dirty hands. Instruct to wash hands. NNN leaves room for half a second and claims completion. NNN is informed that this is preposterous and instructed to REALLY wash hands. NNN leaves room, short burst of water is heard, NNN returns with statement of success. NNN is informed that amount of water used is implausible with cleaning action. NNN leaves room begrudgingly and after audible cleaning returns, claims success. Inspection reveals that hands are still dirty because water without soap is not cleaning, but merely wetting. NNN is reminded of initial instructions to clean hands, which includes water AND soap. .... etc.

Which kinda proves that we live in base-reality and not in a simulation because all code contains bugs. In a sim, at least one of the billion child-type-agents would have found said bugs to get around washing hands to satisfy the reward-function and get to dinner quicker. Maybe that is how they found out who to bring to the Oracle in The Matrix? Cheers! Chris :)

Chris K

Yay!

Chris K

