AIExplained



Pod 8: Do we have a straight shot to AGI? 'Don't teach, incentivize' - Let's Think Sip by Sip

Are 'Benchmarks All You Need'? And do we have any conceptual breakthroughs to go before text-based AGI? I bring in the latest OpenAI quotes and reflect deeply on what it all means.

Link for Download: https://drive.google.com/file/d/1QV9h0_-uSADSaH_DGAL8tpHLCbFPdeXE/view?usp=sharing

‘Don’t teach. Incentivize’ Hyung Won Chung: https://www.youtube.com/watch?v=kYWUEV_e2ss

RLHF: https://arxiv.org/pdf/2203.02155

Mensa 98th Percentile: https://x.com/JgaltTweets/status/1836093456831402481

Mark Chen, SVP of Research at OpenAI: https://x.com/markchen90/status/1836068847167914162?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet


Comments

I was curious about this as well - was there an answer in the video? I’m not understanding how they could objectively do this

Oliver

So they have turned the great weakness of LLMs -- their stochastic parroticity -- into a strength. Yes, this is the bitter lesson made manifest. And with it, just as in Go, suddenly an intelligence explosion seems not only possible, but perhaps imminent. I find it a bit ironic that Vernor passed just months ago. You do great work, btw. I've been following since April of 2023, I think. You are about the only source I pay attention to on this topic, and given that this is, as you say, surely the story of the century, that is saying something.

Jason Dowd

It's all limited by time, hardware and power. And the underlying data. If this method had no limit, OpenAI would not be publishing it so openly, nor hinting at how it works.

Philip

Yes would go through again, but at much greater scale (and speed), as the technique is already established (plus hardware improvements etc)

Philip

Yeah that would be the final stage, just pure token gibberish leading to insane results

Philip

You do!

Philip

There is always more work Sebastian, the demand couldn't be higher for what you are doing.

Philip

Great points, and thank you James

Philip

Fantastic podcast. FWIW, the problem domains go way beyond taste. I think any domain like ethics or politics (which involve values as well as facts) will continue to be challenging.

James Maclaurin

You hit the bullseye! Very well summarized and perfectly captured the essence of the paradigm shift.

SteveHaupt

As someone who has just started a PhD about creating "semi functional" medical benchmarks for LLMs (it's taken over a year to get funding and access to patient data unfortunately), with a specific focus on extracting simple ground truths out of patient cases that could be automatically verified, your thoughts on the importance of benchmarks due to this paradigm reflect mine exactly. There's a bit of an ethical bind that this new paradigm puts me in. My benchmarks reflect a great deal of manual work, which I'd rather not do again. So only evaluating local models and trying to ensure that the benchmark does not make it into training data seems like a good idea. However, now it might seem that this kind of benchmark might be valuable in the future not just for evaluation, but for training. However, by using it for training, you contaminate its evaluation potential and you'd have to make a new benchmark for evaluation. Maybe in the same way we have training and evaluation sets for data, we will have to make training and evaluation benchmarks? As if there wasn't enough work to be done already...

Sebastian
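A minimal sketch of the "training and evaluation benchmarks" idea Sebastian raises, assuming each benchmark item has a stable identifier. The function names and the `case-N` identifiers are hypothetical, chosen for illustration only. Hashing the identifier keeps every item in the same split across runs, so a held-out item never drifts into the training pool.

```python
import hashlib


def assign_split(item_id: str, eval_fraction: float = 0.2) -> str:
    """Deterministically assign one benchmark item to 'train' or 'eval'.

    Hypothetical helper: the split depends only on the item's identifier,
    so re-running it never moves an item between splits.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "eval" if bucket < eval_fraction else "train"


def split_benchmark(item_ids, eval_fraction: float = 0.2):
    """Partition item identifiers into disjoint training and evaluation lists."""
    splits = {"train": [], "eval": []}
    for item_id in item_ids:
        splits[assign_split(item_id, eval_fraction)].append(item_id)
    return splits


if __name__ == "__main__":
    # Illustrative identifiers only; real patient cases would carry their own IDs.
    result = split_benchmark([f"case-{i}" for i in range(100)])
    print(len(result["train"]), len(result["eval"]))
```

The evaluation half stays useful only as long as it is kept out of any training run, which is the contamination concern described above.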

It’ll also be interesting to see how it goes for Harmonic (and DeepMind), which are using things like Lean for verifiable proofs.

Shawn Fumo

As Steve said, will need to be trained, but they probably can use the CoT traces to help in doing that training faster. I’m sure there needs to be a balance since GPT5 might respond better to slightly diff CoT than 4, but it’s probably still a good base to start with.

Shawn Fumo

You have echoed my thoughts almost exactly, and if what you said is even remotely possible I need to get going on a benchmark ASAP! Very excited to start seeing models think pixel by pixel, I think that vision is going to be really hard to solve for niche domains but I have faith. Looking forward to the new Simple-Bench leaderboard!

Trenton Dambrowitz

This shift in AI training - focusing on getting things right instead of just pleasing us - fundamentally changes how it solves problems.

Michal Babula

I took Karpathy's "cease to speak English in their chain of thought" to mean it'll be thinking in a higher-dimensional non-human language, not switching between Chinese and French. Like those older Facebook negotiator agents that invented their own language. Another way of thinking about it: an advanced AI would be as limited thinking in English as we would be thinking in chimpanzee grunts or dolphin squeaks and clicks.

Brian Crabtree

I believe the next model of OpenAI has to go through the process again, because the reasoning is not a separate module but gets baked into the model directly. The process to get there has a new separate element.

SteveHaupt

benchmarks are all you need + nuclear reactors and vision. Great insights.

Joshua Davis

lol, I asked while listening, then got to the 3rd breakthrough:))

Robert Gomez-Reino

"relying on objectively correct answers..." I never understand this part. How is o1 rewarded for correct reasoning steps? What is assessing that those steps are correct, and how?

Robert Gomez-Reino

Thank you, Philip. If we assume that o1 was made as a second layer on top of GPT4 (with RL, a verifier, etc.), can we expect that OpenAI can deploy this layer instantly on top of GPT5? Or does GPT5 have to go through its own RL process to become an o2 model?

moein merati

Hmm, that's really interesting. I had the idea in my mind that a verifier would be a different model, since the problem of checking a correct solution has a much smaller surface area than that of generating a correct solution, so you could train a smaller classifier on just that. I guess using the same model itself to self improve through reinforcement learning gives you better performance in the end. Really cool!

Alan Ispani
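The asymmetry Alan describes, that checking a solution has a smaller surface area than generating one, can be illustrated with a toy example. This is only a sketch under stated assumptions: integer factorization stands in for a problem with an objectively correct answer, the function names are hypothetical, and nothing here reflects how o1's verifier or reward actually works.

```python
def verify_factorization(n: int, factors: list[int]) -> bool:
    """Check a proposed non-trivial factorization of n.

    Verification is a handful of multiplications, even when finding
    the factors in the first place would be expensive.
    """
    if len(factors) < 2:
        return False  # a single factor is just n itself, not a real answer
    product = 1
    for factor in factors:
        if factor < 2:
            return False  # reject trivial or invalid factors such as 1 or 0
        product *= factor
    return product == n


def reward(n: int, proposed: list[int]) -> float:
    """Hypothetical binary reward: 1.0 for a verified answer, else 0.0."""
    return 1.0 if verify_factorization(n, proposed) else 0.0


if __name__ == "__main__":
    print(reward(91, [7, 13]))
    print(reward(91, [7, 12]))
```

A signal like this only says whether the final answer checks out; it says nothing about whether the intermediate reasoning steps were sound.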

I am still skeptical. I think that if it were truly learning concepts it would be able to recursively apply the same concept over and over again. It can't multiply 2 ten digit numbers. If it had actually learned how to multiply generally, I would think that would carry forward to numbers of any size. I am biased though, personally I do not want to no longer be the dominant species on the planet.

theheatdeathiscoming

If what you're saying is true, then it's very easy to conceive that an AI with a shocking level of intelligence on benchmarks will be a good enough tool to help with the creation of the next AI paradigm, whatever it might be. Recursive self-improvement may be only a few months away then?

Manuel Bevand

