Pod 8: Do we have a straight shot to AGI? 'Don't teach, incentivize' - Let's Think Sip by Sip
Added 2024-09-30 14:58:11 +0000 UTC
Are 'Benchmarks All You Need'? And do we have any conceptual breakthroughs to go before text-based AGI? I bring in the latest OpenAI quotes and reflect deeply on what it all means.
I was curious about this as well - was there an answer in the video? I’m not understanding how they could objectively do this
Oliver
2024-10-20 22:36:01 +0000 UTC
So they have turned the great weakness of LLMs -- their stochastic parroticity -- into a strength. Yes, this is the bitter lesson made manifest. And with it, just as in Go, suddenly an intelligence explosion seems not only possible, but perhaps imminent. I find it a bit ironic that Vernor passed just months ago.
You do great work, btw. I've been following since April of 2023, I think. You are about the only source I pay attention to on this topic, and given that this is, as you say, surely the story of the century, that is saying something.
Jason Dowd
2024-10-04 21:19:17 +0000 UTC
It's all limited by time, hardware and power. And the underlying data. If this method had no limit, OpenAI would not be publishing it so openly, nor hinting at how it works.
Philip
2024-10-04 12:33:23 +0000 UTC
Yes would go through again, but at much greater scale (and speed), as the technique is already established (plus hardware improvements etc)
Philip
2024-10-04 12:32:21 +0000 UTC
Yeah that would be the final stage, just pure token gibberish leading to insane results
Philip
2024-10-04 12:31:33 +0000 UTC
You do!
Philip
2024-10-04 12:31:02 +0000 UTC
There is always more work Sebastian, the demand couldn't be higher for what you are doing.
Philip
2024-10-04 12:30:53 +0000 UTC
Great points, and thank you James
Philip
2024-10-04 12:30:25 +0000 UTC
Fantastic podcast. FWIW, the problem domains go way beyond taste. I think any domain like ethics or politics (which involve values as well as facts) will continue to be challenging.
James Maclaurin
2024-10-03 22:41:13 +0000 UTC
You hit the bullseye! Very well summarized and perfectly captured the essence of the paradigm shift.
SteveHaupt
2024-10-02 17:15:49 +0000 UTC
As someone who has just started a PhD about creating "semi functional" medical benchmarks for LLMs (it's taken over a year to get funding and access to patient data, unfortunately), with a specific focus on extracting simple ground truths out of patient cases that could be automatically verified, your thoughts on the importance of benchmarks due to this paradigm reflect mine exactly.
There's a bit of an ethical bind that this new paradigm puts me in. My benchmarks reflect a great deal of manual work, which I'd rather not do again. So only evaluating local models and trying to ensure that the benchmark does not make it into training data seems like a good idea. However, now it might seem that this kind of benchmark might be valuable in the future not just for evaluation, but for training. However, by using it for training, you contaminate its evaluation potential and you'd have to make a new benchmark for evaluation.
Maybe in the same way we have training and evaluation sets for data, we will have to make training and evaluation benchmarks? As if there wasn't enough work to be done already...
Sebastian
2024-10-01 17:42:05 +0000 UTC
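One way to picture the "training and evaluation benchmarks" idea from the comment above is a fixed, reproducible split of the benchmark items, so that the evaluation half is never handed to a training run. The following is only a minimal sketch under stated assumptions: the `case-0001` style IDs, the function names, and the 50/50 ratio are all hypothetical and not taken from the comment.

```python
import hashlib


def assign_split(item_id: str, eval_fraction: float = 0.5) -> str:
    """Deterministically assign a benchmark item to 'train' or 'eval'.

    Hashing the item ID (an assumed stable identifier) means the same item
    always lands in the same split, even if new items are added later.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return "eval" if bucket < eval_fraction * 2**64 else "train"


def split_benchmark(item_ids, eval_fraction: float = 0.5):
    """Partition item IDs into a training pool and a held-out evaluation pool."""
    splits = {"train": [], "eval": []}
    for item_id in item_ids:
        splits[assign_split(item_id, eval_fraction)].append(item_id)
    return splits


# Hypothetical benchmark of 1000 patient-case items.
case_ids = [f"case-{n:04d}" for n in range(1000)]
splits = split_benchmark(case_ids)
print(len(splits["train"]), len(splits["eval"]))
```

Because the assignment depends only on the item ID, the evaluation pool stays uncontaminated as long as only the `train` pool is ever used for training.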
It’ll also be interesting to see how it goes for Harmonic (and DeepMind), which are using things like Lean for verifiable proofs.
Shawn Fumo
2024-10-01 11:58:15 +0000 UTC
As Steve said, will need to be trained, but they probably can use the CoT traces to help in doing that training faster. I’m sure there needs to be a balance since GPT5 might respond better to slightly diff CoT than 4, but it’s probably still a good base to start with.
Shawn Fumo
2024-10-01 11:55:46 +0000 UTC
You have echoed my thoughts almost exactly, and if what you said is even remotely possible I need to get going on a benchmark ASAP!
Very excited to start seeing models think pixel by pixel, I think that vision is going to be really hard to solve for niche domains but I have faith.
Looking forward to the new Simple-Bench leaderboard!
Trenton Dambrowitz
2024-10-01 08:29:03 +0000 UTC
This shift in AI training - focusing on getting things right instead of just pleasing us - fundamentally changes how it solves problems.
Michal Babula
2024-10-01 07:33:19 +0000 UTC
I took Karpathy's "cease to speak English in their chain of thought" to mean it'll be thinking in a higher-dimensional non-human language, not switching between Chinese and French. Like those older Facebook negotiator agents that invented their own language. Another way of thinking about it: an advanced AI would be as limited thinking in English as we would be thinking in chimpanzee grunts or dolphin squeaks and clicks.
Brian Crabtree
2024-10-01 07:23:07 +0000 UTC
I believe the next model of OpenAI has to go through the process again, because the reasoning is not a separate module but gets baked into the model directly. The process to get there has a new separate element.
SteveHaupt
2024-10-01 07:07:16 +0000 UTC
benchmarks are all you need + nuclear reactors and vision. Great insights.
Joshua Davis
2024-09-30 21:26:48 +0000 UTC
lol, I asked while listening, then got to the 3rd breakthrough:))
Robert Gomez-Reino
2024-09-30 20:37:56 +0000 UTC
"relying on objectively correct answers..." I never understand this part. How is o1 rewarded for correct reasoning steps? How is it assessed that those steps are correct, and by what?
Robert Gomez-Reino
2024-09-30 20:35:28 +0000 UTC
Thank you, Philip. If we are to assume that o1 was made as a second layer on top of GPT4 (with RL, a verifier, etc.), can we expect that OpenAI can deploy this layer instantly on top of GPT5? Or does GPT5 have to go through its own RL process to become an o2 model?
moein merati
2024-09-30 19:27:16 +0000 UTC
Hmm, that's really interesting. I had the idea in my mind that a verifier would be a different model, since the problem of checking a correct solution has a much smaller surface area than that of generating a correct solution, so you could train a smaller classifier on just that. I guess using the same model itself to self improve through reinforcement learning gives you better performance in the end. Really cool!
Alan Ispani
2024-09-30 18:21:39 +0000 UTC
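The point in the comment above, that checking a solution has a much smaller surface area than generating one, can be illustrated with a toy problem. This is a sketch of the general asymmetry only, using integer factorization as a stand-in task; it is an assumption chosen for illustration and says nothing about how any real verifier model is built.

```python
def verify_factorization(n: int, factors: list[int]) -> bool:
    """Cheap check: do at least two factors, each greater than 1, multiply to n?"""
    if len(factors) < 2 or any(f <= 1 for f in factors):
        return False
    product = 1
    for f in factors:
        product *= f
    return product == n


def find_factorization(n: int):
    """Expensive search: trial division up to the square root of n."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return [d, n // d]
        d += 1
    return None  # no nontrivial factorization found (n is prime or too small)


def reward(n: int, proposed: list[int]) -> float:
    """Binary reward signal derived from the verifier (hypothetical helper)."""
    return 1.0 if verify_factorization(n, proposed) else 0.0


print(find_factorization(91), reward(91, [7, 13]), reward(91, [7, 12]))
```

The verifier needs a handful of multiplications, while the generator has to search. That gap is why a checkable answer can serve as a reward signal for a much harder generation problem.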
I am still skeptical. I think that if it were truly learning concepts, it would be able to recursively apply the same concept over and over again. It can't multiply two ten-digit numbers. If it had actually learned how to multiply generally, I would think that would carry forward to numbers of any size. I am biased though; personally, I do not want to no longer be the dominant species on the planet.
theheatdeathiscoming
2024-09-30 16:27:10 +0000 UTC
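To make concrete what "learning to multiply generally" would mean in the comment above: the schoolbook procedure applies the same single-digit step over and over, so it works unchanged for numbers of any length. A minimal sketch follows; the function name and the string-based representation are assumptions made for illustration.

```python
def long_multiply(a: str, b: str) -> str:
    """Schoolbook multiplication on non-negative decimal digit strings."""
    result = [0] * (len(a) + len(b))  # least-significant digit first
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            # The same single-digit multiply-and-carry step, repeated.
            total = result[i + j] + int(da) * int(db) + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b)] += carry
    digits = "".join(map(str, reversed(result))).lstrip("0")
    return digits or "0"


# Two ten-digit numbers, checked against Python's built-in integers.
x, y = "1234567890", "9876543210"
print(long_multiply(x, y) == str(int(x) * int(y)))
```

Nothing in the procedure depends on the number of digits, which is the sense in which a generally learned concept would carry forward to inputs of any size.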
If what you're saying is true, then it's very easy to conceive that an AI with a shocking level of intelligence on benchmarks will be a good enough tool to help with the creation of the next AI paradigm, whatever it might be. Recursive self-improvement may be only a few months away then?