Reflections on Sam Altman’s Recent Expectation-Setting on GPT-5
Added 2024-04-28 13:38:44 +0000 UTC
I believe the model that will end up being popularly known as GPT-5 has finished training. That belief comes not just from the analysis in my January video but also from Sam Altman’s response to a question I put to him via an Insiders member.
We gathered that he is personally using the model they just finished training, so his recent interview comments would have been made with his initial impressions of that model in mind. So before my question, let’s look at his comments from the 20VC interview on 15th April.
In that interview, everyone focused on him saying ‘we're going to steamroll [certain builders] with GPT-5’. But here was the context:
‘If you’re building something on GPT 4 that a reasonable observer would say if GPT-5 is as much better as GPT-4 over GPT-3 was - not because we don't like you, but just because we like have a mission - we're going to steamroll you …But there's a giant set of startups where you benefit from GPT-5 being way better and if you build those and AI progress keeps going the way that we think it's going to go I think for the most part you'll be really happy.’
Notice that even those building with the expectation that GPT-5 will be way better will be ‘for the most part’ happy. And here’s another comment from him that got less attention:
‘With GPT-4, people do use it to help them do science, just in extremely primitive and limited ways and with GPT-6 I think people will say ‘hey this is like helping me as a general purpose tool in all these ways’ and then with GPT-8 maybe people are like … ‘this can do some limited or maybe not so limited tasks for me.’’
Notice the implication. We’ve skipped GPT-5 and even GPT-6 isn’t yet ‘doing tasks’ but is ‘helping in general’. It feels like expectation-setting downwards, especially for those with the belief that GPT-5 will be AGI (here let’s define that as a substitute for hiring the median worker across most industries).
Then we have Dario Amodei, CEO of Anthropic - ‘Nothing truly insane happens in 2024’, and that most of the impressive stuff is ‘2026 onwards’.
But what’s this about a question from me? Well, I was lucky enough to put a question to Sam Altman, via an Insiders Discord member, in a private 20-person Stanford AI chat that happened on Wednesday. A video snippet leaked on Twitter, but the chat was off-the-record, so we only got live-notes.
My question was: 'Do you see the techniques shared in the Let's Verify Step By Step paper as crucial to future GPTs?'
He said, from live-notes,
'Yes, process level supervision is important, but not the most important (he didn’t say what was)
In-context learning is a promising hint (he emphasizes in-context learning a few times throughout the session)
Let’s Verify/Think Step By Step are the first step similar to GPT 1'
My reaction: Obviously he might be constrained in what he can say. But it’s a really interesting response. Check out my ‘Many Shot Magic’ Insiders video on how in-context learning could be a key metric for raw intelligence. And if Let’s Verify is GPT-1, I am genuinely curious to see what a GPT-5 of Let’s Verify is like.
Other notes from the event:
There are plans for public/government audits this year
The current economic models will break
Capitalism will have to change, it’s the best economic system we’ve found so far, but not necessarily the best
Current OpenAI research directions:
More efficient attention and 1 trillion context, which would open so many doors (personal AI, no finetuning)
Big focus on adaptive compute
On reasoning vs memorization he believes AI is doing reasoning right now (just poorly)
ML theory does matter, we just might not catch up to capabilities
OpenAI spends much more on inference than on training
Self-verification/critique doesn’t work for current model, though they haven’t trained for it specifically; unclear if it is possible to train for, but believes it will be; we will figure it out within the next few model versions
On agents: believes that it is more important to focus on agent reasoning/introspection rather than scaling first
GPT-6 might be a smart PhD student in all areas, but won’t dramatically change things within a few years
10th epoch on a calculus textbook has diminishing returns
On using AI to help himself, he said he can’t talk about the one he really uses since it’s not released yet
The last note was a big part of why I wrote this post - he is already using the next version, so his comments from mid-April onwards can be weighted a bit more heavily for significance.
None of the comments so far though, from anyone at OpenAI, imply a step-change, a before-and-after game-changer.
I think GPT-5 will be jaw-dropping, let’s be clear, with a video avatar, much better benchmark scores, limited agency to complete basic tasks, 1m+ context, the ability to listen and speak in real-time in an even more life-like manner, and more. But it will not yet be a substitute for the median worker in most industries (call-centers notwithstanding).
If I were being much more speculative than I should be, I would say that the US markets may be pricing in more economic disruption from GPT-5 than may occur, and that a moderate correction could be on the cards. Products can completely change the world (as AGI will), and still be overestimated in the very short run.
As always, let me know what you think. I've trawled pretty much every OpenAI employee comment from the last 4 weeks, and indeed chatted with a couple of staff members. But I might have missed something! Have a wonderful day.
Comments
I appreciate your taking the time to reply and share further thoughts. I did not mean to imply that scientific "truths" are merely the consensus opinion of experts. To be science, they must at least correspond to experimental results or observed reality. However, as demonstrated by gravity and QM, the explanation of a scientific truth can vary widely, and it evolves over time as new data becomes available. The best way of characterizing this, IMHO, is Asimov's "relativity of wrong." We never get to The Truth, but we do get progressively closer. Hawking, in his own way, concurred. Cheers, Clay
Clay Farris Naff
2024-05-11 13:33:49 +0000 UTC
> Self-verification/critique doesn’t work for current model, though they haven’t trained for it specifically; unclear if it is possible to train for, but believes it will be; we will figure it out within the next few model versions
Why not train a verification model separately from the generation model? Are there approaches which explore this?
Kofi Baah
2024-05-11 13:07:36 +0000 UTC
I'd disagree with the idea that scientific truths are rooted in the consensus of experts. Scientific truths are good explanations of physical reality that successfully solve a problem. My intuition is that yes, verification will lead to AI being a reliable source of truth. Not necessarily self-verification in the tree-of-thought sense, but approaches that test a model's conjectures against physical reality. See:
Philip's "AI Conquers Gravity" video https://www.youtube.com/watch?v=d5mdW1yPXIg
FunSearch https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/
Kofi Baah
2024-05-11 12:59:54 +0000 UTC
Note Altman's wording "economic models will break." This is much more extreme than merely mass unemployment. For most of human history society consisted entirely of a few wealthy and powerful elites and a mass of underemployed and violent underclass, yet the economic and political model didn't break despite that. To suggest a complete break of the economic model, which is usually the last link of society to break down, would imply an unimaginable amount of displacement in a very short amount of time, and most importantly it must disempower the elite as much as it does everybody else.
Feitian Li
2024-05-03 03:10:18 +0000 UTC
OpenAI employees seem to be extremely confident AGI will arrive in a reasonably short amount of time. This is increasingly shocking to me because, as I reckon, this is still a relatively open field with a mostly publicly available literature, and nothing has suggested definitively yet that the rest of the unknowns can be figured out.
Feitian Li
2024-05-03 03:01:58 +0000 UTC
I guess the more specific questions are: Is the training problem solved with synthetic data? Does it even matter if the synthetic data is wrong? Is the RAG problem solved with 10m+ context windows? RAG = Insert all the things until 9m tokens is reached. Is that easier and more effective than the RAG we are doing today with smaller context limits?
GGuy
2024-05-01 01:20:18 +0000 UTC
Isn't it obvious? Like we already know that domain specific training and in-context learning drastically improve the outcomes and a big part of the battle now is about the scaffolding to effectively achieve it. In training the problem is data, in context the problem is that RAGs don't work yet the way we want them to ;D
Jan Wilczynski
2024-05-01 00:14:05 +0000 UTC
"Dario Amodei, CEO of Anthropic - ‘Nothing truly insane happens in 2024’, and that most of the impressive stuff is ‘2026 onwards’." I'm sure this is a well considered comment from Mr Amodei, but I do also wonder if it's subject to perspective? The area under the "truly insane" curve feels like it could still encompass significant change. In particular my mind wanders to the performance of agentic GPT 3.5 systems vs GPT-4 (as recently discussed by Andrew Ng in a series from Sequoia Capital IIRC). Can't help but wonder what such clever architectural systems built around GPT-5 might end up being able to do. Mind you the labs probably have such systems behind closed doors already, so perhaps we'll all start to learn what those capabilities are as research discovers them, while the labs look on from their position some number of months in the future knowing what comes next :p
Kemi
2024-04-29 11:53:34 +0000 UTC
"In context" of recent papers and the phi-based architectures I wonder if developers are considering a new approach: providing models with all necessary in-context information during both training and inference. Instead of relying on encoded knowledge from training data, this method would supply relevant information as part of the input context for each response generation. I wonder if this approach could potentially improve the model's reasoning capabilities and lead to more accurate, context-aware responses.
GGuy
2024-04-29 01:22:32 +0000 UTC
Understandable that everyone has focused on Sam's comments lately. But I think Brad Lightcap's contributions in the 20VC interview should get more air time because he is focused on where the rubber hits the road with enterprises. He says, “enterprises have a very natural desire to want to throw the technology into a business process with the pure intent of driving a very quantifiable ROI. ‘I want to take AI and throw it at a very specific process in supply chain management and cut 20% of my spend’.” He thinks leaders “criminally underrate” how much return you really get by just providing people access to the technology. You can’t quite quantify an exact ROI from GenAI. This is precisely my experience in speaking to leaders. They want to see LLMs as a technology that can solve a specific business problem and hone in on that. Rather, a mindset shift is required to see GenAI as a virtual coworker - or army of virtual coworkers. For Moderna to launch 15 new products over 5 years without AI would require 100,000 people, but with technology like ChatGPT they will be able to do it with less than 6,000 workers. What's the ROI on that?
Sean Gallagher
2024-04-28 23:35:19 +0000 UTC
Speaking as someone in the USA, I felt instantly gloomy when I read the comparison to handling the pandemic... I seriously hope we handle AGI/ASI far better than we did the pandemic, but it might be reasonable to use that benchmark to set expectations.. :(
adfaklsdjf
2024-04-28 22:20:52 +0000 UTC
Philip, as always you've provided incisive and well-reasoned analysis. I use GPT-4 Turbo and Claude 3.0 Opus every day. I'm very glad to have them but they have limits I'm not sure can be solved using self-supervised learning -- no matter how smart the model is. E.g., I frequently ask them syntax questions about Python, ffmpeg, zsh, etc. They often make mistakes since there are many versions of those with subtly different syntax. Even if I tell them up front what versions I'm using, the errors still happen. It finally occurred to me -- that version-specific metadata was not captured in the unsupervised training. It further occurred to me that other areas are likely affected by this: legal advice, medical advice, technical troubleshooting, etc. The core LLM principle is unsupervised learning, which hopes that bulk data, scalability, and mysterious emergent behavior will enable further progress. Yet in my limited experience, it appears there may be areas where progress may be constrained due to the lack of metadata in the corpus. I think progress will still happen, but the LLMs may exhibit more "spiky" capability across knowledge domains. It will be interesting to see whether GPT-5 and the other frontier models improve in these areas.
Joe Marler
2024-04-28 21:35:45 +0000 UTC
I'm wondering how this statement applies to IT and startups. Will the model break? If you build a new company on GPT-4 or 5, will it be wiped out by 6 or 7?
Arek Stryjski
2024-04-28 20:43:09 +0000 UTC
So either the data-set optimization gains are overhyped or he’s lying. He does have every incentive to lie but so far he’s been reasonably honest about internal capabilities. But if you are near to AGI, the move is to lie until you have a decisive strategic advantage and become God Emperor.
Edward Huff
2024-04-28 20:29:51 +0000 UTC
From the tone of your commentary and Sam Altman's remarks, it's clear that the market is beginning to move past the initial GenAI hype. I've felt this myself. As a developer, I recently stopped using GitHub Copilot because I found it was actually slowing me down. I've noticed other developers experiencing the same trend. However, this shift could be beneficial. It provides an opportunity for innovators and engineers to apply this new technology to real-world business scenarios, though it will require time. Thank you for your commentary, Philip.
Cyril Sadovsky
2024-04-28 16:12:18 +0000 UTC
Adaptive compute is going to be the most important thing going forward, I expect. It’s so strange that people expect these models to do everything right at the first thought. We don’t expect that from humans. The GPT models can already think. They just need time to think.
Taunger
2024-04-28 16:00:52 +0000 UTC
Also, Amara's law states "Technology tends to be overestimated in the short term and underestimated in the long term." Of course it depends on your definition of what is short and long but this may still be applicable here.
Steve DeMoss
2024-04-28 15:58:00 +0000 UTC
"Current economic models will break." Really would love more context here as that's potentially an incredibly dramatic statement. Models for the AI industry? Macroeconomic models? The global economy? Maybe just understanding the question that led to that comment would help.
Steve DeMoss
2024-04-28 15:56:52 +0000 UTC
I've been mulling AI epistemology lately. After an Intro to AI presentation I gave, an audience member asked me if AI will be able to cut through all the misinformation in the infosphere and tell us what's true. I gave a very cautious (and inadequate) answer. It made me think about how even mathematical truths only hold for a given set of axioms, and how scientific truths are always tentative and rooted in the consensus of experts, which evolves over time. As for political or historical truths? Who can say? But I am curious to know if you think that verification procedures will lead, eventually, to AI as a reliable source of truth.
Clay Farris Naff
2024-04-28 15:31:19 +0000 UTC
I’m still wondering whether we’ll see a GPT-4.5 before GPT-5, based on comments Sam made in his podcast with Lex Fridman a month or so ago… he reflected on whether OpenAI should have more iterative release cycles as opposed to big annual releases to help the public adapt more. Whether it’s GPT-4.5 or GPT-5 I think the things that will have the biggest impact in the short term will be the engineering OpenAI do around the models, not necessarily the models themselves. The things you mention like video avatars, real-time conversations, personalisation and the start of agent-like behaviour will have much bigger real-world impacts than the overall intelligence gains we’re likely to see between generations of models at the moment. However, I do think we will see a breakthrough in model intelligence at some point (maybe 2026/27) but it will be based on a new type of architecture and training approach, probably along the lines of the objective-driven AI approach Yann LeCun has been proposing.
Sean Betts
2024-04-28 15:21:03 +0000 UTC
Thanks for sharing! Could you share how you are using AI to help you with your work? (E.g. to parse through OpenAI employees' comments on X?)
Eddine Maiza
2024-04-28 15:02:28 +0000 UTC
"- The current economic models will break -Capitalism will have to change" I realise that most, if not all, people here would intuitively agree with that statement. I do. I'd be surprised if many people at research labs disagree. I guess the question is: "how bad will it have to get before the system is forced to change?" In a way wouldn't it be better if an ASI drops unexpectedly and we need to adapt ASAP (similar to in a pandemic)? As opposed to the incremental change where the frustration of people builds as more and more jobs are displaced and more and more people are disgruntled as new models become more powerful.
Machiel Reyneke
2024-04-28 15:01:51 +0000 UTC
Wow, I totally missed the one about GPT4-6-8... I feel that GPT4 is already helping me in multiple tasks which I don't consider basic at all. I wonder if this means like ChatGPT or really what one can build around GPTX. If really GPT6 will be JUST saying "this is helping me as a general-purpose tool..." then we (at least me) are in for disappointment regarding GPT5. The quality I am most interested in is reasoning power, the rest can be built around with today's tools (ok, some decent inference speed to enable real time use cases might be required too) and Sama promised a leap on that one. I find the ambiguity around that topic quite frustrating: "you need to plan and build big or we will steamroll you, but I cannot really tell you how different the reasoning of this next model is". Could it be that it is really difficult to express how smart the models are from now on? So we can only say they will be "way smarter" or "significant smarter" (this was used by Sama for GPT4 April release btw...). If GPT5 doesn't deliver in reasoning as promised (same as from 3.5 to 4) then I will feel I was lied to.
Robert Gomez-Reino
2024-04-28 14:53:49 +0000 UTC
Sad only one question went through to Sam, but I think it was the most important one. When you think about it from a business’s perspective, in-context learning is probably the most important thing to get right. If I’m trying to teach an agent how to order parts for a car we’re fixing I don’t care how well it does on HellaSwag or MMLU, it needs to understand our processes and follow them properly! I would absolutely love to see how many-shot prompting, GPT-5 level reasoning and in-context learning, and SmartGPT combine on multi-modal tasks that the model wasn’t originally designed for.
Trenton Dambrowitz
2024-04-28 14:43:47 +0000 UTC
So slow incremental progress is going to be the way for the next couple years
Anouar Mansour
2024-04-28 14:35:49 +0000 UTC
Great analysis. I listened to the same comments for clues from Sam. I'm recalling how, just a few months ago, we were all debating about whether or not GPT-4 would be disappointing, and in reflection, yes, I think Sam was right in forecasting that GPT-4 has been a bit disappointing. What I'm MOST curious about is what theory of mind or intelligence they are using to forecast what abilities and disruptive ability these models have, or if it's just intuition from getting early access.
David Shapiro
2024-04-28 14:30:57 +0000 UTC
Thanks for the update. I’m convinced that the short term investment is/was very high and could crash a bit because the money is just rolling in.
Steven
2024-04-28 13:54:08 +0000 UTC