Drawing on 3 new articles, an interview in the last 48 hours, and 4 papers, I'll argue that we should not let o1's 'ability to self-correct' go sailing past in the night.
Link for Offline Viewing and Download: https://drive.google.com/file/d/1iULiA-t8yEniM5UkRGEHq99-ZB_7uqX8/view?usp=sharing
Noam Brown (and co) Interview:
2024-10-03 12:42:49 +0000 UTC
Are 'Benchmarks All You Need'? And do we have any conceptual breakthroughs to go before text-based AGI? I bring in the latest OpenAI quotes and reflect deeply on what it all means.
Link for Download: https://drive.google.com/file/d/1QV9h0_-uSADSaH_DGAL8tpHLCbFPdeXE/view?usp=sharing
‘Don’t teach. Incentivize’ Hyung Won Chung:
2024-09-30 14:58:11 +0000 UTC
A new paper from the last few days has dropped, and it's a good one. LLMs can now be said to plan, and I have all the analysis as well as exclusive clips from my interview with the lead author. And I don't believe anyone else has reported that this breakthrough performance from o1 now exceeds average human performance for this core task.
Link for Off-line Viewing and Download:
2024-09-24 14:05:00 +0000 UTC
Less than 24 hours ago we got the claim that a multi-million dollar 'final test' for AI was being put together. But I ask questions about what it will achieve, drawing on evidence from 3 papers, Simple Bench, and my own analysis. Hopefully, this video will show you why 'o1 = AGI' claims leave a lot to be desired.
Link for Offline Viewing/Download: https://drive.goog...
2024-09-17 18:12:45 +0000 UTC
Surely now that companies like OpenAI, whose only goal is to create an AGI, are worth $100B+, we have a settled definition of 'AGI' itself? No? Or even a set of rival definitions, each of which is well-defined? Well, strap yourself in, we are gonna find out where the term started and show you that things aren't so simple...
Link for Off-line viewing and Download: http...
2024-09-09 16:00:08 +0000 UTC
A 20,000-word new report on AI scaling, and yes, I read it all to bring you the highlights. What are the biggest unanswered questions for whether we will scale models 10,000x and is there a deeper question that underlies them all? Plus new clips from Anthropic CEO, Simple update, Eric Schmidt and more…
Link for Download and Off-line Watching: https://drive.google.com...
2024-09-01 21:04:42 +0000 UTC
Full results from the first Simple Bench run (including latest model updates), the new website, more insight into the questions and what the gaping hole in basic reasoning means, plus my plans going forward.
Link for Off-line Viewing and Download: https://drive.google.com/file/d/1l1qEoXuRXEPSVwu_thmFpTbhgQwLHsTh/view?usp=sharing
Simple Bench Website Prev...
2024-08-19 14:44:44 +0000 UTC
The term 'bitter lesson' is thrown about a lot, but what does it actually mean? Does it leave humans irrelevant or is it about something deeper? Drawing on lessons from MuZero and the annotated original essay by Rich Sutton.
Link for Offline-viewing: https://drive.google.com/file/d/1q5hewZoI1zHJ8nP8uZCuyybYj1H7wkjq/view?usp=sharing
Bitter Lesson Essay
...
2024-08-12 09:21:24 +0000 UTC
Mistral Large flops hard, but what exactly is this benchmark, what are some more of its questions, why is it different, and what is next? All, or at least some, of these questions will be answered.
2024-07-30 18:50:25 +0000 UTC
One of my favorite videos in the series! A new bonus series explaining some of the most controversial terms in artificial intelligence, this time covering the term 'emergent behaviors'. Deciding if you think models do - or do not - display emergent behaviors could shape your perspective on AI, so let me know what you think at the end of the video!
Original Emergent Abilities Paper
https://arxiv.org/pdf/2206...
2024-07-15 15:31:45 +0000 UTC
‘Can any model do [insert task]?’ is a much harder question than it seems. I’m going to give you five vivid categories, with unambiguous examples, drawing on 6 new papers, of the kind of detail that is so often lost in 2024 debates on AI.
Link for Off-line Watching and Download: https://drive.google.com/file/d/1ep3Asw6_1LZRJoCKU1VGSaUcYMP6djBq/view?usp=sharing...
2024-07-04 18:17:18 +0000 UTC
There is a clear dividing line emerging at the height of OpenAI, and in AGI labs more broadly. This pod reflects on the 'reasoning' and 'scale' axes, including fascinating new comments from OpenAI researcher Noam Brown about his CTO, Murati, claiming GPT-4 as 'a smart high schooler'. Plus my take on Sutskever and Superintelligence and the 'Altman shift.'
OpenAI CTO Mira Murati Comments: https://x.com/t...
2024-06-23 17:35:01 +0000 UTC
A new bonus series (2/8 episodes) explaining some of the most controversial terms in artificial intelligence, this time covering the term 'open source'. In some quarters, it's the most controversial term of them all. Here, we mainly focus on the difference between open source and open weights - a key distinction!
Link for Off-line Viewing: https://drive.google.com/f...
2024-06-19 12:52:18 +0000 UTC
Recently fired OpenAI researcher Leopold Aschenbrenner has produced an essay that will either confirm him to be absolutely crazy, a target of an OpenAI lawsuit, or bizarrely prophetic. I went through all 165 pages, plus his recent 4.5 hr interview (and other less recent material) to bring you just the highlights. More deeply, I give my framework about whether I think scaling is enough. Join me on a journey into the murky, controversial depths of AGI speculation.
Link for Download and ...
2024-06-07 19:40:24 +0000 UTC
A new bonus series (8 episodes) explaining some of the most controversial terms in artificial intelligence, starting with an OG term for LLMs: 'stochastic parrots'. Find out where the term came from, why it stuck, and enter the debate over whether it is justified.
2024-06-03 13:02:00 +0000 UTC
This video won’t just show you the problem with a range of the most popular benchmarks (though it will do that, from MMLU-Pro to GPQA, GSM8K, LMSYS and more). It will show you a useable path forward, so that we might finally get benchmarks we can trust, that really get to the underlying capacities of our models. Guest-starring an Andrej Karpathy comment!
Karpathy Comment: https://www.youtube.com/watch...
2024-05-20 14:25:21 +0000 UTC
Exclusive: The second, eye-opening instalment of the AI Insiders tutorial series on Prompt Injections. Donato Capitella on what the threat is, how it is changing, and what you can do about it, at any level.
Downloadable File for Off-line Viewing: https://drive.google.com/file/d/1yqL3RGnYln4oguHTzXF6yCRd776frYUh/view?usp=sharing
2024-05-17 12:56:20 +0000 UTC
Let's take a moment to reflect on the import of GPT 4o and the cascading social ramifications of development and after development. Then, I investigate an interesting OpenAI tweet, talk about forthcoming guests and go deep on the decision of when to declare AGI (assuming we can define it). I end with some thoughts on the Sutskever departure and what it means in the bigger picture.
https://...
2024-05-15 20:47:35 +0000 UTC
I believe the model that will end up being popularly known as GPT-5 has finished training. That comes not just from the analysis in my January video but also Sam Altman’s response to a question I put to him, via an Insiders member.
We gathered that he is personally using the model they just finished training, and so his recent interview comments would be made with the initial impressi...
2024-04-28 13:38:44 +0000 UTC
Two recent papers (DeepMind + Anthropic tag-team) and a failed $10k bet have reminded people not to underestimate what models can learn from the data you give them in the prompt. Let me show you how this can be harnessed to get better results, even if you don’t have great demonstrations at hand. Then we’ll see the bet that lost a start-up founder $10k but hopefully won’t fool you. And end with Anthropic’s jailbreaking revelations and how even they think such jailbreaks may not be stoppabl...
2024-04-23 14:56:12 +0000 UTC
I have always wanted to have a web demo of SmartGPT, to show anyone how powerful basic prompting scaffolds can be. But I wanted it to be even more interesting than what I showed last year, so the iteration I'm sharing today incorporates one clear improvement to the system that got an unofficial 89.0% on the MMLU: you can get models like Claude 3 Opus researching and resolving the outputs of GPT-4 Turbo, and vice versa. You'll need an API key for any model family you are using, but the website...
2024-04-12 15:09:48 +0000 UTC
Highlights from the interview with Aravind Srinivas, co-founder and CEO of Perplexity. Plus the news today not only of the first hints of instant search from OpenAI but of Google's epochal shift to a search generative experience. I’ll put all this, and your questions, directly to the man who is helping shape the future of search.
Exclusive Interview with Aravind Srinivas, CEO of Perplexity - ft. your questions and mine...
FT Google Report:
2024-04-04 20:12:42 +0000 UTC
Yesterday’s dramatic Bloomberg headlines showcased an ‘AI Jobs Apocalypse’, warnings of ‘millions of jobs lost in next 3-4 years’, triggered by a new 44-page paper from London. I interview the IPPR lead author Carsten Jung and get to the bottom of it all, giving my critiques of the paper, where I agree, why I am optimistic and hopefully show you that you should always look deeper than the headlines. The future of jobs in the age of AI is too important a topic not to.
2024-03-28 23:25:52 +0000 UTC
The only theme for this episode is unpredictability, from the swirling new rumours of GPT-5 release dates from Business Insider, to the challenges of promoting interviews that don't happen, behind-the-scenes chats, how we can't rely on AGI Lab leader reassurances and key extracts from the portentous essay from the late Vernor Vinge: 'The Coming Technological Singularity'.
GPT-5 Summer?:
2024-03-24 19:09:25 +0000 UTC
I don’t often do personal updates, I just sprinkle them in, on the off chance anyone wants a bit more behind-the-scenes. Two things come to mind to mention today: the repercussions of not being shocking and of meeting AGI Insiders.
First, is it me or has AI coverage devolved a fair bit more in recent months, across the board? The way algorithms incentivise sensationalism is brutal, and clear, especially as a content creator myself. And trust me, ‘SHOCKING’ works, and not ...
2024-03-15 13:42:37 +0000 UTC
What are just the most interesting details from the Musk-Altman Lawsuit? Can Gemini 1.5 help me sort through the morass of relevant tweets? I want to give you the history of the battle over the definition of those three key words - artificial general intelligence - and what it means for us all.
Lawsuit: https://www.courthousenews.com/wp-content/uploads/2024...
2024-03-03 21:58:16 +0000 UTC
This month has seen the launch of a Discord channel that I am very excited about. We have hundreds of incredible people on here at the bleeding edge of implementing and understanding AI, and so naturally, we need a place to exchange best practices, in a friendly and professional environment (which can be rare these days!). For those on our Discord, you know it as ai-professional-tips, for those not, a 30-sec guide on connecting to our Discord via Patreon is attached at the bottom - I would lo...
2024-02-25 18:55:29 +0000 UTC
Everything you missed in the world of AI threats because of Sora and Gemini. From Compute Overhang @Sama, to a laudable Bioweapon study from OpenAI, and from state-actors using GPT-4 to the future of warfare.
Goody-2: https://www.goody2.ai/chat
State-Actors Use AI: https://openai.com/blog/...
2024-02-22 15:38:51 +0000 UTC
Take a 14-minute tour with me of the cutting edge of deepfakes, from speech-to-speech to politics, YouTube and business. We'll discuss the upsides, including with a senior figure at Elevenlabs - and you'll get to hear my voice with a different content and personality - business potential, as well as threats and downsides. Including developments from just the last week, new papers, an exclusive interview and more, this is the <15 min primer on all things deepfake.
Elevenlabs -...
2024-02-15 17:20:08 +0000 UTC
As always, first dibs on questions for my interview guests goes to you guys. And I am lucky enough to be able to have Aravind Srinivas, Perplexity founder and CEO, formerly of OpenAI, as a guest later this month. No guarantees for any question but the most upvoted ones will get a very strong look-in!
For those who don't know, Perplexity are being talked of as the Google-search-replacers, though of course time will tell:
New York Times- Can This A.I...
2024-02-08 11:56:25 +0000 UTC