SmartGPT Website Demo and Community Project
Added 2024-04-12 15:09:48 +0000 UTC
I have always wanted to have a web demo of SmartGPT, to show anyone how powerful basic prompting scaffolds can be. But I wanted it to be even more interesting than what I showed last year, so the iteration I'm sharing today incorporates one clear improvement to the system that got an unofficial 89.0% on the MMLU: you can get models like Claude 3 Opus researching and resolving the outputs of GPT-4 Turbo, and vice versa. You'll need an API key for any model family you are using, but the website is now live at https://smartgpt-ui.vercel.app/en
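For anyone curious what a scaffold like this looks like in code, here is a minimal sketch of the ask / research / resolve flow. It is not the site's actual implementation: the `Model` interface, the function name `smart_answer`, the prompt wording and the stub models are all assumptions made up for illustration. In practice the assistant could be a call to one model family and the researcher/resolver a call to another.

```python
from typing import Callable, List

# Hypothetical model interface: takes a prompt string, returns a completion string.
Model = Callable[[str], str]


def smart_answer(question: str, assistant: Model, researcher: Model,
                 resolver: Model, n_asks: int = 3) -> str:
    """Ask n_asks times, have a second model critique the drafts, then resolve."""
    drafts: List[str] = [assistant(question) for _ in range(n_asks)]
    numbered = "\n\n".join(f"Answer {i + 1}:\n{d}" for i, d in enumerate(drafts))
    context = f"Question: {question}\n\n{numbered}"
    # Illustrative prompt wording only, not the prompts the site uses.
    critique = researcher(context + "\n\nList the flaws in each answer.")
    return resolver(context + f"\n\nResearcher notes:\n{critique}"
                    "\n\nPick the best answer, fix it, and state the final answer.")


# Stand-in models so the sketch runs without any API keys.
def stub_assistant(prompt: str) -> str:
    return "draft answer"


def stub_researcher(prompt: str) -> str:
    return "Answer 2 skips a step."


def stub_resolver(prompt: str) -> str:
    # A real resolver would reason over the prompt; this stub only checks its inputs.
    seen_all = "Answer 3" in prompt and "Researcher notes" in prompt
    return "final answer" if seen_all else "unresolved"


final = smart_answer("Any question", stub_assistant, stub_researcher, stub_resolver)
print(final)
```

Swapping the stubs for real API calls is the only change needed to mix model families, which is the point of keeping each role a plain function.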
Who better to share this with than my Patrons? It's not a finished 'product' but a living entity (big thanks already to Yannick Metz, PhD student from Germany, for the UI and much more). A paradigm improvement (spoiler) that's coming is automatic prompt optimisation (Josh Stapleton, ML engineer extraordinaire, is leading that effort) and multi-modality (MMMU here we come?). For sure there are still bugs but the code is open-source (see 'Info' for GitHub), ripe for tinkering by anyone who wants a shout-out! :)
Just to take one use case I was doing this morning, from the MATH benchmark: 'How many perfect squares less than 10,000 can be represented as the difference of two consecutive perfect squares?'. Claude 3 Opus gets it wrong more often than not, but with help from GPT-4 Turbo, the final correct answer is given. For code, the other way round (GPT-4 outputting, Claude 3 reviewing) often makes more sense. Aside from reasoning, I also suggest experimenting with how you can use it to try to get around the models not 'seeing' the words because of tokenisation - e.g. asking for line endings, beginnings, counting things for summaries etc.
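If you want to check a model's answer to that MATH question yourself, a brute-force count is enough. This is a quick sketch, with `count_squares` a name made up for illustration. It leans on the identity (n + 1)^2 - n^2 = 2n + 1, so the differences of consecutive squares are exactly the odd positive integers, and it assumes 0 counts as a perfect square (so that 1 = 1^2 - 0^2 is included).

```python
from math import isqrt


def count_squares(limit: int = 10_000) -> int:
    """Count perfect squares below limit that equal (n + 1)**2 - n**2 for some n >= 0."""
    count = 0
    n = 0
    while 2 * n + 1 < limit:
        diff = (n + 1) ** 2 - n ** 2  # always equals 2*n + 1
        if isqrt(diff) ** 2 == diff:  # diff is itself a perfect square
            count += 1
        n += 1
    return count


print(count_squares())  # 50: the odd squares 1^2, 3^2, ..., 99^2
```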
Aside from being just nifty/cool, I see it as one extra way of getting funding for more research, bigger benchmark runs and maybe a little hired help for me with Explained and Insiders. At the moment all the work on AI Explained/Insiders is solo, including research, narrating, video-editing, interviews, comments, personal emails, Discord etc. So, in addition to the career-saving support on Patreon, which has already kept AI Explained alive as a channel (it was touch and go in Sept-Oct '23), I plan on promoting a free newsletter that will also have a $9/month option. You guys already support me enough, this 'Essentials' extra would only be for new folk who can't afford Insiders but want to support hype-free AI content, get more of my writing (I love to write, and once almost worked at the Economist), see a few of the oldest and no-longer-watched Insiders videos, and play with SmartGPT 2.0 (and then 3.0, etc). If everything works out, I could one day have the funds to nab a top-flight journalist - e.g. from the Information, who got the recent Leopold Aschenbrenner leak - and get even more scoops, interviews and content for Insiders. Or just an assistant. Or maybe pay to clone myself, :)
Either way, have a wonderful day.
- Philip
Comments
Is it possible to add Claude 3.5 to the models, or to allow us to select the newest models?
Pranav S
2024-09-20 12:21:07 +0000 UTC
I tried a few different times and I found that the output was poorer than if I simply used Claude 3 or even Gemini Pro 1.5... sorry. I'm happy to share my chat with you, maybe I'm doing something wrong.
Joshua Davis
2024-05-13 01:29:13 +0000 UTC
I keep getting this error: Message limit is {{maxLength}} characters. You have entered {{valueLength}} characters.
Joshua Davis
2024-05-13 00:53:37 +0000 UTC
Wow! I had time to try it out today! I do a lot of data modeling for digital humanities and digital history. I was able to create (almost) a complete data model, just based on historical sources in plain text as input. I already had some sophisticated prompts, but this is the next stage. Right now, agent-based workflows are a hot topic, aren't they? Your SmartGPT could even be scaled up by using multiple teams of agents based on SmartGPT... what do you think? And today I listened to 4 professors at a panel discussion at my university about "AI and research". All they talked about was how to write papers and that students submit AI texts. At the same time, with the big context windows we already see and the next generation of LLMs and methods/tools like SmartGPT, you can already imagine that whole research processes could possibly be automated. Crazy! :)
Christopher Pollin
2024-04-25 14:44:28 +0000 UTC
Thanks Philip, I meant stored in clear on your side. Just trying to assess what type of test we should or should not perform. Thanks!
Robert Gomez-Reino
2024-04-24 11:30:40 +0000 UTC
Hey Robert, yes if you click save to template top right that will store modified system and researcher/resolver prompts! Trying to keep things fairly user-friendly, while still allowing customisation.
Philip
2024-04-24 10:34:40 +0000 UTC
Thank you John! Keep us updated.
Philip
2024-04-24 10:30:30 +0000 UTC
My feeling is that these agentic approaches with GPT-4-like capabilities need to be balanced with speed. I want it to be fast enough (really a few seconds) to be able to steer results and repeat things. We've put substantial effort in previous months in collaboration with CERN, and the amount of work to align workers, resolvers and reviewers to any general (in our case engineering) task becomes increasingly difficult. You end up many times waiting for minutes to get an answer not aligned with the project patterns or just wrong. I am now convinced that shorter (but much faster) interactions, even if only OK 60% of the time, are much more productive because you can steer, request corrections, etc. and thus have a realistic working flow. More agentic flows could still be OK for a more restricted set of results that you can tune agents for, IMHO. Otherwise it is also very interesting to keep researching this flow, so when the next thing (GPT5?) comes we just need to plug it in :D
Robert Gomez-Reino
2024-04-24 07:28:03 +0000 UTC
Philip, would our prompts be stored somewhere when testing this tool? :D We have our own tailor-made version and I would love to check how yours improves our chain-of-thought arch. Our current one requires large system prompts (like 20k tokens or so). I see we don't have this, as I assume you are providing your agents with your own system prompts. Do you think it would be difficult to make it work if we provide this large context information in user prompts?
Robert Gomez-Reino
2024-04-24 07:21:43 +0000 UTC
Kudos Philip, Yannick, Josh. My use case comes from my teaching. I recently gave a two-question quiz (in a Microsoft Form) to my students to "give me something true (and non-trivial) about the Big Bang" and "give me something false (and non-trivial) about the Big Bang". I input a short rubric plus a student's response, and I'm impressed that SmartGPT can analyze student answers in-depth and give a total score which is consistent with the rubric. Well done!
John Weisenfeld
2024-04-17 09:31:29 +0000 UTC
Hi Philip, still not working, let me try other browsers. I use Brave, which can be temperamental sometimes.
Kol Tregaskes
2024-04-15 19:29:48 +0000 UTC
Yes, I did! Thanks a lot. Was just extremely busy with a new product release until Friday & slept through the weekend. Will answer later today!
Leander Maerkisch
2024-04-15 13:17:51 +0000 UTC
Thank you! And trust me, I am as ambitious for it as you are with these improvements, all while keeping in mind an average user who has never seen an API before.
Philip
2024-04-15 12:49:53 +0000 UTC
Thank you Sean.
Philip
2024-04-15 12:49:02 +0000 UTC
Thanks Leander! Hope you got my earlier email re: Alberto!
Philip
2024-04-15 12:48:52 +0000 UTC
Also, it's crazy to think that we're at the vacuum tube stages of genAI - what will the microchip stage look like?
Machiel Reyneke
2024-04-15 09:32:18 +0000 UTC
Thanks for sharing SmartGPT, Philip! I've tested it, and so far super-impressed :) Used Haiku as the assistant with Opus as the resolver and it worked well for a few typical product management use cases. It'll be interesting to see how these agentic workflows evolve - I would personally love to have a blend of Autogen (with custom agents, including a default setup like the SmartGPT "Assistant", "Researcher", "Resolver" pattern), alongside your implementation of N parallel assistants. I guess later the best would be if a model could classify the initial prompt in terms of difficulty and automatically set up all the necessary agents to do all the interactions, and let the harder steps be N parallel agents with resolvers à la SmartGPT.
Machiel Reyneke
2024-04-15 09:31:33 +0000 UTC
Philip, Relieved to hear you avoided a September demise. We’d all be immeasurably poorer without AI Explained and Insiders. You should be immensely proud, not just in explaining AI, but in shaping its future.
Sean Gallagher
2024-04-15 01:21:56 +0000 UTC
Wow, congrats on the launch of this new "product"! The folder structure to store similar conversations is already a UI improvement over the classic ChatGPT interface. Well done & excited to test the waters!
Leander Maerkisch
2024-04-14 20:53:14 +0000 UTC
Thanks so much Dorian, excited for where this will go. I will stick with that phrasing then!
Philip
2024-04-14 13:01:35 +0000 UTC
Any updates Kol? :) As pretty much my earliest subscriber on YT...
Philip
2024-04-14 13:01:14 +0000 UTC
Fantastic, thank you so much for sharing this with us, Philip! By the way, "hype-free AI content" is a very compelling phrasing for me. 👍
Dorian Iten
2024-04-14 06:51:25 +0000 UTC
You'd be surprised what use cases it is good at! Quite a few! So glad to hear Steve :)
Philip
2024-04-12 21:12:44 +0000 UTC
I am impressed. It solves my default thinking problem with which I test all models consistently. No model so far could solve it, only the new gpt-4-turbo sometimes gets it right and ChatGPT with browsing and code interpreter is a bit more reliable but still often wrong. What is my default thinking problem: How will the time difference between Munich and Sydney change in 2024? Give the answer in JSON format. Start with 01.01: time difference, continue with all dates where the value changes and end with 31.12: time difference. It's a difficult problem, because Germany and Australia change time on different dates and in different directions.
SteveHaupt
2024-04-12 21:08:45 +0000 UTC
Kol and Jon, we are getting error messages about incorrect API keys and insufficient balance with Anthropic, and 1 error for quota limit with OpenAI, so do try again at some point with those fixed! Would love your feedback.
Philip
2024-04-12 19:53:49 +0000 UTC
Thank you so much Christopher, who knows how many are here because of you.
Philip
2024-04-12 19:51:54 +0000 UTC
I tried again without changing anything and it worked. I suspect it's a time-out issue, because when it doesn't work, I see "Initial GPT Answers (3 Asks, Model: gpt-4-turbo)" (Generating response), then blankness for 30 seconds or so, then no output and "Regenerate response". No red error message. It's like it gave a response when it actually didn't. Maybe it didn't have time to generate and evaluate three responses before it timed out. Perhaps there's some way to show the progress of the request in more detail? Anyway, I won't flag any more issues in here as it's not the right place. I'll keep playing around with different length requests. Thanks.
Jon Millward
2024-04-12 19:49:58 +0000 UTC
We will try to get to the bottom of it, looking now through our error messages. Did you re-submit the API keys? I checked and it's working my end, any red error message up top?
Philip
2024-04-12 19:40:34 +0000 UTC
This is what I see too. It did work once but not since.
Jon Millward
2024-04-12 19:22:22 +0000 UTC
Same here, I'm using GPT-4-Turbo, added my API key, asked it a question and now I'm stuck on "Initial GPT Answers (3 Asks, Model: gpt-4-turbo):"
Kol Tregaskes
2024-04-12 19:20:05 +0000 UTC
Oh my. I've been waiting for this for a very long time. Thanks, Philip.
Kol Tregaskes
2024-04-12 19:19:52 +0000 UTC
Ooh didn't finish reading since I'm at work, I can definitely do that.
Dane Wagenhoffer
2024-04-12 18:10:40 +0000 UTC
Working on that, first Gemini 1.5 but then those, yep. Feel free to help too!
Philip
2024-04-12 18:10:04 +0000 UTC
Can you add Cohere or Mistral API support? I have pretty much switched all my scaled LLM use to Cohere so it might be interesting to test.
Dane Wagenhoffer
2024-04-12 18:04:27 +0000 UTC
Just checked, working for me. If it persists, email via address in Info. Sometimes it times out for really long tasks with many asks, so test it on slightly shorter ones.
Philip
2024-04-12 17:56:53 +0000 UTC
Haha
Philip
2024-04-12 17:56:40 +0000 UTC
yep, it's working great Philip!
Stefan
2024-04-12 17:54:37 +0000 UTC
{to be read in the most sarcastic tone one could possibly muster up} Philip, if you would simply stop doing research and just believe all the hype out there... All you need to do is take a "FREE 10-minute course on AI", "subscribe to ChatGPT", or "download an AMAZING and FREE open-source model from Hugging Face". You don't need to hire anybody. Just "use AI". Obviously, AI can "easily" simplify every aspect of your personal and professional life, as well as all your job functions. "Hire an AI agent", sit back and let the subscription dollars roll in!
Tony Coffman
2024-04-12 17:53:24 +0000 UTC
It's not outputting any responses for me despite giving it API keys for Claude and GPT, both with credit. I see the credits being used in Usage for both, but no output...hmmm. Will keep trying.
Jon Millward
2024-04-12 17:10:34 +0000 UTC
I'll definitely have to give it a looksee in that case. My only concern would be the, I imagine, ginormous API charge if I roll it out often enough. I do two articles a day, six days a week. Some of these things go up to 10k words. Not all, but some. I guess I'll start testing by throwing in a whopper and see how much it'll cost. Thanks for all your hard work, Philip. You really stand out among the crowd of AI content creators, all without having to publish every day (on potential nothingburgers) and SHOCKING the audience with your titles!
Norfuer
2024-04-12 16:08:41 +0000 UTC
Thx! Your YouTube channel and Patreon are already the most recommended and quoted resource in all my workshops! :)
Christopher Pollin
2024-04-12 16:03:03 +0000 UTC
I mean, if you share it in the context of promoting the Patreon that sustains it, then I would be more than happy! For most situations that might be in the form of a visual demo from you that they can then play with if they sign up, but for key networkers you feel might really be able to spread the word then a simple link would be effective. So semi-private is how I would best phrase it (if I wanted 100k users, I would just put the link on my channel!). I trust your judgement, this post was more to thank my Patrons and get early feedback before we iterate in the coming weeks.
Philip
2024-04-12 16:00:25 +0000 UTC
I have found it great for summaries. A few individual calls often miss a detail here and there, but rarely after going through the full SGPT process. Obviously might have to adapt the system prompts for your use case.
Philip
2024-04-12 15:59:49 +0000 UTC
For work, I summarise (often lengthy and highly technical) scientific papers for social media. Been using a GPT for it since that launched, but I feel it's not quite as smart as it used to be. The outputs tend to be too vague and lack the meat of what I'd like to be a more in-depth but still comprehensible summary. I wonder if this is a valid use case for SmartGPT, to have it output layman-friendly but accurate summaries of often highly technical material.
Norfuer
2024-04-12 15:58:24 +0000 UTC
Yeah, this system alone would get SOTA on reasoning, but multi-modality is where it gets really exciting.
Philip
2024-04-12 15:51:47 +0000 UTC
Cool! I do a lot (really a lot) of genAI & prompt engineering workshops for researchers (data modelling, creation, programming, thesis and proposal writing, etc.). Having something with more reasoning power would be really cool to do some showcases. Is it OK to share the link, or should it stay internal? And if so, how should it be cited? :) In other words, is it in your interest to get it out into the world, if others show it at workshops? Best, Christopher
Christopher Pollin
2024-04-12 15:48:37 +0000 UTC
Some very impressive first impressions on a variety of questions, very interesting to see how it works with nuanced tasks like creating repair plans from images. Nice Job Philip!
Trenton Dambrowitz
2024-04-12 15:43:34 +0000 UTC
I’ll play around a bit this weekend. Congrats on getting it published.
Nathaniel DiMemmo
2024-04-12 15:12:42 +0000 UTC
Can’t wait to try it out!
Adrian Ott
2024-04-12 15:12:05 +0000 UTC