AIExplained

SmartGPT Website Demo and Community Project

Added 2024-04-12 15:09:48 +0000 UTC

I have always wanted to have a web demo of SmartGPT, to show anyone how powerful basic prompting scaffolds can be. But I wanted it to be even more interesting than what I showed last year, so the iteration I'm sharing today incorporates one clear improvement to the system that got an unofficial 89.0% on the MMLU: you can get models like Claude 3 Opus researching and resolving the outputs of GPT-4 Turbo, and vice versa. You'll need an API key for any model family you are using, but the website is now live at https://smartgpt-ui.vercel.app/en
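For anyone curious what such a scaffold looks like in code, here is a minimal sketch, not the project's actual implementation. It assumes the three roles mentioned on the site (assistant, researcher, resolver) can each be treated as a plain function from prompt text to response text. The function name `run_scaffold`, the prompt wording and the `n_asks` parameter are all hypothetical, and no real API is called.

```python
from typing import Callable, Dict, List, Union

# Hypothetical stand-in: a "model" is any function mapping a prompt to a reply.
# In a real setup each one would wrap a call to a provider's API.
Model = Callable[[str], str]


def run_scaffold(
    question: str,
    assistant: Model,
    researcher: Model,
    resolver: Model,
    n_asks: int = 3,
) -> Dict[str, Union[str, List[str]]]:
    """Ask one model several times, then let another critique and resolve."""
    # Step 1: collect several independent drafts from the assistant model.
    drafts = [assistant(question) for _ in range(n_asks)]
    numbered = "\n\n".join(
        f"Answer {i + 1}:\n{draft}" for i, draft in enumerate(drafts)
    )

    # Step 2: the researcher looks for flaws across all drafts.
    research = researcher(
        f"Question: {question}\n\n{numbered}\n\n"
        "List the flaws and faulty logic in each answer."
    )

    # Step 3: the resolver picks the best draft and improves on it.
    final = resolver(
        f"Question: {question}\n\n{numbered}\n\n"
        f"Researcher notes:\n{research}\n\n"
        "Choose the best answer, fix its flaws, and give the final answer."
    )
    return {"drafts": drafts, "research": research, "final": final}
```

Because each role is just a callable, the cross-family idea is a matter of passing one provider's wrapper as `assistant` and another's as `researcher` and `resolver`.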

Who better to share this with than my Patreons? It's not a finished 'product' but a living entity (big thanks already to Yannick Metz, PhD student from Germany, for the UI and much more). Two paradigm improvements (spoiler) that are coming are automatic prompt optimisation (Josh Stapleton, ML engineer extraordinaire, is leading that effort) and multi-modality (MMMU here we come?). For sure there are still bugs, but the code is open-source (see 'Info' for GitHub), ripe for tinkering by anyone who wants a shout-out! :)

Just to take one use case I was trying this morning, from the MATH benchmark: 'How many perfect squares less than 10,000 can be represented as the difference of two consecutive perfect squares?'. Claude 3 Opus gets it wrong more often than not, but with help from GPT-4 Turbo, the final correct answer is given. For code, the other way round (GPT-4 outputting, Claude 3 reviewing) often makes more sense. Aside from reasoning, I also suggest experimenting with how you can use it to try to get around the models not 'seeing' the words because of tokenisation - e.g. asking for line endings, beginnings, counting things for summaries etc.
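For reference, the MATH question above can be checked by brute force. The difference of consecutive squares is (n+1)^2 - n^2 = 2n + 1, so every odd number is such a difference, and the question reduces to counting odd perfect squares below 10,000. The sketch below assumes 0 counts as a perfect square (so 1 = 1^2 - 0^2 is included); the function name is made up for illustration.

```python
from math import isqrt


def count_square_differences(limit: int) -> int:
    """Count perfect squares below `limit` that equal (n+1)^2 - n^2 for some n >= 0."""
    count = 0
    n = 0
    while True:
        diff = (n + 1) ** 2 - n ** 2  # always equals 2n + 1, an odd number
        if diff >= limit:
            break
        if isqrt(diff) ** 2 == diff:  # is the difference itself a perfect square?
            count += 1
        n += 1
    return count


print(count_square_differences(10_000))  # 50
```

The 50 hits are the odd squares 1^2, 3^2, ..., 99^2.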

Aside from being just nifty/cool, I see it as one extra way of getting funding for more research, bigger benchmark runs and maybe a little hired help for me with Explained and Insiders. At the moment all the work on AI Explained/Insiders is solo, including research, narrating, video-editing, interviews, comments, personal emails, Discord etc. So, in addition to the career-saving support on Patreon, which has already kept AI Explained alive as a channel (it was touch and go in Sept-Oct '23), I plan on promoting a free newsletter that will also have a $9/month option. You guys already support me enough, this 'Essentials' extra would only be for new folk who can't afford Insiders but want to support hype-free AI content, get more of my writing (I love to write, and once almost worked at the Economist), see a few of the oldest and no-longer-watched Insiders videos, and play with SmartGPT 2.0 (and then 3.0, etc). If everything works out, I could one day have the funds to nab a top-flight journalist - e.g. from the Information, who got the recent Leopold Aschenbrenner leak - and get even more scoops, interviews and content for Insiders. Or just an assistant. Or maybe pay to clone myself, :)

Either way, have a wonderful day.

- Philip

Comments

Is it possible to add Claude 3.5 to the model list, or to allow us to select the newest models?

Pranav S

2024-09-20 12:21:07 +0000 UTC

I tried a few different times and I found that the output was poorer than if I simply used Claude 3 or even Gemini Pro 1.5... sorry. I'm happy to share my chat with you, maybe I'm doing something wrong.

Joshua Davis

2024-05-13 01:29:13 +0000 UTC

I keep getting this error: Message limit is {{maxLength}} characters. You have entered {{valueLength}} characters.

Joshua Davis

2024-05-13 00:53:37 +0000 UTC

Wow! I had time to try it out today! I do a lot of data modeling for digital humanities and digital history. I was able to create (almost) a complete data model, just based on historical sources in plain text as input. I already had some sophisticated prompts, but this is the next stage. Right now, agent-based workflows are a hot topic, aren't they? Your SmartGPT could even be scaled up by using multiple teams of agents based on SmartGPT... what do you think? And today I listened to 4 professors at a panel discussion at my university about "AI and research". All they talked about was how to write papers and that students submit AI texts. At the same time, with the big context windows we already see and the next generation of LLMs and methods/tools like SmartGPT, you can already imagine that whole research processes could possibly be automated. Crazy! :)

Christopher Pollin

2024-04-25 14:44:28 +0000 UTC

Thanks Philip, I meant stored in clear on your side. Just trying to assess what type of test we should or not perform. thanks!

Robert Gomez-Reino

2024-04-24 11:30:40 +0000 UTC

Hey Robert, yes if you click save to template top right that will store modified system and researcher/resolver prompts! Trying to keep things fairly user-friendly, while still allowing customisation.

Philip

2024-04-24 10:34:40 +0000 UTC

Thank you John! Keep us updated.

Philip

2024-04-24 10:30:30 +0000 UTC

My feeling is that these agentic approaches with GPT-4-like capabilities need to be balanced with speed. I want it to be fast enough (really a few seconds) to be able to steer results and repeat things. We've put substantial effort in previous months in collaboration with CERN, and the amount of work needed to align workers, resolvers and reviewers to any general (in our case engineering) task becomes increasingly difficult. You often end up waiting for minutes to get an answer that is not aligned with the project patterns or is just wrong. I am now convinced that shorter (but much faster) interactions, even if only OK 60% of the time, are much more productive because you can steer, request corrections, etc. and thus have a realistic working flow. More agentic could still be OK for a more restricted set of results you can tune the agents for, IMHO. Otherwise it's also very interesting to keep researching this flow, so when the next thing (GPT5?) comes we just need to plug it in :D

Robert Gomez-Reino

2024-04-24 07:28:03 +0000 UTC

Philip, would our prompts be stored somewhere when testing this tool? :D We have our own tailor-made version and I would love to check how yours improves our chain-of-thought architecture. Our current one requires large system prompts (like 20k tokens or so). I see we don't have this, as I assume you are providing your agents with your own system prompts. Do you think it would be difficult to provide this large context information in user prompts instead?

Robert Gomez-Reino

2024-04-24 07:21:43 +0000 UTC

Kudos Philip, Yannick, Josh. My use case comes from my teaching. I recently gave a two-question quiz (in a Microsoft Form) to my students to "give me something true (and non-trivial) about the Big Bang" and "give me something false (and non-trivial) about the Big Bang". I input a short rubric plus a student's response, and I'm impressed that SmartGPT can analyze student answers in-depth and give a total score which is consistent with the rubric. Well done!

John Weisenfeld

2024-04-17 09:31:29 +0000 UTC

Hi Philip, still not working, let me try other browsers. I use Brave, which can be temperamental sometimes.

Kol Tregaskes

2024-04-15 19:29:48 +0000 UTC

Yes, I did! Thanks a lot. Was just extremely busy with a new product release until Friday & slept through the weekend. Will answer later today!

Leander Maerkisch

2024-04-15 13:17:51 +0000 UTC

Thank you! And trust me, I am as ambitious for it as you are with these improvements, all while keeping in mind an average user who has never seen an API before.

Philip

2024-04-15 12:49:53 +0000 UTC

Thank you Sean.

Philip

2024-04-15 12:49:02 +0000 UTC

Thanks Leander! Hope you got my earlier email re: Alberto!

Philip

2024-04-15 12:48:52 +0000 UTC

Also, it's crazy to think that we're at the vacuum tube stages of genAI - what will the microchip stage look like?

Machiel Reyneke

2024-04-15 09:32:18 +0000 UTC

Thanks for sharing SmartGPT, Philip! I've tested it, and so far super-impressed :) Used Haiku as the assistant with Opus as the resolver and it worked well for a few typical product management use cases. It'll be interesting to see how these agentic workflows evolve - I would personally love to have a blend of Autogen (with custom agents, including a default setup like the SmartGPT "Assistant", "Researcher", "Resolver" pattern), alongside your implementation of N parallel assistants. I guess later the best would be if a model can classify the initial prompt in terms of difficulty and automatically set up all the necessary agents to do all the interactions, and let the harder steps be N parallel agents with resolvers ala SmartGPT.

Machiel Reyneke

2024-04-15 09:31:33 +0000 UTC

Philip, Relieved to hear you avoided a September demise. We’d all be immeasurably poorer without AI Explained and Insiders. You should be immensely proud, not just in explaining AI, but in shaping its future.

Sean Gallagher

2024-04-15 01:21:56 +0000 UTC

Wow, congrats on the launch of this new "product"! The folder structure to store similar conversations is already a UI improvement over the classic ChatGPT interface. Well done & excited to test the waters!

Leander Maerkisch

2024-04-14 20:53:14 +0000 UTC

Thanks so much Dorian, excited for where this will go. I will stick with that phrasing then!

Philip

2024-04-14 13:01:35 +0000 UTC

Any updates Kol? :) As pretty much my earliest subscriber on YT...

Philip

2024-04-14 13:01:14 +0000 UTC

Fantastic, thank you so much for sharing this with us, Philip! By the way, "hype-free AI content" is a very compelling phrasing for me. 👍

Dorian Iten

2024-04-14 06:51:25 +0000 UTC

You'd be surprised what use cases it is good at! Quite a few! So glad to hear Steve :)

Philip

2024-04-12 21:12:44 +0000 UTC

I am impressed. It solves my default thinking problem with which I test all models consistently. No model so far could solve it, only the new gpt-4-turbo sometimes gets it right and ChatGPT with browsing and code interpreter is a bit more reliable but still often wrong. What is my default thinking problem: How will the time difference between Munich and Sydney change in 2024? Gives the answer in json format. Start with 01.01: time difference, continue with all dates where the value changes and end with 31.12: time difference. It's a difficult problem, because Germany and Australia change time at different dates and in different directions.

SteveHaupt

2024-04-12 21:08:45 +0000 UTC

Kol and Jon, we are getting error messages about incorrect API keys and insufficient balance with Anthropic, and 1 error for quota limit with OpenAI, so do try again at some point with those fixed! Would love your feedback.

Philip

2024-04-12 19:53:49 +0000 UTC

Thank you so much Christopher, who knows how many are here because of you.

Philip

2024-04-12 19:51:54 +0000 UTC

I tried again without changing anything and it worked. I suspect it's a time-out issue, because when it doesn't work, I see "Initial GPT Answers (3 Asks, Model: gpt-4-turbo)" (Generating response), then blankness for 30 seconds or so, then no output and "Regenerate response". No red error message. It's like it gave a response when it actually didn't. Maybe it didn't have time to generate and evaluate three responses before it timed out. Perhaps there's some way to show the progress of the request in more detail? Anyway, I won't flag any more issues in here as it's not the right place. I'll keep playing around with different length requests. Thanks.

Jon Millward

2024-04-12 19:49:58 +0000 UTC

We will try to get to the bottom of it, looking now through our error messages. Did you re-submit the API keys? I checked and it's working my end. Any red error message up top?

Philip

2024-04-12 19:40:34 +0000 UTC

This is what I see too. It did work once but not since.

Jon Millward

2024-04-12 19:22:22 +0000 UTC

Same here, I'm using GPT-4-Turbo, added my API, asked it a question and now I'm stuck on "Initial GPT Answers (3 Asks, Model: gpt-4-turbo):"

Kol Tregaskes

2024-04-12 19:20:05 +0000 UTC

Oh my. I've been waiting for this for a very long time. Thanks, Philip.

Kol Tregaskes

2024-04-12 19:19:52 +0000 UTC

Ooh didn't finish reading since I'm at work, I can definitely do that.

Dane Wagenhoffer

2024-04-12 18:10:40 +0000 UTC

Working on that, first Gemini 1.5 but then those, yep. Feel free to help too!

Philip

2024-04-12 18:10:04 +0000 UTC

Can you add Cohere or Mistral API support? I have pretty much switched all my scaled LLM use to cohere so it might be interesting to test

Dane Wagenhoffer

2024-04-12 18:04:27 +0000 UTC

Just checked, working for me. If it persists, email via address in Info. Sometimes it times out for really long tasks, with many asks, test it on slightly shorter.

Philip

2024-04-12 17:56:53 +0000 UTC

Haha

Philip

2024-04-12 17:56:40 +0000 UTC

yep, it's working great Philip!

Stefan

2024-04-12 17:54:37 +0000 UTC

{to be read in the most sarcastic tone one could possibly muster up} Philip, if you would simply stop doing research and just believe all the hype out there... All you need to do is take a "FREE 10-minute course on AI", "subscribe to ChatGPT", or "download an AMAZING and FREE open-source model from Hugging Face". You don't need to hire anybody. Just "use AI". Obviously, AI can "easily" simplify every aspect of your personal and professional life, as well as all your job functions. "Hire an AI agent" sit back and let the subscription dollars roll in!

Tony Coffman

2024-04-12 17:53:24 +0000 UTC

It's not outputting any responses for me despite giving it API keys for Claude and GPT, both with credit. I see the credits being used in Usage for both, but no output...hmmm. Will keep trying.

Jon Millward

2024-04-12 17:10:34 +0000 UTC

I'll definitely have to give it a looksee in that case. My only concern would be the, I imagine, ginormous API charge if I roll it out often enough. I do two articles a day, six days a week. Some of these things go up to 10k words. Not all, but some. I guess I'll start testing by throwing in a whopper and see how much it'll cost. Thanks for all your hard work, Philip. You really stand out among the crowd of AI content creators, all without having to publish everyday (on potential nothingburgers) and SHOCKING the audience with your titles!

Norfuer

2024-04-12 16:08:41 +0000 UTC

Thx! Your YouTube channel and Patreon are already the most recommended and quoted resource in all my workshops! :)

Christopher Pollin

2024-04-12 16:03:03 +0000 UTC

I mean, if you share it in the context of promoting the Patreon that sustains it, then I would be more than happy! For most situations that might be in the form of a visual demo from you that they can then play with if they sign up, but for key networkers you feel might really be able to spread the word then a simple link would be effective. So semi-private is how I would best phrase it (if I wanted 100k users, I would just put the link on my channel!). I trust your judgement, this post was more to thank my Patrons and get early feedback before we iterate in the coming weeks.

Philip

2024-04-12 16:00:25 +0000 UTC

I have found it great for summaries. A few individual calls often miss a detail here and there, but rarely after going through the full SGPT process. Obviously might have to adapt the system prompts for your use case.

Philip

2024-04-12 15:59:49 +0000 UTC

For work, I summarise (often lengthy and highly technical) scientific papers for social media. Been using a GPT for it since that launched, but I feel it's not quite as smart as it used to be. The outputs tend to be too vague and lack the meat of what I'd like to be a more in-depth but still comprehensible summary. I wonder if this is a valid use case for SmartGPT, to have it output layman-friendly but accurate summaries of often highly technical material.

Norfuer

2024-04-12 15:58:24 +0000 UTC

Yeah, this system alone would get SOTA on reasoning, but multi-modality is where it gets really exciting.

Philip

2024-04-12 15:51:47 +0000 UTC

cool! i do a lot (really a lot) of genai & prompt engineering workshops for researchers (data modelling, creation, programming, thesis and proposal writing, etc.). having something with more reasoning power would be really cool to do some showcases. is it ok to share the link, or should it stay internal? and if so, how should it be cited :) In other words, is it in your interest to get it out into the world, if others show it at workshops? best, Christopher

Christopher Pollin

2024-04-12 15:48:37 +0000 UTC

Some very impressive first impressions on a variety of questions, very interesting to see how it works with nuanced tasks like creating repair plans from images. Nice Job Philip!

Trenton Dambrowitz

2024-04-12 15:43:34 +0000 UTC

I’ll play around a bit this weekend. Congrats on getting it published.

Nathaniel DiMemmo

2024-04-12 15:12:42 +0000 UTC

Can’t wait to try it out!

Adrian Ott

2024-04-12 15:12:05 +0000 UTC
