a year ago

Claude 4 is here but is it good at Convex?

News day today in AI. Claude 4 has officially dropped and it's supposed to be awesome. I mean, you can't always take a company's word when it comes to benchmarks, but these graphs sure do look impressive. Its pricing is equally impressive, and by impressive, I mean expensive. But how does it perform on actual day-to-day tasks rather than just synthetic benchmarks? Will it be able to unseat the previous versions of itself and take the crown at the top of the Convex leaderboard? That's what we want to find out today. So, grab yourself a lovely cup of tea and let's dive in.

All right. So, to find out what Claude 4 is like, I'm going to try two different things. I'm going to try a Chef project, and I'm also going to try my own little benchmark project that I do with tic-tac-toe.

So, let's get things started with Chef. The team's worked hard and managed to integrate Claude 4 into Chef already. So I'm just going to pick Claude 4 Sonnet and we'll try the Instagram clone. We'll kick that off. At the same time, I'm going to see whether we can run both of these side by side and get it to do an Instagram clone using Claude 3.5 as well. Okay, so they're both running now.

All right. The next thing I'm going to do: I've opened up my tic-tac-toe project here. The idea with this is that I already have this tic-tac-toe game that runs locally, and what I want the AI to do is upgrade it so that, instead of using local storage, it uses Convex for the back end. Let me quickly show you the project in action. Mike. Create new game. In a new game you can add an AI and play against it. But the problem is that this is all local. So if we open it up in another tab and create a game, it's not going to show up in the other one, because they're in two entirely separate worlds.
So what I've done is I have my project running (go away, terminal), I have installed Convex myself, and I already have the dev server running. I used to find that it would get stuck running dev servers in Cursor. I'm not sure whether that's still the case, but I don't want to mess with it now.

One thing I'm going to do before we stick the prompt in is get the Convex rules in here, because that usually does make a fairly big difference. So, I'm going to create the folder .cursor/rules and then I'm going to drop the Convex rules into there. We get the Convex rules from the Convex docs: in the sidebar, go to AI codegen, and then grab the rules here for Cursor. All right. And then I just drag them over to the rules folder. Get rid of that one. And we should be good to go now. So, auto attached: we can say always. We'll just set it to always, to make sure it's always attached when we're doing this one.

Okay. So, now let's drop in the prompt. I have the prompt here that I always use, so I'm just going to stick that in. We're going to make sure in Cursor that we select Claude 4 Sonnet. It's already updated, so we have Sonnet and Opus here, but I'm just going to use Sonnet. Drop the prompt in. So, the prompt is: "Convex: make the state run on the server. I've already installed Convex in the project and have the Convex dev server running in another terminal window. To do this, we're going to first create a schema. Please look at the existing state in the app and work out what types need to go in the schema. Before designing the schema, please take a look at the core functionality of the app. We're going to create the functions. This is queries, mutations, and actions as needed. We are then going to upgrade the React client to use these new queries, mutations, and actions." So, I probably wouldn't typically do this in a project.
In fact, I'm going to set it off while I talk. But I probably wouldn't typically do it this way if I was doing this for real. This is actually quite a lot of work for an LLM to do, so it's a good test of an LLM, just to see how well it copes with a lot of stuff together at the same time. But normally, if you want the best results, I would break this up into separate steps. I would be like, "Okay, let's work out what the schema is going to be, and then let's work out what the functions are going to be," and then we'd do checks at each step along the way. But as this is me really trying to test the limits, I like this one for that.

Okay, let's hop back over to Chef and see how it's doing. Oh, okay. So, I think this was the 3.5 one. Yeah, this is the 3.5 one. It's finished. And this is the... oh, it's saying 3.5. I think this one was 4. Hm, I think this one was the 4 one, the top one. But let's try signing in anonymously here. This will actually be a good test. I'll go back and have a look when we finish, because I don't know which one's which, so there's no bias here. I'm going to go back and find out afterwards which one's which.

But okay. So we've got that. We can drag and drop an image in here. So let's find a picture of me. Pictures... where's pictures of me? Photos of me. My handsome self. There we go. We'll drop a picture of me in there. Is it uploading? Who knows? Did that work? Hey, image uploaded. There we go. Oh, look at that lovely feature. Great. And then "my photos": it shows there. Fantastic. And obviously this should be Convex, so we should be able to pop it out and sign in anonymously again. And if I drop in another photo of me, this one, the troll face, which maybe is a bit smaller... Great. We see the troll one in there. And then in my photos we just see the one that I've uploaded.
And does the liking work? Oh, the liking worked as well. Fantastic. Great. Nice. So, whichever one that was (I'll get rid of this preview now), it did a really, really good job.

Let's have a look at this one. Sign in. Okay. So, visually it already looks a little bit different. Upload. Okay. Interesting. This one's doing a caption as well. So, let's try uploading that one into there. Okay. Nice. "Troll". There we go. Photo uploaded. My photos. Great, it's showing there, and the stream's showing it as well. And we've got a caption. Nice. I prefer this one. Let's just double check: pop it out and make sure we can sign in as a different user and upload something else. Drag and drop it in again. Okay, drag and drop that one this time. "Black and white". My photos. Nice. Good. Good. I like it. They both did a really good job.

Now, let's add something to this. What should we add? Let me think. On this one, I would like it so that, "After I have uploaded a photo, please immediately take me to the stream." We'll see how that does. And then we'll go back to this one. We'll close down that preview, and on this one, what should we add? Should we add the captions? "When I upload a photo, it should also ask me for a caption." Okay, we'll see whether it does that. I think it's going to struggle with this one if it hasn't set the schema up to include captions, which it hasn't. I'm looking at the Convex schema now: it has likes and it has images. Oh, it has a description. So maybe it'll use description for the caption. We'll see.

Okay, let's hop back into Cursor and see how it's doing. Oh dear. Okay. See, it's trying to run the PowerShell command to run the dev server, and I've already told it several times, I told it in the prompt, that it doesn't need to do that.
This is the thing that frustrates me a little bit sometimes with Cursor: it gets stuck sometimes with a Convex dev server. So I tell it I'm going to run it myself, but then it insists on running it itself as well. So it's like, ah, come on.

Okay, let's go back down here. So what's it done? "I have a good understanding of the app structure. Let's create the Convex schema." Cool. So, let's check out the Convex schema that it's created. We've got players, we've got games. Okay, looks reasonable. "Now, let's create the main Convex functions." So, let's go into games and have a look. Yeah, looks reasonable, except it seems to be struggling on... oh, I think that's just because it's not being used, but it looks like it's imported. It's correctly understood that it has to pull from the generated server code, which models don't always understand, particularly older ones.

"Now, let's set up the Convex client in the React app." Okay, so it's done that. "Update the types to use Convex IDs." Great. "Create a new Convex-based game state hook." Fantastic. So it's done a good job there. "Update the routes." Great. "Now let's check... update environment." So, it didn't need to do this. I'd say this is an issue. It couldn't find the .env.local in the workspace. It's there. It's here. But the problem is, I believe Cursor cannot see .env files. This is to protect your secret keys and so on, I assume, so that if you put secret keys inside a .env file, it doesn't upload them. But because of that, Cursor thinks that we haven't got Convex running. So it's a bit of an issue. Maybe we, as Convex, need to fix this. We need to put it inside the rules, or find some other way for it to realize that Convex is installed and running. Actually, it should just be able to check package.json for it being installed, but I guess it doesn't know if it's running.
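
The package.json check mentioned here can be sketched in a few lines of TypeScript. This is only an illustration: the function name `hasConvexDependency` and the sample package contents are hypothetical, and, as noted above, a check like this can say whether Convex is installed but not whether the dev server is running.

```typescript
// Hypothetical helper: returns true when the given package.json text lists
// `convex` under dependencies or devDependencies.
function hasConvexDependency(packageJsonText: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(packageJsonText);
  } catch {
    return false; // not valid JSON, so nothing can be concluded
  }
  if (typeof parsed !== "object" || parsed === null) return false;

  const pkg = parsed as {
    dependencies?: Record<string, string>;
    devDependencies?: Record<string, string>;
  };
  // Merge both dependency maps and look for the `convex` key.
  const deps = { ...(pkg.dependencies ?? {}), ...(pkg.devDependencies ?? {}) };
  return Object.prototype.hasOwnProperty.call(deps, "convex");
}

// Made-up package.json contents, for illustration only.
const examplePackageJson = JSON.stringify({
  name: "tic-tac-toe",
  dependencies: { react: "^18.0.0", convex: "^1.0.0" },
});

console.log(hasConvexDependency(examplePackageJson)); // true
console.log(hasConvexDependency("{}")); // false
```
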
It'd be nice if Cursor could inspect... oh, it's not running. Hang on. No, no, no, it is running. It is running. It'd be nice if Cursor could inspect the terminal and check what's actually running in it. But there may well be security reasons for not doing that.

So, where did it actually fall over? It tried to run dev, but I'm already running it, so it shouldn't need to do that. Let me run the PowerShell. So, I think it has just bombed out here. Convex is showing me an error: document ID whatever does not match the schema, object is missing the required field. That could be because I have run this demo before and it's struggling with the data that's already there. Let me open the Convex dashboard. The way that I usually open the Convex dashboard is I just type bun convex dashboard and it gives you a link that you click. But do I have it open here already? No. Let's open it up in the browser and have a look. Is there any data in these tables already? Ah, okay. So I have some players here. Let's delete these players out of here, and the games out of here. Okay, so there's no data in there. Yep, there we go. So Convex is now able to upload, but I don't know why it's erroring here: "Could not find public function. Did you forget to run...?" Let's just try stopping it and running it again. There we go. Okay, it's all good. I don't know what was happening there, but it's all good now.

So, is this completely finished? Let's try hopping into the actual app. Can we refresh? Refresh. Refresh on this side as well. Why is this side not refreshing? Refresh. This side's errored. Okay, let's just try this side first. Let's create a game. Can we create a game? No. Okay. Why is that erroring? Let's have a look at the console. We've got errors here: object is missing the required fields. All right. Okay. We'll copy that and stick it into Cursor. Let me just make sure I'm definitely refreshed. Create.
Yeah, that's what we get. We get that error. So we'll copy that error and stick it in here. Maybe it didn't finish. I'll just tell it: "You don't need to run Convex. I'm already running it. I'm getting this error at runtime when I try to create a game."

Tell you what would be nice (I'm sure they're working on this): a way for it to link into the Chrome console, so it's able to proactively see the errors and do stuff about them, so I don't have to do this copy-pasting. There may well be an MCP server, now I think about it, that will do this for me. Anyway, leave me a comment down below if you know a nice way to do this inside of Cursor.

Okay, so it looks like Cursor has found the issue and is going to fix this for us. While it's doing that, let's hop back over to Chef and see what's going on here. "After successfully uploading the photo, you'll be automatically taken to the stream. Added the upload success..." Okay. So let's try uploading another photo and see whether it's going to hop us back into the stream. Pictures, photos of me. That one. That may have been a big photo. Hello. Now, ah, it should have taken me back to the stream. "After successfully uploading a photo, you'll be automatically taken to the stream." Maybe... does it need a refresh? Hm. Let's have a look. Try again. We'll try me with a shaved head. "Shaved". Okay. Is it going to take us to the stream? It did. Okay. So apparently it needed a refresh. Interesting. But it did it. It followed the instructions. So whichever this bottom one was, whether it was 4 or 3.5 (I do apologize again, I've got a terrible memory; I will check after this and follow up), it did a good job.

So let's have a look at what this one's done. I asked this one to do the caption. "Now when you drag and drop an image, you'll see a preview of the image, a text input for adding a caption, and share and cancel buttons." Okay. All right. So, let's have a look. All right.
So we have to drag and drop on this one. Let's drag and drop this guy. I didn't see... ah, maybe I need to refresh again. Let's try the troll one. It's a bit small. Is it going to give me the... oh, it did. I don't know why it's requiring a refresh there. The hot reloading's obviously not working great. "Troll". So now we can share. Great. I mean, both of them did a really good job there. So I guess Claude 4 isn't worse than 3.5, is my conclusion.

With that, let's hop back into Cursor and see how it's doing. I don't know why that's open. Go away. And it's still going. Okay. So what's it been up to? Uh oh. Oh, you're trying to get around the... what are you doing? It's trying to deploy again. Why? What is it doing? Okay, so I gave it the error and it says, "First, let's fix the Convex function validator errors." Okay. "Now let's remove the old game state." Okay, so it's because we didn't let it finish before. Maybe I wasn't being fair on it. So now it's tidying stuff up. "Fix a potential issue in this." Now what's it doing? Checking this environment variable issue. There isn't one. There isn't. And now it's trying to run... okay, it's trying to get around the fact that it can't look at the .env files. That's interesting. By the way, I have YOLO mode turned on. So I guess if you didn't, and you didn't want it to try and get around its own restrictions, then when you see it running a terminal command like this, you could prevent it. Great: "There is a .env file." So it is going to check the... okay, there we go. So it knows that it can't see it. "So let me try to create the environment variable..." Oh, so now it's going to create it. It's getting really confused. Okay, maybe I need to tell it: "Trust me, bro. The Convex .env file is there. You just can't see it. Move on and finish fixing the things."

Interesting. It's doing thinking: "planning next moves". So, by default, Claude 4...
Oh, it said "thought for 4 seconds", "8 seconds". So, by default, Claude is doing a thinking step, which is kind of cool, actually. Interesting. So it found the issue: "Likely when the current player is restored from local storage, it might not have the proper structure. Let me fix this." Interesting. Maybe it's because of the upgrade process. Before, we were saving the player to local storage, so it's still there, and now we've done the upgrade to Convex. Convex requires some other fields, but we're pulling the player from local storage from the previous version. So probably what we should have done, to be fair to it, is cleared local storage. But it was able to work that out.

Okay. So I guess it's done. Let's give it another crack. Let's go over to our project and create a game. "Missing ID, player Mike." Let me try refreshing. Create game. No. You know what? I'm going to be fair. I'm going to wipe the storage here. So let's go to local storage, clear all that, and try again. Okay. "Mikey", and see what this does. Create a game. Oh, and another error. Hm: "Returns object contains extra field _creationTime that is not in the validator." All right. Well, that's a shame. I think we could make this work, but I'm going to have to leave it here because this video is going to go on too long.

The conclusions. So, I just went back and had a look, and the first Chef one that we did was the one with the captions straight off the bat, and I actually think it was probably the one that I preferred the most. And the second one we did, which didn't have the captions until we added them later, was Claude 3.5. So I think 4 did better. It followed the instructions more. But that could well be run-to-run variance, because I have done this quite a few times in the past and I get different apps each time.
But in general, I think both of them did a really great job, and I think it's a testament to the Chef team continuing to improve the way that the AI is able to build stuff in Chef. It just did a really good job, and obviously it's got a back end built in, with auth and all the lovely stuff that we get.

On the Cursor front, well, I may have given it a bit of a hard task to begin with. I probably would normally split it down into separate steps, so I'm being a bit unfair to it, and there may also have been issues with me not clearing local storage. So I'm going to say the jury's out on that one. I think I will have to try it over the next couple of weeks to see definitively how Claude 4 is going to do in Cursor.

So, I did tease at the start that we were going to have a look at the leaderboard, so we should probably do that. At the top we have Claude 3.5 Sonnet, still the undisputed number one champion with 82.4%. Then Claude 3.7 Sonnet, its older sibling, at 77.3%, and the new Claude 4 Sonnet at 76.5%. So it's still a very good model, but not top of the leaderboard. Claude 3.5 Sonnet still does the best on Convex. I don't know, is anything going to be able to top that?

All right. Well, thanks for watching, guys. If you want to check out another video a bit like this one, you might want to check out this video where I went through and compared Chef to a bunch of other vibe coding tools like Bolt and Lovable. It was a fun video, so yeah, check it out. And if you have any comments, please do leave them down below or come find me on Discord. Until next time, cheerio.
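
Going back to the last error in the Cursor run (the `returns` object containing an extra `_creationTime` field), the behaviour can be illustrated with a simplified strict validator. This is a toy model written for this note, not Convex's implementation: `validateStrict`, the `Shape` type, and the sample game document are all hypothetical. The only point is that a strict object check rejects any field the validator does not list, including system fields such as `_id` and `_creationTime`.

```typescript
// Toy stand-in for a strict object validator. NOT Convex's implementation;
// it only mimics the "no missing fields, no extra fields" rule.
type FieldType = "string" | "number";
type Shape = Record<string, FieldType>;

function validateStrict(shape: Shape, value: Record<string, unknown>): string[] {
  const errors: string[] = [];
  // Every field listed in the shape must be present with the right type.
  for (const key of Object.keys(shape)) {
    if (!Object.prototype.hasOwnProperty.call(value, key)) {
      errors.push(`missing required field ${key}`);
    } else if (typeof value[key] !== shape[key]) {
      errors.push(`field ${key} is not a ${shape[key]}`);
    }
  }
  // Any field the shape does not list is rejected.
  for (const key of Object.keys(value)) {
    if (!Object.prototype.hasOwnProperty.call(shape, key)) {
      errors.push(`extra field ${key}`);
    }
  }
  return errors;
}

// Hypothetical game document as it might come back from the database,
// including the system fields.
const storedGame: Record<string, unknown> = {
  _id: "game1",
  _creationTime: 1716400000000,
  status: "waiting",
};

// A return shape that forgets the system fields, and one that lists them.
const tooNarrow: Shape = { status: "string" };
const withSystemFields: Shape = {
  _id: "string",
  _creationTime: "number",
  status: "string",
};

const narrowErrors = validateStrict(tooNarrow, storedGame);
const fullErrors = validateStrict(withSystemFields, storedGame);
console.log(narrowErrors); // reports _id and _creationTime as extra fields
console.log(fullErrors); // empty: every field is accounted for
```

In an actual Convex function, the usual options would be to include the system fields in the `returns` validator or to return only the fields it lists. Treat that as a general pointer rather than the fix that was needed in the video, since the run stopped before the error was resolved.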

Claude 4 just dropped from Anthropic—and it’s already sparking debates among developers building full-stack apps with modern backends like Convex. This video tests Claude 4’s real-world dev performance in two live builds: spinning up an Instagram clone in Chef and upgrading a local Tic-Tac-Toe game with Convex as the backend.

If you’re a developer comparing Claude 4 vs Claude 3.5 for agentic codegen, backend integration, or app workflows with Convex + Cursor, this is the walkthrough to watch. From schema generation to real-time photo uploads and environment variable debugging, you’ll see where Claude 4 shines—and where it still struggles compared to its predecessors.

This video doesn’t just look at benchmarks—it shows you how Claude 4 handles practical backend and fullstack coding tasks. You’ll see what works, what breaks, and where it still lags behind Claude 3.5, especially when Convex is in the mix.
