
Streaming vs. Syncing: Why Your Chat App Is Burning Bandwidth
Are you looking to build a top-tier AI chat experience on Convex? Then you might want to check out the Persistent Text Streaming component. It solves one of the thornier optimization problems you'll probably run into when you implement chat streaming on Convex. To understand the problem, let's look at an example.

This is a fairly basic AI chat app. We can ask it something: "Please tell me a story." Close enough. And then we'll see the result come streaming in. Now, the question is: what happens if we refresh the page while it's streaming in? Let's try again. "Please tell me a story," and then we refresh the page. Uh-oh. The stream has stopped. Nothing is being persisted to the database, so it's not a particularly great user experience. Ideally, the stream would continue streaming in, and when we refresh the page, we'd still see what the message was.

To solve this, we can leverage Convex by persisting the stream to the database. Let's have a look at a diagram to explain what I mean. Right now, the user makes a request to the server, the server makes a request to OpenAI, and the response comes back as a stream that's relayed to the client. So chunks of data flow from OpenAI to the server and back to the browser. Now we're going to add Convex into the mix. The user still makes a request to the server, and the server still calls OpenAI, but instead of pushing the stream down to the user, we push it into the Convex database. And then, because of the way that Convex works, Convex automatically syncs the updates down to the user. Let's have a look at what that looks like.
So now if we ask the question, "Please tell me a story," we'll see it start coming in. And if we refresh midway through, it's continued to stream in, and the text is right there. We can even do a split screen. If we clear the chat and do it again, "Please tell me a story," you'll see the response come in on both sides.

There is an issue here, though. If I close down this side, open the inspector, go to the Network tab (maybe after a refresh), and look for "sync", we find the WebSocket. This is the Convex WebSocket; it's how the automated updating happens. Now if I clear here, clear the chat, and ask again, "Tell me a story," and send it, we're going to see a lot of messages come in. If we click on one of these and inspect it, we see a modification of type "query updated" along with the text of the story, and it's the entire text. If we have a look at another one, a little bit later, and inspect its modification value (let me just zoom in a bit so you can see), we can again see the entire text of the story coming in.

Let's go back to the diagram to see why this is an issue. What's happening is that when we start streaming, OpenAI sends a chunk down to the server, and we save that chunk into the database, but we insert it into the same row. As another chunk comes in, we don't add a new row; we merge it into the existing one. So the row says "This is a story", then when the next chunk comes in it becomes "This is a story about a cat called", and so on. And of course, each time this row in the database gets updated, it gets pushed down to the client in its entirety.
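A minimal sketch of what that naive server-side approach looks like as a Convex mutation. The `messages` table, its `body` field, and the `appendChunk` name are hypothetical; the point is that every chunk re-writes the same document, and Convex's reactivity then pushes the whole document to every subscriber.

```typescript
// Naive approach (sketch): each incoming chunk is appended to the same
// message document, so every patch re-sends the whole growing body to
// all subscribed clients.
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const appendChunk = mutation({
  args: { messageId: v.id("messages"), chunk: v.string() },
  handler: async (ctx, { messageId, chunk }) => {
    const message = await ctx.db.get(messageId);
    if (!message) throw new Error("Message not found");
    // The entire (growing) body is written back each time, and Convex
    // syncs the full document down to every subscribed client.
    await ctx.db.patch(messageId, { body: message.body + chunk });
  },
});
```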
So that's why we're seeing the entire message being sent down each time. That's a lot of data to be sending for this particular use case, where we're doing high-frequency updates.

So how did things work before we added Convex? I've reverted the code back to the version we had before, with plain HTTP streaming. Let's see what happens in the inspector this time. We ask, "Tell me a story," press enter, and we get this chat-stream HTTP response. If we open up the Response tab, we can see it coming in word by word, and on the client side we're just building up a string from each of those chunks as it arrives. Makes sense, right? This massively reduces the amount of bandwidth required. Rather than re-sending the entire message each time it grows, we send just the new chunk, so the total bandwidth grows linearly rather than quadratically.

So what can we do here? What we really want is a combination of both approaches. We want to keep that two-way communication and streaming between client and server, but we also want to persist the text into the database at the same time. We need something like, say, persistent text streaming. And that's exactly what this component does: it streams the text back to the user while simultaneously persisting it into the database. Let's take a look at it in action.

Okay, let's try it out: "Tell me a story." If I press send, we get that HTTP streaming goodness. But then if we refresh the page, hey, it's still there, and the message finished off. And now let's go side by side and do it again: "Tell me a story."
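Here's the rough arithmetic behind that linear-versus-quadratic claim, as a small standalone example. With n chunks of roughly c characters each, re-sending the full message on every update transmits c · (1 + 2 + … + n) characters in total, while sending only the new chunk transmits c · n.

```typescript
// Total characters transmitted when the whole (growing) message is
// re-sent on every chunk: c * (1 + 2 + ... + n) = c * n * (n + 1) / 2.
function fullResendTotal(n: number, c: number): number {
  return (c * n * (n + 1)) / 2;
}

// Total characters transmitted when only the new chunk is sent each time.
function perChunkTotal(n: number, c: number): number {
  return c * n;
}

// For a 500-chunk story with ~20-character chunks:
console.log(fullResendTotal(500, 20)); // 2505000 characters
console.log(perChunkTotal(500, 20)); // 10000 characters
```

A 250x difference for a single medium-length response, and the gap widens as the message grows.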
Notice that on the left-hand side we're streaming in word by word, but on the right-hand side we're streaming in sentence by sentence, or paragraph by paragraph. What that means is we're reducing the number of updates coming in. If you've got another tab open, or somebody else is looking at the same chat, they'll get fewer updates, while the person who sent the message gets a really nice responsive experience, and we minimize the bandwidth we send to them by streaming word by word only to that one connection.

So, how does all this work? Let's hop into the code and I'll show you. This is a pretty standard React + Vite app. At the top level we have the App component, and inside it this ChatWindow component, which is the part that also contains the message box. If we go down to the message box, here's the button and the input field, and then the form's onSubmit handler. What that does is call this sendMessage function, which is a handler from our useMutation hook. If you're not familiar with Convex, a mutation is a serverless function that lives in the cloud and is transactional, and all that kind of stuff, but you don't need to know that for now. The first thing sendMessage does is call into the streaming component to create a stream. The streaming component is, obviously, provided by the Convex Persistent Text Streaming component library, and I should probably mention how we set it up: in convex.config.ts. This file is named by convention, and this is how we add a component in Convex: we define an app and then add the component to it.
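That convex.config.ts setup can be sketched like this, following the component's documented install pattern (the import path comes from the `@convex-dev/persistent-text-streaming` package):

```typescript
// convex/convex.config.ts -- register the component with the app.
import { defineApp } from "convex/server";
import persistentTextStreaming from "@convex-dev/persistent-text-streaming/convex.config";

const app = defineApp();
app.use(persistentTextStreaming);
export default app;
```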
So then we can create the API to access that component, and inside our sendMessage function we can call createStream. That gives us a response stream ID, which is the way the component manages the chunks that get added to the database.

Maybe we should have a quick look at what this looks like in the Convex dashboard. The dashboard lets us see all the data in the database. We only have one table, the user messages table, which just lists the messages and the response stream IDs, like we just saw in the code. But if we click this little dropdown and make the panel a bit bigger, we can select the persistent text streaming entry, which is the component's own namespaced slice of the database. There are two tables here: streams and chunks. Streams are how the component aggregates the chunks together, and chunks are what actually contain the data. You can see part of the story here, and the way we get the entire message is to ask for all chunks that match a given stream ID.

So if we hop back into the code now: that's what happens when we hit sendMessage. We ask the component to create a stream, it gives us a stream ID, and we're ready to insert chunks into the database. We also insert a message into our own table so we can keep track of what the user said. Then, back on the client, we use the Convex useQuery hook with this listMessages query, which just returns every message in the users table. Once we have the messages, we can go down here and iterate over them. Each message is a user message plus the response to that message.
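Putting those two steps together, the sendMessage mutation looks roughly like this. This is a hedged sketch based on the component's README: the `PersistentTextStreaming` class and `createStream` are the component's API, while the `messages` table shape and field names here are assumptions.

```typescript
// convex/chat.ts (sketch): create a stream via the component, then
// record the user's message alongside the returned stream ID.
import { mutation } from "./_generated/server";
import { components } from "./_generated/api";
import { v } from "convex/values";
import { PersistentTextStreaming } from "@convex-dev/persistent-text-streaming";

const streaming = new PersistentTextStreaming(components.persistentTextStreaming);

export const sendMessage = mutation({
  args: { prompt: v.string() },
  handler: async (ctx, { prompt }) => {
    // Ask the component for a new stream ID; chunks appended later are
    // grouped under this ID in the component's own streams/chunks tables.
    const responseStreamId = await streaming.createStream(ctx);
    // Keep our own record of what the user said.
    await ctx.db.insert("messages", { prompt, responseStreamId });
    return responseStreamId;
  },
});
```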
So the first part will be whatever the user sent, where isUser is true, and the second part will be the response to that message, where isUser is false, and there we pass in this ServerMessage component as a child. If we open that up, we see that the real meat of this component is this stream hook. What we pass to it is a Convex query, getStreamBody, which, given a stream, uses the component's own getStreamBody function. And what that function does is just what I said before: it looks in the chunks table, returns every chunk that matches the stream ID, and collapses them together into a single string, the body of the stream. The second parameter is the URL that actually kicks off the streaming process. This is a URL rather than a Convex function because it's an HTTP endpoint.

So let's have a look at the http file. Again, if you're not familiar with Convex, this is how Convex defines HTTP endpoints that you can call from the client side: in a file named http, we export an httpRouter and define endpoints on it. One of them is the OPTIONS endpoint, to handle CORS preflight requests, and the second is this POST endpoint. If we open up its handler, we can see it's an HTTP action that takes the request body, which contains a stream ID, and then calls into the streaming component to run the main body of the stream, which is where we call OpenAI and tell it we want to start streaming.
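A sketch of that HTTP side, following the component's documented pattern. The `/chat-stream` path is an assumption, the OPTIONS/CORS route is elided, and the body of the callback stands in for the real OpenAI call; `stream()` and `append` are the component's API as described in its README.

```typescript
// convex/http.ts (sketch): a POST endpoint whose handler delegates to
// the component's stream() helper. Every append() call is echoed into
// the HTTP streaming response AND persisted to the database in batches.
import { httpRouter } from "convex/server";
import { httpAction } from "./_generated/server";
import { components } from "./_generated/api";
import { PersistentTextStreaming, StreamId } from "@convex-dev/persistent-text-streaming";

const streaming = new PersistentTextStreaming(components.persistentTextStreaming);

const http = httpRouter();

http.route({
  path: "/chat-stream",
  method: "POST",
  handler: httpAction(async (ctx, request) => {
    const { streamId } = (await request.json()) as { streamId: StreamId };
    return await streaming.stream(ctx, request, streamId, async (ctx, req, id, append) => {
      // Call the model here (e.g. OpenAI with stream: true) and forward
      // each token as it arrives: `await append(chunk)`.
      await append("Once upon a time...");
    });
  }),
});

export default http;
```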
And then, as the stream comes back, we take each chunk and call this append function that the streaming component gives us, and the component returns an HTTP streaming response. When we call append, the component immediately forwards that chunk as part of the HTTP streaming response, but it also persists it into the database in an intelligent way: per sentence, or per paragraph, or whatever it's configured to.

So then, back in the ServerMessage component, we get back the text and the status, and you'll notice this third parameter, driven. Driven means: are we the one that started this stream off? Are we the driver of this streaming, or not? And if we are, this is the magic: we use the HTTP streaming response, because we're the one that started it and we have that HTTP connection. Otherwise, we're another tab that's open, or we're ourselves in the future after a refresh; we're not the driver, and we rely on Convex's synced version of the data. As for rendering, it's pretty simple: we just render it as markdown.

So that's about it for the most important code. Again, the main thing the component does is manage that switch between "are we the one driving the chat, and therefore reading the HTTP stream" versus "are we somebody else in another tab, and therefore reading the persisted, per-sentence or per-paragraph version".

So there we have it. If you're planning on building a streaming AI app using Convex, you might want to take a look at the Persistent Text Streaming component. I do want to mention here that I would consider this kind of an optimization.
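The driven switch described above can be sketched on the client like this. The `useStream` hook and its argument order follow the component's README; the query name, the endpoint URL (including the `<deployment>` placeholder), and the rendering are assumptions for illustration (the real app renders markdown).

```typescript
// ServerMessage.tsx (sketch): driven=true only in the tab that sent the
// message, so only that tab consumes the HTTP stream; everyone else
// gets the database-synced, per-sentence/per-paragraph version.
import { useStream } from "@convex-dev/persistent-text-streaming/react";
import { StreamId } from "@convex-dev/persistent-text-streaming";
import { api } from "../convex/_generated/api";

export function ServerMessage({ streamId, driven }: { streamId: StreamId; driven: boolean }) {
  const { text, status } = useStream(
    api.chat.getStreamBody, // query that collapses the chunks into one body
    new URL("https://<deployment>.convex.site/chat-stream"), // HTTP streaming endpoint
    driven, // are we the tab that kicked this stream off?
    streamId
  );
  return <div>{status === "error" ? "Something went wrong" : text}</div>;
}
```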
You can easily get quite far by just streaming the data straight into the Convex database, without the HTTP streaming part. It won't be as efficient, but it will work great, and I've used it plenty of times in the past. If, however, you're planning a more advanced agentic chat experience, you might want to subscribe to the channel, as I have another video coming up that goes into a lot more depth on another Convex component, an agent component that not only does persistent text streaming but also helps with agentic abilities. In the meantime, if you're looking for something to watch next, I can recommend this video I did recently about an advanced agentic app I built using Convex. It was a heap of fun. Until next time, thanks for watching. Cheerio.
If you’re a developer building an AI chat app and want to ensure seamless user experiences—even during page refreshes or across multiple tabs—this video is for you. It introduces Convex’s Persistent Text Streaming component, a solution designed to handle real-time streaming and data persistence challenges in AI chat applications.
The Convex component to get started: https://www.convex.dev/components/persistent-text-streaming
🔍 What You’ll Learn
• How to implement real-time AI chat streaming using Convex.
• Techniques to persist chat data, ensuring continuity across sessions and devices.
• Strategies to optimize bandwidth and reduce redundant data transmission.
• Integration of Convex’s serverless functions and database for efficient backend operations.

By the end of this video, you’ll have a clear understanding of how to build a robust, real-time AI chat application that maintains state and performance, even under challenging conditions.
Convex is the backend platform with everything you need to build your full-stack AI project. Cloud functions, a database, file storage, scheduling, workflow, vector search, and realtime updates fit together seamlessly.