Shop Talk: Building an AI-Powered Voice-Controlled Shopping Agent with Daily Bots and Convex
Does anyone else feel overwhelmed by the flood of new AI projects and tools these days? I know I do.
I've found the best way to avoid analysis paralysis is to just dive in, start tinkering with a new tool and build something with it.
With that in mind, I'd like to show you a little demo I put together that combines a cool new AI tool called Daily Bots with Convex. It's a collaborative shopping list / todos application that you can control entirely with your voice.
Check out the demo video below, and if you're interested, I'll walk you through how I built it.
Before we begin, I think it might be useful to give a high-level view of what's going on here.
The Shop Talk app uses a Daily Bots client to enable voice controls through various services. This client triggers function handlers in our client-side app, which execute tasks on behalf of the LLM. These function handlers then communicate with Convex, which manages our state and ensures synchronization across all Shop Talk instances.
Don't worry if this isn't completely clear yet; I'll explain in more detail below.
Voice Control & Daily Bots
The headline feature of Shop Talk is the ability to interact with it entirely using your voice.
There's a seemingly endless array of libraries and services that provide this functionality. Since it's honestly quite overwhelming, I went with one that a friend had success with: Daily Bots.
Daily Bots (Pipecat under the hood) is an all-in-one platform for voice-based interactions. It combines three essential components of a voice-controlled AI assistant:
- Speech To Text (STT) - converts your spoken words into text
- LLM "brain" - processes the text, understands context, executes functions, and generates responses
- Text To Speech (TTS) - transforms the text response back into spoken words
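Conceptually, each conversational turn runs those three stages in sequence. Here is a purely illustrative TypeScript sketch with stand-in functions (none of these are real Daily Bots APIs; in a real deployment each stage calls out to the configured STT, LLM, and TTS services):

```typescript
// Conceptual sketch of the STT → LLM → TTS turn loop that Daily Bots
// orchestrates for you. The three service functions are stand-ins,
// NOT real Daily Bots APIs.
type Audio = string; // placeholder for raw audio data

async function speechToText(audio: Audio): Promise<string> {
  return `transcript of [${audio}]`; // stand-in for the STT service
}

async function runLLM(text: string): Promise<string> {
  return `response to [${text}]`; // stand-in for the LLM
}

async function textToSpeech(text: string): Promise<Audio> {
  return `audio of [${text}]`; // stand-in for the TTS service
}

// One conversational turn: hear the user, think, speak back.
export async function handleTurn(userAudio: Audio): Promise<Audio> {
  const transcript = await speechToText(userAudio);
  const reply = await runLLM(transcript);
  return textToSpeech(reply);
}
```

The value of Daily Bots is that it runs this loop for you, streaming between stages with low latency, while also handling the interruption and turn-taking details around it.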
Daily Bots handles many behind-the-scenes tasks that you'd otherwise need to manage manually, including conversation history management, call recording, and performance metrics.
To clarify, Daily Bots (Pipecat) doesn't actually implement the STT, TTS, or LLM services itself; it's the orchestration layer that brings them all together. You can configure your choice of services for each component, and their documentation explains this setup process well.
For this project, I used the default services recommended in their documentation:

- STT: Deepgram
- TTS: Cartesia
- LLM: OpenAI (GPT-4o mini)
If I were to turn this into a real product, I would definitely experiment with these services more to find the best combination, but for now these work well enough for this demo project.
Realtime Collaboration with Convex
Before I dive deeper into the specifics of how the voice control works, I want to quickly talk about how Convex fits in here. For this demo I used the following Convex features:

- HTTP Actions
- Queries & Mutations
- Crons
HTTP Actions

I needed HTTP actions for this project because the Daily Bots client works by calling a REST endpoint when it starts up, connecting the user and initiating a session with the STT and TTS services.
```tsx
<RTVIClientProvider
  client={
    new RTVIClient({
      transport: new DailyTransport(),
      params: {
        // this is the root url of our Convex http actions
        // for example: https://adorable-grouse-876.convex.site
        baseUrl: import.meta.env.VITE_CONVEX_SITE_URL,
        endpoints: {
          connect: "/connect",
          action: "/actions",
        },
      },
    })
  }
>
  <App />
</RTVIClientProvider>
```
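For completeness, the server side of that `/connect` endpoint can be sketched as a Convex HTTP action. This is a simplified illustration rather than the actual Shop Talk code; in particular, the Daily Bots start URL, request shape, and the `DAILY_BOTS_API_KEY` environment variable name are my assumptions and should be checked against the Daily Bots docs:

```typescript
import { httpRouter } from "convex/server";
import { httpAction } from "./_generated/server";

const http = httpRouter();

// Hypothetical sketch of the /connect endpoint the Daily Bots client
// hits on startup. It forwards the bot configuration to the Daily Bots
// REST API and relays the session details back to the browser.
// NOTE: the URL and payload shape here are assumptions.
http.route({
  path: "/connect",
  method: "POST",
  handler: httpAction(async (_ctx, request) => {
    const body = await request.json();
    const res = await fetch("https://api.daily.co/v1/bots/start", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.DAILY_BOTS_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    return new Response(await res.text(), { status: res.status });
  }),
});

export default http;
```

Keeping this hop on the server also keeps the Daily Bots API key out of the browser.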
Queries & Mutations

Queries and mutations are the bread and butter of Convex. They provide fast, type-safe ways to Create, Read, Update, and Delete lists and list items.
I won't show all the code here (you can check it out at the links above), but here's a quick example of how we return all items on a given shopping list:
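As a rough sketch, assuming an `items` table with a `by_listId` index (these names are my assumptions, not necessarily Shop Talk's actual schema), such a query looks like this in Convex:

```typescript
import { query } from "./_generated/server";
import { v } from "convex/values";

// Hypothetical sketch: return all items on a given shopping list.
// Assumes an `items` table indexed on `listId`; the table and index
// names are illustrative.
export const getByList = query({
  args: { listId: v.id("shoppingLists") },
  handler: async (ctx, { listId }) => {
    return await ctx.db
      .query("items")
      .withIndex("by_listId", (q) => q.eq("listId", listId))
      .collect();
  },
});
```

Because Convex queries are reactive, any client subscribed to this query re-renders automatically whenever a matching item changes.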
Crons

To keep the shared demo tidy, a cron job resets the data once a day. This ensures that the publicly shared lists and items don't get out of control over time; I know what you guys are like 😉
The dailyReset function is here if you want to take a look.
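For illustration, the Convex cron wiring for this kind of daily reset can be sketched as follows; the schedule and the function path are my guesses rather than the actual Shop Talk code:

```typescript
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// Hypothetical: run the reset once a day at midnight UTC.
// The function reference is illustrative — the real dailyReset
// may live elsewhere in the codebase.
crons.daily(
  "reset shared demo data",
  { hourUTC: 0, minuteUTC: 0 },
  internal.shoppingLists.dailyReset
);

export default crons;
```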
Daily Bots Function Calling
Now that we have our server-side functionality sorted out, how does the AI actually interact with those queries and mutations?
It works through Function Calling. When you give the AI a command like "Create a new shopping list called 'date night'", it returns both a natural response (like "Sure") and some "hidden" structured text that tells Daily Bots to execute a pre-defined function, in this case create_shopping_list.
The AI (LLM) can do this because we provide the names and structure of these functions as part of its context when making calls.
For Shop Talk we have several functions:
```typescript
import { RTVIClientConfigOption } from "@pipecat-ai/client-js";

export const functionDefinitions = {
  create_shopping_list: {
    name: "create_shopping_list",
    description: "Create a new shopping list with a name and optional items",
    parameters: {
      type: "object",
      properties: {
        name: {
          type: "string",
          description: "The name of the shopping list",
        },
      },
      required: ["name"],
    },
  },
  add_items: {
    name: "add_items",
    description: "Add multiple items to the current shopping list at once",
    parameters: {
      type: "object",

      // ... and the rest
```
These are just the “definitions” for the functions. The code that actually implements what happens when the AI decides it wants to call one of these functions is handled on the client, in FunctionCallHandler.tsx:
```tsx
// Code edited for brevity..

export const FunctionCallHandler: React.FC = () => {

  // Our Convex mutation
  const createList = useMutation(api.shoppingLists.mutations.create);

  React.useEffect(() => {

    // Dailybots callback to let us know the AI wants to call a function
    llmHelper.handleFunctionCall(async (fn: FunctionCallParams) => {
      const args = fn.arguments as any;
      const functionName = fn.functionName as FunctionNames;

      // When the AI wants to call create_shopping_list
      if (functionName === "create_shopping_list") {
        if (!args.name) return returnError("name is required");

        // We call our Convex mutation with the name of the list to create
        const listId = await createList({ name: args.name });

        return returnSuccess(`Created shopping list "${args.name}"`, {
          listId,
        });
      }

      // ... handling the other functions
```
Transcripts
Daily Bots has a nice built-in feature that emits an event whenever the bot or the user speaks, providing the speech as text in the form of a transcript. I (well, mostly Composer) was able to leverage this to produce a nice little transcript in the sidebar.
The transcript feature provides valuable debugging capabilities for voice recognition by showing exactly what the AI heard and how it responded. It also gives users a handy written record of their shopping list interactions for future reference.
While not essential, I thought this feature was cool enough to keep.
Cost Control
After a day of development and a team show-and-tell, I checked the Daily Bots usage report and was surprised by the costs.
So I dug into it a bit more and noticed a bunch of long sessions that were driving up the minutes used.
This isn't surprising, since opening the tab automatically starts a Daily Bots connection. Even when users mute themselves, the connection remains open and counts as a continuous "Session."
At $0.02 per minute, a 10-minute session costs 20 cents. That's not devastating given the functionality, but it's quite expensive from a business perspective; you'd need to be careful about pricing your application and metering usage.
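As a quick sanity check on that arithmetic (using the $0.02/minute rate quoted above):

```typescript
// Back-of-envelope session cost at the quoted Daily Bots rate.
const RATE_PER_MINUTE_USD = 0.02;

export function sessionCostUsd(minutes: number): number {
  return minutes * RATE_PER_MINUTE_USD;
}

// A 10-minute session works out to about $0.20, while a user who keeps
// a session open an hour a day for a month costs roughly $36.
```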
To help control these potential costs, I've reduced the session length from 10 minutes to 3 minutes, with a popup that allows users to reconnect after disconnection.
It would have been nice if Daily Bots offered an option to automatically detect inactivity and disconnect, or pause metering, after a period of time. The cynical side of me suspects they aren't financially motivated to implement this feature, however.
I will continue to monitor the costs, and if they start to get out of control I may have to put in some rate limiting 😅
Tangent: Cursor Composer
Apologies, I'm going to go on a quick tangent here to note that during the development of this demo I made copious use of Cursor's new Composer. It's the first project I've seriously used it in, and I've got to say, I am seriously impressed.
It really sped up the development of the project. Now that it has built-in linting after codegen, it's much more reliable than it has been in the past.
I think I need to dedicate an entire post to talking about just Composer and Convex but for now here are some quick thoughts:
Great for really labor-intensive tasks like “This React component is getting a little large, let's split it up into separate components in separate files”. This saved me so much refactoring time.
I added my own rules, both to the project and as personal rules in Cursor itself; this is still an ongoing experiment for me. I'm not sure how much it pays attention to these, as it sometimes seems to totally ignore clear instructions.
Cursor does an "okay" job with Convex code. It knows how to do some things but not others. There's work to be done on my end to explore the best ways to give Cursor more context on how best to use Convex.
After the initial frantic prototyping phase using v0 and Composer, I did have to go through and take some time to understand and untangle a bunch of things by hand. I did this because it's important to me that the code be human-readable and easily digestible for demo purposes.
I'm not great with Tailwind, so it's great that Composer does a more than good enough job iterating on that magical syntax. I do wonder how it might fare in a much larger and more strictly controlled project, however.
Anyway, enough of that tangent; let's get back to Shop Talk, shall we?
Is this the future?
So is real-time voice interaction going to be embedded into every app? I'm not so sure.
While it's really cool and relatively easy to implement, I worry about two main barriers. First, the current cost economics might not make sense for most web or mobile apps. Second, users have been trained to interact with apps using mouse, keyboard, or touch; getting them to switch to voice could be challenging.
Having said that, I can see these capabilities being integrated into our general assistants like Siri, Google, or Alexa. Imagine saying "I want pizza tonight, please add the ingredients to my shopping list" and having it quietly update your list in the background without requiring you to open any apps. That would be truly compelling. This makes me once again question what the future of apps looks like 🤔.
Thoughts on Daily Bots
Overall, I'm impressed with Daily Bots' functionality. Having developed voice-controlled apps in the past, I appreciate the effort they've put into simplifying these complex features.
However, I did encounter several issues and rough edges during development, which I'll outline below:
The documentation needs improvement, with many broken or outdated examples and gaps in crucial areas.
It's particularly confusing that they recommend including both @pipecat-ai/client-react and realtime-ai packages. These export similar types with identical names, making VSCode auto-import and AI code generation unnecessarily difficult.
Their API design raises some questions—especially the LLMHelper, which seems overly complex. A single client object would be more straightforward than multiple helper classes.
I encountered several issues that might be bugs or questionable design choices. For instance, when updating LLM context locally, the assistant would restart despite explicitly setting the flag to prevent interruption.
Finally
Cheers for checking this out, I hope you found it interesting! After diving in and getting hands-on experience with these AI tools, I feel much less anxious about the state of AI. If you're feeling uncertain too, I highly encourage you to jump in and try these tools and services yourself. There's nothing like direct experience to help you form your own opinions!
Convex is the backend platform with everything you need to build your full-stack AI project. Cloud functions, a database, file storage, scheduling, workflow, vector search, and realtime updates fit together seamlessly.