
Lessons from Building an AI App Builder on Convex

Introduction
Over the past few months, we have built and grown Chef into the only AI app builder that knows backend. But this process didn’t start with Chef. It started with building Convex, the database that Chef is built on top of.
At Convex, we made a bold bet that humans perform better with good abstractions. It turns out that AI benefits from the same abstractions humans do, and that is what makes the Chef platform so powerful.
In this article, we’ll go over the architecture of Chef. Then, we’ll dive into why this architecture works well. Once we understand the architecture and philosophy behind Chef, we can finish with some principles that are useful for building AI coding agents.
Architecture Overview
Let’s get started by going through what it’s like building a Notion clone in Chef. We start by providing a prompt describing the MVP and let it cook.
Chef prompt
What’s going on under the hood is that we provision a Convex instance with a template and run a configuration script. This template uses Vite + React and already has Convex auth integrated. Once this template is created, we send the user’s prompt to the server along with our system prompt.
Chef cooking
Our system prompt for Chef is an adaptation of our Convex rules. Specifically, we customized these rules to work well with our template and added information about the tools the LLM has access to. Lastly, we added examples that show how the LLM should go about addressing a user’s prompt. This involves using write, edit, and view tools to evaluate and write code, then typechecking it with npx convex dev and fixing any errors.
Chef agent loop
This agent loop is what sets Chef apart. Since Convex is typesafe from the backend to the frontend, we have a good heuristic for correctness. In addition, when the LLM is wrong, Convex provides good error messages, which the LLM uses to fix issues. This enables Chef to navigate more complex problems than other app builders.
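The loop described above can be sketched in a few lines. This is a minimal, hypothetical skeleton, not Chef’s actual implementation: the model, tool application, and typechecker are stubbed (in Chef, the typecheck step runs npx convex dev), and the type and function names are illustrative.

```typescript
// One tool call requested by the model.
type ToolCall = { tool: "write" | "edit" | "view"; path: string; content?: string };

interface Model {
  // Given the latest feedback (e.g. typechecker errors), return the next tool calls.
  step(feedback: string): ToolCall[];
}

// Stand-in for running the typechecker and collecting its error messages.
type Typecheck = (fs: Map<string, string>) => string[];

// Persist write/edit tool calls into an in-memory "filesystem".
function applyToolCalls(fs: Map<string, string>, calls: ToolCall[]): void {
  for (const call of calls) {
    if (call.tool !== "view" && call.content !== undefined) {
      fs.set(call.path, call.content);
    }
  }
}

function agentLoop(model: Model, typecheck: Typecheck, maxIters = 5): boolean {
  const fs = new Map<string, string>();
  let feedback = "start";
  for (let i = 0; i < maxIters; i++) {
    applyToolCalls(fs, model.step(feedback));
    const errors = typecheck(fs);
    if (errors.length === 0) return true; // clean typecheck: our heuristic for correctness
    feedback = errors.join("\n"); // feed errors back so the model can fix them
  }
  return false;
}
```

The key design point is that the typechecker’s error messages become the model’s next input, so each iteration either converges or produces fresh, specific feedback.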
Data Storage
Not only do all the apps built on Chef use Convex for the backend, but Chef itself also uses Convex for the backend. We use the database to store metadata like emails, team IDs, and chat names, and use Convex storage to store filesystem snapshots and chat history.
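As a rough illustration of this split between the database and file storage, here is a hypothetical Convex schema fragment. The table names and fields are assumptions for the sake of the example, not Chef’s real schema.

```typescript
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  // Small, queryable metadata lives in the database.
  chats: defineTable({
    name: v.string(),
    teamId: v.string(),
    // Large blobs like filesystem snapshots live in Convex file storage;
    // the row only holds a reference to the stored object.
    snapshotId: v.optional(v.id("_storage")),
  }),
  members: defineTable({
    email: v.string(),
    teamId: v.string(),
  }),
});
```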
Why does Chef work well?
Now that we understand the architecture of Chef, let’s dive into why these pieces come together to make a great product.
Opinionated Template
Chef starts with an opinionated template that allows us to pick the “right” tools for the job. We chose Vite + React on the frontend and Convex for the backend because these frameworks are simple and well understood. Other platforms allow more configuration, but that flexibility is not optimal because it is likely to confuse the LLM.
Decisions like this allowed us to constrain the LLM’s behavior in areas that could cause bad outcomes and focus its “creativity” on the domain we care about: the appearance and functionality of the application.
Programmatic prevention of bad states
Even with great prompting, LLMs still do the wrong thing. This is especially problematic when the wrong thing causes an irrecoverable state. We experienced this in Chef when we first integrated Convex auth: the LLM would often try to edit the authentication-related files, even when prompted not to, and would break the app.
To mitigate this, we programmatically prevented the LLM from writing to specific files. Even if the LLM tried to edit a protected file, we would not persist the changes. This eliminated a whole class of problems for us. Safeguards like this prevent catastrophic states and set the LLM up for success.
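The guard itself is simple enough to sketch. This is a minimal, hypothetical version: the protected paths shown are illustrative, not Chef’s actual list.

```typescript
// Files the LLM must never modify (illustrative paths, not Chef's real list).
const PROTECTED_PATHS = new Set(["convex/auth.ts", "convex/auth.config.ts"]);

function isProtected(path: string): boolean {
  return PROTECTED_PATHS.has(path);
}

// Wraps the write tool: edits to protected files are silently dropped instead of
// persisted, so the LLM cannot break authentication even if it tries.
// Returns whether the write was actually applied.
function guardedWrite(fs: Map<string, string>, path: string, content: string): boolean {
  if (isProtected(path)) return false;
  fs.set(path, content);
  return true;
}
```

Enforcing the rule at the persistence layer, rather than only in the prompt, is what turns “please don’t edit auth files” into a guarantee.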
Queries as code
The Convex platform allows you to define your entire backend in code, which makes it a perfect match for LLMs because they are really good at writing code. Additionally, Convex provides end-to-end type safety, which lets us check the LLM’s output and have it fix its mistakes when it is incorrect. This is a big advantage over SQL-based databases, where it is much harder to verify the correctness of queries or catch type mismatches until the code actually runs.
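For readers unfamiliar with Convex, a query function looks like ordinary typed TypeScript. The table and fields below are hypothetical, but the shape follows the standard Convex query API: because the arguments and return value are typed, npx convex dev can typecheck the whole thing, and the generated client carries those types to the frontend.

```typescript
import { query } from "./_generated/server";
import { v } from "convex/values";

// Hypothetical query: list the chats belonging to a team.
export const chatsForTeam = query({
  args: { teamId: v.string() },
  handler: async (ctx, args) => {
    // Argument and return types flow through the generated client, so a
    // mismatched call site on the frontend fails at compile time, not at runtime.
    return await ctx.db
      .query("chats")
      .filter((q) => q.eq(q.field("teamId"), args.teamId))
      .collect();
  },
});
```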
Components
Before Chef, we built a framework, called components, that allows you to create modularized pieces of Convex code that have their own tables. Components were intended to turn complex problems, like a sharded counter or a workpool, into simple abstractions for developers.
Components are also extremely beneficial for LLMs because they enable them to write robust code that leverages existing solutions. This lets the LLM operate at a higher level and focus on more complex tasks. This framework has given Chef superpowers: we can add features like a collaborative text editor with only a few lines of code.
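To make the “few lines of code” claim concrete, here is a rough sketch of what consuming the sharded counter component looks like. The import paths and method names follow the component’s documented usage at the time of writing and may differ; treat this as a shape, not a reference.

```typescript
import { components } from "./_generated/api";
import { mutation } from "./_generated/server";
import { ShardedCounter } from "@convex-dev/sharded-counter";

// The component owns its own tables and sharding logic behind this small API.
const counter = new ShardedCounter(components.shardedCounter);

export const addLike = mutation({
  args: {},
  handler: async (ctx) => {
    // One line of app code; contention handling lives inside the component.
    await counter.add(ctx, "likes");
  },
});
```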
Learnings
In putting these pieces together, we have discovered some important principles for building agentic AI applications. These principles have helped guide the technical direction for Chef, and can help you build AI applications too!
Good abstractions
Great abstractions fundamentally allow you to do more with less. In the case of agents, this means providing LLMs with not just fewer decisions to make, but the right decisions to make. You want the tools you give agents to be expressive enough to complete the task at hand, but not so complicated that the agent gets confused.
This principle can be applied to anything from prompt tuning and choosing defaults to deciding which tools LLMs have access to.
Limit “wrong” decisions by LLMs
LLMs are prone to making the wrong decision, even when prompted not to. Thus, it is important that you constrain the solution space for the problem you are trying to solve. This means limiting the number of ways an LLM can get itself into an irrecoverable state. At a high level, you must make the “hard” decisions for the LLM and let it be creative in the ways that are conducive to the problem you are solving.
For example, in Chef we “solved” the problems of which frameworks, template, and authentication to use, while allowing the LLM to be creative in how it builds the UI and application logic.
Provide great examples
Providing simple and correct examples to LLMs significantly improves your outputs. This is especially important when the model needs to call many different kinds of tools, because it needs to be able to pattern match.
The flip side of this is that models will pick up on your bad examples too. It is important that you audit your examples and make sure that they are consistent and demonstrate the exact behavior you want to see.
Evals are the secret sauce
Evals are a quantitative way to evaluate different LLMs against your use case. They consist of data, tasks, and scorers. Data is a set of examples to test your AI agent on, tasks are the functions you want the LLM to perform, and scorers are what you use to evaluate the output of a task.
Evals are important because they force you to define what good looks like for your use case. It is easy to ship LLM apps off of “vibes”, but evals force you to really think about the outcomes that you care about. Evals also allow you to iterate on prompts more quickly and confidently. Lastly, they allow you to easily evaluate models for your use-case, which is very valuable as new models are constantly being released.
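The data/tasks/scorers structure can be captured in a tiny harness. This is a toy sketch with hypothetical names: a real eval would invoke an LLM in the task and use a more interesting scorer, but the shape is the same.

```typescript
// An example pairs an input with the expected output.
interface Example { input: string; expected: string }
// A task is the function you want the LLM (here, a stand-in) to perform.
type Task = (input: string) => string;
// A scorer maps (output, expected) to a score between 0 and 1.
type Scorer = (output: string, expected: string) => number;

// Run every example through the task and average the scores.
function runEval(data: Example[], task: Task, scorer: Scorer): number {
  const scores = data.map((ex) => scorer(task(ex.input), ex.expected));
  return scores.reduce((a, b) => a + b, 0) / scores.length;
}

// Toy instantiation: the "model" upper-cases, the scorer is exact match.
const data: Example[] = [
  { input: "hello", expected: "HELLO" },
  { input: "chef", expected: "CHEF" },
];
// Both examples match exactly, so avg is 1.
const avg = runEval(data, (s) => s.toUpperCase(), (o, e) => (o === e ? 1 : 0));
```

Swapping in a different model is just passing a different Task, which is what makes evals cheap to rerun every time a new model ships.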
We learned a ton building Chef, and hope some of our lessons can be useful to you!
Convex is the backend platform with everything you need to build your full-stack AI project. Cloud functions, a database, file storage, scheduling, workflow, vector search, and realtime updates fit together seamlessly.