Michal Srb
5 months ago

Build AI Chat with OpenAI's Assistants API


On November 7th, OpenAI released its Assistants API, which enables chatbot implementations with context retrieval without needing your own message store or vector database. In this post, we’ll cover how to leverage this API to build a fully functioning AI chat interface.

The implementation here can serve as a basis for a more complex, tailored experience that ties app-specific and user-specific information together in a single interface, surpassing the capabilities of standalone ChatGPT.

This post is part of a series where we compare three different implementations of the same AI chatbot and summarize the differences:

  1. Using OpenAI’s Assistants API (this post)
  2. Using Langchain with our own storage
  3. Custom message store and vector search retrieval
  4. Are vector databases dead?

All code can be found on GitHub in the get-convex/convex-ai-chat-openai repo: AI chat implemented with the OpenAI Assistants API.

Overview of the user experience

To compare the three approaches we’ll build an AI chatbot for the Convex docs site. You can go to the docs now and interact with the bot there:

[Screenshot: the AI chat bot on the Convex docs site]

We’re not just exposing a standard LLM (such as GPT-4); we’re also retrieving the documents relevant to the question and passing them to the LLM as context (retrieval-augmented generation, or RAG).

Overview of RAG chat

Here’s a graphical overview of the steps involved. Thanks to the new Assistants API, OpenAI will take care of most of this process.

[Diagram: process overview for RAG chat]

We need to load our data, optionally split it, and then “embed” it (run it through a model that encodes the data into an array of numbers). This step needs to happen at least once, but we probably want to run it periodically as the source data (in our example the Convex documentation and other information about Convex) changes.
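
To make the “embed” step concrete, here’s a minimal sketch of embedding a chunk of text with the OpenAI SDK. With the Assistants API this happens behind the scenes, so the code is purely illustrative and the model name is an arbitrary choice:

import OpenAI from "openai";

// Turn a chunk of text into a vector of numbers capturing its meaning.
export async function embedText(chunk: string): Promise<number[]> {
  const openai = new OpenAI();
  const response = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: chunk,
  });
  return response.data[0].embedding;
}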

After we ingest data, we can serve traffic. In our case, we reply to user questions about Convex. Each question can be embedded (using the same model that we used during the ingest step), and then the embedding is used to look up the most relevant contextual data. We then pass on this contextual data, the question, and any chat history to the LLM as a prompt, and return the LLM’s answer back to the user.
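
If you were implementing this flow yourself (as we do in the third post), the serving side would look roughly like the sketch below. The findRelevantChunks helper is hypothetical and stands in for whatever vector search you use:

import OpenAI from "openai";

// Hypothetical vector-search helper: returns the most relevant stored chunks.
declare function findRelevantChunks(embedding: number[]): Promise<string[]>;

export async function answerQuestion(question: string): Promise<string> {
  const openai = new OpenAI();
  // Embed the question with the same model used during ingestion.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: question,
  });
  const context = await findRelevantChunks(data[0].embedding);
  // Pass the retrieved context and the question to the LLM as a prompt.
  const completion = await openai.chat.completions.create({
    model: "gpt-4-1106-preview",
    messages: [
      { role: "system", content: "Answer based on:\n" + context.join("\n---\n") },
      { role: "user", content: question },
    ],
  });
  return completion.choices[0].message.content ?? "";
}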

As we’ll see in the second post, a rephrasing step can be added to enable searching data based on the whole chat history, not just the last user message. The Assistants API might or might not be using this technique, as the search is completely abstracted away from us.

Our Server Setup & Schema

While the Assistants API is powerful, we still want our own server for a few reasons:

  1. We don’t want to expose our OpenAI API key. The OpenAI requests need to be authenticated, but if we sent them from the browser, any user could see our key and use it themselves.
  2. We’ll use the server to pre-process our documentation data before uploading it to OpenAI.
  3. Future: Having a server can enable the Assistants API to call functions that we can expose to take some action on behalf of the user.
  4. Future: If we want to build a more sophisticated experience, we can directly access our product database during the chat.

We’ll define the following three tables in Convex (source):

import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  documents: defineTable({
    // The original page URL for the document
    url: v.string(),
    // The parsed document content
    text: v.string(),
    // The ID returned after uploading to OpenAI
    fileId: v.union(v.string(), v.null()),
  }).index("byUrl", ["url"]),
  messages: defineTable({
    // Whether the message is from the AI or the human
    isViewer: v.boolean(),
    // Which conversation this message belongs to
    sessionId: v.string(),
    // Message content
    text: v.string(),
  }).index("bySessionId", ["sessionId"]),
  threads: defineTable({
    // Client-generated conversation identifier
    sessionId: v.string(),
    // Conversation identifier used by the OpenAI server
    threadId: v.string(),
  }).index("bySessionId", ["sessionId"]),
});

Since we’ll be using the OpenAI SDK, we’ll set our OPENAI_API_KEY in the environment variable settings on the Convex dashboard.
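
For reference, the OpenAI SDK reads OPENAI_API_KEY from the environment by default, which is why the actions below can construct the client with no arguments:

import OpenAI from "openai";

// Equivalent to new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
const openai = new OpenAI();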

Ingest data: Creating an assistant

We’ll start by creating an OpenAI assistant. You can do this on the OpenAI dashboard, or using the API (source):

import OpenAI from "openai";
import { internalAction } from "./_generated/server";

export const createAssistant = internalAction({
  args: {},
  handler: async () => {
    const openai = new OpenAI();
    const assistant = await openai.beta.assistants.create({
      instructions:
        "Answer the user questions based on the provided documents " +
        "or report that the question cannot be answered based on " +
        "these documents. Keep the answer informative but brief, " +
        "do not enumerate all possibilities.",
      model: "gpt-4-1106-preview",
      tools: [{ type: "retrieval" }],
    });
    return assistant.id;
  },
});

You can run this action from the Convex dashboard and then save the returned ID as ASSISTANT_ID in your Convex backend’s environment variables.
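
If you’d rather avoid the non-null assertion (process.env.ASSISTANT_ID!) used in later snippets, a small guard like this hypothetical helper fails fast when the variable is missing:

// Hypothetical helper, not part of the original source.
function assistantId(): string {
  const id = process.env.ASSISTANT_ID;
  if (id === undefined) {
    throw new Error("Set ASSISTANT_ID in your Convex environment variables");
  }
  return id;
}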

Ingest data: Loading data

We’ll cover the detailed description of scraping in the third post in the series, but here we’ll discuss how to upload data to our OpenAI assistant.

While scraping the docs, we stored the data in the "documents" table. We also have a field on each document that stores the fileId returned by OpenAI after we upload it. Here’s the function responsible for uploading our files to OpenAI and attaching them to the assistant (source):

export const uploadDocuments = internalAction({
  args: {
    documentIds: v.array(v.id("documents")),
  },
  handler: async (ctx, { documentIds }) => {
    const openai = new OpenAI();
    await map(documentIds, async (documentId) => {
      const document = await ctx.runQuery(internal.init.getDocument, {
        documentId,
      });
      if (document === null || document.fileId !== null) {
        return;
      }
      const { text, url } = document;
      const blob = new File([text], fileName(url));

      const { id: fileId } = await openai.files.create({
        file: blob,
        purpose: "assistants",
      });
      await openai.beta.assistants.files.create(process.env.ASSISTANT_ID!, {
        file_id: fileId,
      });
      await ctx.runMutation(internal.init.saveFileId, { documentId, fileId });
    });
  },
});

This action uses the openai.files.create API to upload each document as a File, and then openai.beta.assistants.files.create to attach the file to our assistant. You could also upload the files manually on the OpenAI dashboard.
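
The action also relies on two small helpers, map and fileName. The sketches below are hypothetical shapes for illustration; check the linked source for the real versions:

// Apply an async function to every item (the real helper may run uploads concurrently).
async function map<T, R>(items: T[], fn: (item: T) => Promise<R>): Promise<R[]> {
  const results: R[] = [];
  for (const item of items) {
    results.push(await fn(item));
  }
  return results;
}

// Derive a readable file name from the document URL,
// e.g. "https://docs.convex.dev/functions/actions" -> "functions-actions.txt"
function fileName(url: string): string {
  const path = new URL(url).pathname.replace(/^\/|\/$/g, "");
  return (path.replace(/\//g, "-") || "index") + ".txt";
}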

When ready, we can upload and attach all documents from our table in bulk (source).
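
A bulk kickoff could look something like this sketch (the getPendingDocumentIds query is a hypothetical name; the real implementation is in the linked source):

import { internalAction } from "./_generated/server";
import { internal } from "./_generated/api";

export const uploadAllDocuments = internalAction({
  args: {},
  handler: async (ctx) => {
    // List documents that haven't been uploaded to OpenAI yet...
    const documentIds = await ctx.runQuery(internal.init.getPendingDocumentIds, {});
    // ...and hand them to the upload action defined above.
    await ctx.scheduler.runAfter(0, internal.init.uploadDocuments, { documentIds });
  },
});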

Ingest data: Splitting & embedding data

Nothing to do here: OpenAI takes care of this step automatically once we upload and attach the files! 👏

Serving traffic

Here’s a sequence diagram for the traffic-serving portion of our app:

[Diagram: sequence of steps for answering a question]

Serving traffic: answering a question

When a user hits send in our little chat box, we’ll kick off generating the answer (source):

export const send = mutation({
  args: {
    message: v.string(),
    sessionId: v.string(),
  },
  handler: async (ctx, { message, sessionId }) => {
    await ctx.db.insert("messages", {
      isViewer: true,
      text: message,
      sessionId,
    });
    await ctx.scheduler.runAfter(0, internal.serve.answer, {
      sessionId,
      message,
    });
  },
});

First, we save the message to the "messages" table, which will update our UI (if you’re not familiar with Convex and wonder how this works, head over to the Convex tutorial).

The scheduler.runAfter(0, …) call is Convex parlance for “run an async job immediately”.

Let’s look at the implementation of the answer action (source):

export const answer = internalAction({
  args: {
    sessionId: v.string(),
    message: v.string(),
  },
  handler: async (ctx, { sessionId, message }) => {
    const openai = new OpenAI();

    const threadId = await getOrCreateThread(ctx, openai, sessionId);

    const { id: lastMessageId } = await openai.beta.threads.messages.create(
      threadId,
      { role: "user", content: message }
    );

    const { id: runId } = await openai.beta.threads.runs.create(threadId, {
      assistant_id: process.env.ASSISTANT_ID!,
    });

    await pollForAnswer(ctx, { threadId, sessionId, lastMessageId, runId });
  },
});

The steps we follow are:

  1. Get or create an OpenAI thread for the current session (a sketch of this helper follows below)
  2. Add the user message to the OpenAI thread
  3. Create an assistant run
  4. Start polling for the answer
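
Step 1 uses the getOrCreateThread helper referenced in the code above. Here’s a minimal sketch of its likely shape; internal.serve.getThread and internal.serve.saveThread are hypothetical names, so see the linked source for the real implementation:

import OpenAI from "openai";
import { ActionCtx } from "./_generated/server";
import { internal } from "./_generated/api";

async function getOrCreateThread(
  ctx: ActionCtx,
  openai: OpenAI,
  sessionId: string
): Promise<string> {
  // Reuse the OpenAI thread already associated with this session, if any.
  const thread = await ctx.runQuery(internal.serve.getThread, { sessionId });
  if (thread !== null) {
    return thread.threadId;
  }
  // Otherwise create a new thread and remember its ID for next time.
  const { id: threadId } = await openai.beta.threads.create();
  await ctx.runMutation(internal.serve.saveThread, { sessionId, threadId });
  return threadId;
}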

Polling is needed because, as of this writing, the Assistants API doesn’t support streaming or a single-shot async request.

Here’s the function performing the polling every 500ms¹ (source):

async function pollForAnswer(
  ctx: ActionCtx,
  args: {
    sessionId: string;
    threadId: string;
    runId: string;
    lastMessageId: string;
  }
) {
  const { sessionId, threadId, runId, lastMessageId } = args;
  const openai = new OpenAI();
  while (true) {
    await sleep(500);
    const run = await openai.beta.threads.runs.retrieve(threadId, runId);
    switch (run.status) {
      case "failed":
      case "expired":
      case "cancelled":
        await ctx.runMutation(internal.serve.addMessage, {
          text: "I cannot reply at this time. Reach out to the team on Discord",
          sessionId,
        });
        return;
      case "completed": {
        const { data: newMessages } = await openai.beta.threads.messages.list(
          threadId,
          { after: lastMessageId, order: "asc" }
        );
        await map(newMessages, async ({ content }) => {
          const text = content
            .filter((item): item is MessageContentText => item.type === "text")
            .map(({ text }) => text.value)
            .join("\n\n");
          await ctx.runMutation(internal.serve.addMessage, { text, sessionId });
        });
        return;
      }
    }
  }
}

When the run is "completed" we want to store the new message in our "messages" table, which will update the UI. We do this using the { after: lastMessageId, order: "asc" } arguments to the openai.beta.threads.messages.list API to get only the new messages, and then we save them to our database in the addMessage mutation (source).
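
For completeness, the addMessage mutation can be as simple as this sketch (hypothetical shape; the real version is in the linked source):

import { internalMutation } from "./_generated/server";
import { v } from "convex/values";

export const addMessage = internalMutation({
  args: {
    text: v.string(),
    sessionId: v.string(),
  },
  handler: async (ctx, { text, sessionId }) => {
    // Store the assistant's reply; isViewer is false because it's not from the user.
    await ctx.db.insert("messages", {
      isViewer: false,
      text,
      sessionId,
    });
  },
});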

Serving traffic: listing messages

There’s only one more endpoint left to cover, and that’s the query that returns all messages for a given session (source):

export const list = query({
  args: {
    sessionId: v.string(),
  },
  handler: async (ctx, args) => {
    return await ctx.db
      .query("messages")
      .withIndex("bySessionId", (q) => q.eq("sessionId", args.sessionId))
      .collect();
  },
});

The client subscribes to this query and will get all messages for the session, automatically updating when new messages are added. Learn more about Convex reactivity here.
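
On the client, subscribing is a single useQuery call. Here’s a minimal sketch (the api.messages.list path is an assumption about where the list query lives in the project):

import { useQuery } from "convex/react";
import { api } from "../convex/_generated/api";

function Messages({ sessionId }: { sessionId: string }) {
  // Re-renders automatically whenever a new message is inserted.
  const messages = useQuery(api.messages.list, { sessionId }) ?? [];
  return (
    <ul>
      {messages.map((message) => (
        <li key={message._id}>{message.text}</li>
      ))}
    </ul>
  );
}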

And that’s it! The backend for our RAG chat is complete.

Frontend Implementation

You can find the front-end implementation of our chat box here. It is only about 250 lines of React and Tailwind CSS, but it does pack a lot of functionality:

  1. Realtime reactive updates with the LLM replies.
  2. Preserving the session ID in sessionStorage (so it survives page reloads).
  3. Scrolling to the bottom of the thread.
    1. Stop scrolling if the user scrolls up.
  4. Opening the dialog in a React portal.
  5. Disabling the send button when the input is empty.
  6. Two different UIs: Small in the corner of the page and expanded.
  7. Info icon with a tooltip for additional education/disclaimer.
  8. Loading indicators.
  9. Dark mode.

If you check out the repo you can see that Convex is only used for the chat modal. This is a good example if you want to drop the chat component into an existing website.
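
One way to mount it in an existing React app, sketched under the assumption of a Vite setup and a ChatWidget export (the names and paths here are illustrative, not the repo’s exact API):

import { ConvexProvider, ConvexReactClient } from "convex/react";
import { ChatWidget } from "./aiChat/ChatWidget"; // hypothetical path

const convex = new ConvexReactClient(import.meta.env.VITE_CONVEX_URL as string);

export function App() {
  return (
    <ConvexProvider client={convex}>
      {/* ...the rest of your existing site... */}
      <ChatWidget />
    </ConvexProvider>
  );
}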

Conclusion

OpenAI’s Assistants API makes it quite easy to set up an AI chatbot with context retrieval. The API is in beta and some of its edges are a bit rough at the moment, but these issues will likely be addressed quickly. The Assistants API has additional capabilities we didn’t discuss, mainly the ability for the assistant to run custom functions, which might enable tighter integration with our product database. Alternatively, we can implement context retrieval directly on our server. We cover the first such implementation, using LangChain, in our next post. To see how all of the implementations compare, see this post.

Footnotes

  1. You can also use the scheduler to do this, by re-scheduling the function every 500ms with scheduler.runAfter(500, ...). The while loop approach here is just for simplicity.
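
A sketch of that re-scheduling variant (function and field names are illustrative):

import OpenAI from "openai";
import { internalAction } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

export const checkAnswer = internalAction({
  args: {
    sessionId: v.string(),
    threadId: v.string(),
    runId: v.string(),
    lastMessageId: v.string(),
  },
  handler: async (ctx, args) => {
    const openai = new OpenAI();
    const run = await openai.beta.threads.runs.retrieve(args.threadId, args.runId);
    if (!["completed", "failed", "expired", "cancelled"].includes(run.status)) {
      // Not done yet: check again in 500ms instead of looping in-process.
      await ctx.scheduler.runAfter(500, internal.serve.checkAnswer, args);
      return;
    }
    // ...handle the finished run the same way pollForAnswer does above...
  },
});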
