Michal Srb
a year ago

Build AI Chat with LangChain and Convex


LangChain is a very popular library for building AI applications in Python or JavaScript/TypeScript. In this second post in our series, we’ll build an AI-powered chat interface using LangChain and its new Convex integration.

All code can be found on GitHub.

get-convex/convex-ai-chat-langchain

AI Chat implemented with LangChain and Convex.

This post is part of a series where we compare three different implementations of the same AI chatbot and summarize the differences:

  1. Using OpenAI’s Assistants API
  2. Using LangChain with our own storage (this post)
  3. Custom message store and vector search retrieval
  4. Are vector databases dead?

Overview of the user experience

We’ll build an AI chatbot for the Convex docs site. You can go to the docs now and interact with the bot there:

Screenshot: the docs site with the AI chat bot open

We’re not just exposing a standard LLM (such as GPT-4); we’re also retrieving the documents relevant to the question and passing them to the LLM as context (retrieval-augmented generation, or RAG).

Overview of RAG chat

Here’s a graphical overview of the steps involved. LangChain makes for a very short implementation, since the library handles almost the entire process.

Diagram: overview of the RAG chat process with LangChain

We need to load our data, optionally split it, and then “embed” it (run it through a model that encodes the data into an array of numbers). This step needs to happen at least once, but we probably want to run it periodically as the source data (in our example the Convex docs pages and other information about Convex) changes.

After we ingest the data, we can serve traffic; in our case that means replying to user questions and follow-ups. Each question is embedded (using the same model we used during the ingest step), and the embedding is used to look up the most relevant contextual data. We then pass this data, the question, and any chat history to the LLM as a prompt, and return the LLM’s answer to the user.

The LangChain implementation we’ll be using performs a rephrasing step before searching data:

Diagram: the summarization (rephrasing) step in the serve flow

This helps retrieve more useful documents for follow-up questions, but it means the LLM never sees the conversation as a conversation; it only sees a single prompt that hopefully summarizes the previous discussion. It is possible to configure LangChain not to do this, but we’ll stick to the defaults in this post.
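If you do want to tweak this rephrasing step, the chain constructor accepts question-generator options. Here’s a hedged fragment (the questionGeneratorChainOptions name reflects our reading of LangChain’s JS API, so verify it against the current docs; model, vectorStore, and memory are constructed exactly as in the answer function later in this post) that hands the rephrasing off to a cheaper model:

const chain = ConversationalRetrievalQAChain.fromLLM(
  model,
  vectorStore.asRetriever(),
  {
    memory,
    questionGeneratorChainOptions: {
      // Use a cheaper model just to condense the chat history and question
      // into a single standalone question.
      llm: new ChatOpenAI({ modelName: "gpt-3.5-turbo" }),
    },
  }
);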

Our Server Setup & Schema

We’ll be using the following schema, based on the LangChain Convex integrations docs for cache, for documents, and for messages (source):

import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  // Simple cache to avoid recomputing embeddings
  cache: defineTable({
    // content
    key: v.string(),
    // embedding
    value: v.any(),
  }).index("byKey", ["key"]),
  // one row for each chunk of a document
  documents: defineTable({
    embedding: v.array(v.number()),
    text: v.string(),
    metadata: v.any(),
  }).vectorIndex("byEmbedding", {
    vectorField: "embedding",
    dimensions: 1536,
  }),
  messages: defineTable({
    // Which conversation this message belongs to
    sessionId: v.string(),
    message: v.object({
      // The message author, either AI or human
      type: v.string(),
      data: v.object({
        // The text of the message
        content: v.string(),
        role: v.optional(v.string()),
        name: v.optional(v.string()),
        additional_kwargs: v.optional(v.any()),
      }),
    }),
  }).index("bySessionId", ["sessionId"]),
});

We’ll also add a langchain/db.ts file with the following export:

export * from "langchain/util/convex";

This declares internal functions used by LangChain to interact with our tables.

Since we’ll be calling OpenAI’s API, we’ll set OPENAI_API_KEY in the environment variable settings on the Convex dashboard.
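If you prefer the command line, the same variable can be set with the Convex CLI from the project directory (substitute your actual key for the placeholder):

npx convex env set OPENAI_API_KEY <your-openai-api-key>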

Ingesting data: loading, splitting & embedding

We’ll lean on LangChain to help us scrape our docs, split them, and embed them for semantic search. With LangChain we won’t have much control over the exact formatting of the documents; the alternative is manual parsing, which we’ll cover in the third post in this series.

Let’s look at the function responsible for ingesting a single page (source):

// Import paths per the LangChain JS Convex integration at the time of writing.
import { v } from "convex/values";
import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { ConvexKVStore } from "langchain/storage/convex";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { ConvexVectorStore } from "langchain/vectorstores/convex";
import { internalAction } from "./_generated/server";

export const fetchAndEmbedSingle = internalAction({
  args: {
    url: v.string(),
  },
  handler: async (ctx, { url }) => {
    const loader = new CheerioWebBaseLoader(url);
    const data = await loader.load();
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });

    const splitDocs = await textSplitter.splitDocuments(data);

    const embeddings = new CacheBackedEmbeddings({
      underlyingEmbeddings: new OpenAIEmbeddings(),
      documentEmbeddingStore: new ConvexKVStore({ ctx }),
    });

    await ConvexVectorStore.fromDocuments(splitDocs, embeddings, { ctx });
  },
});

The CheerioWebBaseLoader parses the HTML and returns just the text on the page with no formatting.

Then the RecursiveCharacterTextSplitter splits the documents into 1000-character chunks with a 200-character overlap. The right chunk size depends on the context limits of the model we intend to use: we want to make sure each chunk fits comfortably in the context window (as a rough rule of thumb, 1000 characters of English text is about 250 tokens). Making the chunks smaller makes it more likely that multiple documents will be used as context, which can be better or worse; it really depends on our data and the kinds of questions the system is intended to answer.

Finally, we use ConvexVectorStore to run each chunk through OpenAI’s embedding API, caching the embeddings with ConvexKVStore in the “cache” table and storing the chunks together with their embeddings in the “documents” table.
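The fetchAndEmbedSingle action handles a single URL. To ingest a whole list of pages, you could fan out from another action via the Convex scheduler. Here’s a minimal sketch, assuming the action above lives in convex/ingest.ts; the fetchAndEmbedAll wrapper is our own illustration, not code from the linked repo:

import { v } from "convex/values";
import { internal } from "./_generated/api";
import { internalAction } from "./_generated/server";

export const fetchAndEmbedAll = internalAction({
  args: {
    urls: v.array(v.string()),
  },
  handler: async (ctx, { urls }) => {
    for (const url of urls) {
      // Schedule each page as its own action run, so one failing page
      // doesn't abort the rest of the ingestion.
      await ctx.scheduler.runAfter(0, internal.ingest.fetchAndEmbedSingle, {
        url,
      });
    }
  },
});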

Serving traffic: answering a question

Just like with ingesting data, LangChain really streamlines the process of answering a message (source):

// Import paths per the LangChain JS Convex integration at the time of writing.
import { v } from "convex/values";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { BufferMemory } from "langchain/memory";
import { ConvexChatMessageHistory } from "langchain/stores/message/convex";
import { ConvexVectorStore } from "langchain/vectorstores/convex";
import { internalAction } from "./_generated/server";

export const answer = internalAction({
  args: {
    sessionId: v.string(),
    message: v.string(),
  },
  handler: async (ctx, { sessionId, message }) => {
    const vectorStore = new ConvexVectorStore(new OpenAIEmbeddings(), { ctx });

    const model = new ChatOpenAI({ modelName: "gpt-4-32k" });
    const memory = new BufferMemory({
      chatHistory: new ConvexChatMessageHistory({ sessionId, ctx }),
      memoryKey: "chat_history",
      outputKey: "text",
      returnMessages: true,
    });
    const chain = ConversationalRetrievalQAChain.fromLLM(
      model,
      vectorStore.asRetriever(),
      { memory }
    );

    await chain.call({ question: message });
  },
});

First, we set up the ConvexVectorStore using the same OpenAI embeddings API.

Next, we initialize ConvexChatMessageHistory which will both read and write to our "messages" table, given the current sessionId.

And finally, we run the ConversationalRetrievalQAChain, which will write both the user message and the LLM answer to the “messages” table.
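Since answer is an internal action, the chat UI needs a public entry point to kick it off. Here’s a minimal sketch of such a mutation, assuming answer is exported from convex/serve.ts and calling the entry point send (both names are our assumptions; check the repo for the actual wiring):

import { v } from "convex/values";
import { internal } from "./_generated/api";
import { mutation } from "./_generated/server";

export const send = mutation({
  args: {
    sessionId: v.string(),
    message: v.string(),
  },
  handler: async (ctx, { sessionId, message }) => {
    // Schedule the LangChain-powered action; the chain itself persists both
    // the user message and the LLM reply to the "messages" table.
    await ctx.scheduler.runAfter(0, internal.serve.answer, {
      sessionId,
      message,
    });
  },
});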

Serving traffic: listing messages

The only other function we need is a query for listing current messages given a sessionId (source):

import { v } from "convex/values";
import { query } from "./_generated/server";

export const list = query({
  args: {
    sessionId: v.string(),
  },
  handler: async (ctx, args) => {
    return (
      await ctx.db
        .query("messages")
        .withIndex("bySessionId", (q) => q.eq("sessionId", args.sessionId))
        .collect()
    ).map(({ message: { data, type }, ...fields }) => ({
      ...fields,
      isViewer: type === "human",
      text: data.content,
    }));
  },
});

We’re doing a little transformation on each message to return it in the same format as in our first post.
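On the client, this query plugs straight into Convex’s React hooks and re-renders as new messages arrive. A minimal sketch, assuming list is exported from convex/messages.ts (so the reference api.messages.list is an assumption):

import { useQuery } from "convex/react";
import { api } from "../convex/_generated/api";

export function MessageList({ sessionId }: { sessionId: string }) {
  // Reactively updates whenever the LLM writes a new message for this session.
  const messages = useQuery(api.messages.list, { sessionId }) ?? [];
  return (
    <ul>
      {messages.map((message) => (
        <li key={message._id}>
          {message.isViewer ? "You: " : "AI: "}
          {message.text}
        </li>
      ))}
    </ul>
  );
}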

Debugging tip ☝️

If you’re trying to debug a chain, enable the verbose mode like this:

    const chain = ConversationalRetrievalQAChain.fromLLM(
      model,
      vectorStore.asRetriever(),
      { memory, verbose: true }
    );

Then check out the logs page on your Convex dashboard to see all the inputs and outputs for each step in the chain.

Frontend Implementation

You can find the front-end implementation of our chat box here. It is only about 250 lines of React and Tailwind CSS, but it does pack a lot of functionality:

  1. Realtime reactive updates with the LLM replies.
  2. Preserving the session ID in sessionStorage so it survives page reloads (see the sketch after this list).
  3. Scrolling to the bottom of the thread.
    1. Stop scrolling if the user scrolls up.
  4. Opening the dialog in a React portal.
  5. Disabling the send button when the input is empty.
  6. Two different UIs: Small in the corner of the page and expanded.
  7. Info icon with a tooltip for additional education/disclaimer.
  8. Loading indicators.
  9. Dark mode.
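As an example of item 2, here’s a minimal sketch of keeping a stable session ID in sessionStorage; the hook name and storage key are hypothetical, and the real implementation is in the linked repo:

import { useState } from "react";

export function useSessionId(storageKey = "convex-ai-chat-session-id"): string {
  const [sessionId] = useState(() => {
    const existing = sessionStorage.getItem(storageKey);
    if (existing !== null) {
      return existing;
    }
    // First visit in this tab: generate and persist a fresh session ID.
    const fresh = crypto.randomUUID();
    sessionStorage.setItem(storageKey, fresh);
    return fresh;
  });
  return sessionId;
}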

If you check out the repo you can see that Convex is only used for the chat modal. This is a good example if you want to drop the chat component into an existing website.

Conclusion

LangChain packs a lot of power into a simple library interface, and with very little code gets us a fairly sophisticated implementation. On the flip side, we do lose some type-safety, and figuring out what exactly the library does under the hood, and how to configure it, can be tricky. Continue on to our third post if you’re interested in having total control over how your AI chat app is implemented, or jump to the final post in this series to read more about the tradeoffs between different AI product implementations.
