Are Vector Databases Dead?
This year, vector databases have sprung up like mushrooms to enable applications to retrieve context based on semantic search. A large portion of these applications have used the retrieved context to augment the ability of large language models (LLMs) in a pattern known as retrieval-augmented generation (RAG). On November 7th OpenAI released its Assistants API, enabling the implementation of AI chat interfaces with context retrieval without needing a separate message store or vector database. Does this new API make vector databases obsolete?
Yes! And no.
This post concludes a series comparing three different implementations of AI chat backends to help answer this question. Read on for the final verdict.
In the current age of ChatGPT, GPTs, Bards, Bings, and Claudes it might not be obvious why anyone should be building another AI-powered chat, but there are good reasons:
- Proximity: People’s behavior is driven by convenience, and talking to an AI right in your product (website, app) might be more convenient than opening a separate chat product.
- Control: Big LLMs have broad knowledge so they need specific prompting to get an answer about a specific topic. Having to provide this prompt again and again can be onerous for end-users. Your product can set up or augment the prompt automatically.
- Context: Although LLMs retain a lot of knowledge from training, they might lack specific knowledge, either because it didn’t exist, wasn’t public, or wasn’t important enough at training time. Providing specific context as a part of the prompt leads to better accuracy and less hallucination.
Combining all three together can lead to integrations not possible in the frame of existing general chat interfaces. For example, we could provide the LLM with user-specific information, give it precise instructions, and have it take actions directly in our product. Think: going to Amazon and asking to reorder and double the order of an item you bought last week, and having your shopping cart update to reflect this conversation. We can render product information inline and have it be interactive without the user having to navigate between different websites or apps.
This post discusses the tradeoffs and details of implementing this kind of AI system. A basic example of a custom AI-powered chat interface is the AI chatbot on the Convex docs site. You can play with it by clicking the chat bubble button at the top of the page.
To help answer the titular question, we implemented this AI chat backend in three ways. A guided walkthrough and the source code for each implementation are linked here:
- Using the new OpenAI Assistants API
- Using LangChain with the Convex message and vector DB
- Entirely on Convex
This example demonstrates the three reasons for building a custom AI chat:
- The interface is embedded in the docs site where developers look for technical information about Convex.
- The LLM is prompted to answer questions about Convex.
- The implementation passes the LLM the most relevant documentation pertaining to the question.
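The third point is the retrieval step of RAG. As a rough illustration, and not the code from any of the three implementations, it can be sketched as ranking pre-embedded chunks by cosine similarity and placing the best ones into the system prompt. The `Chunk` shape, the function names, and the prompt wording are all assumptions made for this sketch; in practice a vector index and an embeddings API would replace the in-memory loop.

```typescript
// Hypothetical shapes for this sketch; a real system stores these in a database.
type Chunk = { text: string; embedding: number[] };
type ChatMessage = { role: "system" | "user"; content: string };

// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
  });
  b.forEach((y) => {
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Assemble the messages sent to the LLM (the prompt wording is an assumption).
function buildMessages(question: string, context: Chunk[]): ChatMessage[] {
  const contextText = context.map((c) => c.text).join("\n\n");
  return [
    {
      role: "system",
      content:
        "Answer questions about Convex using only the documentation below.\n\n" +
        contextText,
    },
    { role: "user", content: question },
  ];
}
```

The brute-force scan in `topK` is only reasonable for a handful of chunks; a vector database exists precisely to make this lookup fast at scale.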
Even in this simple use case, the context is not static. The Convex team iterates on the docs and the product constantly, and the AI chat needs to reflect these changes. So when designing the implementations we considered how this contextual information will be updated over time.
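One way to handle such updates, shown here only as a sketch under our own assumptions, is to store a content hash next to each embedded page and re-embed only the pages whose hash changed. The `Doc` and `StoredDoc` shapes and the `planSync` name are hypothetical, and the hash is a simple djb2-style function chosen to keep the example dependency-free; a cryptographic hash would be a safer choice in production.

```typescript
// Hypothetical shapes: the current docs, and what we stored at last sync.
type Doc = { url: string; text: string };
type StoredDoc = { url: string; hash: string };
type SyncPlan = { toEmbed: Doc[]; toDelete: string[] };

// djb2-style 32-bit hash. Non-cryptographic: collisions are possible,
// so treat it as a cheap change detector, not a guarantee.
function contentHash(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16);
}

// Decide which pages need (re-)embedding and which stored pages are gone.
function planSync(current: Doc[], stored: StoredDoc[]): SyncPlan {
  const storedHashes: Record<string, string> = {};
  stored.forEach((s) => {
    storedHashes[s.url] = s.hash;
  });
  const currentUrls: Record<string, boolean> = {};
  current.forEach((d) => {
    currentUrls[d.url] = true;
  });
  return {
    // New pages have no stored hash; edited pages have a different one.
    toEmbed: current.filter((d) => storedHashes[d.url] !== contentHash(d.text)),
    // Pages that were embedded before but no longer exist.
    toDelete: stored.filter((s) => currentUrls[s.url] !== true).map((s) => s.url),
  };
}
```

Unchanged pages fall into neither list, so a docs edit costs one embedding call per modified page rather than a full re-index.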
The new Assistants API tries to provide a cohesive package, and this does have its strengths for us as the developers of the product:
- We don’t have to worry about the context data splitting (chunking) and search implementations.
- OpenAI’s implementation might improve over time without us having to invest any effort.
- Other tools besides retrieval, currently function calling and code interpreter, are built-in, and OpenAI might add more tools in the future.
Yet these come with tradeoffs, broadly around how much we control and how much we still have to do:
- The data splitting and search are a black box. It’s common (and we did so in our implementation) to tune the chunk size to get the best results from semantic search. But we can’t control this with the Assistants API, yet only we understand and can evaluate the quality of the context retrieval.
- The flip side to the API improving over time is that it might produce unexpected results in the future.
- We might want to build more complicated chains of context retrieval and LLM prompting, see RAG with guardrails. But since we don’t control the implementation this is not possible.
- We still need a server to call the API from, to simplify the uploading and updating of contextual data, and to execute function calls from the LLM.
- It’s likely that we will still want to store message content for analytics, tight integration with our products, and abuse prevention. This leads to the need for syncing the messages to and from OpenAI, which can be more complex than having our server be the source of truth and only interact with OpenAI in a request/response fashion.
- Vendor lock-in: the implementations compared here all use OpenAI, as it is competitive on both price and performance, especially considering the quality of results from GPT-4. But adopting the Assistants API would make it much harder to move to a different LLM provider.
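To make the first tradeoff concrete: when we own the splitting step, the chunk size and overlap are ordinary parameters we can tune against our own retrieval results. The following is a minimal character-based sketch, not the splitter used in our implementations; the `splitIntoChunks` name and its parameters are assumptions, and real splitters often work on tokens or document structure instead.

```typescript
// Split text into fixed-size chunks where consecutive chunks share
// `overlap` characters, so a sentence cut at a boundary still appears
// whole in at least one chunk (as long as it is shorter than the overlap).
function splitIntoChunks(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the Assistants API this step happens on OpenAI's side, so neither parameter is available to tune.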
Additionally, the API has some rough edges at the moment. See the implementation post for more details.
Overall we’d suggest waiting until the API moves out of beta before adopting it. Even then, it’s worth carefully considering what kind of final experience one is trying to achieve, as more custom experiences will likely benefit from a custom implementation.
LangChain is an extremely popular and yet somewhat controversial library. It combines two seemingly opposing goals:
- Make AI development more approachable.
- Wrap, encapsulate, and expose sophisticated AI use cases.
In the second implementation, these two goals manifested themselves:
- The implementation is the shortest, as LangChain exposes very high level APIs tailored for exactly our use case. We didn’t even need to write a single system prompt.
- Under the hood the “chain” actually performs summarization before performing the context search and prompting the LLM.
- Because LangChain abstracts the implementation, it would be trivial to swap in a different embedding or LLM model provider, or different vector database.
- It provides extensive “automatic” debug logging.
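The summarization step in the second point is worth spelling out, because it means one extra LLM call per message. The sketch below shows the general shape of such a pipeline as we understand it; it is not LangChain's code. The `Deps` functions, the prompt wording, and all names are hypothetical, with the LLM and search calls injected so the example stays provider-agnostic.

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Hypothetical dependencies, injected so the sketch runs without any provider.
type Deps = {
  complete: (prompt: string) => Promise<string>; // a single LLM call
  search: (query: string) => Promise<string[]>; // semantic search over the docs
};

function formatHistory(history: Turn[]): string {
  return history.map((t) => t.role + ": " + t.content).join("\n");
}

// Prompt asking the LLM to fold the chat history into a standalone question.
function condensePrompt(history: Turn[], question: string): string {
  return (
    "Rephrase the follow-up question as a standalone question.\n\n" +
    "Chat history:\n" + formatHistory(history) + "\n\n" +
    "Follow-up question: " + question
  );
}

// Prompt asking the LLM to answer from the retrieved context.
function answerPrompt(context: string[], question: string): string {
  return (
    "Answer using only this context:\n\n" + context.join("\n\n") +
    "\n\nQuestion: " + question
  );
}

async function answer(deps: Deps, history: Turn[], question: string): Promise<string> {
  // Step 1: summarize history and question into one query (skipped on the first message).
  const standalone =
    history.length === 0
      ? question
      : await deps.complete(condensePrompt(history, question));
  // Step 2: search for context using the standalone question.
  const context = await deps.search(standalone);
  // Step 3: prompt the LLM with the retrieved context.
  return deps.complete(answerPrompt(context, standalone));
}
```

Written out like this, the cost of the default becomes visible: a follow-up message triggers two LLM calls and one search, which a simpler design might avoid.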
Achieving these two goals though does have its downsides:
- LangChain defaults can be more complex than what the implementation really needs, and perform worse than simpler alternatives. This is especially true as LangChain tries to be agnostic to which LLM it interfaces with. Different LLMs will perform very differently and therefore might require different approaches to prompting and summarization.
- Once a developer wants to veer off the default path, they have to understand the ways in which LangChain can be configured. This is tricky, and because of the heavy abstraction, reading LangChain’s source code to figure out what it does under the hood can be difficult.
- The TypeScript version of LangChain is not perfectly typed, in part because the return values depend on configs nested deep in the chain.
Overall LangChain is great for prototyping and getting a system working quickly, but for a more sophisticated integration it’s a good idea to refactor into a fully custom implementation.
Since Convex includes vector search, the entire system can be implemented in it, relying on external providers only for embeddings and the LLM. In fact, the LangChain implementation already used Convex under the hood for vector search, message storage, and embedding caching.
Implementing the system without using a library like LangChain gives us the most control over exactly how the system works:
- How and when data is split and how it is cached.
- When and how the data is embedded.
- How much context is used.
- In what format the data is passed to the LLM.
- What prompts are involved.
- How the responses are processed and streamed back to our interface.
All of this is in plain sight in our code.
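As an example of what that looks like, the "how much context" and "in what format" decisions can each be a few lines of ordinary code. This is a sketch under our own assumptions rather than an excerpt from the implementation: the `ScoredChunk` shape, the function names, and the heading-style format are hypothetical, and the budget is counted in characters as a rough stand-in for tokens.

```typescript
// Hypothetical shape: a chunk with the similarity score from vector search.
type ScoredChunk = { title: string; text: string; score: number };

// Keep the highest-scoring chunks that fit within a character budget.
function selectContext(chunks: ScoredChunk[], maxChars: number): ScoredChunk[] {
  const selected: ScoredChunk[] = [];
  let used = 0;
  // Copy before sorting so the caller's array is left untouched.
  const ranked = chunks.slice().sort((a, b) => b.score - a.score);
  for (const chunk of ranked) {
    // Skip a chunk that does not fit; a smaller, lower-ranked one still might.
    if (used + chunk.text.length > maxChars) continue;
    selected.push(chunk);
    used += chunk.text.length;
  }
  return selected;
}

// Format the selected chunks for the prompt, one titled section per chunk.
function formatContext(chunks: ScoredChunk[]): string {
  return chunks.map((c) => "## " + c.title + "\n" + c.text).join("\n\n");
}
```

Changing the budget or the format is then a one-line edit that we can evaluate directly against our own questions.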
We have to admit that this comparison would be different if we weren’t using Convex, and instead had a more traditional server talk to an external relational/document database and a vector database. With Convex, the custom implementation isn’t that much more complicated than the alternatives. Hence it’s easier to argue for the added up-front cost of developing a custom system to reap the benefits of total control. In turn, we get end-to-end type safety and full transparency.
So, to answer the original question: does the new Assistants API make vector databases obsolete? We can argue yes: if a use case aligns closely with what the API provides, and the alternative is shuffling data to an external vector database, then one might as well shuffle the data to the Assistants API instead.
But we would argue that integrated AI experiences should offer more than standalone ChatGPT to justify their existence, and to build such experiences it's best to retain control over the implementation by leveraging a database that can both store the product data and allow semantically searching it. This contextual data can then be used to get the most out of current LLMs.