Bright ideas and techniques for building with Convex.
Profile image
Ian Macartney
3 months ago

Database Relationship Helpers

Code for joining queries with helpers for one-to-many, many-to-many, and more.

In the Relationship Structures post, we looked at how to structure one-to-one, one-to-many and many-to-many relationships using a relational database, and what those queries look like in Convex.

In a SQL-based database, you might be used to the JOIN operator, which connects fields from multiple tables into a single flat result. With Convex, we chose to instead expose predictable primitives that you can compose to fetch data, without the sometimes-unpredictable black-box of a query planner. To read more about our thoughts on SQL, read this post. To read more about our indexes, read the docs.

In this post, we’ll look at some helper functions to help write code to traverse relationships in a readable, predictable, and debuggable way. The code is in the convex-helpers npm package if you want to use it in your project. You can see the source (including fancy typescript types) here. By the end, we’ll be able to compose functions to execute a complex query involving the SQL equivalent of selecting, joining, grouping, sorting, and fetching distinct documents.

The examples will reference this schema:

defineSchema({
  users: defineTable({
    name: v.string(),
  }),
  authorProfiles: defineTable({
    userId: v.id('users'), // one to one
    bio: v.string(),
  }).index('userId', ['userId']),
  posts: defineTable({
    title: v.string(),
    authorId: v.id('authorProfiles'), // one to many
    content: v.string(),
  }).index('by_authorId', ['authorId']), // by_ prefix works too
  comments: defineTable({
    postId: v.id('posts'), // one to many
    userId: v.id('users'), // one to many
    text: v.string(),
  }).index('postId', ['postId']),
  postCategories: defineTable({ // many to many relationship table
    postId: v.id('posts'),
    categoryId: v.id('categories'),
  }).index('postId', ['postId']),
  categories: defineTable({ ... }),
});

To use convex-helpers, first npm i convex-helpers then you can import them:

import {
  getAll,
  getOneFrom,
  getManyFrom,
  getManyVia,
} from "convex-helpers/server/relationships";

One-to-one

When each document only has one or zero related documents.

Direct reference: db.get

An arrow points from the left circle to the right circle

If you have an id of a document, you can directly access it with db.get. This is the simplest lookup.

const user = await db.get(author.userId);

Back-reference: the getOneFrom helper

An arrow points from the right circle to the left circle

To fetch a document that has a reference to the document on hand, we use an index on the other table's reference. For example we can look up an author profile from a user by querying on the index for userId:

const author = await db
  .query("authorProfiles")
  .withIndex("userId", q => q.eq("userId", user._id))
  .unique();

Using the helper from convex-helpers, you can write:

const author = await getOneFrom(db, "authorProfiles", "userId", user._id);

Note: As is, it will return null if there is no author profile for that user. If you want to throw an exception instead, use getOneFromOrThrow.

One-to-many

When each document has potentially many related documents.

Direct references: the getAll helper

A circle on the left points to 3 circles on the right

To look up a list of IDs all at once, we fetch all of the documents in parallel:

const userPromises = [];
for (const userId of userIds) {
  userPromises.push(db.get(userId));
}
const users = await Promise.all(userPromises);

If you aren't familiar with Promise.all, or want to learn about an asyncMap helper, read below.

To make this more readable, we can use the getAll helper:

const users = await getAll(db, userIds);

Note: As is, it will return null in the place of any user that doesn't exist. If you want to throw an exception instead, use getAllOrThrow.

Back-references: the getManyFrom helper

Three circles on the right point to a circle on the left

We can extend the getOneFrom helper for the one-to-many case, differing only by using collect instead of unique. To get all of the posts for an author, we can do:

const posts = db
  .query("posts")
  .withIndex("by_authorId", q => q.eq("authorId", author._id))
  .collect();

Using the helper from convex-helpers, you can write:

const posts = await getManyFrom(db, "posts", "by_authorId", author._id);

Together with getAll we can look up all the users associated with comments on a post:

const comments = await getManyFrom(db, "comments", "postId", post._id);
const userIds = comments.map(comment => comment.userId);
const users = await getAll(db, userIds);

These helpers may seem small, but they end up making for much more readable queries. See below for a complex example.

How does the by_ prefix work?

If you were reading carefully, you might be surprised that I could type just getManyFrom(db, "posts", "by_authorId", author._id) above even though the field referenced is just authorId. Both the types and the runtime will not require you to pass another argument specifying the field name if your index is just the field name with a "by_" prefix. The tradeoff here is that it doesn't allow you to use these helpers if your field itself starts with a "by_" prefix.

If you do want to have your index named something else, you can pass another argument specifying the field name. e.g. if our index was .index("by_author", ["authorId"]) then the call would need to look like getManyFrom(db, "posts", "by_author", author._id, "authorId").

This works for all of the relationship helpers that use indexes.

What about N+1?

With traditional databases, there is a common issue called the “N+1 problem” where, instead of fetching all data in one request, you end up fetching one entity, then the N entities associated with it. This is an issue primarily because the code doing the querying is executing far from the database, so you end up waiting on many network requests, and if each query is non-trivial, you may cause excess load to your database.

Wait, isn’t that exactly what the getAll helper is doing?

Yes! However, Convex’s architecture changes some key aspects, which enables us to write queries like getAll.

  1. The functions are being executed very close to the database. This cuts out the largest contributor to the wait time, especially if you’re executing requests serially (i.e. a waterfall of requests).
  2. Convex’s concurrency model (read about our OCC here) doesn’t ever lock the database for reads, and thanks to its deterministic V8 runtime, queries can be cached efficiently and automatically by Convex, while maintaining stronger default ACID guarantees than other databases.1
  3. The db.get query is fast. Using “point queries” where you’re just loading one row is not a difficult or expensive task for our database. My heuristic when writing a query is that db.get is on the order of 1 millisecond, going from function to database and back with data.

All of this together means that you can write code to fetch the data you want instead of coercing your desires into a SQL command. And by the way, the SQL query planner is doing exactly this - fetching a range of documents via an index, then doing point queries for all the associated documents. It’s just hiding it away from you, making the performance harder to predict and debug.

Don’t worry if this is a bit confusing, the good news is you can write code without having to worry about packing your queries into a single expression.

Many-to-many

For many-to-many relationships using direct references and back-references (see the Relationship Structures post for more details), the access pattern is the same as for one-to-many: you can use getAll and getManyFrom. When you structure a many-to-many relationship by using a relationship (aka join) table, however, we can combine looking up the relationship documents with looking up each referenced document.

Join table: The getManyVia helper

Three circles on the left and three circles on the right connect both ways via intermediary circles

In our schema, we used the "postCategories" table to store associations between a post and a category. To fetch all of the categories for a post, without the helper, it could look like:

const links = await getManyFrom(db, "postCategories", "postId", post._id);
const categoryIds = links.map(link => link.categoryId);
const categories = await getAll(db, categoryIds);

Using the convex-helpers utility:

const categories = await getManyVia(db, "postCategories", "categoryId", "postId", post._id);

Note: As above, it will return null in the place of any category that doesn't exist. If you deleted a category but not the entries in "postCategories" pointing to it, for example. If you want to throw an exception instead, use getManyViaOrThrow.

Mapping over async functions

Let’s take a quick detour to make ourselves a utility function called asyncMap. In javascript, there’s an easy way to turn array A into a new array B using const b = a.map(someFunction). It does roughly the equivalent of:

const b = [];
for (const item of a) {
  b.push(someFunction(item));
}

Unfortunately, when you use map over an async function, you end up with a list of promises instead of the results. So let’s define a function that will act like map but await all the promises, like:

const bPromises = [];
for (const item of a) {
  // Start running each async function
  bPromises.push(someFunction(item));
}
const b = [];
for (const item of bPromises) {
  // Wait for each function to finish
  b.push(await someAsyncFunction(item));
}

The asyncMap helper

A simplified version of it which behaves like the above code, uses Promise.all:

async function asyncMap(iterable, asyncTransform) {
  const promises = [];
	for (const item of iterable) {
		promises.push(asyncTransform(item));
	}
  return Promise.all(promises);
}

For example:

const b = await asyncMap(a, someAsyncFunction);

This creates all of the promises without waiting on any of them, so they can run in parallel. For those familiar with promises and async-await patterns, you are likely used to this pattern. We could even simplify it to a one-liner:

const asyncMap = (list, asyncTransform) => Promise.all(list.map(asyncTransform));

However, I prefer the for-loop version as it supports iterables like Set, which don’t have a .map function.

To use the version in convex-helpers:

import { asyncMap } from "convex-helpers";

// getAll equivalent
const users = await asyncMap(userIds, doc => db.get(doc));
// or even
const users = await asyncMap(userIds, db.get);

// getManyVia equivalent
const categories = await asyncMap(
  await getManyFrom(db, "postCategories", "postId", post._id),
  (link) => db.get(link.categoryId)
);

A note on index naming

The helpers so far have leveraged the pattern of naming the index the same as the field you're indexing. This helps avoid having to type out overly-duplicative information. However, you may want a different name, especially for an index that has multiple fields. For example, if you want the postCategories table to be able to check whether a post already has a category, you might change the "postId" index to:

defineSchema({

  postCategories: defineTable({ // many to many relationship table
    postId: v.id('posts'),
    categoryId: v.id('categories'),
  }).index('postId_categories', ['postId', "categories"]),
})

In this case, you can pass an extra argument to the helpers (and their TypeScript type will force you to):

// if there were only one category for a post
const link = await getOneFrom(db, "postCategories", "postId_categories", post._id, "postId");
// get all the postCategory link documents.
const links = await getManyFrom(db, "postCategories", "postId_categories", post._id, "postId");
// many via join table
const categories = await getManyVia(
  db, "postCategories", "categoryId", "postId_categories", post._id, "postId"
);

Thankfully the TypeScript types will prompt you to pick a table name, then an index name, then an argument that matches the type of the index's first field, then the field name if it doesn't match the index.

Come together: joining data with functions

The beauty of writing the database queries in code is that you can compose functions to get the flexibility you want, while having full control over the order of queries (db.query) and direct lookups (db.get).

As a reminder, here are all the helper functions defined above:

import {
  getAll,
  getOneFrom,
  getManyFrom,
  getManyVia,
} from "convex-helpers/server/relationships";

// one-to-one via back reference
const author = await getOneFrom(db, "authorProfiles", "userId", user._id);
// one-to-many direct lookup
const users = await getAll(db, userIds);
// one-to-many or many-to-many via back references
const posts = await getManyFrom(db, "posts", "by_authorId", author._id);
// many via join table
const categories = await getManyVia(db, "postCategories", "categoryId", "postId", post._id);

With these, we can implement all sorts of lookups and joins, all in javascript!

Let’s write a query to:

  1. Look up all posts I’ve written (associated with my “author profile”).
  2. Include the associated comments in a “comments” field of each post.
  3. Add the categories associated with each post via a join table “postCategories” and put them in a “categories” array on each post.
  4. Sort the posts by the number of comments.
  5. Get the comment users, but only the distinct users, and return them separately since there might be a lot of duplication.
const author = await getOneFrom(db, 'authorProfiles', 'userId', user._id);
const barePosts = await getManyFrom(db, 'posts', 'by_authorId', author._id);
const commenterIds = new Set();
const posts = await asyncMap(barePosts, async (post) => {
  const comments = await getManyFrom(db, 'comments', 'postId', post._id);
  comments.forEach((comment) => commenterIds.add(comment.userId));
  const categories = await getManyVia(
    db, 'postCategories', 'categoryId', 'postId', post._id
  );
  return { ...post, comments, categories };
});
posts.sort((a, b) => b.comments.length - a.comments.length);
const commentUsers = await getAll(db, commenterIds);
return {posts, commentUsers};

No query planning, no SQL, and no table scans. And it’s all just code, so you can write your own helper functions to make it even more readable, and trust that you know what it’s doing under the hood.

Summary: the beauty of layering

By leveraging some helper functions, we were able to reconstruct various operations to combine data. Unlike SQL, however, we were explicit about the operations, rather than trusting a query planner and guessing at which indexes to define. In Convex, you can solve many problems with function abstractions, rather than pushing that complexity to the database layer. And thanks to the proximity to the database, these queries are very fast, so you don’t have to compromise on speed to have the ergonomics of writing in Javascript.

This applies to many other areas in Convex as well - writing authorization in functions rather than a clunky DSL, writing “middleware” as wrapper functions, and more. By providing powerful primitives and guarantees about execution purity, Convex gives you a solid foundation on which to layer behavior.

As always, let us know what you think in Discord, and if you come up with your own patterns for readable, composable querying.

Footnotes

  1. Most databases, including Postgres, default to “read committed” isolation, which is a weaker guarantee than “serializable” isolation, which Convex provides by default.

Build in minutes, scale forever.

Convex is the backend application platform with everything you need to build your project. Cloud functions, a database, file storage, scheduling, search, and realtime updates fit together seamlessly.

Get started