Bright ideas and techniques for building with Convex.
Profile image
Ian Macartney
a month ago

Functional Relationships: Helpers

Code for joining queries with helpers for one-to-many, many-to-many, and more.

In the Relationship Structures post, we looked at how to structure one-to-one, one-to-many and many-to-many relationships using a relational database, and what those queries look like in Convex.

In a SQL-based database, you might be used to the JOIN operator, which connects fields from multiple tables into a single flat result. With Convex, we chose to instead expose predictable primitives that you can compose to fetch data, without the sometimes-unpredictable black-box of a query planner. To read more about our thoughts on SQL, read this post. To read more about our indexes, read the docs.

In this post, we’ll look at some helper functions to help write code to traverse relationships in a readable, predictable, and debuggable way. The code is here (including fancy typescript types) if you want to use it in your project. By the end, we’ll be able to compose functions to execute a complex query involving the SQL equivalent of selecting, joining, grouping, sorting, and fetching distinct documents.

The examples will reference this schema:

defineSchema({
  users: defineTable({
    name: v.string(),
  }),
  authorProfiles: defineTable({
    userId: v.id('users'), // one to one
    bio: v.string(),
  }).index('by_userId', ['userId']),
  posts: defineTable({
    title: v.string(),
    authorId: v.id('authorProfiles'), // one to many
    content: v.string(),
  }),
  comments: defineTable({
    postId: v.id('posts'), // one to many
    userId: v.id('users'), // one to many
    text: v.string(),
  }).index('by_postId', ['postId']),
  postCategories: defineTable({ // many to many relationship table
    postId: v.id('posts'),
    categoryId: v.id('categories'),
  }).index('by_postId', ['postId']),
  categories: defineTable({ ... }),
});

One-to-one

When each document only has one or zero related documents.

Direct reference: db.get

If you have an id of a document, you can directly access it with db.get. This is the simplest lookup.

const user = await db.get(author.userId);

Back-reference: the getOneFrom helper

To fetch a document that has a reference to the document on hand, we can assumes the index is the field name prefixed with "by_". You could alternatively pass an optional parameter of another index name to use, or encode your own naming convention:

async function getOneFrom(db, table, field, value) {
  return db
    .query(table)
    .withIndex("by_" + field, q => q.eq(field, value))
    .unique()
}

For example we can look up an author profile from a user by querying on the index for userId:

const author = await getOneFrom(db, "authorProfiles", "userId", user._id);

Mapping over async functions

Let’s take a quick detour to make ourselves a utility function called asyncMap. In javascript, there’s an easy way to turn array A into a new array B using const b = a.map(someFunction). It does roughly the equivalent of:

const b = [];
for (const item of a) {
  b.push(someFunction(item));
}

Unfortunately, when you use map over an async function, you end up with a list of promises instead of the results. So let’s define a function that will act like map but await all the promises, like:

const bPromises = [];
for (const item of a) {
  // Start running each async function
  bPromises.push(someFunction(item));
}
const b = [];
for (const item of bPromises) {
  // Wait for each function to finish
  b.push(await someAsyncFunction(item));
}

The asyncMap helper

A simplified version of it which behaves like the above code, uses Promise.all:

async function asyncMap(iterable, asyncTransform) {
  const promises = [];
	for (const item of iterable) {
		promises.push(asyncTransform(item));
	}
  return Promise.all(promises);
}

For example:

const b = await asyncMap(a, someAsyncFunction);

This creates all of the promises without waiting on any of them, so they can run in parallel. For those familiar with promises and async-await patterns, you are likely used to this pattern. We could even simplify it to a one-liner:

const asyncMap = (list, asyncTransform) => Promise.all(list.map(asyncTransform));

However, I prefer the for-loop version as it supports iterables like Set, which don’t have a .map function.

So how would we use this utility?

One-to-many

When each document has potentially many related documents.

Direct references: the getAll helper

To look up a list of IDs all at once, we can define this helper to make our code more readable:

async function getAll(db, ids) {
  return asyncMap(ids, db.get);
}

For example we can look up all the users associated with comments on a post:

const comments = await getManyFrom(db, "comments", "postId", post._id);
const userIds = comments.map(comment => comment.userId);
const users = await getAll(db, userIds);

Back-references: the getManyFrom helper

We can extend the getOneFrom helper for the one-to-many case, differing only by using collect instead of unique:

async function getManyFrom(db, table, field, value) {
  return db
    .query(table)
    .withIndex("by_" + field, q => q.eq(field, value))
    .collect()
}

For example looking up all the posts for a certain author:

const posts = await getManyFrom(db, "posts", "authorId", author._id);

These helpers may seem small, but they end up making for much more readable queries. See the bottom of the post for a complex example.

What about N+1?

With traditional databases, there is a common issue called the “N+1 problem” where, instead of fetching all data in one request, you end up fetching one entity, then the N entities associated with it. This is an issue primarily because the code doing the querying is executing far from the database, so you end up waiting on many network requests, and if each query is non-trivial, you may cause excess load to your database.

Wait, isn’t that exactly what the getAll helper is doing?

Yes! However, Convex’s architecture changes some key aspects, which enables us to write queries like getAll.

  1. The functions are being executed very close to the database. This cuts out the largest contributor to the wait time, especially if you’re executing requests serially (i.e. a waterfall of requests).
  2. Convex’s concurrency model (read about our OCC here) doesn’t ever lock the database for reads, and thanks to its deterministic V8 runtime, queries can be cached efficiently and automatically by Convex, while maintaining stronger default ACID guarantees than other databases.1
  3. The db.get query is fast. Using “point queries” where you’re just loading one row is not a difficult or expensive task for our database. My heuristic when writing a query is that db.get is on the order of 1 millisecond, going from function to database and back with data.

All of this together means that you can write code to fetch the data you want instead of coercing your desires into a SQL command. And by the way, the SQL query planner is often doing exactly this - fetching a range of documents via an index, then doing point queries for all the associated documents. It’s just hiding it away from you, making the performance harder to predict and debug.

Don’t worry if this is a bit confusing, the good news is you can write code without having to worry about packing your queries into a single expression.

Many-to-many

For many-to-many relationships using direct references and back-references (see the Relationship Structures post for more details), the access pattern is the same as for one-to-many: you can use getAll and getManyFrom. When you structure a many-to-many relationship by using a relationship (aka join) table, however, we can combine looking up the relationship documents with looking up each referenced document.

Join table: The getManyVia helper

To codify the join table pattern, we can define a helper function:

async function getManyVia(db, table, toField, fromField, value) {
  return asyncMap(
    getManyFrom(db, table, fromField, value)
    (edge) => db.get(edge[toField])
  );
}

For example, fetching all the categories associated with a post, using the “postCategories” join table:

const categories = await getManyVia(db, "postCategories", "categoryId", "postId", post._id);

Come together: joining data with functions

The beauty of writing the database queries in code is that you can compose functions to get the flexibility you want, while having full control over the order of queries (db.query) and direct lookups (db.get).

As a reminder, here are all the helper functions defined above:

async function asyncMap(list, asyncTransform) {
  const promises = [];
	for (const item of list) {
		promises.push(asyncTransform(item));
	}
  return Promise.all(promises);
}
// direct lookup
async function getAll(db, ids) {
  return asyncMap(ids, db.get);
}
// one-to-one via back reference
async function getOneFrom(db, table, field, value) {
  return db.query(table).withIndex("by_" + field, q => q.eq(field, value)).unique()
}
// one-to-many or many-to-many via back references
async function getManyFrom(db, table, field, value) {
  return db.query(table).withIndex("by_" + field, q => q.eq(field, value)).collect()
}
// many-to-many via join table
async function getManyVia(db, table, toField, fromField, value) {
  return asyncMap(
    await getManyFrom(db, table, fromField, value),
    (edge) => db.get(edge[toField])
  );
}

With these, we can implement all sorts of lookups and joins, all in javascript!

Let’s write a query to:

  1. Look up all posts I’ve written (associated with my “author profile”).
  2. Include the associated comments in a “comments” field of each post.
  3. Add the categories associated with each post via a join table “postCategories” and put them in a “categories” array on each post.
  4. Sort the posts by the number of comments.
  5. Get the comment users, but only the distinct users2, and return them separately since there might be a lot of duplication.
const author = await getOneFrom(db, 'authorProfiles', 'userId', user._id);
const barePosts = await getManyFrom(db, 'posts', 'authorId', author._id);
const commenterIds = new Set();
const posts = await asyncMap(barePosts, async (post) => {
  const comments = await getManyFrom(db, 'comments', 'postId', post._id);
  comments.forEach((comment) => commenterIds.add(comment.userId));
  const categories = await getManyVia(
    db, 'postCategories', 'categoryId', 'postId', post._id
  );
  return { ...post, comments, categories };
});
posts.sort((a, b) => b.comments.length - a.comments.length);
const commentUsers = await getAll(db, commenterIds);
return {posts, commentUsers};

No query planning, no SQL, and no table scans. And it’s all just code, so you can write your own helper functions to make it even more readable, and trust that you know what it’s doing under the hood.

Summary: the beauty of layering

By leveraging some helper functions, we were able to reconstruct various operations to combine data. Unlike SQL, however, we were explicit about the operations, rather than trusting a query planner and guessing at which indexes to define. In Convex, you can solve many problems with function abstractions, rather than pushing that complexity to the database layer. And thanks to the proximity to the database, these queries are very fast, so you don’t have to compromise on speed to have the ergonomics of writing in Javascript.

This applies to many other areas in Convex as well - writing authorization in functions rather than a clunky DSL, writing “middleware” as wrapper functions, and more. By providing powerful primitives and guarantees about execution purity, Convex gives you a solid foundation on which to layer behavior.

As always, let us know what you think in Discord, and if you come up with your own patterns for readable, composable querying.

Footnotes

  1. Most databases, including Postgres, default to “read committed” isolation, which is a weaker guarantee than “serializable” isolation, which Convex provides by default.

  2. Technically, IDs are currently objects which don’t play nicely with a Set. To do this today you’d need to store the ID objects keyed by their .id property. However, soon IDs will be strings and this code will just work. Apologies for the short-term simplification.

Build in minutes, scale forever.

Convex is the backend application platform with everything you need to build your project. Cloud functions, a database, file storage, scheduling, search, and realtime updates fit together seamlessly.

Get started