Bright ideas and techniques for building with Convex.
Profile image
Ian Macartney
10 months ago

Relationship Structures: Let's Talk About Schemas

A web of relationships

Relationships are a key part of a relational database like Convex. But how do you use them? In this post we’ll look at some patterns for structuring relationships in the Convex database. See this post on Relationship Helpers for code examples to simplify and compose queries.

Typically, relationships are characterized as being:

  • One-to-one: Every document of type A corresponds to at most one document of type B. For example, you may store a user’s preferences separately from the main user document.
  • Many-to-one: Every A document can relate to zero or more B documents, but each B document is related to at most one A document. For example, a conversation may have many messages, but each message is only in one conversation.
  • Many-to-many: Every A document can relate to zero or more B documents, and every B document can relate to zero or more A documents. For example, a post may have many tags, and every tag can be on many posts.

One-to-one

To structure a one-to-one relationship, you can have one document hold an ID to another, or have both hold IDs. With the example of storing a student health record as a separate document from the student, this allows you to fetch a student without loading the health record, while still having fast access to it in situations where you need it.

Direct reference

An arrow points from a left circle to a right circle

This is the simplest relationship:

defineSchema({
  students: defineTable({
    ...
  }),
  healthRecords: defineTable({
    studentId: v.id("students")
    ...
  }),
});

You can then access the student from a health record like:

const student = await ctx.db.get(record.studentId);

You can create a health record associated with a student like this:

const healthRecordId = await ctx.db.insert('healthRecords', {
  studentId: student._id,
  ...
});

This works well, but what if you have a student and want to look up the health record?

Back-reference

An arrow points from a right circle to a left circle

Another way to use a single reference is to have an index on the field with the referenced ID:

defineSchema({
  students: defineTable({
    ...
  }),
  healthRecords: defineTable({
    studentId: v.id("students")
    ...
  }).index("studentId", ["studentId"]),
});

We create the record in the same way as direct references, but now we can look up the record from the student:

const record = await ctx.db.query("healthRecords")
  .withIndex("studentId", q => q.eq("studentId", student._id))
  .unique()

This uses the index to find the record pointing to this student. Since we expect there to only be one, we use unique() so it will throw an error if we end up with more than one.

See the Relationship Helpers post for a getOneFrom helper.

What if we want to easily look up in both directions without a schema?

Why not both?

For one-to-one relationships, it’s possible to store an ID in both places:

defineSchema({
  students: defineTable({
    ...
    healthRecordId: v.optional(v.id("healthRecords")),
  }),
  healthRecords: defineTable({
    ...
    studentId: v.id("students"),
  }),
});

When you create the related document, you can update both documents to have each others’ IDs.

const studentId = await ctx.db.insert("students", {...});
const healthRecordId = await ctx.db.insert('healthRecords', {studentId, ...});
await ctx.db.patch(student._id, { healthRecordId });
//code to create both objects

This ensures you can always do a “point lookup” (which are very fast) without using an index. However, keep in mind that duplicating this state could lead to inconsistencies if you update one record but forget to update the other. It’s up to you to keep both records in sync. Thankfully, since Convex mutations are transactions, you can update both in the same mutation and not worry about any other queries or mutations reading one but not the other.

Next, let’s look at what happens when you have many documents holding the same reference.

One-to-many

When one document is related to many documents — for instance, one user may author many posts — you have a one-to-many relationship.

Direct references (simple)

A circle on the left points to three circles on the right

If you only have a few references, you might find it easier to store an array of IDs instead of querying back-references with an index (which we’ll see next). For instance, let’s say you have a competition where each participant can submit up to 5 works of art. You can directly store links to the art on the participant:

defineSchema({
  submissions: defineTable({...}),
  participants: defineTable({
    ...
    submissionIds: v.array(v.id('submissions')),
  }),
});

Then you can directly get each participant’s art without using an index:

const submissions = await Promise.all(participant.submissionIds.map(db.get));

See the Relationship Helpers post for a getAll helper to fetch an array of IDs, and more information on Promise.all and map.

However, this comes with some important design considerations:

  1. You can’t index an array in Convex, so you won’t be able to query for the participant who submitted a given submission ID. You’d have to scan all participants and check the submissionIds array in each.1

    const participants = await ctx.db.query('participants').collect();
    for (const participant of participants) {
      if (participant.submissionIds.find(id => id.equals(submission._id))) {
        return participant;
      }
    }
    
  2. You are limited to 8192 entries in an array, and we recommend having on the order of 10 items in arrays, so you should only use this where there’s a natural limit to how many references there will be.

Back-references (scalable)

Three circles on the right point to a circle on the left

Similar to one-to-one relationships, you can use back-references to keep track of all the posts referring to an author. You can think of this as a many-to-one relationship, as many posts hold a reference to one author. This scales better than the array, since no one document is getting larger every time there’s another post by the same author. You can define an index on those references to look up which posts have a certain author:

posts: defineTable({
  authorId: v.id("users"),
  ...
}).index('authorId', ['authorId']);

Similarly to one-to-one, this allows you to look up the author of a post directly. We’ll similarly use the index for the back-reference (author → posts). The difference: instead of unique(), we use collect() to get all of the posts:

const posts = await ctx.db.query("posts")
  .withIndex("authorId", q => q.eq("authorId", author._id))
  .collect();

See the Relationship Helpers post for a getManyFrom function to codify this.

You create the post with a link to the author like so:

await ctx.db.insert("posts", {..., authorId: author._id});

Many-to-many

Some relationships don’t allow you to just store a single ID. In a group messaging application, for example, keeping track of which users are in which channels requires a many-to-many relationship. A user can be in many channels and a channel can have many users. In these situations, there are a few options, but the most common one is to use a table that has a reference to both.

Relationship table (scalable)

Diagram

There is a lot of terminology you could use here: a “join” table, an “edge,” or an “association” table to name a few. The premise is simple, however:

defineSchema({
  users: defineTable({
    ... 
  }),
  channels: defineTable({ 
    ...
  }),
  channelUsers: defineTable({
    channelId: v.id("channels"),
    userId: v.id("users"),
    // Other fields related to the user, channel pairing
    // E.g. notification preferences or when they were last seen
  })
    .index("channelId", ["channelId"]) // Get users in a channel
    .index("userId", ["userId"]), // Get channels for a user
});

You have a table where every document represents a pairing of a user and a channel. To find the users in a channel, it’s similar to the one-to-many lookup above, combined with a db.get on each associated record:

const channelUsers = await ctx.db.query("channelUsers")
  .withIndex("channelId", q => q.eq("channelId", channel._id))
  .collect();
const userIds = channelUsers.map(channelUser => channelUser.userId);
const users = await Promise.all(userIds.map(db.get));

See the Relationship Helpers post for a getManyVia helper function to make this easy.

To add a user to a channel:

await ctx.db.insert('channelUsers', { channelId, userId });

To figure out if a user is in a channel, you have 3 options:

  1. Fetch with the channel index and look for the user.

    const channelUsers = await ctx.db.query("channelUsers")
      .withIndex("channelId", q => q.eq("channelId", channel._id))
      .collect();
    const isInChannel = !!channelUsers.find(cu => cu.userId.equals(user._id);
    
  2. Fetch with the user and look for the channel. I’d prefer this one since users are more likely to have a lower limit of how many channels they can be in, versus how many users can be in a channel.

    const channelUsers = await ctx.db.query("channelUsers")
      .withIndex("userId", q => q.eq("userId", user._id))
      .collect();
    const isInChannel = !!channelUsers.find(cu => 
    cu.channelId.equals(channel._id);
    
  3. To look it up without iterating an array, you could have one of the indexes contain both fields:

    .index("channelId", ["channelId"])
    .index("userId_channelId", ["userId", "channelId"]),
    

    This would allow you to check directly:

    await ctx.db.query("channelUsers")
      .withIndex("userId_channelId", q =>
        q.eq("userId", user._id).eq("channelId", channel._id)
      )
      .unique(); // null if it's not found
    

Direct References (simple)

While the relationship table is the most flexible and performant, there are situations where you can get away without a relationship table + indexes, similar to the one-to-many direct references approach. I’ll repeat the same caution about this approach:

This comes with some important design considerations:

  1. You can’t index an array in Convex, so you won’t be able to query for the documents that reference some ID. You’d have to scan all documents and check the array in each.
  2. You are limited to 8192 entries in an array, and we recommend having on the order of 10 nested elements, so you should only use this where there’s a natural limit to how many references there will be.

Directed graph without a reverse lookup

Diagram

One example when this could make sense is storing references on a user to a small number of “roles,” where you have a cap on how many roles a user can have. This is presuming that you commonly need to look up the roles for a user, but don’t need to look up users by role. And when you do want to look up by role, you’re ok using a paginated query to scan the user table (e.g. from an admin page).

defineSchema({
  roles: defineTable({
    description: v.string(),
    permissions: ...,
  }),
  users: defineTable({
    roles: v.array(v.id('roles')),
    name: v.string(),
    ...
  }),
});

Here you can follow the direct reference approach for one-to-many to access roles for each user.

Directed graph with a reverse lookup (solving design consideration 1)

Diagram

When you want to be able to look up in both directions, and still avoid making a relation table, you could store an array of IDs on both documents. This is similar to one-to-many with direct references, except having an array on both sides instead of a single ID on one side. As an example, consider a small school where each teacher might teach a handful of subjects (e.g. “8th grade math”) and each subject might have only a handful of teachers.

defineSchema({
  subjects: defineTable({
    teachers: v.array(v.id('teachers')),
    title: v.string(),
  }),
  teachers: defineTable({
    subjects: v.array(v.id('subjects')),
    name: v.string(),
  }),
  ...
});

As long as you keep the lists up to date (and in case it wasn’t clear, they won’t stay up to date automatically), you can easily look up a teacher’s subjects and a subject’s teachers, without having to define and use an intermediary table.

await ctx.db.patch(subject._id, { teachers: [ teacher._id, ...subject.teachers ] });
await ctx.db.patch(teacher._id, { subjects: [ subject._id, ...teacher.subjects ] });

Reminder: this should only be used in situations where the lists will be small and you’re willing to keep them in sync.

Summary

In this post we looked at how to structure various relationships. See the post on Relationship Helpers for code you can use, along with other helper functions. If you’re interested in talking about the approaches here, join us in our Discord.

Footnotes

  1. Note: To solve for (1), you could also store the participantId in the submissions table, so you have bi-directional references. This is similar to the one-to-one “Why not both” section - and comes with the same caution that you should make sure to keep them in sync.

Build in minutes, scale forever.

Convex is the backend application platform with everything you need to build your project. Cloud functions, a database, file storage, scheduling, search, and realtime updates fit together seamlessly.

Get started