3 months ago

Efficient COUNT, SUM, MAX with the Aggregate Component

Counting, summing, and finding the min or max value are some of the most confusing aspects for people new to Convex.

Before I became an official Convex Fanboi, I flagged this as one of the things I don't like about Convex when I first encountered it.

If you're coming from the SQL world like me, you're probably used to running SELECT COUNT(*) FROM someTable. But Convex doesn't offer this functionality. Why not?

I just created a video explaining the reasons behind this decision, so definitely check out that video if you want a deeper dive.

The TL;DW is that performing aggregates like COUNT or SUM requires scanning all rows in the database, which creates serious performance issues as your data grows. Convex deliberately forces you to think about this challenge upfront by making you handle aggregation yourself.

In the solutions section of the video, I briefly mentioned using the Aggregate Component and offered to create an entire video on just that topic if there was interest.

Well…

… the people have spoken.

If you were one of those people who voted, or if you've looked at the Aggregate Component before and quickly backed away, or if you're completely new to Convex and have no idea what I'm talking about, this post is for you!

Setup

Alright, so I'm going to assume you have a Convex project running already. If not, check out one of the QuickStart guides for how to get started there.

Now you're going to want to do npm install @convex-dev/aggregate then tell Convex you want to use the component by creating a convex.config.ts file in your convex directory and adding this code:

1import { defineApp } from "convex/server";
2import aggregate from "@convex-dev/aggregate/convex.config";
3import migrations from "@convex-dev/migrations/convex.config";
4
5const app = defineApp();
6app.use(aggregate, { name: "aggregateByScore" });
7app.use(aggregate, { name: "aggregateScoreByUser" });
8app.use(aggregate, { name: "music" });
9app.use(aggregate, { name: "photos" });
10app.use(aggregate, { name: "stats" });
11app.use(aggregate, { name: "btreeAggregate" });
12
13app.use(migrations);
14
15export default app;
16
17

I've included more configurations here than you would typically need. For this post, I'm using the same component multiple times with different names to help you easily identify each component in the Convex dashboard.

Now that you have the component set up, let's take a quick look at how to use it. Let's imagine we're building a game that will show a leaderboard like this

Our goal is to display the player's name, score, and rank on the leaderboard.

If you'd like to try these demos yourself, they're all available in the example folder of the Aggregate Component repo or you can play with them online at https://aggregate-component-example.netlify.app/

First, we need to define a standard Convex table to store our scores:

1import { defineSchema, defineTable } from "convex/server";
2import { v } from "convex/values";
3
4export default defineSchema({
5  leaderboard: defineTable({
6    name: v.string(),
7    score: v.number(),
8  }),
9});
10

So now we will need a way to insert our score.

1export const addScore = mutation({
2  args: {
3    name: v.string(),
4    score: v.number(),
5  },
6  returns: v.id("leaderboard"),
7  handler: async (ctx, args) => {
8    const id = await ctx.db.insert("leaderboard", {
9      name: args.name,
10      score: args.score,
11    });
12    return id;
13  },
14});
15

So here we define our addScore mutation and push a record into our leaderboard table.

Okay now to display the table how are we going to do that?

Well I guess we could do something like this:

1export const pageOfScores = query({
2  args: {
3    offset: v.number(),
4    numItems: v.number(),
5  },
6  handler: async (ctx, { offset, numItems }) => {
7    const scores = await ctx.db.query("leaderboard").collect();
8
9    const page = scores
10      .sort((a, b) => b.score - a.score)
11      .slice(offset, offset + numItems);
12      
13    return page;
14  },
15});
16

In this approach, we retrieve all rows from the table, sort them, and then extract just the page of items we need.

This works for small datasets, but it won't scale well as our database grows to thousands or millions of scores. The inefficiency comes from having to read every single row just to display one page of results.

What we really want is something like this:

1export const pageOfScores = query({
2  args: {
3    offset: v.number(),
4    numItems: v.number(),
5  },
6  handler: async (ctx, { offset, numItems }) => {
7    const scores = await ctx.db
8      .query("leaderboard")
9      .orderBy("score", "desc")
10      .skip(offset)
11      .take(numItems);
12
13    return scores;
14  },
15});
16

Here we order by the score descending then skip by our offset and take just what we need.

Well if you are an experienced Convex user you will already know Convex doesnt have an orderBy nor a skip.

Well I lie Convex does have an orderBy if you use an index

1  .withIndex("by_score", (q) => q.lte("score", someScore))
2      .order("desc")
3

but it definitely doesn't have a skip.

Related to this issue though is the one of Ranking our scores.

Think about this for a second: how would you efficiently rank these scores?

When I add a new score, how do I determine its rank?

I suppose I could loop through all existing scores to find which ones are above and below mine—that's my rank. But then everyone ranked below this new score would need their rank updated since they've been bumped down the leaderboard.

This doesn't sound efficient at all, especially as our data grows.

Fear not! This is exactly where the Aggregate Component saves the day!

To start using it, we need to add a TableAggregate at the top of our leaderboards.ts file like so:

1const aggregateByScore = new TableAggregate<{
2  Key: number;
3  DataModel: DataModel;
4  TableName: "leaderboard";
5}>(components.aggregateByScore, {
6  sortKey: (doc) => -doc.score,
7});
8

Here we tell the component that we want to sort by the score from the leaderboard table, which will be a number.

Note that I'm using a negative value for the sortKey. This is because the aggregate component sorts in ascending order (smallest to largest), but we want to rank from largest to smallest. Using the negative value makes our intention clearer, as we'll see shortly.

Now when we insert a new score, we must also add it to the aggregate component like so:

1export const addScore = mutation({
2  args: {
3    name: v.string(),
4    score: v.number(),
5  },
6  returns: v.id("leaderboard"),
7  handler: async (ctx, args) => {
8    const id = await ctx.db.insert("leaderboard", {
9      name: args.name,
10      score: args.score,
11    });
12    const doc = await ctx.db.get(id);
13    await aggregateByScore.insert(ctx, doc!);
14    return id;
15  },
16});
17

For now we are going to assume there are no existing scores in the table. We will talk a bit later in this post about what happens if you want to use the Aggregate component with an existing table.

Now we are inserting data into our aggregate component we can use it like so:

1export const pageOfScores = query({
2  args: {
3    offset: v.number(),
4    numItems: v.number(),
5  },
6  handler: async (ctx, { offset, numItems }) => {
7    const firstInPage = await aggregateByScore.at(ctx, offset);
8
9    const page = await aggregateByScore.paginate(ctx, {
10      bounds: {
11        lower: {
12          key: firstInPage.key,
13          id: firstInPage.id,
14          inclusive: true,
15        },
16      },
17      pageSize: numItems,
18    });
19
20    const scores = await Promise.all(
21      page.page.map((doc) => ctx.db.get(doc.id))
22    );
23
24    return scores.filter((d) => d !== undefined);
25  },
26});
27

Here's how it works: First, we use the aggregate component to find the score at our desired offset. Then we use that score in our paginate query to define our page's starting point and specify how many items we want. Finally, we look up each item in our leaderboards table and return the results.

What we're essentially doing is finding the score at a specific rank. With our leaderboard scores ordered in descending order, we can efficiently retrieve 5 (or any number of) items for display.

Simple, right? This approach highlights why the aggregate component is so valuable. Without it, fetching page 100 would require scanning through 496 rows before reaching the data we need.

This manual scanning would be not only slow but also inefficient because of Convex's transactional nature. If any of those 496 values changed—even just a player's name—it would invalidate our entire page.

The Aggregate Component cleverly stores data in a way that lets it bypass unnecessary document reads and jump directly to where you need to be.

I'm getting ahead of myself though. We'll explore HOW the Aggregate Component works later in this post.

For now, let's explore another useful capability: finding a score's rank. What if instead of looking up the score at a given rank, we want to do the reverse—determine what rank a particular score would have?

1export const rankOfScore = query({
2  args: {
3    score: v.number(),
4  },
5  handler: async (ctx, args) => {
6    return await aggregateByScore.indexOf(ctx, -args.score);
7  },
8});
9

We can do this easily with the indexOf function on the aggregate component. Just remember to invert our score as we did before.

When we test it out, we can enter a score and immediately see the correct rank.

Dont forget as this is Convex everything is fully reactive, so any time we add or remove scores, the rankings update instantly.

If we wanted to display statistics for a specific player such as their average or highest score we can also implement that efficiently with the aggregate component.

1const aggregateScoreByUser = new TableAggregate<{
2  Key: [string, number];
3  DataModel: DataModel;
4  TableName: "leaderboard";
5}>(components.aggregateScoreByUser, {
6  sortKey: (leaderboardTableDoc) => [leaderboardTableDoc.name, leaderboardTableDoc.score],
7  sumValue: (leaderboardTableDoc) => leaderboardTableDoc.score,
8});
9

First, we need to modify our aggregate. Instead of using just the score as the sortKey, we're now using an array containing both the player name and score. We've also added a sumValue parameter so we can track the total points a player has earned.

Now, to calculate the average score for a player, we can do this:

1export const userAverageScore = query({
2  args: {
3    name: v.string(),
4  },
5  handler: async (ctx, args) => {
6    const count = await aggregateScoreByUser.count(ctx, {
7      bounds: { prefix: [args.name] },
8    });
9    if (!count) return null;
10    const sum = await aggregateScoreByUser.sum(ctx, {
11      bounds: { prefix: [args.name] },
12    });
13    return sum / count;
14  },
15});
16

Here we use the aggregate to count the scores belonging to this player and sum their scores, then simply return the average.

Don't forget that because of how the Aggregate stores these values, this lookup is super fast and efficient, it doesn't have to scan all the rows.

If we want to show the highest score a user has ever achieved, we can use the max function

1export const userHighScore = query({
2  args: {
3    name: v.string(),
4  },
5  handler: async (ctx, args) => {
6    const item = await aggregateScoreByUser.max(ctx, {
7      bounds: { prefix: [args.name] },
8    });
9    if (!item) return null;
10    return item.sumValue;
11  },
12});
13

Interestingly, we need to access the value through the sumValue property.

If we take this for a spin now, we can enter my name, it works! We can see my average and highest score. Just like before we can add a new score with my name and these values instantly update.

Other Demos

Now that we understand the basics, what else can we do with the Aggregate Component?

If your data has clear categories (like photo albums in this demo), you can use namespacing to efficiently segregate your data. This approach reduces the number of reads and writes when working with a specific namespace.

1const photos = new TableAggregate<{
2  Namespace: string;
3  Key: number;
4  DataModel: DataModel;
5  TableName: "photos";
6}>(components.photos, {
7  namespace: (doc) => doc.album,
8  sortKey: (doc) => doc._creationTime,
9});
10

Just tell the component what namespace to use for a given doc.

Then use that namespace when calling into the component

1export const photoCount = query({
2  args: { album: v.string() },
3  returns: v.number(),
4  handler: async (ctx, { album }) => {
5    return await photos.count(ctx, { namespace: album });
6  },
7});
8

We could also use the aggregate to do efficient randomization like in this demo.

We click this button to get a random song.

Or we could show a page of a randomized playlist.

1const randomize = new TableAggregate<{
2  DataModel: DataModel;
3  TableName: "music";
4  Key: null;
5}>(components.music, {
6  sortKey: () => null,
7});
8

To do this, we need to "null" our sortKey. This causes the aggregate component to order values in its structure by the document ID, which is effectively a random GUID.

1export const getRandomMusicTitle = query({
2  args: {
3    cacheBuster: v.optional(v.number()),
4  },
5  handler: async (ctx) => {
6    const randomMusic = await randomize.random(ctx);
7    if (!randomMusic) return null;
8    const doc = await ctx.db.get(randomMusic.id);
9    if (!doc) return null;
10    return doc.title;
11  },
12});
13

Using the random function, we can efficiently retrieve a random song.

The final demo displays various statistics from the aggregate. As I add more latency values, you can see the numbers update in real-time.

This demo showcases a lower level API called the DirectAggregate

1const stats = new DirectAggregate<{
2  Key: number;
3  Id: string;
4}>(components.stats);
5

With this approach, we don't tie the aggregate to a table. Instead, we directly specify what our key and value should be.

When we report a latency

1export const reportLatency = mutation({
2  args: {
3    latency: v.number(),
4  },
5  returns: v.null(),
6  handler: async (ctx, { latency }) => {
7    await stats.insert(ctx, {
8      key: latency,
9      id: new Date().toISOString(),
10      sumValue: latency,
11    });
12  },
13});
14

we insert those values manually.

1export const getStats = query({
2  args: {},
3  handler: async (ctx) => {
4    const count = await stats.count(ctx);
5    if (count === 0) return null;
6
7    const mean = (await stats.sum(ctx)) / count;
8    const median = (await stats.at(ctx, Math.floor(count / 2))).key;
9    const p75 = (await stats.at(ctx, Math.floor(count * 0.75))).key;
10    const p95 = (await stats.at(ctx, Math.floor(count * 0.95))).key;
11    const min = (await stats.min(ctx))!.key;
12    const max = (await stats.max(ctx))!.key;
13    return {
14      count,
15      mean,
16      median,
17      p75,
18      p95,
19      max,
20      min,
21    };
22  },
23});
24

Retrieving data from it is just like using the TableAggregate.

We can perform the same operations like count, sum, min, max and so on.

Potential Issues & Triggers

Okay before we pop the hood of the aggregate component and have a look at how it does its magic lets just quickly talk about an issue with the Aggregate Component.

1export const addScore = mutation({
2  args: {
3    name: v.string(),
4    score: v.number(),
5  },
6  returns: v.id("leaderboard"),
7  handler: async (ctx, args) => {
8    const id = await ctx.db.insert("leaderboard", {
9      name: args.name,
10      score: args.score,
11    });
12    const doc = await ctx.db.get(id);
13    await aggregateByScore.insert(ctx, doc!);
14    await aggregateScoreByUser.insert(ctx, doc!);
15    return id;
16  },
17});
18

You may have noticed an operational issue with the Aggregate component. Whenever we add a document to a table like this leaderboard score, we must also remember to insert it into the aggregate.

1export const removeScore = mutation({
2  args: {
3    id: v.id("leaderboard"),
4  },
5  handler: async (ctx, { id }) => {
6    const doc = await ctx.db.get(id);
7    await ctx.db.delete(id);
8    await aggregateByScore.delete(ctx, doc!);
9    await aggregateScoreByUser.delete(ctx, doc!);
10  },
11});
12

And the same goes for when we remove a score like this.

Or if we update a score like this

1export const updateScore = mutation({
2  args: {
3    id: v.id("leaderboard"),
4    name: v.string(),
5    score: v.number(),
6  },
7  handler: async (ctx, args) => {
8    const oldDoc = await ctx.db.get(args.id);
9    if (!oldDoc)
10      throw new Error(`Score with id '${args.id}' could not be found`);
11
12    await ctx.db.patch(args.id, {
13      name: args.name,
14      score: args.score,
15    });
16
17    const newDoc = await ctx.db.get(args.id);
18    if (!newDoc)
19      throw new Error(`Updated score with id '${args.id}' could not be found`);
20
21    // Update both aggregates with the old and new documents
22    await aggregateByScore.replace(ctx, oldDoc, newDoc);
23    await aggregateScoreByUser.replace(ctx, oldDoc, newDoc);
24  },
25});
26

we need to remember to update the aggregates because the values being changed could affect the aggregate.

This required coupling is necessary but potentially error-prone. If you or a team member forgets to include the aggregate lines somewhere, you can easily end up with aggregates that are out of sync or corrupted.

Fortunately, there's a solution to this issue in the form of triggers and custom functions.

Triggers, if you aren't familiar, are a library provided by the convex-helpers repo that allow you to wrap your database access. This means whenever you make a change, it can also "trigger" other things to happen.

Let me show you what I mean.

1const triggers = new Triggers<DataModel>();
2
3triggers.register("leaderboard", aggregateByScore.trigger());
4triggers.register("leaderboard", aggregateScoreByUser.trigger());
5

First, we create a triggers object and register it on our leaderboards table, attaching both aggregates as shown.

Next, we import another library from convex-helpers called custom functions.

This allows us to create a custom mutation function that looks like this

1const mutationWithTriggers = customMutation(
2  mutation,
3  customCtx(triggers.wrapDB)
4);
5

We can now replace our mutations with this mutationWithTriggers and remove all those places where we manually insert or update our aggregates. Our triggers will automatically update the aggregates whenever the leaderboard changes, which greatly simplifies the logic in our functions.

Very nice!

Dashboard + Triggers Issue

While we're discussing issues with the Aggregate Component, I want to point out another potential pitfall you should be aware of.

As you probably know, you can open the Convex dashboard for any project, go to the data tab, and change a value, like a player's name, and it instantly updated in the leaderboard.

But now if we try and delete this row from in the UI:

Boom! Error!

The problem is that dashboard changes don't trigger your code, so the triggers aren't fired. This leaves your aggregate corrupted because the keys no longer match.

Unfortunately, there's no easy workaround for this issue right now other than being extremely careful with dashboard edits.

We've been discussing potential platform-level triggers internally, but they're still quite a way off.

Existing Data & Migrations

By now, you should have a good understanding of what the aggregate component is and how to use it in your new project.

But what if you already have existing data? Can you incorporate it into your project?

Fortunately, the answer is yes.

1export const migrations = new Migrations<DataModel>(components.migrations);
2export const run = migrations.runner();
3
4export const backfillAggregatesMigration = migrations.define({
5  table: "leaderboard",
6  migrateOne: async (ctx, doc) => {
7    await aggregateByScore.insertIfDoesNotExist(ctx, doc);
8    await aggregateScoreByUser.insertIfDoesNotExist(ctx, doc);
9    console.log("backfilled", doc.name, doc.score);
10  },
11});
12

By pulling in the migrations component, we can create a migration that will iterate over every row in the leaderboard table and populate our aggregates in a safe way.

If you're interested, there are handy step-by-step instructions on the sequence you should follow in the docs.

How does it work?

Okay now if you're anything like me, at this point you're wondering how this wizardry is performed.

Well, I started to go deep on this topic and began putting together a whole demo that shows how the Btree evolves over time as you add and remove items.

But the hours started turning into days so I reluctantly decided to park this little demo for now.

If you would like to see me dive a bit deeper into this in another video then come find me on Discord or X :)

The TL;DR is that the aggregate component works by storing data in a highly efficient data structure called a Balanced Search Tree or BTree. This is the same type of data structure typically used in database indices like those in Postgres or MySQL.

By constructing a BTree and storing it in the Convex database, you can transform reads from an O(N) operation to an O(log(N)) operation.

We're essentially building our own custom index that we can control.

Final Thoughts

Alright, now I promised a spicy take at the start so here it is.

Convex chose not to include aggregates in their core offering and instead offloaded it into user-land or components like this one for some very good reasons that I elaborate on in my dedicated video on the topic.

By making users handle the aggregation of their data themselves rather than leaving it up to the whims of the almighty Query Planner, you end up with more predictable performance and costs.

On the other hand, it's admittedly a pain to manage simple counts and other basic aggregations yourself, whether manually via code or through the Aggregate component.

Unless you use triggers, it's very easy to forget to update the Aggregate component when you make a mutation to the underlying data, leaving your aggregate in a corrupted state that you then have to migrate your way out of.

Or worse, if you or one of your teammates changes a value in the dashboard as I showed previously, this leaves your aggregate in a corrupted state.

These kinds of easy-to-make mistakes are, in my opinion, antithetical to the typical ethos of "no footguns" found in the rest of Convex.

So I do wonder if there's something more Convex should do here, perhaps bringing some of these aggregations to the platform level, or maybe putting more work into platform-level triggers?

Well anyways I think we will have to leave that discussion for another day!

Build in minutes, scale forever.

Convex is the backend platform with everything you need to build your full-stack AI project. Cloud functions, a database, file storage, scheduling, workflow, vector search, and realtime updates fit together seamlessly.

Get started