An Object Sync Engine for Local-first Apps

In our previous post, we explored nine dimensions for categorizing sync engines across their data models, systems requirements, and programming model. We took an expansive view where everything from BitTorrent to Linear to Valorant fit within this framework. This post focuses on what we believe is a sweet spot for local-first web applications: the object sync engine.

Companies like Linear (with their sync engine), Figma (with LiveGraph), and Asana (with LunaDB) have all independently developed object sync engines for their applications. More recently, Replicache has shown that it’s possible to create a generic one that's flexible enough for many types of apps.

We're currently working on our own object sync engine at Convex, which will make the platform a great batteries-included option for building local-first apps. In this post, we’ll break down the sync engine's components, discuss the prior art with LunaDB¹, Linear, and Replicache, and outline what we’re building at Convex.

A common niche: The object sync engine

An object sync engine syncs an object graph between clients and a centralized server. Each object is a piece of application metadata, so it's small but has a lot of internal structure. For example, tasks in Linear have a title, description, status, pointer to a current project, optional pointer to an assignee, and so on.

The application stores this object graph in a local store, and the app's UI directly reads and writes to the local store as the user interacts with it. All three of the apps we studied authoritatively store the object graph in a centralized server, and the sync engine keeps the two in sync.

These two stores often differ, with the local schema diverging from the authoritative server schema. Developers implement each logical mutation, or change to the data model, twice: once for the authoritative server change and once for an optimistic update to the local store. After the server applies its changes, it syncs them down to the client in the same format as the local schema.

Let's start by analyzing this niche across the nine dimensions from “A Map of Sync.”

Dimension	Choice	Rationale
Size	~100MB	Interactive UIs need to do local data reads and writes at “memory-speed.” 100MB is a reasonable amount of memory to use for a rich web app, and supporting an order of magnitude higher would require spilling from memory to secondary storage.
Update rate	1Hz	Rich object graphs often don’t need to change at interactive (e.g. 60Hz) update rates. Figma, for example, uses LiveGraph for these slower updates and Multiplayer for the higher update rate parts of their documents.
Structure	High	General purpose frameworks need to support rich object graphs with sophisticated cross-object relationships.
Input latency	500ms	Collaborative apps are, by definition, not competitive, so 500ms of input latency is often good enough.
Offline support	High	The local-first ideal of making the network optional makes it easy to make apps with great UX. However, general frameworks, like Asana's LunaDB, provide escape hatches for operations like full text search that can only execute on the server.
Concurrent clients	Unlimited	It should be possible (and affordable) for an app to become successful and have many clients interacting with the data model at any point in time.
Centralization	Server-authority	We believe that it’s much easier to build apps with great UX and DX with centralized infrastructure. In our opinion, anything less than serializability is way too hard of a programming model for most apps, and serializability often implies some degree of centralization.
Flexibility	High	Application logic is often sophisticated in this category, so the sync engine can't assume too much about the products built on top.
Consistency	High	Similarly, frameworks can’t assume too much about what types of anomalies a product can tolerate, so strong consistency guarantees simplify app development.

The three main pieces

Every sync engine with a centralized server has three main pieces: a local store, a server store, and a sync protocol for connecting the two. We'll look at how Replicache and LunaDB implement each of these pieces and then talk about what we're working on at Convex.

Local store

One of the key ideas of local-first is “No spinners”: apps should be able to read and write to their data without ever blocking on the network. Put another way, “your data next frame or your money back.” Since data fetching in our app can’t ask the server, it has to ask some local data store.

In addition to “No spinners,” local-first apps should work offline and not require the network. In practice, offline support implies persistence, where a user can close the app and reopen it, and the app’s data will still be present. On the web, IndexedDB is a common choice for persistence, but there’s also some exciting recent work for utilizing the newer OPFS APIs. Then, since persistent data lasts across user sessions, local stores need to handle versioning, where data is written by one code version and read by another.
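
To make the versioning problem concrete, here is a minimal sketch of one common approach: every persisted record carries a schema version, and the current code upgrades older records when it reads them. The `TaskV1` and `TaskV2` shapes and the `migrateTask` helper are hypothetical and are not taken from any of the systems discussed in this post.

```typescript
// Hypothetical persisted shapes: an older release stored a boolean `done`,
// the current release stores a `status` string instead.
interface TaskV1 {
  schemaVersion: 1;
  title: string;
  done: boolean;
}

interface TaskV2 {
  schemaVersion: 2;
  title: string;
  status: "todo" | "done";
}

type StoredTask = TaskV1 | TaskV2;

// Upgrade whatever an older code version wrote into the shape that the
// current code version expects to read.
function migrateTask(stored: StoredTask): TaskV2 {
  if (stored.schemaVersion === 1) {
    return {
      schemaVersion: 2,
      title: stored.title,
      status: stored.done ? "done" : "todo",
    };
  }
  return stored;
}
```

A real local store would also need to decide whether to write the upgraded record back, which matters when an older tab is still open against the same data.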

Local stores often need to handle concurrency, where multiple threads within the local application want to read and write to the local store at the same time. Web apps have to handle concurrency too: multiple tabs on the same origin can share access to the same IndexedDB and OPFS instances.

Prior art

Both LunaDB and Replicache use IndexedDB for their local store. Replicache uses BroadcastChannel for coordinating updates across multiple tabs for the same origin, and they have a lightweight multiversioning scheme for handling tabs that are on different versions of the app.

Convex

We’ll also use IndexedDB for our first release. We eventually want our sync engine to work across Web, mobile, and desktop, so we’ll probably switch to some form of SQLite in the future.

Server store

Centralized sync apps store the authoritative copy of their data in a server store. Apps often do this for a few reasons:

Ease of use: It’s often a lot simpler to maintain data invariants on a single server at a single version (or a fleet of servers at mostly the same version). Reasoning about large sets of clients on different versions where the developer can’t force them to update is hard.
Durability: We’ve mostly figured out how to store data in centralized data centers and never² lose it. If we store data in one of these systems, we can tell users that their data is safe, even if they lose their local device.
Storage: While consumer devices’ storage has gotten bigger and faster over time, datacenter computers have virtually unlimited spinning disk storage and cheap, fast, and reliable SSD storage. For example, just the task data for large Asana deployments can exceed clients’ local storage capacity.
Compute: Data center computers can also run compute that developers may not want to run on consumer devices. This could be just for battery consumption, but jobs like search indexing and AI workloads often require specialized hardware.
Networking: Unfortunately, the public Internet isn’t all that reliable or fast. Low-latency apps often use a centrally managed network, only using the public Internet for the last mile to their users.

In addition to storing data, most sync applications implement some form of notifications to push updates to interested clients.

Prior art: LunaDB, Linear

LunaDB uses a custom application server and stores its data in MySQL. A custom invalidator service tails the replication log and streams invalidations to the relevant sync servers to then broadcast to clients.

https://asana.com/inside-asana/worldstore-distributed-caching-reactivity-part-1

Linear has a similar architecture where the client queries a GraphQL server and keeps its local store up-to-date with a separate sync server that tails the Postgres replication log.
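
The routing step in this architecture can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not Asana's or Linear's implementation: the `LogEntry` shape, the `Invalidator` class, and the sync server names are all hypothetical.

```typescript
// One change read from the database's replication log (hypothetical shape).
interface LogEntry {
  table: string;
  id: string;
}

class Invalidator {
  // Object key -> names of the sync servers with a client watching that object.
  private watchers = new Map<string, Set<string>>();

  // A sync server registers interest on behalf of its connected clients.
  watch(syncServer: string, table: string, id: string): void {
    const key = `${table}/${id}`;
    const servers = this.watchers.get(key) ?? new Set<string>();
    servers.add(syncServer);
    this.watchers.set(key, servers);
  }

  // Called for each tailed log entry. Returns the sync servers that should
  // receive an invalidation for the changed object.
  route(entry: LogEntry): string[] {
    const servers = this.watchers.get(`${entry.table}/${entry.id}`);
    return servers ? Array.from(servers).sort() : [];
  }
}
```

Because the invalidator only reads the log, the application server's write path does not need to know which clients are interested in a row.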

Prior art: Replicache

Replicache is backend-agnostic, so you bring your own server store. Their todo-nextjs demo runs the backend APIs on Vercel, stores the data in Supabase, and uses Supabase Realtime for pushing updates to clients. Under the hood, Supabase stores its data in Postgres and also tails Postgres’s logical replication log for realtime.

Convex

Unlike Postgres, Convex was designed from the beginning as a reactive database, so it has first-class support for executing a query and efficiently subscribing to its changes. Reads and writes to the database are specified as JavaScript functions that execute as serializable transactions.

https://stack.convex.dev/how-convex-works

Sync protocol

Sync platforms glue clients’ local stores and the server store together with a sync protocol. The protocol has a few responsibilities:

Initial sync: How does the client efficiently download the sync protocol’s data set on an initial page load?
Incremental sync: How does the client efficiently keep its local store up-to-date as the server store changes? How does the client efficiently catch back up after it’s been offline for a long period of time?
Mutations: How do changes to the local store propagate to the server? How do clients know when to apply and roll back optimistic updates?

Prior art: LunaDB

LunaDB’s sync engine is based off DDP, Meteor’s sync protocol. The client registers interest in “subscriptions,” where each subscription is associated with a query that returns a set of objects. The server deduplicates object updates across subscriptions and pushes updates to clients over a WebSocket.
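
Deduplicating across subscriptions can be done with per-client reference counts, as in the sketch below. This is an illustration of the general idea and not LunaDB's or Meteor's code: the `ClientSession` class is hypothetical, and the `added` and `removed` message names are borrowed loosely from DDP.

```typescript
type Message = { type: "added" | "removed"; id: string };

// Tracks what one connected client already holds, so an object that is
// matched by several subscriptions is only sent to that client once.
class ClientSession {
  private refCounts = new Map<string, number>();
  private subscriptions = new Map<string, string[]>();

  // `objectIds` is the (duplicate-free) result of the subscription's query.
  subscribe(subscriptionId: string, objectIds: string[]): Message[] {
    this.subscriptions.set(subscriptionId, objectIds);
    const messages: Message[] = [];
    for (const id of objectIds) {
      const count = this.refCounts.get(id) ?? 0;
      if (count === 0) messages.push({ type: "added", id });
      this.refCounts.set(id, count + 1);
    }
    return messages;
  }

  unsubscribe(subscriptionId: string): Message[] {
    const objectIds = this.subscriptions.get(subscriptionId) ?? [];
    this.subscriptions.delete(subscriptionId);
    const messages: Message[] = [];
    for (const id of objectIds) {
      const count = (this.refCounts.get(id) ?? 0) - 1;
      if (count <= 0) {
        // No remaining subscription references this object.
        this.refCounts.delete(id);
        messages.push({ type: "removed", id });
      } else {
        this.refCounts.set(id, count);
      }
    }
    return messages;
  }
}
```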

Prior art: Replicache

Replicache’s sync protocol is just three methods:

/pull: Download data from the server store, optionally passing in a cookie to only download changes since a previous request. The server also informs the client which mutations have been applied on the server for each client, so the client knows which of its optimistic updates have been confirmed and no longer need to be replayed locally.
/push: Submit a batch of client mutations to the server.
/poke: Poke the client to signal a server-side change.
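
The interplay between push, pull, and the cookie can be sketched with an in-memory server. This is a simplified illustration and not Replicache's wire format: the `SketchServer` class, the `Mutation` shape, and the field names are assumptions, and the poke is left out because it carries no data.

```typescript
interface Mutation {
  clientID: string;
  id: number; // increases by one per mutation on each client
  key: string;
  value: number;
}

interface PatchEntry {
  key: string;
  value: number;
}

interface PullResponse {
  cookie: number; // server version the client is now caught up to
  lastMutationIDs: Record<string, number>; // last applied mutation per client
  patch: PatchEntry[]; // rows changed since the request's cookie
}

class SketchServer {
  private version = 0;
  private rows = new Map<string, { value: number; version: number }>();
  private lastMutationIDs = new Map<string, number>();

  push(mutations: Mutation[]): void {
    for (const m of mutations) {
      const last = this.lastMutationIDs.get(m.clientID) ?? 0;
      if (m.id <= last) continue; // already applied; pushes may be retried
      this.version += 1;
      this.rows.set(m.key, { value: m.value, version: this.version });
      this.lastMutationIDs.set(m.clientID, m.id);
    }
  }

  pull(cookie: number | null): PullResponse {
    const since = cookie ?? 0;
    const patch: PatchEntry[] = [];
    this.rows.forEach((row, key) => {
      if (row.version > since) patch.push({ key, value: row.value });
    });
    const lastMutationIDs: Record<string, number> = {};
    this.lastMutationIDs.forEach((id, clientID) => {
      lastMutationIDs[clientID] = id;
    });
    return { cookie: this.version, lastMutationIDs, patch };
  }
}
```

A client would call `pull(null)` on first load, store the returned cookie, and pass it back on the next pull so that only newer rows come down.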

Convex

Convex’s WebSocket protocol syncs a set of queries and coordinates sending mutations from the client to the server. We’ve talked about it briefly in How Convex Works.

We have a few protocol improvements teed up for improving offline sync:

Finer-grained incremental updates: Queries currently fully reexecute their JavaScript when someone writes to a row they read. We want to add more opportunities for incrementally updating the query result without rerunning JavaScript, and for pushing smaller deltas to the client.
Client ID allocation: Clients should be able to allocate IDs for new documents without having to ask the server.
Subscription resumption: The client should be able to efficiently resume a subscription after it goes offline and reconnects.
Query chaining: Clients currently experience a waterfall when they have one query that’s dependent on another. We'll eventually add a way for one query to be "chained" to another, but for now we recommend developers preload their queries with SSR.
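
As an illustration of the client ID allocation item above, one common way to mint IDs without a server round trip is to combine a unique client identifier with a local counter. This sketch is not Convex's design: the `IdAllocator` class and the ID format are hypothetical, and it assumes the client identifier is already globally unique (for example, assigned once by the server or generated randomly).

```typescript
class IdAllocator {
  private readonly clientId: string;
  private next = 0;

  constructor(clientId: string) {
    this.clientId = clientId;
  }

  // Returns an ID that no other client can produce, as long as
  // `clientId` is unique across clients.
  allocate(table: string): string {
    this.next += 1;
    return `${table}:${this.clientId}:${this.next}`;
  }
}
```

An ID allocated this way can be used in an optimistic insert immediately and sent to the server as an argument of the mutation, so the local and authoritative rows share the same ID.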

Programming the pieces

Apps all have different requirements, so frameworks give developers opportunities for programmability. Each of the three pieces of a sync engine needs to be programmable, and different frameworks provide different levels of flexibility.

Local schema

Developers “program” the sync protocol by specifying a local schema: What is the shape of the data synced by the protocol, and what are the interfaces for reading and writing it?

All of the systems we studied for this design allow the local schema to diverge from the server’s schema. Most apps don’t want to sync the entire server database to the client. It’s typically too much state, requires error-prone access controls for protection, and leaks implementation details that might not be relevant to the client.

Prior art: LunaDB

LunaDB uses a query language that’s similar to GraphQL, so developers “program” the protocol by specifying a GraphQL schema.

query TaskList($projectId: ID) {
  project(id: $projectId) {
    name
    numTasksDone
    totalTasks
    tasks {
      name
      status
      assignee {
        name
        profilePictureUrl
      }
    }
  }
}

If the client and server agree on this file, they know they’re referring to the same types for the data and the queries to access it.

Prior art: Replicache

Replicache doesn’t have direct support for embedding an interface description into the protocol, but it does encourage achieving similar results through code sharing. Their TODO example app uses shared Zod validators for this purpose:

import { z } from 'zod';

export const todoSchema = z.object({
  id: z.string(),
  listID: z.string(),
  text: z.string(),
  completed: z.boolean(),
  sort: z.number(),
});

Convex

Convex's approach is somewhere between Replicache and LunaDB: Developers specify their schema in code, and the framework consumes the local schema for runtime validation.

const localSchema = defineLocalSchema({
  users: defineLocalTable({ _id: v.id("users"), name: v.string() }),
  friendships: defineLocalTable({
    _id: v.id("friendships"),
    from: v.id("users"),
    to: v.id("users"),
  })
    .index("by_from_to", ["from", "to"]),
});

We've designed this local schema syntax to feel familiar to developers who are used to Convex's server-side schema.

Local store APIs

On the client side, UI components need an interface for reading and writing to the local store. More powerful sync engines, like Replicache, provide programmability through transactions, where developers can inject code “into” the local database for expressing their application’s semantics.

Prior art: Replicache

All reads to the local store go through subscriptions, which specify a read transaction through an async callback.

rep.subscribe(
  // Read transaction: runs against the local store.
  async (tx) => (await tx.get("count")) ?? 0,
  // Called whenever the read transaction's result changes.
  (count) => {
    console.log("onData", count);
    button.textContent = `Clicked ${count} times`;
  },
);

The developer specifies all mutations upfront when instantiating the Replicache client class. Each mutation is an async callback that the sync engine executes transactionally.

const rep = new Replicache({
  ...
  mutators: {
    increment: async (tx, delta) => {
      const prev = await tx.get("count");
      const next = (prev ?? 0) + delta;
      await tx.set("count", next);
      return next;
    },
  },
});

Prior art: LunaDB

Most object sync engines assume that the app loads the entirety of its dataset on initial page load, so all local queries can be served from the local store and don't need to block on the network. Replicache, configured with most of its backend strategies, takes this approach.

Linear's sync engine and Zero, Replicache's successor, support loading a dynamic data set, where local queries may access data that's not in the local store. In that case, a local query may block on the network as the framework fetches its data from the server.

Going back to blocking on the server isn't a great user experience, so these frameworks offer APIs to "preload" a query into the local store on initial page load. Then, if this preloaded query covers most UI components' queries, the app will mostly not need to hit the server.

Preloading exactly the right amount of data is hard, since it's a global property based on all of the possible UI views within the app. Preloading too little causes spinners and waterfalls, while preloading too much wastes storage and network bandwidth.

LunaDB solves this problem by using GraphQL. Instead of specifying local queries with async callbacks (which are opaque to the sync engine), UI components specify their data dependencies using GraphQL fragments. This approach is similar to Facebook's library Relay:

const TaskFragment = graphql`
  fragment TaskFragment on Task {
    name
    status
  }
`;

The framework can then assemble all UI components' fragments into a single GraphQL query and preload exactly the required data in a single network roundtrip.
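
The assembly step can be pictured as a union of the fields that each component asks for, grouped by type. The sketch below is a simplified illustration and not LunaDB's or Relay's implementation: the `Fragment` shape and the `mergeFragments` helper are hypothetical, and nested selections are ignored.

```typescript
// A component's declared data dependency (hypothetical, flattened shape).
interface Fragment {
  on: string; // the type the fragment applies to, e.g. "Task"
  fields: string[];
}

// Union the fields requested by every component, per type, so the
// framework can fetch them all in one request.
function mergeFragments(fragments: Fragment[]): Record<string, string[]> {
  const merged: Record<string, string[]> = {};
  for (const fragment of fragments) {
    const fields = merged[fragment.on] ?? [];
    for (const field of fragment.fields) {
      if (fields.indexOf(field) === -1) fields.push(field);
    }
    merged[fragment.on] = fields;
  }
  return merged;
}
```

Because the fragments are declarative data rather than opaque callbacks, the framework can compute this union before any component renders.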

Mutations in LunaDB combine an authoritative server endpoint with an optimistic update callback.

const changeTaskStatus = (datastore, taskId: Id<"tasks">, status: Status) => {
  datastore.requestServerChange({
    path: "/tasks/change_status", // server endpoint name
    params: { id: taskId }
  }, () => {
    // Update the "status" field in the local store. The framework will
    // update all GraphQL queries that access this task and roll back
    // the update when the server mutation is fully applied.
    datastore.setProperty(taskId, "status", status);

    // Optimistic updates can also modify "server computed values"
    // that aren't directly computed from the local store.
    // (`projectId` is assumed to be in scope here.)
    if (status === "done") {
      datastore.updateServerComputedValue(projectId, "numTasksDone",
        (n) => n + 1,
      );
    }
  });
};

Convex

Convex’s local queries look just like queries on the server. Instead of using ctx.db to access your server-side tables, you can use ctx.localDb on the client to access your local tables.

const isFriend = useLocalQuery(async (ctx) => {
  const friendship = await ctx.localDb.query("friendships")
    .withIndex("by_from_to", q => q.eq("from", user).eq("to", friend))
    .unique();
  return friendship !== null;
});

Mutations are similar to our existing optimistic updates but modify the local store rather than the query set. When declaring a mutation, the developer points to a server-side mutation for making the changes authoritatively and optionally specifies a local update to happen optimistically.

const addFriend = localMutation({
  // `api.friends.addFriend` is an equivalent function on the server
  // that may do more work for checking access control, etc.
  server: api.friends.addFriend,
  // This callback's writes to the local store are rolled back once
  // the authoritative changes from `api.friends.addFriend` have
  // been synced down.
  local: async (ctx, args) => {
    await ctx.localDb.insert("friendships", {
      _id: args._id,
      from: args.from,
      to: args.to,
    });
  },
});

In the language of “Architectures for Central Server Collaboration,” Convex sends mutations from the client to the server that are CRDT-ish³. Then, the server sends state changes down to the client and uses Server Reconciliation for applying optimistic updates.
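
Server reconciliation can be sketched as follows: the client keeps the last authoritative state plus a queue of unconfirmed mutations, and the UI reads the authoritative state with that queue replayed on top. This is a general illustration rather than Convex's implementation: the `ReconcilingStore` class and its method names are hypothetical.

```typescript
type State = Record<string, string>; // e.g. task id -> status

interface PendingMutation {
  id: number; // increases with each mutation issued by this client
  apply: (state: State) => State; // the optimistic update
}

class ReconcilingStore {
  private server: State = {};
  private pending: PendingMutation[] = [];

  // Record a local mutation. It would also be sent to the server.
  mutate(mutation: PendingMutation): void {
    this.pending.push(mutation);
  }

  // The server sends down new authoritative state, along with the ID of the
  // last mutation from this client that is reflected in that state.
  receive(server: State, lastAppliedId: number): void {
    this.server = server;
    this.pending = this.pending.filter((m) => m.id > lastAppliedId);
  }

  // What the UI reads: authoritative state with unconfirmed mutations replayed.
  view(): State {
    return this.pending.reduce((state, m) => m.apply(state), { ...this.server });
  }

  pendingCount(): number {
    return this.pending.length;
  }
}
```

Replaying mutations against whatever state the server last sent is why each operation needs to make sense as a self-contained unit: it may run against a different view than the one the user saw when they triggered it.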

Server store APIs

We’ve specified the local store’s schema and the APIs for reading and writing to the local store, and the final step is to specify how reads and writes to the local store correspond to reads and writes to the server store.

Prior art: Replicache

Replicache gives developers a few “Backend Strategies” for implementing their sync protocol on the backend. In their most sophisticated strategy, the backend provides a version number per object synced and can efficiently query which revisions have changed after a given cookie.

Convex

Server mutations in Convex are simple: they’re just regular Convex mutations. The developer can implement these differently than their local store’s optimistic updates: the server mutations may perform more authorization checks, for example.

The read side, however, is more interesting. We want sync tables to be highly programmable, where a table in the local schema may not directly correspond to a table stored on the server:

Developers may want to strip out fields from the underlying database rows or transform them in some arbitrary way with code.
They may need to enforce authorization rules, and ideally these rules are written in plain code.
A single row in a sync table may actually be the result of a join between two tables on the server or a union of multiple tables.

We achieve this by letting developers specify sync tables with two APIs: how does the framework fetch a single row from the local table, and how does the framework query a range on one of the local table's indexes?

// This code runs on the backend and tells the framework how to turn
// local queries against the local schema into server queries.
export default localSchemaQueries(localSchema, {
  users: {
    get: async (ctx, _id) => {
      // Get a single user, enforcing access control and perhaps
      // joining in data from other tables.
      const row = await ctx.db.get(_id);
      ...
      return row;
    },
  },
  friendships: {
    get: async (ctx, _id) => { ... },
    indexes: {
      by_from_to: async function* (ctx, args, cursor, direction) {
        // Read a page of friendships from the underlying
        // server table.
        const q = ctx.db.query("friendships")
          .withIndex(..);
        for await (const row of q) {
          yield row._id;
        }
      },
    },
  },
});

With this design, developers can join their local schema to their server tables with arbitrary code, specifying authorization rules in JavaScript and joining together and filtering the underlying server tables.
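
One way to picture how the two APIs fit together: the index function decides which row IDs fall in a range, and the `get` function decides what each of those rows looks like to a particular viewer. The sketch below is a synchronous, in-memory illustration under that assumption, not the framework's implementation. The table contents, the `internalScore` field, and the `getFriendship`, `byFrom`, and `syncRange` helpers are all hypothetical.

```typescript
interface ServerFriendship {
  _id: string;
  from: string;
  to: string;
  internalScore: number; // server-only field that should never be synced
}

interface LocalFriendship {
  _id: string;
  from: string;
  to: string;
}

const serverTable: ServerFriendship[] = [
  { _id: "f1", from: "alice", to: "bob", internalScore: 0.9 },
  { _id: "f2", from: "alice", to: "carol", internalScore: 0.4 },
  { _id: "f3", from: "dave", to: "erin", internalScore: 0.7 },
];

// "get": fetch one row, enforce access control, and strip server-only fields.
function getFriendship(viewer: string, id: string): LocalFriendship | null {
  const row = serverTable.find((r) => r._id === id);
  if (!row || (row.from !== viewer && row.to !== viewer)) return null;
  return { _id: row._id, from: row.from, to: row.to };
}

// Index range: return the IDs of matching rows in index order.
function byFrom(from: string): string[] {
  return serverTable
    .filter((r) => r.from === from)
    .sort((a, b) => (a.to < b.to ? -1 : a.to > b.to ? 1 : 0))
    .map((r) => r._id);
}

// The framework's side: walk the index, then resolve each ID through `get`.
function syncRange(viewer: string, from: string): LocalFriendship[] {
  const rows: LocalFriendship[] = [];
  for (const id of byFrom(from)) {
    const row = getFriendship(viewer, id);
    if (row !== null) rows.push(row);
  }
  return rows;
}
```

Routing every row through `get` keeps the authorization and field-stripping logic in one place, no matter which index found the row.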

Wrapping up

Convex's object sync engine aims to let you fully program the local store, protocol, and remote store. You can get the best of both worlds: great local-first UX and server programmability. We're hard at work building it out, and we hope to have it in your hands soon! Let us know in Discord or email us if you’re interested in beta testing it.

Footnotes

¹ We have deep experience with LunaDB since Convex engineer Sarah Shader worked on it when she was previously at Asana.
² AWS S3 provides 99.999999999% durability, which means it'll pretty much never lose your data.
³ Each operation must make sense as a self-contained unit, and it should be resilient to executing against a different view of the local or server store.