
What is Sync?

In 2013, I was lucky enough to join the team at Dropbox. The energy was high, my teammates impressed me daily, and we worked on deeply interesting technical challenges together. But best of all, I learned a ton about the magic behind one of my favorite products of all time.

I was a very early Dropbox user (#864, to be precise) and a massive fan of the product. Dropbox took an activity that used to be an error-prone chore—copying files around to different computers and people—and made it completely disappear.

In founder Drew Houston’s original demo video on YouTube, the essential claim lands in the first ten seconds:

What makes Dropbox different is that it just works.

Over seven incredible years at Dropbox, my eventual Convex co-founders and I dove deep into how to make a massive distributed state management system “just work.”

“It Just Works” - Convex co-founder/CTO James Cowling in 2014, striving to keep the promises of Dropbox

We rebuilt how everything is stored on the backend and how everything is managed on users’ devices.

In the end, “just works” came down to a million tiny decisions adding up to one particular promise: sync.

The green checkmark Dropbox shows on a synced file has a power that’s hard to overstate. It basically says, “You can relax. Everything is where it needs to be. You can get back to working on your stuff.”

Sync allows the user to keep working on whatever is important to them. Hint: it’s not emailing files around with names like “Pitch (version 6).doc” or “Pitch (version 7).doc”.

Dropbox solved the sync problem completely, but only for one particular kind of data set—filesystems. So, in 2021, we took everything we learned from building sync at Dropbox and set out to build a platform that enabled developers to sync any kind of application, at any scale.

Now, after building Convex and watching other new sync systems develop, it’s time to propose a more solid definition of what sync means for application development: what it takes to be a sync platform.

Sync Platform Checklist

There are many ways to implement sync, but these are the essential invariants:

1. Integrated realtime state management

A sync platform needs to be tightly integrated with the application's state model. This ensures backend synchronization feels seamless and intuitive, enabling developers to concentrate primarily on building the app instead of adapting one state paradigm to the other.

When data changes on the backend, updates must be propagated to the application automatically and as soon as possible.

Counterexample: REST endpoints are not integrated or reactive. The results from a REST query are not automatically mapped to data stored in the app, and manual polling is required to keep them up to date.
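To make the contrast with polling concrete, here is a minimal sketch of push-based propagation in TypeScript. The `ReactiveStore` class and its `applyServerUpdate` method are hypothetical stand-ins for a sync platform's client-side state, not any real library's API. The point is that subscribers are notified when data changes instead of re-fetching on a timer.

```typescript
// Hypothetical stand-in for a sync platform's client-side state.
type Listener<T> = (value: T) => void;

class ReactiveStore<T> {
  private value: T;
  private listeners = new Set<Listener<T>>();

  constructor(initial: T) {
    this.value = initial;
  }

  get(): T {
    return this.value;
  }

  // Register a listener: it receives the current value immediately and
  // every later update. Returns a function that unsubscribes it.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    listener(this.value);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Invoked when the backend reports a change; every current subscriber
  // is notified, so the app never has to poll.
  applyServerUpdate(next: T): void {
    this.value = next;
    for (const listener of this.listeners) listener(next);
  }
}

// Usage: the "UI" simply records whatever the store pushes to it.
const todos = new ReactiveStore<string[]>([]);
const rendered: string[][] = [];
const unsubscribe = todos.subscribe((items) => rendered.push(items));
todos.applyServerUpdate(["buy milk"]);
todos.applyServerUpdate(["buy milk", "walk dog"]);
unsubscribe();
todos.applyServerUpdate(["not rendered: listener was removed"]);
// rendered now holds three snapshots: [], ["buy milk"], ["buy milk", "walk dog"]
```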

2. Network and transport abstraction

Networks and individual requests do fail periodically. Sync platforms handle retries internally, ensure that mutations on write paths happen exactly once, and maintain their own cursoring strategy on read paths.

Counterexample: POST requests made with fetch must handle network failures themselves and reason about the idempotency of the backend controller to know whether it is safe or appropriate to retry.
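One common way to get exactly-once writes on top of at-least-once retries is to tag each mutation with a per-client sequence number and have the server ignore duplicates. The sketch below assumes that scheme, a single mutation in flight per client, and an in-process simulated network that can lose acknowledgements. `Server`, `Client`, and `sendWithLostAcks` are hypothetical names, not part of any platform's API.

```typescript
// Sketch: exactly-once application of retried mutations (hypothetical names).
interface Mutation {
  clientId: string;
  seq: number; // increases by one for each new mutation from a client
  amount: number; // increment a shared counter by this amount
}

class Server {
  counter = 0;
  // Highest sequence number already applied, per client.
  private lastApplied = new Map<string, number>();

  // Applying the same mutation twice is a no-op, so retries are safe.
  apply(m: Mutation): void {
    const last = this.lastApplied.get(m.clientId) ?? 0;
    if (m.seq <= last) return; // duplicate delivery: already applied
    this.counter += m.amount;
    this.lastApplied.set(m.clientId, m.seq);
  }
}

// Simulated network: the request always arrives, but the response may be lost.
function sendWithLostAcks(server: Server, m: Mutation, dropAck: () => boolean): boolean {
  server.apply(m);
  return !dropAck(); // true if the client saw the acknowledgement
}

class Client {
  private clientId: string;
  private nextSeq = 1;

  constructor(clientId: string) {
    this.clientId = clientId;
  }

  // Retry until acknowledged. The sequence number stays fixed across
  // retries, which is what lets the server deduplicate.
  mutate(server: Server, amount: number, dropAck: () => boolean): number {
    const m: Mutation = { clientId: this.clientId, seq: this.nextSeq++, amount };
    let attempts = 0;
    do {
      attempts++;
    } while (!sendWithLostAcks(server, m, dropAck));
    return attempts;
  }
}

// Usage: lose the first two acknowledgements, then succeed.
const server = new Server();
const client = new Client("client-a");
let drops = 2;
const attempts = client.mutate(server, 5, () => drops-- > 0); // 3 attempts
const afterFirst = server.counter; // 5, not 15
client.mutate(server, 1, () => false);
```

The app code only calls `mutate`; deciding when a retry is safe lives entirely in the protocol, which is the property the checklist asks for.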

3. Robust conflict and consistency handling

State management systems that lose data are not very useful. Sync is more than just bidirectional streaming. Sync platforms must provide a conflict model that allows the reconciliation of asynchronous changes without data loss. There’s no one right way to solve this: ACID, CRDTs, blockchain, operational transform, and other proven merge strategies are all excellent.

In addition, the sync protocol should never undermine the database's level of atomicity. If two writes are committed atomically, the application should observe all affected reads updating atomically as well.

Counterexample: “Last write wins” loses unbounded amounts of data when clients are offline for extended periods of time and so isn’t useful in practice for building serious systems. Last write wins isn’t really a conflict management strategy; it’s the lack of one.
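The counterexample can be made concrete with a counter that two offline clients both increment. The sketch compares a last-write-wins register with a grow-only counter (G-Counter), one of the simplest CRDTs. The timestamps and replica names are invented for illustration.

```typescript
// Last write wins: keep whichever replica wrote most recently.
interface LwwRegister {
  value: number;
  timestamp: number;
}

function mergeLww(a: LwwRegister, b: LwwRegister): LwwRegister {
  return a.timestamp >= b.timestamp ? a : b;
}

// Grow-only counter (G-Counter CRDT): each replica tracks its own
// increments; merging takes the per-replica maximum; the value is the sum.
type GCounter = Record<string, number>;

function increment(c: GCounter, replica: string, by: number): GCounter {
  return { ...c, [replica]: (c[replica] ?? 0) + by };
}

function mergeGCounter(a: GCounter, b: GCounter): GCounter {
  const out: GCounter = { ...a };
  for (const [replica, n] of Object.entries(b)) {
    out[replica] = Math.max(out[replica] ?? 0, n);
  }
  return out;
}

function valueOf(c: GCounter): number {
  return Object.values(c).reduce((sum, n) => sum + n, 0);
}

// Both clients start from 10 likes, then go offline.
// Alice adds 3 at t=100; Bob adds 4 at t=200.
const lww = mergeLww({ value: 13, timestamp: 100 }, { value: 14, timestamp: 200 });
// lww.value is 14: Alice's +3 is silently lost.

const base: GCounter = { server: 10 };
const aliceC = increment(base, "alice", 3);
const bobC = increment(base, "bob", 4);
const merged = mergeGCounter(aliceC, bobC);
// valueOf(merged) is 17: both offline edits survive.
```

Because the G-Counter merge is commutative and idempotent, replicas converge to the same value no matter how often or in what order they exchange state.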

4. End-to-end cache management

Sync platforms should be sufficiently performant to use in practice. If they utilize caching, they should leverage the rich information they have about consistency, freshness, and their specific data model to do so automatically. If an application developer needs to add their own caching layer that can introduce data staleness or inconsistencies, or if they have to manually invalidate the data cached by the platform, then the sync platform is no longer inherently sound.

Counterexample: Next.js 14 edge caching is tremendously complex and requires the app developer to assist with cache invalidation strategies.

Pressure test with Dropbox

Let’s exercise this model by mapping these requirements to the original, domain-specific sync solution, Dropbox.

1. Integrated realtime state management

Dropbox’s state model is… the filesystem! Users simply manipulate files as usual. Server-provided updates to those files appear directly in your directories without any explicit action needed.

2. Network and transport abstraction

Dropbox uses the network when it’s there to propagate changes. When the network is not there, it just waits until it is.

3. Robust conflict and consistency handling

Dropbox is extremely careful never to overwrite file data, despite asynchronous and potentially offline clients. If concurrent changes are made to a file whose format cannot be merged automatically, a “conflicted copy” appears in your filesystem so you can resolve the conflict manually.

4. End-to-end cache management

The local filesystem is the cache, and Dropbox uses this plus a local database to transfer only new changes from the server. A file opens immediately with low latency since it is local. There is no need for the user to understand how this happens.

Now: application state sync platforms

With the base case out of the way, let’s test these requirements against a few of the new platforms designed for application developers to achieve sync semantics.

While the original Firebase had some gaps that disqualify it from being a sync platform—namely, that it didn’t support anything more sophisticated than last-write-wins for conflict management—the modern Google-powered Firestore is much improved and is now a solid sync platform.

1. Integrated realtime state management

Firestore provides libraries that integrate with application state management on mobile platforms and on the web, specifically Angular. Firestore also provides realtime capabilities to propagate changes through subscription streams as soon as possible.

2. Network and transport abstraction

Firestore’s transactions are optimistic, deterministic functions that are automatically retried upon transport failure. Subscriptions resume when the network breaks and reconnects.

Firestore supports offline persistence to allow disconnected operation for extended periods of network loss.
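The general technique behind optimistic transactions can be sketched as follows: record the version of every document read, attempt to commit, and re-run the whole function if any of those versions changed in the meantime. This is an in-memory illustration under our own assumptions (a single `Store`, integer versions, a hypothetical `runTransaction` helper); it is not Firestore's client API and should not be confused with Firestore's own `runTransaction`.

```typescript
// Sketch of optimistic concurrency control with automatic retry (hypothetical names).
interface Versioned {
  value: number;
  version: number;
}

class Store {
  private docs = new Map<string, Versioned>();

  read(key: string): Versioned {
    return this.docs.get(key) ?? { value: 0, version: 0 };
  }

  // Commit succeeds only if nothing the transaction read has changed.
  commit(readSet: Map<string, number>, writes: Map<string, number>): boolean {
    for (const [key, version] of readSet) {
      if (this.read(key).version !== version) return false; // conflict
    }
    for (const [key, value] of writes) {
      this.docs.set(key, { value, version: this.read(key).version + 1 });
    }
    return true;
  }
}

type Tx = { get(key: string): number; set(key: string, value: number): void };

// Re-run the body until it commits. The body should be deterministic,
// since it may execute several times.
function runTransaction(store: Store, body: (tx: Tx) => void, maxAttempts = 5): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const readSet = new Map<string, number>();
    const writes = new Map<string, number>();
    body({
      get(key) {
        if (writes.has(key)) return writes.get(key)!; // read your own writes
        const doc = store.read(key);
        if (!readSet.has(key)) readSet.set(key, doc.version);
        return doc.value;
      },
      set(key, value) {
        writes.set(key, value);
      },
    });
    if (store.commit(readSet, writes)) return attempt;
  }
  throw new Error("transaction failed after retries");
}

// Usage: another client commits a write while our transaction is running.
const store = new Store();
runTransaction(store, (tx) => tx.set("a", 100));
let interfered = false;
const attempts = runTransaction(store, (tx) => {
  const balance = tx.get("a");
  if (!interfered) {
    interfered = true; // test harness only: simulate a concurrent writer once
    runTransaction(store, (other) => other.set("a", other.get("a") + 50));
  }
  tx.set("a", balance - 30);
});
// attempts is 2: the first try saw a stale version and was re-run,
// so the final balance is 100 + 50 - 30 = 120.
```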

3. Robust conflict and consistency handling

Firestore supports multi-document transactions with serializable isolation—a solid building block for robust state management.

4. End-to-end cache management

The combination of realtime updates and offline persistence provides automatic caching for Firestore apps. Developers do not need to manually invalidate cached values.

Replicache has been around for a while. It’s a fantastic, simple way to bring sync to web apps written in JavaScript/TypeScript. The Replicache team has a new project coming soon, Zero. It seems even more ambitious than Replicache. They have a long history with sync technologies (noms/dolt, etc.), so they know their stuff.

1. Integrated realtime state management

Replicache models backend state as regular old JavaScript/TypeScript objects. You provide mutators (functions) that manipulate those objects. Replicache has a replicache-react package that exposes useSubscribe, a hook that nicely integrates streaming updates with React components.

2. Network and transport abstraction

Replicache’s design means the app developer never has to care about ephemeral network failures or even extended periods offline. The developer writes a little more code than with other frameworks to achieve this (via pull and poke), but the approach is very flexible and lets Replicache work with familiar tools and systems.

3. Robust conflict and consistency handling

Because the developer supplies mutators that can handle arbitrary conflicts, and the protocol preserves mutation ordering, Replicache provides Causal+ ordering semantics. Replicache’s mutator-sync strategy is flexible enough to propagate even atomic groups of changes to related objects consistently.

(Props to the Replicache team for urging users to fix Postgres' consistency level! Every developer should know more about how broken Postgres is out of the box…)
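The mutator-plus-ordering approach can be illustrated with client-side rebasing: the client keeps the last state confirmed by the server plus a queue of pending mutations, and whenever new server state arrives it drops the mutations the server has already applied and replays the rest. This is a simplified sketch loosely modeled on how Replicache describes its sync loop; `ClientState`, `onPull`, and the `increment` mutator are hypothetical and are not Replicache's API.

```typescript
// Sketch of rebasing pending mutations onto new server state (hypothetical names).
type State = Record<string, number>;
type Args = { key: string; by: number };
type Mutator = (state: State, args: Args) => State;

// Mutators are ordinary functions, so they can re-run against any state
// and resolve conflicts however the app sees fit.
const mutators: Record<string, Mutator> = {
  increment: (s, { key, by }) => ({ ...s, [key]: (s[key] ?? 0) + by }),
};

interface Pending {
  id: number;
  name: string;
  args: Args;
}

class ClientState {
  private snapshot: State = {}; // last state confirmed by the server
  private pending: Pending[] = []; // local mutations not yet confirmed
  private nextId = 1;

  mutate(name: string, args: Args): void {
    this.pending.push({ id: this.nextId++, name, args });
  }

  // Optimistic view: confirmed state plus pending mutations, in order.
  view(): State {
    return this.pending.reduce((s, m) => mutators[m.name](s, m.args), this.snapshot);
  }

  // A pull delivers new server state and the last mutation id the server
  // has applied from this client. Confirmed mutations are dropped; the
  // rest are replayed on top of the new snapshot by view().
  onPull(serverState: State, lastMutationId: number): void {
    this.snapshot = serverState;
    this.pending = this.pending.filter((m) => m.id > lastMutationId);
  }
}

// Usage
const client = new ClientState();
client.mutate("increment", { key: "likes", by: 1 }); // id 1
client.mutate("increment", { key: "likes", by: 1 }); // id 2
const before = client.view().likes; // 2
// The server applied mutation 1 and, meanwhile, another user's +10.
client.onPull({ likes: 11 }, 1);
const after = client.view().likes; // 12: mutation 2 replayed on the new snapshot
```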

4. End-to-end cache management

Replicache maintains a local database of object states and histories that acts as the cache. The developer does not need to do anything!

Convex is a new, full-featured sync platform similar in breadth and ambition to Firestore. It uses serializable, optimistic TypeScript functions on top of a relational database as the building blocks of sync.

1. Integrated realtime state management

Convex provides client libraries like convex/react (and equivalents for Vue, Svelte, Kotlin/Android, iOS/Swift, etc.) that plumb mutations and queries directly into the state management idioms of those frameworks. All types (app/server/database) are identical end-to-end and expressed in TypeScript, the language used most commonly by modern applications.

Convex pervasively tracks index read sets in its deterministic query/mutation functions to make every TypeScript query function realtime.
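Read-set tracking can be sketched with a store that hands each query a recording reader: the keys a query touches become its read set, and a write re-runs only the subscriptions whose read set contains that key. This simplification tracks individual keys rather than index ranges, and `ReactiveDb` is a hypothetical name; it shows the idea, not Convex's implementation.

```typescript
// Sketch of read-set tracking for reactive queries (hypothetical names).
type Reader = (key: string) => unknown;
type Query<R> = (read: Reader) => R;

interface Subscription {
  run: () => void;
  readSet: Set<string>;
}

class ReactiveDb {
  private data = new Map<string, unknown>();
  private subs = new Set<Subscription>();

  subscribe<R>(query: Query<R>, onResult: (r: R) => void): void {
    const sub: Subscription = {
      readSet: new Set(),
      run: () => {
        sub.readSet = new Set(); // recomputed on every run, since reads can vary
        const result = query((key) => {
          sub.readSet.add(key);
          return this.data.get(key);
        });
        onResult(result);
      },
    };
    this.subs.add(sub);
    sub.run();
  }

  write(key: string, value: unknown): void {
    this.data.set(key, value);
    // Only queries that actually read this key are re-run.
    for (const sub of this.subs) {
      if (sub.readSet.has(key)) sub.run();
    }
  }
}

// Usage
const db = new ReactiveDb();
db.write("user:1:name", "Ada");
const seen: string[] = [];
db.subscribe((read) => String(read("user:1:name")), (name) => seen.push(name));
db.write("user:2:name", "Grace"); // not in the read set: no re-run
db.write("user:1:name", "Ada Lovelace"); // in the read set: query re-runs
// seen is ["Ada", "Ada Lovelace"]
```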

2. Network and transport abstraction

Convex leverages the determinism in its mutations to enqueue and automatically replay mutations despite network failures—just like Firestore. In this way, Convex ensures each change is applied exactly once without any intervention from the app developer. Query updates resume streaming values when the network reconnects—cursor management is automatic and requires no user involvement. All state updates are serialized.

Convex provides applications with optimistic update support and integrates with systems like Replicache for offline sync and local-first use cases.

3. Robust conflict and consistency handling

Convex’s foundation provides a fully serializable consistency model. Developers can opt in to looser consistency models with weaker guarantees (but easier offline ergonomics), like CRDTs and operational transform, when the tradeoffs make sense.

Convex always delivers query updates to applications in consistent transactional windows. The React library, for example, re-renders all updated components in a single pass, so there is never any UI or local state inconsistency.
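The single-pass guarantee amounts to applying every write from one transaction to the local snapshot before notifying any listener. The sketch below assumes a hypothetical `LocalCache` that receives each server transaction as one batch of writes; it is not the `convex/react` implementation.

```typescript
// Sketch: atomic, single-pass delivery of a transaction's updates (hypothetical names).
type Snapshot = Readonly<Record<string, number>>;
type Listener = (s: Snapshot) => void;

class LocalCache {
  private snapshot: Snapshot = {};
  private listeners: Listener[] = [];

  onChange(l: Listener): void {
    this.listeners.push(l);
  }

  // One server transaction arrives as one batch of writes.
  applyTransaction(writes: Record<string, number>): void {
    // Build the complete next snapshot first...
    const next: Snapshot = { ...this.snapshot, ...writes };
    this.snapshot = next;
    // ...then notify every listener with that same snapshot.
    for (const l of this.listeners) l(next);
  }
}

// Usage: two "components" that each display a combined balance.
const cache = new LocalCache();
cache.applyTransaction({ alice: 100, bob: 0 });
const totals: number[] = [];
const total = (s: Snapshot) => (s.alice ?? 0) + (s.bob ?? 0);
cache.onChange((s) => totals.push(total(s)));
cache.onChange((s) => totals.push(total(s)));

// A transfer of 30 arrives as one atomic batch. Notifying after each
// individual key write would let a listener see alice=70, bob=0.
cache.applyTransaction({ alice: 70, bob: 30 });
// totals is [100, 100]: both listeners saw the same consistent snapshot.
```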

4. End-to-end cache management

Convex leverages its deterministic queries to ensure that every specific combination of (query code, parameters, and database read set) executes only once for all online clients. Automatic caching is provided in both server and client layers. Application developers do not have to participate in invalidation routines.
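Sharing one execution across clients can be sketched as a cache keyed by query name and arguments, where each entry also remembers the version of every key the query read. An entry is reused only while those versions are unchanged, so no manual invalidation is needed. The `QueryCache` class and its methods are hypothetical, and using the query's name as a stand-in for its code is a simplifying assumption.

```typescript
// Sketch of a shared, self-invalidating query cache (hypothetical names).
type QueryFn = (read: (key: string) => number, args: { id: string }) => number;

interface Entry {
  result: number;
  readVersions: Map<string, number>; // version of each key when the query ran
}

class QueryCache {
  private data = new Map<string, { value: number; version: number }>();
  private cache = new Map<string, Entry>();
  executions = 0;

  write(key: string, value: number): void {
    this.data.set(key, { value, version: this.version(key) + 1 });
  }

  private version(key: string): number {
    return this.data.get(key)?.version ?? 0;
  }

  run(name: string, fn: QueryFn, args: { id: string }): number {
    const cacheKey = `${name}:${JSON.stringify(args)}`;
    const hit = this.cache.get(cacheKey);
    // Valid only if nothing the query read has changed since it ran.
    if (hit && Array.from(hit.readVersions).every(([k, v]) => this.version(k) === v)) {
      return hit.result;
    }
    this.executions++;
    const readVersions = new Map<string, number>();
    const result = fn((key) => {
      readVersions.set(key, this.version(key));
      return this.data.get(key)?.value ?? 0;
    }, args);
    this.cache.set(cacheKey, { result, readVersions });
    return result;
  }
}

// Usage: 100 clients subscribe to the same query with the same arguments.
const server = new QueryCache();
server.write("score:a", 3);
const getScore: QueryFn = (read, { id }) => read(`score:${id}`);
const results: number[] = [];
for (let i = 0; i < 100; i++) results.push(server.run("getScore", getScore, { id: "a" }));
const executionsBefore = server.executions; // 1

server.write("score:a", 4); // changes a key in the read set
const fresh = server.run("getScore", getScore, { id: "a" }); // re-executes, returns 4
server.write("score:b", 9); // unrelated key: cached entry stays valid
server.run("getScore", getScore, { id: "a" });
```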

Broad and varied definitions of sync

These three examples are just scratching the surface. There are many other exciting platforms and libraries out there for developers who want to use sync patterns, like jazz.tools, powersync, yjs, automerge, and triplit.

Many of these make different design choices, with tradeoffs that make them excellent tools for some kinds of use cases and worse for others.

Here are a few of the common design variations:

Consistency, causality, and isolation

Some support extremely strong consistency levels, like serializability and ACID. These have the advantage of supporting any kind of application.

Others use weaker models like CRDTs or operational transformation. These suit fewer applications, but they require less code to use and are “natively offline.”

Sync engine only vs. full sync platform

Some projects are simply a sync engine: libraries that implement core sync algorithms for client and server. You integrate these in your existing HTTP endpoints, wrap your favorite database, etc. Sync engines are easy to adopt incrementally and drop into an existing project, and you can continue using a foundation of trusted tools.

Other projects are specialized platforms with custom WebSocket protocols, custom databases, authz/authn, file storage, asynchronous workflows, and search. These batteries-included platforms can let your team build and ship very quickly, but they come with more risk around immaturity and lock-in. Additionally, their novel architecture tends to make them best suited to greenfield projects.

Centralized vs. decentralized

Some systems are designed to fully reconcile peers in a mesh without a centralized server. This allows them to work in entirely distributed ways without any trusted mediating authority, converging on a common state over time as peers exchange mutations. This is pretty amazing (and sometimes necessary), but the tradeoff is applications using these systems are often more complex, less efficient, and can only provide loose consistency models.

Other platforms rely upon a centralized server to coordinate the network’s shared state. These are simpler protocols, but there is sometimes a divide between their expressiveness when online vs. offline, and they rely upon some durable, trusted entity to remain operating.

What about local-first?

In our view, “local-first” is more of a way of working than a specific technology, and sync systems happen to be well-designed to achieve it. Sync-centric designs more easily support offline caches of optimistic changes than traditional request/response client/server models.

Watch this space

Many sync systems and platforms are still developing and maturing. Developers’ understanding of which approaches suit which circumstances is also still forming, and the “best practices” have not yet been codified.

However, sync-esque solutions are one of the most promising avenues into a future where distributed state management is significantly less onerous for application developers. Fewer headaches, more focus on your differentiated product—and more fun!

My prediction: within ten years, most applications will be built on a variation of a sync platform rather than traditional server endpoints and state management.