Ian Macartney

Intro to Migrations

Make changes without taking chances

There are as many ways to migrate data as there are databases, but here’s some basic information to set the stage.

What is a migration?

Migrations are the process of changing the shape of data in your database. They typically consist of schema changes and data changes.

The schema changes capture the structure of your tables, i.e. the field names and types. With most databases, you have to explicitly tell the database which fields (columns) to add or change. With Convex, you don’t need to describe what changed; you simply edit schema.ts into your desired shape.
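For illustration, here’s a minimal sketch of what a schema change can look like in a hypothetical schema.ts (the users table and its fields are invented for this example):

```ts
// convex/schema.ts: a hypothetical schema; table and field names are examples.
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  users: defineTable({
    name: v.string(),
    email: v.string(),
    // Adding this line is the entire "schema change" in Convex.
    // The field is optional, so existing documents remain valid.
    nickname: v.optional(v.string()),
  }),
});
```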

The data changes transform your existing data to match the desired schema. If you’re adding a new optional field, it will be unset by default. However, if you are changing a type or removing a field, you’ll need to change the data to match your schema. In Convex, these take the form of a mutation script, like the sketch below. See this post for tips.
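As a minimal sketch, assuming the hypothetical users table above, such a mutation might look like the following. Loading every document at once is only reasonable for small tables; batching is covered later.

```ts
// convex/migrations.ts: a minimal data-change sketch for the schema above.
import { internalMutation } from "./_generated/server";

export const backfillNicknames = internalMutation({
  handler: async (ctx) => {
    // Fine for small tables; larger tables should be migrated in batches.
    const users = await ctx.db.query("users").collect();
    for (const user of users) {
      if (user.nickname === undefined) {
        await ctx.db.patch(user._id, { nickname: user.name });
      }
    }
  },
});
```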

When does a migration run?

Offline migrations

For many businesses, especially those using SQL-based databases, the easiest option is to stop serving traffic, run the migration, and then start the code that references the new schema. This explains banners you see on websites alerting you to upcoming scheduled downtime. This is called an “offline” migration since the application is not serving traffic during the migration. With some advanced techniques, you can minimize the downtime. However, it still has some risks:

  • If your new code has bugs, it is difficult to roll back to the previous database schema without losing data. Since your old code doesn’t know about your new fields, you are left scrambling to patch your new code instead.
  • If your migration fails partway through, you have to resolve the migration before the new code can be deployed. Deploying code that changes the database schema becomes a high-stress operation.

To do this with Convex, you could:

  1. Define a mutation that changes a batch of your data.
  2. Configure your code to stop serving requests during the migration. This could happen via a code deployment, or by flipping a switch as discussed in a previous post on feature gating.
  3. As quickly as possible, stop serving requests, run the migration, then deploy the code that references the new schema. See here for how to migrate batches of data using Convex mutations.
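For step (1), a batched mutation might look like this sketch, which uses Convex’s cursor-based pagination so each transaction stays small (the table and field are still the hypothetical ones from above):

```ts
// A sketch of a batched migration: call it repeatedly, feeding each
// returned cursor back in, until `isDone` is true.
import { internalMutation } from "./_generated/server";
import { v } from "convex/values";

export const migrateBatch = internalMutation({
  args: { cursor: v.union(v.string(), v.null()), numItems: v.number() },
  handler: async (ctx, { cursor, numItems }) => {
    const { page, isDone, continueCursor } = await ctx.db
      .query("users")
      .paginate({ cursor, numItems });
    for (const user of page) {
      if (user.nickname === undefined) {
        await ctx.db.patch(user._id, { nickname: user.name });
      }
    }
    return { isDone, cursor: continueCursor };
  },
});
```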

Online migrations

“Online” migrations, by comparison, allow the database to continue serving requests while you asynchronously update the data. This is preferable because it avoids downtime, but it adds complexity for the developer.

  • You may query documents where only some of them have been migrated, so you have to write code that also handles the old schema in the interim.
  • For SQL migrations using deferred index creation, you can’t deploy code using newly-defined indexes until they have been created, requiring two deploys instead of just one.
  • Any rules or constraints you define may become more difficult to enforce and reason about in the interim. See this post for an example of subtle gotchas while trying to implement a seemingly simple rule in Postgres.

While this seems daunting, this becomes a requirement for large systems where downtime is not acceptable. In fact, once a database gets large enough, many migrations can only be run asynchronously. I worked for a company with over 100M users where a deployment got stuck because a migration that had been accidentally configured to run synchronously kept timing out in a loop. Because of the company policy of never rolling back migrations, we ended up incurring downtime, increasing the timeout, and failing a lot of traffic so the migration could complete. While synchronous/offline/all-at-once migrations are convenient when you’re starting a project, I’d encourage you to use online migrations whenever possible.

So how do we go about them safely?

Best practices for changing data asynchronously

To manage these challenges, there are some best practices and patterns you can adopt to make migrations easier. Thankfully, when schema validation is enabled, Convex guides you to do the right thing.

  1. Create new columns or fields, rather than changing the type of existing ones. If you rename a column or change its type, you have to change all of your data in the window between when your old code can safely run and when your new code can.

    Convex will not let you change a type to something that doesn’t conform to the data in production. Instead, you’ll make the (safe) transition of changing the schema to a v.union of the old and new types. Once all the data is the new type, you’ll be able to narrow the schema to only the new type (see the sketch after this list). If you’re using TypeScript, this helps you write code that supports both formats during the migration period, as the generated types will automatically be a union.

  2. Don’t delete data. Unless you’re really strapped for space, don’t risk losing critical data by dropping columns (or fields, in document databases). It sounds obvious, but it’s all too tempting to drop deprecated columns. Instead, mark the column as deprecated in code until the data has been unreferenced for a while and you’ve ensured the data is redundant.

    Convex will not let you remove a field from the schema if that field still has data in the database. You can mark a column as deprecated by using v.optional to allow the field to be unset, and add a comment in the schema declaration so other developers understand why the field existed and know not to use it. If you do want to delete the data, you can make the field optional, run a migration that unsets it on every document, then remove the field from the schema.

  3. While the migration is happening, handle both the new and old data formats. Deciding whether to read or write the old or new format is discussed below.

    When using TypeScript with Convex, the types always match the data in the database: they are generated from the schema, and a schema that doesn’t conform to the existing data is rejected during deployment.

  4. When possible, push changes to the schema separately from changes to the code. By pushing a change to allow an optional new field in the schema before adding code to write or rely on the new field, you will be able to roll back or revert the new code in case of a bug, knowing your old code and schema accommodate the optional field.

  5. Once the migration is done, clean up the code to only reference the new format. Keep in mind, the migration isn’t done until there are no more readers or writers of the old format and every row has been mutated.
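To make practices 1 through 3 concrete, here’s a hedged sketch (every table and field name is invented): a reviews table whose rating field is mid-migration from a bare number to an object, alongside a deprecated field kept around per practice 2.

```ts
// convex/schema.ts: a sketch of a table mid-migration.
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  reviews: defineTable({
    // Practice 1: a union of the old and new types while data is backfilled.
    rating: v.union(
      v.number(), // old format: a bare star count
      v.object({ stars: v.number(), comment: v.optional(v.string()) })
    ),
    // Practice 2: deprecated; superseded by `rating`. Do not use.
    legacyScore: v.optional(v.number()),
  }),
});
```

And per practice 3, a helper that handles both formats in the interim. Because the generated Doc type is the union from the schema, TypeScript forces you to handle both branches:

```ts
// A helper that normalizes old- and new-format documents.
import { Doc } from "./_generated/dataModel";

export function ratingStars(review: Doc<"reviews">): number {
  return typeof review.rating === "number"
    ? review.rating // old format
    : review.rating.stars; // new format
}
```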

Dual read vs. dual write

To safely handle the intermediate data formats, you can dual-read or dual-write. In both cases, the migration isn’t complete until all clients are writing the new format and all old documents have been processed. An optional step (0) is to first deploy a schema that combines the old and new formats, with as little code changed as possible, giving you a safe point to roll back to in case something goes wrong.

Dual writes (preferred):

  1. Deploy code that starts writing both formats, but still reads the old format.
  2. Migrate the data asynchronously.
  3. Update the code to read the new format, while still writing the old format.
  4. Update the code to only read and write the new format.

This is preferable because you can gracefully roll back to older code if the code for the new format has errors.
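As a sketch of step (1), here’s a dual-writing mutation. This is a slightly different hypothetical than the union example above: the old format lives in a numeric score field and the new format in an optional rating object.

```ts
// Sketch: step (1) of a dual write. Writers populate both formats;
// readers keep consuming the old one until step (3).
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const setRating = mutation({
  args: { reviewId: v.id("reviews"), stars: v.number() },
  handler: async (ctx, { reviewId, stars }) => {
    await ctx.db.patch(reviewId, {
      score: stars, // old format: what current readers use
      rating: { stars }, // new format: read only after step (3)
    });
  },
});
```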

Dual reads:

  1. Deploy code that reads both formats, preferring the new format if present.
  2. Deploy code that only writes the new format.
  3. Migrate the data asynchronously.
  4. Update the code to only read the new format.
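As a sketch of step (1), using the same hypothetical score/rating fields as the dual-write example:

```ts
// Sketch: step (1) of a dual read. Prefer the new format when present,
// and fall back to the old one otherwise.
import { query } from "./_generated/server";
import { v } from "convex/values";

export const getStars = query({
  args: { reviewId: v.id("reviews") },
  handler: async (ctx, { reviewId }) => {
    const review = await ctx.db.get(reviewId);
    if (review === null) return null;
    // Migrated (or newly written) documents have `rating`;
    // unmigrated ones only have `score`.
    return review.rating?.stars ?? review.score;
  },
});
```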

I’d use this approach if I don’t want to write into two locations, such as when I’m worried that keeping two copies of the data may lead to inconsistencies and I want a single source of truth. Another upside is that, if the migration requires writing to a new table, you temporarily double your reads instead of your writes, and doubled writes can be more expensive if the data isn’t all in one document.

However, this approach makes it hard to roll back from stage (2) to stage (1) if there is a bug related to the new format: the new-format read path deployed in stage (1) is never exercised until stage (2) starts writing new-format data, so its bugs only surface once rolling back is no longer trivial.

Why not both?

You can actually do both of these strategies at the same time:

  1. Deploy code that writes both formats and reads both formats.
  2. Migrate the data asynchronously.
  3. Update the code to only read and write the new format.

This means more work and more code branches to maintain, but it allows you to start reading the new format for newly updated rows before running the async migration, while still having the old format as a fallback.

Summary

We covered migrations at a high level and walked through some best practices for doing them. To see how to implement migrations in Convex, check out this post.
