a month ago

Migrating data from Postgres to Convex

This guide will help you migrate data to Convex from Postgres on any provider (Supabase, Neon, RDS, etc).

Exporting your Postgres data

Dumping to a file using Psql

If you have relatively small amounts of data (less than a few gigabytes), you can likely just dump your data to a file from psql and import it into Convex.

First, use psql to connect to your postgres instance. To dump the table people run a command like this one. It will dump in the JSONL format.

1\copy ( SELECT row_to_json(people) FROM people ) TO '/path/to/people.jsonl';
2

Adding it to Convex

If you already have a Convex project, simply cd into the directory of that project. If you don’t have one yet, you can make a new project with.

1npm create convex@latest
2

Import the data

Once you are in the project directory, you can import your jsonl file into your convex deployment

1cd /my/convex/project
2npx convex import --format jsonLines --replace --table people /path/to/people.jsonl
3

Take a look at your new data in the dashboard

1npx convex dashboard
2

Define your schema

You can add a Convex schema to your project following guides here https://docs.convex.dev/database/schemas#writing-schemas.

Tip: there is a button in the dashboard to generate this automatically based on your data.

It might end up looking like this:

1import { defineSchema, defineTable } from "convex/server";
2import { v } from "convex/values";
3
4export default defineSchema({
5  people: defineTable({
6    id: v.string(),
7    name: v.string(),
8    pseudonym: v.string(),
9  }),
10});
11

Your favorite LLMs (eg Cursor) can be rather helpful here to generate a schema. You can dump the postgres schema from psql with \d people and feed that to a prompt to generate the Convex schema.

Reconnect your table relationships

If you have a field referring to documents in other tables, the convex-ic way to do that is to use v.id("people") as the field. Note: this field can’t be any arbitrary string. It must be the person._id value.

You’re welcome to continue using your other ID with an index on that field, but you will likely find benefit from using v.id, for instance runtime validators that ensure the IDs point to the correct tables.

Example schema:

Migrating relationships to Convex IDs generally looks like:

Import the data with existing primary keys (e.g. an id field)
Add an index on that field to look it up by id.

Update relationship fields (e.g. a authorId: v.string() field) to be v.union(v.string(), v.id("users")). At this point your schema looks like:

1import { defineSchema, defineTable } from "convex/server";
2import { v } from "convex/values";
3
4export default defineSchema({
5  people: defineTable({
6    id: v.string(),
7    name: v.string(),
8    pseudonym: v.string(),
9  }).index("id", ["id"]),
10  blogPosts: defineTable({
11    authorId: v.union(v.string(), v.id("people")),
12    content: v.string(),
13  }),
14});
15

Write a function to get the document by either id:

1import { QueryCtx } from "./_generated/server";
2import { Id } from "./_generated/dataModel";
3
4async function getPersonById(ctx: QueryCtx, id: string | Id<"people">) {
5  const convexId = ctx.db.normalizeId("people", id);
6  if (convexId !== null) {
7    return ctx.db.get(convexId);
8  } else {
9    return ctx.db
10      .query("people")
11      .withIndex("id", (q) => q.eq("id", id))
12      .unique();
13  }
14}
15

Write a migration to walk the table and update the authorId references:

1export const updateAuthorId = migrations.define({
2  table: "blogPosts",
3  migrateOne: async (ctx, doc) => {
4    const author = await getPersonById(ctx, doc.authorId);
5    if (!author) throw new Error("Can't find author");
6    if (author._id !== doc.authorId) {
7      await ctx.db.patch(doc._id, { authorId: author._id });
8    }
9  },
10});
11

See the migration docs for setting up and running migrations.

Update your schema to use the v.id validator:
```
1blogPosts: defineTable({
2  authorId: v.id("people"),
3  content: v.string(),
4}),
5
```
Note: thanks to schema validation, this will only successfully deploy if all blog posts have an authorId that is a valid Convex ID. Unless you disable schema validation, the deployed Convex schema is guaranteed to match the data at rest!

Alternative: Streaming import using Airbyte

If you have a larger amount of data, or want to continuously stream data, you can try out the Airbyte integration. Check out more details https://www.convex.dev/can-do/airbyte

Build in minutes, scale forever.

Convex is the backend platform with everything you need to build your full-stack AI project. Cloud functions, a database, file storage, scheduling, workflow, vector search, and realtime updates fit together seamlessly.

Get started