Mike Cann
4 days ago

Building Convex OS, a Browser-Based React App with Real-Time Sync

A few weeks ago I built Convex OS, a React app styled as a Windows XP desktop that runs entirely in the browser and stores almost all of its state in Convex. Every window position, running process, and file you drag onto the desktop lives in the database, which means opening a second tab gives you the same desktop, and moving a window in one tab moves it in the other.

I had seen a wave of web-desktop interfaces showing up online and wanted to try the pattern myself. The interesting part is that an OS metaphor is mostly an exercise in state management, and Convex's reactivity model makes multi-tab continuity something you get for free rather than something you have to engineer. The full source is public if you want to read along, and most of what follows is easier to understand with the repo open in another tab.

What Is a Browser-Based Operating System

A browser-based operating system is a desktop-style UI rendered inside a web browser, where windows, files, and application state are managed by web technologies rather than a native OS kernel. It looks and behaves like a desktop environment, but the "kernel" is JavaScript and the "filesystem" is whatever backend you wire up.

The category splits into roughly four flavors:

  1. Cloud-backed thin clients that stream a remote desktop to the browser
  2. In-browser emulators that boot a real OS inside WebAssembly
  3. Web-desktop UI layers that simulate the desktop metaphor with web primitives
  4. Remote-streaming containers that pipe a containerized desktop session over the network

Convex OS sits in the third category, a web-desktop UI layer with a real-time backend doing the state work. I'm not emulating an x86 chip or booting a Linux kernel in WASM; I'm building a desktop UI whose state happens to be reactive across tabs, a different exercise with a different set of trade-offs. The hard parts aren't graphics or kernel emulation but how you model windows and processes as rows in a database, and how that database keeps every open tab in agreement about what is on the screen.

The distinction matters because the techniques you reach for are different. If you're emulating a real OS, your time goes into compatibility layers and performance work. If you're building a web-desktop UI layer, your time goes into schema design, sync, and UI ergonomics. I'm firmly on the second path.

Why I Built Convex OS

I had been seeing web-desktop homepages crop up in my feeds and wanted to try the pattern. Windows XP felt like the right aesthetic because the look is well-supported by existing CSS libraries, and because, honestly, the nostalgia helps when you're showing the project to other developers.

From a Convex perspective, the part that drew me in was that nearly all of the UI state belongs in the database. Windows have positions, files have upload progress, and processes have running state. If those things live in Convex, then multi-tab sync isn't a feature I have to build; it's a property of the system. That inversion, where the thing that would normally be the hardest engineering problem becomes the thing you get for free, is what made the project worth doing.

There is also a smaller motivation worth naming. I wanted to see how far the "everything is a row" approach could be pushed before it strained. An OS metaphor is a useful stress test for that approach, because it has a lot of state, a lot of cross-entity relationships, and a lot of interactions that traditionally live in a client-side store. If Convex can hold all of that without me reaching for Redux or Zustand, the pattern is doing real work.

The Stack With React, Vite, and Convex

The frontend is React with Vite. There's no SSR because the whole thing is a stateful single-page app, and trying to server-render a desktop you're about to hydrate into a fully reactive state machine adds work without giving anything back.

Authentication is handled by Convex Auth with username and password. I skipped social login because the demo is meant to be poked at quickly, not signed into seriously. Styling uses xp.css for the authentic XP chrome, and the icon set is borrowed from a public Windows XP icons project. Layout primitives like Box, Flex, Horizontal, and Vertical are loosely modeled on Basarat's General Layout System, which I find easier to reason about than ad-hoc flex containers.

Vite is doing the work I expect a build tool to do and not much else, which is the point of choosing it. The fast refresh matters more than usual here because so much of the development loop is "open two tabs, change a thing in one, see the other update." Anything that slows down that loop slows down the whole project.

What Lives in Convex Versus What Stays Client-Side

Almost everything goes in Convex. Windows, processes, files, and agent threads are all stored as rows in the database, because they're exactly the kind of state that should survive a refresh and appear on every tab.

The exceptions are small and motivated by layout rather than persistence. The taskbar button DOM positions, for example, live in React refs because they're needed for the minimize animation, and they're tied to where elements actually render rather than to anything I want to remember across sessions. Treating those as ephemeral and everything else as persistent has been a clean line to hold.

The rule I have been using is that if a piece of state would be wrong on a fresh tab, it belongs in Convex. If it would only be wrong on a fresh frame, it belongs in a ref or in component state. That heuristic has held up across every feature I have added, which suggests it's doing real classification work rather than just being a convenient slogan.

How Multi-Tab Sync Works in Convex OS

Multi-tab sync in Convex OS works because Convex queries are reactive by default. Any state stored in the database, whether that's window positions, view states, or file upload progress, propagates to every subscribed client automatically. Opening a second tab shows the same desktop, and moving a window in one tab moves it in the other in real time.

This matters more than it first sounds, because state continuity is what makes an OS metaphor feel like an OS. If each tab opened a fresh session, the desktop would feel like a website that happens to have draggable windows. Because the database is the source of truth and every client subscribes to the same queries, the desktop feels like one running system viewed through several browser tabs.

I only wrote queries and mutations, and the sync is what the platform does. That deserves a closer look, because it's the single largest reason the project was tractable as a side project rather than a months-long engineering effort.

In a more traditional stack I'd have to pick a sync strategy, wire up websockets, handle reconnection, dedupe events, and reconcile conflicting updates across tabs. Each of those is a real piece of work and each one can go wrong in subtle ways. Here, the reactivity model means a mutation runs once, the database updates, and every subscribed client gets the new state in the same shape. The cost I pay is that I have to model my state as queries and mutations, which is a constraint I'd want anyway.
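To make that data flow concrete, here is a toy in-memory model of "one write, every subscriber sees the new state." It is a sketch for illustration only: `ToyReactiveTable`, `subscribe`, and `upsert` are made-up names, and Convex's actual sync engine is not implemented this way.

```typescript
// Toy model only: Convex's real sync engine is not implemented this way.
type WindowPosition = { id: string; x: number; y: number };
type Subscriber = (rows: readonly WindowPosition[]) => void;

class ToyReactiveTable {
  private rows: WindowPosition[] = [];
  private subscribers: Subscriber[] = [];

  // Stand-in for a query subscription: called immediately, then on every change.
  subscribe(subscriber: Subscriber): void {
    this.subscribers.push(subscriber);
    subscriber(this.rows);
  }

  // Stand-in for a mutation: one write, then every subscriber gets the new rows.
  upsert(row: WindowPosition): void {
    this.rows = [...this.rows.filter((r) => r.id !== row.id), row];
    this.subscribers.forEach((subscriber) => subscriber(this.rows));
  }
}

// Two "tabs" subscribe to the same table and each keeps its own rendered copy.
const toyTable = new ToyReactiveTable();
let tabOneView: readonly WindowPosition[] = [];
let tabTwoView: readonly WindowPosition[] = [];
toyTable.subscribe((rows) => { tabOneView = rows; });
toyTable.subscribe((rows) => { tabTwoView = rows; });

// A single write from "tab one" updates what both tabs see.
toyTable.upsert({ id: "notepad", x: 120, y: 80 });
```

Neither tab talks to the other here. Both only read from the shared table, which is the property that makes the second tab stay in agreement.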

There is one small subtlety that comes up with drag operations. Dragging a window fires a lot of position updates, and you don't want to write every intermediate position to the database. The compromise I settled on is to update local state during the drag and write to Convex on drag end, which keeps the visual experience smooth in the dragging tab and updates the other tabs in a single step when the drag finishes. This is the kind of place where some client-side state earns its keep, even in a system designed around persistence.
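A minimal sketch of that drag compromise, written as plain TypeScript rather than as the project's actual component code. `createDragSession` is a hypothetical helper, and the `commit` callback stands in for whatever mutation call persists the final position.

```typescript
type DragPoint = { x: number; y: number };

// Hypothetical helper: tracks a drag locally and persists only the final position.
// `commit` stands in for a mutation call and is an assumption of this sketch.
function createDragSession(start: DragPoint, commit: (finalPos: DragPoint) => void) {
  let current: DragPoint = { ...start };
  let finished = false;

  return {
    // Called on every pointer move: updates local state only, no database write.
    move(dx: number, dy: number): DragPoint {
      if (!finished) {
        current = { x: current.x + dx, y: current.y + dy };
      }
      return current;
    },
    // Called on drag end: writes the final position exactly once.
    end(): DragPoint {
      if (!finished) {
        finished = true;
        commit(current);
      }
      return current;
    },
  };
}
```

However many `move` calls a drag produces, the backing store sees a single write, so the other tabs jump to the final position in one step.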

The Schema, Four Tables That Run the OS

The whole OS runs on four tables. Each one corresponds to a concept you would recognize from a real desktop environment, and the schema definitions are where most of the design work happened. Getting the shape of these tables right was the part of the project that paid off the most over time, because every later feature either fit cleanly into the schema or revealed where the schema needed adjusting.

Files and Modeling Upload State as a State Machine

The files table stores name, size, type, position on the desktop, and an uploadState field. That last field is a discriminated union with four variants: created, uploading, uploaded, and errored.

I chose a discriminated union rather than a flat row with nullable fields for a specific reason. In the created state there is no storage ID yet, because the file has been registered but not uploaded. In the uploaded state there is a storage ID, and the field is non-null. If I modeled both states with the same nullable storageId, every consumer would have to ask "is this defined right now?" and infer the answer from context. With a discriminated union, the available fields are explicit per state, so the consumer reads the state tag and knows which fields it can rely on.
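In TypeScript terms, the idea looks roughly like the sketch below. The tag name `kind` and the `progress` and `message` fields are assumptions made for illustration; only the four state names and `storageId` come from the description above.

```typescript
// Field names other than `storageId` are assumptions for illustration.
type UploadState =
  | { kind: "created" }
  | { kind: "uploading"; progress: number } // fraction between 0 and 1
  | { kind: "uploaded"; storageId: string }
  | { kind: "errored"; message: string };

// Decides what the UI should show for a file, based only on the state tag.
function describeUpload(state: UploadState): string {
  switch (state.kind) {
    case "created":
      return "waiting to upload";
    case "uploading":
      return `uploading ${Math.round(state.progress * 100)}%`;
    case "uploaded":
      // `storageId` is only reachable here, where it is guaranteed to exist.
      return `preview of ${state.storageId}`;
    case "errored":
      return `error: ${state.message}`;
    default: {
      // Compile-time exhaustiveness check: a new variant makes this line fail.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

Reading `state.storageId` in the `created` branch is a compile error, which is the guarantee a nullable field cannot give.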

Drag-and-drop uploads flow through this state machine. The UI reads the current state and renders a progress bar for uploading, a preview for uploaded, and an error chip for errored, all driven by the file storage APIs. One honest caveat: drag-and-drop misbehaves in some Firefox-based browsers, specifically Zen, while working fine in Chrome. I haven't tracked down the root cause yet, and I expect it's a quirk of how those browsers report drag events rather than anything Convex-specific.

The state machine approach also makes recovery easier. If a tab is closed mid-upload, the row stays in uploading and a future client can decide to retry, abandon, or surface the half-uploaded file as something the user needs to deal with. Modeling that as a state rather than as a set of independent booleans means the logic for "what do I do with this file" stays in one place.
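One way to keep that logic in one place is a transition table plus a small recovery policy. The sketch below is hypothetical: the four states are the ones named above, but the allowed edges, the staleness threshold, and the retry budget are assumptions, not the project's actual rules.

```typescript
type UploadKind = "created" | "uploading" | "uploaded" | "errored";

// Assumed edges: the four states are named above, the allowed transitions are not.
const allowedTransitions: Record<UploadKind, readonly UploadKind[]> = {
  created: ["uploading", "errored"],
  uploading: ["uploaded", "errored"],
  uploaded: [],
  errored: ["uploading"], // a retry re-enters the uploading state
};

function canTransition(from: UploadKind, to: UploadKind): boolean {
  return allowedTransitions[from].includes(to);
}

type RecoveryAction = "none" | "retry" | "abandon";

// Hypothetical policy for a row that a closed tab left in `uploading`.
function recoveryFor(
  kind: UploadKind,
  idleMs: number,
  staleAfterMs: number,
  retriesLeft: number,
): RecoveryAction {
  // Only uploads that have been silent for long enough need recovery.
  if (kind !== "uploading" || idleMs < staleAfterMs) return "none";
  return retriesLeft > 0 ? "retry" : "abandon";
}
```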

Processes, One Row Per Running App

The processes table holds one row per running app. Image preview, text preview, the in-OS Internet Explorer, each is a process row with app-specific state attached.

That app-specific state is where things like "which file is being previewed" or "the URL history for the browser" live. Treating each running app as a row makes it trivial to enumerate what is open, kill processes, or reason about per-app state without a parallel client-side store. It also means that when you open a second tab, the running apps are already there, and the system feels continuous rather than restarted.

The shape of the per-app state varies by app, which is another place a discriminated union earns its keep. The browser process carries history and a current URL. The image preview process carries a file reference. Modeling these as variants of a single processes row, rather than as separate tables per app, keeps the enumeration logic simple and lets new apps slot in without schema churn.
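A rough sketch of what "variants of a single processes row" could look like as TypeScript types. The field names (`app`, `fileId`, `currentUrl`, `visited`) and the helper functions are assumptions for illustration, not the project's actual schema.

```typescript
// Assumed shapes: one variant per app, tagged by `app`.
type AppState =
  | { app: "imagePreview"; fileId: string }
  | { app: "textPreview"; fileId: string }
  | { app: "internetExplorer"; currentUrl: string; visited: string[] };

type ProcessRow = { id: string; userId: string; state: AppState };

// Each variant exposes only the fields that make sense for that app.
function taskbarLabel(state: AppState): string {
  switch (state.app) {
    case "imagePreview":
      return `Image Preview (${state.fileId})`;
    case "textPreview":
      return `Text Preview (${state.fileId})`;
    case "internetExplorer":
      return `Internet Explorer (${state.currentUrl})`;
  }
}

// Enumerating what is open is a filter over one table, whatever the app.
function runningLabels(rows: readonly ProcessRow[], userId: string): string[] {
  return rows.filter((row) => row.userId === userId).map((row) => taskbarLabel(row.state));
}

// Killing a process is removing its row.
function killProcess(rows: readonly ProcessRow[], processId: string): ProcessRow[] {
  return rows.filter((row) => row.id !== processId);
}
```

Adding a new app in this shape means adding one variant and one `case`. The enumeration and kill helpers do not change.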

Windows With Position, Size, and View State

The windows table stores x, y, width, height, title, icon, and a viewState of open, minimized, or maximized. A window belongs to a process.

Right now the relationship is one process to one window, but the schema allows a process to own multiple windows, and I expect to lean on that later. The taskbar groups windows by process, so minimizing or maximizing iterates over the process's windows rather than acting on a single row. That indirection is cheap to put in early and would be painful to add later.
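As a sketch of that indirection, the helper below applies a view state to every window owned by a process instead of to a single row. The row shape follows the fields listed above, with `id` and `processId` as assumed field names, and an in-memory array standing in for the table.

```typescript
type ViewState = "open" | "minimized" | "maximized";

type WindowRow = {
  id: string;
  processId: string; // assumed name for the "belongs to a process" link
  x: number;
  y: number;
  width: number;
  height: number;
  title: string;
  viewState: ViewState;
};

// Applies a view state to all windows of one process and leaves the rest alone.
// Returns new rows instead of mutating the input.
function setViewStateForProcess(
  windows: readonly WindowRow[],
  processId: string,
  viewState: ViewState,
): WindowRow[] {
  return windows.map((w) => (w.processId === processId ? { ...w, viewState } : w));
}
```

With one window per process this behaves exactly like updating a single row, so the extra loop costs nothing today and keeps working once a process owns several windows.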

The split between processes and windows mirrors how a real desktop environment works, where one app can own several windows but they share underlying state. I almost collapsed the two tables in an early draft because I was not yet using the one-to-many relationship, and I am glad I kept them separate. The first time I wanted to add a second window to an existing process, the schema was already shaped to allow it.

Message Metadata for Attachment Chips

The messageMetadata table is the smallest of the four and exists for a specific reason. The Convex agent component doesn't currently support arbitrary metadata attached to messages, so attachment references for the AI agent live in their own table and are joined for display.

If the agent component grows that capability later, this table goes away. For now it's the cleanest way to associate file references with agent messages without bending the agent component into a shape it wasn't designed for. I prefer adding a small, well-scoped table to working around a component's limitations inside the component itself, because the table is easy to delete later and the workaround wouldn't be.

Meet Sheffy, the In-OS AI Agent

Sheffy is a small AI agent UI inside the OS, a nod to Clippy. It is built on the Convex agent component, which handles threads, message history, and tool calls. If you want a broader look at the patterns behind building AI agents with Convex, the Stack article covers the memory and threading model in more depth.

Threads, attachments, and previews are all persisted in Convex, so the conversation survives refreshes and follows you across tabs the same way the rest of the OS does. You can drag a file from the desktop into the chat, ask Sheffy about it, and get a response that references the attachment. The plumbing between the file row, the message metadata, and the agent component is where most of the implementation work went.

The interesting design decision here was how much of the agent's state to expose through the OS abstractions versus through the agent component's own primitives. I ended up using the agent component for everything it natively supports and putting a thin layer of messageMetadata rows on top for the attachment references. That split keeps each side doing what it's good at, and the join at display time is cheap because both lookups are indexed.
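The display-time join can be sketched in a few lines. The shapes below (`AgentMessage`, `MessageMetadataRow`, `fileIds`) are assumptions for illustration, and the `Map` stands in for an indexed lookup. The agent component's real message type is richer than this.

```typescript
// Assumed, simplified shapes for illustration only.
type AgentMessage = { id: string; text: string };
type MessageMetadataRow = { messageId: string; fileIds: string[] };
type DisplayMessage = AgentMessage & { attachments: string[] };

// Joins messages with their attachment references for rendering.
function withAttachments(
  messages: readonly AgentMessage[],
  metadata: readonly MessageMetadataRow[],
): DisplayMessage[] {
  // Build the lookup once so each message resolves its metadata in O(1).
  const filesByMessage = new Map<string, string[]>();
  for (const row of metadata) {
    filesByMessage.set(row.messageId, row.fileIds);
  }
  // Messages without a metadata row simply get an empty attachment list.
  return messages.map((m) => ({ ...m, attachments: filesByMessage.get(m.id) ?? [] }));
}
```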

Sheffy is also where the multi-tab story gets a little surreal in a way I didn't expect. Because the thread lives in Convex, you can start a conversation in one tab, switch to another tab, and continue it as if you had never moved. The agent doesn't know which tab you're in, and the UI doesn't need to, which is the right answer but is still slightly uncanny the first time you see it.

The In-Browser Internet Explorer and Why Iframes Fight Back

There is a working "Internet Explorer" in Convex OS, and it's exactly what it sounds like: an iframe wrapped in XP chrome with a URL bar and back and forward buttons. It works, but it's also a toy.

Back and forward buttons are limited because iframes restrict cross-origin navigation control, so the in-OS browser can't inspect or fully drive the history of an embedded page. The bigger problem is that many sites refuse to embed at all, usually through an X-Frame-Options header or a Content Security Policy frame-ancestors directive, which is the correct security behavior on their part. Personal blogs and sites that allow framing load fine, but most major sites don't. I'm not going to pretend this is a real browser because it isn't, but it's a useful demonstration of how a process can own a window that renders arbitrary web content.

The reason I kept the feature anyway is that it exercises the schema in a useful way. A browser process needs to remember its URL history, its current page, and the size of its window, and all of that needs to survive a refresh. Every one of those concerns is already covered by the existing tables, which validates the schema design more than any synthetic test could.
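Since the iframe's own history can't be inspected across origins, one way to back the URL bar and the back and forward buttons is to keep the history in the process state itself. The sketch below is an assumption about how that could be modeled, not the project's actual code. `BrowserState` and the helper names are hypothetical.

```typescript
// Assumed shape for the browser process state: a list of URLs plus a cursor.
type BrowserState = { entries: string[]; index: number };

function openBrowser(url: string): BrowserState {
  return { entries: [url], index: 0 };
}

// Navigating drops any "forward" entries, as a normal browser does.
function browserNavigate(state: BrowserState, url: string): BrowserState {
  const kept = state.entries.slice(0, state.index + 1);
  return { entries: [...kept, url], index: kept.length };
}

// Back and forward only move the cursor and are clamped at both ends.
function browserBack(state: BrowserState): BrowserState {
  return { ...state, index: Math.max(0, state.index - 1) };
}

function browserForward(state: BrowserState): BrowserState {
  return { ...state, index: Math.min(state.entries.length - 1, state.index + 1) };
}

function browserCurrentUrl(state: BrowserState): string {
  return state.entries[state.index];
}
```

Because the state is a plain value, it can be stored on the process row and survives a refresh. The iframe only ever needs to be pointed at the current URL.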

How to Organize Convex Backend Code at Scale

Because Convex queries and mutations can't invoke one another the way ordinary functions can, the recommended pattern is to extract logic into plain helper functions and call those from the public functions. Convex OS takes this a step further with a hierarchical model layer, where each public function is a thin auth wrapper that delegates to a scoped model object exposing the operations for that user.

In practice this means convex/my/files.ts is a short file. It uses a custom myQuery builder that handles authentication, then delegates to model.ts where the real logic lives. The model layer exposes something like filesForUser(userId), which returns an object with .list(), .find(id), .get(id), and .getInState(state). The user is bound once, when the scoped object is created, instead of being threaded through every helper call.

That hierarchical scoping is the part I find most useful. Without it, helper functions end up taking (db, userId, fileId) tuples over and over, which is verbose and easy to get wrong. With it, the scope is established once and every operation inside that scope inherits it.
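A stripped-down sketch of the scoped-object idea, using an in-memory array in place of the Convex database so it stays self-contained. The `FileDoc` shape is an assumption, and so is the split where `find` returns `null` and `get` throws. `getInState` is left out because its exact semantics aren't described here.

```typescript
// Assumed document shape; `db` below is an in-memory stand-in, not Convex's API.
type FileDoc = { id: string; userId: string; name: string };

// Binds the user once and returns the operations available for that user.
function filesForUser(db: readonly FileDoc[], userId: string) {
  const mine = (): FileDoc[] => db.filter((f) => f.userId === userId);
  const find = (id: string): FileDoc | null => mine().find((f) => f.id === id) ?? null;

  return {
    // All files owned by the bound user.
    list: (): FileDoc[] => mine(),
    // Returns null when the file is missing or belongs to someone else.
    find,
    // Same lookup, but a missing file is treated as an error.
    get(id: string): FileDoc {
      const file = find(id);
      if (file === null) throw new Error(`File ${id} not found for this user`);
      return file;
    },
  };
}
```

Every operation goes through `mine()`, so a caller cannot forget the ownership check. Another user's file is indistinguishable from a missing one.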

Cross-model composition then becomes straightforward. Focusing a process calls into the windows model, lists the windows owned by that process, and focuses each of them, without ever crossing the query or mutation boundary. The composition happens in plain TypeScript, where it belongs.
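Here is a hypothetical sketch of that composition. The `zIndex` field as the representation of focus, the `ToyDb` shape, and the method names are all assumptions. The point is only that the processes model calls the windows model as ordinary TypeScript.

```typescript
// Assumed shapes; the mutable in-memory `ToyDb` stands in for the database.
type WinDoc = { id: string; userId: string; processId: string; zIndex: number };
type ToyDb = { windows: WinDoc[] };

function windowsForUser(db: ToyDb, userId: string) {
  const mine = (): WinDoc[] => db.windows.filter((w) => w.userId === userId);
  return {
    listForProcess: (processId: string): WinDoc[] =>
      mine().filter((w) => w.processId === processId),
    // Focusing a window raises it above every other window of this user.
    focus(windowId: string): void {
      const target = mine().find((w) => w.id === windowId);
      if (!target) throw new Error(`Window ${windowId} not found for this user`);
      const highest = Math.max(0, ...mine().map((w) => w.zIndex));
      target.zIndex = highest + 1;
    },
  };
}

function processesForUser(db: ToyDb, userId: string) {
  // The windows model is created with the same scope and reused below.
  const windows = windowsForUser(db, userId);
  return {
    // Focusing a process focuses each window it owns, via the windows model.
    focus(processId: string): void {
      for (const w of windows.listForProcess(processId)) {
        windows.focus(w.id);
      }
    },
  };
}
```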

This is experimental, not prescriptive. I'm still figuring out where the rough edges are, and I'd be interested to hear from anyone who tries a similar pattern. The rough edges I've noticed so far are mostly about ergonomics (the scoped object pattern adds a small amount of indirection that takes a minute to learn), but the payoff in eliminated parameter threading has been worth it on every model I've written so far.

If you're starting smaller, you probably don't need this. Plain helper functions called from your public functions will carry you a long way, and the model layer only starts paying for itself when you have several models that compose against each other. For a project the size of Convex OS, that threshold is reached early; for a simpler app it might never be.

What I Would Build Next

There is a list of things I wanted to add and did not get to. A working clock in the taskbar, Minesweeper, Notepad with persistent documents, and screen savers are the obvious ones. Sheffy could use more tools too, things like OS control, file manipulation, and browser navigation, so the agent can act on the OS rather than just talk about it.

Each of those is interesting in a slightly different way. The clock is trivial. Minesweeper is a self-contained app that would test how well the process/window split holds up for something with real per-app state and event handling. Notepad with persistent documents would introduce a documents table and exercise the file schema in a different direction. Screen savers would test idle detection and full-screen window states, which are corners of the system I have not pushed on yet.

The Sheffy tool work is the one I am most curious about. The agent component already supports tool calls, so wiring Sheffy up to mutations that open windows, move files, or navigate the browser is mostly a question of defining the tools and writing the system prompt that teaches the agent when to use them. That feels like the right next demo: an agent that can drive the OS, not just chat about its contents.

I ran out of time on this project. The source is public and the schema is set up for most of what I just described, so if any of it sounds fun, the repo is the right place to start.

FAQ

Q: What is a browser-based operating system?
A: A browser-based operating system is a desktop-style UI rendered inside a web browser, where windows, files, and application state are managed by web technologies rather than a native OS kernel. The category spans cloud-backed thin clients, in-browser emulators, web-desktop UI layers, and remote-streaming containers. Convex OS is a web-desktop UI layer with a real-time backend.

Q: Can you build a browser-based OS with React and Convex?
A: Yes. React handles rendering the windows, taskbar, and apps, while Convex stores nearly all of the state, including window positions, processes, files, and agent threads. Because Convex queries are reactive, the OS metaphor stays consistent across tabs without any custom sync code.

Q: How does Convex handle real-time multi-tab sync?
A: Convex queries subscribe clients to the database and push updates whenever the underlying data changes. Any tab subscribed to the same query sees the same state, so moving a window or uploading a file in one tab is reflected in every other tab automatically.

Q: How should I model file upload state in Convex?
A: Model it as a discriminated union with explicit states like created, uploading, uploaded, and errored. This makes the fields available in each state explicit, so consumers don't have to guess whether a storageId or progress value is defined. The UI can then render directly off the state tag.

Q: How do you organize Convex queries and mutations in a larger app?
A: Extract logic into plain helper functions and call them from your public functions, since queries and mutations can't invoke one another the way ordinary functions can. A hierarchical model layer, where a scoped object like filesForUser(userId) exposes the operations for that user, keeps helpers compact and avoids threading the same parameters through every call.

Putting Convex OS Into Practice

If you're building any stateful React app with a multi-window UI, and that includes dashboards, multi-pane editors, and collaborative whiteboards as much as it includes an OS metaphor, put the window and pane state in Convex from day one. The reactivity you get for free is worth more than any client-side state library will give you, and the multi-tab continuity comes along with it at no extra cost.

The deeper lesson from this project is that an OS metaphor is a useful forcing function for state design. Because the metaphor demands persistence, multi-window coordination, and per-app state that survives across sessions, it pushes you to model state correctly from the start rather than retrofitting it later. Even if you never build a desktop UI, the discipline of asking "would this still be correct on a fresh tab" is a good test to apply to any piece of state you're about to write down in a client-side store.

If you want to poke at the result, try the live demo at the link in the intro, and read the source on GitHub at the repo linked there too. If you want to start something similar, the Convex quickstart is the fastest way in, and the patterns described here should give you enough scaffolding to skip a few of the design decisions I had to work through the hard way.
