Wayne Sutton
2 months ago

How to Design Your Stack: An Interview With Sherry Jiang, Peek Founder

When Sherry Jiang rebuilt Peek, her consumer AI fintech, she moved from one era of backend thinking to the next. The old stack was Postgres on AWS with the usual middleware around it. The new stack is Convex, and the rebuild took about a week and a half once the team cut what they didn't need.

This article is about why that migration made sense, what a real-time backend for consumer AI apps is, and why an AI-native product has to feel less like a database and more like a game engine. The conversation is drawn from Wayne Sutton's interview with Sherry on designing an AI tech stack for consumer apps in 2026.

Why modern backend means something different in 2026

A real-time backend for consumer AI apps is one where state changes in the database propagate to the client automatically, without a separate websocket layer, queue, or sync service stitched together by hand. The reactive model treats the UI as a live view over server state, so when an agent writes something, the user sees it immediately. That's the shape modern consumer AI products need, because the agent and the user are co-editing the same surface in real time.

Most teams default to Postgres with a queue, a websocket layer, a cache, and a vector store bolted on. That stacked-services pattern is a legacy of how engineering orgs used to be split, with backend specialists owning one half and frontend specialists owning the other. The lines between those roles have blurred over the last few years, so the tooling that assumes a sharp split between them is starting to feel out of step. Sherry calls Convex "a very modern solution" because it collapses those layers into something a single developer, or a single agent, can hold in their head.

Sherry Jiang describing Convex as a very modern solution

The shift matters because the kind of app being built has changed. A CRUD dashboard from 2018 could survive on a relay of services because users tolerated a refresh button and the occasional half-second pause. A consumer AI app can't because:

  • The agent is writing to state continuously
  • The user is reading from state continuously
  • The user feels any latency between those two things as social awkwardness rather than as an engineering delay

The architecture has to reflect that, which means the database, the sync layer, and the client transport need to share a model rather than translate between three of them.

There's a second reason the stack matters more now than it used to. The agent writing the code is also a stakeholder in the codebase. If the backend is spread across four services with four contracts, the agent has four times the surface area to misunderstand. If the backend is a single coherent system with one language and one schema definition, the agent reads it the same way a new engineer would and writes correct code more of the time.

The old Postgres world feels like passing notes back and forth

Sherry borrowed a metaphor she'd seen on Twitter to describe the legacy pattern. Three or four layers sitting between the database and the client means every state change is a relay race of notes being passed back and forth. Every handoff adds latency and demands its own contract, which is just another place for a bug to live. For a CRUD app that's annoying. For an AI product where the agent is updating state continuously, it's a structural problem.

The note-passing metaphor lands because it's literally what's happening on the wire:

  1. The database notifies a sync layer
  2. The sync layer publishes to a queue
  3. The queue feeds a websocket gateway
  4. The gateway pushes to the client
  5. The client's cache is reconciled against the new payload.

Every one of those hops is a place where a developer has written code to translate between formats, retry on failure, and decide what counts as "fresh." The reactive model collapses the whole relay into a single subscription contract, so the developer writes the query once and the framework handles the rest.
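As a rough illustration of that single subscription contract, here is a minimal in-memory sketch in TypeScript. It is not Convex's API or implementation: `ReactiveStore`, `subscribe`, `insert`, and the `Artifact` shape are hypothetical names invented for this example, and the store naively re-runs every query on every write where a real system would track which queries a write actually affects.

```typescript
// Hypothetical row shape, for illustration only.
type Artifact = { id: number; userId: string; text: string };
type Listener<T> = (result: T) => void;

class ReactiveStore {
  private rows: Artifact[] = [];
  private subscriptions: Array<() => void> = [];

  // Register a query once. The listener receives the current result
  // immediately and again after every write. Returns an unsubscribe function.
  subscribe<T>(
    query: (rows: readonly Artifact[]) => T,
    listener: Listener<T>,
  ): () => void {
    const run = () => listener(query(this.rows));
    this.subscriptions.push(run);
    run();
    return () => {
      this.subscriptions = this.subscriptions.filter((s) => s !== run);
    };
  }

  // A write from the agent or the user. Every live subscription re-runs,
  // so no caller has to poll or refresh.
  insert(row: Omit<Artifact, "id">): void {
    this.rows = [...this.rows, { id: this.rows.length + 1, ...row }];
    for (const run of this.subscriptions) run();
  }
}

const store = new ReactiveStore();
const rendered: string[][] = []; // stands in for successive UI renders

const unsubscribe = store.subscribe(
  (rows) => rows.filter((r) => r.userId === "u1").map((r) => r.text),
  (texts) => rendered.push(texts),
);

store.insert({ userId: "u1", text: "prefers saving for travel" });
store.insert({ userId: "u2", text: "tracks groceries weekly" });
unsubscribe();
store.insert({ userId: "u1", text: "written after unsubscribe" });
```

The listener stands in for a UI render: it fires once on subscribe and once after each write while the subscription is live, with no polling or manual refresh in between.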

What game-engine responsiveness means for an AI app

Consumer apps that work, the ones people open out of habit, manufacture joy. TikTok's video swap is instant. Duolingo throws confetti. Reward points tick up the moment you earn them. Sherry frames this as game-engine responsiveness, where the system has to give the user a visible reaction the instant they act. An AI product that pauses for half a second after a vulnerable disclosure breaks the spell the same way a real friend would if they paused awkwardly. The interaction collapses, and the user stops sharing.

Game engines solve this problem with tight render loops and shared memory between the simulation and the display. Consumer AI apps need the same property at a higher level of abstraction, which is what a reactive backend provides. The model is that the UI is always rendering the latest version of server state, and the server is always pushing new state as soon as it changes. There's no polling, refresh, or waiting for the next chat response to surface something written three seconds ago. The result feels less like a web app and more like a piece of software running locally on the user's machine.

What Peek is and why behavioral AI needs real-time UX

Peek, a consumer AI fintech, is building behavioral intelligence around the way people spend money. Sherry spent a decade in consumer fintech before founding it, including work on Google Payments in India, where her team ran behavioral-science experiments around variable rewards. Scratch cards with 35 to 65 percent win rates were one of the levers they used to make saving and spending feel like a game rather than a chore.

That background is the reason Peek behaves more like an attentive friend than a dashboard. It talks to you, learns what you care about, and reflects it back so you can see yourself more clearly. The product class demands real-time state because the emotional contract with the user depends on it.

Sherry Jiang on earning your first $100,000 through behavioral consumer finance

The variable-rewards work matters as context because it shaped how Sherry thinks about engagement. A money app that hands you a number and expects you to feel bad about it will always lose to one that turns the same data into a moment of recognition or surprise. Peek inherits that thesis, which means the surface has to react to the user the way a thoughtful friend would, and the underlying state has to support that reaction without lag.

Building a scaffolded identity through conversation

Peek's internal term for the things it remembers about a user is artifacts. You tell Peek something about how you think about money, Peek extracts an artifact, and the artifact appears in your interface immediately. Seeing the trace appear is what makes you willing to share the next thing. The loop is conversational on the surface and reactive underneath, so the backend has to push the new artifact to the UI the moment it's written.

The reason that loop works is the same reason note-taking apps with live previews feel different from note-taking apps with save buttons. The user is forming a model of what the system knows about them in real time, which is only possible if the system shows its work as it happens. If the artifact takes two seconds to appear, the user has already moved on, and the chance to anchor the next disclosure is gone. The reactive backend is what makes the difference. Without it, building that identity feels less like a conversation and more like filling out a form.

Why latency ruins emotionally charged interactions

Sherry's analogy lands hard. If you share something vulnerable with a friend and they pause for a beat too long before responding, you feel the pause and don't share the next thing. An AI agent operates under the same social physics. A two-second wait between a user message and a visible response damages the relationship, not just the performance metrics. Real-time is what makes this product class possible in the first place.

This is where the comparison to B2B AI sharpens. A sales-ops copilot that takes three seconds to summarize a deal is fine, because the user is in a working posture and treats the delay as compute time. A consumer agent that takes three seconds to acknowledge a disclosure about debt is broken, because the user is in a vulnerable posture and treats the delay as social withdrawal. The same engineering latency carries different meaning depending on the emotional register of the surface, which is why consumer AI can't borrow B2B's tolerance for round-trip time.

How Peek migrated from Postgres and AWS to Convex in a week and a half

Peek rebuilt its backend in roughly a week and a half. The precondition was a hard look at the existing surface area and an honest cut of what the new product no longer needed. Behavioral artifacts and the memory system were built fresh on Convex, whereas categorization and Plaid ingestion were ported across directly. The whole sequence was designed to protect the integrity of users' financial data while the underlying system changed beneath it.

A week and a half is fast enough that it's worth being honest about why. The team wasn't translating ten years of accumulated business logic. They were cutting what wasn't core, rebuilding what was reactive, and porting the parts that were closer to standard pipeline work. The number isn't a benchmark anyone else should expect to hit without doing the same pruning first.

Step one: cut what you don't need

The previous version of Peek had net-worth tracking and balance features that weren't core to the spending-and-behavior product Sherry was building next. Migrating dead surface area is the moving-house mistake, where you pack the boxes you should have thrown out and pay to move them. Cutting first is good practice independent of which backend you're moving to. It just happens to be especially valuable when the target is a reactive system where every table you keep is a table you'll model intentionally.

The cut went deeper than features. It also touched the implicit assumptions baked into the old schema, since each one of those would have had to be re-justified in the new system. Net-worth tracking, for example, implied a particular shape of account snapshots and a particular cadence of refresh that didn't match how the new product would think about user state. Carrying that shape forward would have meant the new schema inherited the previous product's mental model, which is the opposite of what a rebuild is for.

Step two: rebuild what's actually real-time

Once the surface was pruned, the truly reactive parts of Peek, the artifact extraction and the memory system, were rebuilt rather than translated. Reactive state is something you design for from the schema up, so a lift-and-shift wouldn't have produced the right shape. Categorization logic and the Plaid integration moved over more directly because they're closer to traditional ingestion work. The split was deliberate, with the live-feeling parts of the product getting native treatment and the pipeline parts getting a clean port.

The distinction between rebuild and port is worth slowing down on, because it generalizes. Anything that the user is supposed to feel in real time should be designed for the reactive model from scratch. Anything that runs on a schedule, processes a feed, or pulls from a third-party API can usually be moved with minimal change, since the latency profile of that work isn't user-facing in the same way. Drawing that line through your codebase before the migration starts is what keeps the timeline honest.

Protecting data integrity in fintech

Peek deals in real money, so a categorization error that shows a user $4,000 in spending instead of $400 is a trust event, not a bug. Sherry sequenced the migration so the spend-accuracy guarantees stayed intact end to end, with the new system reconciling against the old before customer-facing surfaces switched over. The same logic applies in any domain where the data itself carries trust. The migration plan has to treat correctness as a precondition rather than something to verify after launch.

The old system kept running while the new system was filled with the same inputs and asked to produce the same outputs, with any divergence flagged for inspection. The new system only got to drive the UI once both sides showed the same number for the user's last six months of spending. That sequencing turns a risky cutover into a series of small checks, each one able to fail loudly without the user ever seeing a wrong number.
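The side-by-side check described above can be sketched roughly as follows. This is an illustrative TypeScript example, not Peek's actual code: the `Transaction` shape, the function names, and the sample amounts are all assumptions, and amounts are held as integer cents so that equality comparisons are exact. Only two months are shown for brevity where the real check covered six.

```typescript
// Hypothetical transaction shape; date is an ISO "YYYY-MM-DD" string.
type Transaction = { date: string; amountCents: number };
type MonthlyTotals = Map<string, number>;
type Divergence = { month: string; oldCents: number; newCents: number };

// Sum spending per calendar month ("YYYY-MM").
function monthlyTotals(txns: readonly Transaction[]): MonthlyTotals {
  const totals: MonthlyTotals = new Map();
  for (const t of txns) {
    const month = t.date.slice(0, 7);
    totals.set(month, (totals.get(month) ?? 0) + t.amountCents);
  }
  return totals;
}

// Compare the two systems month by month and report every mismatch.
function reconcile(
  oldTotals: MonthlyTotals,
  newTotals: MonthlyTotals,
  months: readonly string[],
): Divergence[] {
  const divergences: Divergence[] = [];
  for (const month of months) {
    const oldCents = oldTotals.get(month) ?? 0;
    const newCents = newTotals.get(month) ?? 0;
    if (oldCents !== newCents) divergences.push({ month, oldCents, newCents });
  }
  return divergences;
}

// Made-up sample data: the new system records $4,000 where the old one has $400.
const oldSystem: Transaction[] = [
  { date: "2025-01-05", amountCents: 40000 },
  { date: "2025-01-20", amountCents: 1250 },
  { date: "2025-02-03", amountCents: 9900 },
];
const newSystem: Transaction[] = [
  { date: "2025-01-05", amountCents: 400000 },
  { date: "2025-01-20", amountCents: 1250 },
  { date: "2025-02-03", amountCents: 9900 },
];

const divergences = reconcile(
  monthlyTotals(oldSystem),
  monthlyTotals(newSystem),
  ["2025-01", "2025-02"],
);

// The new system only drives the UI once nothing diverges.
const safeToCutOver = divergences.length === 0;
```

In this sample run the January totals disagree, so `safeToCutOver` is false and the mismatch is flagged for inspection before any user sees it.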

Why Convex works for vibe coders and 13-year-olds alike

An LLM-friendly backend matters because the AI agent writing your code is now a coworker, and coworkers need a codebase they can navigate. Sherry has taught over a thousand people to build apps zero-to-one, and her bootcamp uses Convex because newcomers ask the right questions out loud. They want to know why the database lives in a different place than the code that queries it, because they haven't yet absorbed the historical reasons that made that split feel normal. The answer, when you're on Convex, is that it doesn't.

The bootcamp signal is more interesting than it first sounds. Beginners don't have priors about how backends are supposed to be structured, so their confusion tracks genuine cognitive load rather than habit. When a thousand of them keep asking the same questions about a stack, you've learned something about the stack. When a different stack stops generating those questions, you've learned something about that one too.

Sherry Jiang on the best growth levers in the product

LLM-readable backends are a design choice

When the schema is defined in TypeScript next to the functions that read and write it, an agent reading the repo has the whole picture in one place, with no separate console, migration tool on a different surface, or out-of-band SQL to reconcile against the application code. You can see exactly this design principle in how Convex's query engine works, where the absence of certain SQL affordances is a deliberate choice in service of a single coherent model.

The single-language property compounds. An agent that already understands the TypeScript in your queries also understands it in your actions, your schema, and your client. There's no moment where the agent has to switch dialects, learn a new ORM, or hold two type systems in its head at once. That property is what lets the agent generate code that compiles and runs the first time more often than not, since the surface it's writing against is the same shape everywhere it looks.
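To make the schema-in-code idea concrete, here is a small plain-TypeScript sketch in which one definition drives both the runtime validation and the static type that the write and read functions share. It deliberately does not use Convex's schema API; `artifactFields`, `validateArtifact`, `writeArtifact`, and `artifactsFor` are hypothetical names, and the fields are illustrative rather than Peek's real schema.

```typescript
// One schema definition, written as ordinary TypeScript (hypothetical fields).
const artifactFields = {
  userId: "string",
  text: "string",
  createdAt: "number",
} as const;

// Derive the static type from the same object, so there is nothing to keep in sync.
type FieldKinds = { string: string; number: number };
type Artifact = {
  [K in keyof typeof artifactFields]: FieldKinds[(typeof artifactFields)[K]];
};

// Runtime check driven by the schema object itself.
function validateArtifact(input: Record<string, unknown>): Artifact {
  for (const [field, kind] of Object.entries(artifactFields)) {
    if (typeof input[field] !== kind) {
      throw new Error(`Field "${field}" must be a ${kind}`);
    }
  }
  return input as unknown as Artifact;
}

// In-memory stand-in for a table; the writer and reader share the Artifact type.
const table: Artifact[] = [];

function writeArtifact(input: Record<string, unknown>): void {
  table.push(validateArtifact(input));
}

function artifactsFor(userId: string): string[] {
  return table.filter((a) => a.userId === userId).map((a) => a.text);
}

writeArtifact({ userId: "u1", text: "saves for travel", createdAt: 1 });

let rejected = false;
try {
  // text has the wrong type, so the schema rejects the write.
  writeArtifact({ userId: "u1", text: 42, createdAt: 2 });
} catch {
  rejected = true;
}

const texts = artifactsFor("u1");
```

Because the type is derived from the schema object, adding or renaming a field in `artifactFields` changes what the reader and writer are allowed to do in the same edit, which is the property that helps an agent or a newcomer reading the repo.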

Removing the cognitive overload of middleware

A beginner, or an agent acting like one, gets overwhelmed by middleware before they get overwhelmed by logic. Convex skips the separate MCP wiring, the SQL console in a different tab, and the queue debugger that lives outside the editor. Sherry's youngest student is 13 years old and shipping apps in 48 hours. That's only possible when the surface area you have to hold in working memory matches the surface area of the problem you're solving.

The middleware tax is invisible to engineers who've already paid it. If you've spent five years learning to keep a queue, a cache, a websocket layer, and a SQL console in your head simultaneously, you don't notice that you're doing it anymore. A 13-year-old does, because they haven't built the muscle yet. They ship working apps in 48 hours because so much of the standard backend stack turns out to be incidental complexity rather than essential, not because they're unusually precocious. Removing that incidental layer makes the same problem solvable by people who would have bounced off the older stack entirely.

Sherry's AI tech stack with Cursor, Codex, and a loyalty to what's good

Sherry has used Cursor since April or May of 2024, back before agent mode and before in-editor internet search. She stayed with it because every iteration kept being incrementally better than the alternative she'd test against. Her current model preference is GPT-5 Codex on medium for most of her work. The tools matter less than the evaluation discipline. For every new model or AI tool, she asks how much of the launch is real and how much is hype, and only switches when there's enough real there to outweigh the hype. The same loyalty applies to Convex. She's stayed because it keeps delivering.

The evaluation discipline is more transferable than the specific tool choices, because the choices themselves will be different in six months. The question Sherry asks of any new release is whether it's better than what she's already using on the work she's doing, not whether the demo is impressive or the changelog is long. That filter is what keeps her from cycling through tools every time a new one trends, and it's what makes her loyalty meaningful. Loyalty only counts when the tool keeps clearing the bar; otherwise it's just inertia dressed up as conviction.

The same logic applies to backends. The traditional relational stack earned its place over twenty years of CRUD apps, and for a lot of products it still clears the bar. The question for any team building consumer AI in 2026 is whether their AI tech stack is serving the work they're doing or working against the product they're trying to ship. Sherry's read is that for an agent-driven, behaviorally-rich, real-time consumer surface, the older stack stops clearing the bar, and the reactive model starts to.

Where to start if you're building consumer AI

If you're building a consumer AI product where the UI needs to reflect agent state changes in real time, model your AI tech stack around a reactive backend first and add inference around it, rather than the other way around. Peek's migration worked because the team cut surface area before they moved, treated the live parts of the product as a fresh design problem, and protected data integrity through the cutover. The same playbook works for any consumer AI app where joy and immediacy are part of the contract. You can spin up a working backend in about ten minutes and feel the difference yourself, then explore more Stack articles on how teams are shipping with reactive infrastructure. If you want to compare notes with other builders, the Convex Discord is where most of the consumer AI conversations are happening.

The broader pattern worth carrying away from Peek's story is that the constraints of the product shape the constraints of the stack, not the other way around. A behavioral AI fintech where the user's willingness to disclose depends on real-time reflection can't be built on a backend that treats latency as an engineering metric, and a consumer surface that needs to feel like a game can't be built on a stack that thinks of itself as a database with adapters. The right foundation is what makes the rest of the product possible, which is why Sherry treated the migration as foundational rather than as plumbing.

The other lesson is about who the codebase is written for now. Five years ago, the answer was the team of engineers maintaining it. Today, the answer includes the agents writing alongside them, which means the codebase has to be legible to a reader who doesn't have the history. A backend where the schema, the queries, and the client are written in the same language and live next to each other is a backend an agent can navigate. A backend stitched together from four services with four contracts is a backend that even a careful human gets lost in. The shape of the team has changed, so the shape of the code should change with it.

Frequently asked questions about real-time backends for consumer AI

Q: What is a real-time backend for consumer AI apps? A: It's a backend where state changes in the database propagate to the client automatically, without a separate websocket, queue, or sync service stitched on. Convex is built around this reactive model, so when an agent or a user writes data, the UI updates immediately.

Q: How long does it take to migrate from Postgres to Convex? A: Peek completed the migration in about a week and a half after pruning surface area that wasn't core to the new product. The migration window depends heavily on how much you cut first and how much of the existing system was reactive versus traditional CRUD.

Q: Why does consumer AI need real-time more than B2B AI? A: Consumer products run on dopamine, and the emotional contract with the user collapses if the interface pauses after the user shares something. Just as a friend pausing awkwardly after a vulnerable disclosure ends the conversation, a half-second delay in an AI agent breaks the loop that makes users want to keep engaging.

Q: What makes a backend LLM-friendly? A: Schema-in-code, a single language across the stack, no out-of-band consoles to context-switch into, and a file structure the agent can navigate end to end. When the database definition lives next to the functions that read it, the agent reads the whole picture in one pass and writes correct code on the first try more often.

Q: Is Convex good for beginners or non-developers? A: Yes. Sherry Jiang's bootcamp has taught over a thousand students to ship apps zero-to-one on Convex, including a 13-year-old who built a working app in 48 hours. The lower middleware overhead means new builders spend their attention on product logic instead of on plumbing.

Q: What's the right stack for a consumer AI app in 2026? A: A reactive backend like Convex, an AI-native IDE like Cursor, and a current frontier model such as GPT-5 Codex. Evaluate new tools by whether they're incrementally better than what you're already using, not by whether they're trending, and stay loyal to the pieces that keep delivering.

Q: How should I sequence a backend migration for a fintech product? A: Start by cutting features that aren't core to the next version of the product, since migrating dead surface area is the most common mistake. Rebuild the reactive parts of the product from the schema up rather than translating them, and port the pipeline-style work directly. Run the old and new systems side by side and reconcile their outputs before switching customer-facing surfaces, so any divergence in financial data is caught before a user sees it.

Q: What does game-engine responsiveness mean for a consumer AI product? A: It means the user gets a visible reaction the instant they act, the same way a game responds to a button press. For an AI agent, that translates to the UI reflecting new state the moment the agent writes it, with no polling, no refresh, and no perceptible delay. The standard is whether the interaction feels emotionally continuous to the user, not whether engineering latency hits a target.
