Jamie Turner
18 days ago

Why Convex Limits Transactions and How Concurrency Control Shapes Your Database

The Physics of Contended Writes

Every database has a speed limit on consistent operations against a single contended record. Convex enforces a 1-second and 1MB transaction cap and surfaces optimistic concurrency control (OCC) errors when transactions race on the same record. Postgres and MySQL face the same physics, but their default isolation levels hide the problem by silently losing writes instead of erroring.

This guide walks through the trade-off between optimistic and pessimistic concurrency control, why Convex chose OCC, what the read committed isolation level actually does in Postgres, and the concrete patterns (staleness, aggregation, components) that let you scale past contention without lock-style workarounds.

What Record Contention Is and Why It Slows Databases Down

Record contention is what happens when multiple transactions want to read or write the same row at the same time. Without contention, every database looks fast. With contention, a queue forms, so how that queue is managed determines whether the system stays online under load.

The Embarrassingly Parallel Case Where Every Database Looks Like a Champ

Imagine a kingdom of nobles, each working their own plot of land and recording the harvest in their own ledger. Hundreds of nobles can write to hundreds of ledgers in parallel, because no two writers touch the same record. This is the embarrassingly parallel case, and any database (relational, document, key-value) handles it well. Throughput scales with hardware because there's nothing to coordinate.

Most benchmarks live in this world, telling you almost nothing about how a database behaves when the workload concentrates.

Tax Day When Everyone Wants the Same Record

Now imagine tax day. Every noble in the kingdom sends an accountant to the castle, and every accountant needs to update the king's single ledger. Suddenly there's one record and many writers, so a queue forms at the door. The king's ledger has become the bottleneck for throughput, because no matter how fast each accountant is, only one can touch the ledger at a time.

This is the part of the workload where database design choices stop being academic. The question is no longer "how parallel can we get?" but "how do we order and validate concurrent writes to the same record without losing data?" Every answer involves a trade-off, and that trade-off is the difference between pessimistic and optimistic concurrency control.

Pessimistic Concurrency Control Locks First and Asks Questions Later

Pessimistic concurrency control (PCC) assumes conflicts will happen, so it prevents them by locking records before reading or writing. A transaction holds the lock for as long as it runs, which means other transactions wanting the same record must wait. In SQL this surfaces as SELECT FOR UPDATE and row-level locks.

How Pessimistic Locking Works

Back at the castle, the rule under PCC is that an accountant who wants to update a ledger first takes physical possession of it. They walk to the king's chamber, pick up the ledger, carry it back to their desk, compute the new total, write the update, and only then return the ledger to its shelf. Anyone else who wanted that ledger waits outside the door.

This works. As long as transactions are short and well-behaved, locks are released quickly and the queue moves. The cost is paid in coordination, because every reader and writer is forced through a serial choke point at the contended record.

[Figure: Pessimistic concurrency control]

The Deadlock Problem

Pessimistic locking introduces a failure mode that doesn't exist under OCC: deadlocks. Accountant Ashcroft holds Hartwell's ledger and wants Ashcroft's; accountant Hartwell holds Ashcroft's ledger and wants Hartwell's. Neither can proceed, because each is waiting on a lock the other holds. The database has to detect the cycle and abort one of the transactions, which means the application has to handle the retry.

Deadlocks aren't exotic. Any transaction that touches two records in a different order than another concurrent transaction is a candidate. As schemas grow and code paths multiply, the combinations multiply with them.
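
To make "detect the cycle" concrete, here is a minimal sketch of a wait-for check. It is an illustration, not how any particular database implements deadlock detection, and it assumes each transaction waits on at most one other. The `WaitsFor` type and `findDeadlockCycle` function are hypothetical names.

```typescript
// Wait-for graph: each blocked transaction maps to the transaction it waits on.
// Simplifying assumption: a transaction waits on at most one other.
type WaitsFor = Record<string, string | undefined>;

// Follow wait-for edges from `start`. If we reach a transaction that is
// already on the path, the transactions from that point on form a deadlock.
function findDeadlockCycle(waitsFor: WaitsFor, start: string): string[] | null {
  const path: string[] = [];
  let current: string | undefined = start;
  while (current !== undefined) {
    const seenAt = path.indexOf(current);
    if (seenAt !== -1) {
      return path.slice(seenAt); // only the cyclic part of the path
    }
    path.push(current);
    current = waitsFor[current];
  }
  return null; // the chain ended at a transaction that is not waiting
}

// Ashcroft waits on a ledger Hartwell holds, and vice versa.
const cycle = findDeadlockCycle(
  { ashcroft: "hartwell", hartwell: "ashcroft" },
  "ashcroft",
);
// `cycle` lists both accountants; the database would abort one of them.
```

Detection is only half the story. If every code path acquires its locks in the same global order, this kind of cycle cannot form in the first place, which is why lock ordering is the usual discipline in PCC codebases.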

Why Pessimistic Systems Get More Fragile as They Scale

The deeper problem with PCC is blast radius. Picture Duke Cashington's junior accountant, who walks into the castle, picks up a busy ledger, and then sits down to do something slow. Maybe they're computing a complicated tax. Maybe they got distracted. Maybe their process hung. While they hold that lock, every other accountant waiting on that ledger is also stuck, and any transaction that needs a second lock those waiters already hold is stuck behind them.

A single slow or stuck transaction can freeze a meaningful slice of the system. In practice, a significant share of early-growth-stage database incidents trace back to one query holding a lock too long. The system was fine until one transaction misbehaved, and then everything that touched the same hot record went down with it.

PCC has been studied since the 1970s, and the trade-offs were spelled out clearly in the foundational OCC paper by Kung and Robinson in 1981, "On Optimistic Methods for Concurrency Control" (ACM TODS). That paper proposed a different model entirely.

Optimistic Concurrency Control Reads, Computes, Validates, and Commits

Optimistic concurrency control (OCC) assumes conflicts will be rare, so it lets transactions run without locks and validates at commit time. If two transactions touched the same record and one already committed, the second one fails and the application retries. The classical model has three phases:

  • read
  • validate
  • write

How OCC Works in Practice

Back at the castle under OCC, no accountant carries a ledger anywhere. Instead, they walk up to a window, read the current value of the ledger, and go back to their desk to compute. When they're ready to commit, they return to the window and say "I read version 47 and want to write version 48." If the ledger is still at version 47, the write succeeds. If someone else got there first and it's now at version 48, the accountant's transaction is rejected and they start over.
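
The window exchange can be sketched as a version check at commit time. This is a toy in-memory model under the assumption of a single version counter per ledger; `VersionedLedger` and `tryCommit` are hypothetical names, and real systems validate in their own ways.

```typescript
interface Snapshot {
  version: number;
  total: number;
}

class VersionedLedger {
  private version = 47;
  private total = 0;

  // Reading takes no lock; it just reports the current version and value.
  read(): Snapshot {
    return { version: this.version, total: this.total };
  }

  // The commit succeeds only if nobody has committed since `readVersion`.
  tryCommit(readVersion: number, newTotal: number): boolean {
    if (readVersion !== this.version) {
      return false; // stale read: the caller must start over
    }
    this.total = newTotal;
    this.version += 1;
    return true;
  }
}

const ledger = new VersionedLedger();
const first = ledger.read(); // both accountants read version 47
const second = ledger.read();

const firstOk = ledger.tryCommit(first.version, first.total + 1010); // wins the race
const secondOk = ledger.tryCommit(second.version, second.total + 1010); // rejected

// The loser re-reads the new version and tries again.
const retry = ledger.read();
const retryOk = ledger.tryCommit(retry.version, retry.total + 1010);
```

After the retry, both credits are reflected in the total. The second accountant did some work twice, but nothing was overwritten.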

This is what an OCC error is at the database level: a transaction lost the race to commit against another transaction that touched the same record. No data is lost, and because there are no locks, no transaction ever waits on another, so deadlock is impossible.

Why OCC Wins in Real Systems Even Though It Looks Less Efficient on Paper

On paper, PCC is more efficient under high contention because OCC has to do work it might throw away. In practice, OCC tends to keep systems online as they scale, and the reason is back-pressure. The database is the scarce stateful resource, whereas application servers are stateless and scale horizontally. Under OCC, when contention spikes, the failed transactions bounce back to the application tier, which can retry, queue, or shed load. Under PCC, the contention manifests as locks held inside the database itself, which is the one place you can't easily add capacity.

A stuck OCC worker doesn't block anyone, because it doesn't hold a lock. A stuck PCC worker can freeze every transaction that touches the same record, and every transaction queued behind those. The compositional power of OCC, which keeps failure local to the transaction that lost the race, is why Convex chose it.

[Figure: Optimistic concurrency control]

Which Systems Use OCC

OCC is standard in modern distributed databases. Convex uses it, as do FoundationDB and TiKV. Postgres can be configured for serializable isolation, which gives you OCC-style behavior, but that mode is rarely used in production because it's awkward to opt into and the default isolation level papers over the problem in a different way (more on that below).

Optimistic and Pessimistic Concurrency Control Side by Side

| Dimension | Optimistic (OCC) | Pessimistic (PCC) |
| --- | --- | --- |
| Conflict assumption | Conflicts are rare | Conflicts are likely |
| When conflict is detected | At commit time | At read or write time |
| Locking behavior | No locks held during transaction | Locks held for duration of transaction |
| Performance under low contention | Excellent, no lock overhead | Good, but lock acquisition has cost |
| Performance under high contention | Higher abort/retry rates | Higher queue/wait times |
| Deadlock risk | None | Real, requires detection and resolution |
| Retry behavior | Application retries on conflict | Database queues; some retries on deadlock abort |
| Failure-mode blast radius | Local to the failed transaction | Can cascade across all transactions touching the locked record |

The table makes the trade-off visible, but the practical answer is shaped less by raw throughput and more by what happens when something goes wrong. Under PCC, a single slow transaction can take the system down. Under OCC, a single slow transaction loses its own race and the rest of the system keeps moving.

[Figure: Pessimistic and optimistic locking]

What Postgres Actually Does by Default and Why It Is Worse Than You Think

Postgres defaults to the read committed isolation level, not serializable. Under read committed, two SELECT statements inside the same transaction can read from two different committed snapshots of the database, which means a transaction can silently lose updates without any error. Most developers assume Postgres protects them from this. It doesn't, unless you explicitly opt in to stricter isolation or use SELECT FOR UPDATE.

Read Committed Is Not Snapshot Isolation

Read committed guarantees only that you won't see uncommitted writes from other transactions. It doesn't guarantee that the data you read at the start of your transaction will still be there at the end, and it doesn't guarantee that two reads of the same row inside one transaction will return the same value. This is looser than snapshot isolation, looser than repeatable read, and dramatically looser than serializable.

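MySQL's default has the same weakness for this pattern. InnoDB defaults to repeatable read, but its plain non-locking reads still let a read-then-write transaction overwrite a concurrent update. Both databases ship with a default looser than serializable because it's faster and because most workloads don't hit the failure mode often enough to notice.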

The Lost Update Problem in Action

Here's the scenario that should be more widely known:

  • Three accountants each want to credit $1,010 to the king's account, which currently holds $0.

  • Under read committed, all three transactions can read the balance as $0 at roughly the same time.

  • Each computes the new balance as $0 + $1,010 = $1,010.

  • Each writes $1,010.

The transactions all commit successfully with no error. The kingdom received $3,030 in tax revenue but the ledger shows $1,010. Two thirds of the day's collections just disappeared. There's no exception, no log line, no retry. The data is gone.
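
The arithmetic is easy to reproduce in a toy model. The sketch below is not Postgres; it only simulates the interleaving described above, where every transaction reads before any of them writes. `Account`, `runInterleaved`, and `runSerially` are hypothetical names.

```typescript
interface Account {
  balance: number;
}

// Every transaction reads the balance first, computes in application code,
// and only then writes. Each write overwrites the one before it.
function runInterleaved(account: Account, credits: number[]): void {
  // All reads happen here, while the balance is still unchanged.
  const computed = credits.map((credit) => account.balance + credit);
  // All writes happen afterwards; the last one wins.
  for (const newBalance of computed) {
    account.balance = newBalance;
  }
}

// The behavior the accountants expected: each read sees the previous write.
function runSerially(account: Account, credits: number[]): void {
  for (const credit of credits) {
    account.balance = account.balance + credit;
  }
}

const credits = [1010, 1010, 1010];

const contended: Account = { balance: 0 };
runInterleaved(contended, credits); // balance ends at 1010

const serial: Account = { balance: 0 };
runSerially(serial, credits); // balance ends at 3030
```

Neither function throws. The only difference between the two outcomes is timing, which is exactly why the failure is silent.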

This is the lost update problem, and it's the default behavior of the two most popular open-source relational databases. Most application code doesn't hit it because real conflict rates are low, but when it does hit, the failure is silent.

Fixing It With SELECT FOR UPDATE and the Trade-Offs

The standard fix in Postgres is SELECT FOR UPDATE, which takes a row-level lock for the duration of the transaction. That fixes the lost update problem by opting you back into pessimistic locking, with everything that implies:

  • Queue waits
  • Deadlock risk
  • The lock-hold blast radius described earlier.

You can also set the transaction isolation level to serializable, which uses OCC-style validation, but the ergonomics are awkward and most application frameworks don't default to it.
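
If you do opt into serializable isolation, the application has to retry transactions that Postgres aborts. A minimal sketch follows, assuming the driver exposes the SQLSTATE as a `code` property on the thrown error. The SQLSTATE values are the ones Postgres defines for serialization failures and deadlocks; `withRetry`, `isRetryable`, and `DbError` are hypothetical names, and the helper is synchronous only to keep the example short (a real database call would be asynchronous).

```typescript
// SQLSTATE codes Postgres reports for aborts that are safe to retry.
const SERIALIZATION_FAILURE = "40001";
const DEADLOCK_DETECTED = "40P01";

// Assumption: the driver attaches the SQLSTATE to the error as `code`.
function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return code === SERIALIZATION_FAILURE || code === DEADLOCK_DETECTED;
}

// Runs the whole transaction body up to `maxAttempts` times.
// Any other error, or exhausting the attempts, is rethrown to the caller.
function withRetry<T>(txn: (attempt: number) => T, maxAttempts = 3): T {
  for (let attempt = 1; ; attempt++) {
    try {
      return txn(attempt);
    } catch (err) {
      if (!isRetryable(err) || attempt >= maxAttempts) throw err;
    }
  }
}

// Stand-in for a driver error, used only to exercise the helper.
class DbError extends Error {
  code: string;
  constructor(message: string, code: string) {
    super(message);
    this.code = code;
  }
}

let calls = 0;
const result = withRetry(() => {
  calls += 1;
  if (calls < 3) {
    throw new DbError("could not serialize access", SERIALIZATION_FAILURE);
  }
  return "committed";
});
```

The important detail is that the retry wraps the entire transaction, reads included. Retrying only the final write would reintroduce the stale read the abort was protecting you from.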

Most teams running Postgres in production have never audited their code for read-committed lost updates, and most of them are fine. The failure rate is low. But "low failure rate with silent data loss" is a different risk profile than "visible error you can retry," so the choice between them should be deliberate.

[Figure: Postgres read-committed behavior]

Why Convex Surfaces OCC Errors and Transaction Limits

Convex caps transactions at 1 second of execution time and 1MB of read/write data, and it surfaces OCC errors when your transaction races another write to the same record. These limits are the database honestly telling you that a transaction has hit a physical constraint that every database has, so you can address it at the application layer instead of discovering it as a production incident.

The 1-Second and 1MB Caps Are Honest, Not Arbitrary

Any transaction that runs for a long time, in any database, is paying a cost. Under PCC it holds a lock for that whole duration, which means every other transaction wanting the same record is queued behind it. Under OCC it accumulates more chances to lose the commit race, so longer transactions on hot records have higher abort rates by construction. Convex bounds the problem at the front end by capping how long and how large a transaction can be, which prevents a single misbehaving query from cascading into a system-wide slowdown.

You can read the precise wording in the Convex transaction limits documentation, but the design intent is the same: enforce the constraint early so it can't hide. A 30-second transaction in a database that allows them isn't a feature; it's a future incident.

What an OCC Error Actually Means

An OCC error in Convex means your transaction read a record, did some work, and then tried to commit, but another transaction committed a write to the same record first. Your transaction is aborted and your client (or the Convex runtime) can retry. The data is consistent. Nothing was lost. The error is the database telling you "this record is contended; the work you did is stale."

Diagnosis is straightforward. Identify the record that multiple transactions are writing to. The treatment is everything we discuss in the next section. For the full error model and remediation patterns, the Convex documentation on OCC write conflicts covers the diagnostic flow. The design rationale is similar in spirit to the one behind Convex omitting .select() and .count(): surface opinionated constraints honestly rather than hide them.

How to Speed Up Contended Transactions

The fix for a contended transaction is almost never "make the database lock harder." It's one of three patterns:

  • Shorten the critical section
  • Introduce staleness through aggregation
  • Use a pre-built component that handles the pattern for you.

[Figure: Three contention remedies]

Option One: Make the Operation Faster

The cheapest fix is to do less work inside the transaction. Read fewer rows, write fewer rows, move computation out of the transaction where possible. A transaction that takes 50ms loses far fewer OCC races than one that takes 500ms, and it holds its locks for far less time if you're on PCC. Most contention problems are partially solved by shrinking the critical section before you reach for anything more sophisticated.

Look at what's inside the transaction. Network calls and slow third-party APIs should never be there. Heavy computation that doesn't need the transactional view of the data should be moved out, run separately, and only the final write should re-enter the transaction. The smaller the critical section, the less surface area you have for races.
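
To put rough numbers on the 50ms versus 500ms comparison, here is a back-of-the-envelope model. It assumes conflicting commits to the hot record arrive independently at a steady average rate (a Poisson process), so the chance of at least one arriving during the transaction is 1 - e^(-rate × duration). Both the model and the rate of 2 commits per second are assumptions for illustration, not measurements.

```typescript
// Probability that at least one conflicting commit lands while the
// transaction is running, under the Poisson-arrival assumption above.
function abortProbability(
  conflictingCommitsPerSecond: number,
  durationMs: number,
): number {
  const expectedConflicts = conflictingCommitsPerSecond * (durationMs / 1000);
  return 1 - Math.exp(-expectedConflicts);
}

const fast = abortProbability(2, 50); // about 0.095
const slow = abortProbability(2, 500); // about 0.632
```

Under these assumptions, the 500ms transaction loses the race well over half the time while the 50ms one loses roughly one time in ten. The exact figures depend entirely on the assumed arrival rate, but the shape holds: the abort rate climbs quickly as the critical section gets longer.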

Option Two: Introduce Staleness Through Aggregation

The deeper fix is aggregation. Back at the castle, instead of every accountant writing directly to the king's ledger, you appoint a tax collector for each region. The accountants write to their regional collector. The collectors batch the totals and write to the king on some interval. The king's ledger now sees one write per region per minute instead of one write per accountant per second.

The king's ledger now lags the regional collectors by up to one batching interval, but every total it records is correct as of the last roll-up. You've traded some staleness for a dramatic reduction in contention at the hot record, and most read paths can tolerate the staleness because they read the rolled-up total rather than the live stream of writes. This pattern works in any database. It's the underlying mechanism behind sharded counters, leaderboard buckets, and most "high write throughput" systems you have seen described as scaling miracles.
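
The regional-collector idea can be sketched in a few lines. This is an in-memory illustration of the pattern, not a Convex component or a production design; `RegionalAggregator`, `record`, and `rollUp` are hypothetical names, and the region names are made up.

```typescript
class RegionalAggregator {
  // Per-region subtotals waiting to be rolled up.
  private pending: Record<string, number> = {};
  rootTotal = 0; // the king's ledger
  rootWrites = 0; // how many times the hot record was written

  // Accountants write to their region's collector, never to the root.
  record(region: string, amount: number): void {
    this.pending[region] = (this.pending[region] ?? 0) + amount;
  }

  // Run on an interval: one root write per region that has a pending subtotal.
  rollUp(): void {
    for (const region of Object.keys(this.pending)) {
      this.rootTotal += this.pending[region] ?? 0;
      this.rootWrites += 1;
    }
    this.pending = {};
  }
}

const aggregator = new RegionalAggregator();
for (const region of ["north", "south", "east"]) {
  for (let i = 0; i < 100; i++) {
    aggregator.record(region, 10);
  }
}
aggregator.rollUp();
// 300 accountant writes became 3 writes to the root, and rootTotal is 3000.
```

Between roll-ups the root is stale by design. In a real database each collector would be its own record, so writers in different regions never contend with each other, and the roll-up interval becomes the knob that trades freshness for contention.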

The mental shift is admitting that you don't actually need the absolute latest value at the hot record for most reads. You need a value that's correct as of some recent moment. Once you accept that, the design space opens up considerably.

Option Three: Use Convex Components

Many of these patterns are already implemented as Convex Components so you don't have to build them yourself. Sharded counters, aggregations, rate limiters, and similar building blocks ship as composable modules. If your application is hitting OCC errors on a hot record, the right move is usually to swap the direct write for a component that already handles the staleness/aggregation pattern correctly, rather than inventing your own.

Building these primitives from scratch is harder than it looks. A naive sharded counter is easy, but one that handles compaction, rebalancing, and read-time aggregation efficiently isn't. Reaching for a component that already encodes those decisions is usually the right call.

When to Choose Optimistic and When to Choose Pessimistic Concurrency Control

The choice comes down to your contention profile and what failure mode you can tolerate.

If your workload is read-heavy with rare write contention, use OCC. The overhead of locks is wasted and the retry rate under OCC will be negligible. This describes the majority of application workloads.

If you have predictable high-contention hotspots and very short critical sections, PCC can be more efficient on paper, but watch for deadlocks and the blast radius of any single slow transaction. In practice, OCC plus aggregation almost always beats raw PCC for the same workload, because aggregation removes the contention rather than serializing through it.

If you're on Convex and hitting OCC errors, don't look for lock-style workarounds. Shorten the transaction, introduce aggregation, or adopt a pre-built Convex component that already solves the pattern. Lock-style fixes aren't available in Convex by design, because the same fragility patterns that pushed the industry toward OCC in distributed systems apply here.

As a rough heuristic from the literature, OCC outperforms PCC when conflict rates are below roughly 10 to 20 percent, and PCC wins above that. Real-world operational fragility shifts the practical answer further toward OCC, because the failure mode of OCC (visible retry) is easier to recover from than the failure mode of PCC (cascading lock waits).

Frequently Asked Questions

Q: What is the difference between optimistic and pessimistic concurrency control? A: Pessimistic concurrency control (PCC) locks records before reading or writing them, so conflicting transactions wait in a queue. Optimistic concurrency control (OCC) lets transactions run without locks and validates at commit time, aborting the loser of any race. PCC trades latency for guaranteed serialization, whereas OCC trades occasional retries for the ability to keep the database server unlocked and the failure mode local to each transaction.

Q: Why does Convex limit transactions to 1 second and 1MB? A: Long transactions are expensive in every database. They hold locks longer under PCC or lose more commit races under OCC. Convex bounds transaction length and size so a single misbehaving query can't cascade into a system-wide slowdown. The limits surface a constraint that every database has; most just hide it until production.

Q: Does Postgres really lose data by default? A: Under the default read committed isolation level, Postgres can silently lose updates when two transactions read and modify the same row concurrently. The classic case is two transactions both reading a balance, both computing a new value, and both writing it, so one write overwrites the other with no error. The fix is SELECT FOR UPDATE or serializable isolation, but neither is the default.

Q: What causes an OCC error and how do I fix it? A: An OCC error means your transaction read a record, did some work, and then tried to commit, but another transaction wrote to the same record first. Your transaction is aborted and retried. The fix is to reduce contention on the hot record: shorten the transaction, introduce aggregation so the hot record sees fewer writes, or use a Convex component that handles the pattern.

Q: When should I use serializable isolation in Postgres? A: Use it when you need strict correctness guarantees on transactions that read and then modify the same data, and when the application can handle serialization-failure retries. It's the right choice for financial logic, inventory, and anywhere a lost update would be a bug. The cost is occasional retry overhead, which is the same trade-off OCC makes everywhere.

Q: How do I handle high-contention records in Convex? A: Identify the hot record, then apply one of three patterns: make the transaction faster (fewer reads and writes), introduce staleness through an aggregation tree (sharded counters or rolled-up totals), or adopt a Convex component that already implements the pattern. Lock-style workarounds aren't the right shape in an OCC system.

Putting Concurrency Control to Work in Your Convex App

[Figure: Summary]

Concurrency control is one of the few places where a database's design choices are visible in your application code. Convex's 1-second and 1MB caps plus its OCC error model are the database’s refusal to hide a constraint that exists in every system. If you understand the contention profile of your workload, the right pattern is usually obvious:

  • Shorten the critical section
  • Introduce staleness through aggregation
  • Compose a pre-built component that already does it

Teams that read an OCC error as a signal (this record is contended, and the application needs an aggregation layer) get further, faster, than teams that treat it as a bug to engineer around. The error is information. It's telling you that the shape of the workload has changed and the data model needs to change with it. Acting on that information early is cheaper than acting on it during an outage, which is the alternative path most databases offer by silently absorbing the contention until it breaks.

If you're hitting OCC errors today, the most effective move is to explore the Convex Components library and find the one that matches your contention pattern, rather than trying to engineer a lock-style workaround that doesn't fit the system. The component approach also pays compounding dividends, because the same aggregation primitives that solve your current hot record will solve the next one, and the one after that, without each team having to rebuild the pattern from scratch.

The summary is short. Every database has a speed limit on contended writes, because physics. PCC hides that limit behind a queue and risks cascading failure when one transaction misbehaves. The Postgres default hides it behind silent lost updates that most teams never audit for. Convex surfaces it as an OCC error you can see, retry, and design around. Given the choice between a visible constraint and a hidden one, the visible constraint is the one you can engineer against, and the patterns to engineer against it (shorter transactions, aggregation, components) are well-understood, portable, and available today.
