2 years ago

It's not you, it's SQL

Remember the NoSQL movement? The year 2010 was a simpler time, and software engineers were really starting to take over the world. Y Combinator had just funded a billion startups all making the next great social network for hamsters, wristwatch enthusiasts, and Of Monsters and Men fanfic authors.

The early engineers at these startups were busy–busy being rock stars. There was no time for CREATE TABLE. They just wanted something to hang onto their python dictionaries, man.

Pictured: dutiful reader of all Paul Graham’s essays

And that something was MongoDB. MongoDB happily took our Python dictionaries, stored them away somewhere, and sometimes even gave them back later. No hand-crafted SQL strings littering our Python codebase, and everything still worked.

It was like a veil had been lifted. “What was with all the ceremony, SQL? My controllers are so lean now, and my schema is whatever I want it to be." We paused just long enough to take a sip of our Spicy Maya Mocha from Coupa Cafe. "I mean, so what if none of my writes are ever actually confirmed by my new database? These are just hamster-likes and wristwatch-enthusiast-pokes! We can lose a few and still get to our Series B.”

MongoDB user experience circa 2010.

With the rise of MongoDB, NoSQL was off like a rocket. SQL’s days were assuredly numbered.

Google trends, probably blowing your mind right now with how quickly the MongoDB hype escalated.

Teams everywhere rejoiced, ecstatic with their novel and consequence-free database choice–for approximately 4 months until their project exploded.

The author of this post may or may not have been responsible for one of the above debacles.

Graciously did the database elders glide onto the scene, wise and serene. With open arms did they allow the newly humbled upstarts to re-devote themselves to the one true SQL way.

I'll save you some reading and summarize these comment sections: "I told you so."

And we’ve used SQL ever since, never wanting more, never straying, ever confident that 1970s academics anticipated exactly what all software would need for time eternal.

Right. So what’s the problem?

The silly narrative above feels like I’m picking on early MongoDB, and admittedly I am. But MongoDB was right. Developers desperately want something better than SQL.

What went wrong in this early movement wasn’t the suggestion to stop using SQL-the-language, but the immaturity of the contemporary implementations of the NoSQL platforms.

Here’s where we actually ran into problems:

Data pretty much always have relationships, and databases need to support them
Real multi-document ACID transactions are essential
No matter how great your programming model is, if your database doesn’t, like, save data, perform consistently, and so on, no one can use it for anything important

But the "let’s-use-less-SQL" part was a pretty good idea.

Now I don’t hate SQL–in fact, at Dropbox I ran the storage and databases group which managed 10,000+ MySQL servers supporting the better part of a billion users. It can be done. But can doesn't mean should.

So without further ado, here are the top three reasons why SQL sucks:

SQL Sucks #1: Blindfolded data model gymnastics

SQL’s data model is perfect–if it was created in a vacuum fifty years ago! Oh, it was? Hmm. That makes sense because it’s not our application’s data model at all. And that discrepancy gets more painful every day as our application tools and workflows get better.

For one, writing adapter layers that translate our application’s data model into SQL’s native “carefully formatted strings” is a fragile, error-prone waste of time. And these days, we want real types, with auto-complete and type checking and squiggly red lines when we’re naughty.

Also, it’s time to talk, SQL. It’s time to admit you picked the wrong fundamental data structure for your records. We don’t want a flat array of typed fields. We’re still in love with nested, hierarchical information. Our dictionaries, our objects, our structs, our hashes, our documents. Whatever we call them, MongoDB was right: documents have won, and when it comes to you, SQL, they are bolted on at best.

Jeopardy meme "I'm sorry, the answer we were looking for was 'documents'"

So, the next generation of databases needs to support typed, relational documents. Not merely typed top-level fields with an unstructured JSON data type. We want to be able to type the whole nested hierarchy of fields and use that structural information to make interacting with our stored data as easy and error-free as possible.

Tab! Tab! Tab! PostgreSQL demonstrates its field autocomplete feature.

This a reasonable expectation because it’s exactly what we can do in the rest of our application. Why should our databases get away with any less?

Ouch. What are my options?

New, document-aware databases. MongoDB! CouchDB. Firebase. Lots of bold new databases that have better support for documents, albeit not always typed and relational.
So… many… ORMs. Can someone else write the SQL and the data mapping for us so we can pretend it all doesn’t exist?

Convex’s take

Database query construction in Convex. Slightly more helpful, no?

In Convex, the application and database types are automatically equivalent because the entire data pipeline from frontend to database uses the same exact types and definitions. Autocomplete and type-checking are ubiquitous. There is no adapter code between languages because everything we write is just TypeScript.

SQL Sucks #2: Writes are confusing and fragile

At first, there was SELECT. Well, at first, relational algebra, and then SELECT with all its bells and whistles and joins and subqueries to bring usable relational algebra to the masses. We were reading data like it was going out of style.

And then some impoverished grad student (probably) cleared their throat and asked quietly, “but how will the data get in there?” And in a moment of ~~impulsiveness~~ scrappiness, INSERT, UPDATE, and DELETE were bolted onto the side of the thing.

A crab with a large claw labeled 'select' and a small claw labeled 'insert'

Okay, that’s probably not exactly how it went down. But it definitely feels like INSERT and friends are in serious afterthought territory.

Command(s)	PgSQL Docs Line Count
INSERT + UPDATE + DELETE	1177
SELECT	1517

Unlike SELECT, these operations don't feature JOINs or subqueries or any other magic that brings together tables¹. They’re simple record-level constructs that feel out of place compared to SELECT. And while they don't need those features, they should excel at:

Expressing dependencies for writes
Allowing multiple things to be changed together atomically

But on their own, INSERT and UPDATE don’t help with these things at all. Instead, you need to use a convoluted, context-dependent distributed transaction protocol to get the write semantics you want.

You see, to these databases, your app is this whole other thing. The database doesn’t understand the mysterious ways you decide what to write, so you need to tell it pretty explicitly. Achieving the behavior you want is quite subtle and confusing and even when you do it “right” it leads to applications that break easily.

Let’s check out an example. We’ll model a simple app with users and blog posts, and a denormalized post count we track on the user record. First, the transaction starts:

1BEGIN;
2

Then, we need to express “I’m going to issue some writes that assume the data I read stays the same until the transaction ends so I don’t miss any other update.” Without this, we could undercount our posts.

Time to grab giga-claw SELECT and ask it to do yet another job for us with SELECT FOR UPDATE. (This is a simplification, but masochistic and/or pedantic readers are welcome to read the world's most painful footnote.²)

1SELECT post_count, ... from users where ... FOR UPDATE;
2

Our app does const new_post_count = post_count + 1³, and then it's time to issue our writes:

1INSERT INTO posts VALUES(...);
2UPDATE users SET post_count = new_post_count WHERE ...;
3

And finally, we need to tell the database server to COMMIT the transaction. Here’s a recap of the whole thing:

1BEGIN;
2SELECT post_count, ... from users where ... FOR UPDATE;
3INSERT INTO posts VALUES(...);
4UPDATE users SET post_count = new_post_count WHERE ...;
5COMMIT;
6

So what’s wrong with this? A confluence of two unfortunate things.

SQL RDBMSes are using pessimistic locking (usually²). So when you SELECT FOR UPDATE, you’re locking the row(s) (kinda²) that came back from that request until this transaction ends.
Each one of these requests involves shuffling SQL to and fro on a TCP socket, which is slow. And worse, it's failure-prone. If the network suddenly disconnects after our SELECT FOR UPDATE, that row will stay locked until the request times out, or the socket closes, etc. Sometimes this can be minutes.

This can effectively freeze up our entire database as our threads or processes are blocked on a SQL statement that wants to modify the locked record.

Oh yeah, also problematic: the need for this damn footnote². See, even this particular instance of tricky write behavior is only accurate in specific configurations on specific database systems. Otherwise, you have a different set of fun lessons to learn.

Ouch. What are my options?

Don’t support this ACID stuff at all! Dynamo-style databases, early MongoDB, etc.
Use stored procedures in SQL RDMBSes. This works but isn’t a pattern modern developers love. The languages and workflow fit even less naturally with our apps than plain old SQL.
Rely on data structures like CRDTs with automatic but limited transaction behavior.
Set aggressive maximum query times, TCP socket timeouts, to bound the worse-case and ROLLBACK abandoned transactions with hung locks.

Convex’s take

In Convex, all write operations on the database are done in embedded functions that utilize Optimistic Concurrency Control when there is conflict. This means if any record we read is overwritten by another transaction while ours is running, ours will simply be re-run until it has a conflict-free pass. Then all the transaction’s writes will be atomically committed.

No writer is ever blocked, and the whole transaction finishes very quickly because there is no need to shuttle the steps of this transaction across a network.

At first glace, this might seem exotic and complex. But what it means in practice is Convex developers can write their transaction functions the obvious way and everything will just work.

Here’s our SQL transaction written on Convex instead:

1export default mutation(async ({ db }, email, post) => {
2  // Get the user by email
3  const user = await db.query("users")
4    .filter(q => q.eq(q.field("email"), email))
5    .first()!;
6  // Insert a post and increment the users's post count
7  post['user'] = user._id;
8  await db.insert("posts", post);
9  await db.patch(user._id, {num_posts: user.num_posts + 1});
10});
11

SQL Sucks #3: Reads are too powerful

Well, but at least SELECT is great, right? Unfortunately… it depends. It’s either great or really, really bad. Very specifically, SQL is great when throughput matters more than consistent, dependable latency.

That means SQL is phenomenally great for OLAP–analytical processing, or “big data”. When you have many TB or PB of data to scan, and you want to take the execution time of a sophisticated mega-query down from two hours to three minutes, let SQL rip. You can express a super powerful query with joins, aggregates, subqueries, window functions, etc. And then an incredible query planner with millions of dev-hours invested into it will fuse things and flatten things and do memory-efficient distributed joins and other amazing tricks you and I could only dream of. Your query throughput goes way up and you’ve saved yourself minutes or hours, along with lots of money.

The problem is the databases that back user applications–OLTP databases–explicitly do not want any of that. Minutes or hours are not a thing in OLTP. Really only milliseconds matter, and if you want a world-class user experience, probably not much more than a few tens of milliseconds.

These two things are both excellent, but at very different jobs.

This means in practice every OLTP query eventually is required to use an index or two, and any joins need to be very simple–perhaps one level of fetching a small set of indexed records through a foreign key.

In this world, a human is waiting on the result. And therefore we need a total backend request time of less than 100ms. These constraints do a funny thing when it comes to our priorities. It matters a lot more that the slowest queries in our system are not too slow than the average query is faster.

Let’s make this concrete for a second. Imagine we want all our requests to finish in less than 100ms. Here are two sample distributions of our application database’s query timings:

Scenario 1 - Less throughput but less variance

This is a good distribution for OLTP, because it has a low standard deviation and a low p99. Because the slowest responses were fast enough, every request succeeded.

Scenario 2 - More throughput but more variance

This distribution has a lower average query time (mean is 10.5 vs. 12.6), which means it’s probably cheaper to run than the previous one as it provides more “queries per second”. But it’s significantly less desirable for OLTP because the slowest outliers risk failing the request, causing retries, queuing and congestion, etc. By the way, those queries to the far right? Those are probably clever JOINs.

What OLTP really wants is a lack of surprises, and therefore OLTP likes it when it’s always very obvious when we’re asking the database to do something expensive.

SQL is completely antithetical to this goal. Like… what’s wrong with this SQL?

1SELECT * FROM users WHERE email = 'billg@microsoft.com';
2

Study the query really closely. I’ll give you a minute…

...

… did you catch it? Here’s the error:

1CREATE UNIQUE INDEX users_email ON users (email);
2

You almost could have missed it! 😄

Seriously though, what you might not know is that until we added the index, the relational database system was table scanning–meaning reading every row in the table and manually checking to see if it matched our given email address.

Well, not ideal, but simple enough. All fixed.

Unfortunately, not quite. Something else you might not know is that the query planner does not promise to use that index just because it exists. As your table changes, the query planner might suddenly decide to change a particular query’s strategy, reverting to a table scan instead of an index scan. Now, these query planners are Really Smart™, but every now and then they misfire and make a query devastatingly slower. Like, catch-up-with-your-old-friend-Pager-Duty slower.

Interestingly enough, the MySQL project has responded to the needs of frazzled operational teams, and now has a FORCE INDEX clause available in their SELECT statement, but PostgreSQL makes it a lot harder to turn off its query planner because… principles?

Meme Meme

Ouch. What are my options?

Most teams at scale, even if they’re using a SQL RDBMS, essentially subset SQL down to the point where it’s a glorified key/value store with better types and relationships. The few joins they run online are very carefully optimized, and some data is denormalized. So they’re not really using SQL, they’re using a mature storage engine and replication infrastructure.
Stores like simple key/value, Redis, systems with a bound complexity on their operations

Convex’s take

Maybe you haven’t used Convex yet, but let’s see if you can spot which one of these queries is using an index and which isn’t:

1// Find email by index?	
2const user = await db.query("users")
3	.filter(q => q.eq(q.field("email"), email))
4	.first();
5

1// Or is it me?
2const user = await db.query("users")
3  .withIndex("by_email", q => q.eq(q.field("email"), email))
4  .first();
5

I bet you got this one right.

Something’s in the way

At the end of the day, for application databases, SQL’s biggest issue is it’s just not adding much value as an intermediary between our application and our records on disk. The more we can just model, modify, and aggregate our data exactly one way end-to-end, the better off we are.

With systems like Convex, we can share types, use the same data structures and helper functions, and can know exactly how our application will be accessing its data.

So, let’s just thread our application and types all the way through and use a single, powerful, developer-centric set of tools and workflows to manage them. And maybe “NoSQL” won’t end up being a bad word after all.

Footnotes

Technically you can use JOIN with these operations, but the actual mutation is on a single table. ↩
Okay, hi. Welcome to the footnote from hell. First of all, if you didn’t know about SELECT FOR UPDATE, there’s a good chance your code is wrong. Your database transaction alone doesn’t guarantee your writes are based on your reads, but many applications subtly depend on this being true. If this is you, you may have incorrect records in your database due to data races, but it’s very difficult to know for sure. Now some outraged among you may be whipping up a tweet akin to: “Wrong, SQL hater! Because my database/transaction opts into serializable isolation!” That’s fantastic, but (1) this is not the default on the most popular SQL implementations, (2) most developers don’t even know about the different isolation levels, and (3) even if they did, if they went that route they now need to litter their application with transaction conflict retries after ensuring that retrying is actually sound with respect to side effects. Oh also, SELECT FOR UPDATE only locks rows actually returned, so if your WHERE clause had a predicate that caused a range scan, you might have corruption anyway if a new row gets inserted into that range… depending on your application semantics… but only on PostgreSQL, not MySQL, and… UGGGHHH-let’s just stop here. If you didn’t know all this before and you thought databases just worked, sorry. Now that you do know this… sorry. ↩ ↩² ↩³ ↩⁴
Okay sure, a modification as simple as this toy example could have just been an inline UPDATE. But imagine there's something more complex being computed we need our application to help with. ↩

Build in minutes, scale forever.

Convex is the backend platform with everything you need to build your full-stack AI project. Cloud functions, a database, file storage, scheduling, workflow, vector search, and realtime updates fit together seamlessly.

Get started