Operational maturity is the umbrella term I like when thinking about scalability, security, observability, and other important aspects of a serious product. Similar to scaling, it isn’t a destination, but a continual process. There is no one checklist that an app goes through once. Rather, you should understand where you are in the journey, what the biggest risks are, and what incremental steps are available.
This post will cover various areas of operational maturity, and link to posts outlining steps to take as your app develops. The advice will specifically reference Convex but the concepts are generally applicable.
I’ve worked on teams and products all along this spectrum, from launching a new GCP product for startups like Clockwork, to greenfield products for established companies like The New York Times, to managing the Dropbox infrastructure responsible for file previews—involving hundreds of servers in multiple data centers handling millions of image requests per day targeting three nines of availability. I can assure you, all of these did not—and should not—have the same level of operational maturity.
1. Prototyping: YOLO
When you’re first building your app or bootstrapping your company, you want to move as quickly as possible. If you’re spending a lot of time thinking about load balancing, connection pooling, data architecture for future-proof sharding, or Kubernetes, then you likely aren’t thinking enough about the human problem you’re trying to solve. For reassurance, I have a heuristic that for every order of magnitude increase in users, an app often gets re-architected or re-written at some layer. Your database schema in your first commit to your git repository does not need to be the data model you launch with. Don’t let perfect be the enemy of good.
Tips include: version control, liberal logging, interactive database queries, auto-reload, auto-deploys, keeping your stack simple, snapshotting data, seed scripts, deferring auth, loose schemas, manual migrations, and more.
2. Observing your app
When your app is running, observability allows you to see what is happening, how your product is being used, and what is going wrong, and it helps you debug the “why” behind it all. It is a critical piece of running an app in production, where you don’t have debug access to all of the devices interacting with your software.
You can start with simple logs, and incorporate dedicated tools over time like Axiom, Sentry, Plausible, PagerDuty, Databricks, and more.
3. Testing for peace of mind
From a pragmatic standpoint, testing allows you to validate behavior and catch regressions in performance or functionality, and it ultimately gives you peace of mind. When you have high confidence in your testing, you will feel confident shipping more frequently.
From end-to-end tests to unit tests, from manual to automated strategies, there are a lot of options to choose from when deciding what to up-level next. Often-overlooked aspects of testing are how you test subjective changes in production, and testing your app from outside of your own ecosystem. The latter helps to catch issues with hard-to-test parts of your stack like networking and configurations that only exist in production.
4. Protecting your app from yourself
Even if your code is well tested, you can still make mistakes in how you interact with the powerful tools at your disposal. Many major internet outages have been caused by someone mistyping a command, for instance running a destructive command (like deleting a table) in production when they meant to run it against a development instance. Over time, you’ll need to invest in safer processes around changing code, configuration, and data in production.
Some areas of investment include:
Deployments: push-time checks for environment variable definitions, checking for accidental deletion of large indexes, isolating your production deployment from staging and development workflows, and avoiding breaking and inefficient schema changes.
Migrations: codifying mutations in code, verifying them against seed data, validating a dry run, and opt-in automation.
Scoped data changes: authenticated, authorized, audit-logged changes to production data through dedicated admin interfaces.
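To make the migration steps above concrete, here is a minimal sketch of a migration codified as a function, verified against seed data, and previewed with a dry run before it writes anything. The document shape and the names `UserDoc`, `moveNameToDisplayName`, and `runMigration` are assumptions made up for this illustration; they are not Convex APIs, and a real migration would run against your database rather than an in-memory array.

```typescript
// Hypothetical document shape, used only for this illustration.
type UserDoc = { id: string; name?: string; displayName?: string };

// The migration codified as a pure function over one document.
// Returning null means "already migrated", so re-running is harmless.
function moveNameToDisplayName(doc: UserDoc): UserDoc | null {
  if (doc.name === undefined) return null;
  return {
    id: doc.id,
    displayName: doc.displayName !== undefined ? doc.displayName : doc.name,
  };
}

type MigrationReport = { scanned: number; changed: number };

// Applies the migration to every document. With dryRun set, nothing is
// written and the report only describes what would change.
function runMigration(
  table: UserDoc[],
  migrate: (doc: UserDoc) => UserDoc | null,
  dryRun: boolean,
): MigrationReport {
  const report: MigrationReport = { scanned: 0, changed: 0 };
  table.forEach((doc, i) => {
    report.scanned += 1;
    const next = migrate(doc);
    if (next === null) return;
    report.changed += 1;
    if (!dryRun) table[i] = next;
  });
  return report;
}

// Seed data standing in for a development snapshot.
const seedUsers: UserDoc[] = [
  { id: "u1", name: "Ada" },
  { id: "u2", displayName: "Grace" },
];

// Dry run first: counts only, seedUsers is left untouched.
const dryRunReport = runMigration(seedUsers, moveNameToDisplayName, true);
console.log(dryRunReport);

// Then the real run, once the dry-run numbers look right.
const realReport = runMigration(seedUsers, moveNameToDisplayName, false);
console.log(realReport, seedUsers);
```

Keeping the per-document function pure and idempotent is the design choice that makes the dry run trustworthy: the same function produces the preview and the real write, and running it a second time changes nothing.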
5. Hardening your app
Your app needs protection from more than your own mistakes. When you launch to production, you’ll need to consider how clients might misbehave. A backend needs to protect against bad input, or requests that try to access or modify other users’ data. As your customers start to rely on your site, you’ll need to refine your authentication and authorization story.
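As a rough illustration of those two protections, the sketch below validates untrusted input and then checks ownership before modifying a record. Everything here is hypothetical (the `NoteDoc` type, the `updateNote` function, the length limit); it shows the pattern under those assumptions rather than any particular framework's API.

```typescript
// Hypothetical record type; ownerId is whoever created the note.
type NoteDoc = { id: string; ownerId: string; body: string };

const MAX_BODY_LENGTH = 10000; // arbitrary limit chosen for this sketch

// Treat every client-supplied argument as untrusted: check shape and bounds.
function parseUpdateArgs(input: unknown): { noteId: string; body: string } {
  if (typeof input !== "object" || input === null) {
    throw new Error("expected an object");
  }
  const args = input as { [key: string]: unknown };
  const noteId = args["noteId"];
  const body = args["body"];
  if (typeof noteId !== "string" || noteId.length === 0) {
    throw new Error("noteId must be a non-empty string");
  }
  if (typeof body !== "string" || body.length > MAX_BODY_LENGTH) {
    throw new Error("body must be a string within the length limit");
  }
  return { noteId, body };
}

// callerId must come from the server-side session, never from the request body.
function updateNote(notes: NoteDoc[], callerId: string, input: unknown): NoteDoc {
  const args = parseUpdateArgs(input);
  const matches = notes.filter((n) => n.id === args.noteId);
  const note = matches.length > 0 ? matches[0] : undefined;
  // Same error for "missing" and "not yours", so callers cannot probe
  // for the existence of other users' notes.
  if (note === undefined || note.ownerId !== callerId) {
    throw new Error("note not found");
  }
  note.body = args.body;
  return note;
}
```

Calling `updateNote(notes, "alice", { noteId: "n1", body: "hello" })` succeeds only if the note with ID `n1` belongs to `alice`; any other caller, or malformed arguments, results in a thrown error and no change to the data.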
As your app grows from dozens to thousands to millions of users, the performance and reliability of your app become more important. This can include considerations for organic growth such as:
Using a work-stealing pattern when you run your own infrastructure and want to optimize for throughput.
Load testing your app to stay ahead of your users’ growth.
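For readers unfamiliar with work stealing, the idea can be sketched as a small single-threaded simulation: each worker drains its own queue, and a worker with nothing to do takes a task from the opposite end of the busiest peer's queue. The types and function names below are invented for this example, and a real implementation would run workers concurrently and need synchronization around the queues.

```typescript
type Task = { id: number };

// Each worker owns a queue and records the IDs of the tasks it has run.
type PoolWorker = { label: string; queue: Task[]; done: number[] };

// Finds the other worker with the longest non-empty queue, if any.
function pickVictim(workers: PoolWorker[], thief: PoolWorker): PoolWorker | undefined {
  let victim: PoolWorker | undefined = undefined;
  let longest = 0;
  for (const other of workers) {
    if (other !== thief && other.queue.length > longest) {
      victim = other;
      longest = other.queue.length;
    }
  }
  return victim;
}

// One scheduling round: each worker runs at most one task. It takes from the
// front of its own queue, or steals from the back of the busiest peer's queue.
// Returns how many tasks ran this round.
function runRound(workers: PoolWorker[]): number {
  let ran = 0;
  for (const worker of workers) {
    let task = worker.queue.shift();
    if (task === undefined) {
      const victim = pickVictim(workers, worker);
      task = victim !== undefined ? victim.queue.pop() : undefined;
    }
    if (task !== undefined) {
      worker.done.push(task.id);
      ran += 1;
    }
  }
  return ran;
}

// All four tasks start on one worker; the other starts with nothing to do.
const busy: PoolWorker = { label: "a", queue: [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }], done: [] };
const idle: PoolWorker = { label: "b", queue: [], done: [] };

let rounds = 0;
while (runRound([busy, idle]) > 0) rounds += 1;
console.log(rounds, busy.done, idle.done);
```

In this run the four tasks finish in two rounds instead of four, because the idle worker steals tasks 4 and 3 while the busy worker handles tasks 1 and 2. Stealing from the opposite end of the queue keeps the thief and the owner from contending for the same task.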
Summary
Operational maturity is an ongoing process that covers a wide range of topics. We’ve touched on many ways to level up your app, but this list is neither exhaustive nor essential. The important decisions to make are:
Where are your gaps?
What is worth investing in next?
When is the right time to take the next step and re-evaluate?
And if this has gotten you interested in how we think about the future of product development here at Convex, check out this video:
Welcome back to the main stage. Up next, we're excited to welcome James Cowling, CTO and co-founder at Convex, to talk with us about how backends should be designed for front-end developers. James, welcome to the main stage, and take it away.

Thank you, Aaron. Hey everyone. So I'm an infrastructure guy, but I want to talk about why infra, I think, is letting down product developers. My background's in large-scale distributed systems, and my co-founders and I designed and built a lot of the big infrastructure at Dropbox: stuff like multi-petabyte, or multi-exabyte, distributed storage systems comprising a million disks across half a dozen data centers, or multi-petabyte distributed databases that serve literally millions of fully consistent transactional queries per second. Now I'm a co-founder at Convex, which is a backend as a service that synthesizes everything we've learned about systems into a platform for building applications. So it's the database, the compute, the scheduling, the data sync, the storage: everything you need as the backend for product development.

But Convex is a very opinionated platform, and it's designed with some strong principles that I think lead to a surprisingly good developer experience when building full-stack applications. When writing this talk I was thinking back to the origin of some of these ideas, and my time doing a PhD with a professor named Barbara Liskov at MIT. You might have heard of Barbara. She's one of the most important figures in computer science, and the Liskov substitution principle is named after her. But I think part of the reason she was so influential is that she started in the early days of programming language theory, and was actually the person who pioneered the idea of abstraction in programming languages, and then moved into systems: distributed transactions and consensus protocols. I think the programming language world has a lot of really important lessons that can be applied to how we think about backends and how they're used by product developers.

So this won't actually be a deeply theoretical talk at all. It's actually going to be a little bit of a provocative talk, and I really want to talk about how backends should look to better serve full-stack developers. The reason I'm giving this talk is because of a crisis we have on our hands. It's a new crisis, and I think it can be hard to recognize it if you don't know about the first crisis. So I'm going to go back to the software crisis, which was a real thing. It was a huge topic of concern back in the late 60s and the early 70s, and you can go read about it on Wikipedia. But the essence of it was that as technology improved, productivity actually slowed down. It became harder to write programs.

This is Dijkstra talking about this. He was a famously smart dude, and he said that when we had a few weak computers, programming became a mild problem, and now that we have gigantic computers, programming has become an equally gigantic problem. Now, obviously computers today are a lot more gigantic than they were back in 1972, but this was him observing that the industry was grinding to a halt. It was getting harder to write programs. So why was this really smart person in 1972 saying, hey, we're kind of grinding to a halt, and yet things are kind of largely okay right now?

I think the reason you don't hear about the software crisis anymore is that it was largely solved, or at least ameliorated, by this thing called abstraction, a really, really important thing. Once upon a time there was just shared memory and goto statements and global variables, and any new feature added to a program also added to the cognitive overhead of maintaining the program. But then we added information hiding and modules and functions and abstract data types, and all of a sudden everything became a lot more tractable. The real key convention here that resonates for me is that for well-designed software, your relationship with a library or a system ends at its API. You don't need to know how it's implemented, you don't need to know what downstream libraries or systems it uses, and you don't need to know about situations where it doesn't do what it's meant to do. You just interact with the invariants specified by the function, and that's the end of the story. This is how we stand on the shoulders of library and systems builders rather than being burdened by them, and it's really the only way to build large systems, large products, and large organizations.

So why does this matter to you? I think we have a similar crisis on our hands: a platform crisis. This is just a phrase I made up just now, but the landscape for developing applications, in my opinion, is a horrendous mess. We have developers on day one of their application trying to learn how to administer Kubernetes, a technology that a decade ago was only the domain of multi-billion-dollar companies, or reasoning about data freshness and the difference between data rendered statically on a server, even a server component, or dynamically on a client, or cached somewhere, or stored within an edge database. These are very complicated problems, and I'm hearing from product developers who are complaining that their relationship with the platforms they use doesn't meet what I would call the abstraction test. They have to think about a huge amount of complexity when using and interacting with their backends, rather than just treating it like a function.

I think this is because platform developers have forgotten about the software crisis. They've forgotten the real value of abstraction, which is to make problems go away and reduce the number of moving parts a developer has to think about. So the issue here is that backends don't just feel like libraries. They feel like complicated, conditional, poorly composed sets of tools. So I'm going to outline four major bad ideas in the status quo of full-stack development, and how I think they can be rethought in a more principled fashion. These are areas where backends can do better to provide a simple abstraction for application developers.

So let's start with bad ideas in query models, how applications access the database, both in terms of exposing the database directly to clients (shout out Firebase right there) and declarative query languages like SQL, which I'm sure will ruffle a few feathers. In an era before the serverless movement, when it wasn't so easy to have Lambda functions running in a cloud or an API server running somewhere, there was a movement towards getting rid of API servers altogether, where you have client JavaScript code talking directly to the database, just accessing documents in a table. I have a lot of respect for Firebase for being a very pioneering platform with great developer experience in many ways, but I think this decision was a mistake. It was a mistake motivated by a desire to make the database accessible to application developers in their native languages, whether that's mobile or web, but it has a great many disadvantages. It exposes your private data model directly to the client. It makes it hard to manage secrets or private code. It leads to thick clients, which have a lot of logic in them, and then later on to complicated initiatives to try to speed up page load because of all that software sitting on the client. And it really erodes the concept of a transaction, where you want to do multiple separate things and talk to separate tables, but you want them all to happen at once in a single transaction. That becomes much more difficult when the client's in the loop.

So you force developers into a development model that's accessible, but it really makes life difficult later in the life cycle of a project, which is something we hear over and over again from developers on these platforms. It also gives birth to well-intentioned but, I think, misguided initiatives like encouraging developers to adopt row-level security really early in a project. I know many billion-dollar companies that don't allow row-level security in their databases, or don't enforce row-level security in their databases, because it can get horrendously complicated, and generally it's a lot easier to perform access control checks on the server, which you can't do when the client's talking directly to your database.

And now I'm going to commit sacrilege and say that SQL sucks today. The people who invented SQL are very smart, definitely smart, and I say this as someone who's managed literally tens of thousands of SQL servers before. I like SQL as an analytics query language, but for live, user-facing, low-latency transactional applications, SQL to a large extent is somewhat of a successful abomination. Almost no company runs handwritten SQL statements in the live path of their core services. The query language is too brittle and inexpressive. The query plan is unpredictable and will suddenly do something like performing a full table scan without you expecting it. And even if you passionately disagree with my point here, I think you probably secretly agree, because you're probably using an ORM to hide SQL, except that this ORM is only providing the illusion of guarantees, which I'll get to a little bit later on.

So I'd like to instead present what I think is the right idea for how to run a database in the modern age, or database queries in the modern age, and that's just to run code inside your database. This is not just stored procedures. I'm talking about real TypeScript functions that run in transactions directly inside your database. So this is a mutation in Convex. It's just a program. It runs as a strongly consistent ACID transaction. You write this code alongside your web app, you push it to the server, and then you can just call it from client code, just like an API. It can access secrets, it can talk to other functions, and it can invoke features like scheduling or storage. But critically, it's a transaction. It runs as a single atomic unit. You don't have to think about race conditions, and if it fails it'll get retried. So there's no need to compromise on expressivity here, because it's literally just code, and you can go back to having platforms that actually function like libraries with clean abstractions.

So I want to move to the next domain where I think platforms fail the abstraction test, which is their type system. I have a lot of respect for MongoDB, for example, for making databases accessible to web developers, but the fact that Mongo is a largely untyped JSON document store means you have to give up on all the familiar programming language guarantees you're used to. There's a reason why everyone's moving to TypeScript from JavaScript. Types are good. They provide actual guarantees. And while I've criticized SQL, the relational model is also an incredible idea. Data is often related to other data, and the references between these items of data need to be typed too.

I probably don't need to convince you that manually formatting SQL strings is a terrible idea and has led to an enormous number of bugs and vulnerabilities, including SQL injection attacks: stuff like where you set your email address in a form to semicolon DROP TABLE users or whatever, and the entire users table gets dropped. This is the canonical argument in favor of type safety in query languages. But to be a bit more controversial, I think ORMs are a leaky abstraction too, because they provide the illusion of type safety and a convenient API, but they're really slapped on top of a database like Postgres that provides neither of these things. They give the illusion of safety but don't actually represent the semantics of the database. This includes older ORMs like SQLAlchemy, which I've used personally a lot and have witnessed many bugs being written in. But even with way, way better ORMs like Prisma, which is really quite good in many ways, it's just a layer on top of someone else's database at the end of the day. There's no true end-to-end type safety between what the developer's writing in the client application and what's actually getting read or written from the database.

So the holy grail is true end-to-end type safety, where the types the application developer uses are literally the types of the data that resides within the database, where there's autocomplete for everything, even for the names of the tables in the database. This is how things work in Convex. The queries are TypeScript and they map directly to Convex types. There's autocomplete and type safety, and you can see here in this animation that you can kind of iterate your way to success without worrying about bugs showing up. So here the database and the query model are designed together, so you can just think about it like programming.

But you might ask: if type safety is so important, why are document stores like MongoDB and Firebase so popular? Well, they're popular because they're convenient. They're popular because they don't force you to write a schema definition for a program you haven't even written yet and haven't even decided how to model. This is a screenshot of some SQL schema definition here, with all the complexity that it comes with. So this is a development velocity issue, and it's an important one, which I'll capture as a bad idea on its own. That bad idea is premature formalism, because none of this matters if you can't move fast and develop your application to begin with.

Fortunately, we can have our cake and eat it too, because Convex will automatically infer the schema for your application purely based on the data that's been inserted into the database. So you can start off just treating it like a JSON document store like Mongo, and it'll let you incrementally enforce schema on your schedule, when it makes sense to do so. Type inference in Convex works in real time, and it uses an algorithm that computes the simplest possible type based on the data in the table. So if you insert a string into a field, the schema will be string, and if you then insert a floating-point number into that same column, the schema will be string union float, and then if you go delete that original string row, the schema will revert to float, to be the simpler type. This happens automatically and precisely, without slowing down the database.

But what can you do with this? Well, you can then go and enforce that schema when needed, when you want that type safety. So this is a screenshot from the Convex dashboard for one of my applications, showing code that we can copy and paste into our application and have end-to-end type safety for these two tables, called items and carts. And you can see, by the way, there's a typed reference here from an item in a cart to info about that item, so there's type safety even for references between these tables. End-to-end typing allows you to forget about talking to tables that don't exist or data that doesn't match a schema, and allows you to develop as quickly as you would with a regular TypeScript app, because that's what it is. It's a regular TypeScript application.

So we'll move from the query model to system concerns, and I think this is a really big one, one that's actively getting worse as we speak. Many of these things are getting better every day, but I think this issue is getting worse. Latency matters for your application, of course. Latency is higher, by the way, if you have multiple rounds of communication between your client app and the database because you can't run functions on your database, but latency and throughput are an issue nonetheless, and caching is a main technique for dealing with this. But it doesn't matter if you're running a consistent database in serializable mode if you're just slapping memcache in front of it, and now you have to reason about your reads getting stale. For example, in a situation where you insert a new row in a database and you want that row to show up in the results of your application, you now have to go and manually invalidate that cache, or decide that you're okay showing stale results, which leads to a huge amount of complexity in a lot of applications.

And there's this recent movement now about the edge. The edge, by the way, just means putting servers and databases in secondary data centers close to your users for latency purposes. But now you have to deal with one of the hardest problems in computer science, which is cache consistency and distributed data sync, and I find it amazing that we're recommending that developers think about these techniques early on in their application, when they want to be focusing on building apps in the simplest possible way.

So what's the better way? I think the better way is dynamic consistent caching. If the database knows every function that's run on it, and it knows all the data ranges that are read in every query, and it knows every write that happens, then it can automatically cache the results of every function completely consistently and automatically. You shouldn't have to be doing this manually. The system should do it for you, which is what Convex does. Every function in Convex is cached automatically, and we record, for every function, every data range that was read, and then if any new write overlaps with any of these ranges, we know to refresh the value in the cache. This can happen at the backend, but by the way, this can also happen at the edge. So I actually am a fan of the edge. I think the edge is great. I just think developers shouldn't have to know it exists. They should just use the backend, the backend should deal with these problems, and it should be fast, and that's it. That's the end of the story. So again, don't make developers think about things they don't have to think about. Use abstraction to hide complexity.

Somewhat related to caching is what to do when data changes. There's actually a bit of a movement towards statically generated content again these days, with stuff like server components in React. A lot of people are talking about how we're kind of going back to PHP all over again, and I don't think that's necessarily a good thing. Generally you want your apps to be dynamic. You want them to update when data changes. But generally you have to use polling to do this, periodically checking in with the server, which is hard and inefficient and leads to stale results, and so you likely end up with some form of weak or eventual consistency. As I mentioned before, it doesn't matter how consistent and transactional and ACID-compliant your database is if that consistency is violated as soon as a client has to actually read from the database. For almost all backends that exist right now, the consistency boundary ends at the database itself, but I think the consistency boundary should extend all the way to the client application. The client should have a fully consistent view of the data that's stored on the backend and not see stale state.

So let's talk about it. Earlier I said that Convex knows when a function is cacheable and it knows when the result of that function changes. It can do this very efficiently, so this same technique can be used to drive subscriptions. Every query in Convex can be subscribed to and used to automatically rerender your client components whenever data changes. So this is a screenshot of a little demo app I made a while back. It's just a little basic shopping cart demo, with a dynamic list of items on the left and a dynamic shopping cart on the right. If you add an item to the shopping cart, the total in stock goes down and the prices update. It's a simple app, but one that's actually quite tricky to implement most of the time.

In Convex, this is the actual query that fetches the list of items from the database. This is the thing that gets subscribed to, and the operative code here is mostly one line of code. This is line eight, where it says give me the list of items in the database that have remaining stock. That's the end of the query. And this is the real React component that renders the dynamic items list. This is the real code. You can go search on this in GitHub; I've got the link down there in the bottom right. And again, the main line of code here is really just line six, which says take the query that gives me the shopping cart items, store it in the state variable called items, and then any time that state variable changes, React will automatically rerender that component.

But you might say, wait a second, James, there are two components on the screen here. There's a cart and a stock list, and those two separate components have two separate queries and two separate subscriptions, so what if one updates before the other? This is the web. Stuff happens asynchronously. So the developer is going to have to think about a whole bunch of corner cases when one updates before the other, and whether an item shows up in stock when it's not really there. I think the platform should take care of this for you too. So Convex supports something called consistent client views. The platform knows you have two active subscriptions for a given client session, and just ensures that they all update in lockstep. It always sends updates to any subscriptions over a WebSocket in a fully consistent order, so React will just update both of these atomically. You'll never see a situation where one component updates before the other. This is true end-to-end consistency, and it's there so developers don't have to think about this stuff. They can just focus on building applications. Don't think about data races, just think about calling functions.

This has been a fairly rushed talk with a lot of content, but I really believe that developers have been let down by backend platforms, and I think we have a crisis of complexity on our hands. But the solution to this is abstractions that hide complexity at all times, so the developer never has to think about this. With the right abstractions, I think developers can build apps that are faster to build, faster to operate, more correct, scale larger, and are dynamic instead of static, and they can do this without a backend team. I think by doing this they can avoid the platform crisis and focus on what makes their app special, which is the app itself. None of this is hypothetical. It all exists, and you can go check it out at convex.dev.
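The caching idea described in the talk (remember which data ranges each query read, and refresh a cached result only when a write lands inside one of them) can be illustrated with a small in-memory sketch. This is not Convex's implementation: the `Store` type, the integer keys, the single read range per query, and the choice to drop and lazily recompute a result instead of refreshing it eagerly are all simplifying assumptions made for this example.

```typescript
// A half-open key range [start, end) that a query read from.
type KeyRange = { start: number; end: number };
type Row = { key: number; value: number };
type CachedResult = { value: number; readRange: KeyRange };

type Store = {
  rows: Row[];
  cache: { [cacheKey: string]: CachedResult };
  recomputations: number; // counts cache misses, for demonstration only
};

// Sums the values of rows whose key is in [start, end). The result is cached
// together with the range it read, so writes can be checked against it later.
function sumRange(store: Store, start: number, end: number): number {
  const cacheKey = start + ":" + end;
  const hit = store.cache[cacheKey];
  if (hit !== undefined) return hit.value;
  let total = 0;
  for (const row of store.rows) {
    if (row.key >= start && row.key < end) total += row.value;
  }
  store.recomputations += 1;
  store.cache[cacheKey] = { value: total, readRange: { start: start, end: end } };
  return total;
}

// Writes a row, then drops only the cached results whose read range
// contains the written key. Everything else stays cached.
function writeRow(store: Store, key: number, value: number): void {
  const existing = store.rows.filter((r) => r.key === key);
  if (existing.length > 0) existing[0].value = value;
  else store.rows.push({ key: key, value: value });
  for (const cacheKey of Object.keys(store.cache)) {
    const range = store.cache[cacheKey].readRange;
    if (key >= range.start && key < range.end) delete store.cache[cacheKey];
  }
}

const store: Store = {
  rows: [{ key: 1, value: 10 }, { key: 5, value: 20 }, { key: 50, value: 7 }],
  cache: {},
  recomputations: 0,
};

sumRange(store, 0, 10); // computed from the rows
sumRange(store, 0, 10); // served from the cache
writeRow(store, 50, 8); // outside [0, 10): the cached result stays valid
sumRange(store, 0, 10); // still served from the cache
writeRow(store, 3, 5);  // inside [0, 10): the cached result is dropped
console.log(sumRange(store, 0, 10), store.recomputations);
```

The point of the sketch is that the application code never invalidates anything by hand: because the store sees both the reads and the writes, it can decide on its own which cached results are still consistent.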
Footnotes
You can also use Zod for finer-grained runtime validation.