Stack logo
Sync up on the latest from Convex.
Ian Macartney's avatar
Ian Macartney
9 months ago

Operational maturity for production

the basics of operational maturity

Operational maturity is the umbrella term I like when thinking about scalability, security, observability, and other important aspects of a serious product. Similar to scaling, it isn’t a destination, but a continual process. There is no one checklist that an app goes through once. Rather, you should understand where you are in the journey, what the biggest risks are, and what incremental steps are available.

This post will cover various areas of operational maturity, and link to posts outlining steps to take as your app develops. The advice will specifically reference Convex but the concepts are generally applicable.

I’ve worked on teams and products all along this spectrum, from launching a new GCP product for startups like Clockwork, to greenfield products for established companies like The New York Times, to managing the Dropbox infrastructure responsible for file previews—involving hundreds of servers in multiple data centers handling millions of image requests per day targeting three nines of availability. I can assure you, all of these did not—and should not—have the same level of operational maturity.

1. Prototyping: YOLO

When you’re first building your app or bootstrapping your company, you want to move as quickly as possible. If you’re spending a lot of time thinking about load balancing, connection pooling, data architecture for future-proof sharding, or Kubernetes, then you likely aren’t thinking enough about the human problem you’re trying to solve. For reassurance, I have a heuristic that for every order of magnitude increase in users, an app often gets re-architected or re-written at some layer. Your database schema in your first commit to your git repository does not need to be the data model you launch with. Don’t let perfect be the enemy of good.

Click here for tips on getting to an MVP fast

Tips include: version control, liberal logging, interactive database queries, auto-reload, auto-deploys, keeping your stack simple, snapshotting data, seed scripts, deferring auth, loose schemas, manual migrations, and more.

2. Observing your app

When your app is running, observability allows you to see what is happening, how your product is being used, what is going wrong, and help you debug the “why” behind it all. It is a critical piece of running an app in production, where you don’t have debug access to all of the devices interacting with your software.

Learn more about observing your app in production

You can start with simple logs, and incorporate dedicated tools over time like Axiom, Sentry, Plausible, PagerDuty, Databricks, and more.

3. Testing for peace of mind

From a pragmatic standpoint, testing allows you to validate behavior, catch regressions in performance or functionality, and ultimately give you peace of mind. When you have high confidence in your testing, you will feel confident shipping more frequently.

Learn more about testing patterns

From end-to-end tests to unit tests, from manual to automated strategies, there are a lot of options to choose from when deciding what to up-level next. Often-overlooked aspects of testing are how you test subjective changes in production, and testing your app from outside of your own ecosystem. The latter helps to catch issues with hard-to-test parts of your stack like networking and configurations that only exist in production.

4. Protecting your app from yourself

Even if your code is well tested, you can still make mistakes in how you interact with the powerful tools at your disposal. The source of many major internet outages have been from someone mis-typing a command, for instance running a destructive—like deleting a table—in production when they meant to run it against a development instance. Over time, you’ll need to invest in safer processes around changing code, configuration, and data in production.

Some areas of investment include:

Deployments: push-time checks for environment variable definitions, checking for accidental deletion of large indexes, isolating your production deployment from staging and development workflows, and avoiding breaking and inefficient schema changes.

Migrations: codifying mutations in code, verifying them against seed data, validating a dry run, and opt-in automation.

Scoped data changes: authenticated, authorized, audit-logged changes to production data through dedicated admin interfaces.

5. Hardening your app

Your app needs protection from more than your own mistakes. When you launch to production, you’ll need to consider how clients might misbehave. A backend needs to protect against bad input, or requests that try to access or modify other users’ data. As your customers start to rely on your site, you’ll need to refine your authentication and authorization story.

Simple steps include:

6. Scaling

As your app grows from dozens to thousands to millions of users, the performance and reliability of your app become more important. This can include considerations for organic growth such as:

Summary

Operational maturity is an ongoing process that covers a wide range of topics. We’ve touched on many ways to level up your app, but this list is neither exhaustive nor essential. The important decisions to make are:

  • Where are your gaps?
  • What is worth investing in next?
  • When is the right time to take the next step and re-evaluate?

We’d love to hear from you in our community Discord.

And if this has gotten you interested in how we think about the future of product development here at Convex, check out this video:

Footnotes

  1. You can also use Zod for finer-grained runtime validation.

Build in minutes, scale forever.

Convex is the sync platform with everything you need to build your full-stack project. Cloud functions, a database, file storage, scheduling, search, and realtime updates fit together seamlessly.

Get started