This is one of a series of posts on operational maturity in production. You can find more best practices for running apps in production here.
Observability and monitoring are umbrella terms covering the various ways to see what’s happening with your app in the wild. This can include things like logs, metrics, exceptions, events, spans, traces, and more. This post will explore progressive steps you can take to increase your ability to introspect your app in production.
Start with logs
When you’re just getting off the ground, you’ll likely get by for a while with just looking at logs. These will include:
Debug output, such as console.debug.
Exceptions with stack traces.
Notable events, such as a user signing up, or interacting with the app.
In the dashboard
You can get surprisingly far with logs, especially if you use the tools well. The Convex dashboard has a logs view where you can filter by log type, search, and temporarily clear logs. During development, frontend failures will show the server-side error in the console log; in production those details are hidden to avoid unintentionally leaking server state. To track down a specific error in production, copy the associated Request ID and search for it in the logs page.
Via the CLI
You can also stream logs into the CLI using npx convex logs. By piping it to grep or other tools, you can debug verbose output, filtering to the events you’re interested in. One command to try is npx convex logs | tee ./logs.txt which will both print out logs and save them to a file that you can inspect and filter later, without relying on your console history.
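Once logs are saved to a file, standard command-line tools can slice them however you like. A small simulation (the log lines below are illustrative, not actual `npx convex logs` output):

```shell
# Simulate a saved log file (real `npx convex logs` output will differ).
printf '%s\n' \
  "10:01:02 info  messages:send success 12ms" \
  "10:01:03 error messages:send Uncaught Error: boom" \
  "10:01:04 info  users:get     success 3ms" > logs.txt

# Filter to failures only:
grep "error" logs.txt
```

The same `grep` works on a live stream (`npx convex logs | grep error`), but filtering a saved file lets you iterate on the query without waiting for events to recur.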
Graduating to dedicated observability platforms
The Convex logs are a great starting point, but when you’re shipping an app to production, you will likely want to use industry-standard platforms, which come with dedicated observability features and infrastructure. In particular, they can give you:
Infinite history of older logs, enriched with metadata from Convex
Unified client and server exception reporting
Graphs and alerts for custom metrics
Dashboards for insights and debugging
Trends and triage tools with AI-backed clustering
Persisted audit logging
Here is a set of actions you can take to leverage these platforms as you mature, in roughly the order you should worry about them:
Persist your logs to Axiom
It’s useful to be able to debug historical events, and persisting logs is the easiest way to incrementally build out a logs-centric approach. Axiom and Datadog let you stream in logs and work with them as events, and Convex will enrich them with information about the server function. Convex will also send logs about each function invocation, including the endpoint, its status, how long it took, and how much data and file storage it read and wrote.
See the docs for setting up log streaming here. All you need to do is copy a key and some details from your Axiom/DataDog account into the Convex dashboard.
Extract metrics from logs for dashboards
One amazing thing about Axiom is that you can turn a console.log into events that you can plot in graphs and set alerts on. You can also make dashboards from the logs sent for every function invocation, showing errors per endpoint, or percentiles on timing. Using Axiom to turn logs into “wide events,” you can do very powerful things without littering proprietary metrics calls in your codebase.
Video walkthrough: Rakib, one of the Convex engineers behind log streaming, demonstrates configuring an Axiom log stream for the AI Town project. Convex currently supports three log stream destinations: Datadog, Axiom, and webhooks (an escape hatch that posts log events to whatever URL you configure); if you’d like another, ask on Discord or by email. In the walkthrough, he copies a dataset name and API key from Axiom into the log streams tab of the Convex dashboard settings, then filters the streamed logs for execution-record events with data.status equal to failure to find erroring functions, builds a dashboard plotting execution-time percentiles grouped by function path to spot slow functions, and sets up a monitor that alerts when the average execution time exceeds 2 seconds within any 5-minute period. He also touches on the engineering behind the feature: because logs are among the highest-throughput events a backend generates, the team drew on systems like Kafka and Apache Storm when deciding what delivery and ordering guarantees to provide, documented on the log streams page in the docs.
Report your exceptions to Sentry
The baseline concern is whether your app is working. If your app is throwing exceptions, you almost certainly want to know about it quickly and diagnose what’s wrong. Reporting exceptions to Sentry lets you see errors grouped by stack trace, along with metadata about each exception, to figure out what’s causing the issue. One tip: integrate it with your company’s Slack or other messaging tool, so you’re notified immediately about issues.
See the docs for reporting server exceptions to Sentry here. It’s as easy as pasting in your DSN URL to the Convex dashboard. You can use the same Sentry configuration for reporting client-side errors, allowing you to see all of your errors in one place.
Set up web analytics with Plausible
A dedicated platform like Plausible for analyzing website traffic, including referrers, campaigns, and other insights, will help you see changes in website usage. Those changes can indicate issues, but more importantly they help you understand how users interact with your product. If no one is visiting the pricing page, that’s good information, even if there aren’t any software bugs.
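Setup is typically a single script tag in your site’s head, something like the following (check Plausible’s docs for the current snippet; the domain is a placeholder):

```html
<script defer data-domain="yourapp.com" src="https://plausible.io/js/script.js"></script>
```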
Set up paging and on-call duties with PagerDuty
Once you have your exceptions and metrics, use PagerDuty to call and text you during an incident. Configure Axiom and Sentry to send alerts to PagerDuty, and set up PagerDuty to always break through your Do Not Disturb settings, so you’re never wondering whether there’s an issue you’re missing.
As your team scales, share the responsibilities and set up schedules in PagerDuty that can be traded around, with a secondary person to respond if the primary doesn’t acknowledge the issue after a short amount of time. One useful tip is to sync the oncall schedule with Slack in an #oncall channel, so anyone at the company can go to that channel to see who is oncall right now.
This responsibility can also extend to responding to support emails and async customer requests, though that is often decoupled to a “product on-call” role that is eventually part of a customer support effort.
The team I ran at Dropbox was expected to respond to an issue within 5 minutes, or it would escalate to the secondary, then the whole team. This required the active primary and secondary to carry their laptops and a hotspot wherever they went. Your needs will change over time, and should be an ongoing conversation between engineering and product, supporting the business and the promises you make to customers without over-burdening the team.
Persist important events to tables
In addition to emitting logs for events, you might want more structured data to run analytics on, or to use as part of a business workflow, for instance capturing every time a user creates a new team. You might do some offline processing to find qualified leads for sales, or later define workflow logic around when to send various engagement emails. Wanting data in a standard, durable, consistent, queryable format is a sign that you want a database in the loop. By making an “events” table, you can write structured events with a schema and query them later.
Inspecting your data in the dashboard
At first, you may be fine just using the Convex dashboard to inspect your data. You can use the data page’s filters to find relevant documents, or use the live query editor in the function runner. You can also run custom internalQuery functions from the CLI with npx convex run to generate reports.
However, as your needs grow, you’ll likely want to query your data with an analytics-optimized query interface like SQL.
Inspecting data from a snapshot export
You can export your data and inspect it locally for one-off analytics. Unzip the snapshot and use jq for basic command-line inspection and manipulation on any of the tables.
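A small simulation of that workflow, assuming the unzipped snapshot is laid out as one JSON-lines file per table (the exact layout may differ; the records here are illustrative):

```shell
# Stand in for an exported table (one JSON document per line):
printf '%s\n' \
  '{"kind":"signup","userId":"u1"}' \
  '{"kind":"signup","userId":"u2"}' \
  '{"kind":"login","userId":"u1"}' > events.jsonl

# List the users who signed up:
jq -r 'select(.kind == "signup") | .userId' events.jsonl
```

`jq` handles per-document filtering and reshaping well; once you need joins or aggregations across tables, reach for SQL.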
When you want to do more complex investigation in SQL, including queries that join tables, you can use DuckDB to run SQL on the exported JSON files directly.
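For instance, DuckDB’s read_json_auto can infer a schema from each JSON-lines file and join across them (the paths, table names, and fields below are hypothetical):

```sql
-- Run inside the duckdb CLI, from the unzipped snapshot directory.
SELECT u.name, count(*) AS signups
FROM read_json_auto('events/documents.jsonl') AS e
JOIN read_json_auto('users/documents.jsonl') AS u
  ON e.userId = u._id
WHERE e.kind = 'signup'
GROUP BY u.name
ORDER BY signups DESC;
```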
Stream tables to a dedicated analytics tool like BigQuery
Once your events are in a table, you can use Convex streaming export to export various tables to a dedicated tool like BigQuery on an ongoing basis. Analytics (OLAP) databases are optimized to do large queries efficiently, relative to transactional (OLTP) application databases like Convex. From the analytics tools, you can build complex data pipelines to learn about your data and connect it with other products such as a CRM. If you end up generating actionable data that you want to incorporate back into your application, you can stream that data into a Convex table using streaming import.
Summary
By setting up dedicated tools, you can get actionable data that helps you understand errors, performance, and user behavior, and lets you respond quickly as the data changes.
Get more tips for best practices for running apps in production here.