Mike Cann
15 days ago

Why Convex doesn't let candidates use AI in coding interviews

A conversation with James, CTO of Convex, on hiring senior engineers in 2026.

The AI coding interview has become one of the most contested formats in technical hiring. Should you let candidates use AI in interviews? At Convex, the answer is no, and not for the reason you might assume. We don't ban AI tools to gatekeep, prove a point, or pretend the job doesn't involve them. We ban them because interviews are already a crude approximation of how someone thinks, and adding more tools adds more noise to the signal we're trying to read. The inverse question matters too: should engineers be required to use AI on the job? Our answer there is also unconventional. No mandate, no ban, just accountability for the quality of what you ship.

This article is the long version of that answer, drawn from a recent conversation with James, Convex's CTO. It covers how we actually run our hiring loop, why we don't use scorecards, what a great deep dive interview looks like, and why "really lame companies" are the ones forcing AI usage from the top down.

The short answer: interviews aren't the job

"Interviews don't look anything like the job." That's James's framing, and it's the load-bearing sentence for almost every hiring decision Convex makes.

An interview is a two-hour window into a person who will, if hired, work with you for years. You're measuring how they think under mild pressure, how they communicate when they're confused, and whether their taste lines up with yours, not how they ship features. AI tooling is great for shipping features but not for revealing how someone thinks, so we leave it out of the room on purpose.

The same logic, run in reverse, is why we don't have an AI policy at work. The job is shipping features, and whatever helps you do that well is fair game. The two settings reward different things, and pretending otherwise is how hiring loops drift away from measuring anything real.

You can hold both positions at once without contradiction. In fact, holding both is the only consistent answer once you accept that interviews and the job are different activities with different goals.

Meet James: from Barbara Liskov to Dropbox to Convex

James did his PhD at MIT under Barbara Liskov, the computer scientist behind the Liskov Substitution Principle and foundational work on consensus protocols and abstract data types. His training was in distributed systems and programming language design, which he describes less as "writing compilers" and more as "treating language design as a forcing function for thinking clearly about abstractions."

After MIT, he spent years at Dropbox running infrastructure. The team was small, around seven engineers, responsible for exabytes of data and millions of transactions per second. That experience set his bar for what a high-ownership engineering team feels like, and it shows up directly in how Convex hires today.

James's opinions on hiring aren't theoretical. They come from a decade of building, breaking, and staffing systems where the cost of a bad hire compounds quickly. When a seven-person team is on the hook for exabytes, you stop believing that headcount fixes anything, and you start believing that hiring is the highest-leverage decision a technical leader makes.

Culture is decision-making, not perks

Culture is not your lunch policy or offsite cadence but how decisions get made when no one is watching. That's the working definition we use at Convex, and it informs every other hiring choice in this article.

A real value is one that constrains behavior. "Move fast and break things" is a real value because it forces tradeoffs. "We value stable infrastructure" is meaningless because no one is going to argue for unstable infrastructure. If your stated value can't lose an argument, it isn't a value.

Integrity is another one we treat as load-bearing. At Convex it shows up as transparent postmortems. When something breaks, the writeup is honest about what happened, why, and what we missed. That's a cultural artifact, not a process artifact. The process didn't produce the honesty; the culture did, and the process just gave it a place to live.

"Culture is successful once it sustains itself."

The implication is that early hires don't just inherit culture, they author it. Every debrief, postmortem, and code review either reinforces what you say you are or quietly undermines it. There's no neutral move.

Why "why" beats "how" every time

There's a hierarchy that matters when teams disagree: values, then why, then what, then how. The further down the stack you argue, the less productive the argument.

Concrete example from inside Convex: two sub-teams once disagreed on whether to ship a new API quickly or hold it for more reliability work. On the surface it looked like a "how" disagreement, but it wasn't. They were misaligned on why the API existed in the first place, and which user the next version was for. Once that was named, the "how" resolved itself in about ten minutes.

You see this pattern everywhere once you start looking for it. A "how" fight that won't resolve is almost always a "why" fight in disguise. The fastest path through is to stop arguing about the surface and ask, out loud, what problem we think we're solving and for whom.

How to resolve conflict: go up the stack

When a meeting is stuck, end the meeting. Almost no real conflict resolves in the room it surfaces in. Go up the stack, find the values mismatch or the missing shared "why," and the rest tends to fall out.

Caveat, because this is the kind of advice that breaks if you copy-paste it: this works on small, high-trust, high-ownership teams. It does not scale to a 500-person org without serious modification. James is direct about that: Convex's playbook isn't universal. It's the playbook that fits where we are now, and we'll rewrite it when the org demands it.

How Convex actually hires

A typical Convex hiring loop is roughly seven interviews by the time you reach debrief: two phone screen coding rounds, two on-site coding rounds, an architecture interview for senior candidates, a deep dive interview, and team conversations, with the debrief being the most important piece.

Here's how that compares to a more conventional scorecard-driven loop:

Dimension | Convex's loop | Typical scorecard loop
Scoring | No rubric. Written feedback only. | Numeric rubric per competency.
Decision mechanism | Live debrief, moderated. | Average of scores, sometimes with a hiring manager veto.
Interviewer training | Calibrated through debriefs over time. | Calibrated through rubric definitions.
Optimizes for | Clarity of thought, taste, judgment. | Consistency at scale.
Breaks at | Hyper-growth past a few hundred engineers. | Senior and staff-plus hires where signal is qualitative.

Both approaches make sense for different stages. We've chosen the one that fits where Convex is now and what we're hiring for. If you're staffing a 2,000-person org with a hiring funnel measured in tens of thousands of candidates, you probably need rubrics. We don't, yet, and we're going to keep our current loop until it breaks.

Why there are no scorecards

We deliberately avoid rubric-driven scoring because rubrics produce lowest-common-denominator decisions. A rubric measures what's easy to measure. It doesn't measure clarity of thought, mental agility, or taste, and those are the qualities that separate a good Convex engineer from a great one.

Without a rubric, calibration depends on people, and people are expensive to calibrate. This approach doesn't scale to a thousand-engineer org running a hiring funnel of tens of thousands of candidates per quarter. We're aware of that, but we're not trying to be that company.

The other failure mode of rubrics is subtler. Once a number exists, it gets averaged. Once it gets averaged, the conversation moves from "is this person right for us" to "is this score above the bar." Those are different questions, and the first is the one we want to answer.

What the debrief is really for

The debrief does two jobs at once: the obvious one is making a hire-or-no-hire decision, and the less obvious one, arguably the more important, is training the interviewers' taste for what "good" actually means at Convex.

Every interviewer reads every other interviewer's written feedback before the meeting. The conversation is moderated, and James personally runs every debrief right now. That's not permanent; the plan is to grow a small set of trusted moderators over time, not to write the moderation into a checklist.

The reason it can't be checklisted is the same reason we don't use rubrics: a checklist captures the steps but not the judgment, which is trained by watching it happen, repeatedly, with someone whose calibration you trust.

What a great deep-dive interview looks like

The deep dive is where most candidates either land the offer or reveal they're not a fit. We ask candidates to walk us through a system they built. The trap is describing the architecture in detail while having no idea who used it or why it was built.

If a senior candidate can't tell you who their system's users were, what problem it solved for them, or why the company funded the work, they tend not to do well at Convex. Engineering here is a problem-solving discipline. The system is downstream of the problem.

The candidates who shine in the deep dive talk about tradeoffs they made and would make differently today, constraints that shaped the design, and what they'd tear out if they could start over. That's the signal. The architecture diagram is the artifact. The reasoning behind it is what we're actually evaluating.

A useful tell: ask a candidate why they didn't pick the obvious alternative. The strong ones already considered it, can name the reason they rejected it, and can also name the conditions under which they'd reverse the decision. The weaker ones treat the question as an attack on the design.

Why AI in the coding interview adds noise

We don't let candidates use AI in coding interviews because coding interviews are already a low-resolution view of how someone thinks, and AI tools blur that view further. This isn't a moral position; every smart engineer we know uses Claude or similar tools daily. Tool use just isn't what we're measuring.

What's being measured is how you reason out loud, how you handle ambiguity, and how you respond when your first idea is wrong. AI assistance compresses all of that into "the candidate typed a prompt and the prompt worked." We learn nothing useful from that.

A Michelin star chef can use a chopping machine. That doesn't mean you evaluate the chef by watching them operate the chopping machine.

The chef analogy isn't dismissive of AI; quite the opposite. The chopping machine is genuinely useful in the kitchen, but it's not the thing you film when you're trying to understand whether someone has taste. The same logic applies to AI in interviews: the fact that the tool is great at the job is exactly why it's a poor instrument for measuring the candidate.

Whiteboard coding, on purpose

Several of our rounds are whiteboard coding, with no computer in the room. The computer is a distraction in a setting where the point is to watch a human think. We want to see the false starts, the corrections, the moment a candidate notices their own bug (and fixes it without being told). A laptop, an IDE, and an AI assistant all hide that.

If you're preparing for a Convex interview, the goal is to make your reasoning visible. Talk through what you're considering and name the assumption you're making. When you spot a flaw, say so and fix it out loud. The candidates who do this well almost always get offers, even when their final code has a bug in it.

Why work trials don't work for senior hires

Work trials are useful for spotting fast coders and useless for evaluating senior architectural thinkers. A two-day trial will tell you whether someone can ship a small feature but not whether they can shape the next two years of a codebase.

The signals that matter for senior hires take months to surface: strategic judgment, the ability to set technical direction, taste in tradeoffs, the ability to mentor, and the ability to push back on bad decisions from leadership. None of that shows up in a paid trial week. You can fake all of it for a week. You can't fake any of it for a quarter.

You can't hire a CTO via a work trial. The same logic, attenuated, applies to staff and principal engineers. Past a certain seniority, the value the person creates is in the decisions they prevent the team from making, and those decisions don't show up on a Jira ticket.

Some teams use work trials as a way to avoid making a real hiring call, turning the trial into a hedge. If the person works out, great. If not, no commitment was made. That hedge is comfortable for the company and miserable for the candidate, and it tends to attract candidates who have less leverage in the market, which is the opposite of what you want when you're hiring senior.

AI at work: no mandate, just accountability

Convex has no AI mandate and no AI ban. The policy is simple: you're accountable for the quality of your output, regardless of how you produced it.

We're skeptical of companies mandating AI use from the top. James's words, kept verbatim because they're the right words: those tend to be "really lame companies." Mandating a tool is a leadership signal that you don't trust your engineers to make their own tooling decisions. If your engineers are good, they'll adopt the tools that make them better. If they're not, mandating Cursor won't fix it.

The flip side is also true. "Oops, Claude wrote it" is not a defense for shipping bad code. If you put your name on a PR, you own it. The AI didn't write it; you did, with help. That framing matters because it preserves the thing that actually keeps a codebase healthy, which is a clear answer to the question "who is responsible for this code." The answer is never the model.

Why Convex doesn't have an AI policy

We don't have an AI policy because we don't need one; the accountability model already covers it. Code quality, test coverage, and design judgment are all evaluated the same way they always were. The provenance of any individual line is interesting but not load-bearing.

This is also why we don't track AI usage metrics. "Percent of code written by AI" is a vanity number. It tells you nothing about whether the code is good, whether the system is maintainable, or whether the engineer understood what they shipped. The metrics that mattered before AI still matter now: does the thing work, is it clear, can someone else maintain it, did we learn something from building it.

Code review is a growth tool, not a correctness tool

Slightly contrarian point. Code review at Convex is for compliance and growth, not for catching bugs in senior engineers' work. Tests catch bugs, while reviews catch drift, transfer context, and grow the next set of senior engineers on the team.

If you're relying on code review to find correctness issues in your senior engineers' PRs, your testing story is the actual problem. Fixing it is harder than tightening review, and that's exactly why teams reach for review first. They're treating the symptom because the cure is expensive, but it's still the wrong move.

The AI angle on this is straightforward. AI-generated code shifts more weight onto tests and onto the author's judgment. The reviewer's job doesn't change much. If the tests are good, the review is for context and growth. If the tests are bad, no amount of human review is going to save you, AI-generated or not.

Advice for engineers who want to work here


"You don't become a world champion sprinter by doing sprints once a week."

Build daily. Pick something small and ship it. Contribute to open source, including Convex Components and Convex Helpers if that's where your interest lands. Hang out in the Convex Discord, answer other developers' questions, and notice which problems show up over and over. The ones that repeat are the ones worth understanding deeply, because they're the ones the platform hasn't yet solved well.

The best signal you can send a hiring manager isn't a polished resume. It's a track record of thinking in public. A blog post explaining a tradeoff you made. A small library that solves one thing well. An issue thread where you helped someone debug a problem that wasn't your problem. Those artifacts are worth more than a year of LeetCode grinding, because they show the thing the deep dive is trying to surface, which is whether you can connect a system to the problem it's solving.

There are a few specific things we look for when we read a candidate's public work. Does the writeup name a tradeoff, or does it just describe what was built? Does the code have tests, and do the tests cover the interesting cases or just the obvious ones? When the candidate disagrees with someone in a thread, do they argue from values and reasoning, or from authority and tone? None of these are dealbreakers, just tells.

If you're earlier in your career, go listen to how AI is changing what it means to be a junior developer.

Honest closing note. If you do all of that and don't get a job at Convex, you've still become a much better engineer. That's a fair trade either way. We hire from a small pool, and the pool gets smaller the more senior the role. You can do everything right and still not match what we need at the moment we're hiring. That's the part of the process that no amount of rubric design can fix.

What "taste" actually means

The word "taste" shows up a lot in this article. Let's be concrete about what it means, because it's the kind of word that becomes a synonym for "we hired the people we liked" if you're not careful.

Taste, as we use it, is the ability to predict which design decisions will hurt later. It's a working model of how systems decay, what kinds of complexity are load-bearing versus accidental, and where the cheap moves are that buy disproportionate clarity. Engineers with taste tend to delete more code than they add, name things in ways that don't need updating six months later, and spot good abstractions one layer earlier than the rest of the team.

You can't test for taste with a coding round, but you can sometimes hear it in the deep dive, when a candidate says "we shipped it this way and within a quarter it was clearly the wrong call, here's what I'd do now." You can usually hear its absence, too, in candidates who describe systems they built without ever describing what was wrong with them.

Hiring as a forcing function for clarity

One thread ties the rest together. The reason James moderates every debrief, the reason we write feedback instead of scoring it, the reason the deep dive is the deep dive, is that hiring forces you to articulate what you actually believe about engineering. You can't hire well without saying out loud what "good" means, and you can't do that without arguing about it, repeatedly, with people you respect.

Most companies skip this step. They adopt a rubric off the shelf, run candidates through it, and back into a definition of "good" that's whatever the rubric happened to measure. The result is a team that is internally consistent and externally generic, staffed with strong engineers but not shaped by any particular point of view about the work.

We'd rather have the point of view, even at the cost of a hiring loop that doesn't scale forever. When the point of view is wrong, we'll know, because the engineers we hired will tell us.

A note on the AI hiring debate

The broader AI coding interview debate has split the industry into two camps that are mostly talking past each other.

Camp one says AI should be allowed because the job allows it. Banning it is artificial, the argument goes, and we should select for the engineers who use the tools well. Camp two says AI should be banned because it makes cheating trivial and turns interviews into prompt-engineering contests. Both camps are arguing about the same surface question and missing the deeper one, which is what the interview is for in the first place.

If you think the interview is a simulation of the job, allowing AI is consistent. If you think the interview is a measurement of how the candidate thinks, banning AI is consistent. The two positions aren't really in conflict; they're answers to different questions about what an interview is.

Our position, to be explicit: the interview is a measurement, not a simulation. We have other ways of learning what someone is like to work with day to day: references, trial projects scoped to weeks rather than days, and the first ninety days on the job. Those signals are richer than any interview, and the interview's job is the thing those signals can't easily produce: a controlled look at how someone reasons.

That's why AI stays out of the room: letting it in would defeat the only thing the interview is designed to do.

FAQ

Q: Should candidates be allowed to use AI in a coding interview?
A: Convex's view is no. Interviews are a low-resolution measurement of how a candidate thinks, and AI tools add noise to that signal. The point isn't to see how fast someone can prompt. It's to see how they reason, where they get stuck, and how they recover.

Q: Does Convex require engineers to use AI on the job?
A: No. There's no AI mandate and no AI ban. The only policy is that engineers are accountable for the quality of what they ship, regardless of how they produced it.

Q: How many interview rounds does Convex run?
A: Roughly seven by the time a candidate reaches debrief: two phone screen coding rounds, two on-site coding rounds, an architecture interview for senior candidates, a deep dive, and team conversations.

Q: Why doesn't Convex use a scoring rubric?
A: Rubrics produce lowest-common-denominator decisions. They measure what's easy to measure and miss clarity of thought, taste, and judgment. The debrief replaces the rubric, with the tradeoff that calibration depends on people rather than a checklist.

Q: Should startups run work trials for engineers?
A: Work trials are useful for spotting fast coders. They're not useful for evaluating senior architectural thinkers, because the signals that matter at that level take months to surface.

Q: What does Convex look for in a deep-dive interview?
A: Reflection on systems the candidate has built. Who used the system, why it existed, what tradeoffs they made, what they'd do differently now. Whether the system was technically impressive matters less than whether the candidate understood the problem it was solving.

Closing thought

The hiring debate around AI has split into two camps that mostly aren't talking to each other: allow it, or ban it. Both miss the point. Interviews and the job aren't the same activity, so the right tool for one isn't automatically the right tool for the other. At Convex, we let engineers use AI on the job because the job rewards output. We don't let candidates use AI in interviews because interviews reward visible thinking. Neither answer is universal, but both are honest about what they're optimizing for.

If that approach to engineering culture sounds like the kind of place you'd want to work, we're hiring across the stack. See open roles at Convex.
