Tom Redman
a day ago

Midpoints: A Word Game Powered by AI Embeddings and Convex Components

Visual of vector embeddings on a 3D chart

Recently, Ian shared an idea with me that he had for a new word game. I told him, "I love Wordle!" And then he proceeded to explain the most brilliant and fun-sounding game I'd ever heard of.

Ian's game concept relies on a little bit of high school algebra applied to vector embeddings of words. I always thought "vector embedding" sounded so fancy, but the idea is simple: words are translated to an array of numbers, and words that are semantically or contextually similar will be "closer" to each other in the number space. That's it.

Take the world's simplest example: after being embedded by the same Magic Algorithm (e.g. one of OpenAI's embedding models), "dog" and "cat" might have a difference* of 2. But "dog" and "archaeology" might have a difference of 100. See? Easy.

*The actual embeddings are arrays of numbers, which is why we need a little linear algebra to calculate the difference, or more precisely, the distance.
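To make the footnote concrete, here's a minimal TypeScript sketch of Euclidean (straight-line) distance between two vectors. The three-number "embeddings" are made-up toy values, not real model output; real embedding models return vectors with hundreds or thousands of dimensions, and this is just one of several ways to measure closeness.

```typescript
// Euclidean distance: square root of the sum of squared component differences.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Hypothetical toy vectors for illustration only.
const dog = [0.9, 0.8, 0.1];
const cat = [0.85, 0.75, 0.15];
const archaeology = [0.1, 0.2, 0.95];

console.log(euclideanDistance(dog, cat)); // small: the vectors are close
console.log(euclideanDistance(dog, archaeology)); // much larger: the vectors are far apart
```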

But don't worry about a thing. We'll dive deeper into the math of vector embeddings below.

Midpoints: Words, Math, and a little Magic

Ever wondered what word sits perfectly between "sporty" and "practical"? Or perhaps "sweet" and "crunchy"? Midpoints gives you 10 chances to find the best match and scores you accordingly!

In Midpoints, players face a deceptively simple task: given two words, guess the words that best bridge the gap between them. But unlike traditional word association games, Midpoints uses AI embeddings to determine the "best" answers.

You can think of it as a combination of the game "Scattergories" and a crossword puzzle.

The core gameplay is straightforward: you're presented with two words and have 10 chances to guess words that lie conceptually between them, scoring points based on how "perfect" your answer is according to the AI's understanding of language.

How to Play

When you start a round of Midpoints, you're presented with two words and a simple challenge: guess the words that best connect them. Each round contains 10 hidden target words, carefully selected from a specific category (like cars, foods, or animals), and you get 10 guesses to find them.

For example, in a round about cars, you might see:

  • Word 1: sporty
  • Word 2: practical

Your task is to think of cars that balance these qualities. A Volkswagen Golf GTI might be a perfect answer, combining sporty performance with practical functionality. A Ferrari would be too sporty, while a minivan would be too practical – the game rewards finding the sweet spot between the two concepts.

Scoring System

The scoring is designed to reward precision while keeping the game engaging:

  • Finding the #1 ranked word: 10 points
  • Finding the #2 ranked word: 9 points
  • Finding the #3 ranked word: 8 points
  • And so on...
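The pattern in the list suggests a simple rule: a target word earns 11 minus its rank. Here's a small sketch of that rule, assuming ranks are 1-based and that a guess matching none of the 10 hidden targets (represented as `undefined`) scores zero. `pointsForRank` is a hypothetical name, not a function from the Midpoints codebase.

```typescript
// Hypothetical helper: convert a 1-based target rank into points.
// Rank 1 earns 10 points, rank 2 earns 9, ..., rank 10 earns 1.
function pointsForRank(rank: number | undefined, targetCount = 10): number {
  if (rank === undefined || !Number.isInteger(rank) || rank < 1 || rank > targetCount) {
    return 0; // not one of the hidden target words
  }
  return targetCount + 1 - rank;
}

console.log(pointsForRank(1)); // 10
console.log(pointsForRank(3)); // 8
console.log(pointsForRank(undefined)); // 0
```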

Each guess gives you immediate feedback with your score and how close you were to the perfect answer. The game keeps track of your overall performance, displaying your rank and total score in the top right corner.

Example Round Walkthrough

Let's walk through a real round:

  1. The round begins: sporty + practical
  2. First guess: "Honda Civic" - 7 points! A good start, the Civic balances both qualities well
  3. Second guess: "Golf GTI" - 10 points! Perfect match, exactly between sporty and practical
  4. Third guess: "BMW M3" - 5 points, a bit too far on the sporty side
  5. And so on until you've used all 10 guesses or found all target words

A leaderboard shows how your scores compare to other players, adding a competitive element to the game mechanics.

The Technology Behind the Game

Vector Embeddings: Teaching AI to Understand Word Relationships

At the heart of Midpoints is a sophisticated system for understanding how words relate to each other. As mentioned, we use vector embeddings – a way of representing words as points in a high-dimensional space. Imagine a giant map where every word has its own unique coordinate. Words with similar meanings cluster together, while unrelated words end up far apart. In this mathematical space:

  • "Car" might be closer to "vehicle" than to "banana"
  • "Sporty" creates a direction toward performance and excitement
  • "Practical" points toward utility and functionality

Scoring Algorithms: Finding the Perfect Middle Ground

Time to dust off that old algebra textbook you've been meaning to dive back into! Midpoints builds its scoring on a handful of linear algebra helpers:

// convex/linearAlgebra.ts

// Midpoint between two embedding vectors, rescaled to unit length.
// normalize(a + b) points the same way as (a + b) / 2, so the halving can be skipped.
export function calculateMidpoint(a: number[], b: number[]) {
  return normalize(a.map((n, i) => n + b[i]));
}

// Normalize a vector to unit length
export function normalize(vector: number[]) {
  const magnitude = vectorLength(vector);
  return vector.map((n) => n / magnitude);
}

// Calculate vector magnitude/length
export function vectorLength(vector: number[]) {
  return Math.sqrt(vector.reduce((sum, n) => sum + n * n, 0));
}

// Calculate dot product between two vectors
export function dotProduct(a: number[], b: number[]) {
  return a.reduce((sum, n, i) => sum + n * b[i], 0);
}

// Calculate vector from point a to point b
export function deltaVector(a: number[], b: number[]) {
  return a.map((n, i) => b[i] - n);
}

Algorithm 1: Midpoint Calculation (calculateMidpoint)

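We literally find the mathematical midpoint between two word vectors. If our words are "sporty" and "practical", we calculate the point exactly halfway between them in our high-dimensional space, then rescale it to unit length so it can be compared against other normalized embeddings. The words whose embeddings sit closest to that point are the best answers.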
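Here's a minimal sketch of how a midpoint can rank candidate words: compute the normalized midpoint of the two anchor embeddings, then sort candidates by their dot product with it. The helpers repeat the `linearAlgebra.ts` functions above so the snippet runs on its own; `rankByMidpoint` and the two-dimensional toy vectors are illustrative assumptions. In the game itself, this ranking is handed off to Convex vector search against the midpoint embedding rather than a loop like this.

```typescript
// Same helpers as convex/linearAlgebra.ts, repeated so this sketch runs standalone.
function vectorLength(v: number[]): number {
  return Math.sqrt(v.reduce((sum, n) => sum + n * n, 0));
}
function normalize(v: number[]): number[] {
  const magnitude = vectorLength(v);
  return v.map((n) => n / magnitude);
}
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, n, i) => sum + n * b[i], 0);
}
function calculateMidpoint(a: number[], b: number[]): number[] {
  return normalize(a.map((n, i) => n + b[i]));
}

// Hypothetical: rank candidate words by similarity to the anchors' midpoint.
function rankByMidpoint(
  left: number[],
  right: number[],
  candidates: Record<string, number[]>,
): { word: string; score: number }[] {
  const midpoint = calculateMidpoint(normalize(left), normalize(right));
  return Object.entries(candidates)
    .map(([word, vec]) => ({ word, score: dotProduct(midpoint, normalize(vec)) }))
    .sort((x, y) => y.score - x.score); // highest similarity first
}

// Toy 2-D "embeddings": the x axis stands in for sporty, the y axis for practical.
const sporty = [1, 0];
const practical = [0, 1];
const cars = {
  "Golf GTI": [0.7, 0.7],
  Ferrari: [1, 0.05],
  minivan: [0.05, 1],
};
console.log(rankByMidpoint(sporty, practical, cars)[0].word); // "Golf GTI" ranks first
```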


Algorithm 2: Reciprocal Rank Fusion (RRF)

This algorithm combines multiple ranking signals to create a more nuanced scoring system. It also has the coolest name of any algorithm in history. It's particularly good at handling cases where a word might be a great match for one input word but only okay for the other.

The algorithm for reciprocal rank fusion looks like this:

// convex/namespace.ts
function reciprocalRankFusion(aIndex: number, bIndex: number) {
  const k = 10;
  const a = aIndex + k;
  const b = bIndex + k;
  return (a + b) / (a * b);
}

Here's how this algorithm works:

  1. Takes two rank positions (aIndex and bIndex) as input

  2. Adds a constant k=10 to each rank position. This prevents division by zero or very small numbers and keeps a single top rank from dominating the score

  3. Combines the two shifted ranks using the formula (a + b) / (a * b), which is the same as 1/a + 1/b: the usual RRF sum of reciprocal ranks
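Writing $r_a$ and $r_b$ for the two rank positions, the formula in the code is the sum of the two reciprocals over a common denominator:

```latex
\mathrm{RRF}(r_a, r_b) = \frac{1}{k + r_a} + \frac{1}{k + r_b}
  = \frac{(k + r_a) + (k + r_b)}{(k + r_a)(k + r_b)}, \qquad k = 10
```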

For example:

  • If an item ranks 1st in both lists: (1+10 + 1+10) / ((1+10) * (1+10)) = 22/121 ≈ 0.182
  • If an item ranks 1st and 50th: (1+10 + 50+10) / ((1+10) * (50+10)) = 71/660 ≈ 0.108
  • If an item ranks 50th in both: (50+10 + 50+10) / ((50+10) * (50+10)) = 120/3600 ≈ 0.033

The resulting RRF score:

  • Is higher when items rank well in both lists
  • Decreases as ranks get worse
  • Never drops to zero, so a weak rank in one list lowers the score without eliminating the word
  • Gives more weight to higher ranks (closer to 1)
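Putting it together, here's a sketch of fusing two rankings, one ordered by similarity to the left word and one by similarity to the right word. It repeats the `reciprocalRankFusion` function from above and passes 1-based positions so the numbers line up with the worked examples. The `fuseRankings` name, the toy lists, and the choice to skip words missing from either list are assumptions for illustration.

```typescript
// RRF as shown in convex/namespace.ts; equivalent to 1/(aIndex + k) + 1/(bIndex + k).
function reciprocalRankFusion(aIndex: number, bIndex: number) {
  const k = 10;
  const a = aIndex + k;
  const b = bIndex + k;
  return (a + b) / (a * b);
}

// Hypothetical helper: fuse two ranked lists of words (best first).
// Words absent from either list are skipped in this sketch.
function fuseRankings(byLeft: string[], byRight: string[]): { word: string; score: number }[] {
  const rightRank = new Map(byRight.map((word, i) => [word, i + 1] as const));
  const fused: { word: string; score: number }[] = [];
  byLeft.forEach((word, i) => {
    const r = rightRank.get(word);
    if (r !== undefined) {
      fused.push({ word, score: reciprocalRankFusion(i + 1, r) });
    }
  });
  return fused.sort((x, y) => y.score - x.score); // highest fused score first
}

// Toy rankings: Golf GTI is 2nd for "sporty" and 1st for "practical".
const bySporty = ["BMW M3", "Golf GTI", "Honda Civic"];
const byPractical = ["Golf GTI", "Honda Civic", "BMW M3"];
console.log(fuseRankings(bySporty, byPractical).map((r) => r.word));
// Golf GTI comes out on top, then BMW M3, then Honda Civic
```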

Algorithm 3: Vector Dot Products (dotProduct)

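We use dot products to measure how similar two words are. For unit-length embeddings (which is what normalize produces), the dot product equals cosine similarity: 1 for vectors pointing the same way and 0 for orthogonal ones. This gives us a precise numerical score for how well your guess matches the target words.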
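A quick sketch checking that identity: cosine similarity divides the dot product by both vector lengths, so once both vectors are normalized, a plain dot product gives the same number. The helpers are self-contained copies and the vectors are made-up toy values.

```typescript
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, n, i) => sum + n * b[i], 0);
}
function vectorLength(v: number[]): number {
  return Math.sqrt(dotProduct(v, v));
}
function normalize(v: number[]): number[] {
  const magnitude = vectorLength(v);
  return v.map((n) => n / magnitude);
}
// Cosine similarity: the dot product divided by both magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  return dotProduct(a, b) / (vectorLength(a) * vectorLength(b));
}

const guess = [3, 4, 0]; // toy vectors, not real embeddings
const target = [4, 3, 0];
console.log(cosineSimilarity(guess, target)); // 0.96
console.log(dotProduct(normalize(guess), normalize(target))); // also about 0.96
```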

Each scoring strategy has its strengths:

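  • RRF rewards words that rank well against both anchor words, while still giving some credit to a very strong match on one side
  • Midpoint calculation rewards precise matches to the mathematical center
  • Dot products give a direct similarity score between a guess and each target word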

Convex Components: Building a Scalable Game

When Midpoints inevitably goes viral because somebody in the community adds social features, we've made sure it'll scale without a hitch.

Midpoints is built on Convex, a backend platform that provides powerful building blocks called Components. Here's how we use them:

1. Leaderboards (aggregate)

Use the Aggregate component

  • Global leaderboard tracking all-time best scores
  • Round-specific leaderboards for competitive play
  • Efficient aggregation of scores across all players

2. Rate Limiting (ratelimiter)

Use the Rate Limiter component

  • Prevents spam and abuse
  • Ensures fair play by limiting:
    • Number of guesses per second
    • Round creation frequency
    • Word list uploads

3. Action Caching (actionCache)

Use the Action Cache component

  • Caches expensive embedding calculations
  • Speeds up repeated guesses
  • Reduces API costs for vector operations

4. Background Jobs (crons)

Use the Crons component

  • Regular maintenance tasks
  • Leaderboard updates
  • Cleanup of old game data

5. Sharded Counter (shardedCounter)

Use the Sharded Counter component

  • Scales to handle many concurrent players
  • Accurately tracks global statistics
  • Manages high-throughput scoring

6. Database Evolution (migrations)

Use the Migrations component

  • Safely updates game data structure
  • Enables new features without disruption
  • Maintains compatibility across versions

These components work together to create a responsive, fair, and scalable game experience. The rate limiter keeps the game fair, caching makes it fast, and aggregation helps us track who's winning – all without writing complex infrastructure code from scratch.

Round Creation Workflow

Game creators supply:

  1. A category-specific word list (e.g., "cars", "foods", "movies")
  2. Two anchor words that form an interesting conceptual spectrum
  3. A scoring strategy (RRF, midpoint, or dot product)

Example pairings that work well:

  • Cars: "luxury" + "efficient"
  • Movies: "heartwarming" + "intense"
  • Foods: "healthy" + "indulgent"

Best practices:

  • Choose words with clear opposing qualities
  • Ensure word list contains items spanning the spectrum
  • Test rounds with small groups before public release
  • Avoid highly similar words as anchors

Technical Implementation Highlights

Performance Optimizations

// Caching embeddings to avoid recalculation
const embedding = await embedWithCache(ctx, args.text);

// Chunked processing for large word lists
await asyncMapChunked(texts, async (chunk) => 
  ctx.runMutation(internal.embed.populateTextsFromCache, {
    namespaceId: ctx.namespace._id,
    texts: chunk,
  })
);

Rate Limiting Configuration

  • Token bucket algorithm
  • Sharded counters for scalability
  • Configurable periods, from one second up to 24 hours
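
// convex/auth.ts
import { HOUR, RateLimiter, SECOND } from "@convex-dev/rate-limiter";
import { components } from "./_generated/api";

const rate = new RateLimiter(components.ratelimiter, {
  // At most one createNamespace call every 10 seconds.
  createNamespace: { kind: "token bucket", period: 10 * SECOND, rate: 1 },
  // Up to 10,000 texts per day, spread across 10 shards to reduce write contention.
  addText: {
    kind: "token bucket",
    period: 24 * HOUR,
    rate: 10_000,
    shards: 10,
  },
  // One search per second on average, with bursts of up to 5.
  basicSearch: { kind: "token bucket", period: SECOND, rate: 1, capacity: 5 },
});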
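To build intuition for the token bucket settings, here's a rough in-memory illustration of the algorithm. It is not the Rate Limiter component's implementation, which keeps its state in the Convex database and supports sharding. A bucket holds up to `capacity` tokens, refills at `rate` tokens per period, and each request spends one token if one is available. The `TokenBucket` class is hypothetical, and defaulting `capacity` to `rate` is an assumption of this sketch.

```typescript
// Hypothetical in-memory token bucket, for illustration only.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly rate: number; // tokens added per period
  private readonly periodMs: number; // period length in milliseconds
  private readonly capacity: number; // maximum stored tokens

  constructor(rate: number, periodMs: number, capacity: number = rate, now: number = Date.now()) {
    this.rate = rate;
    this.periodMs = periodMs;
    this.capacity = capacity;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  // Refill based on elapsed time, then spend one token if available.
  tryConsume(now: number = Date.now()): boolean {
    const elapsed = Math.max(0, now - this.lastRefill);
    this.tokens = Math.min(this.capacity, this.tokens + (elapsed * this.rate) / this.periodMs);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Mirrors the basicSearch settings: 1 token per second, capacity 5.
const bucket = new TokenBucket(1, 1000, 5, 0);
const burst = Array.from({ length: 6 }, () => bucket.tryConsume(0));
console.log(burst); // five true values, then false
console.log(bucket.tryConsume(1000)); // true: one token refilled after one second
```

The capacity is what allows short bursts, while the rate caps the long-run average.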

Leaderboard System

The leaderboard is one of the most interesting systems in the game. Let's dive into some of the details.

The system demonstrates several best practices:

  • Separation of concerns (round vs global rankings)
  • Efficient sorting with composite keys
  • Graceful handling of anonymous users
  • Automatic updates via triggers
  • Null-safe value handling

This implementation provides a scalable and maintainable way to track user performance both within individual rounds and across the entire game system.

Dual Leaderboard System

  • roundLeaderboard: Tracks scores within individual rounds
  • globalLeaderboard: Maintains overall user rankings across the entire game

Smart Sorting with Composite Keys

  • Round leaderboard uses a triple key: [roundId, score, -submittedAt]
  • This orders entries by round, then score; when the leaderboard is read from the highest key down, the negated timestamp breaks ties in favor of the earlier submission
  • Global leaderboard uses [score, -creationTime] to rank by score, with ties going to the older account
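To see what the negated timestamp does, here's a plain TypeScript sketch that sorts entries by the same kind of tuple key, highest key first, the way a leaderboard is displayed. The `compareKeys` helper and the sample rows are hypothetical; the Aggregate component handles key ordering in the database itself.

```typescript
type RoundKey = [roundId: string, score: number, negSubmittedAt: number];

// Compare tuple keys element by element (lexicographic order).
function compareKeys(a: RoundKey, b: RoundKey): number {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] - b[1];
  return a[2] - b[2];
}

interface Guess {
  name: string;
  roundId: string;
  score: number;
  submittedAt: number;
}

const guesses: Guess[] = [
  { name: "ana", roundId: "r1", score: 42, submittedAt: 2000 },
  { name: "ben", roundId: "r1", score: 50, submittedAt: 3000 },
  { name: "cy", roundId: "r1", score: 42, submittedAt: 1000 },
];

// Sort descending by [roundId, score, -submittedAt].
const ranked = [...guesses].sort((x, y) =>
  compareKeys([y.roundId, y.score, -y.submittedAt], [x.roundId, x.score, -x.submittedAt]),
);
console.log(ranked.map((g) => g.name)); // ben first, then cy (earlier tie), then ana
```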

Anonymous User Handling

  • Anonymous users' scores are divided by 1000 in the global leaderboard
  • Captured anonymous accounts (converted to real accounts) have their scores zeroed
  • This creates a natural hierarchy: real users > anonymous users > captured accounts
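A sketch of the value each user might contribute to the global leaderboard under these rules. The `UserDoc` shape and its field names (`isAnonymous`, `isCaptured`) are assumptions for illustration, not the Midpoints schema.

```typescript
// Hypothetical user shape; field names are illustrative assumptions.
interface UserDoc {
  score?: number; // may be missing, so default to 0
  isAnonymous: boolean; // played without signing up
  isCaptured: boolean; // anonymous account later converted to a real account
}

// Value used for global ranking: real users > anonymous users > captured accounts.
function globalLeaderboardScore(user: UserDoc): number {
  const score = user.score ?? 0;
  if (user.isCaptured) return 0; // captured accounts are zeroed out
  if (user.isAnonymous) return score / 1000; // keeps anonymous users below real ones
  return score;
}

console.log(globalLeaderboardScore({ score: 500, isAnonymous: true, isCaptured: false })); // 0.5
```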

Trigger-based Updates

// convex/functions.ts
triggers.register("guesses", roundLeaderboard.idempotentTrigger());
triggers.register("users", globalLeaderboard.idempotentTrigger());
  • Leaderboards automatically update when relevant tables change
  • Uses idempotent triggers for reliability
  • Maint

Score Aggregation

  • Round scores are summed directly: sumValue: (d) => d.score
  • Global scores handle nulls gracefully: sumValue: (d) => d.score ?? 0
  • This provides resilience against missing or invalid data

Reduce Processing by Using an Action Cache

  • Efficient caching of expensive embedding operations
  • Action-based cache invalidation
  • Transparent cache layer
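
// convex/embed.ts
import { ActionCache } from "@convex-dev/action-cache";
import { components, internal } from "./_generated/api";
import { ActionCtx } from "./_generated/server";
// CONFIG (which holds the embedding model name) is defined elsewhere in the project.

const embedCache = new ActionCache(components.actionCache, {
  action: internal.embed.generateEmbedding,
});

// Returns the cached embedding for (model, text), running the action only on a cache miss.
export async function embedWithCache(ctx: ActionCtx, text: string) {
  return embedCache.fetch(ctx, { model: CONFIG.embeddingModel, input: text });
}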
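Conceptually, the cache memoizes the action's result by its arguments, so embedding the same text with the same model twice only hits the embedding API once. Here's a rough in-memory analogue; `memoizeAsync` and `fakeEmbed` are hypothetical and are not the Action Cache component's API or implementation.

```typescript
// Hypothetical memoizer: caches promises keyed by the JSON form of the arguments.
// Note: in this simple sketch a rejected promise also stays cached.
function memoizeAsync<A, R>(fn: (args: A) => Promise<R>): (args: A) => Promise<R> {
  const cache = new Map<string, Promise<R>>();
  return (args: A) => {
    const key = JSON.stringify(args);
    let hit = cache.get(key);
    if (!hit) {
      hit = fn(args);
      cache.set(key, hit);
    }
    return hit;
  };
}

// Stand-in for an expensive embedding call; counts how often it actually runs.
let calls = 0;
async function fakeEmbed(args: { model: string; input: string }): Promise<number[]> {
  calls++;
  return [args.input.length, args.model.length]; // placeholder, not a real embedding
}

const cachedEmbed = memoizeAsync(fakeEmbed);
cachedEmbed({ model: "demo-model", input: "dog" });
cachedEmbed({ model: "demo-model", input: "dog" }); // served from the cache
console.log(calls); // 1
```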

Smart Vector Search Implementation

  • Namespace-based filtering
  • Score mapping for efficient lookups
  • Smart limit handling with buffer for edge cases
// convex/namespace.ts
const midpointMatchScoresById = new Map(
  await ctx.vectorSearch("embeddings", "embedding", {
    vector: midpointEmbedding,
    limit: 102, // extra two to account for the left and right embeddings
    filter: (q) => q.eq("namespaceId", ctx.namespace._id),
  }).then((results) => results.map((r) => [r._id, r._score] as const)),
);
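The comment on `limit: 102` suggests the left and right anchor words themselves can show up in the results. Here's a hypothetical sketch of turning a map of IDs to scores into a clean ranking: drop the anchors, sort by score, and keep the top N. `pickTopIds`, the string IDs, and the scores are illustrative assumptions; in Convex the keys would be document IDs.

```typescript
// Hypothetical: remove the anchor words' own entries, then keep the best `count` IDs.
function pickTopIds(scoresById: Map<string, number>, anchorIds: string[], count: number): string[] {
  const anchors = new Set(anchorIds);
  return [...scoresById.entries()]
    .filter(([id]) => !anchors.has(id))
    .sort((a, b) => b[1] - a[1]) // highest score first
    .slice(0, count)
    .map(([id]) => id);
}

const scores = new Map([
  ["sporty", 0.91],
  ["practical", 0.9],
  ["golf-gti", 0.82],
  ["civic", 0.8],
  ["m3", 0.74],
]);
console.log(pickTopIds(scores, ["sporty", "practical"], 2)); // golf-gti, civic
```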

Using Chunks & Batches to Manage LLM Bandwidth

  • Efficient batch processing
  • Chunking for large datasets
  • Promise handling for async operations
// convex/llm.ts
export async function asyncMapChunked<In, Out>(
  items: In[],
  fn: (batch: In[], index: number) => Out[] | Promise<Out[]>,
  chunkSize?: number,
): Promise<Out[]> {
  return Promise.all(chunk(items, chunkSize).map(fn)).then((c) => c.flat());
}
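`asyncMapChunked` relies on a `chunk` helper that the excerpt doesn't show. Here's a minimal, hypothetical version that makes the function runnable on its own; the default chunk size of 100 is an assumption, and the real helper may differ.

```typescript
// Hypothetical chunk helper: split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number = 100): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("chunk size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Same logic as asyncMapChunked above: run fn on every chunk concurrently, then flatten.
async function asyncMapChunked<In, Out>(
  items: In[],
  fn: (batch: In[], index: number) => Out[] | Promise<Out[]>,
  chunkSize?: number,
): Promise<Out[]> {
  return Promise.all(chunk(items, chunkSize).map(fn)).then((c) => c.flat());
}

// Usage: double five numbers, two at a time.
asyncMapChunked([1, 2, 3, 4, 5], (batch) => batch.map((n) => n * 2), 2).then((doubled) =>
  console.log(doubled), // [2, 4, 6, 8, 10]
);
```

Because every chunk starts at once, this bounds the size of each call rather than the number of calls in flight.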

Robust Error Handling with Result Types

  • Type-safe error handling
  • Discriminated unions for results
  • Consistent error pattern across the app
// convex/functions.ts
// The `ok` flag is the discriminant that error() and ok() below set.
export type Result<T> =
  | { ok: true; value: T; error: undefined }
  | { ok: false; value: undefined; error: string };

export function error(message: string) {
  return { ok: false as const, value: undefined, error: message };
}

export function ok<T>(value: T) {
  return { ok: true as const, value, error: undefined };
}
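A quick usage sketch: because `ok` is a literal `true` or `false`, checking it narrows the union so TypeScript knows which fields are present. The type and helpers are repeated here (with the `ok` field included in the type) so the snippet compiles on its own; `parseGuess` is a hypothetical example function.

```typescript
type Result<T> =
  | { ok: true; value: T; error: undefined }
  | { ok: false; value: undefined; error: string };

function error(message: string) {
  return { ok: false as const, value: undefined, error: message };
}
function ok<T>(value: T) {
  return { ok: true as const, value, error: undefined };
}

// Hypothetical validator for a player's guess.
function parseGuess(raw: string): Result<string> {
  const word = raw.trim().toLowerCase();
  if (word.length === 0) return error("Guess cannot be empty");
  return ok(word);
}

const result = parseGuess("  Golf GTI ");
if (result.ok) {
  console.log(result.value); // golf gti
} else {
  console.error(result.error); // narrowed to string here
}
```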

What's Next for Midpoints?

Midpoints is open source and ready for community contributions. We welcome improvements in several areas:

Potential Enhancements

  • Make Midpoints run local-first!
  • Additional scoring algorithms
  • Custom category creation UI
  • Mobile-friendly interface
  • Integration with dictionary APIs
  • Social sharing features

Get Involved

The codebase is available on GitHub at github.com/ianmacartney/mid-embeddings. We use Convex for the backend, React for the frontend, and standard TypeScript throughout.

Try It Now

Play Midpoints at [game link] or fork the repo to create your own word-guessing game using our vector embedding infrastructure.

As always, happy coding!
