4 years ago

The unfulfilled promise of functional ideals on the web

Visualization of many different data points

While the TIOBE language rankings might place traditional functional programming languages solidly outside the top 10, the paradigms advanced by those languages are foundational to the popular emerging technologies engineers are using today.

Ideas around immutability, purity, and dataflow programming are quietly saturating our technologies like Bitcoin and React. And while these early inroads are enabling exciting new ways to build large and dependable applications, there is still much to do to realize the full potential of this revolution.

But first, let’s go back to the advent of some of these functional patterns through a comparison of the very different ways Haskell and C achieve a common programming task.

Getting beyond imperative

Given both resource and compiler limitations, most early mainstream programming languages were very close analogs to the underlying machine code. "Take this value from here, then do this to it, then put it here." These are imperative programming languages, which command the computer to do fairly concrete things as if moving boxes around a warehouse.

As an example, lets double an array of values imperatively in a language similar to C:

1for (int i = 0; i < 5; i++) {
2	my_list[i] *= 2;
3}
4
5

Examined critically, this code expresses much beyond the programmer’s desire to simply double the list. It's not just what the programmer wants done, it's explicitly how the machine should do it. Perhaps, as this code instructs, the right way to do this is to walk through each item in the list. And yes, perhaps there should be an incrementing variable that represents the index as we go. But those are "how" not "what" choices.

In contrast, later functional languages leveraged more powerful compilers and fewer resource limitations to provide new abstractions that were closer to mathematical functions. Compare our C code above to the equivalent Haskell:

1let newList = map (* 2) myList
2

This more directly says: create a new list which is this existing list doubled.

In this manner, functional languages introduced higher level abstractions that were akin to declarative transforms of values rather than literal commands to the machine. Functional idioms are less "do this and then do that" and more "data like this can transform in this way to create data like that."

Another way to think of this: you're not getting and setting data, you're just making rules about how data can flow.

Immutability and purity

One cool property of these functional languages is they could explore the benefits of making variables immutable by default, because a variable no longer intrinsically mapped to a specific memory address.

And Haskell did. Recall our list doubler:

1let newList = map (* 2) myList
2

At the language level, everything here is immutable. myList never changes, and newList also can never change. Even if the compiler opts to have the underlying memory change in place in the final machine code, the compiler still knows that the programmer has no specific need for this data to change in place.

In addition to default immutability of variables, Haskell functions by default are pure. Their behavior does not and cannot read or write any dynamic global or environmental state like I/O, time, or random number generators. Pure functions have the consequence, then, of always producing the same output for the same input. In fact, it’s impossible for them to produce a different result: they depend on nothing besides their arguments, and computers don't wiggle bits around just for fun! ¹

This property means you only have to run them once for a given set of arguments, since the result will be the same every time. This is clearly great for automatic caching and memoization. But it's also very useful for reproducibility: you know that anyone, on any computer can also run these pure functions and replicate your result, because your algorithm doesn't depend on any environmental factors.

Known immutability has an important relationship to functional purity: It’s the necessary ingredient to create “real purity” instead of merely “fingers-crossed purity." If you know that both the arguments and the global state are immutable, then you also know the entire function is guaranteed pure. Absent an environment that can enforce these constraints, when we utilize pure patterns, we’re merely achieving “fingers-crossed purity”. The programmer has to be careful that their functions don’t violate any of the rules!

Quick note: we're going to use the terms "purity" and "determinism" interchangeably from here on out, but generally speaking, purity is a stricter form of determinism. Impure deterministic functions can also make a consistent change to global (non-return value) state; but if you consider those effects part of the return value, the ideas are essentially equivalent. This stackoverflow answer is a good summary.

Mainstream purity: Safe HTTP methods

An example of purity you're probably already familiar with is HTTP GET, the most important safe method in HTTP.

Every dutiful REST practitioner knows that your GET needs to return the same result for the same URL and ETag² every time and have no side effects. And what benefits do we get out of that? Massive global CDN networks that can serve data out of caches! Faster websites, energy savings, lower hosting bills.

However, in practice this is almost always merely “fingers-crossed purity”, since our GET controllers are not typically written in languages and environments that truly prohibit state changes.

Chained purity and deterministic dataflow

If you pass the output of one pure function as the input of another, you create a long deterministic chain³ that retains all the useful reproducible/verifiable properties of any individual link. You can append new pure links over time to a previous chain to extend it.

When you chain pure functions, you’re creating a kind of dataflow programming system. You can think of dataflow programing as a graph of deterministic computation that can be recomputed consistently by replaying it.

As you probably know, some of the world's most popular types of pure functions to chain are hash functions. Chains of hash functions have been used to create things like git, and some crypto stuff you may have heard about...

Mainstream deterministic dataflow: Blockchain, Ethereum and the EVM

Back in 2008, Satoshi Nakamoto's famous paper emerged, laying out the ideas behind bitcoin and blockchain generally. Like git on steroids, if we have a sequence of checksummed and signed payloads, we can append a new checksummed and signed payload to it, and everyone else can re-run the same⁴ pure functions we did to collectively reproduce and validate our new payload. In bitcoin specifically, these payloads were monetary transfers of a new virtual coin, creating a system with an immutable, fully verifiable ledger where ownership of coins cannot be faked and history cannot be rewritten.

Bitcoin was a collection of known ideas, but composed together clearly and cleverly with striking implications for the future of decentralized systems. But very soon after, people starting contemplating other non-monetary payloads that could be put onto blockchain systems. Like domain names, or real estate titles, or... units of computation! If instead of coins, the payloads were code that everyone could run, you could make a big distributed computer that could be used to create any kind of other thing.

This is a great idea, right? Well, not if that code reads some file on the original developer's machine, or uses randomness, or tries to run an HTTP request to a web site that stopped paying its hosting bill. If it does anything like this, everyone can't generate the same ledger by running the same payload code, and the underlying verifiable principles behind blockchain are violated.

So computation on blockchain, too, requires computational determinism.

In Ethereum's case, currently the most popular general-computational blockchain, whatever code you write eventually gets compiled into byte code that runs on the Ethereum Virtual Machine, the EVM. And sure enough, from the EVM docs:

The EVM behaves as a mathematical function would: Given an input, it produces a deterministic output. It therefore is quite helpful to more formally describe Ethereum as having a state transition function:

Therefore, the code you write in your input program won't be allowed to do any non-deterministic things in order to be able to compile to EVM's intentionally limited instruction set. Thus the EVM enforces “real determinism." This is essential, as the whole point of Ethereum's big distributed computer is to be resistant to manipulation and abuse, so “fingers-crossed determinism” doesn’t go far enough.

Functional paradigms and pure dataflow power the modern web

In our opinion, the most exciting recent emergence of functional paradigms is in web development. About ten years ago, everyone’s favorite reactive framework appeared on the scene and changed everything.

We’re talking about Elm... of course!

For many developers, Elm was the first whiff of something like functional reactive programming. Generally, in this pattern there is some declarative way you express how your app maps underlying data into a visual form. These declarations are deterministic. When any mutation changes the underlying source data, that mutating code isn't directly responsible for fanning-out updates to all interested UI components. Instead, a runtime knows what components depend on this data and re-runs the appropriate declarative transforms over the new data to update the associated UI.

This pattern ends up being a very nice way to structure applications with UI. It really addresses one of the hardest parts about the traditional, ad-hoc way visual applications often manage the connections between visual UI and underlying data. Without a deterministic-subscription exchange like FRP provides, every mutation has to maintain a group of one-to-many mappings that represent "who cares about this change?" And even in many cases "how should they change?"

Managing this problem without a principled framework like FRP tends to lead to fragile spaghetti-like code expressing data dependencies all over your codebase. Keeping all these dependencies in sync as the app evolves is a huge headache for developers and slows down progress on the product. So, kudos to Elm for showing the web world a better, functional way.

Oh yeah, and then a year after Elm, Facebook released React. It did okay.

React basically allowed you to get a decent first taste of that happy FRP life without completely switching your web programming language. In particular, you had a way to set component-local state and specify a reactive pure transform that would turn that state into DOM that the browser can render.

Here's a silly app called "Fruit Trader" that will illustrate how the modern web utilizes pure dataflow, starting with how far React alone takes us:

Diagram Diagram

Redux / MobX and the state management gang

Granted, React helps you achieve the very end of the deterministic reactive dataflow puzzle–the place where a single component’s underlying state is updated and that component is re-rendered. But it doesn’t help with data exchange between more complex shared state which many components depend on.

So very soon after the release of React, projects like Redux, MobX, and others showed up to take on the completing-the-dataflow-graph problem which the web ecosystem has labeled "state management". Instead of the state graph just having a single edge with connecting a component with its specific data, these libraries let you create complex graphs with relationships between dependent data, any number of intermediate transformations, and then finally that react component-local state. Each of these hops along this graph must be–you guessed it–deterministic:

Diagram Diagram

Redux even has a page on its three principles that emphasize the functional underpinnings of Redux, of which two are:

State is read-only Changes are made with pure functions

Mutations in these systems involve modifying only the root object(s), and then letting the dataflow graph react, transform, and finally re-render:

Diagram Diagram

And with these libraries at hand, client-local state is pushing pure dataflow around, and we have clear principles about where data changes and how all the reactions cascade out. State management in web programming is solved, right? Not so fast.

The bummer about backends

With React and its rich ecosystem of state management libraries, building powerful web apps with clean abstractions is becoming easier and easier–as long as all the state necessary to service those applications is already in the browser.

The problem is those darn servers and that critical state that is stored and distributed on them. They're in the state graph, but they're very poor participants compared to our powerful browser-side libraries.

At Convex, we believe there are two main flaws that make them particularly underpowered for what modern apps need:

The have poor support for the reasoning about immutability vs. mutability and therefore for purity/determinism
They have very limited semantics around state subscription and invalidation

Let's talk about each.

Backend bummer: fragile server-side determinism

As we’ve explored, we already have safe GET! But really, that only restates the goal, because backends today are in the “fingers-crossed purity” zone. It’s still up to us to ensure the controllers backing our GET endpoints only read state and do not change state.

So the server-side help for this objective amounts to little more than be careful. As long as we're very careful, we can probably cache our GET requests and trust our ETags.

At Convex, we believe that backend systems should provide real determinism, and therefore enable anxiety-free GET safety and perfect caching. And they should do so without requiring we completely replace our backend programming languages and library ecosystems.

Backend bummer: limited subscription semantics

Even if our backends start providing real deterministic guarantees, this makes our GETs requests powerful and robust only against misbehaving GETs.

Eventually, one of our POST endpoints that is allowed to change state... does its thing and changes some backend data. Presumably some rows in some tables on the backend are new or updated, and therefore some GET requests need to be recomputed and some browsers' state updated:

Diagram Diagram

Keeping track of these mappings is a very tricky problem, especially as our applications grow. The server acts as a kind of black-box where state interchanges without helping the application know what needs updating. It's a giant, ugly interruption in our beautiful deterministic dataflow state graph!

In theory, subscriptions should solve this problem. But backend systems today do not have sufficiently powerful subscription semantics. They’re built on databases that only allow you to do row-level or table-level subscriptions. But what our components are actually dependent on is a controller function running, which may fetch records from multiple tables, and might only look at some subset of those records, and then may transform that data into a nice JSON object for the application's model. It might do all kinds of things to create that object.

But that object is really what they depend on. And today, there's not really a great way to subscribe to... whatever all of that is. We just want to know when that final JSON object would change.

And so it often falls to our app to somehow keep track of which POSTs imply what table updates which map to the outputs of our GET endpoints. The app has to stay in lock-step with what's happening within the controllers and the backing database so it can create subscriptions in the database's language instead of the app's language. ⁵

Enter Convex

Our mission at Convex is to solve this exact problem. We believe frontend engineers deserve to have the backend and the database manage state in the app's language. We think it's time for the backend to center its design around the needs of the app instead of the other way around.

And this is what our product is. Convex is a backend that supports automatic, pervasive ideas of immutability and mutability, and it leverages that information to provide controller and function-level subscriptions to backend state.

Basically: Convex completes your app's deterministic dataflow graph. The backend feels like a seamless, persistent extension of your state management.

Want to see for yourself? Sign up for our beta wait list. And if you’re excited to help us build Convex, we’re hiring!

Footnotes

Well, bit flips do happen, but we have little idea how much fun the machine finds them to be. ↩
One way to think about ETags is they version the "global" read-only state for a given URL and its params. ↩
Or tree, or DAG, or whatever. The dataflow graph doesn't have to be linear, but git's happens to be. And (eventually) blockchain's is as well, given the property that the majority of actors "vote" to kill off invalid forks via cryptographic verification and consensus. ↩
Technically, the signed aspect of blockchain does not use the "same" algorithm as the originator of the payload, but is the asymmetric public analog of that algorithm that can verify (but not forge) the private provenance ↩
Or, let's be honest, we just throw our hands up in the air, and invalidate and re-run everything when data changes just in case. ↩

Build in minutes, scale forever.

Convex is the backend platform with everything you need to build your full-stack AI project. Cloud functions, a database, file storage, scheduling, workflow, vector search, and realtime updates fit together seamlessly.

Get started