Protocol Versioning

June 05, 2024

I am currently re-visiting “Dilated File Transfer”, the next-generation protocol for Magic Wormhole’s main user feature (transferring files). This is built on top of the more-general “Dilation” feature, hence the name.

Versioning in General

In the context of a network protocol, we usually take “versioning” to mean something more aligned with “extensibility”. That is, one big reason to have a new version of a protocol is to add features. (Of course, fixing problems is another popular reason.)

There’s even a whole RFC on this, RFC 9170 (“Long-Term Viability of Protocol Extension Mechanisms”) which has some examples. Sometimes the term “ossification” is used.

File Transfer in Magic Wormhole

The magic-wormhole Mailbox protocol already has an “extensibility slot”: the app_versions information that is sent to each side (and used as a key-confirmation message).

Unfortunately, the current transfer protocol does not make use of this. (As an early example of the wisdom in RFC 9170, the Haskell implementation decided to just always send {} no matter what; see Issue #66.)

This means we have one “out” before the next protocol, which will begin using this slot. As currently imagined, this means sending some version information like:

app_versions = {
    "transfer": { ... }
}

That is, the existence of the key "transfer" means that this is Dilated File Transfer (whereas empty version information indicates the classic protocol).
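As a rough sketch of that check (in Python; the function names are hypothetical and not taken from any implementation), a peer only needs to look for the key itself:

```python
def uses_dilated_transfer(app_versions: dict) -> bool:
    """True if this app_versions dict advertises Dilated File Transfer."""
    # Only the presence of the "transfer" key matters here;
    # its contents are not inspected.
    return "transfer" in app_versions


def choose_protocol(ours: dict, theirs: dict) -> str:
    """Pick the protocol to speak, falling back to the classic one."""
    if uses_dilated_transfer(ours) and uses_dilated_transfer(theirs):
        return "dilated"
    # An empty (or unrelated) app_versions means a classic peer.
    return "classic"
```

Whether a new client actually falls back to the classic protocol, or disconnects instead, is a policy choice this sketch does not try to settle.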

So, great! We can indicate to our peer whether we want to use the new protocol or not, and so can provide backwards-compatibility if desired.

However, what do we put in the { ... } part so that we can smoothly expand the protocol (or fix protocol-level bugs) in the future?

Number, Numbers, Features?

A single number (with e.g. “highest common number” as the spoken protocol) is quickly ruled out. One problem with it: a peer has no way to say that it no longer speaks an older version, and there is no room to try out an experimental one.

A tweak to the single-number approach is to have a list of numbers. That is, all versions of the protocol one wishes to speak.

This does make it possible to “retire” a version of the protocol (and could also allow for experimental versions, as long as both sides speak them).
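A minimal sketch of the list-of-numbers idea, assuming the rule is “speak the highest version both sides list” (pick_version is a hypothetical name):

```python
from typing import Iterable, Optional


def pick_version(ours: Iterable[int], theirs: Iterable[int]) -> Optional[int]:
    """Return the highest version both peers list, or None if there is none."""
    common = set(ours) & set(theirs)
    return max(common) if common else None
```

A peer that has retired version 1 simply stops listing it, so pick_version([2], [1]) yields None and the connection can be refused cleanly.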

Following the high-level advice of RFC 9170 leads me to believe that it’s beneficial to have some kind of extension point. The RFC also concludes that you must “use it or lose it”: protocols with unused extension points end up with implementations that don’t allow those extension points to operate properly (if they are ever exercised).

Given that we’ve already identified some desirable extensions to the protocol, we definitely want a way to have new features (that could be optional or not).

So, maybe all we need is a functioning “features” system.

Aside: Sending Features

Before we examine whether having only features (and no protocol version) will work, we’ll briefly go over how this works as currently specified.

Ignoring a bunch of the underlying protocol (see the Mailbox Server Protocol if interested), as soon as the two peers gain a mechanism to send encrypted messages to each other, they send an open-ended JSON dict known as the app_versions message.

That is, each peer has sent to the other peer some early version information. For the Dilated File Transfer protocol, this looks like:

app_versions = {
    "transfer": {
        "features": ["zero", "compression", "fine-grained"]
    }
}

That is, there’s a list of "features" that are arbitrary text. Peers must tolerate previously-unknown feature names here (and may choose not to “advertise” features on a per-connection basis). A Peer could decide that a particular feature is “required” and terminate any connections that don’t advertise that feature.
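Here is one way a peer might process that list, as a sketch only: usable_features and MissingRequiredFeature are hypothetical names, and treating “usable” as the intersection of both sides’ lists is my assumption rather than something the spec states.

```python
# Feature names taken from the example message above.
KNOWN_FEATURES = frozenset({"zero", "compression", "fine-grained"})


class MissingRequiredFeature(Exception):
    """The peer did not advertise a feature we insist on."""


def usable_features(peer_app_versions, advertised=KNOWN_FEATURES, required=frozenset()):
    """Return the features both sides advertise, ignoring unknown names."""
    transfer = peer_app_versions.get("transfer", {})
    peer = set(transfer.get("features", []))
    missing = set(required) - peer
    if missing:
        # A peer may treat a feature as "required" and end the connection.
        raise MissingRequiredFeature(", ".join(sorted(missing)))
    # Unknown feature names from the peer fall out of the intersection
    # instead of causing an error.
    return set(advertised) & peer
```

The important property is the last line: a name we have never heard of is silently dropped, which keeps the extension point usable by future peers.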

Although there is no “negotiation” built in here (that is, it’s a one-time message) a sub-protocol could choose to implement additional messages over the Dilated connection that amount to negotiation.

Are Versions Just Features?

Stated another way, we could suppose that we have both a protocol version (or list of versions) as well as an optional-features mechanism. That is, consider that in addition to the "features": [] list we also had a "version": 1 or "versions": [0, 1] mechanism.

The question becomes: under what circumstances might we need or want to “add a protocol version” instead of a feature?

Protocol Bug?

Perhaps a protocol bug is a good example. Let us suppose that a serious problem has been found with the protocol and we need to change how some aspect of it works to fix the bug. The “hardest” thing here is likely something like changing the binary representation of an existing message (adding or removing members) or altering the state-machine (i.e. how a Peer is expected to respond to a message).

The underlying Dilation protocol gives us a record-pipe to our Peer, so we don’t have to worry about message-framing. Although Dilated File Transfer specifies msgpack for these sorts of messages, let’s not depend on any msgpack features to get ourselves out of this.

So let’s say we have an “Offer” message that consists of a byte indicating its “kind”, an arbitrary-length text indicating the “file name” and a 2-byte integer indicating the file size. Further, the protocol says that we MUST answer with an “Accept” or “Decline” message before the transfer continues.

It is found that the 2-byte integer is too small to represent file-sizes, and that the “waiting” is inefficient. So we wish to change the “Offer” message to have an 8-byte integer for the file size and to change the state-machine so that there is no “waiting” (a client that declines the file simply closes the subchannel).
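To make the thought experiment concrete, here is a sketch of the two encodings in Python. Everything specific is an assumption for illustration: the kind value 0x01, the big-endian byte order, and the 2-byte length prefix on the file name (the text only says “arbitrary-length”). This is not the real Dilated File Transfer wire format.

```python
import struct

OFFER_KIND = 0x01  # hypothetical value for the "kind" byte


def encode_offer_v1(name: str, size: int) -> bytes:
    """Original layout: kind, name length, name, 2-byte file size."""
    raw = name.encode("utf-8")
    header = struct.pack(">BH", OFFER_KIND, len(raw))
    # ">H" is an unsigned 16-bit integer, so sizes above 65535 cannot be sent.
    return header + raw + struct.pack(">H", size)


def encode_offer_v2(name: str, size: int) -> bytes:
    """Changed layout: same fields, but an 8-byte file size."""
    raw = name.encode("utf-8")
    header = struct.pack(">BH", OFFER_KIND, len(raw))
    return header + raw + struct.pack(">Q", size)
```

With the first layout, struct.pack raises struct.error for any file of 64 KiB or more. The two layouts also differ in length for the same file name, so a receiver has to know in advance which one its Peer is speaking.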

With a single version of the protocol, we increment the number. If we encounter a Peer with a lower number, we can either choose to continue with the old protocol or disconnect. However, there is no way to tell the Peer up front which of these we will do (e.g. “I want to speak version 2, but will refuse to speak version 1 entirely”).

We can fix that latter point with a list of versions: now we can say “[1, 2]” if we’ll speak both or just “[2]” if we only allow the newest protocol.

To encode this as a feature we could have a “core-v1” feature indicating the first version of the core protocol. Older peers would be sending "features": ["core-v1"] and newer peers would send "features": ["core-v1", "core-v2"] if they supported both or just "features": ["core-v2"] for the latest.
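Sketched in Python, selecting the core protocol from the features list could look like the following. The “highest common core-vN wins” rule and the function names are my assumptions; only the feature names come from the text above.

```python
import re
from typing import Iterable, Optional, Set

CORE_FEATURE = re.compile(r"core-v([0-9]+)")


def core_versions(features: Iterable[str]) -> Set[int]:
    """Extract N from every 'core-vN' entry; other features are ignored."""
    found = set()
    for name in features:
        match = CORE_FEATURE.fullmatch(name)
        if match:
            found.add(int(match.group(1)))
    return found


def pick_core(ours: Iterable[str], theirs: Iterable[str]) -> Optional[str]:
    """Return the newest core feature both peers advertise, if any."""
    common = core_versions(ours) & core_versions(theirs)
    return f"core-v{max(common)}" if common else None
```

An older peer sending ["core-v1"] and a newer one sending ["core-v1", "core-v2"] settle on "core-v1", while a peer sending only ["core-v2"] gets None against the older peer and can disconnect.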

Versions ARE Just Features

Using features like this begins to look a lot like the “list-of-versions” option, except encoded into the “features” mechanism. It might take some careful wording about required and optional features, but overall I prefer the idea of one thing and not two (that is, one way to extend/change the protocol instead of two).

Another issue with having both a version (or versions) and a list of features is the explosion of cases.

What happens to feature “foo” if you increment or add a protocol version? Are these now considered two different things? That is, logically now there is “foo” with protocol version 1 and “foo” with protocol version 2 (which could conceivably interact differently).

Using only features doesn’t magically wave away ambiguities: you can still make mistakes where it becomes ambiguous or contradictory to enable two different features at the same time. However, adding a new “core-v2” protocol only adds one thing; it doesn’t multiply (by all existing other features).
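A toy count illustrates the difference, using the three example feature names from earlier and assuming, pessimistically, that every feature could behave differently under every protocol version:

```python
from itertools import product

features = ["zero", "compression", "fine-grained"]
versions = [1, 2]

# Separate version number: every (version, feature) pair is potentially
# its own case to specify and test.
with_versions = list(product(versions, features))

# Features only: each core protocol is just one more name in the list.
features_only = features + [f"core-v{v}" for v in versions]

print(len(with_versions), len(features_only))
```

Adding a third version grows the first list by three entries (one per existing feature) but the second by only one.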

This also means we “Grease” (oof) the protocol by actually using the extension-point. (I fear that if there was both a “version” and a “feature” mechanism, the version one would go a long time without use).

Thoughts? Can you think of a case where a feature cannot be used effectively?