December 25, 2016

AES-GCM-SIV -- Nonce Reuse Without Abuse

Nonce reuse seems to be a common problem in many implementations of important cryptographic protocols and algorithms. One spectacularly catastrophic outcome of nonce reuse is with AES-GCM, which buckles when even a single nonce and key pair is reused for any message. Consider the GCM algorithm shown below.

GCM

The initial counter is used to derive both the CTR key stream and a OTP used in the tag computation. Given two messages and encrypted with the same key and nonce , then two important facts follow. First, cipher text blocks can be XOR’d together to learn the XOR of the corresponding plaintext blocks. (That’s a basic property of the CTR mode of operation.) Second, and perhaps more importantly, the authentication key can be extracted by Joux’s “forbidden attack” [1,2]. The attack itself is quite elegant. It treats the GCM tag generation as polynomial of the authentication key over . Then, using known values for the tag along with some finite field arithmetic and polynomial factoring, one can extract the authentication key with relatively high probability. (For fun, it turns out that this particular exercise is one of the crypto pals problems in Set 8 [3]. I’m not yet done with these challenges, but I hope to be soon.)

AES-GCM is one of the more common cipher suites in used by TLS 1.2. Particularly, because variants such as RC4 [4] are completely broken and CBC are subject to timing [5] and padding oracle attacks [6]. Per RFC 5288, the nonce for each AES-GCM invocation is composed of an implicit 32-bit “salt” and explicit 64-bit “nonce_explicit” part. The “salt” is derived during the key exchange and the “nonce_explicit” value is chosen by the sender and included in each packet. (Hence the term explicit.) Sadly, the only guidance given by the RFC as to the choice of the explicit part is to use the sequence number. In theory, such a monotonically increasing counter would work just fine. It would mean that a nonce is only reused every records. That’s astonishingly rare and well within reason. However, some implementations apparently get that wrong by either sending duplicate nonces, perhaps due to transmission problems (I’m not sure), or by using a randomly generated initial counter value that happens to collide with a previous session with the same key. Apparently, some even use random values as the “nonce_explicit” for every record, giving us a collision after approximately records. (This is not so large and should be cause for concern.)

There are several ways we could fix this problem. First, we could, by design, prevent key and nonce reuse for GCM. This is what TLS 1.3 strives to do by making the nonce fully implicit based on the record number. Specifically, the nonce is computed by XORing an IV derived from the master secret with the 64-bit sequence number. The nonce is required to be at a minimum 64 bytes, but may be longer if so required by the AEAD algorithm. For example, as per RFC 5116 [7], AES-GCM-128 uses a 12 byte nonce and 4 byte counter to form the complete 16 byte CTR block.

Another path forward is to avoid this key and nonce reuse problem altogether with a different algorithm. Fortunately, in 2007, Phil Rogaway gave us the first example of one such algorithm: SIV, or synthetic initialization vector. The primary idea is that the IV (nonce) is generated from the message itself, rather than being an explicit input into the algorithm. This allows the same key and nonce pair to be reused for different messages without disastrous consequences. Formally, this is known as nonce misuse-resistant authenticated encryption.

SIV algorithms are suitable for non-transport cases where we want authenticated encryption but don’t happen to have a monotonically increasing counter. There have been numerous SIV proposals. And here, we focus on AES-GCM-SIV, proposed last year by Gueron and Lindell [8]. Adam Langley helped translate their paper into a more digestible RFC this year [9]. It’s slowly making its way through the CFRG pipeline. To bring myself up to speed, I implemented a (terribly slow) version of this algorithm a couple months ago. The code is available here and below. The code itself should be fairly readable. The biggest struggle in this particular endeavor was trying to rationalize the byte-wise swaps between the block state and GF(2^128) field coefficients. This is handled by the convert function below.

References: