Post-Quantum Secure Boot on OpenTitan

This is part 1 of 3 in an experience report about implementing SPHINCS+ (aka SLH-DSA) for secure boot in OpenTitan root of trust (RoT) chips (1, 2). SPHINCS+ is a post-quantum secure signature algorithm and one of the four winners of NIST’s post-quantum cryptography competition; the final standard was released yesterday as FIPS 205. On this exciting occasion, we hope that sharing our experience with SPHINCS+ will help others who are considering migrating to PQC, especially for firmware signing.

The OpenTitan project has a complete SPHINCS+ implementation in mask ROM, which means post-quantum secure boot has been supported since the very first OpenTitan “Earl Grey” chip samples came back earlier this year. In this post, we’ll cover:

  1. why post-quantum cryptography is important,

  2. a quick, high-level primer on SPHINCS+, and

  3. how SPHINCS+ compares to other post-quantum algorithms in this context.

Stay tuned for future posts with more focus on the implementation and our future plans!

Why now?

First, let’s take a step back and answer the basic question of motivation: why is it important to implement post-quantum cryptography now? No quantum computer currently exists that could break “classical” signature algorithms like RSA or ECDSA. In fact, currently known quantum computers are quite far from that goal. In the Global Risk Institute’s 2023 survey, quantum computing experts were asked to estimate the chance, by various deadlines, of a quantum computer being able to crack RSA-2048 in a day. The experts gave it a 4-11% probability of happening by 2028, and a 17-31% probability of happening by 2033; the estimates don’t break 50% until 2038. So why start defending against quantum computers now?

For secure silicon, the most crucial factors to consider are that (a) hardware development timelines are long, and (b) the first signature verification during boot is critical to the trust model. A chip we design today might well be in the field 5, 10, or even 20 years from now, and the ROM code that does the signature verification can never be changed after manufacturing. If the ROM can only verify ECDSA signatures, then a quantum attacker who can forge an ECDSA signature could run their own code in early boot. This would break fundamental features of these devices, like OpenTitan’s ownership transfer; the attacker could insert malicious code without the device owner’s knowledge. So, even if the risk is low today, we still need to be ready for an uncertain future.

One notable factor we don’t need to worry about in this context is “harvest now, decrypt later” attacks. In those attacks, a patient attacker collects encrypted communications now and then decrypts them with a quantum computer many years in the future to find out what was said. This is an important concern for many contexts in modern cryptography. However, since secure boot doesn’t involve encryption at all, it’s not a concern here.

Finally, from a big-picture perspective, it will take the world a lot of time to agree on and migrate to post-quantum cryptography. The NIST competition was first announced in 2016, and (some of) the standards are now finally out 8 years later. It took a massive amount of time and effort from cryptographers to invent and analyze dozens of algorithms, but that work was just the start. Post-quantum cryptography generally involves much larger signatures and keys and/or much slower running times than classical cryptography. Migrating real-world live systems to use these algorithms will be a many-year process requiring global coordination; we can’t afford to put it off until it becomes an emergency.

About SPHINCS+

As post-quantum algorithms go, SPHINCS+ is refreshingly familiar. It’s a hash-based algorithm, meaning that it doesn’t rely on any new cryptographic constructs like lattices – instead, the internal mathematical structures are trees of hashed values. That makes SPHINCS+ more conservative security-wise than lattice-based schemes; since we’ve been studying hash functions for a long time, it’s unlikely that a new cryptanalytic result is going to suddenly weaken the security bounds. Hashing speed is the most important performance factor in SPHINCS+ by far. In particular, there’s a chain operation that repeatedly runs the hash function and can take up to 95% of the runtime, depending on the parameter set, platform, and hashing speed. Even with our hardware-accelerated version, where hashing is much faster compared to non-hash operations, the chain operation is about 79% of the runtime.
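To make the chain operation concrete, here is a minimal Python sketch of a WOTS+-style hash chain, the structure that dominates SPHINCS+ runtime. It is illustrative only: real SPHINCS+ mixes a public seed and a per-hash address into every call, which we omit here, and plain SHAKE256 stands in for the full tweakable-hash construction.

```python
import hashlib

def chain(value: bytes, start: int, steps: int) -> bytes:
    """Apply the hash function repeatedly, as in the WOTS+ chain operation.

    Signing reveals an intermediate link of the chain; verification simply
    keeps hashing from that link to the end of the chain and compares the
    result against the public key.
    """
    for _ in range(start, start + steps):
        value = hashlib.shake_256(value).digest(16)  # n = 16 bytes at the 128-bit level
    return value

w = 16                                 # chain length for Winternitz parameter lg_w = 4
secret = bytes(16)                     # one secret chain start (all-zero, for illustration)
public_end = chain(secret, 0, w - 1)   # end of the chain, part of the public key
revealed = chain(secret, 0, 5)         # intermediate value a signature might reveal
# The verifier finishes the chain and checks that it matches the public end:
assert chain(revealed, 5, (w - 1) - 5) == public_end
```

Verification cost is dominated by finishing many such chains, which is why hash throughput matters so much for overall runtime.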

Another interesting aspect of SPHINCS+ is that it’s a signature framework; you can adjust 6 different parameters and freely swap out the hash function to make signature algorithms with different performance and size characteristics. The authors selected 36 specific parameter sets in their submission to the NIST competition. For each of the 3 security levels they targeted, they picked two settings for the framework parameters, one that targeted small signatures, the “s” parameters, and one that targeted fast signature generation, the “f” parameters. Those 6 options could each be deployed with one of 3 different hash functions (SHA2, SHAKE, and Haraka), and one of two different algorithmic variants “simple” and “robust”. NIST dropped the Haraka option and the “robust” variants, reducing the original 36 parameter sets to 12 for FIPS 205.
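To illustrate the framework idea, here is a sketch using the two 128-bit-level parameter sets as tabulated in FIPS 205; the signature-size formula is a simplification of the one in the standard, and the comments describing each parameter are our gloss, not quotes from it.

```python
# Framework parameters for the two 128-bit-level parameter sets (FIPS 205):
# n = hash output bytes, h = total hypertree height, d = hypertree layers,
# a = FORS tree height, k = number of FORS trees, lg_w = Winternitz log.
PARAMS = {
    "128s": dict(n=16, h=63, d=7,  a=12, k=14, lg_w=4),  # tuned for small signatures
    "128f": dict(n=16, h=66, d=22, a=6,  k=33, lg_w=4),  # tuned for fast signing
}

def sig_bytes(n: int, h: int, d: int, a: int, k: int, lg_w: int) -> int:
    """Signature size implied by the framework parameters."""
    len1 = 8 * n // lg_w     # WOTS+ chains covering the message digest
    len2 = 3                 # chains covering the checksum (holds for these n, lg_w)
    wots = len1 + len2
    # randomizer + FORS part (k secrets, each with an a-node auth path) +
    # hypertree part (d WOTS+ signatures plus h auth-path nodes in total)
    return n * (1 + k * (1 + a) + h + d * wots)

print({name: sig_bytes(**p) for name, p in PARAMS.items()})
# {'128s': 7856, '128f': 17088} -- matching the signature sizes in FIPS 205
```

Tweaking h, d, a, and k trades signing time against size: the “f” sets use many shallow hypertree layers for speed, at more than twice the signature size of “s”.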

So, when we say OpenTitan has SPHINCS+, we need to be a little more specific; the first round of chips supported the shake-128s parameter set, meaning that the hash function is SHAKE, the security level is equivalent to AES-128, and the remaining parameters are tuned for small signatures at the expense of signing speed (3). We chose the AES-128 security level to match our existing classical signature verification. For a lattice-based scheme, it might make sense to go a level up, but since SPHINCS+ is hash-based, our risk assessment concluded it wasn’t necessary in this case. For firmware signing, the “s” small-signature parameter sets are clearly better suited than the fast-signing “f” parameter sets; “s” has faster signature verification time as well as smaller signatures, and since signing happens infrequently, there’s no problem with waiting a bit longer to generate signatures.

The next tapeout of Earl Grey chips will support the sha2-128s parameter set; the same settings, except with SHAKE swapped out for SHA-2. OpenTitan has hardware accelerators for both hash functions (the KMAC block for SHAKE and other Keccak-family functions, and the HMAC block for SHA-2), so either option works well. For signing or key generation, when secret values are involved, it would definitely make more sense to use SHAKE, because the KMAC block has masking measures to protect against physical side-channel attacks and the HMAC block does not. However, since verification doesn’t handle secret values, the lack of masking measures actually becomes an advantage: SHA-2 operations run slightly faster than SHAKE on OpenTitan because they don’t include overhead from masking. We also considered that it might be easier to interoperate with code-signing infrastructure using SHA-2 than SHAKE, since SHAKE is newer and not everything yet supports it. In the future, both hash functions may be supported.
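The hash swap itself is conceptually small. As a hedged sketch: the SHAKE parameter sets in FIPS 205 use SHAKE256 with an n-byte output, while the SHA-2 sets at this security level build their hashes from SHA-256 truncated to n bytes (the real constructions also mix in seeds and addresses, omitted here).

```python
import hashlib

def h_shake(msg: bytes, n: int = 16) -> bytes:
    # SHAKE is an extendable-output function: request exactly n bytes.
    return hashlib.shake_256(msg).digest(n)

def h_sha2(msg: bytes, n: int = 16) -> bytes:
    # SHA-256 has a fixed 32-byte output; the 128-bit sets truncate to n.
    return hashlib.sha256(msg).digest()[:n]
```

On OpenTitan these two would be routed to the KMAC and HMAC blocks respectively; in software, `hashlib` serves both.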

Why SPHINCS+?

In addition to SPHINCS+, NIST is standardizing two other new signature algorithms: Falcon and Dilithium. (Dilithium has already been released as ML-DSA in FIPS 204; the Falcon standard is not yet published.) There are also LMS and XMSS, stateful hash-based signature schemes that have been standardized for some time. So, out of all of these options, why does SPHINCS+ make sense for OpenTitan?

First, let’s address the stateful hash-based signature schemes, LMS and XMSS. On the surface, they seem to have generally better stats than SPHINCS+ 128s. The LMS parameter set (n=24, h=20, w=4), for example, has signatures about ¼ the size of 128s, and probably would verify signatures about twice as fast on OpenTitan. Furthermore, these schemes are hash-based, so they are about as safe from new cryptanalytic breaks as SPHINCS+ is. However, there’s one big catch: the “stateful” part. LMS and XMSS maintain a set of one-time-use keys, and must remember which ones they’ve already used. If you ever sign twice without changing the state, the security guarantees immediately break down. This poses a fair amount of operational risk. For example, backing up and restoring a stateful private key must be done very carefully; a signature between the backup and the restore could mean game over. In this case, the OpenTitan project decided we’d rather deal with large signatures than accept additional complexity for the signing infrastructure, but this is ultimately a judgment call. You can find more discussion of the risks and mitigation techniques for stateful signatures in IETF RFC 8554 and the public comments for the LMS/XMSS NIST standard.
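A toy model (all names hypothetical, not real LMS/XMSS code) of where that operational risk comes from: a stateful private key carries the index of the next unused one-time key, and a routine backup/restore cycle can silently rewind it.

```python
class StatefulSigner:
    """Toy stand-in for an LMS/XMSS-style signer with a one-time-key index."""

    def __init__(self, total_keys: int):
        self.next_index = 0          # state that must NEVER move backwards
        self.total_keys = total_keys

    def sign(self, msg: bytes) -> int:
        if self.next_index >= self.total_keys:
            raise RuntimeError("one-time keys exhausted")
        i = self.next_index
        self.next_index += 1         # advance state before releasing the signature
        return i                     # a real scheme returns a signature bound to key i

    def snapshot(self) -> int:       # naive backup: copies the index too
        return self.next_index

    def restore(self, state: int) -> None:
        self.next_index = state      # restoring after a sign will reuse a key!

signer = StatefulSigner(total_keys=2**20)
backup = signer.snapshot()
first = signer.sign(b"release-1.0")
signer.restore(backup)               # e.g. disaster recovery from an old backup
second = signer.sign(b"release-1.1")
assert first == second               # same one-time key used twice: security broken
```

SPHINCS+ avoids this whole class of failure by being stateless: signing never modifies the private key.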

So what about Falcon and Dilithium? The bottom line is that, given the current Earl Grey hardware and OpenTitan’s security requirements, these algorithms would be somewhat slower and riskier than SPHINCS+, and the reduction in public key + signature size they would offer is not game-changing enough to justify those tradeoffs. (For the future, we are optimistic about hardware modifications that would accelerate lattice cryptography on OpenTitan, which we’ll discuss in more detail in a later post.)

Falcon and Dilithium are based on newer, lattice-based cryptography rather than hash functions. This means that there’s a higher risk of new attacks that would weaken their security bounds or potentially break them completely. It’s not a hypothetical concern; this is exactly what happened to SIKE, an isogeny-based scheme that had advanced to the final stages of the NIST competition and withstood years of analysis. Given the long timescales of hardware and the fact that the signature scheme can’t ever be updated, this is problematic. We could, however, minimize the risk by using security levels one step higher than we strictly need, meaning Falcon-1024 or Dilithium3 (aka ML-DSA-65) instead of Falcon-512 or Dilithium2. This is in line with the recommendation of the Dilithium authors themselves.

Even with these beefier parameter sets, we’d get a substantial reduction in public key + signature size compared to SPHINCS+, which is at about 8kB, versus 5kB for Dilithium3 and 3kB for Falcon-1024. Dilithium and Falcon public keys are too large to store in the chip’s OTP like we do for ECDSA and SPHINCS+, but we could get around this issue by hashing the public key, storing only the hash, and passing the full public key along with the signature. Therefore, it makes sense to look at the combined public key + signature size here to understand the amount of data we’d need to include with the signature in practice. Any of those numbers takes a significant chunk away from the space we have for the code we’re signing, but SPHINCS+ is significantly larger than the lattice-based schemes. So signature size is definitely a point in favor of Dilithium and Falcon.
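Here is a sketch (function names hypothetical) of the hash-the-public-key workaround described above: the OTP holds only a fixed-size digest, and the boot flow authenticates the full key, delivered alongside the signed image, before using it.

```python
import hashlib

def provision_otp(public_key: bytes) -> bytes:
    """At manufacturing time: burn only a 32-byte digest of the key into OTP."""
    return hashlib.sha256(public_key).digest()

def boot_verify(otp_digest: bytes, image: bytes, public_key: bytes,
                signature: bytes, verify) -> bool:
    """At boot: authenticate the attached key, then verify the image normally.

    `verify` stands in for the actual signature check (e.g. a Dilithium or
    Falcon verifier); it only runs once the key itself is trusted.
    """
    # Step 1: check the full public key against the fused digest.
    if hashlib.sha256(public_key).digest() != otp_digest:
        return False
    # Step 2: ordinary signature verification with the now-trusted key.
    return verify(public_key, image, signature)

digest = provision_otp(b"example-public-key")
accept = lambda pk, msg, sig: True   # placeholder verifier for illustration only
assert boot_verify(digest, b"firmware", b"example-public-key", b"sig", accept)
assert not boot_verify(digest, b"firmware", b"attacker-key", b"sig", accept)
```

This keeps the OTP footprint constant regardless of the scheme, at the cost of shipping the full public key with every signed image, which is why the combined public key + signature size is the relevant metric.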

Performance is a bit tricky to estimate, but we can get a rough idea from the pqm4 project’s benchmarks for ARM Cortex M4. Thanks to the pqm4 authors for making these incredibly easy to access and interpret! OpenTitan’s Ibex core is vaguely similar to the Cortex M4 in that it’s a memory-constrained 32-bit processor. However, there are some major differences: for example, Ibex doesn’t have floating-point instructions. This is more important for Falcon than Dilithium, so our estimates are more certain for the latter. With that disclaimer, the most relevant benchmarks are reproduced here, alongside OpenTitan’s SPHINCS+ measurements:

Let’s break this down a bit. The “cycles” column records the CPU time needed for signature verification. The “hash %” column is the amount of time spent on hashing; in the case of both Dilithium and Falcon, the hashing is SHAKE. This column gives us some insight into how much we can speed up the implementation on OpenTitan by using the SHAKE accelerator, compared to a platform without hash acceleration. So, even though the Dilithium runtimes are slower, we have a bit more leeway to speed them up than we do for Falcon. With SPHINCS+, since the vast majority of runtime is hashing, we can get really dramatic speedups.

The last two columns are important factors for Earl Grey’s memory-constrained environment. The “memory” column records how much stack space the implementation needs, and the “code size” column records the amount of space needed to store the code itself. Unfortunately, pqm4’s benchmarks don’t include a code size metric for the verify routine on its own. Still, we can get a rough idea of where the dragons lie. For example, since our ROM is only 32kB, we can make an educated guess that the Falcon implementations might be difficult for us to fit. These memory metrics are definitely a point in favor of SPHINCS+.

We can – again, very roughly – estimate the speedup with some back-of-the-envelope linear equations. If we make the approximation that the non-hash components of the implementation will perform similarly, we can use the known values for SPHINCS+ to solve for the difference in hash performance:

t_ot = n + h·s_ot
t_m4 = n + h·s_m4

In the above equations, t_ot and t_m4 are the total number of cycles for OpenTitan’s Earl Grey and the Cortex M4 respectively, n is the time spent on non-hashing operations, h is the number of hashing operations, and s_ot and s_m4 are the average times taken for a hashing operation on each platform. If we know the total cycles on both platforms, and we know the ratio of time that’s spent on hashing for the Cortex M4 (which lets us solve for n), we can derive the value of (s_ot / s_m4), a measurement of how long the average Earl Grey hashing operation takes compared to the Cortex M4 version. Applying these estimates to the known SPHINCS+ shake-128s numbers tells us that the hardware-accelerated SHAKE takes on average 3% of the time that the software SHAKE does, and gives us a ballpark estimate of around 1.5 million cycles for “clean” Dilithium3 and 3.4 million cycles for the “m4stack” variant. Falcon-1024 comes out to 1.2 million cycles for “clean” and 600K cycles for “m4-ct”.
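The arithmetic is easy to sanity-check in code. This sketch uses made-up inputs (the real figures come from the benchmark table above, not from anything hard-coded here) to solve the two linear relations for the hash-speed ratio and then reuse it to predict a cycle count:

```python
def hash_speed_ratio(t_ot: float, t_m4: float, hash_frac_m4: float) -> float:
    """Solve t = n + h*s on both platforms for s_ot / s_m4.

    hash_frac_m4 is the fraction of Cortex M4 cycles spent hashing, so
    n = t_m4 * (1 - hash_frac_m4) and h * s_m4 = t_m4 * hash_frac_m4.
    """
    n = t_m4 * (1 - hash_frac_m4)      # shared non-hash cycle count
    hs_m4 = t_m4 * hash_frac_m4        # total hashing cycles on the M4
    hs_ot = t_ot - n                   # total hashing cycles on Earl Grey
    return hs_ot / hs_m4               # equals s_ot / s_m4 (h cancels out)

def estimate_ot_cycles(t_m4: float, hash_frac_m4: float, ratio: float) -> float:
    """Predict Earl Grey cycles for another scheme, reusing the hash ratio."""
    return t_m4 * (1 - hash_frac_m4) + t_m4 * hash_frac_m4 * ratio

# Hypothetical example: a scheme taking 10M cycles on the M4 with 90% of
# time spent hashing, on hardware whose SHAKE runs at 3% of software cost:
est = estimate_ot_cycles(10e6, 0.90, 0.03)   # 1M non-hash + 0.27M hash cycles
```

The same two functions round-trip: feeding the estimate back into `hash_speed_ratio` recovers the 3% ratio.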

Because the estimate makes some sweeping assumptions about the similarity of “Earl Grey” and Cortex M4, and the hashing operations for the different schemes, we shouldn’t interpret this as anything more precise than a ballpark estimate. Still, the rough numbers don’t give us reason to believe that Falcon or Dilithium verification would run much faster than SPHINCS+ on our current hardware, and they might even run slower, especially when we consider that code size might disqualify “m4-ct” Falcon. This makes sense, simply because Earl Grey is currently better at accelerating hash-based computations than lattice-based ones. For example, there are currently no vector instructions on Earl Grey, which are very handy for lattice cryptography.

So, in summary, we expect that on current Earl Grey hardware:

  • speed comes out slightly in favor of SPHINCS+

  • signature size is better with Dilithium and Falcon

  • code size + stack usage is lower with SPHINCS+

  • cryptanalytic risk is lower with SPHINCS+

Since the SPHINCS+ signature size of 8kB is in the high-but-manageable range for today’s Earl Grey chips, SPHINCS+ is more suited to this use-case. This doesn’t mean we’re not excited about lattice cryptography – quite the opposite! But for now, for this purpose, SPHINCS+ just makes the most sense.

That’s all for part 1! In part 2, we’ll focus more on implementation and organizational details: how the code actually landed in time for tapeout and how OpenTitan’s RFC process works for big changes like this. Then in part 3, we’ll focus on the future: how the tradeoff space discussed in this post may change with better lattice cryptography support and new SPHINCS+ parameter sets.


(1) The NIST standard, FIPS 205, renames the algorithm to SLH-DSA, but we’ll refer to it as SPHINCS+ for this post. Our implementation is compatible with round-3 SPHINCS+ for last year’s chips and will be compatible with FIPS 205 for future versions.

(2) OpenTitan is a large open-source project stewarded by lowRISC on which many organizations, including zeroRISC, collaborate.

(3) This is also called “L1” in the context of the NIST competition. The reason to say roundabout things like “equivalent to AES-128” or “L1” instead of “128 bits of security” is Grover’s algorithm, which in theory halves the effective security level of a brute-force key search on a quantum computer. However, it’s questionable whether Grover weakens the security bound significantly in practice, so saying “equivalent to AES-128” lets everyone set that debate aside: whatever quantum brute-force does to AES-128, it does equally to schemes at this level.
