Landing SPHINCS+ on OpenTitan

Sep 3

This is part 2 of 3 in an experience report about implementing SPHINCS+ (aka SLH-DSA) for secure boot in OpenTitan root of trust (RoT) chips (1, 2). SPHINCS+ is a post-quantum secure signature algorithm and one of the four winners of NIST’s post-quantum cryptography competition; the final standard was recently released as FIPS 205. On this exciting occasion, we hope that sharing our experience with SPHINCS+ will help others who are considering migrating to PQC, especially for firmware signing. Read part 1 here.

This post will focus on the implementation and organizational aspects of adopting SPHINCS+ for OpenTitan’s current-generation “Earl Grey” chips. We’ll cover:

how the idea to use SPHINCS+ for secure boot on Earl Grey came about,
the process of preparing an RFC and getting it approved within OpenTitan, and
how we adapted and optimized the reference implementation to suit Earl Grey.

This will probably be the most detailed and longest blog post in the series; I think a really cool part of working on an open-source project is that we leave a public trail of discussions we’ve had and code we’ve merged, and I want readers to be able to follow that trail. So I’ll link liberally to commits and GitHub issues throughout this post. I also want to draw attention to the non-technical infrastructure I discuss here, especially the RFC process (part of the general “Silicon Commons” approach to collaborative open-source hardware projects) and how OpenTitan development works across multiple organizations.

Idea and Prototyping

The original suggestion to try running SPHINCS+ on OpenTitan came from Peter Schwabe, one of the original authors of SPHINCS+, following a serendipitous meeting at CHES 2022. He suggested that we could probably run the reference implementation efficiently on our platform without any hardware changes. With his help, we were able to strip out the non-verification code, replace the software hash function with calls to our hardware implementation, set up a test case, and successfully verify a signature in hardware simulation the next day. (You can see the excited commit message where I wrote “WORKING!” – I remember clearly the moment when I saw “verification passed” print on my screen for the first time.) The availability of an optimized, public-domain reference implementation was crucial here; it made getting the first prototype off the ground a breeze, and allowed us to continually check our implementation against reference test vectors throughout the whole development cycle to make sure we didn’t introduce bugs.

After that first signature, we spent a few weeks optimizing the implementation and experimenting with different parameter sets. You can see the whole trail of those experiments recorded on my prototyping branch: jadephilipoom/opentitan:sphincsplus. We were able to find a 3x speedup in verification time overall with platform-specific optimizations, mostly by adjusting the KMAC block driver and assuming word-aligned buffers everywhere. The reference implementation has to assume that some internal buffers might be byte-aligned, but in OpenTitan’s ROM code there was an early design decision that absolutely everything should be word-aligned, and it’s OK to assume so. In my opinion, this decision has paid off extremely well, with benefits not just for performance but also for defense against power side-channels when handling secret data. (Byte-writes are generally more vulnerable than word-writes in that context, because there are fewer possibilities for the attacker to choose between.)

The benchmarks are all documented on the prototyping branch, along with instructions to reproduce them if you’re curious. We compared all the parameter sets at first but quickly realized that we wanted to target the shake-128s-simple parameter set, so the benchmarks focused on that. Below is the summary of what we tried during the initial prototyping and the effect on performance for shake-128s-simple:

You can find (or create!) benchmarks for other parameter sets by checking out the branch and running more tests.

It’s worth noting that the final version of SPHINCS+ in ROM today is even faster; we were able to bring shake-128s-simple down to just under 13 ms, mostly with more word-alignment. After recently switching to SHA-2 parameters (see issue #23144 and pull request #23732) for the upcoming Earl Grey chips, the verification takes about 9.3 ms.

These experiments showed that SPHINCS+ was fast enough to run as an option for secure boot. It was still about 6x slower than RSA-3072 and 3x slower than ECDSA-P256, so we wouldn’t want to force all users of the chip to run it. However, we could potentially add a field to the chip’s one-time programmable (OTP) memory configuration to let us choose to enable post-quantum secure boot for certain chips at manufacturing time, an option that was suddenly looking feasible much earlier than expected.

Around the time we concluded the optimization experiments, we started evaluating what it would take to land this code in the very first Earl Grey tapeout. The timing was ambitious. This was late November 2022, and we had already scheduled a tapeout for the first chips. Based on that schedule, we needed to lock in the ROM implementation, with no further changes, by June 2023. Seven months to put a working prototype into production use might sound like plenty of time, but not for silicon where you can’t compile your way out of a problem. There is a lot of work in between passing an initial test and having code that is ready to go into ROM. For example, we needed to decide how the PQC option would interact with the existing classical signature verification, change the boot manifests to accommodate multiple signature types, add SPHINCS+ signing capability to the Rust-based opentitantool utility, adjust all the code to match OpenTitan’s specific style guide, and of course run extensive tests – not just on the signature verification core code but also on its integration with the rest of the ROM.

Before we did any of that, we needed to go through the RFC process and seek approval from the OpenTitan Technical Committee, because this was a major change. By design, big decisions in OpenTitan can’t be made unilaterally. Lots of organizations collaborate to make the project possible. It’s vitally important to the health of the project that decisions are made fairly and transparently, giving everyone a chance to provide feedback on, object to, or adjust new proposals.

Creating an RFC

In December 2022, I presented an RFC to the OpenTitan Technical Committee, explaining the results from the initial prototype and optimization experiments and mapping out the estimated cost in terms of implementation effort, boot time, and code size.

One important decision for the RFC was whether enabling the SPHINCS+ secure boot flow should disable the classical flow. We decided that if SPHINCS+ was enabled, both the classical and PQC flows would run, and the code would need two signatures. Although SPHINCS+ is based on the security of hash functions and is cryptanalytically low-risk, the flow was also completely new, and we didn’t want to risk introducing a new bug and undermining security. Plus, given that SPHINCS+ took quite a bit longer than classical verification, adding the classical verification time on top of it wasn’t much of a relative cost.

Safe Comparison

Another important detail was how to defend the sensitive final comparison against fault injection attacks. Some general principles behind defending against fault injection attacks in software, at a very high level, are:

It’s easier to glitch one bit than several at once, so avoid situations where a single bit-flip can bypass a security check (e.g. a security-critical if/else statement).
For enums, use values with a high Hamming distance from each other, making it hard for an attacker to glitch one value into another one that sends the code down a different path.
Sometimes, attackers can trick the code into reading unexpected values that exist in the logic (for example, by preventing a register from updating). Therefore, ensure any values that mean “passed security check” are not just hanging around in a register; they should be constructed slowly as the code goes through its intended path.
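
To make the Hamming-distance principle concrete, here is a minimal sketch (not code from the OpenTitan repository) that measures how many bits an attacker would need to flip to turn one encoding into another. The encodings and helper names below are illustrative assumptions, not values taken from this post.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def min_pairwise_distance(values) -> int:
    """Smallest Hamming distance between any two distinct encodings."""
    return min(hamming_distance(a, b) for a, b in combinations(values, 2))


# Hypothetical encodings, chosen only for illustration.
NAIVE_BOOL = {"false": 0x0, "true": 0x1}
HARDENED_BOOL = {"false": 0x1D4, "true": 0x739}

# 1: a single bit-flip turns "false" into "true".
print(min_pairwise_distance(NAIVE_BOOL.values()))
# 8: eight specific bits must all be flipped at once.
print(min_pairwise_distance(HARDENED_BOOL.values()))
```

A check like this can run at build time to reject any enum whose values sit too close together.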

As with RSA and ECDSA, SPHINCS+ signature verification is structured so that it does a big computation on some combination of the public key, message, and signature, then checks the resulting value against part of the original input values (in SPHINCS+, the public key root). If the signature is valid, the two should be equal. If you’re an attacker and you want to bypass secure boot, this is the comparison you should target; by causing that single comparison to say “yes” when it should say “no”, you can cause an invalid signature to be accepted. Luckily, we had already implemented a safe comparison for the classical flow (see pull request #10024; fellow formal methods nerds might also appreciate the small proof that shows this algorithm produces the right final result). Now, we just needed to adapt the design to work with both signatures.

The original design worked by taking advantage of the fact that we don’t want to produce a boolean true or false from the comparison. Ultimately, we need a “magic” 32-bit constant, kFlashExec, if the comparison succeeds. That high-Hamming-weight constant (and no other value) would allow us to unlock flash for the next boot stage. This makes it easier to defend the implementation from fault attacks, because we’re never relying on a single bit or branch. We generated pre-computed 32-bit shares of the magic value, meaning values x₀ through x_n-1 such that:

x₀ ⊕ x₁ ⊕ … ⊕ x_n-1 = kFlashExec

We chose the number of shares (n in the equation above) so that the total length of the shares, when concatenated, was equal to the length of the signature values we needed to compare. Then we would start with an all-zero result value. For each pair of corresponding words in the two values being compared, we would XOR (⊕) both words into the result value, along with the next share from the precomputed list. There was also an extra value called “diff” to make sure it was impossible to accidentally arrive at the correct value. It directly checked whether the XOR of the two words was nonzero, and if so (or if the previous diff was nonzero), it set both the result value and the diff to all-ones.
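
The loop described above can be modeled in a few lines. This is only a sketch of the idea in Python, not the ROM code: the constant K_FLASH_EXEC, the helper names, and the example words are all hypothetical, and a real implementation would also need hardening that a high-level model cannot capture.

```python
import secrets

MASK32 = 0xFFFFFFFF
K_FLASH_EXEC = 0xC3A5695A  # hypothetical stand-in for the magic constant


def make_shares(target: int, n: int) -> list[int]:
    """Return n 32-bit words whose XOR equals target."""
    shares = [secrets.randbits(32) for _ in range(n - 1)]
    last = target
    for s in shares:
        last ^= s
    return shares + [last]


def safe_compare(actual: list[int], expected: list[int], shares: list[int]) -> int:
    """Build K_FLASH_EXEC word by word; any mismatch poisons the result."""
    assert len(actual) == len(expected) == len(shares)
    result = 0
    diff = 0
    for a, e, s in zip(actual, expected, shares):
        x = a ^ e
        # Branch-free: diff becomes all-ones once any x is nonzero.
        diff |= x
        diff |= (-diff) & MASK32          # set every bit above the lowest set bit
        diff |= (-(diff >> 31)) & MASK32  # MSB set -> all-ones
        result ^= x ^ s
        result |= diff
    return result


root = [0x11111111, 0x22222222, 0x33333333, 0x44444444]  # made-up words
shares = make_shares(K_FLASH_EXEC, len(root))
tampered = root[:2] + [root[2] ^ 0x1] + root[3:]

print(hex(safe_compare(root, root, shares)))      # the magic value
print(hex(safe_compare(tampered, root, shares)))  # all-ones
```

Because the magic value only appears once every share has been folded in, skipping or repeating a loop iteration leaves the result at some other value.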

We needed to adapt this design so it could generate kFlashExec only if either:

the classical verification passed and SPHINCS+ was disabled, or
both verifications passed and SPHINCS+ was enabled.

The approach we chose was to pick special values A and B, as well as special values for the “enable” and “disable” OTP settings, so that:

Then, instead of using shares for kFlashExec, we would have the classical signature verification routine use shares for A, and the SPHINCS+ comparison use the same strategy with shares for B. We’d XOR the results of these comparisons with the SPHINCS+ enablement field, so that either of the expected cases (but no unacceptable ones) would construct the correct value.
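
One way to read this scheme is sketched below. The relations between the constants are my assumption about how such values could be chosen (A ⊕ disabled = kFlashExec and A ⊕ B ⊕ enabled = kFlashExec); every constant is made up, and the model treats a failed comparison as all-ones and a skipped SPHINCS+ check as zero.

```python
from itertools import product

K_FLASH_EXEC = 0xC3A5695A  # hypothetical magic value
A = 0x1E2D4B78             # hypothetical result of a passing classical check
B = 0x96C3A55A             # hypothetical result of a passing SPHINCS+ check

# Assumed relations: the OTP values are derived so that only the two
# acceptable scenarios XOR to K_FLASH_EXEC.
SPX_DISABLED = A ^ K_FLASH_EXEC
SPX_ENABLED = A ^ B ^ K_FLASH_EXEC

FAIL = 0xFFFFFFFF  # a failed safe comparison yields all-ones
NOT_RUN = 0x0      # SPHINCS+ verification skipped


def combine(classical: int, spx: int, otp_field: int) -> int:
    """XOR both comparison results with the OTP enablement field."""
    return classical ^ spx ^ otp_field


# Enumerate every scenario in this model and keep those that unlock flash.
accepted = [
    (c, s, o)
    for c, s, o in product((A, FAIL), (B, FAIL, NOT_RUN), (SPX_ENABLED, SPX_DISABLED))
    if combine(c, s, o) == K_FLASH_EXEC
]
print(len(accepted))  # only the two intended scenarios survive
```

In this model, skipping SPHINCS+ while the OTP field says “enabled” leaves the result off by B, so it never reaches the magic value.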

With this crucial detail planned out, I was ready to present the RFC to the Technical Committee. In one of the committee’s regular meetings, I presented the document and answered questions. Committee members gave feedback and requested more details in certain sections, for example on the changes to the manifest format. Per normal procedures, they didn’t vote to approve the RFC in that first meeting; rather, the RFC contributors made the changes and TC members deliberated offline. At the next meeting, they held a (successful!) vote approving the proposal.

Initial Implementation

By the time the RFC was done and approved, it was early January 2023; we had less than 6 months before the ROM freeze deadline. We quickly got to work. Starting with adjusting the reference implementation to match the code style conventions in the OpenTitan repository, we slowly merged the prototype implementation with about a dozen pull requests (see for example #17093, #17221, #17295, and #17367), each with a small, digestible chunk of code to undergo review. For now, nothing actually called the code.

As of pull request #17326 in late February, we had a complete implementation and were ready to integrate the code into the boot flow and surrounding infrastructure. We reworked how the ROM code represented keys to handle both classical and SPHINCS+ keys (see pull request #18512), and implemented the safe final comparison from the RFC (see pull request #17995). We added SPHINCS+ support for opentitantool based on the pqcrypto Rust crate (see pull requests #18184 and #18041). Finally, we added a new manifest extensions capability to the format for boot manifests, and incorporated the changes into opentitantool so we could sign images with SPHINCS+ signatures from the command line (see pull requests #18584 and #18667). The opentitantool support also allowed us to run integration tests that checked the signature verification code operated as expected with the whole boot sequence and correctly accepted or rejected the signatures from pqcrypto.

Maintenance and Updates

Since the initial implementation for the first Earl Grey tapeout, there have been a few additional updates and changes to the SPHINCS+ code for OpenTitan that will come into effect for the next tapeout. The old version was compatible with the round 3 SPHINCS+ submission to the NIST competition and used SHAKE as the hash function. The new version will be compatible with the FIPS 205 standard and use SHA-256 as the hash function.

First, NIST made a few changes for the FIPS 205 standard that are not backwards compatible. For example, there was a small endianness change in an internal routine that changed signature values completely. We implemented it in May 2024 (see pull request #22953) in preparation for the next tapeout. This was a somewhat tricky change to make, especially since NIST hadn’t released test vectors for it and the pqcrypto crate we used for opentitantool didn’t yet include it. The reference implementation had added a branch with the endianness change, so we were able to re-generate tests for part of our testing infrastructure that way. But another part of our test infrastructure directly pulled the NIST tests from round 3 of the PQC competition, so we needed to change that as well. As discussed in the comments on pull request #22953, we set up self-hosted test vectors generated from the right branch of the reference implementation to replace the round 3 tests.

We also needed a new way to generate signatures with opentitantool before the changes would be compatible with our integration tests and signing utilities. We discussed the issue at one of the regular Software Working Group meetings. This is a good forum for smaller-scale decisions, where OpenTitan maintainers from different organizations can informally coordinate and seek feedback on engineering decisions. There, we decided to directly link the reference implementation into opentitantool with bindgen rather than, for instance, look for a different Rust crate. This option would give us more future flexibility, including to experiment with alternative parameter sets like we’ll discuss in the next post. We integrated the reference implementation into our infrastructure and generated the bindings so that I could switch opentitantool to use them (see pull requests #23049 and #23104).

There were also two smaller code changes for the new version: domain separator support to match the FIPS 205 standard (see pull requests #23762 and #23765), and a small bugfix from the upstream reference implementation (see pull request #22894). The bug didn’t affect any of the parameter sets that had been submitted as part of the PQC competition, but was problematic when experimenting with different, alternative parameters.

Finally, we changed the parameter set from SHAKE to SHA-2, as I discussed a bit already in the last post. In May 2024, just before a code freeze, we received a time-sensitive request to switch to SHA-2 parameters to maintain compatibility with project partners’ infrastructure. Luckily, we were able to take advantage of the existing organizational and technical infrastructure to evaluate the effort and risk of the change, agree on a plan, and implement and test it in time. First, we wrote an RFC for the change. To help inform the decision, we did some quick performance estimates to check if there would be an impact on verification speed. In general, SHAKE is faster than SHA-2 in hardware. However, our SHAKE implementation is masked for side-channel protection and the SHA-2 implementation isn’t, so SHA-2 runs a little bit faster on our hardware. Also, we had implemented a new save/restore feature for the SHA-2 accelerator hardware since the first tapeout (see pull request #21307), which we could use to accelerate a performance-critical part of SPHINCS+. After the Technical Committee approved the RFC, we updated multiple parts of the test infrastructure to include SHA-2 parameter sets (see pull requests #21681 and #23598), and then implemented the extra bits of code (e.g. MGF1) that we needed and swapped over the implementation (see pull requests #23710 and #23732). Then we went through and, using similar techniques as we had for SHAKE as well as the save/restore feature, optimized the code so that it would run noticeably faster, by a few milliseconds (see pull request #23761). For code size and schedule reasons, we fully swapped over the implementation and don’t support the SHAKE parameter sets as an option in this version. For future OpenTitan chips, we’re considering supporting both.
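
For readers unfamiliar with the two SHA-2 pieces mentioned above, here is a small software sketch using Python’s hashlib. It is not the ROM code. mgf1_sha256 follows the standard MGF1 construction (hash the seed followed by a 4-byte big-endian counter, then concatenate and truncate). SeededSha256 is a hypothetical class that imitates what a hardware save/restore feature makes possible: hash a fixed prefix once, then resume from that saved state for every later call. In the SHA-2 parameter sets of FIPS 205, the public seed is zero-padded to one 64-byte block, which is what makes this kind of reuse natural.

```python
import hashlib


def mgf1_sha256(seed: bytes, length: int) -> bytes:
    """Standard MGF1 mask generation with SHA-256."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


class SeededSha256:
    """Hash a fixed prefix once, then reuse the saved state (software analogy)."""

    def __init__(self, prefix: bytes):
        self._saved = hashlib.sha256(prefix)  # "save" after absorbing the prefix

    def digest(self, suffix: bytes) -> bytes:
        h = self._saved.copy()  # "restore" the saved state
        h.update(suffix)
        return h.digest()


# Made-up 16-byte seed, zero-padded to one 64-byte SHA-256 block.
pk_seed = bytes(range(16))
seeded = SeededSha256(pk_seed + bytes(64 - len(pk_seed)))

print(seeded.digest(b"example input").hex())
print(mgf1_sha256(b"example seed", 40).hex())
```

Each call to digest skips re-hashing the 64-byte prefix, which adds up when the same seed prefixes thousands of short hash calls during one verification.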

Closing Words and Thanks

Like most technical projects, we build on the work of many others; in this case, we benefited greatly from the SPHINCS+ authors making a high-quality reference implementation and test vector generation script available under a permissive open-source license. We strongly believe that accessible, quality implementations are indispensable – in hardware and software alike! So thank you to the SPHINCS+ authors, especially Peter for suggesting that we try running SPHINCS+ and helping us set up the first experiments.

I think of this post as sort of a case study in how a major feature on OpenTitan chips was introduced, accepted, and maintained. It takes a village to tape out a chip, and landing this feature required tons of expertise and hard work from people with different specialties (and frequently different employers!). Heartfelt thanks to all of the contributors on the OpenTitan project who wrote design docs, pushed code, reviewed PRs, and adjusted infrastructure to make this possible: Alphan, Jon, Ryan, Chris, and many more.

Stay tuned for the third and final post in this series, where we’ll focus on exciting future possibilities for PQC on OpenTitan: alternative SPHINCS+ parameter sets and lattice cryptography.

Interested in learning more? Sign up for our early access program or contact us at info@zerorisc.com.

(1) The NIST standard, FIPS 205, renames the algorithm to SLH-DSA, but we’ll refer to it as SPHINCS+ for this post. Our implementation is compatible with round-3 SPHINCS+ for last year’s chips and will be compatible with FIPS 205 for future versions.

(2) OpenTitan is a large open-source project stewarded by lowRISC on which many organizations, including zeroRISC, collaborate.

Jade Philipoom