Paper 2024/1649

Multiplying Polynomials without Powerful Multiplication Instructions (Long Paper)

Vincent Hwang, Max Planck Institute for Security and Privacy
YoungBeom Kim, Kookmin University
Seog Chung Seo, Kookmin University
Abstract

We improve the performance of lattice-based cryptosystems Dilithium on Cortex-M3 with expensive multiplications. Our contribution is two-fold: (i) We generalize Barrett multiplication and show that the resulting shape-independent modular multiplication performs comparably to long multiplication on some platforms without special hardware when precomputation is free. We call a modular multiplication “shape-independent” if its correctness and efficiency depend only on the magnitude of moduli and not the shapes of the moduli. This was unknown in the literature even though modular multiplication has been studied for more than 40 years. In the literature, shape-independent modular multiplications often perform several times slower than long multiplications even if we ignore the cost of the precomputation. (ii) We show that polynomial multiplications based on Nussbaumer fast Fourier transform and Toom–Cook over $\mathbb{Z}_{2^k}$ perform the best when modular multiplications are expensive and $k$ is not very close to the arithmetic precision. For practical evaluation, we implement assembly programs for the polynomial arithmetic used in the digital signature Dilithium on Cortex-M3. For the modular multiplications in Dilithium, our generalized Barrett multiplications are 1.92 times faster than the state-of-the-art assembly-optimized Montgomery multiplications, leading to 1.38−1.51 times faster Dilithium NTT/iNTT. Along with the improvement in accumulating products, the core polynomial arithmetic matrix-vector multiplications are 1.71−1.77 times faster. We further apply the FFT-based polynomial multiplications over $\mathbb{Z}_{2^k}$ to the challenge polynomial multiplication $c t_0$, leading to 1.31 times faster computation for $c t_0$. We additionally apply the ideas to Saber on Cortex-M3 and demonstrate their improvement to Dilithium and Saber on our 8-bit AVR environment. For Saber on Cortex-M3, we show that matrix-vector multiplications with FFT-based polynomial multiplications over $\mathbb{Z}_{2^k}$ are 1.42−1.46 faster than the ones with NTT-based polynomial multiplications over NTT-friendly coefficient rings. When moving to a platform with smaller arithmetic precision, such as 8-bit AVR, we improve the matrix-vector multiplication of Dilithium with our Barrett-based NTT/iNTT by a factor of 1.87−1.89. As for Saber on our 8-bit AVR environment, we show that matrix-vector multiplications with NTT-based polynomial multiplications over NTT-friendly coefficient rings are faster than polynomial multiplications over $\mathbb{Z}_{2^k}$ due to the large $k$ in Saber.

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Published by the IACR in TCHES 2025
Keywords
Lattice-based cryptographyDilithiumSaberBarrett multiplicationMicrocontrollerNussbaumer FFTToom–Cook
Contact author(s)
vincentvbh7 @ gmail com
darania @ kookmin ac kr
scseo @ kookmin ac kr
History
2024-10-14: approved
2024-10-13: received
See all versions
Short URL
https://ia.cr/2024/1649
License
No rights reserved
CC0

BibTeX

@misc{cryptoeprint:2024/1649,
      author = {Vincent Hwang and YoungBeom Kim and Seog Chung Seo},
      title = {Multiplying Polynomials without Powerful Multiplication Instructions (Long Paper)},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/1649},
      year = {2024},
      url = {https://eprint.iacr.org/2024/1649}
}
Note: In order to protect the privacy of readers, eprint.iacr.org does not use cookies or embedded third party content.