[en] This thesis studies techniques for the efficient implementation of public-key cryptography on resource-constrained computing devices such as wireless sensor nodes. We focus on three different families of public-key cryptosystems, namely RSA, elliptic curve schemes (e.g. ECDH), as well as a lattice-based encryption algorithm using the ring variant of the Learning With Errors problem (Ring-LWE).
In the context of efficient RSA implementation, this thesis first describes the Reverse Product Scanning (RPS) technique for multi-precision multiplication and squaring on 8-bit AVR processors. Then, we analyze and compare six algorithms for software implementation of Montgomery multiplication and shed some new light on the relative performance of the different variants. Finally, we present a highly-optimized 1024-bit RSA software that achieves record-setting execution times on 8-bit ATmega processors thanks to a combination of sophisticated implementation options including the Chinese Remainder Theorem (CRT), the m-ary exponentiation method, as well as the RPS multiplication technique.
The main contribution of this thesis lies in the area of Elliptic Curve Cryptography (ECC), where we describe three different implementations for wireless sensor nodes equipped with 8 and 16-bit processors. First, we present several techniques to improve the performance of prime-field arithmetic operations (e.g. optimized operand caching, sliding block doubling, and Karatsuba block squaring) and report the fastest implementation of key exchange and signature generation/verification based on the NIST P-192 curve. Second, we introduce an efficient yet highly-regular library for arithmetic in Optimal Prime Fields (OPFs) and propose a novel approach for the implementation of ephemeral ECDH key exchange that exploits the birational equivalence of Montgomery and twisted Edwards curves. Third, we describe an algorithm for the
generation of so-called MoTE curves that allows one to use the fastest formulae for point addition/doubling in both the Montgomery form and the twisted Edwards form. Finally, we describe efficient ECC implementations for 159 and 191-bit MoTE curves that achieve record-setting execution times for scalar multiplication on a 16-bit MSP430 processor.
The third part of this thesis deals with the efficient implementation of Ring-LWE encryption and proposes novel approaches to speed up the Number Theoretic Transform (NTT) used for polynomial multiplication. Our contributions in this area include the Move-and-Add (MA) method for coefficient multiplication, the Shift-Add-Multiply-Subtract-Subtract (SAMS2) technique for modular reduction, and an optimized variant for Montgomery multiplication. Furthermore, we describe an optimized coefficient storage method to minimize the RAM footprint of NTT multiplications. Finally, we propose a special byte-scanning technique to reduce the execution time of discrete Gaussian sampling based on the Knuth-Yao random walk algorithm.