Browse ORBi

- What it is and what it isn't
- Green Road / Gold Road?
- Ready to Publish. Now What?
- How can I support the OA movement?
- Where can I learn more?

ORBi

Parallel Implementation of SM2 Elliptic Curve Cryptography on Intel Processors with AVX2 ; ; et al in Liu, Joseph K.; Cui, Hui (Eds.) Information Security and Privacy, 25th Australasian Conference, ACISP 2020, Perth, WA, Australia, November 30 - December 2, 2020, Proceedings (2020, November) This paper presents an efficient and secure implementation of SM2, the Chinese elliptic curve cryptography standard that has been adopted by the International Organization of Standardization (ISO) as ISO ... [more ▼] This paper presents an efficient and secure implementation of SM2, the Chinese elliptic curve cryptography standard that has been adopted by the International Organization of Standardization (ISO) as ISO/IEC 14888-3:2018. Our SM2 implementation uses Intel’s Advanced Vector Extensions version 2.0 (AVX2), a family of three-operand SIMD instructions operating on vectors of 8, 16, 32, or 64-bit data elements in 256-bit registers, and is resistant against timing attacks. To exploit the parallel processing capabilities of AVX2, we studied the execution flows of Co-Z Jacobian point arithmetic operations and introduce a parallel 2-way Co-Z addition, Co-Z conjugate addition, and Co-Z ladder algorithm, which allow for fast Co-Z scalar multiplication. Furthermore, we developed an efficient 2-way prime-field arithmetic library using AVX2 to support our Co-Z Jacobian point operations. Both the field and the point operations utilize branch-free (i.e. constant-time) implementation techniques, which increase their ability to resist Simple Power Analysis (SPA) and timing attacks. Our software for scalar multiplication on the SM2 curve is, to our knowledge, the first constant-time implementation of the Co-Z based ladder that leverages the parallelism of AVX2. [less ▲] Detailed reference viewed: 39 (2 UL)Fast and Flexible Elliptic Curve Cryptography for Dining Cryptographers Networks Dupont, Briag Gerard Benjamin ; Franck, Christian ; Groszschädl, Johann in Bouzefrane, Samia; Laurent, Maryline; Boumerdassi, Selma (Eds.) et al Mobile, Secure, and Programmable Networking, 6th International Conference, MSPN 2020, Paris, France, October 28–29, 2020, Revised Selected Papers (2020, October) A Dining Cryptographers network (DCnet for short) allows anonymous communication with sender and receiver untraceability even if an adversary has unlimited access to the connection metadata of the network ... [more ▼] A Dining Cryptographers network (DCnet for short) allows anonymous communication with sender and receiver untraceability even if an adversary has unlimited access to the connection metadata of the network. Originally introduced by David Chaum in the 1980s, DCnets were for a long time considered not practical for real-world applications because of the tremendous communication and computation overhead they introduce. However, technological innovations such as 5G networks and extremely powerful 64-bit processors make a good case to reassess the practicality of DCnets. In addition, recent advances in elliptic-curve based commitment schemes and Zero-Knowledge Proofs (ZKPs) provide a great opportunity to reduce the computational cost of modern DCnets that are able to detect malicious behavior of communicating parties. In this paper we introduce X64ECC, a self-contained library for Elliptic Curve Cryptography (ECC) developed from scratch to support all the public-key operations needed by modern DCnets: key exchange, digital signatures, Pedersen commitments, and ZKPs. X64ECC is written in C and uses compiler intrinsics to speed up performance-critical arithmetic operations. It is highly scalable and works with Montgomery curves and twisted Edwards curves of different cryptographic strength. Despite its high scalability and portability, X64ECC is able to compute a fixed-base scalar multiplication on a twisted Edwards curve over a 255-bit prime field in about 145,000 clock cycles on a modern Intel X64 processor. All cryptosystems can be adapted on-the-fly (i.e. without recompilation) to implement DCnets with arbitrary message sizes, and tradeoffs between the cryptographic strength and throughput of a DCnet are possible. [less ▲] Detailed reference viewed: 38 (5 UL)Alzette: A 64-Bit ARX-box (Feat. CRAX and TRAX) ; Biryukov, Alex ; Cardoso Dos Santos, Luan et al in Micciancio, Daniele; Ristenpart, Thomas (Eds.) Advances in Cryptology -- CRYPTO 2020, 40th Annual International Cryptology Conference, CRYPTO 2020, Santa Barbara, CA, USA, August 17-21, 2020, Proceedings, Part III (2020, August) S-boxes are the only source of non-linearity in many symmetric primitives. While they are often defined as being functions operating on a small space, some recent designs propose the use of much larger ... [more ▼] S-boxes are the only source of non-linearity in many symmetric primitives. While they are often defined as being functions operating on a small space, some recent designs propose the use of much larger ones (e.g., 32 bits). In this context, an S-box is then defined as a subfunction whose cryptographic properties can be estimated precisely. We present a 64-bit ARX-based S-box called Alzette, which can be evaluated in constant time using only 12 instructions on modern CPUs. Its parallel application can also leverage vector (SIMD) instructions. One iteration of Alzette has differential and linear properties comparable to those of the AES S-box, and two are at least as secure as the AES super S-box. As the state size is much larger than the typical 4 or 8 bits, the study of the relevant cryptographic properties of Alzette is not trivial. We further discuss how such wide S-boxes could be used to construct round functions of 64-, 128- and 256-bit (tweakable) block ciphers with good cryptographic properties that are guaranteed even in the related-tweak setting. We use these structures to design a very lightweight 64-bit block cipher (Crax) which outperforms SPECK-64/128 for short messages on micro-controllers, and a 256-bit tweakable block cipher (Trax) which can be used to obtain strong security guarantees against powerful adversaries (nonce misuse, quantum attacks). [less ▲] Detailed reference viewed: 137 (16 UL)Lightweight AEAD and Hashing using the Sparkle Permutation Family Beierle, Christof ; Biryukov, Alex ; Cardoso Dos Santos, Luan et al in IACR Transactions on Symmetric Cryptology (2020), 2020(S1), 208-261 We introduce the Sparkle family of permutations operating on 256, 384 and 512 bits. These are combined with the Beetle mode to construct a family of authenticated ciphers, Schwaemm, with security levels ... [more ▼] We introduce the Sparkle family of permutations operating on 256, 384 and 512 bits. These are combined with the Beetle mode to construct a family of authenticated ciphers, Schwaemm, with security levels ranging from 120 to 250 bits. We also use them to build new sponge-based hash functions, Esch256 and Esch384. Our permutations are among those with the lowest footprint in software, without sacrificing throughput. These properties are allowed by our use of an ARX component (the Alzette S-box) as well as a carefully chosen number of rounds. The corresponding analysis is enabled by the long trail strategy which gives us the tools we need to efficiently bound the probability of all the differential and linear trails for an arbitrary number of rounds. We also present a new application of this approach where the only trails considered are those mapping the rate to the outer part of the internal state, such trails being the only relevant trails for instance in a differential collision attack. To further decrease the number of rounds without compromising security, we modify the message injection in the classical sponge construction to break the alignment between the rate and our S-box layer. [less ▲] Detailed reference viewed: 70 (13 UL)A Lightweight Implementation of NTRU Prime for the Post-Quantum Internet of Things Cheng, Hao ; ; Groszschädl, Johann et al in Laurent, Maryline; Giannetsos, Thanassis (Eds.) Information Security Theory and Practice, 13th IFIP WG 11.2 International Conference, WISTP 2019, Paris, France, December 11–12, 2019, Proceedings (2019, December) The dawning era of quantum computing has initiated various initiatives for the standardization of post-quantum cryptosystems with the goal of (eventually) replacing RSA and ECC. NTRU Prime is a variant of ... [more ▼] The dawning era of quantum computing has initiated various initiatives for the standardization of post-quantum cryptosystems with the goal of (eventually) replacing RSA and ECC. NTRU Prime is a variant of the classical NTRU cryptosystem that comes with a couple of tweaks to minimize the attack surface; most notably, it avoids rings with "worrisome" structure. This paper presents, to our knowledge, the first assembler-optimized implementation of Streamlined NTRU Prime for an 8-bit AVR microcontroller and shows that high-security lattice-based cryptography is feasible for small IoT devices. An encapsulation operation using parameters for 128-bit post-quantum security requires 8.2 million clock cycles when executed on an 8-bit ATmega1284 microcontroller. The decapsulation is approximately twice as costly and has an execution time of 15.6 million cycles. We achieved this performance through (i) new low-level software optimization techniques to accelerate Karatsuba-based polynomial multiplication on the 8-bit AVR platform and (ii) an efficient implementation of the coefficient modular reduction written in assembly language. The execution time of encapsulation and decapsulation is independent of secret data, which makes our software resistant against timing attacks. Finally, we assess the performance one could theoretically gain by using a so-called product-form polynomial as part of the secret key and discuss potential security implications. [less ▲] Detailed reference viewed: 186 (28 UL)FELICS-AEAD: Benchmarking of Lightweight Authenticated Encryption Algorithms Cardoso Dos Santos, Luan ; Groszschädl, Johann ; Biryukov, Alex in Belaïd, Sonia; Güneysu, Tim (Eds.) Smart Card Research and Advanced Applications, 18th International Conference, CARDIS 2019, Prague, Czech Republic, November 11–13, 2019, Revised Selected Papers (2019, November) Cryptographic algorithms that can simultaneously provide both encryption and authentication play an increasingly important role in modern security architectures and protocols (e.g. TLS v1.3). Dozens of ... [more ▼] Cryptographic algorithms that can simultaneously provide both encryption and authentication play an increasingly important role in modern security architectures and protocols (e.g. TLS v1.3). Dozens of authenticated encryption systems have been designed in the past five years, which has initiated a large body of research in cryptanalysis. The interest in authenticated encryption has further risen after the National Institute of Standards and Technology (NIST) announced an initiative to standardize "lightweight" authenticated ciphers and hash functions that are suitable for resource-constrained devices. However, while there already exist some cryptanalytic results on these recent designs, little is known about their performance, especially when they are executed on small 8, 16, and 32-bit microcontrollers. In this paper, we introduce an open-source benchmarking tool suite for a fair and consistent evaluation of Authenticated Encryption with Associated Data (AEAD) algorithms written in C or assembly language for 8-bit AVR, 16-bit MSP430, and 32-bit ARM Cortex-M3 platforms. The tool suite is an extension of the FELICS benchmarking framework and provides a new AEAD-specific low-level API that allows users to collect very fine-grained and detailed results for execution time, RAM consumption, and binary code size in a highly automated fashion. FELICS-AEAD comes with two pre-defined evaluation scenarios, which were developed to resemble security-critical operations commonly carried out by real IoT applications to ensure the benchmarks are meaningful in practice. We tested the AEAD tool suite using five authenticated encryption algorithms, namely AES-GCM and the CAESAR candidates ACORN, ASCON, Ketje-Jr, and NORX, and present some preliminary results. [less ▲] Detailed reference viewed: 128 (19 UL)Triathlon of Lightweight Block Ciphers for the Internet of Things Dinu, Dumitru-Daniel ; Le Corre, Yann ; Khovratovich, Dmitry et al in Journal of Cryptographic Engineering (2019), 9(3), 283-302 In this paper, we introduce a framework for the benchmarking of lightweight block ciphers on a multitude of embedded platforms. Our framework is able to evaluate the execution time, RAM footprint, as well ... [more ▼] In this paper, we introduce a framework for the benchmarking of lightweight block ciphers on a multitude of embedded platforms. Our framework is able to evaluate the execution time, RAM footprint, as well as binary code size, and allows one to define a custom "figure of merit" according to which all evaluated candidates can be ranked. We used the framework to benchmark implementations of 19 lightweight ciphers, namely AES, Chaskey, Fantomas, HIGHT, LBlock, LEA, LED, Piccolo, PRESENT, PRIDE, PRINCE, RC5, RECTANGLE, RoadRunneR, Robin, Simon, SPARX, Speck, and TWINE, on three microcontroller platforms: 8-bit AVR, 16-bit MSP430, and 32-bit ARM. Our results bring some new insights into the question of how well these lightweight ciphers are suited to secure the Internet of things. The benchmarking framework provides cipher designers with an easy-to-use tool to compare new algorithms with the state of the art and allows standardization organizations to conduct a fair and consistent evaluation of a large number of candidates. [less ▲] Detailed reference viewed: 166 (2 UL)A Lightweight Implementation of NTRUEncrypt for 8-bit AVR Microcontrollers Cheng, Hao ; Groszschädl, Johann ; Roenne, Peter et al E-print/Working paper (2019) Introduced in 1996, NTRUEncrypt is not only one of the earliest but also one of the most scrutinized lattice-based cryptosystems and a serious contender in NIST’s ongoing Post-Quantum Cryptography (PQC ... [more ▼] Introduced in 1996, NTRUEncrypt is not only one of the earliest but also one of the most scrutinized lattice-based cryptosystems and a serious contender in NIST’s ongoing Post-Quantum Cryptography (PQC) standardization project. An important criterion for the assessment of candidates is their computational cost in various hardware and software environments. This paper contributes to the evaluation of NTRUEncrypt on the ATmega class of AVR microcontrollers, which belongs to the most popular 8-bit platforms in the embedded domain. More concretely, we present AvrNtru, a carefully-optimized implementation of NTRUEncrypt that we developed from scratch with the goal of achieving high performance and resistance to timing attacks. AvrNtru complies with version 3.3 of the EESS#1 specification and supports recent product-form parameter sets like ees443ep1, ees587ep1, and ees743ep1. A full encryption operation (including mask generation and blinding- polynomial generation) using the ees443ep1 parameters takes 834,272 clock cycles on an ATmega1281 microcontroller; the decryption is slightly more costly and has an execution time of 1,061,683 cycles. When choosing the ees743ep1 parameters to achieve a 256-bit security level, 1,539,829 clock cycles are cost for encryption and 2,103,228 clock cycles for decryption. We achieved these results thanks to a novel hybrid technique for multiplication in truncated polynomial rings where one of the operands is a sparse ternary polynomial in product form. Our hybrid technique is inspired by Gura et al’s hybrid method for multiple-precision integer multiplication (CHES 2004) and takes advantage of the large register file of the AVR architecture to minimize the number of load instructions. A constant-time multiplication in the ring specified by the ees443ep1 parameters requires only 210,827 cycles, which sets a new speed record for the arithmetic component of a lattice-based cryptosystem on an 8-bit microcontroller. [less ▲] Detailed reference viewed: 175 (32 UL)Alzette: A 64-bit ARX-box Beierle, Christof ; Biryukov, Alex ; Cardoso Dos Santos, Luan et al E-print/Working paper (2019) S-boxes are the only source of non-linearity in many symmetric primitives. While they are often defined as being functions operating on a small space, some recent designs propose the use of much larger ... [more ▼] S-boxes are the only source of non-linearity in many symmetric primitives. While they are often defined as being functions operating on a small space, some recent designs propose the use of much larger ones (e.g., 32 bits). In this context, an S-box is then defined as a subfunction whose cryptographic properties can be estimated precisely. In this paper, we present a 64-bit ARX-based S-box called Alzette, which can be evaluated in constant time using only 12 instructions on modern CPUs. Its parallel application can also leverage vector (SIMD) instructions. One iteration of Alzette has differential and linear properties comparable to those of the AES S-box, while two iterations are at least as secure as the AES super S-box. Since the state size is much larger than the typical 4 or 8 bits, the study of the relevant cryptographic properties of Alzette is not trivial. [less ▲] Detailed reference viewed: 95 (5 UL)A Family of Lightweight Twisted Edwards Curves for the Internet of Things Ghatpande, Sankalp ; Groszschädl, Johann ; Liu, Zhe in Blazy, Olivier; Yeun, Chan Y. (Eds.) Information Security Theory and Practice, 12th IFIP WG 11.2 International Conference, WISTP 2018, Brussels, Belgium, December 10-11, 2018, Proceedings (2018, December) We introduce a set of four twisted Edwards curves that satisfy common security requirements and allow for fast implementations of scalar multiplication on 8, 16, and 32-bit processors. Our curves are ... [more ▼] We introduce a set of four twisted Edwards curves that satisfy common security requirements and allow for fast implementations of scalar multiplication on 8, 16, and 32-bit processors. Our curves are defined by an equation of the form -x^2 + y^2 = 1 + dx^2y^2 over a prime field Fp, where d is a small non-square modulo p. The underlying prime fields are based on "pseudo-Mersenne" primes given by p = 2^k - c and have in common that p is congruent to 5 modulo 8, k is a multiple of 32 minus 1, and c is at most eight bits long. Due to these common features, our primes facilitate a parameterized implementation of the low-level arithmetic so that one and the same arithmetic function is able to process operands of different length. Each of the twisted Edwards curves we introduce in this paper is birationally equivalent to a Montgomery curve of the form -(A+2)y^2 = x^3 + Ax^2 + x where 4/(A+2) is small. Even though this contrasts with the usual practice of choosing A such that (A+2)/4 is small, we show that the Montgomery form of our curves allows for an equally efficient implementation of point doubling as Curve25519. The four curves we put forward roughly match the common security levels of 80, 96, 112 and 128 bits. In addition, their Weierstraß representations are isomorphic to curves of the form y^2 = x^3 - 3x + b so as to facilitate inter-operability with TinyECC and other legacy software. [less ▲] Detailed reference viewed: 335 (32 UL)Efficient Implementation of the SHA-512 Hash Function for 8-bit AVR Microcontrollers Cheng, Hao ; ; Groszschädl, Johann in Lanet, Jean-Louis; Toma, Cristian (Eds.) Innovative Security Solutions for Information Technology and Communications, 11th International Conference, SecITC 2018, Bucharest, Romania, November 8-9, 2018, Revised Selected Papers (2018, November) SHA-512 is a member of the SHA-2 family of cryptographic hash algorithms that is based on a Davies-Mayer compression function operating on eight 64-bit words to produce a 512-bit digest. It provides ... [more ▼] SHA-512 is a member of the SHA-2 family of cryptographic hash algorithms that is based on a Davies-Mayer compression function operating on eight 64-bit words to produce a 512-bit digest. It provides strong resistance to collision and preimage attacks, and is assumed to remain secure in the dawning era of quantum computers. However, the compression function of SHA-512 is challenging to implement on small 8 and 16-bit microcontrollers because of their limited register space and the fact that 64-bit rotations are generally slow on such devices. In this paper, we present the first highly-optimized Assembler implementation of SHA-512 for the ATmega family of 8-bit AVR microcontrollers. We introduce a special optimization technique for the compression function based on a duplication of the eight working variables so that they can be more efficiently loaded from RAM via the indirect addressing mode with displacement (using the ldd and std instruction). In this way, we were able to achieve high performance without unrolling the main loop of the compression function, thereby keeping the code size small. When executed on an 8-bit AVR ATmega128 microcontroller, the compression function takes slightly less than 60k clock cycles, which corresponds to a compression rate of roughly 467 cycles per byte. The binary code size of the full SHA-512 implementation providing a standard Init-Update-Final (IUF) interface amounts to approximately 3.5 kB. [less ▲] Detailed reference viewed: 307 (36 UL)Energy-Scalable Montgomery-Curve ECDH Key Exchange for ARM Cortex-M3 Microcontrollers Franck, Christian ; Groszschädl, Johann ; Le Corre, Yann et al in Awan, Irfan; Younas, Muhammad; Portela, Filipe (Eds.) Proceedings of the 6th International Conference on Future Internet of Things and Cloud Workshops (W-FICLOUD 2018) (2018, August) The number of smart devices connected to the Internet is growing at an enormous pace and will reach 30 billion within the next five years. A large fraction of these devices have limited processing ... [more ▼] The number of smart devices connected to the Internet is growing at an enormous pace and will reach 30 billion within the next five years. A large fraction of these devices have limited processing capabilities and energy supply, which makes the execution of computation-intensive cryptographic algorithms very costly. This problem is exacerbated by the fact that basic optimization techniques like loop unrolling can not (always) be applied since cryptographic software for the IoT often needs to meet strict constraints on code size to not exceed the program storage capacity of the target device. In this paper we introduce SECCCM3, a "lightweight" software library for scalable elliptic curve cryptography on ARM Cortex-M3 microcontrollers. The current version of SECCCM3 is able to carry out variable-base scalar multiplication on Montgomery-form curves over pseudo-Mersenne prime fields, such as Curve25519, and can be used to implement static ECDH key exchange. SECCCM3 is scalable in the sense that it supports curves of different order (as long as certain conditions are met), thereby enabling trade-offs between security and execution time (resp. energy dissipation). We made an effort to protect the field arithmetic against Timing Attacks (TAs) and Simple Power Analysis (SPA), taking into account the so-called early-termination effect of the Cortex-M3 integer multiplier, which makes the latency of "long" multiply instructions operand-dependent. Our experiments show that the integration of countermeasures against information leakage caused by this effect increases the execution time by 34%, while the code size grows by 13%. A TA and SPA-resistant scalar multiplication on Curve25519 has an execution time of 4.565 million clock cycles and consumes approximately 5.1 mJ of energy when executed on a STM32L152RE Cortex-M3 microcontroller. SECCCM3 has a binary code size of 4.0 kB, which includes domain parameters for curves over 159, 191, 223, and 255-bit prime fields. [less ▲] Detailed reference viewed: 151 (6 UL)Micro-Architectural Power Simulator for Leakage Assessment of Cryptographic Software on ARM Cortex-M3 Processors Le Corre, Yann ; Groszschädl, Johann ; Dinu, Dumitru-Daniel in Fan, Junfeng; Gierlichs, Benedikt (Eds.) Constructive Side-Channel Analysis and Secure Design - 9th International Workshop, COSADE 2018, Singapore, April 23-24, 2018, Proceedings (2018, April) Masking is a common technique to protect software implementations of symmetric cryptographic algorithms against Differential Power Analysis (DPA) attacks. The development of a properly masked version of a ... [more ▼] Masking is a common technique to protect software implementations of symmetric cryptographic algorithms against Differential Power Analysis (DPA) attacks. The development of a properly masked version of a block cipher is an incremental and time-consuming process since each iteration of the development cycle involves a costly leakage assessment. To achieve a high level of DPA resistance, the architecture-specific leakage properties of the target processor need to be taken into account. However, for most embedded processors, a detailed description of these leakage properties is lacking and often not even the HDL model of the micro-architecture is openly available. Recent research has shown that power simulators for leakage assessment can significantly speed up the development process. Unfortunately, few such simulators exist and even fewer take target-specific leakages into account. To fill this gap, we present MAPS, a micro-architectural power simulator for the M3 series of ARM Cortex processors, one of today's most widely-used embedded platforms. MAPS is fast, easy to use, and able to model the Cortex-M3 pipeline leakages, in particular the leakage introduced by the pipeline registers. The M3 leakage properties are inferred from its HDL source code, and therefore MAPS does not need a complicated and expensive profiling phase. Taking first-order masked Assembler implementations of the lightweight cipher Simon as example, we study how the pipeline leakages manifest and discuss some guidelines on how to avoid them. [less ▲] Detailed reference viewed: 105 (5 UL)Micro-architectural Power Simulator for Leakage Assessment of Cryptographic Software on ARM Cortex-M3 Processors Le Corre, Yann ; Groszschädl, Johann ; Dinu, Dumitru-Daniel in Fan, Junfeng; Gierlichs, Benedikt (Eds.) Constructive Side-Channel Analysis and Secure Design - 9th International Workshop, COSADE 2018, Singapore, April 23-24, 2018, Proceedings (2018, April) Masking is a common technique to protect software implementations of symmetric cryptographic algorithms against Differential Power Analysis (DPA) attacks. The development of a properly masked version of a ... [more ▼] Masking is a common technique to protect software implementations of symmetric cryptographic algorithms against Differential Power Analysis (DPA) attacks. The development of a properly masked version of a block cipher is an incremental and time-consuming process since each iteration of the development cycle involves a costly leakage assessment. To achieve a high level of DPA resistance, the architecture-specific leakage properties of the target processor need to be taken into account. However, for most embedded processors, a detailed description of these leakage properties is lacking and often not even the HDL model of the micro-architecture is openly available. Recent research has shown that power simulators for leakage assessment can significantly speed up the development process. Unfortunately, few such simulators exist and even fewer take target-specific leakages into account. To fill this gap, we present MAPS, a micro-architectural power simulator for the M3 series of ARM Cortex processors, one of today’s most widely-used embedded platforms. MAPS is fast, easy to use, and able to model the Cortex-M3 pipeline leakages, in particular the leakage introduced by the pipeline registers. The M3 leakage properties are inferred from its HDL source code, and therefore MAPS does not need a complicated and expensive profiling phase. Taking first-order masked Assembler implementations of the lightweight cipher Simon as example, we study how the pipeline leakages manifest and discuss some guidelines on how to avoid them. [less ▲] Detailed reference viewed: 141 (2 UL)Securing Edge Devices in the Post-Quantum Internet of Things Using Lattice-Based Cryptography Liu, Zhe ; ; Groszschädl, Johann in IEEE Communications Magazine (2018), 56(2), 158-162 In order to increase the security of edge computing, all data transmitted to and from edge devices, as well as all data stored on edge devices, must be encrypted. Especially when the transmitted or stored ... [more ▼] In order to increase the security of edge computing, all data transmitted to and from edge devices, as well as all data stored on edge devices, must be encrypted. Especially when the transmitted or stored data contains sensitive personal information, long-term protection over periods of ten or more years may be required, which can only be achieved with post-quantum cryptography. This paper first gives a brief overview of post-quantum public-key cryptosystems based on hard mathematical problems related to hash functions, error-correcting codes, multivariate quadratic systems, and lattices. Then, the suitability of lattice-based cryptosystems for resource-constrained devices is discussed and efficient implementations for 8 and 32-bit microcontrollers are outlined. [less ▲] Detailed reference viewed: 150 (3 UL)Efficient Masking of ARX-Based Block Ciphers Using Carry-Save Addition on Boolean Shares Dinu, Dumitru-Daniel ; Groszschädl, Johann ; Le Corre, Yann in Nguyen, Phong Q.; Zhou, Jianying (Eds.) Information Security - 20th International Conference, ISC 2017, Ho Chi Minh City, Vietnam, November 22-24, 2017, Proceedings (2017, November) Masking is a widely-used technique to protect block ciphers and other symmetric cryptosystems against Differential Power Analysis (DPA) attacks. Applying masking to a cipher that involves both arithmetic ... [more ▼] Masking is a widely-used technique to protect block ciphers and other symmetric cryptosystems against Differential Power Analysis (DPA) attacks. Applying masking to a cipher that involves both arithmetic and Boolean operations requires a conversion between arithmetic and Boolean masks. An alternative approach is to perform the required arithmetic operations (e.g. modular addition or subtraction) directly on Boolean shares. At FSE 2015, Coron et al. proposed a logarithmic-time algorithm for modular addition on Boolean shares based on the Kogge-Stone carry-lookahead adder. We revisit their addition algorithm in this paper and present a fast implementation for ARM processors. Then, we introduce a new technique for direct modular addition/subtraction on Boolean shares using a simple Carry-Save Adder (CSA) in an iterative fashion. We show that the average complexity of CSA-based addition on Boolean shares grows logarithmically with the operand size, similar to the Kogge-Stone carry-lookahead addition, but consists of only a single AND, an XOR, and a left-shift per iteration. A 32-bit CSA addition~on Boolean shares has an average execution time of 162 clock cycles on an ARM Cortex-M3 processor, which is approximately 43% faster than the Kogge-Stone adder. The performance gain increases to over 55% when comparing the average subtraction times. We integrated both addition techniques into a masked implementation of the block cipher Speck and found that the CSA-based variant clearly outperforms its Kogge-Stone counterpart by a factor of 1.70 for encryption and 2.30 for decryption. [less ▲] Detailed reference viewed: 152 (1 UL)High-Performance Ideal Lattice-Based Cryptography on 8-Bit AVR Microcontrollers ; ; et al in ACM Transactions on Embedded Computing Systems (2017), 16(4), 117 Over recent years lattice-based cryptography has received much attention due to versatile average-case problems like Ring-LWE or Ring-SIS that appear to be intractable by quantum computers. In this work ... [more ▼] Over recent years lattice-based cryptography has received much attention due to versatile average-case problems like Ring-LWE or Ring-SIS that appear to be intractable by quantum computers. In this work, we evaluate and compare implementations of Ring-LWE encryption and the bimodal lattice signature scheme (BLISS) on an 8-bit Atmel ATxmega128 microcontroller. Our implementation of Ring-LWE encryption provides comprehensive protection against timing side-channels and takes 24.9ms for encryption and 6.7ms for decryption. To compute a BLISS signature, our software takes 317ms and 86ms for verification. These results underline the feasibility of lattice-based cryptography on constrained devices. [less ▲] Detailed reference viewed: 104 (3 UL)Efficient Implementation of Pedersen Commitments Using Twisted Edwards Curves Franck, Christian ; Groszschädl, Johann in Bouzefrane, Samia; Banerjee, Soumya; Sailhan, Françoise (Eds.) et al Mobile, Secure, and Programmable Networking - Third International Conference, MSPN 2017, Paris, France, June 29-30, 2017, Revised Selected Papers (2017, June) Cryptographic commitment schemes are used in many contexts, whereby the size of the secret data and the security requirements depend on the target application. Using a software library that has been ... [more ▼] Cryptographic commitment schemes are used in many contexts, whereby the size of the secret data and the security requirements depend on the target application. Using a software library that has been designed for other purposes (e.g., key-exchange or digital signatures) to compute commitments can be complicated or inefficient. We present in this paper a flexible implementation of Pedersen commitments based on elliptic curves in twisted Edwards form. The implementation supports a set of five curves of varying cryptographic strength, which are defined over 127, 159, 191, 223, and 255-bit pseudo-Mersenne prime fields. One can dynamically (i.e., at runtime) choose one of the curves according to the required level of security, and it is also possible to adapt to the size of the data to be committed by varying the number of base points. The point arithmetic is performed with optimized formulas using extended coordinates and dynamically pre-computed tables are utilized to speed up the scalar multiplication. Our implementation is written in ANSI C (with optional x86 assembler optimizations for the field arithmetic) and was compiled and tested successfully with Visual C on Windows, gcc on Linux, and clang on macOS. We present detailed benchmarking results for the field and point arithmetic on all five curves. When using an Intel Core i7 processor clocked at 2.7 GHz as test platform, we can compute more than 38,000 commitments per second on a twisted Edwards curve over a 127-bit field. [less ▲] Detailed reference viewed: 315 (20 UL)Elliptic Curve Cryptography with Efficiently Computable Endomorphisms and Its Hardware Implementations for the Internet of Things Liu, Zhe ; Groszschädl, Johann ; et al in IEEE Transactions on Computers (2017), 66(5), 773-785 Verification of an ECDSA signature requires a double scalar multiplication on an elliptic curve. In this work, we study the computation of this operation on a twisted Edwards curve with an efficiently ... [more ▼] Verification of an ECDSA signature requires a double scalar multiplication on an elliptic curve. In this work, we study the computation of this operation on a twisted Edwards curve with an efficiently computable endomorphism, which allows reducing the number of point doublings by approximately 50 percent compared to a conventional implementation. In particular, we focus on a curve defined over the 207-bit prime field Fp with p = 2^207 - 5131. We develop several optimizations to the operation and we describe two hardware architectures for computing the operation. The first architecture is a small processor implemented in 0.13 μm CMOS ASIC and is useful in resource-constrained devices for the Internet of Things (IoT) applications. The second architecture is designed for fast signature verifications by using FPGA acceleration and can be used in the server-side of these applications. Our designs offer various trade-offs and optimizations between performance and resource requirements and they are valuable for IoT applications. [less ▲] Detailed reference viewed: 102 (2 UL)Efficient Arithmetic on ARM-NEON and Its Application for High-Speed RSA Implementation ; Liu, Zhe ; Groszschädl, Johann et al in Security and Communication Networks (2016), 9(18), 5401-5411 A steadily increasing number of modern processors support Single Instruction Multiple Data (SIMD) instructions to speed up multimedia, communication, and security applications. The computational power of ... [more ▼] A steadily increasing number of modern processors support Single Instruction Multiple Data (SIMD) instructions to speed up multimedia, communication, and security applications. The computational power of Intel's SSE and AVX extensions as well as ARM's NEON engine has initiated a body of research on SIMD-parallel implementation of multiple-precision integer arithmetic operations, in particular modular multiplication and modular squaring, which are performance-critical components of widely-used public-key cryptosystems such as RSA, DSA, Diffie-Hellman, and their elliptic-curve variants ECDSA and ECDH. In this paper, we introduce the Double Operand Scanning (DOS) method for multiple-precision squaring and describe its implementation for ARM NEON processors. The DOS method uses a full-radix representation of the operand to be squared and aims to maximize performance by reducing the number of Read-After-Write (RAW) dependencies between source and destination registers. We also analyze the benefits of applying Karatsuba's technique to both multiple-precision multiplication and squaring, and present an optimized implementation of Montgomery's algorithm for modular reduction. Our performance evaluation shows that the DOS method along with the other optimizations described in this paper allows one to execute a full 2048-bit modular exponentiation in about 14.25 million clock cycles on an ARM Cortex-A15 processor, which is significantly faster than previously-reported RSA implementations for the ARM-NEON platform. [less ▲] Detailed reference viewed: 114 (2 UL) |
||