Browse ORBi

- What it is and what it isn't
- Green Road / Gold Road?
- Ready to Publish. Now What?
- How can I support the OA movement?
- Where can I learn more?

ORBi

Efficient Masking of ARX-Based Block Ciphers Using Carry-Save Addition on Boolean Shares Dinu, Dumitru-Daniel ; Groszschädl, Johann ; Le Corre, Yann in Nguyen, Phong Q.; Zhou, Jianying (Eds.) Information Security - 20th International Conference, ISC 2017, Ho Chi Minh City, Vietnam, November 22-24, 2017, Proceedings (2017, November) Masking is a widely-used technique to protect block ciphers and other symmetric cryptosystems against Differential Power Analysis (DPA) attacks. Applying masking to a cipher that involves both arithmetic ... [more ▼] Masking is a widely-used technique to protect block ciphers and other symmetric cryptosystems against Differential Power Analysis (DPA) attacks. Applying masking to a cipher that involves both arithmetic and Boolean operations requires a conversion between arithmetic and Boolean masks. An alternative approach is to perform the required arithmetic operations (e.g. modular addition or subtraction) directly on Boolean shares. At FSE 2015, Coron et al. proposed a logarithmic-time algorithm for modular addition on Boolean shares based on the Kogge-Stone carry-lookahead adder. We revisit their addition algorithm in this paper and present a fast implementation for ARM processors. Then, we introduce a new technique for direct modular addition/subtraction on Boolean shares using a simple Carry-Save Adder (CSA) in an iterative fashion. We show that the average complexity of CSA-based addition on Boolean shares grows logarithmically with the operand size, similar to the Kogge-Stone carry-lookahead addition, but consists of only a single AND, an XOR, and a left-shift per iteration. A 32-bit CSA addition~on Boolean shares has an average execution time of 162 clock cycles on an ARM Cortex-M3 processor, which is approximately 43% faster than the Kogge-Stone adder. The performance gain increases to over 55% when comparing the average subtraction times. We integrated both addition techniques into a masked implementation of the block cipher Speck and found that the CSA-based variant clearly outperforms its Kogge-Stone counterpart by a factor of 1.70 for encryption and 2.30 for decryption. [less ▲] Detailed reference viewed: 168 (1 UL)High-Performance Ideal Lattice-Based Cryptography on 8-Bit AVR Microcontrollers ; ; et al in ACM Transactions on Embedded Computing Systems (2017), 16(4), 117 Over recent years lattice-based cryptography has received much attention due to versatile average-case problems like Ring-LWE or Ring-SIS that appear to be intractable by quantum computers. In this work ... [more ▼] Over recent years lattice-based cryptography has received much attention due to versatile average-case problems like Ring-LWE or Ring-SIS that appear to be intractable by quantum computers. In this work, we evaluate and compare implementations of Ring-LWE encryption and the bimodal lattice signature scheme (BLISS) on an 8-bit Atmel ATxmega128 microcontroller. Our implementation of Ring-LWE encryption provides comprehensive protection against timing side-channels and takes 24.9ms for encryption and 6.7ms for decryption. To compute a BLISS signature, our software takes 317ms and 86ms for verification. These results underline the feasibility of lattice-based cryptography on constrained devices. [less ▲] Detailed reference viewed: 125 (3 UL)Efficient Implementation of Pedersen Commitments Using Twisted Edwards Curves Franck, Christian ; Groszschädl, Johann in Bouzefrane, Samia; Banerjee, Soumya; Sailhan, Françoise (Eds.) et al Mobile, Secure, and Programmable Networking - Third International Conference, MSPN 2017, Paris, France, June 29-30, 2017, Revised Selected Papers (2017, June) Cryptographic commitment schemes are used in many contexts, whereby the size of the secret data and the security requirements depend on the target application. Using a software library that has been ... [more ▼] Cryptographic commitment schemes are used in many contexts, whereby the size of the secret data and the security requirements depend on the target application. Using a software library that has been designed for other purposes (e.g., key-exchange or digital signatures) to compute commitments can be complicated or inefficient. We present in this paper a flexible implementation of Pedersen commitments based on elliptic curves in twisted Edwards form. The implementation supports a set of five curves of varying cryptographic strength, which are defined over 127, 159, 191, 223, and 255-bit pseudo-Mersenne prime fields. One can dynamically (i.e., at runtime) choose one of the curves according to the required level of security, and it is also possible to adapt to the size of the data to be committed by varying the number of base points. The point arithmetic is performed with optimized formulas using extended coordinates and dynamically pre-computed tables are utilized to speed up the scalar multiplication. Our implementation is written in ANSI C (with optional x86 assembler optimizations for the field arithmetic) and was compiled and tested successfully with Visual C on Windows, gcc on Linux, and clang on macOS. We present detailed benchmarking results for the field and point arithmetic on all five curves. When using an Intel Core i7 processor clocked at 2.7 GHz as test platform, we can compute more than 38,000 commitments per second on a twisted Edwards curve over a 127-bit field. [less ▲] Detailed reference viewed: 366 (21 UL)Elliptic Curve Cryptography with Efficiently Computable Endomorphisms and Its Hardware Implementations for the Internet of Things Liu, Zhe ; Groszschädl, Johann ; et al in IEEE Transactions on Computers (2017), 66(5), 773-785 Verification of an ECDSA signature requires a double scalar multiplication on an elliptic curve. In this work, we study the computation of this operation on a twisted Edwards curve with an efficiently ... [more ▼] Verification of an ECDSA signature requires a double scalar multiplication on an elliptic curve. In this work, we study the computation of this operation on a twisted Edwards curve with an efficiently computable endomorphism, which allows reducing the number of point doublings by approximately 50 percent compared to a conventional implementation. In particular, we focus on a curve defined over the 207-bit prime field Fp with p = 2^207 - 5131. We develop several optimizations to the operation and we describe two hardware architectures for computing the operation. The first architecture is a small processor implemented in 0.13 μm CMOS ASIC and is useful in resource-constrained devices for the Internet of Things (IoT) applications. The second architecture is designed for fast signature verifications by using FPGA acceleration and can be used in the server-side of these applications. Our designs offer various trade-offs and optimizations between performance and resource requirements and they are valuable for IoT applications. [less ▲] Detailed reference viewed: 115 (3 UL)Design Strategies for ARX with Provable Bounds: SPARX and LAX Dinu, Dumitru-Daniel ; Perrin, Léo Paul ; Udovenko, Aleksei et al in Cheon, Jung Hee; Takagi, Tsuyoshi (Eds.) Advances in Cryptology --- ASIACRYPT 2016, 22nd International Conference on the Theory and Application of Cryptology and Information Security, Hanoi, Vietnam, December 4-8, 2016, Proceedings, Part I (2016, December) We present, for the first time, a general strategy for designing ARX symmetric-key primitives with provable resistance against single-trail differential and linear cryptanalysis. The latter has been a ... [more ▼] We present, for the first time, a general strategy for designing ARX symmetric-key primitives with provable resistance against single-trail differential and linear cryptanalysis. The latter has been a long standing open problem in the area of ARX design. The Wide-Trail design Strategy (WTS), that is at the basis of many S-box based ciphers, including the AES, is not suitable for ARX designs due to the lack of S-boxes in the latter. In this paper we address the mentioned limitation by proposing the Long-Trail design Strategy (LTS) -- a dual of the WTS that is applicable (but not limited) to ARX constructions. In contrast to the WTS, that prescribes the use of small and efficient S-boxes at the expense of heavy linear layers with strong mixing properties, the LTS advocates the use of large (ARX-based) S-Boxes together with sparse linear layers. With the help of the so-called long-trail argument, a designer can bound the maximum differential and linear probabilities for any number of rounds of a cipher built according to the LTS. To illustrate the effectiveness of the new strategy, we propose Sparx -- a family of ARX-based block ciphers designed according to the LTS. Sparx has 32-bit ARX-based S-boxes and has provable bounds against differential and linear cryptanalysis. In addition, Sparx is very efficient on a number of embedded platforms. Its optimized software implementation ranks in the top-6 of the most software-efficient ciphers along with Simon, Speck, Chaskey, LEA and RECTANGLE. As a second contribution we propose another strategy for designing ARX ciphers with provable properties, that is completely independent of the LTS. It is motivated by a challenge proposed earlier by Wallen and uses the differential properties of modular addition to minimize the maximum differential probability across multiple rounds of a cipher. A new primitive, called LAX is designed following those principles. LAX partly solves the Wallen challenge. [less ▲] Detailed reference viewed: 248 (14 UL)Efficient Arithmetic on ARM-NEON and Its Application for High-Speed RSA Implementation ; Liu, Zhe ; Groszschädl, Johann et al in Security and Communication Networks (2016), 9(18), 5401-5411 A steadily increasing number of modern processors support Single Instruction Multiple Data (SIMD) instructions to speed up multimedia, communication, and security applications. The computational power of ... [more ▼] A steadily increasing number of modern processors support Single Instruction Multiple Data (SIMD) instructions to speed up multimedia, communication, and security applications. The computational power of Intel's SSE and AVX extensions as well as ARM's NEON engine has initiated a body of research on SIMD-parallel implementation of multiple-precision integer arithmetic operations, in particular modular multiplication and modular squaring, which are performance-critical components of widely-used public-key cryptosystems such as RSA, DSA, Diffie-Hellman, and their elliptic-curve variants ECDSA and ECDH. In this paper, we introduce the Double Operand Scanning (DOS) method for multiple-precision squaring and describe its implementation for ARM NEON processors. The DOS method uses a full-radix representation of the operand to be squared and aims to maximize performance by reducing the number of Read-After-Write (RAW) dependencies between source and destination registers. We also analyze the benefits of applying Karatsuba's technique to both multiple-precision multiplication and squaring, and present an optimized implementation of Montgomery's algorithm for modular reduction. Our performance evaluation shows that the DOS method along with the other optimizations described in this paper allows one to execute a full 2048-bit modular exponentiation in about 14.25 million clock cycles on an ARM Cortex-A15 processor, which is significantly faster than previously-reported RSA implementations for the ARM-NEON platform. [less ▲] Detailed reference viewed: 129 (2 UL)Implementation of a Leakage-Resilient ElGamal Key Encapsulation Mechanism Galindo, David ; Groszschädl, Johann ; Liu, Zhe et al in Journal of Cryptographic Engineering (2016), 6(3), 229-238 Leakage-resilient cryptography aims to extend the rigorous guarantees achieved through the provable security paradigm to physical implementations. The constructions designed on basis of this new approach ... [more ▼] Leakage-resilient cryptography aims to extend the rigorous guarantees achieved through the provable security paradigm to physical implementations. The constructions designed on basis of this new approach inevitably suffer from an Achilles heel: a bounded leakage assumption is needed. Currently, a huge gap exists between the theory of such designs and their implementation to confirm the leakage resilience in practice. The present work tries to narrow this gap for the leakage-resilient bilinear ElGamal key encapsulation mechanism (BEG-KEM) proposed by Kiltz and Pietrzak in 2010. Our first contribution is a variant of the bounded leakage and the only-computation-leaks model that is closer to practice. We weaken the restriction on the image size of the leakage functions in these models and only insist that the inputs to the leakage functions have sufficient min-entropy left, in spite of the leakage, with no limitation on the quantity of this leakage. We provide a novel security reduction for BEG-KEM in this relaxed leakage model using the generic bilinear group axiom. Secondly, we show that a naive implementation of the exponentiation in BEG-KEM makes it impossible to meet the leakage bound. Instead of trying to find an exponentiation algorithm that meets the leakage axiom (which is a non-trivial problem in practice), we propose an advanced scheme, BEG-KEM+, that avoids exponentiation by a secret value, but rather uses an encoding into the base group due to Fouque and Tibouchi. Thirdly, we present a software implementation of BEG-KEM+ based on the Miracl library and provide detailed experimental results. We also assess its (theoretical) resistance against power analysis attacks from a practical perspective, taking into account the state-of-the-art in side-channel cryptanalysis. [less ▲] Detailed reference viewed: 114 (2 UL)Efficient Implementation of NIST-Compliant Elliptic Curve Cryptography for 8-bit AVR-Based Sensor Nodes Liu, Zhe ; ; Groszschädl, Johann et al in IEEE Transactions on Information Forensics and Security (2016), 11(7), 1385-1397 In this paper, we introduce a highly optimized software implementation of standards-compliant elliptic curve cryptography (ECC) for wireless sensor nodes equipped with an 8-bit AVR microcontroller. We ... [more ▼] In this paper, we introduce a highly optimized software implementation of standards-compliant elliptic curve cryptography (ECC) for wireless sensor nodes equipped with an 8-bit AVR microcontroller. We exploit the state-of-the-art optimizations and propose novel techniques to further push the performance envelope of a scalar multiplication on the NIST P-192 curve. To illustrate the performance of our ECC software, we develope the prototype implementations of different cryptographic schemes for securing communication in a wireless sensor network, including elliptic curve Diffie-Hellman (ECDH) key exchange, the elliptic curve digital signature algorithm (ECDSA), and the elliptic curve Menezes-Qu-Vanstone (ECMQV) protocol. We obtain record-setting execution times for fixed-base, point variable-base, and double-base scalar multiplication. Compared with the related work, our ECDH key exchange achieves a performance gain of roughly 27% over the best previously published result using the NIST P-192 curve on the same platform, while our ECDSA performs twice as fast as the ECDSA implementation of the well-known TinyECC library. We also evaluate the impact of Karatsuba's multiplication technique on the overall execution time of a scalar multiplication. In addition to offering high performance, our implementation of scalar multiplication has a highly regular execution profile, which helps to protect against certain side-channel attacks. Our results show that NIST-compliant ECC can be implemented efficiently enough to be suitable for resource-constrained sensor nodes. [less ▲] Detailed reference viewed: 239 (12 UL)Energy-Efficient Elliptic Curve Cryptography for MSP430-Based Wireless Sensor Nodes Liu, Zhe ; Groszschädl, Johann ; et al in Liu, Joseph K.; Steinfeld, Ron (Eds.) Information Security and Privacy - 21st Australasian Conference, ACISP 2016, Melbourne, VIC, Australia, July 4-6, 2016, Proceedings, Part I (2016, July) The Internet is rapidly evolving from a network of personal computers and servers to a network of smart objects ("things") able to communicate with each other and with central resources. This evolution ... [more ▼] The Internet is rapidly evolving from a network of personal computers and servers to a network of smart objects ("things") able to communicate with each other and with central resources. This evolution has created a demand for lightweight implementations of cryptographic algorithms suitable for resource-constrained devices such as RFID tags and wireless sensor nodes. In this paper we describe a highly optimized software implementation of Elliptic Curve Cryptography (ECC) for the MSP430 series of ultra-low-power 16-bit microcontrollers. Our software is scalable in the sense that it supports prime fields and elliptic curves of different order without recompilation, which allows for flexible trade-offs between execution time (i.e. energy consumption) and security. The low-level modular arithmetic is optimized for pseudo-Mersenne primes of the form p = 2^n - c where n is a multiple of 16 minus 1 and c fits in a 16-bit register. All prime-field arithmetic functions are parameterized with respect to the length of operands (i.e. the number of 16-bit words they consist of) and written in Assembly language, whereby we avoided conditional jumps and branches that could leak information about the secret key. Our ECC implementation can perform scalar multiplication on two types of elliptic curves, namely Montgomery curves and twisted Edwards curves. A full scalar multiplication using a Montgomery curve over a 159-bit field requires about 3.86*10^6 clock cycles when executed on an MSP430F1611 microcontroller. [less ▲] Detailed reference viewed: 317 (21 UL)Correlation Power Analysis of Lightweight Block Ciphers: From Theory to Practice Biryukov, Alex ; Dinu, Dumitru-Daniel ; Groszschädl, Johann in Manulis, Mark; Sadeghi, Ahmad-Reza; Schneider, Steve (Eds.) Applied Cryptography and Network Security - 14th International Conference, ACNS 2016, Guildford, UK, June 19-22, 2016. Proceedings (2016, June) Side-Channel Analysis (SCA) represents a serious threat to the security of millions of smart devices that form part of the so-called Internet of Things (IoT). Choosing the "right" cryptographic primitive ... [more ▼] Side-Channel Analysis (SCA) represents a serious threat to the security of millions of smart devices that form part of the so-called Internet of Things (IoT). Choosing the "right" cryptographic primitive for the IoT is a highly challenging task due to the resource constraints of IoT devices and the variety of primitives. An important criterion to assess the suitability of a lightweight cipher with respect to SCA is the amount of leakage available to an adversary. In this paper, we analyze the efficiency of different selection functions that are commonly used in Correlation Power Analysis (CPA) attacks on symmetric primitives. To this end, we attacked implementations of the lightweight block ciphers AES, Fantomas, LBlock, Piccolo, PRINCE, RC5, Simon, and Speck on an 8-bit AVR processor. By exploring the relation between the nonlinearity of the studied selection functions and the measured leakages, we discovered some imperfections when using nonlinearity to quantify the resilience against CPA. Then, we applied these findings in an evaluation of the "intrinsic" CPA-resistance of unprotected implementations of the eight mentioned ciphers. We show that certain implementation aspects can influence the leakage level and try to explain why. Our results shed new light on the resilience of basic operations executed by these ciphers against CPA and help to bridge the gap between theory and practice. [less ▲] Detailed reference viewed: 540 (70 UL)Efficient Ring-LWE Encryption on 8-bit AVR Processors Liu, Zhe ; ; et al in Güneysu, Tim; Handschuh, Helena (Eds.) Cryptographic Hardware and Embedded Systems - CHES 2015, 17th International Workshop, Saint-Malo, France, September 13-16, 2015, Proceedings (2015, September) Public-key cryptography based on the "ring-variant" of the Learning with Errors (ring-LWE) problem is both efficient and believed to remain secure in a post-quantum world. In this paper, we introduce a ... [more ▼] Public-key cryptography based on the "ring-variant" of the Learning with Errors (ring-LWE) problem is both efficient and believed to remain secure in a post-quantum world. In this paper, we introduce a carefully-optimized implementation of a ring-LWE encryption scheme for 8-bit AVR processors like the ATxmega128. Our research contributions include several optimizations for the Number Theoretic Transform (NTT) used for polynomial multiplication. More concretely, we describe the Move-and-Add (MA) and the Shift-Add-Multiply-Subtract-Subtract (SAMS2) technique to speed up the performance-critical multiplication and modular reduction of coefficients, respectively. We take advantage of incompletely-reduced intermediate results to minimize the total number of reduction operations and use a special coefficient-storage method to decrease the RAM footprint of NTT multiplications. In addition, we propose a byte-wise scanning strategy to improve the performance of a discrete Gaussian sampler based on the Knuth-Yao random walk algorithm. For medium-term security, our ring-LWE implementation needs 590k, 672k, and 276k clock cycles for key-generation, encryption, and decryption, respectively. On the other hand, for long-term security, the execution time of key-generation, encryption, and decryption amount to 2.2M, 2.6M, and 686k cycles, respectively. These results set new speed records for ring-LWE encryption on an 8-bit processor and outperform related RSA and ECC implementations by an order of magnitude. [less ▲] Detailed reference viewed: 219 (9 UL)Triathlon of Lightweight Block Ciphers for the Internet of Things Dinu, Dumitru-Daniel ; Le Corre, Yann ; Khovratovich, Dmitry et al Scientific Conference (2015, July) In this paper we introduce an open framework for the benchmarking of lightweight block ciphers on a multitude of embedded platforms. Our framework is able to evaluate execution time, RAM footprint, as ... [more ▼] In this paper we introduce an open framework for the benchmarking of lightweight block ciphers on a multitude of embedded platforms. Our framework is able to evaluate execution time, RAM footprint, as well as (binary) code size, and allows a user to define a custom "figure of merit" according to which all evaluated candidates can be ranked. We used the framework to benchmark various implementations of 13 lightweight ciphers, namely AES, Fantomas, HIGHT, LBlock, LED, Piccolo, PRESENT, PRINCE, RC5, Robin, Simon, Speck, and TWINE, on three different platforms: 8-bit ATmega, 16-bit MSP430, and 32-bit ARM. Our results give new insights to the question of how well these ciphers are suited to secure the Internet of Things (IoT). The benchmarking framework provides cipher designers with a tool to compare new algorithms with the state-of-the-art and allows standardization bodies to conduct a fair and comprehensive evaluation of a large number of candidates. [less ▲] Detailed reference viewed: 467 (30 UL)FELICS - Fair Evaluation of Lightweight Cryptographic Systems Dinu, Dumitru-Daniel ; Biryukov, Alex ; Groszschädl, Johann et al Scientific Conference (2015, July) In this paper we introduce FELICS, a free and open-source benchmarking framework designed for fair and consistent evaluation of software implementations of lightweight cryptographic primitives for ... [more ▼] In this paper we introduce FELICS, a free and open-source benchmarking framework designed for fair and consistent evaluation of software implementations of lightweight cryptographic primitives for embedded devices. The framework is very flexible thanks to its modular structure, which allows for an easy integration of new metrics, target devices and evaluation scenarios. It consists of two modules that can currently asses the performance of lightweight block and stream ciphers on three widely used microcontrollers: 8-bit AVR, 16-bit MSP and 32-bit ARM. The metrics extracted are execution time, RAM consumption and binary code size. FELICS has a simple user interface and is intended to be used by cipher designers to compare new primitives with the state of the art. The extracted metrics are very detailed and assist embedded software engineers in selecting the best cipher to match the requirements of a particular application. The tool aims to increase the transparency and trust in benchmarking results of lightweight primitives and facilitates a fair comparison between different primitives using the same evaluation conditions. [less ▲] Detailed reference viewed: 655 (14 UL)Higher-Order Masking in Practice: A Vector Implementation of Masked AES for ARM NEON Wang, Junwei ; Vadnala, Praveen Kumar ; Groszschädl, Johann et al in Nyberg, Kaisa (Ed.) Topics in Cryptology - CT-RSA 2015, The Cryptographer's Track at the RSA Conference 2015, San Francisco, CA, USA, April 20-24, 2015. Proceedings (2015, April) Real-world software implementations of cryptographic algorithms need to be able to resist various kinds of side-channel attacks, in particular Differential Power Analysis (DPA). Masking is a widely-used ... [more ▼] Real-world software implementations of cryptographic algorithms need to be able to resist various kinds of side-channel attacks, in particular Differential Power Analysis (DPA). Masking is a widely-used countermeasure to protect block ciphers like the Advanced Encryption Standard (AES) against DPA attacks. The basic principle is to split all sensitive intermediate variables manipulated by the algorithm into two shares and process these shares separately. However, this approach still succumbs to higher-order DPA attacks, which exploit the joint leakage of a number of intermediate variables. A viable solution is to generalize masking such that at least d+1 shares are used to protect against d-th order attacks. Unfortunately, all current higher-order masking schemes introduce a significant computational overhead compared to unmasked implementations. To facilitate the deployment of higher-order masking for the AES in practice, we developed a vector implementation of Coron et al's masking scheme (FSE 2012) for ARM NEON processors. After a comprehensive complexity analysis, we found that Coron et al's scheme with n shares for each sensitive variable needs O(n^2) multiplications in the field GF(2^8) and O(n^2) random-number generations. Both of these performance-critical operations are executed with only 15 instructions in our software, which is possible thanks to the rich functionality of the NEON instruction set. Our experimental results demonstrate that the performance penalty caused by the integration of higher-order masking is significantly lower than in generally assumed and reported in previous papers. For example, our second-order DPA-protected AES (with three shares for each sensitive variable) is merely eight times slower than an unmasked baseline implementation that resists cache-timing attacks. [less ▲] Detailed reference viewed: 241 (16 UL)Efficient Implementation of ECDH Key Exchange for MSP430-Based Wireless Sensor Networks Liu, Zhe ; ; et al in Bao, Feng; Miller, Steven; Zhou, Jianying (Eds.) et al ASIACCS'15: Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, April 14-17, 2015, Singapore (2015, April) Public-Key Cryptography (PKC) is an indispensable building block of modern security protocols, and, thus, essential for secure communication over insecure networks. Despite a significant body of research ... [more ▼] Public-Key Cryptography (PKC) is an indispensable building block of modern security protocols, and, thus, essential for secure communication over insecure networks. Despite a significant body of research devoted to making PKC more "lightweight," it is still commonly perceived that software implementations of PKC are computationally too expensive for practical use in ultra-low power devices such as wireless sensor nodes. In the present paper we aim to challenge this perception and present a highly-optimized implementation of Elliptic Curve Cryptography (ECC) for the TI MSP430 series of 16-bit microcontrollers. Our software is inspired by MoTE-ECC and supports scalar multiplication on two families of elliptic curves, namely Montgomery and twisted Edwards curves. However, in contrast to MoTE-ECC, we use pseudo-Mersenne prime fields as underlying algebraic structure to facilitate inter-operability with existing ECC implementations. We introduce a novel "zig-zag" technique for multiple-precision squaring on the MSP430 and assess its execution time. Similar to MoTE-ECC, we employ the Montgomery model for variable-base scalar multiplications and the twisted Edwards model if the base point is fixed (e.g. to generate an ephemeral key pair). Our experiments show that the two scalar multiplications needed to perform an ephemeral ECDH key exchange can be accomplished in 4.88 million clock cycles altogether (using a 159-bit prime field), which sets a new speed record for ephemeral ECDH on a 16-bit processor. We also describe the curve generation process and analyze the execution time of various field and point arithmetic operations on curves over a 159-bit and a 191-bit pseudo-Mersenne prime field. [less ▲] Detailed reference viewed: 255 (19 UL)Faster Mask Conversion with Lookup Tables Vadnala, Praveen Kumar ; Groszschädl, Johann in Mangard, Stefan; Poschmann, Axel Y. (Eds.) Constructive Side-Channel Analysis and Secure Design, 6th International Workshop, COSADE 2015, Berlin, Germany, April 13-14, 2015. Revised Selected Papers (2015, April) Masking is an effective and widely-used countermeasure to thwart Differential Power Analysis (DPA) attacks on symmetric cryptosystems. When a symmetric cipher involves a combination of Boolean and ... [more ▼] Masking is an effective and widely-used countermeasure to thwart Differential Power Analysis (DPA) attacks on symmetric cryptosystems. When a symmetric cipher involves a combination of Boolean and arithmetic operations, it is necessary to convert the masks from one form to the other. There exist algorithms for mask conversion that are secure against first-order attacks, but they can not be generalized to higher orders. At CHES 2014, Coron, Großschädl and Vadnala (CGV) introduced a secure conversion scheme between Boolean and arithmetic masking of any order, but their approach requires d=2t+1 shares to protect against attacks of order t. In the present paper, we improve the algorithms for second-order conversion with the help of lookup tables so that only three shares instead of five are needed, which is the minimal number for second-order resistance. Furthermore, we also improve the first-order secure addition method proposed by Karroumi, Richard and Joye, again with lookup tables. We prove the security of all presented algorithms using well established assumptions and models. Finally, we provide experimental evidence of our improved mask conversion applied to HMAC-SHA-1. Simulation results show that our algorithms improve the execution time by 85% at the expense of little memory overhead. [less ▲] Detailed reference viewed: 270 (5 UL)Conversion from Arithmetic to Boolean Masking with Logarithmic Complexity Coron, Jean-Sébastien ; Groszschädl, Johann ; et al in Leander, Gregor (Ed.) Fast Software Encryption, 22nd International Workshop, FSE 2015, Istanbul, Turkey, March 8-11, 2015, Revised Selected Papers (2015, March) A general technique to protect a cryptographic algorithm against side-channel attacks consists in masking all intermediate variables with a random value. For cryptographic algorithms combining Boolean ... [more ▼] A general technique to protect a cryptographic algorithm against side-channel attacks consists in masking all intermediate variables with a random value. For cryptographic algorithms combining Boolean operations with arithmetic operations, one must then perform conversions between Boolean masking and arithmetic masking. At CHES 2001, Goubin described a very elegant algorithm for converting from Boolean masking to arithmetic masking, with only a constant number of operations. Goubin also described an algorithm for converting from arithmetic to Boolean masking, but with O(k) operations where k is the addition bit size. In this paper we describe an improved algorithm with time complexity O(log k) only. Our new algorithm is based on the Kogge-Stone carry look-ahead adder, which computes the carry signal in O(log k) instead of O(k) for the classical ripple carry adder. We also describe an algorithm for performing arithmetic addition modulo 2^k directly on Boolean shares, with the same complexity O(log k) instead of O(k). We prove the security of our new algorithm against first-order attacks. Our algorithm performs well in practice, as for k=64 we obtain a 23% improvement compared to Goubin’s algorithm. [less ▲] Detailed reference viewed: 244 (8 UL)Reverse Product-Scanning Multiplication and Squaring on 8-bit AVR Processors Liu, Zhe ; ; Groszschädl, Johann et al in Hui, Lucas C. K.; Qing, Sihan; Shi, Elaine (Eds.) et al Information and Communications Security - 16th International Conference, ICICS 2014, Hong Kong, China, December 16-17, 2014. Proceedings (2014, December) High performance, small code size, and good scalability are important requirements for software implementations of multi-precision arithmetic algorithms to fit resource-limited embedded systems. In this ... [more ▼] High performance, small code size, and good scalability are important requirements for software implementations of multi-precision arithmetic algorithms to fit resource-limited embedded systems. In this paper, we describe optimization techniques to speed up multi-precision multiplication and squaring on the AVR ATmega series of 8-bit microcontrollers. First, we present a new approach to perform multi-precision multiplication, called Reverse Product Scanning (RPS), that resembles the hybrid technique of Gura et al., but calculates the byte-products in the inner loop in reverse order. The RPS method processes four bytes of the two operands in each iteration of the inner loop and employs two carry-catcher registers to minimize the number of add instructions. We also describe an optimized algorithm for multi-precision squaring based on the RPS technique that is, depending on the operand length, up to 44.3% faster than multiplication. Our AVR Assembly implementations of RPS multiplication and RPS squaring occupy less than 1 kB of code space each and are written in a parameterized fashion so that they can support operands of varying length without recompilation. Despite this high level of flexibility, our RPS multiplication outperforms the looped variant of Hutter et al.'s operand-caching technique and saves between 40 and 51% of code size. We also combine our RPS multiplication and squaring routines with Karatsuba's method to further reduce execution time. When executed on an ATmega128 processor, the "karatsubarized RPS method" needs only 85k clock cycles for a 1024-bit multiplication (or 48k cycles for a squaring). These results show that it is possible to achieve high performance without sacrificing code size or scalability. [less ▲] Detailed reference viewed: 428 (40 UL)Montgomery Modular Multiplication on ARM-NEON Revisited ; Liu, Zhe ; Groszschädl, Johann et al in Kim, Jongsung; Lee, Jooyoung (Eds.) Information Security and Cryptology - ICISC 2014, 17th International Conference, Seoul, Korea, December 3-5, 2014, Revised Selected Papers (2014, December) Detailed reference viewed: 214 (8 UL)Secure Conversion between Boolean and Arithmetic Masking of Any Order Coron, Jean-Sébastien ; Groszschädl, Johann ; Vadnala, Praveen Kumar in Batina, Lejla; Robshaw, Matthew (Eds.) Cryptographic Hardware and Embedded Systems - CHES 2014, 16th International Workshop, Busan, South Korea, September 23-26, 2014. Proceedings (2014, September) Detailed reference viewed: 199 (12 UL) |
||