Browse ORBi

- What it is and what it isn't
- Green Road / Gold Road?
- Ready to Publish. Now What?
- How can I support the OA movement?
- Where can I learn more?

ORBi

Efficient Arithmetic on ARM-NEON and Its Application for High-Speed RSA Implementation ; Liu, Zhe ; Groszschädl, Johann et al in Security and Communication Networks (2016), 9(18), 5401-5411 A steadily increasing number of modern processors support Single Instruction Multiple Data (SIMD) instructions to speed up multimedia, communication, and security applications. The computational power of ... [more ▼] A steadily increasing number of modern processors support Single Instruction Multiple Data (SIMD) instructions to speed up multimedia, communication, and security applications. The computational power of Intel's SSE and AVX extensions as well as ARM's NEON engine has initiated a body of research on SIMD-parallel implementation of multiple-precision integer arithmetic operations, in particular modular multiplication and modular squaring, which are performance-critical components of widely-used public-key cryptosystems such as RSA, DSA, Diffie-Hellman, and their elliptic-curve variants ECDSA and ECDH. In this paper, we introduce the Double Operand Scanning (DOS) method for multiple-precision squaring and describe its implementation for ARM NEON processors. The DOS method uses a full-radix representation of the operand to be squared and aims to maximize performance by reducing the number of Read-After-Write (RAW) dependencies between source and destination registers. We also analyze the benefits of applying Karatsuba's technique to both multiple-precision multiplication and squaring, and present an optimized implementation of Montgomery's algorithm for modular reduction. Our performance evaluation shows that the DOS method along with the other optimizations described in this paper allows one to execute a full 2048-bit modular exponentiation in about 14.25 million clock cycles on an ARM Cortex-A15 processor, which is significantly faster than previously-reported RSA implementations for the ARM-NEON platform. [less ▲] Detailed reference viewed: 108 (2 UL)Implementation of a Leakage-Resilient ElGamal Key Encapsulation Mechanism Galindo, David ; Groszschädl, Johann ; Liu, Zhe et al in Journal of Cryptographic Engineering (2016), 6(3), 229-238 Leakage-resilient cryptography aims to extend the rigorous guarantees achieved through the provable security paradigm to physical implementations. The constructions designed on basis of this new approach ... [more ▼] Leakage-resilient cryptography aims to extend the rigorous guarantees achieved through the provable security paradigm to physical implementations. The constructions designed on basis of this new approach inevitably suffer from an Achilles heel: a bounded leakage assumption is needed. Currently, a huge gap exists between the theory of such designs and their implementation to confirm the leakage resilience in practice. The present work tries to narrow this gap for the leakage-resilient bilinear ElGamal key encapsulation mechanism (BEG-KEM) proposed by Kiltz and Pietrzak in 2010. Our first contribution is a variant of the bounded leakage and the only-computation-leaks model that is closer to practice. We weaken the restriction on the image size of the leakage functions in these models and only insist that the inputs to the leakage functions have sufficient min-entropy left, in spite of the leakage, with no limitation on the quantity of this leakage. We provide a novel security reduction for BEG-KEM in this relaxed leakage model using the generic bilinear group axiom. Secondly, we show that a naive implementation of the exponentiation in BEG-KEM makes it impossible to meet the leakage bound. Instead of trying to find an exponentiation algorithm that meets the leakage axiom (which is a non-trivial problem in practice), we propose an advanced scheme, BEG-KEM+, that avoids exponentiation by a secret value, but rather uses an encoding into the base group due to Fouque and Tibouchi. Thirdly, we present a software implementation of BEG-KEM+ based on the Miracl library and provide detailed experimental results. We also assess its (theoretical) resistance against power analysis attacks from a practical perspective, taking into account the state-of-the-art in side-channel cryptanalysis. [less ▲] Detailed reference viewed: 98 (2 UL)Efficient Implementation of NIST-Compliant Elliptic Curve Cryptography for 8-bit AVR-Based Sensor Nodes Liu, Zhe ; ; Groszschädl, Johann et al in IEEE Transactions on Information Forensics and Security (2016), 11(7), 1385-1397 In this paper, we introduce a highly optimized software implementation of standards-compliant elliptic curve cryptography (ECC) for wireless sensor nodes equipped with an 8-bit AVR microcontroller. We ... [more ▼] In this paper, we introduce a highly optimized software implementation of standards-compliant elliptic curve cryptography (ECC) for wireless sensor nodes equipped with an 8-bit AVR microcontroller. We exploit the state-of-the-art optimizations and propose novel techniques to further push the performance envelope of a scalar multiplication on the NIST P-192 curve. To illustrate the performance of our ECC software, we develope the prototype implementations of different cryptographic schemes for securing communication in a wireless sensor network, including elliptic curve Diffie-Hellman (ECDH) key exchange, the elliptic curve digital signature algorithm (ECDSA), and the elliptic curve Menezes-Qu-Vanstone (ECMQV) protocol. We obtain record-setting execution times for fixed-base, point variable-base, and double-base scalar multiplication. Compared with the related work, our ECDH key exchange achieves a performance gain of roughly 27% over the best previously published result using the NIST P-192 curve on the same platform, while our ECDSA performs twice as fast as the ECDSA implementation of the well-known TinyECC library. We also evaluate the impact of Karatsuba's multiplication technique on the overall execution time of a scalar multiplication. In addition to offering high performance, our implementation of scalar multiplication has a highly regular execution profile, which helps to protect against certain side-channel attacks. Our results show that NIST-compliant ECC can be implemented efficiently enough to be suitable for resource-constrained sensor nodes. [less ▲] Detailed reference viewed: 216 (11 UL)Energy-Efficient Elliptic Curve Cryptography for MSP430-Based Wireless Sensor Nodes Liu, Zhe ; Groszschädl, Johann ; et al in Liu, Joseph K.; Steinfeld, Ron (Eds.) Information Security and Privacy - 21st Australasian Conference, ACISP 2016, Melbourne, VIC, Australia, July 4-6, 2016, Proceedings, Part I (2016, July) The Internet is rapidly evolving from a network of personal computers and servers to a network of smart objects ("things") able to communicate with each other and with central resources. This evolution ... [more ▼] The Internet is rapidly evolving from a network of personal computers and servers to a network of smart objects ("things") able to communicate with each other and with central resources. This evolution has created a demand for lightweight implementations of cryptographic algorithms suitable for resource-constrained devices such as RFID tags and wireless sensor nodes. In this paper we describe a highly optimized software implementation of Elliptic Curve Cryptography (ECC) for the MSP430 series of ultra-low-power 16-bit microcontrollers. Our software is scalable in the sense that it supports prime fields and elliptic curves of different order without recompilation, which allows for flexible trade-offs between execution time (i.e. energy consumption) and security. The low-level modular arithmetic is optimized for pseudo-Mersenne primes of the form p = 2^n - c where n is a multiple of 16 minus 1 and c fits in a 16-bit register. All prime-field arithmetic functions are parameterized with respect to the length of operands (i.e. the number of 16-bit words they consist of) and written in Assembly language, whereby we avoided conditional jumps and branches that could leak information about the secret key. Our ECC implementation can perform scalar multiplication on two types of elliptic curves, namely Montgomery curves and twisted Edwards curves. A full scalar multiplication using a Montgomery curve over a 159-bit field requires about 3.86*10^6 clock cycles when executed on an MSP430F1611 microcontroller. [less ▲] Detailed reference viewed: 292 (21 UL)Correlation Power Analysis of Lightweight Block Ciphers: From Theory to Practice Biryukov, Alex ; Dinu, Dumitru-Daniel ; Groszschädl, Johann in Manulis, Mark; Sadeghi, Ahmad-Reza; Schneider, Steve (Eds.) Applied Cryptography and Network Security - 14th International Conference, ACNS 2016, Guildford, UK, June 19-22, 2016. Proceedings (2016, June) Side-Channel Analysis (SCA) represents a serious threat to the security of millions of smart devices that form part of the so-called Internet of Things (IoT). Choosing the "right" cryptographic primitive ... [more ▼] Side-Channel Analysis (SCA) represents a serious threat to the security of millions of smart devices that form part of the so-called Internet of Things (IoT). Choosing the "right" cryptographic primitive for the IoT is a highly challenging task due to the resource constraints of IoT devices and the variety of primitives. An important criterion to assess the suitability of a lightweight cipher with respect to SCA is the amount of leakage available to an adversary. In this paper, we analyze the efficiency of different selection functions that are commonly used in Correlation Power Analysis (CPA) attacks on symmetric primitives. To this end, we attacked implementations of the lightweight block ciphers AES, Fantomas, LBlock, Piccolo, PRINCE, RC5, Simon, and Speck on an 8-bit AVR processor. By exploring the relation between the nonlinearity of the studied selection functions and the measured leakages, we discovered some imperfections when using nonlinearity to quantify the resilience against CPA. Then, we applied these findings in an evaluation of the "intrinsic" CPA-resistance of unprotected implementations of the eight mentioned ciphers. We show that certain implementation aspects can influence the leakage level and try to explain why. Our results shed new light on the resilience of basic operations executed by these ciphers against CPA and help to bridge the gap between theory and practice. [less ▲] Detailed reference viewed: 505 (70 UL)Efficient Ring-LWE Encryption on 8-bit AVR Processors Liu, Zhe ; ; et al in Güneysu, Tim; Handschuh, Helena (Eds.) Cryptographic Hardware and Embedded Systems - CHES 2015, 17th International Workshop, Saint-Malo, France, September 13-16, 2015, Proceedings (2015, September) Public-key cryptography based on the "ring-variant" of the Learning with Errors (ring-LWE) problem is both efficient and believed to remain secure in a post-quantum world. In this paper, we introduce a ... [more ▼] Public-key cryptography based on the "ring-variant" of the Learning with Errors (ring-LWE) problem is both efficient and believed to remain secure in a post-quantum world. In this paper, we introduce a carefully-optimized implementation of a ring-LWE encryption scheme for 8-bit AVR processors like the ATxmega128. Our research contributions include several optimizations for the Number Theoretic Transform (NTT) used for polynomial multiplication. More concretely, we describe the Move-and-Add (MA) and the Shift-Add-Multiply-Subtract-Subtract (SAMS2) technique to speed up the performance-critical multiplication and modular reduction of coefficients, respectively. We take advantage of incompletely-reduced intermediate results to minimize the total number of reduction operations and use a special coefficient-storage method to decrease the RAM footprint of NTT multiplications. In addition, we propose a byte-wise scanning strategy to improve the performance of a discrete Gaussian sampler based on the Knuth-Yao random walk algorithm. For medium-term security, our ring-LWE implementation needs 590k, 672k, and 276k clock cycles for key-generation, encryption, and decryption, respectively. On the other hand, for long-term security, the execution time of key-generation, encryption, and decryption amount to 2.2M, 2.6M, and 686k cycles, respectively. These results set new speed records for ring-LWE encryption on an 8-bit processor and outperform related RSA and ECC implementations by an order of magnitude. [less ▲] Detailed reference viewed: 198 (9 UL)Triathlon of Lightweight Block Ciphers for the Internet of Things Dinu, Dumitru-Daniel ; Le Corre, Yann ; Khovratovich, Dmitry et al Scientific Conference (2015, July) In this paper we introduce an open framework for the benchmarking of lightweight block ciphers on a multitude of embedded platforms. Our framework is able to evaluate execution time, RAM footprint, as ... [more ▼] In this paper we introduce an open framework for the benchmarking of lightweight block ciphers on a multitude of embedded platforms. Our framework is able to evaluate execution time, RAM footprint, as well as (binary) code size, and allows a user to define a custom "figure of merit" according to which all evaluated candidates can be ranked. We used the framework to benchmark various implementations of 13 lightweight ciphers, namely AES, Fantomas, HIGHT, LBlock, LED, Piccolo, PRESENT, PRINCE, RC5, Robin, Simon, Speck, and TWINE, on three different platforms: 8-bit ATmega, 16-bit MSP430, and 32-bit ARM. Our results give new insights to the question of how well these ciphers are suited to secure the Internet of Things (IoT). The benchmarking framework provides cipher designers with a tool to compare new algorithms with the state-of-the-art and allows standardization bodies to conduct a fair and comprehensive evaluation of a large number of candidates. [less ▲] Detailed reference viewed: 444 (30 UL)FELICS - Fair Evaluation of Lightweight Cryptographic Systems Dinu, Dumitru-Daniel ; Biryukov, Alex ; Groszschädl, Johann et al Scientific Conference (2015, July) In this paper we introduce FELICS, a free and open-source benchmarking framework designed for fair and consistent evaluation of software implementations of lightweight cryptographic primitives for ... [more ▼] In this paper we introduce FELICS, a free and open-source benchmarking framework designed for fair and consistent evaluation of software implementations of lightweight cryptographic primitives for embedded devices. The framework is very flexible thanks to its modular structure, which allows for an easy integration of new metrics, target devices and evaluation scenarios. It consists of two modules that can currently asses the performance of lightweight block and stream ciphers on three widely used microcontrollers: 8-bit AVR, 16-bit MSP and 32-bit ARM. The metrics extracted are execution time, RAM consumption and binary code size. FELICS has a simple user interface and is intended to be used by cipher designers to compare new primitives with the state of the art. The extracted metrics are very detailed and assist embedded software engineers in selecting the best cipher to match the requirements of a particular application. The tool aims to increase the transparency and trust in benchmarking results of lightweight primitives and facilitates a fair comparison between different primitives using the same evaluation conditions. [less ▲] Detailed reference viewed: 533 (14 UL)Higher-Order Masking in Practice: A Vector Implementation of Masked AES for ARM NEON Wang, Junwei ; Vadnala, Praveen Kumar ; Groszschädl, Johann et al in Nyberg, Kaisa (Ed.) Topics in Cryptology - CT-RSA 2015, The Cryptographer's Track at the RSA Conference 2015, San Francisco, CA, USA, April 20-24, 2015. Proceedings (2015, April) Real-world software implementations of cryptographic algorithms need to be able to resist various kinds of side-channel attacks, in particular Differential Power Analysis (DPA). Masking is a widely-used ... [more ▼] Real-world software implementations of cryptographic algorithms need to be able to resist various kinds of side-channel attacks, in particular Differential Power Analysis (DPA). Masking is a widely-used countermeasure to protect block ciphers like the Advanced Encryption Standard (AES) against DPA attacks. The basic principle is to split all sensitive intermediate variables manipulated by the algorithm into two shares and process these shares separately. However, this approach still succumbs to higher-order DPA attacks, which exploit the joint leakage of a number of intermediate variables. A viable solution is to generalize masking such that at least d+1 shares are used to protect against d-th order attacks. Unfortunately, all current higher-order masking schemes introduce a significant computational overhead compared to unmasked implementations. To facilitate the deployment of higher-order masking for the AES in practice, we developed a vector implementation of Coron et al's masking scheme (FSE 2012) for ARM NEON processors. After a comprehensive complexity analysis, we found that Coron et al's scheme with n shares for each sensitive variable needs O(n^2) multiplications in the field GF(2^8) and O(n^2) random-number generations. Both of these performance-critical operations are executed with only 15 instructions in our software, which is possible thanks to the rich functionality of the NEON instruction set. Our experimental results demonstrate that the performance penalty caused by the integration of higher-order masking is significantly lower than in generally assumed and reported in previous papers. For example, our second-order DPA-protected AES (with three shares for each sensitive variable) is merely eight times slower than an unmasked baseline implementation that resists cache-timing attacks. [less ▲] Detailed reference viewed: 216 (16 UL)Efficient Implementation of ECDH Key Exchange for MSP430-Based Wireless Sensor Networks Liu, Zhe ; ; et al in Bao, Feng; Miller, Steven; Zhou, Jianying (Eds.) et al ASIACCS'15: Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, April 14-17, 2015, Singapore (2015, April) Public-Key Cryptography (PKC) is an indispensable building block of modern security protocols, and, thus, essential for secure communication over insecure networks. Despite a significant body of research ... [more ▼] Public-Key Cryptography (PKC) is an indispensable building block of modern security protocols, and, thus, essential for secure communication over insecure networks. Despite a significant body of research devoted to making PKC more "lightweight," it is still commonly perceived that software implementations of PKC are computationally too expensive for practical use in ultra-low power devices such as wireless sensor nodes. In the present paper we aim to challenge this perception and present a highly-optimized implementation of Elliptic Curve Cryptography (ECC) for the TI MSP430 series of 16-bit microcontrollers. Our software is inspired by MoTE-ECC and supports scalar multiplication on two families of elliptic curves, namely Montgomery and twisted Edwards curves. However, in contrast to MoTE-ECC, we use pseudo-Mersenne prime fields as underlying algebraic structure to facilitate inter-operability with existing ECC implementations. We introduce a novel "zig-zag" technique for multiple-precision squaring on the MSP430 and assess its execution time. Similar to MoTE-ECC, we employ the Montgomery model for variable-base scalar multiplications and the twisted Edwards model if the base point is fixed (e.g. to generate an ephemeral key pair). Our experiments show that the two scalar multiplications needed to perform an ephemeral ECDH key exchange can be accomplished in 4.88 million clock cycles altogether (using a 159-bit prime field), which sets a new speed record for ephemeral ECDH on a 16-bit processor. We also describe the curve generation process and analyze the execution time of various field and point arithmetic operations on curves over a 159-bit and a 191-bit pseudo-Mersenne prime field. [less ▲] Detailed reference viewed: 215 (18 UL)Faster Mask Conversion with Lookup Tables Vadnala, Praveen Kumar ; Groszschädl, Johann in Mangard, Stefan; Poschmann, Axel Y. (Eds.) Constructive Side-Channel Analysis and Secure Design, 6th International Workshop, COSADE 2015, Berlin, Germany, April 13-14, 2015. Revised Selected Papers (2015, April) Masking is an effective and widely-used countermeasure to thwart Differential Power Analysis (DPA) attacks on symmetric cryptosystems. When a symmetric cipher involves a combination of Boolean and ... [more ▼] Masking is an effective and widely-used countermeasure to thwart Differential Power Analysis (DPA) attacks on symmetric cryptosystems. When a symmetric cipher involves a combination of Boolean and arithmetic operations, it is necessary to convert the masks from one form to the other. There exist algorithms for mask conversion that are secure against first-order attacks, but they can not be generalized to higher orders. At CHES 2014, Coron, Großschädl and Vadnala (CGV) introduced a secure conversion scheme between Boolean and arithmetic masking of any order, but their approach requires d=2t+1 shares to protect against attacks of order t. In the present paper, we improve the algorithms for second-order conversion with the help of lookup tables so that only three shares instead of five are needed, which is the minimal number for second-order resistance. Furthermore, we also improve the first-order secure addition method proposed by Karroumi, Richard and Joye, again with lookup tables. We prove the security of all presented algorithms using well established assumptions and models. Finally, we provide experimental evidence of our improved mask conversion applied to HMAC-SHA-1. Simulation results show that our algorithms improve the execution time by 85% at the expense of little memory overhead. [less ▲] Detailed reference viewed: 245 (5 UL)Conversion from Arithmetic to Boolean Masking with Logarithmic Complexity Coron, Jean-Sébastien ; Groszschädl, Johann ; et al in Leander, Gregor (Ed.) Fast Software Encryption, 22nd International Workshop, FSE 2015, Istanbul, Turkey, March 8-11, 2015, Revised Selected Papers (2015, March) A general technique to protect a cryptographic algorithm against side-channel attacks consists in masking all intermediate variables with a random value. For cryptographic algorithms combining Boolean ... [more ▼] A general technique to protect a cryptographic algorithm against side-channel attacks consists in masking all intermediate variables with a random value. For cryptographic algorithms combining Boolean operations with arithmetic operations, one must then perform conversions between Boolean masking and arithmetic masking. At CHES 2001, Goubin described a very elegant algorithm for converting from Boolean masking to arithmetic masking, with only a constant number of operations. Goubin also described an algorithm for converting from arithmetic to Boolean masking, but with O(k) operations where k is the addition bit size. In this paper we describe an improved algorithm with time complexity O(log k) only. Our new algorithm is based on the Kogge-Stone carry look-ahead adder, which computes the carry signal in O(log k) instead of O(k) for the classical ripple carry adder. We also describe an algorithm for performing arithmetic addition modulo 2^k directly on Boolean shares, with the same complexity O(log k) instead of O(k). We prove the security of our new algorithm against first-order attacks. Our algorithm performs well in practice, as for k=64 we obtain a 23% improvement compared to Goubin’s algorithm. [less ▲] Detailed reference viewed: 212 (8 UL)Montgomery Modular Multiplication on ARM-NEON Revisited ; Liu, Zhe ; Groszschädl, Johann et al in Kim, Jongsung; Lee, Jooyoung (Eds.) Information Security and Cryptology - ICISC 2014, 17th International Conference, Seoul, Korea, December 3-5, 2014, Revised Selected Papers (2014, December) Detailed reference viewed: 189 (4 UL)Reverse Product-Scanning Multiplication and Squaring on 8-bit AVR Processors Liu, Zhe ; ; Groszschädl, Johann et al in Hui, Lucas C. K.; Qing, Sihan; Shi, Elaine (Eds.) et al Information and Communications Security - 16th International Conference, ICICS 2014, Hong Kong, China, December 16-17, 2014. Proceedings (2014, December) High performance, small code size, and good scalability are important requirements for software implementations of multi-precision arithmetic algorithms to fit resource-limited embedded systems. In this ... [more ▼] High performance, small code size, and good scalability are important requirements for software implementations of multi-precision arithmetic algorithms to fit resource-limited embedded systems. In this paper, we describe optimization techniques to speed up multi-precision multiplication and squaring on the AVR ATmega series of 8-bit microcontrollers. First, we present a new approach to perform multi-precision multiplication, called Reverse Product Scanning (RPS), that resembles the hybrid technique of Gura et al., but calculates the byte-products in the inner loop in reverse order. The RPS method processes four bytes of the two operands in each iteration of the inner loop and employs two carry-catcher registers to minimize the number of add instructions. We also describe an optimized algorithm for multi-precision squaring based on the RPS technique that is, depending on the operand length, up to 44.3% faster than multiplication. Our AVR Assembly implementations of RPS multiplication and RPS squaring occupy less than 1 kB of code space each and are written in a parameterized fashion so that they can support operands of varying length without recompilation. Despite this high level of flexibility, our RPS multiplication outperforms the looped variant of Hutter et al.'s operand-caching technique and saves between 40 and 51% of code size. We also combine our RPS multiplication and squaring routines with Karatsuba's method to further reduce execution time. When executed on an ATmega128 processor, the "karatsubarized RPS method" needs only 85k clock cycles for a 1024-bit multiplication (or 48k cycles for a squaring). These results show that it is possible to achieve high performance without sacrificing code size or scalability. [less ▲] Detailed reference viewed: 399 (40 UL)Secure Conversion between Boolean and Arithmetic Masking of Any Order Coron, Jean-Sébastien ; Groszschädl, Johann ; Vadnala, Praveen Kumar in Batina, Lejla; Robshaw, Matthew (Eds.) Cryptographic Hardware and Embedded Systems - CHES 2014, 16th International Workshop, Busan, South Korea, September 23-26, 2014. Proceedings (2014, September) Detailed reference viewed: 166 (12 UL)MoTE-ECC: Energy-Scalable Elliptic Curve Cryptography for Wireless Sensor Networks Liu, Zhe ; ; Groszschädl, Johann in Boureanu, Ioana; Owezarski, Philippe; Vaudenay, Serge (Eds.) Applied Cryptography and Network Security - 12th International Conference, ACNS 2014, Lausanne, Switzerland, June 10-13, 2014. Proceedings (2014, June) Wireless Sensor Networks (WSNs) are susceptible to a wide range of malicious attacks, which has stimulated a body of research on "light-weight" security protocols and cryptographic primitives that are ... [more ▼] Wireless Sensor Networks (WSNs) are susceptible to a wide range of malicious attacks, which has stimulated a body of research on "light-weight" security protocols and cryptographic primitives that are suitable for resource-restricted sensor nodes. In this paper we introduce MoTE-ECC, a highly optimized yet scalable ECC library for Memsic's MICAz motes and other sensor nodes equipped with an 8-bit AVR processor. MoTE-ECC supports scalar multiplication on Montgomery and twisted Edwards curves over Optimal Prime Fields (OPFs) of variable size, e.g. 160, 192, 224, and 256 bits, which allows for various trade-offs between security and execution time (resp. energy consumption). OPFs are a special family of "low-weight" prime fields that, in contrast to the NIST-specified fields, facilitate a parameterized implementation of the modular arithmetic so that one and the same software function can be used for operands of different length. To demonstrate the performance of MoTE-ECC, we take (ephemeral) ECDH key exchange between two nodes as example, which requires each node to execute two scalar multiplications. The first scalar multiplication is performed on a fixed base point (to generate a key pair), whereas the second scalar multiplication gets an arbitrary point as input. Our implementation uses a fixed-base comb method on a twisted Edwards curve for the former and a simple ladder approach on a birationally-equivalent Montgomery curve for the latter. Both scalar multiplications require about 9*10^6 clock cycles in total and occupy only 380 bytes in RAM when the underlying OPF has a length of 160 bits. We also describe our efforts to harden MoTE-ECC against side-channel attacks (e.g. simple power analysis) and introduce a highly regular implementation of the comb method. [less ▲] Detailed reference viewed: 383 (70 UL)New Speed Records for Montgomery Modular Multiplication on 8-Bit AVR Microcontrollers Liu, Zhe ; Groszschädl, Johann in Pointcheval, David; Vergnaud, Damien (Eds.) Progress in Cryptology - AFRICACRYPT 2014, 7th International Conference on Cryptology in Africa, Marrakesh, Morocco, May 28-30, 2014. Proceedings (2014, May) Detailed reference viewed: 175 (12 UL)High-Speed Elliptic Curve Cryptography on the NVIDIA GT200 Graphics Processing Unit ; Liu, Zhe ; Groszschädl, Johann et al in Huang, Xinyi; Zhou, Jianying (Eds.) Information Security Practice and Experience, 10th International Conference, ISPEC 2014, Fuzhou, China, May 5-8, 2014. Proceedings (2014, May) Detailed reference viewed: 189 (12 UL)Low-Weight Primes for Lightweight Elliptic Curve Cryptography on 8-bit AVR Processors Liu, Zhe ; Groszschädl, Johann ; in Lin, Dongdai; Xu, Shouhuai; Yung, Moti (Eds.) Information Security and Cryptology - 9th International Conference, INSCRYPT 2013, Guangzhou, China, November 27-30, 2013 (2013, November) Small 8-bit RISC processors and micro-controllers based on the AVR instruction set architecture are widely used in the embedded domain with applications ranging from smartcards over control systems to ... [more ▼] Small 8-bit RISC processors and micro-controllers based on the AVR instruction set architecture are widely used in the embedded domain with applications ranging from smartcards over control systems to wireless sensor nodes. Many of these applications require asymmetric encryption or authentication, which has spurred a body of research into implementation aspects of Elliptic Curve Cryptography (ECC) on the AVR platform. In this paper, we study the suitability of a special class of finite fields, the so-called Optimal Prime Fields (OPFs), for a "lightweight" implementation of ECC with a view towards high performance and security. An OPF is a finite field Fp defined by a prime of the form p = u*2^k + v, whereby both u and v are "small" (in relation to 2^k) so that they fit into one or two registers of an AVR processor. OPFs have a low Hamming weight, which allows for a very efficient implementation of the modular reduction since only the non-zero words of p need to be processed. We describe a special variant of Montgomery multiplication for OPFs that does not execute any input-dependent conditional statements (e.g. branch instructions) and is, hence, resistant against certain side-channel attacks. When executed on an Atmel ATmega processor, a multiplication in a 160-bit OPF takes just 3237 cycles, which compares favorably with other implementations of 160-bit modular multiplication on an 8-bit processor. We also describe a performance-optimized and a security-optimized implementation of elliptic curve scalar multiplication over OPFs. The former uses a GLV curve and executes in 4.19M cycles (over a 160-bit OPF), while the latter is based on a Montgomery curve and has an execution time of approximately 5.93M cycles. Both results improve the state-of-the-art in lightweight ECC on 8-bit processors. [less ▲] Detailed reference viewed: 564 (156 UL)Efficient Implementation of NIST-Compliant Elliptic Curve Cryptography for Sensor Nodes Liu, Zhe ; ; Groszschädl, Johann et al in Qing, Sihan; Zhou, Jianying; Liu, Dongmei (Eds.) Information and Communications Security - 15th International Conference, ICICS 2013, Beijing, China, November 20-22, 2013. Proceedings (2013, November) In this paper, we present a highly-optimized implementation of standards-compliant Elliptic Curve Cryptography (ECC) for wireless sensor nodes and similar devices featuring an 8-bit AVR processor. The ... [more ▼] In this paper, we present a highly-optimized implementation of standards-compliant Elliptic Curve Cryptography (ECC) for wireless sensor nodes and similar devices featuring an 8-bit AVR processor. The field arithmetic is written in Assembly language and optimized for the 192-bit NIST-specified prime p = 2^192 - 2^64 - 1, while the group arithmetic (i.e. point addition and doubling) is programmed in ANSI C. One of our contributions is a novel lazy doubling method for multi-precision squaring which provides better performance than any of the previously-proposed squaring techniques. Based on our highly optimized arithmetic library for the 192-bit NIST prime, we achieve record-setting execution times for scalar multiplication (with both fixed and arbitrary points) as well as multiple scalar multiplication. Experimental results, obtained on an AVR ATmega128 processor, show that the two scalar multiplications of ephemeral Elliptic Curve Diffie-Hellman (ECDH) key exchange can be executed in 1.75 s altogether (at a clock frequency of 7.37 MHz) and consume an energy of some 42 mJ. The generation and verification of an ECDSA signature requires roughly 1.91 s and costs 46 mJ at the same clock frequency. Our results significantly improve the state-of-the-art in ECDH and ECDSA computation on the P-192 curve, outperforming the previous best implementations in the literature by a factor of 1.35 and 2.33, respectively. We also protected the field arithmetic and algorithms for scalar multiplication against side-channel attacks, especially Simple Power Analysis (SPA). [less ▲] Detailed reference viewed: 383 (56 UL) |
||