Reference : Efficient Arithmetic on ARM-NEON and Its Application for High-Speed RSA Implementation
Scientific journals : Article
Engineering, computing & technology : Computer science
Security, Reliability and Trust
http://hdl.handle.net/10993/37482
English
Seo, Hwajeong [Pusan National University > School of Computer Science and Engineering]
Liu, Zhe [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Computer Science and Communications Research Unit (CSC)]
Groszschädl, Johann [University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)]
Kim, Howon [Pusan National University > School of Computer Science and Engineering]
Dec-2016
Security and Communication Networks
John Wiley & Sons
Volume 9, Issue 18
Pages 5401-5411
Yes (verified by ORBilu)
ISSN 1939-0114
eISSN 1939-0122
Malden
United Kingdom
[en] Public-Key Cryptography ; Multiple-Precision Arithmetic ; Modular Reduction ; SIMD-Level Parallelism ; Vector Instructions ; ARM NEON
[en] A steadily increasing number of modern processors support Single Instruction Multiple Data (SIMD) instructions to speed up multimedia, communication, and security applications. The computational power of Intel's SSE and AVX extensions as well as ARM's NEON engine has initiated a body of research on SIMD-parallel implementation of multiple-precision integer arithmetic operations, in particular modular multiplication and modular squaring, which are performance-critical components of widely-used public-key cryptosystems such as RSA, DSA, Diffie-Hellman, and their elliptic-curve variants ECDSA and ECDH. In this paper, we introduce the Double Operand Scanning (DOS) method for multiple-precision squaring and describe its implementation for ARM NEON processors. The DOS method uses a full-radix representation of the operand to be squared and aims to maximize performance by reducing the number of Read-After-Write (RAW) dependencies between source and destination registers. We also analyze the benefits of applying Karatsuba's technique to both multiple-precision multiplication and squaring, and present an optimized implementation of Montgomery's algorithm for modular reduction. Our performance evaluation shows that the DOS method along with the other optimizations described in this paper allows one to execute a full 2048-bit modular exponentiation in about 14.25 million clock cycles on an ARM Cortex-A15 processor, which is significantly faster than previously-reported RSA implementations for the ARM-NEON platform.
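As a rough illustration of the kind of multiple-precision squaring the abstract refers to, the sketch below squares an integer stored as 32-bit limbs (a full-radix representation), computing each cross product a[i]*a[j] with i < j only once and then doubling the accumulated sum. It is not the paper's DOS method and makes no attempt at NEON vectorization or at scheduling around RAW dependencies; the limb width W and the names to_limbs, from_limbs, and square_limbs are hypothetical, chosen only for this example.

```python
W = 32                      # assumed limb width in bits (full radix 2^32)
MASK = (1 << W) - 1


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian W-bit limbs."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Recombine little-endian W-bit limbs into a Python integer."""
    return sum(limb << (W * i) for i, limb in enumerate(limbs))


def square_limbs(a):
    """Square an n-limb operand and return a 2n-limb result.

    Step 1 accumulates the cross products a[i]*a[j] for i < j,
    step 2 doubles that partial result, and step 3 adds the
    diagonal squares a[i]*a[i].
    """
    n = len(a)
    r = [0] * (2 * n)

    # Step 1: cross products, each computed once.
    for i in range(n):
        carry = 0
        for j in range(i + 1, n):
            t = r[i + j] + a[i] * a[j] + carry
            r[i + j] = t & MASK
            carry = t >> W
        r[i + n] = carry

    # Step 2: double the cross-product sum (shift left by one bit).
    carry = 0
    for k in range(2 * n):
        t = (r[k] << 1) | carry
        r[k] = t & MASK
        carry = t >> W

    # Step 3: add the diagonal terms a[i]^2 at limb position 2*i.
    carry = 0
    for i in range(n):
        sq = a[i] * a[i]
        t = r[2 * i] + (sq & MASK) + carry
        r[2 * i] = t & MASK
        t = r[2 * i + 1] + (sq >> W) + (t >> W)
        r[2 * i + 1] = t & MASK
        carry = t >> W

    return r
```

For an n-limb operand this performs n(n-1)/2 cross-product multiplications plus n diagonal squarings, compared with n^2 limb multiplications for a general product of two n-limb operands.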
10.1002/sec.1706
http://onlinelibrary.wiley.com/doi/10.1002/sec.1706

File(s) associated to this reference

Fulltext file(s):

File | Commentary | Version | Size | Access
SCN2016.pdf | | Author postprint | 99.74 kB | Limited access (request a copy)


All documents in ORBilu are protected by a user license.