Today, leveraging the enormous modular power, diversity and flexibility of manycore systems-on-a-chip (SoCs) requires careful orchestration of complex and heterogeneous resources, a task left to low-level software such as hypervisors. In current architectures, this software forms a single point of failure and a worthwhile target for attacks: once it is compromised, adversaries can gain access to all information and full control over the platform and the environment it controls. This article proposes Midir, an enhanced manycore architecture that effects a paradigm shift from SoCs to distributed SoCs. Midir changes the way platform resources are controlled by retrofitting tile-based fault containment through well-known mechanisms, while securing low-overhead quorum-based consensus on all critical operations, in particular privilege management and, thus, the management of containment domains. By allowing versatile redundancy management, Midir promotes resilience at all software levels, including the lowest. We explain this architecture, its associated algorithms and hardware mechanisms, and show, for the example of a Byzantine fault-tolerant microhypervisor, that it outperforms the highly efficient MinBFT by one order of magnitude.
Disciplines:
Computer science
Author, co-author:
Pinto-Gouveia, Ines; University of Luxembourg > SnT
Völp, Marcus; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CritiX
Esteves-Verissimo, Paulo
External co-authors:
yes
Document language:
English
Title:
Behind the last line of defense: Surviving SoC faults and intrusions
Publication date:
December 2022
Journal title:
Computers and Security
ISSN:
0167-4048
Publisher:
Elsevier
Volume:
123
Peer reviewed:
Peer reviewed, verified by ORBi
FnR project:
FNR12686210 - Architectural Support For Intrusion Tolerant Operating-system Kernels, 2018 (01/11/2018-31/10/2021) - Marcus Völp