Frontiers of Extreme Computing

Operating Systems for Exascale Computing
Thomas Sterling, Louisiana State University

As computer systems technology and architecture have evolved, so have operating systems (OS). Historically, the OS has scheduled jobs, allocated resources, virtualized low-level devices, and managed file name spaces. Future high-performance computing (HPC) systems will depart dramatically from conventional systems. Multicore and multithreaded processors and heterogeneous accelerators will replace single microprocessors, yielding total system parallelism that exceeds billion-way concurrency in hardware and software. In addition, reliability will become a major aspect of system control, with the system configuration potentially changing every few minutes. A new generation of system-wide OSs may address these new structures, replacing the ensemble computing strategy of one OS per node. This presentation will discuss one possible general strategy for future OS design and operation. It is based on lightweight kernels that run on local elements and operate in concert across the system to present a single, albeit distributed, set of functionalities to parallel applications. Such a strategy provides a virtual global system rather than a collection of virtualized local nodes. It manages the changing underlying resource configuration to achieve graceful degradation in the presence of faults, while responding to the adaptive resource demands of new classes of dynamic applications.