The microprocessor trend towards multi-core and many-core means that redundant resources are getting inexpensive and can be used for both computation and increased capabilities, including fault tolerance. Because multi-core architectures have an inherent programmable redundancy, multi-core processors can support flexible, software-implemented fault tolerance. The Maestro processor, which has been implemented with rad-hard by design technology, has 49 cores, as well as redundant inter-core networks and chip interfaces. In this talk, we explore a range of fault tolerance techniques that can be used on Maestro and other multi-core and many-core architectures and identify other challenges that have not yet been addressed.
Document date May 13, 2010.