TITLE TBD

Ken Mighell

Note: This is the abstract of the accompanying paper, which is being used here temporarily in lieu of a specific abstract...

I describe the performance of CRBLASTER, a parallel-processing cosmic-ray rejection application, on the new 49-core MAESTRO processor, which is based on the Tilera Corporation 64-core TILE64 processor. CRBLASTER was initially ported to the TILE64 processor using hardware and software provided by the OPERA project, which developed the MAESTRO processor for the United States Government. Writing parallel-processing applications is very similar for the TILE64 and the MAESTRO processors. The port of CRBLASTER from the TILE64 processor to the MAESTRO simulator took only 6 hours in October 2009. CRBLASTER is a scientific application which spends most of its time doing double-precision floating point computations. The TILE64 processor has no floating point hardware assist; all floating point computations on the TILE64 processor must be emulated in software. Each MAESTRO core (tile) has an integrated IEEE 754 compliant floating point unit (FPU). In theory, scientific applications that do a lot of floating point computations should perform significantly faster on the MAESTRO processor than on the TILE64 processor when the clock rates are normalized. The actual performance of CRBLASTER running on 1 to 45 cores on an early MAESTRO development board (MDB) is described in detail. One major result is that CRBLASTER running on 30 or more cores typically achieves speedup factors of 20 or more. The 100 MHz MDB running CRBLASTER on the standard 750×750 pixel input test image takes 32.82 s (wall time). CRBLASTER running on one core of an Apple Mac Pro with dual 2.8 GHz Quad-Core Intel Xeons takes 5.29 s (wall time) with the same input image. The 100 MHz MDB with floating point operations emulated in software has the measured equivalent computational power of a 450 MHz CPU with an FPU. If the speed of the MDB can be increased to 260 MHz, the computational power of the MDB should then be equivalent to a 1.17 GHz CPU with an FPU.
The MAESTRO processor definitely has the potential to be an enabling technology for the next generation of NASA astrophysical missions.
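
The equivalent-clock figures quoted above can be reproduced from the two wall times. The following is a minimal sketch, not the method used in the paper: it assumes that the equivalence is obtained by scaling the 2.8 GHz Xeon clock by the ratio of wall times, and that MDB performance scales linearly with its clock rate. The function and variable names are hypothetical.

```python
# Sketch of the equivalent-clock arithmetic (assumed method, not from the paper).

def equivalent_clock_mhz(ref_clock_mhz, ref_time_s, target_time_s):
    """Clock rate of a reference-class CPU that would match target_time_s,
    assuming run time is inversely proportional to clock rate."""
    return ref_clock_mhz * ref_time_s / target_time_s

XEON_CLOCK_MHZ = 2800.0  # one core of the 2.8 GHz Xeon
XEON_TIME_S = 5.29       # wall time on one Xeon core
MDB_TIME_S = 32.82       # wall time on the 100 MHz MDB
MDB_CLOCK_MHZ = 100.0

# Equivalent clock of the 100 MHz MDB: about 451 MHz, quoted as 450 MHz.
equiv_100 = equivalent_clock_mhz(XEON_CLOCK_MHZ, XEON_TIME_S, MDB_TIME_S)

# Assumed linear scaling of the MDB from 100 MHz to 260 MHz.
equiv_260 = equiv_100 * (260.0 / MDB_CLOCK_MHZ)

print(f"100 MHz MDB is equivalent to about {equiv_100:.0f} MHz")
print(f"260 MHz MDB is equivalent to about {equiv_260 / 1000:.2f} GHz")
```

Under these assumptions the ratio 5.29 / 32.82 applied to 2.8 GHz gives roughly 451 MHz, and multiplying by 2.6 gives roughly 1.17 GHz, consistent with the figures in the abstract.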

Document date March 23, 2011.