Stress Testing the 49-core Maestro Processor
Kenneth Mighell
National Optical Astronomy Observatory

Maestro Development Boards (MDBs) with the new 49-core RHBD (Radiation Hardened By Design) Maestro processor are now being evaluated by a few researchers throughout the U.S. I discuss my experience of running my CRBLASTER parallel-processing cosmic-ray-rejection application on an MDB, with the goals of evaluating the performance of the Maestro processor and stress testing the MDB computational platform. I describe a programming technique that can significantly improve the computational efficiency of memory-bound applications on a Maestro processor by using all four of its memory controllers. I then discuss the importance of using real scientific applications during the testing phase of next-generation computer hardware: running complex real-world scientific applications can stress hardware in unexpected ways that might not otherwise be revealed by simple applications or unit tests.

Update: I have MDB #32 in my office and am currently stress testing it. The MDB is booted from a Linux box connected to it by a null modem cable. Once the bootrom has been uploaded, which takes 14 minutes at 14,400 characters per second, Fast Ethernet is enabled, and one then logs on to the MDB with an SSH session. Software and data can be uploaded to and downloaded from the MDB using the tile-monitor command (sftp may be ported in the future). Heat dissipation is a major issue. If the Maestro processor gets too hot, it will spontaneously generate illegal instructions, which can cause the Linux SMP kernel to panic and hang the processor. Overheating apparently does not cause any permanent damage to the Maestro processor: once a hot Maestro chip has cooled down, the MDB boots correctly and normal operations can proceed.
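As a back-of-the-envelope check on the boot time, the quoted upload duration and serial rate imply roughly 12 million characters sent over the null modem cable. This assumes the 14,400 characters per second rate is sustained for the full 14 minutes and that one character corresponds to one byte:

```latex
14\,\text{min} \times 60\,\frac{\text{s}}{\text{min}} \times 14{,}400\,\frac{\text{characters}}{\text{s}}
  = 12{,}096{,}000\ \text{characters} \approx 12.1\ \text{MB} \approx 11.5\ \text{MiB}
```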