Scaling to Exaflop/s for Mesh-based Algorithms

David Keyes, Columbia

Many scientific applications that push scientific supercomputing performance past its ever-rising high watermark possess as their central data structure a graph with local connections that comes from a mesh in some small number of dimensions (up to about six). This includes, for instance, systems of conservation laws in physical space, problems for distribution functions posed in phase space, lattice quantum chromodynamics, some circuit models, etc. The central kernels for the best solvers in such applications are essentially sparse matrix multiplications of dense vectors drawn from hierarchical subsets of the mesh, and vector-vector operations. Such applications are suitable for indefinite scaling, provided that the data structures are long-lived enough between adaptations to amortize the cost of creating good schedules for the required data movements. However, this does not mean that such applications are easy to program for performance. This talk is intended to lay out the relatively simple data motion consequences of the (sometimes relatively rich) underlying mathematics of the algorithms and help forge a dialog with architecture and language experts en route from petaflop/s, where programmer-directed message passing is a workable model, to exaflop/s.