Frontiers of Extreme Computing

Agenda

Day:	Monday	Tuesday	Wednesday	Thursday
Theme:	General Methods	Specific Methods	Future Methods	Report
8:15 am	Registration
9:00 am	Workshop Introduction, Overview, and Goals, E. DeBenedictis, Sandia [ppt] [pdf]	Drivers from Science and Engineering Applications, D. Keyes, Columbia U.	Custom vs. Commodity Processors and Memory, W. Dally, Stanford [ppt] [pdf]	Reports: Software and Applications [ppt] [pdf]; Future Hardware [ppt] [pdf]; Extreme Hardware [ppt] [pdf]; General Hardware Discussion [ppt] [pdf]
9:30 am	Applications Keynote: Simulating the Earth's Environment, D. Bader, LLNL [ppt] [pdf]	Simulating Plasma Fusion, S. Jardin, PPPL [ppt] [pdf]	Cyclops, TBD
10:00 am	Technology Keynote: Quantum Computing: Its Promise and Limitations, C. H. Bennett, IBM [ppt1] [ppt2] [ppt3] [ppt4] [ppt5] [pdf1] [pdf2] [pdf3] [pdf4] [pdf5]	NAS Supercomputing Report, W. Dally, Stanford [ppt] [pdf]	How to Replace MPI as the Programming Model of the Future, W. Gropp, ANL [ppt] [pdf]
10:30 am	Break	Break	Break	Break
11:00 am	The ITRS Roadmap, P. Zeitzoff, Sematech [ppt] [pdf]	Manufacturability and Computability at the Nano-Scale, S. Williams, HP [ppt] [pdf]	Problems Solvable with Graph Algorithms, B. Hendrickson, SNL [ppt] [pdf]	Overall Outbrief [ppt] [pdf]
11:30 am	The End of Silicon: Implications and Predictions for HPC Systems, D. Burger, UT Austin [ppt] [pdf]	Nanowire-based Computing Systems, A. DeHon, Caltech [ppt] [pdf]	Space Applications, R. Biswas, NASA [ppt] [pdf]
12:00 Noon	Superconducting Technologies, A. Silver [ppt] [pdf]	Future Technologies, T. Theis, IBM [ppt] [pdf]	National Leadership Computing Facility, J. Nichols, ORNL [ppt1] [ppt2] [pdf1] [pdf2]
12:30 pm	Lunch	Lunch	Lunch	Lunch
1:30 pm	Reversible Computing, M. Frank, UFL [ppt] [pdf]	QDCA Hardware, C. Lent, Notre Dame [ppt] [pdf]	Programmatic Perspectives on Quantum Information, M. Foster, NSF [ppt] [pdf]
2:00 pm	Continuum Computer Architecture, T. Sterling, LSU [ppt] [pdf]	QDCA Systems, M. Niemier, Georgia Tech [ppt] [pdf]	Architecture of a Quantum Computer, M. Oskin, U. Washington
2:30 pm	Charter to the Working Groups	TBD	Quantum Simulation, C. Williams, JPL [ppt] [pdf]
3:00 pm	Break	Break	Break
3:30 pm	WG Session 1	WG Session 2	WG Session 3
6:30 pm	Dinner	Dinner	Dinner
7:00 pm		What Else Can Physics Do for Us?, E. Fredkin, CMU West
7:45 pm	Social	Social	Social

Abstracts

David Bader - Climate Simulation for Climate Change Studies
Climate modeling is one of the best-known simulation problems requiring high-end computing, data, and network resources. Because it is impossible to build a physical laboratory to study climate, climate simulation models are the only tools with which scientists can integrate their knowledge to understand this highly nonlinear and complex system. Climate simulation has advanced dramatically over the last two decades, in large part because of demands to study the potential climate changes brought about by human activities, principally the increase in atmospheric carbon dioxide concentrations that results from burning carbon-based fossil fuels for energy production.

This presentation will provide a brief introduction to the scientific aspects of climate science and their representation within climate simulation models. The primary focus of the presentation will be on the use of models to simulate climate change. The practical limitations imposed by throughput considerations, computer architectures, programming tools, and available computer resources will be identified.

In the last two years, several visionary conceptual models have been proposed that assume computer hardware is no longer a limitation. These ideas take climate modeling much closer to a “first principles” approach to simulation, one that makes “parameterizations”, or closure schemes for important unresolved processes, much less important. As will be shown, however, a “perfect” simulation is still impossible. Just as important as advances in computer hardware are complementary advances in software and programming models, and these have not kept pace over the past decade. This talk will highlight some of the shortcomings in this area that have prevented climate modeling, and other simulation science applications, from advancing at a rate commensurate with the rate of advance in hardware.

Charles Bennett - Quantum Computing: Its Promise and Limitations
The theory of reversible computation, and more recently quantum computation, have drawn attention to previously neglected physical aspects of information processing, and each has offered some hope of overcoming what were previously thought to be fundamental limits. Neither, however, offers a general cure for the anticipated end of Moore's law.

Doug Burger - The End of Silicon: Implications and Predictions for HPC Systems
Moore's Law has persisted longer than many thought possible. Nevertheless, the end for CMOS is in sight. Before we get there, power, leakage, and reliability challenges will change computer systems substantively. In this talk, I will project how architectures and systems are likely to evolve between the present and the last conventional computer system.

W. Dally - Custom vs. Commodity Processors (and Memory Systems): The Right Hardware Makes Software Easier

Andre DeHon - Nanowire-based Computing Systems
Chemists can now construct wires that are a few atoms in diameter; these wires can be selectively field-effect gated, and wire crossings can act as programmable diodes. The tiny feature sizes offer a path to economically scale to atomic dimensions. However, the associated bottom-up synthesis techniques produce only highly regular structures, have high defect rates, and offer minimal assembly control. We develop architectures that bridge between lithographic and atomic-scale dimensions and tolerate defective and stochastic assembly. Using 10 nm-pitch nanowires, these nanowire-based programmable architectures offer one to two orders of magnitude greater mapped-logic density than defect-free lithographic FPGAs at 22 nm.
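A crossbar whose crossings act as programmable diodes can behave as a wired-OR plane: an output wire goes high if any input wire connected to it through a programmed crossing is high. The Python sketch models that logic behavior, plus a hypothetical greedy mapper that places each OR term on a column whose needed crossings are not defective, which is one simple way to tolerate defective assembly. The data layout and both function names are assumptions for illustration; this is not the authors' architecture or mapping algorithm.

```python
def diode_or_plane(inputs, programmed):
    """Evaluate a programmable diode crossbar as a wired-OR plane.

    inputs: 0/1 values on the input (row) wires.
    programmed: programmed[j] is the set of row indices whose crossing
    with output (column) wire j has been programmed to conduct.
    """
    return [int(any(inputs[i] for i in rows)) for rows in programmed]


def map_terms(terms, n_columns, defective):
    """Hypothetical greedy, defect-avoiding placement of OR terms.

    terms: list of sets of row indices that each OR term needs.
    defective: set of (row, column) crossings that cannot be programmed.
    Returns the column chosen for each term.
    """
    free = list(range(n_columns))
    placement = []
    for rows in terms:
        for col in free:
            # Use this column only if every crossing the term needs works.
            if all((r, col) not in defective for r in rows):
                placement.append(col)
                free.remove(col)  # safe: we stop scanning right after
                break
        else:
            raise ValueError(f"no defect-free column left for term {rows!r}")
    return placement
```

For example, with crossing (0, 0) defective, `map_terms([{0, 1}, {2}], 3, {(0, 0)})` skips column 0 for the first term and returns `[1, 0]`.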

Michael Frank - The Reversible Computing Question: A Crucial Challenge for Computing
The computing world is rapidly approaching a power-performance crisis. Over the course of the next few decades, all the usual tricks and techniques for improving computer energy efficiency will, one by one, approach fixed limits. As this happens, the performance that can be maintained per watt consumed (and ops performed per dollar spent, barring cheaper energy) will gradually flatten out, and stay flat forever!
For, as von Neumann first pointed out in 1949, fundamental thermodynamics imposes a strict limit on the energy efficiency of conventional "irreversible" binary operations, and conventional algorithms for a given task will always require some minimum number of these.
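The thermodynamic limit referred to here is usually quantified today as Landauer's bound: irreversibly erasing one bit dissipates at least k_B T ln 2 of energy. The short Python sketch evaluates that bound at an assumed room temperature of 300 K; the constants are the exact 2019 SI values, and the function name is illustrative.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23   # exact in the 2019 SI
ELECTRON_VOLT_J = 1.602176634e-19  # exact in the 2019 SI


def landauer_limit_joules(temperature_k):
    """Minimum energy dissipated per irreversibly erased bit: k_B * T * ln 2."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


e = landauer_limit_joules(300.0)  # assumed room temperature
print(f"{e:.3e} J = {e / ELECTRON_VOLT_J:.4f} eV per bit at 300 K")
```

At 300 K this prints about 2.871e-21 J, or roughly 0.0179 eV, per erased bit. Because the bound scales linearly with temperature, conventional irreversible logic cannot dissipate less than this per bit erased.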

However, this crisis might be avoided, and computer energy efficiency might resume an indefinite upward climb, if only we can practically implement an unconventional approach known as "reversible computing," which avoids using irreversible operations. Instead, in a high-performance "ballistic" reversible computer, the physical and logical state of its circuits essentially "coasts" along the desired trajectory through the machine's configuration space, like a roller coaster along its track, with an energy dissipation that, in theory, can approach arbitrarily close to zero as the technology is further refined.
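What "avoids using irreversible operations" means can be seen at the gate level. The abstract does not name a gate set; the Toffoli (controlled-controlled-NOT) gate is used here only as a standard textbook example of a reversible gate. The Python sketch checks that it is a bijection on its inputs, that it is its own inverse, and that it can still compute AND, whereas a plain AND gate discards information.

```python
from itertools import product


def toffoli(a, b, c):
    """Controlled-controlled-NOT: flips c if and only if a and b are both 1."""
    return a, b, c ^ (a & b)


def irreversible_and(a, b):
    """Ordinary AND: two input bits in, one bit out."""
    return a & b


inputs = list(product((0, 1), repeat=3))

# A reversible gate is a bijection: no two inputs share an output,
# so no information is erased and the k_B T ln 2 erasure bound does not apply.
assert len({toffoli(*x) for x in inputs}) == len(inputs)

# Applying the gate twice recovers the input: Toffoli is its own inverse.
assert all(toffoli(*toffoli(*x)) == x for x in inputs)

# With c = 0, the third output is a AND b, so AND can be computed reversibly.
assert all(toffoli(a, b, 0)[2] == irreversible_and(a, b)
           for a, b in product((0, 1), repeat=2))

# Plain AND maps four input pairs onto only two outputs: the input is lost.
assert len({irreversible_and(a, b) for a, b in product((0, 1), repeat=2)}) == 2
```

The price of reversibility is the extra output lines (a and b are passed through), which is one reason practical reversible machines are harder to design than conventional ones.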

Unfortunately, the question of whether reversible computing can be made to work efficiently in practice remains open at this time. Various theoretical models and "proof of concept" prototypes of reversible machines exist, but can all be criticized as either too inefficient or too incomplete to be convincing. On the other hand, all of the many attempts by skeptics to prove reversible computing impossible (or permanently impractical) have also been invalid or incomplete, often relying on demonstrably incorrect assumptions about how a computer must work.

The reversible computing question is very deep and important, and it deserves increased attention. But it is also extremely subtle, and quite difficult to resolve. In this talk, I review the major results and open issues in the field, and propose what we must do in order to make progress towards answering this crucial question, and possibly opening the door to a future of unbounded improvements in computer energy efficiency.

Ed Fredkin - What Else Can Physics Do for Us?
It would be nice if the interplay between new computation technologies and the laws of physics could bring real progress. Quantum computing, so far, has been interesting physics with no resulting computation. Further, the range of applicability of QC, if it ever becomes practical, may be limited to a tiny slice of the universe of tomorrow’s computational workload. It is interesting to raise the question: “What else can physics offer with respect to real-world computational problems?” The answers aren’t clear, but there are new insights and possibilities. We should understand that QC is not the only way to bend the basic physical properties of matter and energy to the task of general computation. We will report on physics-based concepts that result in conventional computational structures, differing only in having speed and capacity that leapfrog Moore’s Law.

William Gropp - How to Replace MPI as the Programming Model of the Future
There are now legacy MPI codes that future architectures and systems will need to support, either directly through a high-quality MPI implementation or with advanced code transformation aids. Why has MPI been so successful? What properties must any replacement have? This talk will look at some of the reasons (other than portability) for MPI's success and what lessons they provide for current challengers, such as the PGAS languages. The interaction between system architecture and hardware support for programming models and algorithms will also be discussed, with particular emphasis on how balanced performance features affect programming models and algorithms for high-end computing.

Bruce Hendrickson - Parallel Graph Algorithms: Architectural Demands of Pathological Applications
Many important applications of high performance computing involve frequent, unstructured memory accesses. Graph algorithms, which arise in areas including linear algebra, biology, and informatics, are prominent among them. Graph operations often involve following sequences of edges, which requires minimal computation but frequent accesses to unpredictable locations in global memory. These characteristics result in poor performance on traditional microprocessors, and even worse performance on common parallel computers.
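To make the access pattern concrete, here is a minimal Python breadth-first search over a graph stored in compressed sparse row (CSR) arrays. The CSR layout, the function name, and the example graph are assumptions for illustration, not the authors' MTA-2 code. Each step does almost no arithmetic, while the value `col_idx[k]` decides which entry of `level` is read next, which is the data-dependent memory access the abstract describes.

```python
from collections import deque


def bfs_levels(row_ptr, col_idx, source):
    """Breadth-first search on a CSR graph; returns each vertex's hop count.

    The neighbors of vertex v are col_idx[row_ptr[v]:row_ptr[v + 1]].
    Unreachable vertices keep level -1.
    """
    n = len(row_ptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for k in range(row_ptr[v], row_ptr[v + 1]):
            w = col_idx[k]         # neighbor id is data, not a stride
            if level[w] == -1:     # so this load hits an unpredictable address
                level[w] = level[v] + 1
                frontier.append(w)
    return level


# Undirected example graph with edges 0-1, 0-2, 1-3, 2-3, 3-4.
row_ptr = [0, 2, 4, 6, 9, 10]
col_idx = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]
print(bfs_levels(row_ptr, col_idx, 0))  # [0, 1, 1, 2, 3]
```

On a cache-based microprocessor, each `level[w]` lookup for a large graph is likely to miss; latency-tolerant multithreaded designs such as the MTA-2 hide that latency by switching among many such traversals.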

In recent work, we have explored the performance of graph algorithms on the massively multithreaded Cray MTA-2. The MTA's latency tolerance and fine-grained synchronization mechanisms allow both single-processor and parallel graph algorithms to achieve high performance. We will present these results and discuss their lessons for future developments in computer architecture.

Joint work with Jon Berry, Richard Murphy and Keith Underwood.

Steve Jardin - Towards Comprehensive Simulation of Fusion Plasmas
In Magnetic Fusion Energy (MFE) experiments, high-temperature (100 million degrees Celsius) plasmas are produced in the laboratory in order to create the conditions where hydrogen isotopes (deuterium and tritium) can undergo nuclear fusion and release energy (the same process that fuels our sun). Devices called tokamaks and stellarators are “magnetic bottles” that confine the hot plasma away from material walls, allowing fusion to occur. Confining the ultra-hot plasma is a daunting technical challenge. The level of micro-turbulence in the plasma determines how long it takes the plasma to “leak out” of the confinement region. Also, global stability considerations limit the amount of plasma a given magnetic configuration can confine and thus determine the maximum fusion rate and power output. Present capability is such that we can apply our most complete computational models to realistically simulate both nonlinear macroscopic stability and microscopic turbulent transport in the smaller fusion experiments that exist today, at least for short times. Anticipated advances in both hardware and algorithms during the next 5-10+ years will enable application of even more advanced models to the largest present-day experiments and to proposed burning plasma experiments such as the International Thermonuclear Experimental Reactor (ITER). The present thrust in computational plasma science is to merge the now-separate macroscopic and microscopic models, and to extend their physical realism by including detailed models of phenomena such as RF heating and atomic and molecular physical processes (important in plasma-material interactions), so as to provide a truly integrated computational model of a fusion experiment. This is the goal of a new initiative known as the Fusion Simulation Project.
Such an integrated modeling capability will greatly facilitate the process whereby plasma scientists develop understanding and insights into these amazingly complex systems that will be critical in realizing the long term goal of creating an environmentally and economically sustainable source of energy.

David Keyes - Drivers from Science and Engineering Applications
Relying on the input of hundreds of members of the U.S. computational science community at the 2003 Science-based Case for Large-scale Simulation (SCaLeS) workshop, the March 2004 whitepapers of the U.S. DOE's Scientific Discovery through Advanced Computing (SciDAC) project, and a collection of recent Gordon Bell Prize finalist papers, we define and motivate some aspirations for high-end science and engineering simulations over a five-year horizon. Looking at some hurdles to progress in high-end simulation, we note in passing that not all are architectural in nature, then concentrate on those that apparently are. Looking at some kernels of high-end simulation, we note apparent hurdles to their scalability and draw inspiration from the flexibility algorithm designers have shown in getting around hurdles in the past.

Craig S. Lent - Molecular quantum-dot cellular automata and the limits of binary switch scaling
Molecular quantum-dot cellular automata (QCA) is an approach to electronic computing at the single-molecule level that encodes binary information in the molecular charge configuration. This approach differs fundamentally from efforts to reproduce conventional transistors and wires using molecules. A QCA molecular cell has multiple redox centers that act as quantum dots. The arrangement of mobile charge among these dots represents the bit. Each molecule interacts with the next through Coulomb coupling; no charge flows from cell to cell. Prototype single-electron QCA devices have been built using small metal dots and tunnel junctions. Logic gates and shift registers have been demonstrated, though at cryogenic temperatures. Molecular QCA would work at room temperature. Molecular implementations have been explored and the basic switching mechanism confirmed. Clocked control of QCA device arrays is possible and requires creative rethinking of computer architecture paradigms. By not using molecules as current switches, the QCA paradigm may offer a solution to the fundamental problem of excess heat dissipation in computation.
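The abstract mentions demonstrated QCA logic gates without naming them. In the QCA literature, the basic logic element is usually a three-input majority gate, with an inverter completing the set; each cell's two stable charge arrangements stand for 0 and 1. The Python sketch models only that Boolean behavior, not the Coulomb-coupled cell physics or clocking, and the function names are hypothetical.

```python
def majority(a, b, c):
    """Three-input majority vote, the usual QCA logic primitive."""
    return (a & b) | (b & c) | (a & c)


def qca_and(a, b):
    # Pinning one majority input to a fixed 0 cell turns the gate into AND.
    return majority(a, b, 0)


def qca_or(a, b):
    # Pinning one majority input to a fixed 1 cell turns the gate into OR.
    return majority(a, b, 1)


def qca_not(a):
    # QCA inverters rely on cell geometry; here modeled simply as a bit flip.
    return 1 - a


for a in (0, 1):
    for b in (0, 1):
        print(a, b, qca_and(a, b), qca_or(a, b), qca_not(a))
```

Since majority plus inversion is functionally complete, any Boolean function can in principle be built from these two elements.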

Jeff Nichols - National Leadership Computing Facility - Bringing Capability Computing to Science
The National Center for Computational Sciences (NCCS) maintains and operates a user facility to develop and deploy leadership-computing systems with the goal of providing computational capability that is at least 100 times greater than what is generally available for advanced scientific and engineering problems. We work with industry, laboratories, and academia to deploy a computational environment that enables the scientific community to exploit this extraordinary capability, achieving substantially higher effective performance than is available elsewhere. A non-traditional access and support model has been proposed in order to achieve a high level of scientific productivity and address challenges in climate, fusion, astrophysics, nanoscience, chemistry, biology, combustion, accelerator physics, engineering, and other science disciplines. The NCCS brings together world-class researchers; a proven, aggressive, and sustainable hardware path; an experienced operational team; a strategy for delivering true capability computing; and modern computing facilities connected to the national infrastructure through state-of-the-art networking to deliver breakthrough science. Combining these resources and building on expertise and resources of the partnership, the NCCS enables scientific computation and breakthrough science at an unprecedented scale.

Michael Niemier - What can ‘baseline’ QCA do?
Whether or not the end of the CMOS curve does indeed come to pass, speculation about it, combined with other technological advances, has helped to fuel a wealth of research into alternative means of computation. Much of this work has focused either on the lowest levels of device physics or, at best, on very simple circuits. Most importantly, it has often led to a publication that discusses the demonstration and performance of a single device. While this is an undeniably important and necessary first step, we must ultimately consider how “Device X” will be used to form a computational system, and recognize that doing so will require many Device X’s, not just one. Given the increasing number of proposed novel devices, we should explicitly consider both of these issues from the initial stages of a device’s development, even before the first paper demonstrating Device X is published.

By involving computer architects during device development, we will not just be looking at a single device in isolation; rather, we will be evaluating a reasonably sized system with an initial computational goal in mind. Moreover, by assuming a set of very pessimistic implementation constraints, we can establish a true baseline for Device X: our best-foreseen application in the worst-foreseen operational environment. If expectations (e.g., for power, area, and speed) for end-of-the-roadmap silicon and other emerging devices are plotted alongside this baseline for the same application, we can make significant headway in discovering what niche roles Device X can realistically play in computing.

What has been done and, in the speaker’s opinion, what needs to be done will be discussed in the context of Quantum-dot Cellular Automata (QCA).

Mark Oskin - Engineering a Quantum Computer: Bridging the Theoretical and Practical Divide
Theoretically, quantum computers offer great promise to solve formally intractable problems. Experimentally, small-scale quantum computers have been demonstrated. The next phase of research is to construct large-scale quantum computers capable of proving the technology and further validating the theoretical foundations. Such devices will consist of tens to hundreds of quantum bits. At this scale, proper engineering of the devices becomes critical.

This talk will present a broad overview of our work in exploring the engineering challenges and design trade-offs involved with large scale quantum systems. We have found that noise will significantly constrain scalability and that the micro-architecture of these devices needs to be tuned to minimize decoherence. This talk will conclude by sketching future work to be done in this area.

Thomas Sterling - Continuum Computer Architecture for Nano-scale Technologies
As the feature size of logic devices decreases with Moore’s Law, ultimately reaching the domain of nano-scale technology, the ratio of ‘remote’ to ‘local’ action will escalate dramatically, demanding entirely new computing models and structures to efficiently exploit these future technologies and lead to Exaflops capability and beyond. Continuum Computer Architecture (CCA) is a new family of parallel computer architectures under development at LSU to harness convergent device technologies beyond Moore’s Law that respond to the challenges implied by the emerging disparity between local and global operations. CCA provides one possible framework for employing nano-scale technology in future convergent system architectures at the end of Moore’s Law. CCA is a cellular architecture merging data storage, logical manipulation, and nearest-neighbor transfers in a single simple element, or cell. In physical structure, CCA is reminiscent of cellular automata, but logically it is very different. It supports a general global parallel model of computation through the management of a distributed virtual name space for both data and parallel continuations, which are data structures that dynamically and adaptively govern fine-grain parallel execution. The semantics of the CCA system borrow from the ParalleX model of computation, which combines message-driven computing, multi-threading, and the futures synchronization construct to replace the venerable and conventional barrier-controlled communicating sequential processes. This presentation will describe ParalleX and its potential implementation through Continuum Computer Architecture with nano-scale technology for Exaflops and beyond.
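The abstract names futures as the synchronization construct meant to replace barrier-style coordination. The Python sketch uses the standard library's `concurrent.futures` purely as a stand-in for that idea; it is not ParalleX or CCA code, and the ring stencil, function names, and worker count are assumptions for illustration. Each cell's update waits only on the three neighbor futures it reads, so no global barrier separates one sweep from the next.

```python
from concurrent.futures import ThreadPoolExecutor


def stencil_step(left, center, right):
    # Toy three-point average standing in for any local cell update.
    return (left + center + right) / 3.0


def _update_when_ready(left, center, right):
    # Blocks only on the three futures this cell actually reads,
    # not on a barrier covering every cell in the sweep.
    return stencil_step(left.result(), center.result(), right.result())


def futures_sweep(values, steps, workers=4):
    """Run `steps` stencil sweeps on a ring, synchronizing through futures."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        current = [pool.submit(float, v) for v in values]
        for _ in range(steps):
            n = len(current)
            current = [
                pool.submit(_update_when_ready,
                            current[(i - 1) % n], current[i], current[(i + 1) % n])
                for i in range(n)
            ]
        return [f.result() for f in current]


print(futures_sweep([0.0, 3.0, 0.0], 1))  # [1.0, 1.0, 1.0]
```

Blocking `result()` calls inside worker threads are a simplification that works here because tasks are queued in dependency order; a message-driven runtime would instead trigger each update when its inputs arrive, without tying up a thread while waiting.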

Tom Theis - Devices for Computing: Present Problems and Future Solutions
The biggest problems limiting the further development of the silicon field-effect transistor are power dissipation and device-to-device variability. Despite some pessimistic predictions, it appears that the technology can be extended for at least another 10 years. Research into transistors based on carbon nanotubes or semiconductor nanowires can be viewed as a quest for the "ultimate" field-effect transistor. Looking beyond the field-effect transistor, major US semiconductor manufacturers have recently announced the Nanoelectronics Research Initiative (NRI), which will fund university research aimed at entirely new logical switches. Beyond the stated research goals of NRI, I will briefly survey the prospects for devices that efficiently implement reversible logic and quantum logic.

Colin P. Williams - Introduction to Quantum Simulation
While it is widely known that quantum computers can factor composite integers and compute discrete logarithms in polynomial time, other applications of quantum computers have not been publicized as well. In this talk I will discuss some of the ways quantum computers could be used in scientific computation, especially in simulation, quantum chemistry, signal processing, and solving differential equations. Such applications could have greater scientific and commercial impact than those related to factoring and code-breaking.

Stan Williams - Manufacturability and Computability at the Nano-Scale
Nano-scale electronics offer the possibility of building much higher-density circuits than those presently available, but there are major issues to resolve before they become a reality. A significant issue is the cost of manufacturing, which will lead to new fabrication technologies and geometrically simpler circuit designs. A second is that at some scale the field-effect transistor will no longer operate as a switch, and new devices enabled by quantum effects will be needed. I will review our latest developments in the areas of nano-imprint lithography, switching devices, and crossbar architectures.

Peter Zeitzoff - MOSFET Scaling Trends, Challenges, and Key Technology Innovations through the End of the Roadmap

Sandia National Laboratories | Lawrence Berkeley National Laboratory

Modified on: July 3, 2007
Contact: erikdebenedictis@sandia.gov