The Third Workshop on...
Fault-Tolerant Spaceborne Computing Employing New Technologies, 2010


Location guide: Monday's activities and Tuesday's dinner are at the Sheraton Albuquerque Uptown, as noted. The main meeting place for the conference is the CSRI building near Sandia labs. The high-level sessions will be held elsewhere at Sandia, but attendees should meet at the CSRI building.
May 24
May 25
May 26
May 27
8:00 Breakfast
8:30 AM Registration and Breakfast Breakfast Technical working group meeting (Sandia CSRI) (Thursday schedule)
F: Architecture
9:00 AM Organizers, Workshop Process Michael Johnson, NASA Goddard, Enabling Next-Generation Capabilities at NASA
9:15 AM Rick Ridgley, Government/DOD, Nanotechnology developments (US Citizens Only)
9:30 AM Paul Murray (for Brett Koritnik), SEAKR, SEAKR's Next Generation Maestro Space Processor
10 AM Maj David Shultz, Government/DOD, CubeSats Break Break
10:30 AM Break Brian Wie, US Government (General Dynamics), AE9/AP9 Radiation Specification Development: Proton Spectrometer Belt Research (PSBR) Program Update Technical working group meeting (continued)
11 AM Lewis Cohen, Government/DOD, title TBD Ken Mighell, NOAO, Lessons learned on porting CRBLASTER to a many-core processor
11:30 AM Registration (Sandia CSRI Building) Mike Deliman, Wind River, Porting VxWorks To Many-Core Ian Troxel, US Government (Seakr), Advances in Fault Tolerance Techniques Possible By Using Emerging Hardware Technologies
Noon Lunch (on your own) Lunch Lunch Closeout
1 PM Technical working group meeting
A: Trust (Sandia)
Trust in Space working group report
1:30 PM David Smyth, JPL, Software Perspectives: Mars Science Laboratory Rover and Future Mission Opportunities for Multicore Steve Crago, ISI East, Fault Tolerant Computing on Maestro
2 PM Bonnie Triezenberg, Boeing, Knowing What You Don't Know Ken Prager, Raytheon, Monarch: Recent Progress
2:30 PM Kenneth A. LaBel, NASA/GSFC, Memories and NASA Spacecraft: A Description of Memories, Radiation Failure Modes, and System Design Considerations Memory working group debrief and Doug Sheldon, NASA JPL, Memory Technology for Spaceborne Applications - Crossroads for Success
3 PM Break Break Break
3:30 PM Technical working group meeting (continued) Peter Kogge, Notre Dame, Multicore Energy Management & Memory Architectures Fault Tolerance and Software Working Group Outbriefs
4 PM Heather Quinn, LANL, Cross-Layer Reliability Activity
4:15 PM Steve Petersen, Government/DOD, Future Ground Enterprise (US Citizens only)
4:30 PM Michael Kokorowski, JPL, An Overview of Jovian Radiation Environments and Effects
5:00 PM Reception (Sheraton Albuquerque Uptown) Dinner on your own
6:00 PM Dinner and dinner speaker Brad Clement, Using Multi-Core Systems for Rover Autonomy
7:00 PM Technical working group meetings (Sheraton Albuquerque Uptown) (Monday night schedule)
B: Fault tolerance (system level)
C: Memory
Technical working group meetings (Sheraton Albuquerque Uptown) (Tuesday night schedule)
D: Software
E: Interconnect

Monday Working Groups

A. Trust in Space

(First Day: Mon, May 24, 1-5pm):

Sandia National Laboratories will host a working group to develop a preliminary statement of how trust affects, interacts with, and enables space systems, as well as how to measure trustworthiness. We invite representatives of government and contractor entities who are concerned about trust relationships in space systems, particularly those who have ideas or products in this area.

Trust affects space systems from multiple perspectives. Trust is present in communications of space systems and it also permeates communications with ground segments (both user and control) and between ground segment components. Trust is a supply chain and lifecycle issue for space systems - from concept to decommissioning and in all segments. Trust is required for multi-mission space systems - trust between secondary systems and the primary.

This working group will collectively determine where trust plays a role in space systems and how to measure the trustworthiness of those systems. Factors such as the time criticality of data and the potential consequences of failure of trust are part of the metric. The first task of the working group will be determining the trust areas the group will consider. The second task of the working group will be determining security metrics that apply to the chosen trust areas. The third task will be finding the common metrics that allow relative comparison of trust between areas.

We have approximately four hours allocated to this working group. The first hour will include presentations on trust areas and security. We currently plan to present on trust in cyber security and trust in the supply chain. The working group hosts welcome other presentations about specific work in the area of space systems trust. Presentations are limited to 10 minutes each, with a maximum of three additional presenters, leaving two minutes of Q&A for each of the five presentations. In the second hour the working group will move on to the first task. The timing of subsequent tasks depends on how long the previous tasks take.

The end product of the Trust Working Group will be a statement to be presented at the general Space Conference to an audience of seniors. This statement will directly reflect the results of each task of the working group - trust areas, security metrics, and common metrics.

The working group discussions will be at the Secret Collateral level. If you would like to participate, you must contact Robert Habbit immediately to arrange for clearance transfer. When you arrive at Sandia on Monday, go first to the badge office and get a cleared badge, then meet at the CSRI building for further directions. See these directions.

B. Fault Tolerance

(First Day: Mon, May 24, 7 pm):

Working Group Chair: Larry Bergman (JPL)
Working Group Co-Chair: Marti Bancroft (MBC)

The general goal of the Fault Tolerance Working Group is to assess the current status of fault tolerance and fault management system technologies (both hardware and software) for spaceborne computing, emerging mission needs and requirements, and the specific problems and solutions that require further development for multicore spaceborne avionics.

C. Memory Systems

(First Day: Mon, May 24, 7 pm):

TIMES MAY CHANGE

Advances in space system sensors and science instruments have produced unprecedented quantities of digital data, which must be stored and then returned to ground stations. Options for digital storage include large SRAMs, DRAMs, and non-volatile memory. This working group will build upon the discussions at Space Computing 2009 and evaluate the state of the art in commercial and radiation-hardened memory systems. Gaps will be identified and proposed solutions will be explored.

Outcome: Spider chart (radar chart)

Working Group Chair: AJ Kleinosowski

Monday Night Schedule

B: Fault Tolerance C: Memory (no schedule provided)
7 PM Overview and Goals of Working Group (Larry Bergman/JPL)
7:20 PM Emerging Mission System Fault Tolerance Requirements (Rob Lampereur, Ball Aerospace)
7:40 PM Fault Tolerant Hypervisor for Multicore (Mike Deliman, Wind River)
8:00 PM Fault Tolerance Principles for Spaceborne Computers: Historical Perspective and Future Implications in Multicore Era. (David Rennels, JPL/UCLA)
8:20 PM Discussion: (1) Identify low-hanging-fruit solutions, past or present, that can be applied to multicore systems, and (2) Identify remaining unsolved problems that still need to be addressed with further research (ALL)
9:00 PM Closeout

Tuesday Working Groups

D. Software

(Second Day: Tue, May 25, 7 pm):

Sponsor: Jet Propulsion Lab


  • Marti Bancroft, MBC (contractor, OGA)
  • Dr. Larry Bergman, JPL
Additional Workshop Organizers:
  • Tim Gallagher, Lockheed Martin Space Systems
  • Rob Lamprereur, Ball Aerospace

Last year’s software working group covered the role of software in fault tolerance from the hypervisor (very close to the hardware) through all the layers of the software stack. Although this broad approach identified a number of opportunities for collaboration, Application Based Fault Tolerance (ABFT) stood out as an area where collaboration, and possibly joint research investment, would have a large return on investment.

Although some initially viewed this area as so application specific that it offered few opportunities for collaboration, further discussions revealed that opportunities might exist in compiler support for ABFT and in tools.

Therefore, the focus of this working group will be to ensure consistent definitions of ABFT, to identify the components (in addition to the application) that enable it, and to identify areas where those components might be of common use across missions. The goal is to identify areas where joint funding could benefit both the state of the art and multiple missions.

ABFT can benefit from operating system and hypervisor features, as well as middleware, advanced compiler technology, libraries, and novel approaches to rollback and replay (as compared with checkpoint and restart of the entire application, which is mostly not practical in time-critical systems).

How these approaches and others combine to help ensure fault-tolerant spaceborne computing, and where there are gaps that collaborative work could fill, will be the discussion themes of this working group.
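As a concrete illustration of the kind of technique under discussion, the sketch below shows checksum-protected matrix multiplication, a textbook scheme usually described in the literature as algorithm-based fault tolerance. Treating it as an instance of ABFT as the working group uses the term is an assumption. The function names (`with_column_checksums`, `with_row_checksums`, `find_fault`, `correct_fault`) are hypothetical, and integer matrices are used so that the checksum comparisons are exact.

```python
"""Checksum-protected matrix multiply: an illustrative sketch, not a flight design."""


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def with_column_checksums(a):
    """Return a copy of a with one extra row holding each column's sum."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksums(b):
    """Return a copy of b with one extra column holding each row's sum."""
    return [row + [sum(row)] for row in b]


def find_fault(c_full):
    """Locate a single corrupted data element of a checksummed product.

    c_full is matmul(with_column_checksums(A), with_row_checksums(B)), so its
    last column and last row should equal the row and column sums of the data
    block. Returns (row, col) when exactly one row check and one column check
    fail, otherwise None (no fault, or a pattern this sketch cannot locate).
    """
    n_rows, n_cols = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if sum(c_full[i][:n_cols]) != c_full[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(c_full[i][j] for i in range(n_rows)) != c_full[n_rows][j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None


def correct_fault(c_full, location):
    """Repair one corrupted data element in place from its row checksum."""
    i, j = location
    n_cols = len(c_full[0]) - 1
    others = sum(c_full[i][k] for k in range(n_cols) if k != j)
    c_full[i][j] = c_full[i][n_cols] - others


# Usage: multiply with checksums, inject an upset, then locate and repair it.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(with_column_checksums(a), with_row_checksums(b))
c[1][0] += 7                      # simulated single-event upset
location = find_fault(c)
if location is not None:
    correct_fault(c, location)
```

In this sketch a detected fault is repaired locally from the checksums, without restarting the computation, which is the contrast with whole-application checkpoint and restart noted above.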

The following presenters will help ensure lively discussions:

  • Tim Gallagher -- introductory remarks including lessons learned from prior work
  • Rob Lamprereur -- introductory remarks including lessons learned from prior work
  • Kim Gostelow (JPL) -- "An Execution Model for Multicore: Fault Tolerance and Parallelism"
  • Mike Deliman (Wind River) -- "The Role of the Operating System and Hypervisor in Multi/Many Core ABFT -- Status and Potential Future Directions." Discussion will include the possibility of the hypervisor providing features for RAM scrubbing and/or core scrubbing, and how a type-1 hypervisor can aid the application by containing faults, preventing propagation, etc.
  • Ian Troxel and Steve Crago -- "Hardware vs Software support for ABFT: Two Views, One Goal"
Others who have a few slides to present are welcome -- please contact Marti. Emailing slides in advance is best, but CD-R, DVD, or USB drives can (we hope!) be accommodated at the hotel.

Most of the working group time will be devoted to open discussion, with the goal of identifying at least one area for potential collaboration on things that enable ABFT for multiple (ideally diverse) applications.

E. Bus/Network/Interconnection

(Second Day: Tue, May 25, 7 pm):

NOTE: As of the preparation time of this document, the organizer has notified us that he will not be able to travel. Interested attendees should still show up, and the organizers will work out an arrangement.

Next-generation space missions, incorporating high data rate sensors and high performance onboard computing systems, will require concomitantly high bandwidth avionics data networks. These networks must also support real-time performance, guaranteed quality of service, highly reliable operation, and fault tolerance. Low power operation and flexibility of the network to meet widely varying mission requirements are highly desirable, if not required. In this working group, we will:

  • Develop a roadmap of future mission interconnect needs
  • Review current spacecraft avionics networks development projects
  • Develop recommendations for the development of future networks
The intent of this work is to identify those areas needing additional development and to provide a recommended course of action to stakeholders and funding authorities.

Tuesday Night Schedule

D: Software E: Interconnect (no schedule provided)
7 PM Organizational
7:30 PM Tim Gallagher, Rob Lamprereur introductory remarks
8 PM Kim Gostelow, An Execution Model for Multicore: Fault Tolerance and Parallelism
8:30 PM Mike Deliman, The Role of the Operating System and Hypervisor in Multi/Many Core ABFT -- Status and Potential Future Directions
9 PM Ian Troxel and Steve Crago, Hardware vs Software support for ABFT: Two Views, One Goal
9:30 PM Closeout

Thursday Working Groups

F. Architecture

This working group will be held Thursday morning and will hold an open discussion on middleware standards for the spaceborne computing community, with an emphasis on sensor and science instrument payloads.

Thursday Morning Schedule

F: Architecture
8:30 AM Richard Stempien: Organizational
8:40 AM
9:10 AM
9:40 AM
10 AM Break
10:20 AM
10:50 AM
11:30 AM Summary
Document date May 26, 2010.