The Second Workshop on...
Fault-Tolerant Spaceborne Computing Employing New Technologies, 2009

SCHEDULE

Location guide: Tuesday's activities and the dinner on Wednesday are at the Hotel Albuquerque as noted. The main meeting place for the conference is the CSRI building near Sandia Labs. The high-level sessions will be elsewhere at Sandia, but attendees should meet at the CSRI building.
Tuesday, May 26
Wednesday, May 27
Thursday, May 28
Friday, May 29
8:00 Breakfast
8:30 AM Registration and Breakfast Breakfast Technical working group meetings (Friday morning schedule)
E: Trust
F: Architecture
9:00 AM Organizers, Workshop Process
Mitch McCrory, Trust group
Bob Hodson, NASA LARC, Fault Tolerant Computing for Exploration
9:15 AM Rick Ridgley, Government/DOD keynote
9:30 AM Daniel Dvorak, JPL, Managing Complexity through Architecture
10 AM Break Break Break
10:30 AM Richard Berger, BAE, Interoperability of Standard Interfaces Within a Spaceborne Computer Ken Prager, Raytheon, Monarch High Performance Processing for Space Technical working group meetings (Friday morning schedule)
E: Trust
F: Architecture
11 AM John Samson, Honeywell, Dependable Multiprocessor Architecture for Space Applications (time tentative) Lt. John DeMello, AFRL, Warfighter Requirements and the Need for On-Board Processing
11:30 AM Registration (Hotel Albuquerque) Gayle Thayer, Sandia, MISSE-7 Materials International Space Station Experiment Ian Troxel, SEAKR, Concepts for Plug and Play Spacecraft Bus Payload Architectures (Govt FFRDC SETA audience only)
Noon Lunch (on your own) Lunch Lunch Closeout
1 PM Technical working group meetings (Hotel Albuquerque) (Tuesday afternoon schedule)
A: IP
B: Fault
Warren Snapp, Boeing, RHBD Working group debriefs and discussion:
15 minutes IP
15 minutes Memory
15 minutes Fault
15 minutes Software
30 minutes discussion
1:30 PM Heather Quinn, LANL, Reliability modeling, case study and results
2 PM AJ Kleinosowski, Boeing, Maestro
2:30 PM Hans Zima, JPL, Towards High Productivity Languages for Reliable Flight Computing Break and Transportation
3 PM Break Break
3:30 PM Technical working group meetings (Hotel Albuquerque) (Tuesday afternoon schedule)
A: IP
B: Fault
Mike Deliman, Wind River, SMP/AMP multicore enhancements, hypervisor, Disruption Tolerant Networking, Deep Impact probe, and InterPlanetary Internet Meeting in building 810 room 1102
4 PM Duncan Crawford, Raytheon
4:50 PM Erik DeBenedictis, Sandia
5 PM Richard Doyle, JPL
5:20 PM Andrew Keys, NASA
5:30 PM 30 minute discussion
4 PM Richard Lethin, Reservoir Labs, Auto Kernel Mapping to Spaceborne Computer Architectures using R-Stream Compiler and Results
4:30 PM Rob Lampereur, Ball Aerospace, Kepler Fault Protection and Its Applicability to Multi-Core
5:00 PM Reception (Hotel Albuquerque)
6:00 PM Dinner and dinner speaker Peter Kogge on DARPA Exascale report (Hotel Albuquerque)
7:00 PM Technical working group meetings (Hotel Albuquerque) (Tuesday night schedule)
C: Software
D: Memory

Tuesday Working Groups

A. PLUG & PLAY IP

(First Day: Tue, May 26, 1-5pm):

This is a working group on IP for space applications. ("IP" is used here as a term of art referring specifically to chip layout macros for ASICs and logic for FPGAs written in a hardware description language.) The thesis of the working group is that the industry produces specialized IP either because of its space-relevant function (e.g., SAR beamforming) or because of accommodation to the space environment (radiation hard, fault tolerant). The industry would benefit from the ability to use and reuse this IP across multiple applications. This requires activities such as organizing the industry's inventory of IP so potential users know what is available, and constructing the IP with common interfaces so that IP designed for different purposes can be combined without a large amount of custom interface circuitry and redesign.

This group will recognize that IP for FPGAs and IP for ASICs follow the same general principles but are usually incompatible in specific instances.

This group is seeking to have discussion leading to some initial steps. The group is not expecting to produce a comprehensive architecture, but may instead produce recommendations of incremental, voluntary steps that would tend to improve efficiency or enable more sophisticated missions.

B. FAULT TOLERANCE

Fault Tolerance Workshop (First Day: Tue, May 26, 1-5pm)
  1. ISHM (Integrated Systems Health and Management) - hardware and lower level software focus (definition of lower level software focus: identification and reporting of errors up to the component level, component being processor, I/O channel, memory, ...)
    • Modes of system failure
      • Hard (radiation total dose, thermal, total component failure)
      • Soft (e.g., SEU, noise)
    • System-level fault tolerance mitigation methods:
      • Diagnostics (design & execution)
      • Prognostics (design & execution)
      • State based designs (design)
      • Operational methods (design & execution)
  2. Hardware fault-tolerance advances:
    • Discussion of experiments in sending COTS directly to space
    • RHBD update
    • RHBP update
    • Alternatives to TMR - options and status
    • Prospects for fault tolerant (rad hard) storage - update
    • Fault tolerance for faster interconnects - could include advances plus any updates from commercial lessons learned (such as separation of traffic types, protocols, etc)
  3. Calculating reliability with new methods, new technology:
    • Traditional calculations force traditional methods of fault tolerance
    • What can the group do (collectively) to ensure alternate calculations are vetted and accepted so that new technology and approaches can be used on upcoming acquisitions?
  4. Possible Standards for Fault Tolerance/Fault Management (?):
    • Software service layers
    • Common frameworks and libraries for reuse
    • Operating system enhancements
    • Redundant memory systems / architectures (e.g., PIM)

Tuesday Afternoon Schedule

Time | A: IP | B: Fault
1 PM | Organizational | Organizational
1:30 PM | Warren Snapp/AJ Kleinosowski, Boeing | Hans Zima, JPL, Introspection-based adaptive fault tolerance
2 PM | Mike Malone, Draper Labs | Rob Lampereur, Ball Aerospace, Kepler Fault Protection Approach and Extensions to Multi-Core
2:30 PM | Yutao He, JPL | Mitch Fletcher, Honeywell, Emerging Fault Protection Strategies
3 PM | Break | Break
3:30 PM | TBD | TBD, Aeroflex, Unconfirmed
4 PM | Jeff Kalb, Sandia | Discussion session
4:30 PM | Closeout | Closeout

C. SOFTWARE TOPICS

Multicore Software Workshop (First Day: Tue, May 26, 7 pm):
  1. ISHM (Integrated Systems Health and Management) - architecture and higher level software focus (definition of higher level software focus - component reports errors, what to do about them, detecting and doing something about performance anomalies though no errors have been reported, software architecture for ISHM, planning for dynamic "plug and play" of both H/W and S/W components).
    • Diagnostics
    • Prognostics
    • State based designs
    • Operational methods
  2. Software architecture and fault tolerance: the role of the operating system in the multicore age
    • Traditionally a real time OS has been required to guarantee predictable behavior
    • Does multicore change that paradigm?
    • In general, from a systems perspective, what resources need to be assured, and how, if the goal is predictable system behavior?
    • What about separation of control and data plane (also called mission control vs mission science)?
  3. Software architecture and multi-mission surge:
    • Methods to separate and bound different missions using multi-core
    • Methods to accommodate surges and restore to "normal" state taking advantage of multicore options, these could include (but are not limited to):
      • Partial reconfiguration of FPGAs
      • Changes in programs on multicore
      • Changes in partitions on multicore
      • Activating idle cores and then re-snoozing them
      • Operating system support
  4. Security as a fault tolerance component
  5. Techniques for managing software complexity and simplifying V&V
    • multi-core ARINC for isolating and simplifying resource management
    • combining system engineering and software engineering principles
  6. Inter-Government software standards for enhancing reuse for spaceborne multicore software, e.g., common:
    • Fault protection avionics functions (including ISHM)
    • Basic power management
    • Telecom
    • Resource management

D. MEMORY

(First Day: Tue, May 26, 7 pm):

There is a memory technology working group organizing as this schedule is being put together. The organizers are Rich Murphy (Sandia), Jeff Draper (ISI), Rafi Some (JPL), Andrew Keys (NASA), and Larry Bergman (JPL).

The topics being discussed (via extraction of emails) are:

  • Ground vs. Space requirements for DRAM
  • Non-volatile memory (the usual suspects)
  • Potential integration possibilities -- 3D integration, Quilt Packaging, other advanced packaging
  • Improving radiation tolerance for commodity components
  • Perhaps a topic of technology scaling effects on radiation responses of memories would be good. For instance, in our 90nm testing we found that a single strike could upset as many as 13 physically adjacent SRAM bits. That number will grow as SRAM cells shrink in size, which has significant implications for spacing, interleaving, and ECC strength needed to achieve a target BER.
  • AFRL has three different contractors developing three different approaches for "Rad Hard DRAM"; Boise State is doing some work on a new form of spintronics for memories that would also be rad hard, and NASA has done some testing of the Samsung commercial PCRAM. If you'd like we can include some of these folks, and/or others of similar ilk, to get a flavor for what is possible and where we are wrt developing high density, low power, low mass/volume memories (volatile and non). Clearly, we can also include a summary of recent testing of specific memory parts (culled from recent reports/papers).
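
The multi-bit upset figure quoted in the topic list above lends itself to a small worked example. The sketch below is illustrative only and is not taken from any working group material. It assumes a one-dimensional row of SRAM cells, a strike that flips a run of physically adjacent bits within that row, simple modulo bit interleaving (physical column c belongs to logical word c mod k), and an ECC that corrects a fixed number of bits per word (1 for a SEC-DED code). The function names are hypothetical. Under those assumptions it finds the smallest interleave factor that keeps every word correctable after a 13-bit burst.

```python
from collections import Counter


def flips_per_word(burst_start: int, burst_len: int, interleave: int) -> Counter:
    """Count flipped bits per logical word for one burst.

    Assumption: physical column c belongs to logical word (c % interleave).
    """
    columns = range(burst_start, burst_start + burst_len)
    return Counter(col % interleave for col in columns)


def worst_case_flips(burst_len: int, interleave: int) -> int:
    """Largest number of flipped bits any single word can receive,
    taken over every possible burst alignment."""
    return max(
        max(flips_per_word(start, burst_len, interleave).values())
        for start in range(interleave)
    )


def min_interleave(burst_len: int, correctable: int = 1) -> int:
    """Smallest interleave factor such that no word receives more flipped
    bits than the ECC can correct (correctable=1 models SEC-DED)."""
    if burst_len < 1 or correctable < 1:
        raise ValueError("burst_len and correctable must be at least 1")
    k = 1
    while worst_case_flips(burst_len, k) > correctable:
        k += 1
    return k


if __name__ == "__main__":
    # 13 adjacent upset bits is the figure quoted for the 90nm testing.
    print(min_interleave(13, correctable=1))
    print(min_interleave(13, correctable=2))
```

Under this simplified model, a 13-bit burst needs an interleave factor of 13 with single-bit correction, or 7 if the ECC corrects two bits per word. Real upset clusters can span rows as well as columns, so an actual design would need a more detailed model.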
Note: Working group time assignments are currently arbitrary. Subject to future scheduling, working groups could be moved between Tuesday and Friday.

Tuesday Night Schedule

Time | C: Software | D: Memory
7 PM | Organizational | Memory workshop schedule TBD
7:30 PM | name tbr - Wind River: Role of the Operating System and Hypervisor in Fault Tolerant Multicore |
8 PM | Hans Zima, JPL: Towards High Productivity Languages for Reliable Flight Computing |
8:30 PM | Bill Lundgren, Gedae: Software-Enabled Systems Engineering Facilitates Certification of New Platforms |
9 PM | Kirk Reinholtz, JPL: Mission Data System | Break
9:30 PM | Closeout | Closeout

Friday Working Groups

E. TRUST AND SPACEBORNE COMPUTING -- WORKSHOP

Information technology is a global commodity and is used in significant aspects of almost all new systems at multiple levels, including all foreseeable new space systems. Engineering designs of new systems rely on a significant amount of COTS hardware and software. Opportunities exist for the loss of trust in both hardware and software at multiple stages of the design, manufacturing, deployment, and operations lifecycle. Additionally, the information systems used to create, design, and store system information have been challenged to keep sensitive information secure from those seeking illicit access to it. There have been recent and recurring headlines of companies and government agencies losing information from sensitive programs. This workshop is focused on discussing various threats to spaceborne systems and areas where trust could be improved during the lifecycle. Examples of a Red Team exercise on a simple spaceborne system will be used as discussion points.

Attendance is restricted. Contact erikdebenedictis@sandia.gov for more information. Will meet in 729/247; meet at CSRI for transportation if needed.

F. ARCHITECTURE THEME AND WORKING GROUP

The architecture theme will be applicable to both next generation satellites and manned space platforms in terms of sensor payload processing. The theme will comprise presentations in the body of the meeting, possible dialog in the closed Government session, and the agenda for the architecture working group meeting on Friday. Collectively, the architecture theme will ask the following questions and try to develop answers as a group:
  1. Does it make sense and is it reasonable to develop a Common Spaceborne Supercomputer Architecture for the U.S. space community (NASA, DoD, DoE, IC)?
  2. What computer architecture makes sense and what is the right set of hardware and software building blocks?
  3. What technologies can we leverage from other communities such as the HPC community and the airborne embedded computing community?
To carry out the architecture theme, the meeting makes the following requests of participants:
  1. Government agency representatives are requested to consider a plenary presentation (i.e., Wednesday or Thursday) on requirements in their agency that would benefit from a common architecture. The organizers will solicit Government agency representatives, but others not contacted may volunteer and will be scheduled for a speaking spot subject to available space.
  2. Commercial, university, FFRDC, and Government participants are requested to consider a presentation on hardware and software building blocks that have been created in their organization that they feel could contribute to a common architecture. They may similarly offer methods of organizing building blocks into an architecture. The organizers will solicit speakers, but those not contacted may volunteer and will be scheduled to a plenary slot or the architecture working group subject to available space.
  3. The closed Government session may address the architecture theme. Government, FFRDC, and SETA participants with ideas may contact the organizers.

Friday Morning Schedule

Time | E: Trust | F: Architecture
8:30 AM | The Trust working group will be held in building 729 room 247. Contact erikdebenedictis@sandia.gov for more information. Meet at the CSRI building. | Richard Stempien: Organizational
8:40 AM | | Mitch Fletcher, Honeywell, Progression of an OPEN ARCHITECTURE from Orion to Altair & LSS - A concept expandable to general purpose space processing. (time tentative)
9:10 AM | | Ian Troxel, SEAKR, Leveraging COTS to Develop Standard Space Architectures
9:40 AM | | Ken Hunt, AFRL, Computing: One of Several Trades for AFRL Space Electronics R&D
10 AM | | Break
10:20 AM | | TBD, Sandia, TBD
10:50 AM | | Review Plenaries two viewgraphs per plenary speaker, Duncan Crawford, Ken Prager, John Samson, Richard Berger, Richard Lethin
11:30 AM | | Summary
Document date May 31, 2009.