The Second Workshop on... Fault-Tolerant Spaceborne Computing Employing New Technologies, 2009
SCHEDULE
Location guide: Tuesday's activities and the dinner on Wednesday are at the Hotel Albuquerque as noted.
The main meeting place for the conference is the CSRI building near Sandia Labs.
The high-level sessions will be held elsewhere at Sandia; participants should meet at the CSRI building.
Time | Tuesday May 26 | Wednesday May 27 | Thursday May 28 | Friday May 29 |
--- | --- | --- | --- | --- |
8:00 AM | | | | Breakfast |
8:30 AM | | Registration and Breakfast | Breakfast | Technical working group meetings (see Friday morning schedule): E: Trust, F: Architecture |
9:00 AM | | Organizers, Workshop Process; Mitch McCrory, Trust group | Bob Hodson, NASA LARC, Fault Tolerant Computing for Exploration | |
9:15 AM | | Rick Ridgley, Government/DOD keynote | | |
9:30 AM | | | Daniel Dvorak, JPL, Managing Complexity through Architecture | |
10:00 AM | | Break | Break | Break |
10:30 AM | | Richard Berger, BAE, Interoperability of Standard Interfaces Within a Spaceborne Computer | Ken Prager, Raytheon, Monarch High Performance Processing for Space | Technical working group meetings (see Friday morning schedule): E: Trust, F: Architecture |
11:00 AM | | John Samson, Honeywell, Dependable Multiprocessor Architecture for Space Applications (time tentative) | Lt. John DeMello, AFRL, Warfighter Requirements and the Need for On-Board Processing | |
11:30 AM | Registration (Hotel Albuquerque) | Gayle Thayer, Sandia, MISSE-7 Materials International Space Station Experiment | Ian Troxel, SEAKR, Concepts for Plug and Play Spacecraft Bus Payload Architectures (Govt FFRDC SETA audience only) | |
Noon | Lunch (on your own) | Lunch | Lunch | Closeout |
1:00 PM | Technical working group meetings (Hotel Albuquerque; see Tuesday afternoon schedule): A: IP, B: Fault | Warren Snapp, Boeing, RHBD | Working group debriefs and discussion: 15 minutes each for IP, Memory, Fault, and Software; 30 minutes discussion | |
1:30 PM | | Heather Quinn, LANL, Reliability modeling, case study and results | | |
2:00 PM | | AJ Kleinosowski, Boeing, Maestro | | |
2:30 PM | | Hans Zima, JPL, Towards High Productivity Languages for Reliable Flight Computing | Break and Transportation | |
3:00 PM | Break | Break | | |
3:30 PM | Technical working group meetings (Hotel Albuquerque; see Tuesday afternoon schedule): A: IP, B: Fault | Mike Deliman, Wind River, SMP/AMP multicore enhancements, hypervisor, Disruption Tolerant Networking, Deep Impact probe, and InterPlanetary Internet | Meeting in building 810, room 1102: 4 PM Duncan Crawford, Raytheon; 4:50 PM Erik DeBenedictis, Sandia; 5 PM Richard Doyle, JPL; 5:20 PM Andrew Keys, NASA; 5:30 PM 30-minute discussion | |
4:00 PM | | Richard Lethin, Reservoir Labs, Auto Kernel Mapping to Spaceborne Computer Architectures using R-Stream Compiler and Results | | |
4:30 PM | | Rob Lampereur, Ball Aerospace, Kepler Fault Protection and Its Applicability to Multi-Core | | |
5:00 PM | Reception (Hotel Albuquerque) | | |
6:00 PM | Dinner and dinner speaker Peter Kogge on DARPA Exascale report (Hotel Albuquerque) | | |
7:00 PM | Technical working group meetings (Hotel Albuquerque) (Tuesday night schedule) C: Software D: Memory | | |
Tuesday Working Groups
IP Workshop (First Day: Tue, May 26, 1-5pm):
This is a working group on IP for space applications. ("IP" is used here as a term of art referring specifically to chip layout macros for ASICs and logic for FPGAs written in a hardware description language.) The thesis of the working group is that the industry produces specialized IP either because of its space-relevant function (e.g., SAR beamforming) or because of accommodation to the space environment (radiation hardness, fault tolerance).
The industry would benefit from the ability to use and reuse this IP for multiple applications. This requires organizing the industry's inventory of IP so that potential users know what is available, and building IP with common interfaces so that IP designed for different purposes can be combined without a large amount of custom interface circuitry and redesign.
The group recognizes that IP for FPGAs and ASICs follows the same general principles but is usually incompatible in specific instances.
The group seeks discussion leading to some initial steps. It does not expect to produce a comprehensive architecture, but may instead produce recommendations of incremental, voluntary steps that would tend to improve efficiency or enable more sophisticated missions.
Fault Tolerance Workshop (First Day: Tue, May 26, 1-5pm)
- ISHM (Integrated Systems Health and Management) - hardware and lower level software focus (definition of lower level software focus: identification and reporting of errors up to the component level, component being processor, I/O channel, memory, ...)
- Modes of system failure
- Hard (radiation total dose, thermal, total component failure)
- Soft (e.g., SEU, noise)
- System level fault tolerance system mitigation methods:
- Diagnostics (design & execution)
- Prognostics (design & execution)
- State based designs (design)
- Operational methods (design & execution)
- Hardware fault-tolerance advances:
- Discussion of experiments in sending COTS directly to space
- RHBD update
- RHBP update
- Alternatives to TMR - options and status
- Prospects for fault tolerant (rad hard) storage - update
- Fault tolerance for faster interconnects - could include advances plus any updates from commercial lessons learned (such as separation of traffic types, protocols, etc.)
- Calculating reliability with new methods, new technology:
- Traditional calculations force traditional methods of fault tolerance
- What can the group do (collectively) to ensure alternate calculations are vetted and accepted so that new technology and approaches can be used on upcoming acquisitions?
- Possible Standards for Fault Tolerance/Fault Management (?):
- Software service layers
- Common frameworks and libraries for reuse
- Operating system enhancements
- Redundant memory systems / architectures (e.g., PIM)
Time | A: IP | B: Fault |
--- | --- | --- |
1:00 PM | Organizational | Organizational |
1:30 PM | Warren Snapp/AJ Kleinosowski, Boeing | Hans Zima, JPL, Introspection-based adaptive fault tolerance |
2:00 PM | Mike Malone, Draper Labs | Rob Lampereur, Ball Aerospace, Kepler Fault Protection Approach and Extensions to Multi-Core |
2:30 PM | Yutao He, JPL | Mitch Fletcher, Honeywell, Emerging Fault Protection Strategies |
3:00 PM | Break | Break |
3:30 PM | TBD | TBD, Aeroflex, Unconfirmed |
4:00 PM | Jeff Kalb, Sandia | Discussion session |
4:30 PM | Closeout | Closeout |
|
Multicore Software Workshop (First Day: Tue, May 26, 7 pm):
- ISHM (Integrated Systems Health and Management) - architecture and higher level software focus (definition of higher level software focus: deciding what to do about errors that components report; detecting and acting on performance anomalies even though no errors have been reported; software architecture for ISHM; planning for dynamic "plug and play" of both H/W and S/W components).
- Diagnostics
- Prognostics
- State based designs
- Operational methods
- Software architecture and fault tolerance: the role of the operating system in the multicore age
- Traditionally a real time OS has been required to guarantee predictable behavior
- Does multicore change that paradigm?
- In general, from a systems perspective, which resources need to be assured, and how, if the goal is predictable system behavior?
- What about separation of control and data plane (also called mission control vs mission science)?
- Software architecture and multi-mission surge:
- Methods to separate and bound different missions using multi-core
- Methods to accommodate surges and restore to "normal" state taking advantage of multicore options, these could include (but are not limited to):
- Partial reconfiguration of FPGAs
- Changes in programs on multicore
- Changes in partitions on multicore
- Activating idle cores and then re-snoozing them
- Operating system support
- Security as a fault tolerance component
- Techniques for managing software complexity and simplifying V&V
- multi-core ARINC for isolating and simplifying resource management
- combining system engineering and software engineering principles
- Inter-government software standards to enhance reuse of spaceborne multicore software, e.g., common:
- Fault protection avionics functions (including ISHM)
- Basic power management
- Telecom
- Resource management
Memory Workshop (First Day: Tue, May 26, 7 pm):
A memory technology working group was being organized as this schedule was put together. The organizers are
Rich Murphy, Sandia,
Jeff Draper, ISI,
Rafi Some, JPL,
Andrew Keys, NASA,
Larry Bergman, JPL.
The topics under discussion (extracted from emails) are:
- Ground vs. Space requirements for DRAM
- Non-volatile memory (the usual suspects)
- Potential integration possibilities -- 3D integration, Quilt
- Packaging, other advanced packaging
- Improving radiation tolerance for commodity components
- Perhaps a topic on technology scaling effects on the radiation response of memories would be good. For instance, in our 90nm testing we found that a single strike could upset as many as 13 physically adjacent SRAM bits. That number will grow as SRAM cells shrink, which has significant implications for the spacing, interleaving, and ECC strength needed to achieve a target BER.
- AFRL has three different contractors developing three different approaches to "Rad Hard DRAM"; Boise State is doing some work on a new form of spintronics for memories that would also be rad hard; and NASA has done some testing of the Samsung commercial PCRAM. If you'd like, we can include some of these folks, and/or others of similar ilk, to get a flavor for what is possible and where we are with respect to developing high-density, low-power, low-mass/volume memories (volatile and non-volatile). Clearly, we can also include a summary of recent testing of specific memory parts (culled from recent reports/papers).
Note: Working group time assignments are currently arbitrary. Subject to future scheduling, working groups could be moved between Tuesday and Friday.
Time | C: Software | D: Memory |
--- | --- | --- |
7:00 PM | Organizational | Memory workshop schedule TBD |
7:30 PM | Name TBR, Wind River: Role of the Operating System and Hypervisor in Fault Tolerant Multicore | |
8:00 PM | Hans Zima, JPL: Towards High Productivity Languages for Reliable Flight Computing | |
8:30 PM | Bill Lundgren, Gedae: Software-Enabled Systems Engineering Facilitates Certification of New Platforms | |
9:00 PM | Kirk Reinholtz, JPL: Mission Data System | Break |
9:30 PM | Closeout | Closeout |
|
Friday Working Groups
Trust Workshop
Information technology is a global commodity and is used in significant aspects of almost all new systems, at multiple levels, including all foreseeable new space systems. Engineering designs of new systems rely on a significant amount of COTS hardware and software. Opportunities exist for the loss of trust in both hardware and software at multiple stages of the design, manufacturing, deployment, and operations lifecycle. Additionally, the information systems used to create, design, and store system information have been challenged to keep sensitive information secure from those seeking illicit access to it. There have been recent and recurring headlines about companies and government agencies losing information from sensitive programs. This workshop focuses on discussing various threats to spaceborne systems and areas where trust could be improved during the lifecycle. Examples from a Red Team exercise on a simple spaceborne system will be used as discussion points.
Attendance is restricted. Contact erikdebenedictis@sandia.gov for more information. The group will meet in building 729, room 247; meet at CSRI for transportation if needed.
Architecture Workshop
The architecture theme applies to both next-generation satellites and manned space platforms in terms of sensor payload processing. The theme will comprise presentations in the body of the meeting, possible dialog in the closed Government session, and the agenda for the architecture working group meeting on Friday.
Collectively, the architecture theme will pose the following questions and try to develop answers as a group:
- Does it make sense, and is it reasonable, to develop a Common Spaceborne Supercomputer Architecture for the U.S. space community (NASA, DoD, DoE, IC)?
- What computer architecture makes sense, and what is the right set of hardware and software building blocks?
- What technologies can we leverage from other communities, such as the HPC community and the airborne embedded computing community?
To carry out the architecture theme, the meeting makes the following requests of participants:
- Government agency representatives are requested to consider a plenary presentation (i.e., Wednesday or Thursday) on requirements in their agency that would benefit from a common architecture. The organizers will solicit Government agency representatives, but others who are not contacted may volunteer and will be scheduled for a speaking slot subject to available space.
- Commercial, university, FFRDC, and Government participants are requested to consider a presentation on hardware and software building blocks created in their organization that they feel could contribute to a common architecture. They may similarly offer methods of organizing building blocks into an architecture. The organizers will solicit speakers, but those not contacted may volunteer and will be scheduled into a plenary slot or the architecture working group subject to available space.
- The closed Government session may address the architecture theme. Government, FFRDC, and SETA participants with ideas may contact the organizers.
Time | E: Trust | F: Architecture |
--- | --- | --- |
8:30 AM | The Trust working group will be held in building 729, room 247. Contact erikdebenedictis@sandia.gov for more information. Meet at the CSRI building. | Richard Stempien: Organizational |
8:40 AM | | Mitch Fletcher, Honeywell, Progression of an OPEN ARCHITECTURE from Orion to Altair & LSS - A concept expandable to general purpose space processing (time tentative) |
9:10 AM | | Ian Troxel, SEAKR, Leveraging COTS to Develop Standard Space Architectures |
9:40 AM | | Ken Hunt, AFRL, Computing: One of Several Trades for AFRL Space Electronics R&D |
10:00 AM | | Break |
10:20 AM | | TBD, Sandia, TBD |
10:50 AM | | Review of plenaries, two viewgraphs per plenary speaker: Duncan Crawford, Ken Prager, John Samson, Richard Berger, Richard Lethin |
11:30 AM | | Summary |
|
Document date May 31, 2009.