Preference Appraisal Reinforcement Learning for Space Applications
Aleksandra Faust
University of New Mexico and Sandia

Space applications aim to autonomously process large amounts of data in unpredictable environments. The operations must be physically safe, yet space-based tasks are difficult to demonstrate and test, and deriving optimal system behavior manually or analytically is hard. One solution is to automatically learn near-optimal behavior that completes the task. One learning method in particular, reinforcement learning (RL), has proven highly successful in robotics at learning near-optimal action sequences through experimentation. However, high-dimensional problems often prove challenging for RL. We address the dimensionality constraint with PrEference Appraisal Reinforcement Learning (PEARL). PEARL solves a particular class of tasks, preference tasks, described by a set of opposing preferences (soft constraints). PEARL works efficiently in high-dimensional spaces: it learns on small problems, generalizes the knowledge, and performs larger tasks with guaranteed convergence to the solution.
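To make the preference-task idea concrete, here is a minimal sketch (not the speaker's code) of one plausible reading: each opposing preference becomes a feature that scores how well a state satisfies one soft constraint, and the value function is a weighted combination of those features. All names (preference_features, the goal-distance and velocity preferences, step) are illustrative assumptions.

```python
import numpy as np

def preference_features(state, goal):
    """Map a (possibly high-dimensional) state to one score per preference.

    The feature vector's size depends on the number of preferences, not on
    the state dimension, which suggests how weights learned on a small
    problem could transfer to a larger one.
    """
    pos, vel = state  # assumed decomposition of the state
    return np.array([
        -np.sum((pos - goal) ** 2),  # preference: be near the goal
        -np.sum(vel ** 2),           # opposing preference: move slowly, safely
    ])

def value(theta, state, goal):
    # Linear value approximation over the preference features.
    return theta @ preference_features(state, goal)

def greedy_action(theta, state, goal, actions, step):
    # Pick the action whose predicted next state the learned value prefers;
    # `step` is an assumed one-step dynamics model returning (pos, vel).
    return max(actions, key=lambda a: value(theta, step(state, a), goal))
```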

This talk presents PEARL and describes its applications: control of a UAV carrying a suspended payload, rendezvous of two agents with different dynamics and no predetermined meeting time or location, and multi-agent pursuit in which a cooperative team of agents pursues prey with no knowledge of its intentions. A fourth application is a computing problem, array sorting, where the same approach yields a resilient sorting agent robust to unreliable components. Finally, we discuss PEARL's future extensions and adaptations as an autonomous agent for anomaly detection of unobservable events, using the example of detecting changes in deep-sea currents through observation of surface conditions.
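As an illustration of how the suspended-payload task above might fit the preference-feature recipe, the sketch below encodes three plausibly opposing preferences (reach the goal, move gently, minimize load swing). These are assumptions about the task, not the speaker's exact feature set.

```python
import numpy as np

def uav_payload_features(pos, vel, swing_angle, swing_rate, goal):
    # One score per preference; larger is better for each.
    return np.array([
        -np.sum((pos - goal) ** 2),             # progress toward the goal
        -np.sum(vel ** 2),                      # gentle, safe motion
        -(swing_angle ** 2 + swing_rate ** 2),  # small payload swing
    ])
```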

Document date: May 23, 2014.