Learning Safety Constraints and Safe Learning

[Figure: A toy model with a 2-dimensional state-action space. © Steve Heim / MPI-IS]

To learn control directly on robot hardware, it is important to encode safety constraints into the learning behavior.
Although failures are often easy to classify (e.g., a legged robot has fallen if its body touches the ground), the actual state-space constraint (e.g., the robot is stumbling and can no longer catch itself) is typically difficult to compute.
Consequently, safe learning algorithms often use very conservative approximations of the constraint function, which limits performance.
In our CoRL 2019 paper, we formalized sharp constraints using viability theory and showed that this constraint can be learned in a model-free setting by using a measure taken over the set of viable state-action pairs. While safety can only be guaranteed once learning of the constraint has converged, using the constraint estimate already greatly reduces failures during learning.
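
The core idea can be sketched on a toy system. The following is a minimal illustration, not the paper's implementation: toy_dynamics, the grid resolution, and the 0.2 threshold are all hypothetical choices, and for clarity the sweep uses a known toy model, whereas the paper's setting is model-free sampling. It estimates a viability-measure-like quantity Q_M over a discretized 2-dimensional state-action space from observed failures, then thresholds it conservatively to obtain a safe set for exploration.

import numpy as np

def toy_dynamics(s, a):
    """Hypothetical one-step map on a 1-D state; fails outside [0, 1]."""
    s_next = s + 0.1 * (a - 0.5)
    return s_next, (s_next < 0.0) or (s_next > 1.0)

states = np.linspace(0.0, 1.0, 51)
actions = np.linspace(0.0, 1.0, 51)

# Start optimistic: every state-action pair is assumed viable.
Q_M = np.ones((len(states), len(actions)))

for _ in range(50):  # sweeps until failure information has propagated
    for i, s in enumerate(states):
        for j, a in enumerate(actions):
            s_next, failed = toy_dynamics(s, a)
            if failed:
                Q_M[i, j] = 0.0  # an observed failure is definitely unviable
            else:
                k = int(np.argmin(np.abs(states - s_next)))
                # Measure of (s, a): fraction of actions still viable from s_next.
                Q_M[i, j] = np.mean(Q_M[k] > 0.0)

# A conservative safe set keeps only pairs whose estimated measure clears a
# threshold; exploration during learning is restricted to this set.
safe_set = Q_M > 0.2

Because the measure grades how much viable volume remains reachable, raising the threshold trades performance for caution, which is the conservatism-versus-performance trade-off discussed above.
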
We are currently extending this theoretical work with a focus on applicability.
The main open challenges we are tackling:

  • How can we improve sample efficiency by trading off between cost and constraint information?
  • How can failures be further reduced during learning by using model-based prior knowledge?
  • What system-specific knowledge is required in practice, with legged locomotion as a case study?