Controller Learning using Bayesian Optimization


Left: Humanoid robot Apollo learning to balance an inverted pole using Bayesian optimization. Right: One-dimensional synthetic example of an unknown cost J(θ) modeled as a Gaussian process for controller parameter θ, conditioned on observed data points.  


Autonomous systems such as humanoid robots are characterized by a multitude of feedback control loops operating at different hierarchical levels and time scales. Designing and tuning these controllers typically requires significant manual modeling effort and exhaustive experimental testing. To manage this ever-growing complexity and strive for greater autonomy, it is desirable to develop intelligent algorithms that allow autonomous systems to learn from experimental data. In our research, we combine automatic control theory, machine learning, and optimization to develop automatic control design and tuning algorithms.

The basic framework, as proposed in our previous work ICRA 2016 at the Max Planck Institute for Intelligent Systems (MPI-IS), is based on Bayesian optimization (BO). In this framework, we model the control objective as a Gaussian process (GP; see figure above) and evaluate the controller in experiments. BO sequentially suggests new experiments to gather more information about the cost function. The overall goal is to find good, or even optimal, parameters in as few experiments as possible by selecting experiments that are informative.
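As a rough illustration of one BO step (not the exact algorithm of the paper), the sketch below models a one-dimensional cost J(θ) with a GP and picks the next experiment via a lower-confidence-bound acquisition; the squared-exponential kernel, its hyperparameters, and the acquisition rule are all assumptions made for this sketch:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential covariance between 1-D parameter arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(theta_train, J_train, theta_test, noise=1e-4):
    """GP posterior mean and std of the cost at test parameters,
    conditioned on observed (theta, J) pairs."""
    K = rbf_kernel(theta_train, theta_train) + noise * np.eye(len(theta_train))
    Ks = rbf_kernel(theta_train, theta_test)
    Kss = rbf_kernel(theta_test, theta_test)
    mean = Ks.T @ np.linalg.solve(K, J_train)
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

def next_experiment(theta_train, J_train, candidates, beta=2.0):
    """Lower-confidence-bound acquisition: since we minimize cost, pick the
    candidate parameter with the smallest (mean - beta * std)."""
    mean, std = gp_posterior(theta_train, J_train, candidates)
    return candidates[np.argmin(mean - beta * std)]
```

In a full tuning loop, the suggested parameter would be evaluated on the robot, the new (θ, J) pair appended to the data, and the procedure repeated until the experiment budget is exhausted.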

At MPI-IS, we have extended this framework in several directions to further improve data efficiency. When auto-tuning complex real systems such as humanoid robots, simulations of the system dynamics are typically available. They provide less accurate information than real experiments, but at a lower cost. Our work ICRA 2017 extends the BO acquisition function to include the simulator as an additional information source and to automatically trade off information against cost. In our paper CDC 2017, the covariance function of the GP model is tailored to the control problem at hand by incorporating its mathematical structure into the kernel design. In this way, the control objective can be modeled more accurately, which ultimately speeds up the convergence of the Bayesian optimizer.
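The information-vs-cost trade-off can be sketched very roughly as follows. This is a simplified stand-in, not the entropy-based acquisition of the actual paper: the per-source costs and the expected-information-gain inputs are hypothetical placeholders.

```python
# Hypothetical per-evaluation costs: simulations are cheap, experiments
# on the real robot are expensive (numbers are illustrative assumptions).
SOURCE_COST = {"sim": 1.0, "real": 15.0}

def pick_source_and_theta(candidates, info_gain_sim, info_gain_real):
    """Choose the (source, parameter) pair maximizing expected information
    gain per unit cost -- a simplified sketch of the trade-off idea.
    info_gain_sim / info_gain_real hold the expected information gain of
    evaluating each candidate parameter on that source."""
    best_score, best = -float("inf"), None
    for source, gains in (("sim", info_gain_sim), ("real", info_gain_real)):
        for theta, gain in zip(candidates, gains):
            score = gain / SOURCE_COST[source]
            if score > best_score:
                best_score, best = score, (source, theta)
    return best
```

With this scoring, a cheap simulation is preferred unless a real experiment promises enough extra information to justify its much higher cost.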

Bayesian optimization provides a powerful framework for controller learning, which we have successfully applied in very different settings: to humanoid robots in ICRA 2016, to micro robots in IROS 2018, and to the automotive industry in TCST 2020.

We have recently extended this framework to include failures, i.e., unstable controllers, in the objective function in our paper arXiv 2019, and to deal with a limited budget of constraint violations during the optimization in arXiv 2020.
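One simple way to fold failures into the objective is to replace the missing cost of an unstable run with a penalty above the worst cost observed so far, so the GP still receives a numeric observation. This is a simplified sketch in the spirit of that idea, not the method of the paper; the `margin` constant is a hypothetical tuning parameter.

```python
def observed_cost(J_raw, stable, J_history, margin=1.0):
    """Map an experiment outcome to a cost the GP can be trained on.
    Stable runs return their measured cost J_raw; unstable runs yield no
    meaningful measurement, so we assign a penalty above the worst stable
    cost seen so far (sketch; `margin` is a hypothetical constant)."""
    if stable:
        return J_raw
    worst = max(J_history) if J_history else 0.0
    return worst + margin
```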

Current research directions include combining BO with gradient-based optimizers, and event-triggered BO in dynamic environments.