DSME Colloquium: Semih Cayci on the Convergence of Regularized Natural Actor-Critic with Neural Network Approximation

Monday, 8 May 2023

On Monday, 8 May 2023, at 10:00, we welcome Semih Cayci, Juniorprofessor for Mathematics of Machine Learning at RWTH Aachen University, to the DSME Colloquium, where he will give a talk titled "Convergence of Regularized Natural Actor-Critic with Neural Network Approximation", followed by a Q&A session. Please find the abstract of the talk and the biography of the speaker below.

This DSME Colloquium will be held in a hybrid format, with the talk taking place in the DSME seminar room and streamed via Zoom. To receive the link to the Zoom room, please subscribe to the DSME Colloquium mailing list dsme-colloquium@lists.rwth-aachen.de (by sending an email with the subject "subscribe" to dsme-colloquium-join@lists.rwth-aachen.de). We will also announce future DSME Colloquium talks on this list.

You are welcome to attend the DSME Colloquium in person. If you do, please contact our office at office@dsme.rwth-aachen.de to register and receive directions.

Title

Convergence of Regularized Natural Actor-Critic with Neural Network Approximation

Abstract

Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving complicated reinforcement learning problems with large state spaces. In this talk, we present a finite-time analysis of NAC with neural network approximation in the kernel regime, and identify the roles of neural networks, regularization and optimization techniques (e.g., gradient clipping and averaging) in achieving provably good performance in terms of sample complexity, iteration complexity and overparametrization bounds for the actor and the critic. In particular, we prove that (i) entropy regularization and averaging ensure stability by providing sufficient exploration to avoid near-deterministic and strictly suboptimal policies, and (ii) regularization leads to sharp sample complexity and network width bounds in the regularized Markov decision processes, yielding a favorable bias-variance tradeoff in policy optimization. In the process, we identify the importance of uniform approximation power of the actor neural network to achieve near-optimality in policy optimization due to distributional shift.
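As background on the entropy regularization discussed in the abstract: in an entropy-regularized Markov decision process, the standard objective augments the expected discounted return with a policy-entropy bonus. Below is a generic textbook formulation; the regularization weight \lambda and the notation are illustrative and not necessarily those used in the talk:

J_\lambda(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t \big( r(s_t, a_t) + \lambda\, \mathcal{H}(\pi(\cdot \mid s_t)) \big)\right], \qquad \mathcal{H}(\pi(\cdot \mid s)) = -\sum_{a} \pi(a \mid s) \log \pi(a \mid s).

For \lambda > 0, the maximizer of J_\lambda is a strictly stochastic policy rather than a deterministic one, which is the mechanism behind the exploration and stability properties referred to in the abstract.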

Bio

Semih Cayci is a Juniorprofessor for Mathematics of Machine Learning in the Department of Mathematics at RWTH Aachen University. Previously, he was an NSF TRIPODS Postdoctoral Fellow at the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign. His research interests lie in machine learning, optimization and applied probability, with a focus on the mathematical foundations of deep learning and reinforcement learning.