Global optimality of Elman-type RNNs in the mean-field regime
A. Agazzi, J. Lu, and S. Mukherjee. Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 196--227. PMLR, 23--29 Jul 2023
Abstract
We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.
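For orientation, a minimal sketch of the setup in standard notation; the symbols below (sigma, w_i, u_i, c_i, N) are illustrative assumptions and need not match the paper's exact parametrization or scaling.

% Elman-type RNN with N hidden units: per-unit recurrence plus a
% mean-field (1/N-scaled) linear readout. Sketch only; notation assumed,
% not taken from the paper.
\begin{align*}
  h_t^{(i)} &= \sigma\big(\langle w_i, h_{t-1} \rangle + \langle u_i, x_t \rangle\big),
    \qquad i = 1, \dots, N, \\
  \hat{y}_t &= \frac{1}{N} \sum_{i=1}^{N} c_i \, h_t^{(i)}.
\end{align*}
% In the large-width limit $N \to \infty$, the empirical measure of the
% unit parameters $(c_i, w_i, u_i)$ evolves under gradient descent as a
% mean-field gradient flow; the paper's optimality result concerns the
% fixed points of this limiting dynamics.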
%0 Conference Paper
%1 pmlr-v202-agazzi23a
%A Agazzi, Andrea
%A Lu, Jianfeng
%A Mukherjee, Sayan
%B Proceedings of the 40th International Conference on Machine Learning
%D 2023
%E Krause, Andreas
%E Brunskill, Emma
%E Cho, Kyunghyun
%E Engelhardt, Barbara
%E Sabato, Sivan
%E Scarlett, Jonathan
%I PMLR
%K topic_mathfoundation, Elman-type RNNs, optimality, mean-field regime
%P 196--227
%T Global optimality of Elman-type RNNs in the mean-field regime
%U https://proceedings.mlr.press/v202/agazzi23a.html
%V 202
%X We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.
@inproceedings{pmlr-v202-agazzi23a,
abstract = {We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.},
author = {Agazzi, Andrea and Lu, Jianfeng and Mukherjee, Sayan},
booktitle = {Proceedings of the 40th International Conference on Machine Learning},
editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
keywords = {topic_mathfoundation, Elman-type {RNN}s, optimality, mean-field regime},
month = {23--29 Jul},
pages = {196--227},
pdf = {https://proceedings.mlr.press/v202/agazzi23a/agazzi23a.pdf},
publisher = {PMLR},
series = {Proceedings of Machine Learning Research},
title = {Global optimality of Elman-type {RNN}s in the mean-field regime},
url = {https://proceedings.mlr.press/v202/agazzi23a.html},
volume = 202,
year = 2023
}