Deep Stable neural networks: large-width asymptotics and convergence rates

Stefano Favaro (Department of Economics and Statistics, University of Torino and Collegio Carlo Alberto, Italy), Sandra Fortini (Department of Decision Sciences, Bocconi University, Italy), and Stefano Peluchetti (Cogent Labs, Tokyo, Japan)

June 27, 2022

Abstract

In modern deep learning, there is a recent and growing literature on the interplay between the large-width asymptotic properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed weights, and Gaussian stochastic processes (SPs). This interplay has proved critical in Bayesian inference under Gaussian SP priors, in kernel regression for infinitely wide deep NNs trained via gradient descent, and in the study of information propagation within infinitely wide NNs. Motivated by empirical analyses showing the potential of replacing Gaussian distributions with Stable distributions for the NN's weights, in this paper we present a rigorous analysis of the large-width asymptotic behaviour of (fully connected) feed-forward deep Stable NNs, i.e. deep NNs with Stable-distributed weights. We show that as the width goes to infinity jointly over the NN's layers, i.e. in the "joint growth" setting, a rescaled deep Stable NN converges weakly to a Stable SP whose distribution is characterized recursively through the NN's layers. Because of the non-triangular structure of the NN, this is a non-standard asymptotic problem, for which we propose an inductive approach of independent interest. We then establish sup-norm convergence rates of the rescaled deep Stable NN to the Stable SP, under both the "joint growth" and a "sequential growth" of the width over the NN's layers. This result quantifies the difference between the "joint growth" and "sequential growth" settings, showing that the former leads to a slower rate than the latter, depending on the depth of the layer and the number of inputs of the NN. Our work extends recent results on infinitely wide limits for deep Gaussian NNs to the more general deep Stable NNs, and provides the first result on convergence rates in the "joint growth" setting.
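For context on the model class studied in the paper, the display below is a minimal sketch of a fully connected feed-forward deep Stable NN under a standard parameterization. The symbols f^{(l)}, phi, alpha, n, d and D, and the placement of the n^{-1/alpha} rescaling, are illustrative assumptions and may differ in detail from the paper's exact definition.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative sketch of a fully connected feed-forward deep Stable NN of depth D,
% width n and input dimension d. Weights w^{(l)}_{ij} and biases b^{(l)}_i are i.i.d.
% symmetric alpha-Stable random variables; phi is the activation function.
\begin{align*}
  f^{(1)}_i(x) &= \sum_{j=1}^{d} w^{(1)}_{ij}\, x_j + b^{(1)}_i,
    && i = 1,\dots,n, \\
  f^{(l)}_i(x) &= \frac{1}{n^{1/\alpha}} \sum_{j=1}^{n} w^{(l)}_{ij}\,
    \phi\bigl(f^{(l-1)}_j(x)\bigr) + b^{(l)}_i,
    && l = 2,\dots,D .
\end{align*}
% The n^{-1/alpha} factor is the Stable analogue of the n^{-1/2} scaling used with
% Gaussian weights; as n goes to infinity (jointly over the layers), each rescaled
% layer converges weakly to an alpha-Stable stochastic process defined recursively in l.
\end{document}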

Published in Bernoulli 29 (2023), pp. 2574–2597.

Keywords: Bayesian inference; deep neural network; depth limit; exchangeable sequence; Gaussian stochastic process; neural tangent kernel; infinitely wide limit; Stable stochastic process; spectral measure; sup-norm convergence rate.

Authors: Stefano Favaro; Sandra Fortini; Stefano Peluchetti.
Files in this record:
  • 2108.02316v2.pdf — postprint (author's final version), open access, Adobe PDF, 1.52 MB
  • 22-BEJ1553.pdf — publisher's PDF (editorial version), restricted access, Adobe PDF, 341.85 kB (copy available on request)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2042310
Citations
  • PubMed Central: not available
  • Scopus: 8
  • Web of Science (ISI): 4