Estimating multi-index models with response-conditional least squares

Lanteri, A;
2021-01-01

Abstract

The multi-index model is a simple yet powerful high-dimensional regression model that circumvents the curse of dimensionality by assuming $\mathbb{E}[Y \mid X] = g(A^\top X)$ for some unknown index space $A$ and link function $g$. In this paper we introduce a method for estimating the index space, and study how the error of an index space estimate propagates into the regression of the link function. The proposed method approximates the index space by the span of linear regression slope coefficients computed over level sets of the data. Being based on ordinary least squares, our approach is easy to implement and computationally efficient. We prove a tight concentration bound that shows $N^{-1/2}$ convergence and also faithfully describes the dependence on the chosen partition of level sets, hence providing guidance on hyperparameter tuning. The estimator's competitiveness is confirmed by extensive comparisons with state-of-the-art methods, on both synthetic and real data sets. As a second contribution, we establish minimax optimal generalization bounds for k-nearest neighbors and piecewise polynomial regression when trained on samples projected onto any $N^{-1/2}$-consistent estimate of the index space, thus providing complete and provable estimation of the multi-index model.
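
As a rough illustration of the index space estimator described in the abstract, the following Python sketch fits an ordinary least squares slope on each level set of the data and takes the leading eigenvectors of the averaged outer products of those slopes as a basis of the estimated index space. It is not the authors' implementation: the function name `estimate_index_space`, the equal-size partition of the samples by sorted response, and the weighting of each slope by its level set's share of the samples are all assumptions made for illustration.

```python
import numpy as np


def estimate_index_space(X, y, n_levels, dim):
    """Illustrative sketch: span of per-level-set OLS slopes.

    X: (N, D) samples, y: (N,) responses.
    n_levels: number of level sets (hyperparameter).
    dim: assumed dimension of the index space.
    Returns a (D, dim) matrix with orthonormal columns.
    """
    N, D = X.shape
    # Assumption: level sets are equal-size groups of samples sorted by response.
    level_sets = np.array_split(np.argsort(y), n_levels)

    M = np.zeros((D, D))
    for idx in level_sets:
        # Center so the least-squares fit has an implicit intercept.
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        # OLS slope on this level set (minimum-norm solution if rank deficient).
        slope, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        # Assumption: weight each slope by its level set's share of the samples.
        M += (len(idx) / N) * np.outer(slope, slope)

    # eigh returns eigenvalues in ascending order; keep the top `dim` eigenvectors.
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, ::-1][:, :dim]
```

Here `n_levels` plays the role of the level-set partition that the abstract identifies as the hyperparameter to tune. Each level set should contain more than `D` samples for its slope to be well determined.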
Year: 2021
Volume: 15
Issue: 1
Pages: 589-629
Keywords: Multi-index model; sufficient dimension reduction; nonparametric regression; finite sample bounds
Authors: Klock, T; Lanteri, A; Vigogna, S
Files in this item:
File: euclid.ejs.1611046876.pdf
Access: restricted
Description: EJS2021_RCLS
File type: publisher's PDF
Size: 1.91 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1936470