A path in regression Random Forest looking for spatial dependence: a taxonomy and a systematic review

Natalia Golini; Rosaria Ignaccolo
2024-01-01

Abstract

Random forest (RF) is a well-known data-driven algorithm applied in several fields thanks to its flexibility in modeling the relationship between the response variable and the predictors, even in the presence of strong non-linearities. In environmental applications, the phenomenon of interest often exhibits spatial and/or temporal dependence that the standard version of RF does not take into account explicitly. In this work, we propose a taxonomy that classifies strategies according to when (Pre-, In-, and/or Post-processing) they try to include spatial information in regression RF. Moreover, we provide a systematic review and classify the most recent strategies adopted to “adjust” regression RF to spatially dependent data, based on the criteria provided by the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA). PRISMA is a reproducible methodology for collecting and processing the existing literature on a specified topic from different sources. It starts with a query and ends with a set of scientific documents to review: we performed an online query on 25 October 2022 and, in the end, considered 32 documents for review. We describe and discuss the methodological strategies employed and the application fields considered in these 32 documents.
Year: 2024
Book title: Advanced Statistical Methods in Process Monitoring, Finance, and Environmental Science - Essays in Honour of Wolfgang Schmid
Publisher: Springer Cham
Pages: 467-489
ISBN: 978-3-031-69110-2
URL: https://arxiv.org/abs/2303.04693
Authors: Luca Patelli, Michela Cameletti, Natalia Golini, Rosaria Ignaccolo
Files in this record:

2024_Patelli et al_A path in regression Random Forest looking for spatial dependence- a taxonomy and a systematic review.pdf
Access: restricted
Description: Manuscript
File type: Postprint (author's final version)
Size: 1.29 MB
Format: Adobe PDF

2024_A_path_in_RF_SPRINGER_final.pdf
Access: restricted
Description: Publisher's PDF
File type: Publisher's PDF
Size: 21.81 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1952355