
Bayesian logistic regression for presence-only data

Golini N. (co-first author)
2015-01-01

Abstract

Presence-only data refer to situations in which a censoring mechanism acts on a binary response that can be observed for only one of its outcomes, usually the one denoting the presence of an attribute of interest. A typical example is the recording of species presence in ecological surveys. In this work a Bayesian approach to the analysis of presence-only data based on a two-level scheme is presented. A probability law and a case-control design are combined to handle the double source of uncertainty: one due to censoring and the other due to sampling. Using a stratified sampling design with non-overlapping strata, a new formulation of the logistic model for presence-only data is proposed. In particular, logistic regression with a linear predictor is considered. Estimation is carried out with a new Markov chain Monte Carlo algorithm with data augmentation, which does not require a priori knowledge of the population prevalence. The performance of the new algorithm is validated through extensive simulation experiments under three scenarios and by comparison with optimal benchmarks. An application to data from the literature is reported to discuss the model's behaviour in real-world situations, together with the results of an original study of termite occurrence data.
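The two sources of uncertainty named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' stratified formulation or their MCMC algorithm; it assumes a single standard-normal covariate, a logistic link for the latent presence probability p(x), a case sample drawn at random from the presences, and a background sample drawn at random from the whole population, so background units may be unrecorded presences. Under these assumptions Bayes' rule gives P(s = 1 | x) = n1 p(x) / (n1 p(x) + n0 π) for the pooled sample, where n1 and n0 are the two sample sizes and π is the population prevalence. The function names `simulate_presence_only` and `observed_loglik` and all parameter values are hypothetical.

```python
import math
import random


def logistic(eta: float) -> float:
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_presence_only(beta0, beta1, n_pop, n_presence, n_background, seed=0):
    """Hypothetical simulator of presence-only data (illustrative assumptions).

    Level 1 (censoring): each unit has a latent presence y ~ Bernoulli(p(x)),
    with p(x) = logistic(beta0 + beta1 * x); y is never fully observed.
    Level 2 (sampling): a case sample of presences (s = 1) and a background
    sample from the whole population (s = 0) are drawn at random.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_pop)]
    ys = [1 if rng.random() < logistic(beta0 + beta1 * x) else 0 for x in xs]
    presences = [x for x, y in zip(xs, ys) if y == 1]
    cases = rng.sample(presences, n_presence)       # recorded presences
    idx = rng.sample(range(n_pop), n_background)    # background units
    background = [xs[i] for i in idx]               # y unknown to the analyst
    hidden = [ys[i] for i in idx]                   # latent labels, for checking only
    prevalence = sum(ys) / n_pop                    # realized population prevalence
    return cases, background, hidden, prevalence


def observed_loglik(beta0, beta1, prevalence, cases, background):
    """Log-likelihood of the case/background labels given x.

    Uses P(s = 1 | x) = n1 p(x) / (n1 p(x) + n0 * prevalence); note that the
    prevalence enters explicitly alongside the regression coefficients.
    """
    n1, n0 = len(cases), len(background)

    def p_case(x):
        px = logistic(beta0 + beta1 * x)
        return n1 * px / (n1 * px + n0 * prevalence)

    ll = sum(math.log(p_case(x)) for x in cases)
    ll += sum(math.log(1.0 - p_case(x)) for x in background)
    return ll


if __name__ == "__main__":
    cases, background, hidden, pi = simulate_presence_only(
        -1.0, 1.5, 10_000, 200, 500, seed=42)
    # Some background units are presences that the data never reveal.
    print("hidden presences in background:", sum(hidden))
    print("log-likelihood at true values:",
          observed_loglik(-1.0, 1.5, pi, cases, background))
```

Because π appears explicitly in this likelihood, fixing it requires outside information; the algorithm described in the abstract is instead designed not to need it a priori. The `hidden` labels are returned only to show that background units are contaminated controls, which is the censoring half of the problem.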
Year: 2015
Volume: 29, issue 6, pages 1721-1736
Keywords: Case-control design; Censored data; Data augmentation; Markov Chain Monte Carlo algorithm; Stratified sampling; Two levels scheme
Authors: Divino F.; Golini N.; Jona Lasinio G.; Penttinen A.
File: SERRA 2016 Divino Golini JonaLasinio Penttinen.pdf (main article; publisher's PDF, 546.2 kB; restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1739917