SangsterLogP - the largest publicly available dataset of logP values
Cirino, Thalita; Caron, Giulia; Ermondi, Giuseppe
2026-01-01
Abstract
We present SangsterLogP, the largest publicly available curated dataset of experimental logP values. It comprises more than 23k unique molecules, with experimental logP values ranging from -3.8 to 11.7 (a span of about 15.5 log units). The dataset originated from Dr. James Sangster's comprehensive literature review of more than 3k sources. We implemented a systematic curation workflow that includes (a) logD-to-logP adjustment for ionised compounds and (b) consensus-based residual analysis to remove outliers and duplicates. External validation on retrospective and prospective test sets demonstrated robust predictive performance of models built on the dataset (RMSE of 0.34 and 0.47 log units, respectively). SangsterLogP also substantially expands chemical-space coverage compared with the widely used legacy PHYSPROP database, including compounds in the beyond-Rule-of-5 (bRo5) domain. The fully annotated dataset, including experimental conditions and sources, is freely accessible via the Zenodo repository and on the Online Chemical Database and Modelling Environment (OCHEM) website.
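The abstract does not detail how logD values were adjusted to logP. A common textbook approach, sketched here only as an illustration and not necessarily the procedure used to build SangsterLogP, assumes a monoprotic compound whose ionised form does not partition into octanol. Under that assumption, logD = logP - log10(1 + 10^(pH - pKa)) for an acid and logD = logP - log10(1 + 10^(pKa - pH)) for a base, so logP is recovered by adding the correction term back. The function name `logp_from_logd` and its parameters are hypothetical.

```python
import math


def logp_from_logd(logd: float, pka: float, ph: float, kind: str) -> float:
    """Estimate logP from a measured logD for a monoprotic compound.

    Illustrative assumption: only the neutral species partitions into
    octanol (Henderson-Hasselbalch model); ion-pair partitioning and
    multiprotic or zwitterionic compounds are ignored.
    kind: "acid" or "base".
    """
    if kind == "acid":
        # An acid becomes more ionised as the pH rises above its pKa.
        exponent = ph - pka
    elif kind == "base":
        # A base becomes more ionised (protonated) as the pH falls below its pKa.
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid' or 'base'")

    # correction = log10(1 + 10**exponent), computed in a form that cannot
    # overflow when the compound is almost fully ionised (large exponent).
    if exponent > 0:
        correction = exponent + math.log10(1.0 + 10.0 ** -exponent)
    else:
        correction = math.log10(1.0 + 10.0 ** exponent)

    # logD = logP - correction  =>  logP = logD + correction
    return logd + correction


# Example: an acid with pKa 4.2 measured at pH 7.4 is largely ionised,
# so its estimated logP is roughly logD + 3.2.
print(logp_from_logd(0.5, 4.2, 7.4, "acid"))
```

When pH equals pKa, half of the compound is ionised and the correction is exactly log10(2), about 0.30 log units; far from the pKa the correction approaches |pH - pKa|.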
| File | Size | Format | Access |
|---|---|---|---|
| s41597-026-07357-2_reference.pdf | 646.99 kB | Adobe PDF | Open access |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.



