A novel virtual screening paradigm via unsupervised iterative alignment and COSMOsar3D descriptors

Tosco, Paolo; Klamt, A.; Balle, T.; Harpsøe, K.; Peters, D.; Ahring, P. K.

We present a novel multi-conformational alignment strategy based on the iterative refinement of an initial binding mode hypothesis. First, the ligand conformations with the best chemical and coordinate match are superimposed on templates extracted from crystallographic ligand-target complexes; when no such complexes are available, the ligand conformation that yields the most consistent overlay of the training set is chosen as the template. This initial raw alignment is then refined iteratively, using the best-aligned training-set ligands as alternative templates for those still misplaced, until convergence is reached. The high computational efficiency of the algorithm made it possible to obtain consistent alignments of the eight datasets of the Sutherland benchmark suite (1337 molecules, >150K conformations overall) overnight on a standard workstation. 3D-QSAR models were built on these alignments using quantum-mechanical COSMOsar3D descriptors based on COSMO local sigma-profiles. Our COSMOsar3D models were comparable to the CoMFA models reported for the benchmark suite in both internal and external predictive power; however, whereas Sutherland used individually tailored, supervised alignment procedures, our procedure is fully automated and unsupervised, and is therefore suited to high-throughput virtual screening of potential drug candidates. Our predicted pKi values were also far more reliable than those obtained by docking with AutoDock Vina into the same targets from which the co-crystallized ligands used as superposition templates were extracted. Finally, we apply the new methodology to a dataset of 59 diazabicyclo[3.2.2]nonanes recently characterized as partial agonists at the human alpha7 nicotinic receptor subtype, whose 3D structure is unknown.
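The two-stage alignment strategy described above can be illustrated with a deliberately simplified Python sketch. It is not the authors' implementation. It assumes that every conformation is already expressed in a common reference frame as a fixed-length tuple of coordinates, replaces the real chemical and coordinate matching and rigid-body superposition with a hypothetical overlay score of 1/(1 + RMSD), and treats any ligand scoring at least a hypothetical threshold `good` as "well aligned". The names `similarity`, `best_match`, `consensus_template` and `iterative_alignment` are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical overlay score 1 / (1 + RMSD); assumes a and b already share a frame."""
    rmsd = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
    return 1.0 / (1.0 + rmsd)


def best_match(confs: List[Vector], templates: List[Vector]) -> Tuple[int, float]:
    """Return (index of best conformation, its best score against any template)."""
    best = (0, -1.0)
    for i, conf in enumerate(confs):
        for tpl in templates:
            s = similarity(conf, tpl)
            if s > best[1]:
                best = (i, s)
    return best


def consensus_template(ligands: Dict[str, List[Vector]]) -> Vector:
    """Fallback when no crystal structure exists: the conformation whose summed
    best-match score over all *other* ligands is highest."""
    best_conf, best_total = None, -1.0
    for name, confs in ligands.items():
        for conf in confs:
            total = sum(best_match(other, [conf])[1]
                        for other_name, other in ligands.items() if other_name != name)
            if total > best_total:
                best_conf, best_total = conf, total
    return best_conf


def iterative_alignment(ligands: Dict[str, List[Vector]],
                        crystal_templates: List[Vector],
                        good: float = 0.8,        # hypothetical "well aligned" threshold
                        max_iter: int = 20) -> Dict[str, Tuple[int, float]]:
    # Raw alignment: crystallographic templates if any, otherwise the consensus fallback.
    templates = list(crystal_templates) or [consensus_template(ligands)]
    pose = {name: best_match(confs, templates) for name, confs in ligands.items()}
    for _ in range(max_iter):
        # Well-aligned ligands (in their chosen conformation) become extra templates.
        aligned = [ligands[n][pose[n][0]] for n in ligands if pose[n][1] >= good]
        changed = False
        for name, confs in ligands.items():
            if pose[name][1] >= good:
                continue  # already well placed; leave it alone
            candidate = best_match(confs, templates + aligned)
            if candidate[1] > pose[name][1] + 1e-12:
                pose[name] = candidate
                changed = True
        if not changed:  # convergence: a full pass improved no misplaced ligand
            break
    return pose


# Toy example: a single crystallographic template at the origin.
ligands = {
    "A": [(0.0, 0.0)],
    "B": [(0.2, 0.0)],
    "C": [(1.0, 0.0), (0.0, 0.9)],
}
poses = iterative_alignment(ligands, [(0.0, 0.0)])
```

In this toy setup, ligand C initially picks its second conformation when matched against the crystallographic template alone, then switches to its first conformation once the well-aligned ligand B is available as an alternative template; the loop stops after a pass that changes nothing. Calling `iterative_alignment(ligands, [])` exercises the consensus-template fallback instead.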
Our purely ligand-based binding mode hypothesis for these alpha7 partial agonists is consistent with site-directed mutagenesis data and fits an independently developed alpha7 receptor homology model; furthermore, the associated COSMOsar3D model has excellent predictive power (training set: q2[LOO] = 0.69, q2[L20%O] = 0.66; test set: r2[pred] = 0.92, SDEP[ext] = 0.31).
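The statistics quoted above follow common 3D-QSAR conventions, which can be sketched in Python under stated assumptions: q2 = 1 - PRESS / sum((y - ybar)^2) over cross-validated predictions; r2[pred] computed on the external test set against the training-set mean (one common convention; some authors use the test-set mean instead, and the exact variant used here is an assumption); and SDEP = sqrt(PRESS / n). The one-descriptor least-squares line in `fit_line` is a hypothetical stand-in for the PLS model that would actually be fitted to COSMOsar3D descriptors, and assigning folds by index modulo the number of groups simplifies the usual randomized, repeated leave-20%-out scheme.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence


def press(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Predictive residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y_obs, y_pred))


def q2(y_obs: Sequence[float], y_cv: Sequence[float]) -> float:
    """Cross-validated q2 = 1 - PRESS / SS, with SS taken about the mean of y_obs."""
    ybar = mean(y_obs)
    return 1.0 - press(y_obs, y_cv) / sum((y - ybar) ** 2 for y in y_obs)


def r2_pred(y_test: Sequence[float], y_test_pred: Sequence[float],
            y_train_mean: float) -> float:
    """External r2[pred]; SS taken about the training-set mean (assumed convention)."""
    return 1.0 - press(y_test, y_test_pred) / sum((y - y_train_mean) ** 2 for y in y_test)


def sdep(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Standard deviation of error of prediction, sqrt(PRESS / n)."""
    return math.sqrt(press(y_obs, y_pred) / len(y_obs))


def fit_line(x: List[float], y: List[float]) -> Callable[[float], float]:
    """Hypothetical stand-in model: ordinary least-squares y = ym + b * (x - xm)."""
    xm, ym = mean(x), mean(y)
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sum((xi - xm) ** 2 for xi in x)
    return lambda v: ym + b * (v - xm)


def cv_predictions(x: List[float], y: List[float], n_groups: int) -> List[float]:
    """Hold out group i % n_groups in turn; n_groups = len(x) gives LOO,
    n_groups = 5 gives a single (non-randomized) leave-20%-out pass."""
    preds = [0.0] * len(x)
    for g in range(n_groups):
        train = [i for i in range(len(x)) if i % n_groups != g]
        model = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in range(len(x)):
            if i % n_groups == g:
                preds[i] = model(x[i])
    return preds


if __name__ == "__main__":
    # Synthetic, perfectly linear data for illustration only (not the paper's data).
    x_demo = [float(i) for i in range(1, 11)]
    y_demo = [2.0 * v + 1.0 for v in x_demo]
    print(q2(y_demo, cv_predictions(x_demo, y_demo, len(x_demo))))  # LOO
    print(q2(y_demo, cv_predictions(x_demo, y_demo, 5)))            # one L20%O pass
```

Because q2 and r2[pred] both divide PRESS by a spread of observed values, a model that merely predicts the mean scores zero, which is why values such as 0.69 or 0.92 indicate genuine predictive power rather than a good fit alone.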