GPTBioInsightor: leveraging large language models for transparent scRNA-seq cell type annotations

Shenghui Huang (first author; member of the Collaboration Group);
Berina Sabanovic (member of the Collaboration Group);
Luca Alessandrì
2026-01-01

Abstract

Motivation: Large language models (LLMs) are rapidly becoming indispensable across the life sciences, from literature mining through clinical decision support to experimental design. Yet in single-cell RNA-sequencing (scRNA-seq) analysis, most LLM-enabled tools remain opaque: they output a single label per cluster without disclosing the chain of thought that led to that decision. This opacity undermines reproducibility, complicates peer review, and ultimately slows the adoption of otherwise powerful methods.

Results: We developed GPTBioInsightor, an LLM-powered assistant that not only annotates cell types, cell states, and pathway activities but also narrates, step by step, how it arrived at each conclusion. Across benchmark datasets, including peripheral blood mononuclear cells (PBMC3K) and pancreatic ductal adenocarcinoma, GPTBioInsightor achieved at least parity with expert manual curation while delivering transparent reasoning, confidence scores, and literature-based evidence. By closing the "interpretability gap," GPTBioInsightor equips wet-lab biologists, computational scientists, and reviewers with an audit-ready trail, thereby accelerating discovery and fostering trust in AI-assisted bioinformatics.

Availability and implementation: GPTBioInsightor is freely available on GitHub under a BSD-3-Clause license (https://github.com/huang-sh/GPTBioInsightor).
Year: 2026
Issue: 1
Volume: 6
https://academic.oup.com/bioinformaticsadvances/article/6/1/vbag025/8436076?login=true
Shenghui Huang, Berina Sabanovic, Yuzhong Peng, Quan Zheng, Luca Alessandrì, Christopher Heeschen
Files in this record:
GPTBIOINSIGHTOR.pdf (open access; Adobe PDF, 680.18 kB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2138150