GPTBioInsightor: leveraging large language models for transparent scRNA-seq cell type annotations
Shenghui Huang (first author; member of the Collaboration Group); Berina Sabanovic (member of the Collaboration Group); Luca Alessandrì
2026-01-01
Abstract
Motivation: Large language models (LLMs) are rapidly becoming indispensable across the life sciences, from literature mining through clinical decision support to experimental design. Yet in single-cell RNA-sequencing (scRNA-seq) analysis, most LLM-enabled tools remain opaque: they output a single label per cluster without disclosing the chain of thought that led to that decision. This opacity undermines reproducibility, complicates peer review, and ultimately slows the adoption of otherwise powerful methods.

Results: We developed GPTBioInsightor, an LLM-powered assistant that not only annotates cell types, cell states, and pathway activities but also narrates, step by step, how it arrived at each conclusion. Across benchmark datasets, including peripheral blood mononuclear cells (PBMC3K) and pancreatic ductal adenocarcinoma, GPTBioInsightor achieved at least parity with expert manual curation while delivering transparent reasoning, confidence scores, and literature-based evidence. By closing the "interpretability gap," GPTBioInsightor equips wet-lab biologists, computational scientists, and reviewers with an audit-ready trail, thereby accelerating discovery and fostering trust in AI-assisted bioinformatics.

Availability and implementation: GPTBioInsightor is freely available on GitHub under a BSD-3-Clause license (https://github.com/huang-sh/GPTBioInsightor).