BioTEA: Containerized Methods of Analysis for Microarray-Based Transcriptomics Data

Visentin, Luca; Scarpellino, Giorgia; Chinigò, Giorgia; Munaron, Luca; Ruffinatti, Federico Alessandro

doi:10.3390/biology11091346

Simple Summary: Researchers are often interested in detecting differences in gene expression levels between two types of cells. To do this, gene expression levels are measured, and dedicated computer programs are used to detect these differences. Historically, microarrays were used to measure gene expression, but they are now being supplanted by newer, more efficient technologies such as RNA sequencing. BioTEA allows users to perform differential expression analysis of microarray-derived data easily, quickly, and reproducibly. It combines all the steps needed to start directly from the gene expression levels and obtain a list of genes that are differentially expressed between the two cell types of interest. In this way, the large amount of publicly available microarray data can still be analyzed in the modern era. Differential expression analyses can be rather complex to run, but BioTEA makes them straightforward, so that even non-bioinformaticians can perform them with ease. BioTEA is free and open-source.

Abstract: Tens of thousands of gene expression data sets describing a variety of model organisms in many different pathophysiological conditions are currently stored in publicly available databases such as the Gene Expression Omnibus (GEO) and ArrayExpress (AE). As microarray technology gives way to RNA-seq, it becomes strategic to develop high-level analysis tools that preserve access to this huge amount of information through the most sophisticated methods of data preparation and processing developed over the years, while at the same time ensuring the reproducibility of the results. To meet this need, here we present bioTEA (biological Transcript Expression Analyzer), a novel software tool that combines ease of use with the versatility and power of an R/Bioconductor-based differential expression analysis, covering every step from raw data retrieval and preparation to gene annotation.
BioTEA is an R-coded pipeline, wrapped in a Python-based command line interface and containerized with Docker technology. The user can choose among multiple options (including gene filtering, batch effect handling, sample pairing, and statistical test type) to adapt the algorithm flow to the structure of the particular data set. All these options are saved in a single text file, which can be easily shared between different laboratories to deterministically reproduce the results. In addition, a detailed log file provides accurate information about each step of the analysis. Overall, these features make bioTEA a valuable tool for both bioinformaticians and wet-lab biologists interested in transcriptomics. BioTEA is free and open-source.
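
The idea of keeping every analysis option in one shareable text file can be illustrated with a minimal Python sketch. This is not bioTEA's actual options format or interface: the class AnalysisOptions, its field names and default values, the JSON serialization, and the helper functions are all hypothetical, chosen only to mirror the kinds of options listed above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AnalysisOptions:
    """Hypothetical option set; names and defaults are illustrative only."""
    min_log2_expression: float = 4.0     # gene filtering threshold
    batch_column: Optional[str] = None   # metadata column used for batch handling
    paired_samples: bool = False         # whether samples are analyzed as pairs
    test_type: str = "parametric"        # which statistical test to run


def dump_options(options: AnalysisOptions) -> str:
    """Serialize the options to text with a stable key order."""
    return json.dumps(asdict(options), sort_keys=True, indent=2)


def load_options(text: str) -> AnalysisOptions:
    """Rebuild the options from the shared text."""
    return AnalysisOptions(**json.loads(text))


def fingerprint(options: AnalysisOptions) -> str:
    """Hash the serialized options so two labs can confirm identical settings."""
    return hashlib.sha256(dump_options(options).encode("utf-8")).hexdigest()


# One laboratory writes the options; another reads them back unchanged.
sent = AnalysisOptions(batch_column="batch", paired_samples=True)
shared_text = dump_options(sent)
received = load_options(shared_text)
print(received == sent)
print(fingerprint(received) == fingerprint(sent))
```

Because the text is written with sorted keys, the same options always produce the same file content and the same fingerprint, which is the property that makes a single options file a convenient unit to exchange when reproducing an analysis.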