Porting the Variant Calling Pipeline for NGS data in cloud-HPC environment

Alberto Mulone (first author); Marco Aldinucci (last author)
2023-01-01

Abstract

In recent years, the importance of sequencing and analyzing human genetic variation has become increasingly clear. A relevant lesson of the Covid-19 pandemic was the need to obtain results very quickly, which required running the Next Generation Sequencing (NGS) pipeline on High-Performance Computing (HPC) environments. However, HPC is not always the most suitable environment for executing an entire pipeline, especially one that involves many heterogeneous tools. Executing different parts of the pipeline on different environments can lead to both higher performance and cheaper executions. This work presents the design and optimization process that led us to a state-of-the-art hybrid Variant Calling workflow based on the StreamFlow Workflow Management System (WfMS). We also compare StreamFlow with Snakemake, an established WfMS targeting HPC facilities, observing comparable performance on single environments and satisfactory improvements with a hybrid cloud-HPC configuration.
2023
English
contribution
4 - Workshop
1st Workshop on Workflows in Distributed Environments
Torino
June 27-29
International
2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)
Scientific committee
Hossain Shahriar, Yuuichi Teranishi, Alfredo Cuzzocrea, Moushumi Sharmin, Dave Towey, AKM Jahangir Alam Majumder, Hiroki Kashiwazaki, Ji-Jiang Yang, Michiharu Takemoto, Nazmus Sakib, Ryohei Banno, Sheikh Iqbal Ahamed
Kennesaw, Georgia, United States
UNITED STATES OF AMERICA
1858
1863
6
979-8-3503-2697-0
cloud computing, High Performance Computing, Hybrid workflow, StreamFlow
GERMANY
UNITED KINGDOM OF GREAT BRITAIN
   Third Party - "ACROSS - HPC Big DAta ArtifiCial Intelligence cross Stack PlatfoRm TOwards ExaScale" (EuroHPC-02-2019)
   ACROSS
   EUROPEAN COMMISSION
   H2020
   ALDINUCCI M. - EU RIA Project - G.A. no. 955648
1 – product with file in Open Access version
4
info:eu-repo/semantics/conferenceObject
04-CONTRIBUTION IN CONFERENCE PROCEEDINGS::04A-Conference paper in volume
Alberto Mulone; Sherine Awad; Davide Chiarugi; Marco Aldinucci
273
partially_open
Files in this record:
paper.pdf (open access; file type: PREPRINT (FIRST DRAFT); 691.67 kB, Adobe PDF)
269700b858.pdf (restricted access; file type: PUBLISHER'S PDF; 679.35 kB, Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1919364
Citations
  • PubMed Central: N/A
  • Scopus: 4
  • Web of Science (ISI): 0