In this paper, we present a new C++ API with a fluent interface called PiCo (Pipeline Composition). PiCo’s programming model aims at making easier the programming of data analytics applications while preserving or enhancing their performance. This is attained through three key design choices: (1) unifying batch and stream data access models, (2) decoupling processing from data layout, and (3) exploiting a stream-oriented, scalable, efficient C++11 runtime system. PiCo proposes a programming model based on pipelines and operators that are polymorphic with respect to data types in the sense that it is possible to reuse the same algorithms and pipelines on different data models (e.g., streams, lists, sets, etc.). Preliminary results show that PiCo, when compared to Spark and Flink, can attain better performances in terms of execution times and can hugely improve memory utilization, both for batch and stream processing.

PiCo: High-performance data analytics pipelines in modern C++

Claudia Misale;Maurizio Drocco;Guy Tremblay;MARTINELLI, Alberto Riccardo;Marco Aldinucci
2018-01-01

Abstract

In this paper, we present a new C++ API with a fluent interface called PiCo (Pipeline Composition). PiCo’s programming model aims at making easier the programming of data analytics applications while preserving or enhancing their performance. This is attained through three key design choices: (1) unifying batch and stream data access models, (2) decoupling processing from data layout, and (3) exploiting a stream-oriented, scalable, efficient C++11 runtime system. PiCo proposes a programming model based on pipelines and operators that are polymorphic with respect to data types in the sense that it is possible to reuse the same algorithms and pipelines on different data models (e.g., streams, lists, sets, etc.). Preliminary results show that PiCo, when compared to Spark and Flink, can attain better performances in terms of execution times and can hugely improve memory utilization, both for batch and stream processing.
2018
87
392
403
https://www.sciencedirect.com/science/article/pii/S0167739X1732681X
Big data, High performance data analytics, Domain specific language, C++, Stream computing, Fog computing, Edge computing
Claudia Misale, Maurizio Drocco, Guy Tremblay, Alberto R. Martinelli, Marco Aldinucci
File in questo prodotto:
File Dimensione Formato  
fgcs_pico.pdf

Accesso aperto

Tipo di file: POSTPRINT (VERSIONE FINALE DELL’AUTORE)
Dimensione 859.24 kB
Formato Adobe PDF
859.24 kB Adobe PDF Visualizza/Apri
1-s2.0-S0167739X1732681X-main.pdf

Accesso riservato

Descrizione: Editoriale
Tipo di file: PDF EDITORIALE
Dimensione 1.06 MB
Formato Adobe PDF
1.06 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1668444
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 16
  • ???jsp.display-item.citation.isi??? 15
social impact