CINECA IRIS Institutional Research Information System

PiCo: High-performance data analytics pipelines in modern C++

Claudia Misale, Maurizio Drocco, Guy Tremblay, Alberto Riccardo Martinelli, Marco Aldinucci

2018-01-01

Abstract

In this paper, we present a new C++ API with a fluent interface called PiCo (Pipeline Composition). PiCo's programming model aims to make programming data analytics applications easier while preserving or enhancing their performance. This is achieved through three key design choices: (1) unifying the batch and stream data access models, (2) decoupling processing from data layout, and (3) exploiting a stream-oriented, scalable, efficient C++11 runtime system. PiCo proposes a programming model based on pipelines and operators that are polymorphic with respect to data types, in the sense that the same algorithms and pipelines can be reused on different data models (e.g., streams, lists, sets). Preliminary results show that PiCo, when compared to Spark and Flink, can achieve better performance in terms of execution time and can greatly improve memory utilization, for both batch and stream processing.

Year: 2018
Journal: FUTURE GENERATION COMPUTER SYSTEMS
Volume: 87
Pages: 392-403
DOI: https://dx.doi.org/10.1016/j.future.2018.05.030
Item URL (open-access archives, full text on the publisher's site, etc.): https://www.sciencedirect.com/science/article/pii/S0167739X1732681X
Keywords: Big data, High performance data analytics, Domain specific language, C++, Stream computing, Fog computing, Edge computing
All authors: Claudia Misale, Maurizio Drocco, Guy Tremblay, Alberto R. Martinelli, Marco Aldinucci
Item type: 03A-Journal article

Files in this item:

File	Size	Format
fgcs_pico.pdf (open access; file type: postprint, author's final version)	859.24 kB	Adobe PDF
1-s2.0-S0167739X1732681X-main.pdf (restricted access; description: publisher's version; file type: publisher's PDF)	1.06 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1668444
