CINECA IRIS Institutional Research Information System

This paper presents the optimisation efforts on the creation of a graph-based mapping representation of gene adjacency. The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes. Straightforward parallelisation of this scheme does not yield acceptable performance on multicore architectures since the scalability is rather limited due to the memory bound nature of the problem. This work focuses on the memory optimisations that can be applied to the graph construction algorithm and its (complex) data structures to derive a cache-oblivious algorithm and eventually to improve the memory bandwidth utilisation. We used as running example NuChart-II, a tool for annotation and statistic analysis of Hi-C data that creates a gene-centric neigh- borhood graph. The proposed approach, which is exemplified for Hi-C, addresses several common issue in the parallelisation of memory bound algorithms for multicore. Results show that the proposed approach is able to increase the parallel speedup from 7x to 22x (on a 32-core platform). Finally, the proposed C++ implementation outperforms the first R NuChart prototype, by which it was not possible to complete the graph generation because of strong memory-saturation problems.

Memory-Optimised Parallel Processing of Hi-C Data

DROCCO, MAURIZIO;MISALE, CLAUDIA;PERETTI PEZZI, GUILHERME;TORDINI, FABIO;ALDINUCCI, MARCO

2015-01-01

Abstract

This paper presents the optimisation efforts on the creation of a graph-based mapping representation of gene adjacency. The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes. Straightforward parallelisation of this scheme does not yield acceptable performance on multicore architectures since the scalability is rather limited due to the memory bound nature of the problem. This work focuses on the memory optimisations that can be applied to the graph construction algorithm and its (complex) data structures to derive a cache-oblivious algorithm and eventually to improve the memory bandwidth utilisation. We used as running example NuChart-II, a tool for annotation and statistic analysis of Hi-C data that creates a gene-centric neigh- borhood graph. The proposed approach, which is exemplified for Hi-C, addresses several common issue in the parallelisation of memory bound algorithms for multicore. Results show that the proposed approach is able to increase the parallel speedup from 7x to 22x (on a 32-core platform). Finally, the proposed C++ implementation outperforms the first R NuChart prototype, by which it was not possible to complete the graph generation because of strong memory-saturation problems.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
			2015
		
	Titolo dell'evento
	
			PDP 2015: Parallel Distributed and network-based Processing
		
	Luogo dell'evento
	
			Turku (FI)
		
	Data dell'evento
	
			March 2015
		
	Titolo del volume
	
			PDP 2015 - 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing
		
	Nome editore
	
			IEEE Computer Society
		
	Pagine (da)
	
			741
		
	Pagine (a)
	
			746
		
	Codice ISBN
	
			978-1-4799-8490-9
		
	DOI
	
			https://dx.doi.org/10.1109/PDP.2015.63
		
	URL del prodotto (archivi open access, fulltext su sito editore, etc.)
	
			http://calvados.di.unipi.it/storage/paper_files/2015_pdp_memopt.pdf
		
	Parole Chiave
	
			fastflow,bioinformatics
		
	Tutti gli autori
	
			Drocco, Maurizio; Misale, Claudia; Peretti Pezzi,Guilherme; Tordini, Fabio; Aldinucci, Marco
		
	Appare nelle tipologie:
	
			04A-Conference paper in volume

File in questo prodotto:

File	Dimensione	Formato
2015_pdp_memopt.pdf Accesso aperto Tipo di file: POSTPRINT (VERSIONE FINALE DELL’AUTORE) Dimensione 344.83 kB Formato Adobe PDF Visualizza/Apri	344.83 kB	Adobe PDF	Visualizza/Apri
2015_PDP_memory.pdf Accesso riservato Tipo di file: PDF EDITORIALE Dimensione 299.69 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	299.69 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1521910

Citazioni

ND

4

0

social impact