CINECA IRIS Institutional Research Information System

Assessing Large Language Models Inference Performance on a 64-core RISC-V CPU with Silicon-Enabled Vectors

Adriano Marques Garcia (first author); Giulio Malenza; Robert Birke; Marco Aldinucci

2024-01-01

Abstract

The rising use of compute-intensive AI applications with fast response-time requirements, such as text generation with large language models, underscores the need for more efficient and versatile hardware. This drives the exploration of emerging architectures such as RISC-V, which has the potential to deliver strong performance within tight power constraints. The recent commercial release of processors that implement the RISC-V Vector (RVV) extension in silicon makes RISC-V more relevant still, since it adds parallel-processing capabilities that can accelerate tasks critical to large language models and other AI applications. This work evaluates the inference performance of the BERT and GPT-2 language models on the SOPHON SG2042, a 64-core RISC-V processor with silicon-enabled RVV v0.7.1. We benchmarked the models with and without RVV, using OpenBLAS and BLIS as BLAS backends for PyTorch to enable vectorization. Enabling RVV in OpenBLAS improved inference performance by up to 40% in some cases.

Year: 2024
Event title: Big Data and High-Performance Computing (BigHPC 2024)
Event location: Pisa, Italy
Event date: 17 September 2024
Volume title: BigHPC2024: Special Track on Big Data and High-Performance Computing
Publisher: CEUR Workshop Proceedings
Volume number: 3785
Pages: 1-9
Product URL (open-access archives, full text on publisher site, etc.): https://ceur-ws.org/Vol-3785/paper110.pdf
Keywords: RISC-V, RVV, PyTorch, LLM, XuanTie C920, SOPHON SG2042, OpenBLAS, BLIS
All authors: Adriano Marques Garcia, Giulio Malenza, Robert Birke, Marco Aldinucci
Appears in types: 04A-Conference paper in volume

Files in this item:

File	Size	Format
paper110.pdf (open access; file type: publisher's PDF)	1.68 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2027926
