The rising usage of compute-intensive AI applications with fast response time requirements, such as text generation using large language models, underscores the need for more efficient and versatile hardware solutions. This drives the exploration of emerging architectures like RISC-V, which has the potential to deliver strong performance within tight power constraints. The recent commercial release of processors with RISC-V Vector (RVV) silicon-enabled extensions further amplifies the significance of RISC-V architectures, offering enhanced capabilities for parallel processing and accelerating tasks critical to large language models and other AI applications. This work aims to evaluate the BERT and GPT-2 language models inference performance on the SOPHON SG2042 64-core RISC-V architecture with silicon-enabled RVV v0.7.1. We benchmarked the models with and without RVV, using OpenBLAS and BLIS as BLAS backends for PyTorch to enable vectorization. Enabling RVV in OpenBLAS improved the inference performance by up to 40% in some cases.

Assessing Large Language Models Inference Performance on a 64-core RISC-V CPU with Silicon-Enabled Vectors

Adriano Marques Garcia
First
;
Giulio Malenza;Robert Birke;Marco Aldinucci
2024-01-01

Abstract

The rising usage of compute-intensive AI applications with fast response time requirements, such as text generation using large language models, underscores the need for more efficient and versatile hardware solutions. This drives the exploration of emerging architectures like RISC-V, which has the potential to deliver strong performance within tight power constraints. The recent commercial release of processors with RISC-V Vector (RVV) silicon-enabled extensions further amplifies the significance of RISC-V architectures, offering enhanced capabilities for parallel processing and accelerating tasks critical to large language models and other AI applications. This work aims to evaluate the BERT and GPT-2 language models inference performance on the SOPHON SG2042 64-core RISC-V architecture with silicon-enabled RVV v0.7.1. We benchmarked the models with and without RVV, using OpenBLAS and BLIS as BLAS backends for PyTorch to enable vectorization. Enabling RVV in OpenBLAS improved the inference performance by up to 40% in some cases.
2024
Big Data and High-Performance Computing (BigHPC 2024)
Pisa, Italy
17/09/2024
BigHPC2024: Special Track on Big Data and High-Performance Computing
CEUR Workshop Proceedings
3785
1
9
https://ceur-ws.org/Vol-3785/paper110.pdf
RISC-V, RVV, PyTorch, LLM, XuanTie C920, SOPHON SG2042, OpenBLAS, BLIS
Adriano Marques Garcia, Giulio Malenza, Robert Birke, Marco Aldinucci
File in questo prodotto:
File Dimensione Formato  
paper110.pdf

Accesso aperto

Tipo di file: PDF EDITORIALE
Dimensione 1.68 MB
Formato Adobe PDF
1.68 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/2027926
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact