
Inference Performance of Large Language Models on a 64-core RISC-V CPU with Silicon-Enabled Vectors

Garcia, Adriano Marques; Malenza, Giulio; Birke, Robert; Aldinucci, Marco
2025-01-01

Abstract

The rising usage of compute-intensive AI applications with fast response time requirements, such as text generation using large language models (LLMs), underscores the need for more efficient and versatile hardware solutions. This drives the exploration of emerging architectures like RISC-V, which have the potential to deliver strong performance within tight power constraints. The recent commercial release of processors with silicon-enabled RISC-V Vector (RVV) extensions further amplifies the significance of RISC-V architectures, offering enhanced capabilities for parallel processing and for accelerating tasks critical to LLMs. This work evaluates the inference performance of the BERT, GPT-2, Gemma-2, LLaMA-3.2, and DeepSeek-LLM language models on the SOPHON SG2042 64-core RISC-V architecture with silicon-enabled RVV v0.7.1. We benchmarked the models with and without RVV, using OpenBLAS and BLIS as backends for PyTorch to enable vectorization. Our results show that the performance impact of RVV is closely tied to matrix shape and arithmetic intensity. In fact, vectorization can slow down GEMM operations due to memory-bound behavior, whereas higher batch sizes shift execution into the compute-bound region, where RVV shows clear benefits. We validate this behavior experimentally using roofline modeling and traced GEMM timing, revealing performance bottlenecks that are invisible to synthetic micro-benchmarks. While enabling RVV in OpenBLAS can speed up inference by up to 1.3x, its benefits are highly configuration-dependent. These insights suggest that workload characteristics, threading behavior, and datatype must be carefully aligned to unlock RVV's full potential. Our findings highlight both the promise and the current software limitations of running LLMs on RVV-enabled RISC-V platforms.
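The abstract's core argument, that GEMM shapes with low arithmetic intensity are memory-bound (so vectorization may not pay off) while larger batch sizes push execution into the compute-bound region, can be illustrated with a minimal roofline-style sketch. The peak-FLOPS and peak-bandwidth numbers below are placeholders, not SG2042 measurements, and the helper names are hypothetical:

```python
# Roofline-style classification of a GEMM C = A @ B with A of shape (M, K)
# and B of shape (K, N). Arithmetic intensity = FLOPs / bytes moved; the
# ridge point peak_flops / peak_bw separates memory- from compute-bound.

def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=4):
    flops = 2 * m * n * k                                # one mul + one add per MAC
    traffic = bytes_per_elem * (m * k + k * n + m * n)   # minimal read A, B; write C
    return flops / traffic

def bound_region(intensity, peak_flops, peak_bw):
    ridge = peak_flops / peak_bw                         # ridge point of the roofline
    return "compute-bound" if intensity >= ridge else "memory-bound"

# Growing the batch dimension M raises arithmetic intensity: with placeholder
# peaks of 1e11 FLOP/s and 1e10 B/s the ridge point is 10 FLOPs/byte.
for m in (1, 8, 64):
    ai = gemm_arithmetic_intensity(m, 4096, 4096)
    print(m, round(ai, 2), bound_region(ai, peak_flops=1e11, peak_bw=1e10))
```

With these placeholder peaks, single-token decoding (M = 1) lands deep in the memory-bound region, which matches the paper's observation that RVV can even slow such GEMMs down, while batched execution crosses the ridge point and benefits from vectorization.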
2025, vol. 177, pp. 1-15
https://doi.org/10.1016/j.future.2025.108242
Keywords: RISC-V, RVV, PyTorch, LLM, XuanTie C920, SOPHON SG2042, OpenBLAS, BLIS, GEMM
Files in this record:
1-s2.0-S0167739X25005369-main_compressed.pdf (publisher's PDF, Adobe PDF, 9.99 MB, restricted access)
FGCS_RVV_PrePrint-1.pdf (preprint, first draft, Adobe PDF, 3.4 MB, open access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2105617