Local Multi-Head Channel Self-Attention for Facial Expression Recognition

Pecoraro Roberto; Basile Valerio; Bono Viviana
2022-01-01

Abstract

Since the Transformer architecture was introduced in 2017, there have been many attempts to bring the self-attention paradigm to the field of computer vision. In this paper, we propose LHC: Local multi-Head Channel self-attention, a novel self-attention module that can be easily integrated into virtually any convolutional neural network and that is designed specifically for computer vision, with a particular focus on facial expression recognition. LHC is based on two main ideas. First, we think that in computer vision the best way to leverage the self-attention paradigm is to apply it channel-wise, rather than as the better-explored spatial attention. Second, a local approach has the potential to overcome the limitations of convolution better than global attention, at least in scenarios where images have a constant general structure, as in facial expression recognition. LHC-Net achieves a new state of the art on the FER2013 dataset, with significantly lower complexity and a smaller impact on the “host” architecture in terms of computational cost than the previous state of the art.
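To make the channel-wise versus spatial distinction in the abstract concrete, the following is a minimal sketch in plain Python. It is not the LHC module: it assumes identity projections (queries, keys, and values are the raw tokens), a single head, and global rather than local attention, and all function names are hypothetical. It only illustrates which axis of a feature map plays the role of the tokens in each case.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # a is n x k, b is k x m; returns an n x m matrix.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def self_attention(tokens):
    # Scaled dot-product self-attention with Q = K = V = tokens
    # (an assumption made for brevity; no learned projections).
    d = len(tokens[0])
    scores = matmul(tokens, transpose(tokens))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, tokens), weights

def channel_self_attention(feature_map):
    # feature_map: C rows, each a flattened H*W channel.
    # Tokens are channels, so the attention matrix is C x C.
    return self_attention(feature_map)

def spatial_self_attention(feature_map):
    # Tokens are spatial positions (each a C-dimensional vector),
    # so the attention matrix is (H*W) x (H*W).
    out, weights = self_attention(transpose(feature_map))
    return transpose(out), weights

# Toy feature map: C = 3 channels, H*W = 4 flattened positions.
feature_map = [
    [1.0, 0.0, 2.0, 1.0],
    [0.5, 1.5, 0.0, 1.0],
    [2.0, 1.0, 1.0, 0.0],
]

channel_out, channel_w = channel_self_attention(feature_map)
spatial_out, spatial_w = spatial_self_attention(feature_map)

print(len(channel_w), len(channel_w[0]))  # 3 3
print(len(spatial_w), len(spatial_w[0]))  # 4 4
```

In this sketch the channel-wise attention matrix grows with the number of channels, while the spatial one grows with the number of pixels; both variants return a feature map with the same C x (H*W) shape as the input.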
Year: 2022
Volume: 13
Issue: 9
First page: 1
Last page: 17
computer vision; convolutional neural networks; facial expression recognition; self-attention
Files in this record:
information-13-00419-v2.pdf
Open access
File type: Publisher's PDF
Size: 1.74 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1878900