Vehicle joint make and model recognition with multiscale attention windows
Fiandrotti, Attilio (co-first author)
2019-01-01
Abstract
Vehicle Make and Model Recognition (VMMR) deals with the problem of classifying vehicles whose appearance may vary significantly when captured from different perspectives. A number of successful approaches to this problem rely on part-based models, which, however, require labor-intensive part annotations. In this work, we address the VMMR problem by proposing a deep convolutional architecture built upon multi-scale attention windows. The proposed architecture classifies a vehicle over attention windows that are predicted so as to minimize the classification error. Through these windows, the visual representations of the most discriminative parts of the vehicle are aggregated over different scales, yielding more representative features for the classifier. In addition, we define a loss function accounting for the joint classification error across make and model. Furthermore, we devise a training methodology that stabilizes the training process and imposes multi-scale constraints on the predicted attention windows. The proposed architecture outperforms state-of-the-art schemes, reducing the model classification error on the Stanford dataset by 1.7% and improving the classification accuracy on the CompCars dataset by 0.2% on model and 0.3% on make.
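The joint make-and-model loss mentioned in the abstract could, for illustration, be sketched as a weighted combination of per-task cross-entropy terms. This is a minimal sketch under assumptions: the function names and the balancing weight `alpha` are hypothetical and not taken from the paper.

```python
import math

def cross_entropy(probs, target):
    # Negative log-likelihood of the target class,
    # given a list of predicted class probabilities.
    return -math.log(probs[target])

def joint_make_model_loss(make_probs, model_probs,
                          make_label, model_label, alpha=0.5):
    # Hypothetical joint loss: a weighted sum of the make and
    # model classification errors. alpha balances the two tasks
    # (an assumption, not a value from the paper).
    loss_make = cross_entropy(make_probs, make_label)
    loss_model = cross_entropy(model_probs, model_label)
    return alpha * loss_make + (1 - alpha) * loss_model

# Example: two candidate makes, four candidate models.
loss = joint_make_model_loss([0.5, 0.5], [0.25, 0.25, 0.25, 0.25], 0, 2)
```

Minimizing such a joint objective encourages the shared backbone and the predicted attention windows to extract features that are discriminative for both tasks at once, rather than for either one in isolation.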
File | Size | Format
---|---|---
1-s2.0-S0923596518307331-main.pdf (restricted access) | 2.28 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.