
Notable Site Recognition using Deep Learning on Mobile and Crowd-sourced Imagery

Rossano Schifanella
2020

Abstract

Being able to automatically recognise notable sites in the physical world using artificial intelligence embedded in mobile devices can pave the way to new forms of urban exploration and open novel channels of interactivity between residents, travellers, and cities. Although the development of outdoor recognition systems has been a topic of interest for a while, most works have been limited in geographic coverage due to the lack of high-quality image data that can be used for training site recognition engines. As a result, prior systems usually lack generality and operate on a limited scope of pre-selected sites. In this work, we design a mobile system that can automatically recognise sites of interest and project relevant information to a user who navigates the city. We build a collection of notable sites using Wikipedia and then exploit online services such as Google Images and Flickr to collect large collections of crowd-sourced imagery describing those sites. These images are then used to train minimal deep learning architectures that can be effectively deployed to dedicated applications on mobile devices. By conducting an evaluation and performing a series of online and real-world experiments, we recognise a number of key challenges in deploying a site recognition system and highlight the importance of incorporating mobile contextual information to facilitate the visual recognition task. The similarity in the feature maps of objects that undergo identification, the presence of noise in crowd-sourced imagery, and arbitrary user-induced inputs are among the factors that impede correct classification for deep learning models.
We show how curating the training data through the application of a class-specific image de-noising method, and incorporating information such as user location, orientation, and attention patterns, can allow for significant improvement in classification accuracy and the construction of an end-to-end system that can effectively be used to recognise sites in the wild.
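The abstract mentions a class-specific image de-noising method applied to the crowd-sourced training data, but does not detail it. As a minimal sketch of one plausible variant — centroid-based outlier filtering in a feature-embedding space, where images far from their class centroid are treated as noise — the function name, inputs, and keep fraction below are illustrative assumptions, not the paper's actual method:

```python
import math

def denoise_class_images(embeddings, keep_fraction=0.8):
    """Keep the images closest to their class centroid; drop likely noise.

    embeddings: list of feature vectors (lists of floats), all for ONE
    site class, e.g. outputs of a pretrained CNN's penultimate layer.
    Returns the indices of the retained images, closest first.
    """
    dim = len(embeddings[0])
    # Per-dimension mean of the class: the class "centroid".
    centroid = [sum(v[i] for v in embeddings) / len(embeddings)
                for i in range(dim)]

    def dist_to_centroid(v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, centroid)))

    # Rank images by distance and keep the closest fraction.
    order = sorted(range(len(embeddings)),
                   key=lambda i: dist_to_centroid(embeddings[i]))
    n_keep = max(1, round(keep_fraction * len(embeddings)))
    return order[:n_keep]
```

Running this per class before training would discard crowd-sourced photos (e.g. selfies or interiors tagged with a landmark's name) whose embeddings sit far from the visual consensus of the class.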
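The abstract also highlights that mobile contextual signals — user location and orientation — facilitate the visual recognition task. A hedged sketch of how such a geometric pre-filter could work (all function names, thresholds, and coordinates are hypothetical, not taken from the paper): restrict the classifier's candidate label set to sites that lie within a given radius of the user and roughly inside the camera's horizontal field of view.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, degrees clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def candidate_sites(user_lat, user_lon, heading_deg, sites,
                    radius_m=500.0, fov_deg=90.0):
    """Keep only sites that are nearby AND in the direction the camera points.

    sites: iterable of (name, lat, lon) tuples.
    heading_deg: compass heading of the device, degrees clockwise from north.
    """
    out = []
    for name, lat, lon in sites:
        if haversine_m(user_lat, user_lon, lat, lon) > radius_m:
            continue
        # Smallest angular difference between site bearing and camera heading.
        diff = abs((bearing_deg(user_lat, user_lon, lat, lon)
                    - heading_deg + 180.0) % 360.0 - 180.0)
        if diff <= fov_deg / 2:
            out.append(name)
    return out
```

Shrinking the label set this way both reduces confusion between visually similar landmarks and lets a small on-device model remain accurate over a citywide catalogue of sites.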
Published in: 2020 21st IEEE International Conference on Mobile Data Management (MDM), Versailles, June 30 - July 3, 2020
Publisher: IEEE
Pages: 137-147
ISBN: 978-1-7281-4664-5, 978-1-7281-4663-8
URL: https://ieeexplore.ieee.org/abstract/document/9162277
Keywords: Mobile Computing, Deep Learning, Location-based Application, End-to-End System
Jimin Tan, Anastasios Noulas, Diego Sáez, Rossano Schifanella
Files in this item:

File: 1910.09705.pdf
Access: open access
File type: PREPRINT (FIRST DRAFT)
Size: 7.65 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/2318/1795557