CINECA IRIS Institutional Research Information System

Efficient Multilingual Deep Learning Model for Keyword Categorization

Polato, M; Demchenko, D; Kuanyshkereyev, A; Navarin, N

2021-01-01

Abstract

Keyword categorization is an essential tool for SEO (Search Engine Optimization), digital marketing, and online advertising. Keywords are among the most valuable pieces of information for inferring users' intents and interests. An effective keyword categorization method makes it possible to understand which types of content are in the greatest demand and can help improve future content strategies or marketing/ad campaigns. In this paper, we present a novel deep learning model for multilingual keyword categorization. The model relies on fastText multilingual word embeddings, and its architecture is inspired by the DeepSets model. To make use of (training) words not included in the pre-trained fastText embeddings, we initialize them as the average embedding over all of their co-occurring words. Then, we fine-tune these representations by allowing the network to back-propagate the error to the input. We assess the quality of our proposal on a real-world dataset provided by a Spanish company, in which keywords are categorized according to the Google Product Taxonomy (GPT). Empirical results show that our model can achieve high accuracy scores while being extremely efficient.
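
The two ideas named in the abstract, averaging-based initialization of out-of-vocabulary words and a DeepSets-style permutation-invariant encoder, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `init_oov_embeddings` and `encode_keyword`, the toy vectors, the reading of "co-occurring" as "appearing in the same keyword", and the zero-vector fallback are all assumptions made for illustration. The fine-tuning step (back-propagating the error to the input embeddings) is not shown.

```python
import numpy as np


def init_oov_embeddings(keywords, pretrained, dim):
    """Assumed scheme: each out-of-vocabulary (OOV) word gets the mean of the
    pre-trained vectors of the known words that appear in the same keywords."""
    sums, counts = {}, {}
    for keyword in keywords:
        tokens = keyword.split()
        known = [pretrained[t] for t in tokens if t in pretrained]
        for token in tokens:
            if token in pretrained:
                continue
            sums.setdefault(token, np.zeros(dim))
            counts.setdefault(token, 0)
            for vec in known:
                sums[token] += vec
                counts[token] += 1
    # Assumption: an OOV word with no known neighbour keeps the zero vector.
    return {t: (sums[t] / counts[t] if counts[t] else sums[t]) for t in sums}


def encode_keyword(tokens, table, phi, rho):
    """DeepSets-style encoding rho(sum_i phi(x_i)); the sum makes the result
    independent of the order of the words in the keyword."""
    pooled = sum(phi(table[t]) for t in tokens)
    return rho(pooled)


# Toy 2-dimensional "pre-trained" vectors (hypothetical values, not fastText).
pretrained = {
    "red": np.array([1.0, 0.0]),
    "shoes": np.array([0.0, 1.0]),
    "cheap": np.array([1.0, 1.0]),
}
keywords = ["red zapatos", "cheap zapatos", "red shoes"]

oov = init_oov_embeddings(keywords, pretrained, dim=2)
table = {**pretrained, **oov}

# Hypothetical element-wise map phi and identity read-out rho.
W = np.array([[0.5, -0.2], [0.1, 0.3]])
phi = lambda x: np.tanh(W @ x)
rho = lambda z: z

forward = encode_keyword(["red", "zapatos"], table, phi, rho)
backward = encode_keyword(["zapatos", "red"], table, phi, rho)
```

In this toy setup, "zapatos" co-occurs with "red" and "cheap", so its initial vector is the mean of those two vectors, and `forward` and `backward` coincide because sum pooling ignores word order. In a trainable model, `phi` and `rho` would be learned networks and the embedding table itself would receive gradients.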

Year: 2021
Event title: IEEE Symposium Series on Computational Intelligence
Event location: Online
Event date: 5-7 December 2021
Volume title: 2021 IEEE Symposium Series on Computational Intelligence (SSCI)
Publisher: IEEE
Pages (from): 1
Pages (to): 8
ISBN: 978-1-7281-9048-8
DOI: https://dx.doi.org/10.1109/SSCI50451.2021.9660132
Keywords: Keyword Categorization; Word Embeddings; Deep Neural Networks; Deep Learning; Natural Language Processing
All authors: Polato, M; Demchenko, D; Kuanyshkereyev, A; Navarin, N
Appears in types: 04A-Conference paper in volume

Files in this item:

File	Access	File type	Size	Format
SSCI2021___keyword_categorization.pdf	Open access	Preprint (first draft)	345.31 kB	Adobe PDF
Efficient_Multilingual_Deep_Learning_Model_for_Keyword_Categorization.pdf	Restricted access	Publisher's PDF	226.91 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1874877
