Efficient Multilingual Deep Learning Model for Keyword Categorization

Polato, M; Demchenko, D; Kuanyshkereyev, A; Navarin, N
2021-01-01

Abstract

Keyword categorization is an essential tool in SEO (Search Engine Optimization), digital marketing, and online advertising. Keywords are among the most valuable pieces of information for inferring users' intents and interests. An effective keyword categorization method reveals which types of content are in greatest demand and can help improve future content strategies and marketing or advertising campaigns. In this paper, we present a novel deep learning model for multilingual keyword categorization. The model relies on fastText multilingual word embeddings, and its architecture is inspired by the DeepSets model. To make use of training words that are not included in the pre-trained fastText embeddings, we initialize each such word's embedding as the average of the embeddings of all the words it co-occurs with. We then fine-tune these representations by letting the network back-propagate the error to the input. We assess the quality of our proposal on a real-world dataset provided by a Spanish company, in which keywords are categorized according to the Google Product Taxonomy (GPT). Empirical results show that our model can achieve high accuracy while being extremely efficient.
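The initialization of out-of-vocabulary training words described above can be illustrated with a minimal Python sketch. It assumes that each keyword is already tokenized into a list of words, that `pretrained` maps in-vocabulary words to their fastText vectors, and that every co-occurrence counts once (so frequent neighbors weigh more). The helper name `init_oov_embeddings`, the zero-vector fallback for OOV words with no known neighbors, and the toy tokens are hypothetical choices, not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List

Vector = List[float]


def init_oov_embeddings(
    keywords: List[List[str]],
    pretrained: Dict[str, Vector],
    dim: int,
) -> Dict[str, Vector]:
    """Initialize vectors for training words missing from `pretrained`.

    Hypothetical helper: each out-of-vocabulary (OOV) word receives the mean
    of the pretrained vectors of the in-vocabulary words it co-occurs with,
    counting every co-occurrence. OOV words without any in-vocabulary
    neighbor fall back to a zero vector (an assumption).
    """
    sums: Dict[str, Vector] = defaultdict(lambda: [0.0] * dim)
    counts: Dict[str, int] = defaultdict(int)
    oov = set()
    for kw in keywords:
        for w in kw:
            if w in pretrained:
                continue
            oov.add(w)
            # accumulate the vectors of pretrained words in the same keyword
            for other in kw:
                if other != w and other in pretrained:
                    acc = sums[w]
                    for i, x in enumerate(pretrained[other]):
                        acc[i] += x
                    counts[w] += 1
    return {
        w: [x / counts[w] for x in sums[w]] if counts[w] else [0.0] * dim
        for w in oov
    }


pretrained = {"red": [1.0, 0.0], "shoes": [0.0, 1.0], "nike": [1.0, 1.0]}
keywords = [["red", "sneakerz"], ["nike", "sneakerz", "shoes"], ["zzq"]]
emb = init_oov_embeddings(keywords, pretrained, dim=2)
print(emb["sneakerz"])  # mean of the vectors of red, nike and shoes
print(emb["zzq"])       # no known neighbors, so the zero-vector fallback
```

The abstract does not say whether neighbors are weighted by frequency or whether each distinct neighbor is counted once; switching to distinct neighbors would only require collecting them in a set per OOV word before averaging.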
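The DeepSets idea that inspires the architecture computes f(X) = rho(sum of phi(x) over all words x in X): a shared network phi transforms each word vector, the results are pooled, and a second network rho maps the pooled vector to categories. The pure-Python sketch below assumes one ReLU layer for phi, sum pooling, and a single linear layer plus softmax for rho; the function names, layer sizes, and weights are hypothetical and are not the paper's configuration.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dense(W: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    # y = W x + b, with W stored as (out_dim rows) x (in_dim columns)
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def deepsets_classify(words: List[Vector], W_phi: Matrix, b_phi: Vector,
                      W_rho: Matrix, b_rho: Vector) -> Vector:
    # phi: the same layer is applied independently to every word vector
    hidden = [relu(dense(W_phi, b_phi, w)) for w in words]
    # sum pooling: the keyword representation ignores word order
    pooled = [sum(col) for col in zip(*hidden)]
    # rho: map the pooled set representation to category probabilities
    return softmax(dense(W_rho, b_rho, pooled))


# Toy parameters: 2-d word vectors, 2 hidden units, 3 categories
W_phi, b_phi = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W_rho, b_rho = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.0, 0.0, 0.0]

words = [[1.0, 0.0], [0.0, 2.0]]
probs = deepsets_classify(words, W_phi, b_phi, W_rho, b_rho)
probs_rev = deepsets_classify(words[::-1], W_phi, b_phi, W_rho, b_rho)
print(probs, probs == probs_rev)
```

Reversing the word order leaves the output unchanged; this permutation invariance is the defining property of DeepSets-style models. The pooling operator and layer stack used in the paper may differ from this sketch.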
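The fine-tuning step, in which the error is back-propagated all the way to the input word vectors, can be sketched for a deliberately simplified model: sum-pooled embeddings fed to one linear layer with softmax and cross-entropy loss. The classifier weights are frozen here only to isolate the input update. Restricting updates to a `trainable` set of words (for example, only the OOV words) is an assumption, since the abstract does not say whether the pretrained vectors are tuned as well; all names and values are hypothetical.

```python
import math
from typing import Dict, List, Set

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def forward(words: List[str], emb: Dict[str, Vector], W: Matrix, b: Vector) -> Vector:
    # sum-pool the word vectors, then apply one linear layer and softmax
    pooled = [sum(col) for col in zip(*(emb[w] for w in words))]
    logits = [sum(wij * xj for wij, xj in zip(row, pooled)) + bk
              for row, bk in zip(W, b)]
    return softmax(logits)


def loss(words, label, emb, W, b) -> float:
    # cross-entropy for the true category index `label`
    return -math.log(forward(words, emb, W, b)[label])


def sgd_step_on_inputs(words: List[str], label: int, emb: Dict[str, Vector],
                       W: Matrix, b: Vector, trainable: Set[str],
                       lr: float = 0.1) -> None:
    """One SGD step that updates only the input vectors of trainable words."""
    p = forward(words, emb, W, b)
    # gradient of softmax + cross-entropy w.r.t. the logits: p - one_hot(label)
    delta = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    # back-propagate to the pooled vector: W^T delta
    grad_pooled = [sum(W[k][i] * delta[k] for k in range(len(W)))
                   for i in range(len(W[0]))]
    # sum pooling passes grad_pooled unchanged to every occurrence of a word
    for w in set(words) & trainable:
        n = words.count(w)
        emb[w] = [x - lr * n * g for x, g in zip(emb[w], grad_pooled)]


emb = {"red": [1.0, 0.0], "sneakerz": [0.5, 0.5]}
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
keyword, label = ["red", "sneakerz"], 1

before = loss(keyword, label, emb, W, b)
sgd_step_on_inputs(keyword, label, emb, W, b, trainable={"sneakerz"})
after = loss(keyword, label, emb, W, b)
print(before, after, emb)
```

In full training the classifier weights would normally be updated in the same backward pass, and a framework with automatic differentiation would compute these gradients without the hand-written derivative.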
Year: 2021
Conference: IEEE Symposium Series on Computational Intelligence (SSCI)
Location: Online
Dates: 5-7 December 2021
Proceedings: 2021 IEEE Symposium Series on Computational Intelligence (SSCI)
Publisher: IEEE
Pages: 1-8
ISBN: 978-1-7281-9048-8
Keywords: Keyword Categorization; Word Embeddings; Deep Neural Networks; Deep Learning; Natural Language Processing
Authors: Polato, M; Demchenko, D; Kuanyshkereyev, A; Navarin, N
Files in this record:
SSCI2021___keyword_categorization.pdf
Open access
File type: PREPRINT (FIRST DRAFT)
Size: 345.31 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1874877
Citations
  • PMC: n/a
  • Scopus: 2
  • ISI: 1