Generative adversarial networks for subdomain enumeration

Francesco Bergadano
2022-01-01

Abstract

Subdomain enumeration is a fundamental step in many security processes (e.g., vulnerability discovery, OSINT, and host enumeration). Until now, it has relied on deterministic procedures that have shown some limitations. The process typically generates a candidate subdomain and then checks it for validity. While validation is straightforward, defining an optimal candidate generation strategy is still an open problem. This paper presents a novel subdomain enumeration tool that generates high-quality subdomain candidates. We employ a Generative Adversarial Network (GAN) to sample unseen candidates from the distribution of valid subdomain names, which the model learns from publicly available datasets. By sampling from the trained model, we address the limitations of traditional algorithms. Our experiments were carried out against 15 domains and a ground truth of 1164 other targets. The 15 domains were carefully selected from bug bounty platforms to avoid terms-of-use violations; the selection criteria included popularity, the expected number of subdomains, and the available services. The experiments aim to validate our approach by measuring the performance increase in subdomain enumeration over the state of the art. We benchmark our proposal in terms of candidate validity and sample uniqueness. The results show that, with our GAN, the performance of a traditional subdomain enumeration workflow increased by up to 61%. In addition, in our ground-truth experiments, the GAN guessed, on average, 32% of subdomains.
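
The generate-then-validate workflow and the two benchmark quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' tool: the GAN is replaced by a plain list of candidate labels, validation is assumed to mean "the hostname resolves through the system resolver", and the `validity` and `uniqueness` formulas below are assumed definitions that the paper may define differently. All function names are hypothetical.

```python
import socket
from typing import Callable, Iterable, List


def resolves(hostname: str) -> bool:
    """Assumed validity check: the name resolves via the system resolver."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def unique_labels(labels: Iterable[str]) -> List[str]:
    """Normalize labels and drop duplicates, keeping first-seen order."""
    seen = set()
    out = []
    for label in labels:
        label = label.strip().lower()
        if label and label not in seen:
            seen.add(label)
            out.append(label)
    return out


def enumerate_subdomains(
    labels: Iterable[str],
    domain: str,
    is_valid: Callable[[str], bool] = resolves,
) -> List[str]:
    """Generate-then-validate: build each candidate host and keep valid ones."""
    hosts = (f"{label}.{domain}" for label in unique_labels(labels))
    return [host for host in hosts if is_valid(host)]


def uniqueness(labels: Iterable[str]) -> float:
    """Assumed metric: fraction of sampled labels that are distinct."""
    samples = [s for s in (l.strip().lower() for l in labels) if s]
    return len(set(samples)) / len(samples) if samples else 0.0


def validity(
    labels: Iterable[str],
    domain: str,
    is_valid: Callable[[str], bool] = resolves,
) -> float:
    """Assumed metric: fraction of distinct candidates that validate."""
    distinct = unique_labels(labels)
    if not distinct:
        return 0.0
    return len(enumerate_subdomains(distinct, domain, is_valid)) / len(distinct)
```

Passing the validity check as a parameter keeps the sketch testable without network access; a stub such as `lambda host: host in known_hosts` can stand in for DNS. With real DNS, a wildcard record on the target domain makes every candidate resolve, so a practical check would need to account for that.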
Year: 2022
Conference: 37th ACM/SIGAPP Symposium On Applied Computing
Location: virtual conference
Dates: April 25 - April 29, 2022
Published in: Proceedings of the 37th ACM/SIGAPP Symposium On Applied Computing
Publisher: ACM
Pages: 1636-1645
ISBN: 978-1-4503-8713-2
URL: https://dl.acm.org/doi/abs/10.1145/3477314.3506967
Keywords: web application security, subdomain enumeration, vulnerability detection
Authors: Luca Degani, Francesco Bergadano, Seyed Ali Mirheidari, Fabio Martinelli, Bruno Crispo
Files in this item:
SAC2022paper.pdf
Access: restricted
Description: conference proceedings article, standard ACM copyright
File type: publisher's PDF
Size: 1.2 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1862162