Detecting user actions from HTTP traces: Toward an automatic approach

VASSIO, LUCA; DRAGO, IDILIO; MELLIA, Marco

2016-01-01

Abstract

Detecting explicit user actions, i.e., requests for web pages such as hyperlink clicks, from passive traces is fundamental for many applications, such as network forensics and content popularity estimation. Every URL explicitly visited by a user usually triggers further automatic URL requests to fetch all the objects that compose the web page. HTTP traces provide a summary of all URLs requested by users, but no information that could be used to separate explicit from automatic requests. Previous works have targeted this problem by proposing ad-hoc heuristics, typically validated using synthetic traces. This paper investigates whether an approach based solely on machine learning can successfully detect user actions from HTTP traces. A machine learning approach would offer several advantages: for example, it minimizes manual tuning of parameters and can easily adapt to changes in page structure. We build both real and synthetic traces to assess performance and to gain insight into the features that contribute most to classification. Our results show that machine learning achieves performance similar to or better than that of previous heuristics. Furthermore, we show that models built with machine learning algorithms are robust, presenting consistent performance across different scenarios.
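
To make the classification task concrete, the following is a minimal sketch, not the method used in the paper. It assumes a hypothetical single feature (the idle time before each request) and a toy, hand-labeled trace, and it learns one threshold (a decision stump) that separates explicit from automatic requests. The names `Request`, `idle_gaps`, `fit_stump`, and `predict`, as well as all data values, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp_ms: int      # time of the request, in milliseconds (assumed field)
    url: str
    is_user_action: bool   # ground-truth label, known only for training traces


def idle_gaps(trace):
    """Milliseconds elapsed since the previous request (inf for the first)."""
    gaps, prev = [], None
    for req in trace:
        gaps.append(float("inf") if prev is None else req.timestamp_ms - prev)
        prev = req.timestamp_ms
    return gaps


def fit_stump(values, labels):
    """Return (threshold, accuracy) for the best rule `value >= threshold`."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        correct = sum((v >= t) == y for v, y in zip(values, labels))
        acc = correct / len(labels)
        if acc > best_acc:  # keep the smallest threshold among ties
            best_t, best_acc = t, acc
    return best_t, best_acc


def predict(trace, threshold):
    """Label a request as a user action when its idle gap reaches the threshold."""
    return [gap >= threshold for gap in idle_gaps(trace)]


# Toy labeled trace (invented values): three page visits, each followed by
# automatic requests for embedded objects.
training_trace = [
    Request(0, "http://example.com/", True),
    Request(100, "http://example.com/style.css", False),
    Request(200, "http://example.com/app.js", False),
    Request(300, "http://example.com/logo.png", False),
    Request(5000, "http://example.com/news", True),
    Request(5100, "http://example.com/news.css", False),
    Request(5300, "http://example.com/photo.jpg", False),
    Request(12000, "http://example.com/about", True),
    Request(12200, "http://example.com/about.js", False),
]

threshold, accuracy = fit_stump(
    idle_gaps(training_trace),
    [req.is_user_action for req in training_trace],
)

# Apply the learned rule to an unlabeled trace (labels set to False as a placeholder).
new_trace = [
    Request(0, "http://example.org/", False),
    Request(150, "http://example.org/main.css", False),
    Request(9000, "http://example.org/contact", False),
]
predictions = predict(new_trace, threshold)
```

A single learned threshold is far simpler than a full machine learning classifier, but it shows the general workflow the abstract refers to: extract features from a labeled trace, fit a model, and apply it to unseen requests without hand-tuning parameters.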

Year	2016
Event title	7th International Workshop on TRaffic Analysis and Characterization
Event location	Paphos, Cyprus
Event date	September 2016
Volume title	Wireless Communications and Mobile Computing Conference (IWCMC), 2016 International
Publisher	IEEE
Pages (from)	50
Pages (to)	55
ISBN	978-1-5090-0304-4
DOI	https://dx.doi.org/10.1109/IWCMC.2016.7577032
Product URL (open-access archives, full text on publisher's site, etc.)	http://ieeexplore.ieee.org/document/7577032/
Keywords	Browsers; Training; Web pages; History; Selenium; Uniform resource locators; Machine learning algorithms
All authors	VASSIO, LUCA; DRAGO, IDILIO; MELLIA, Marco
Appears in categories	04A-Conference paper in volume

Files in this item:

File	Size	Format
07577032.pdf (restricted access)	628.3 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1767116
