Structured Hidden Markov Model versus String Kernel Machines for Symbolic Sequence Classification

BOTTA, Marco;
2010-01-01

Abstract

Learning a classifier from examples is a fundamental task in machine learning and remains an active area of research. Most approaches follow a discrimination-based paradigm, where the aim is to find the best way to separate the examples of one class from those of other classes. A less popular approach follows a characterization-based paradigm, where the aim is to build a description (model) of every class from examples and to use this model to recognize instances of that class. This paper focuses on the specific task of classifying and tagging symbolic sequences. It introduces a characterization approach based on Structured Hidden Markov Models and compares its performance against a widely used discriminative approach, namely kernel machines. This task is particularly relevant to applications in many fields, such as molecular biology, web log analysis, network traffic analysis, and user profiling. To assess the validity of the proposed approach, an artificial benchmark has been designed so that the regularities to be discovered are known in advance, which allows a controlled evaluation of the real capabilities of the learning algorithms investigated. The results highlight the major advantages and weaknesses of the investigated approaches in the specific classification task addressed.
Year: 2010
Published in: Advances in Machine Learning I
Publisher: Springer
Volume: 262/2010
Pages: 275-295
ISBN: 9783642051760
URL: http://www.springerlink.com/content/10v31x63l6p85j6k/
Authors: Ugo Galassi; Marco Botta; Lorenza Saitta
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/71242