A Comparative Analysis of Extracted Grammars
MAZZEI, Alessandro; LOMBARDO, Vincenzo
2004-01-01
Abstract
The development of wide-coverage grammars is at the core of robust NLP systems. This paper addresses the problem of extracting grammars from treebanks with respect to broad coverage along three dimensions: the grammar formalism (context-free grammar, dependency grammar, lexicalized tree adjoining grammar), the domain of the annotated corpus (press reports, civil law), and the language of the corpus (English, Korean, Chinese, Italian). We extracted three grammars, one per formalism, from an annotated corpus of Italian and compared their coverage of a test set; then, working on two subcorpora from different domains, we compared the cross-domain coverage of the extracted grammars; finally, we compared the grammars extracted for four different languages. The results show notable differences in coverage across formalisms and across domains, and a smaller difference in the cross-linguistic comparison.
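As a rough illustration of what "coverage of a test set" can mean in the context-free case, the sketch below reads CFG productions off bracketed training trees and reports two numbers: the share of test-set rule occurrences already present in the extracted grammar, and the share of test trees whose rules are all covered. The bracketed-tree input format, the category labels in the example, the decision to ignore lexical (preterminal-to-word) rules, and both metrics are assumptions made for illustration; they are not necessarily the coverage measures used in the paper, and the dependency and LTAG extractions are not shown.

```python
from collections import Counter


def parse_tree(s):
    """Parse a bracketed tree such as "(S (NP (N Gianni)) (VP (V dorme)))"
    into nested (label, children) tuples; leaf words stay plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def productions(tree):
    """Yield (lhs, rhs) context-free rules for every internal node.
    Preterminal -> word rules are skipped (an assumption of this sketch)."""
    label, children = tree
    if all(isinstance(c, str) for c in children):
        return  # preterminal node: lexical rule, not counted
    yield (label, tuple(c[0] if isinstance(c, tuple) else c for c in children))
    for c in children:
        if isinstance(c, tuple):
            yield from productions(c)


def extract_grammar(trees):
    """Collect rule frequencies from a list of bracketed training trees."""
    rules = Counter()
    for t in trees:
        rules.update(productions(parse_tree(t)))
    return rules


def coverage(grammar, test_trees):
    """Return (rule-occurrence coverage, fully covered tree ratio)."""
    seen = total = full = 0
    for t in test_trees:
        rules = list(productions(parse_tree(t)))
        hits = sum(1 for r in rules if r in grammar)
        seen += hits
        total += len(rules)
        full += hits == len(rules)
    rule_cov = seen / total if total else 0.0
    tree_cov = full / len(test_trees) if test_trees else 0.0
    return rule_cov, tree_cov


if __name__ == "__main__":
    # Hypothetical toy trees, not taken from the paper's corpora.
    train = ["(S (NP (N Gianni)) (VP (V dorme)))",
             "(S (NP (D il) (N cane)) (VP (V abbaia)))"]
    test = ["(S (NP (N Maria)) (VP (V corre)))",
            "(S (NP (N Maria)) (VP (V legge) (NP (D il) (N libro))))"]
    g = extract_grammar(train)
    print(coverage(g, test))  # the rule VP -> V NP is unseen in training
```

The same two functions can mimic the cross-domain setting described above by extracting the grammar from one subcorpus (for example, press reports) and computing coverage on the other (civil law).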
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.