Harmonization and Merging of two Italian Dependency Treebanks
BOSCO, CRISTINA;
2012-01-01
Abstract
This paper describes the methodology currently being defined for constructing a “Merged Italian Dependency Treebank” (MIDT) from existing resources. In particular, it reports the results of a case study carried out on two available dependency treebanks, TUT and ISST–TANL. The issues raised by comparing the annotation schemes underlying the two treebanks are discussed, with particular emphasis on defining a set of linguistic categories to serve as a “bridge” between the two schemes. The CoNLL de facto standard is used as the encoding format.

File | Size | Format |
---|---|---|
BoscoMontemSimiLREC12wsMerging.pdf (open access; file type: postprint, author's final version) | 180.54 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.