Productivity in Italian word formation: A variable-corpus approach
GAETA, Livio; RICCA, Davide
2006-01-01
Abstract
The quantitative approach to morphological productivity first proposed by Baayen rests on the relation between the number of hapax legomena formed with a given affix in a sufficiently large corpus and the total number of tokens of that affix sampled in the corpus. Most criticism of this measure focuses on its neglect of the role played by frequency in the evaluation of productivity. As an improvement on Baayen's procedure, a variable-corpus approach is proposed: productivity values should be calculated at equal token numbers for different affixes, rather than at the different token numbers that result from sampling the whole corpus for all affixes, as in Baayen's work. This implies that variably sized subcorpora must be sampled to compare affixes with different frequencies. On the basis of a 75,000,000-token newspaper corpus, productivity values are calculated for several Italian affixes in the deverbal and deadjectival domains. The resulting ranking proves linguistically plausible and avoids the overestimation of productivity for low-frequency affixes that typically occurs in fixed-corpus calculations. As a further advantage, the procedure makes it possible to deal satisfactorily with two problematic aspects usually neglected in previous investigations, namely the quantitative impact on productivity measures of (i) allomorphies and lexicalizations and (ii) inner-cycle derivations.
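The contrast between the fixed-corpus and the variable-corpus calculation can be illustrated with a minimal sketch. It assumes that the measure is the plain ratio of hapax legomena to tokens for an affix, and that a subcorpus can be approximated by taking the first N tokens of that affix in sampling order. The function names and the toy token lists are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter
from typing import Sequence


def productivity(tokens: Sequence[str]) -> float:
    """Ratio of hapax legomena to tokens for one affix (assumed form: n1 / N)."""
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter(tokens)
    # A hapax legomenon is a word type that occurs exactly once in the sample.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(tokens)


def productivity_at(tokens: Sequence[str], n: int) -> float:
    """Same ratio, computed on the first n tokens only (a stand-in for a subcorpus)."""
    if n <= 0 or n > len(tokens):
        raise ValueError("n must lie between 1 and the number of available tokens")
    return productivity(tokens[:n])


# Toy data (hypothetical): each string stands for one derived word type,
# listed in the order in which its tokens are met while reading the corpus.
frequent_affix = ["w1", "w2", "w1", "w3", "w1", "w2", "w4", "w1"]
rare_affix = ["v1", "v2", "v3", "v1"]

# Fixed-corpus comparison: each affix is measured at its own token count.
fixed = {
    "frequent": productivity(frequent_affix),
    "rare": productivity(rare_affix),
}

# Variable-corpus comparison: both affixes are measured at the same token count.
n_equal = min(len(frequent_affix), len(rare_affix))
variable = {
    "frequent": productivity_at(frequent_affix, n_equal),
    "rare": productivity_at(rare_affix, n_equal),
}

print(fixed)
print(variable)
```

In this toy case the rare affix looks twice as productive as the frequent one when each is measured over all of its tokens, while the two values coincide once both are measured at the same token count. This is only meant to show the kind of distortion the abstract attributes to fixed-corpus calculations, not to reproduce any of the paper's results.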