Productivity in Italian word formation: A variable-corpus approach

Livio Gaeta; Davide Ricca
2006-01-01

Abstract

The quantitative approach to morphological productivity first proposed by Baayen relies on the relation between the number of hapax legomena formed with a given affix in a sufficiently large corpus and the total number of tokens of that affix sampled in the corpus. Most criticism of this measure focuses on its neglect of the role played by frequency in the evaluation of productivity. As an improvement on Baayen’s procedure, a variable-corpus approach is proposed. Under this approach, productivity values are calculated at equal token numbers for different affixes, rather than at the different token numbers that result from sampling the whole corpus for all affixes, as in Baayen’s work. This implies that subcorpora of varying size must be sampled in order to compare affixes with different frequencies. On the basis of a 75,000,000-token newspaper corpus, productivity values are calculated for several Italian affixes in the deverbal and deadjectival domains. The resulting ranking proves linguistically plausible and avoids the overestimation of productivity for low-frequency affixes that typically occurs in fixed-corpus calculations. As a further advantage, the procedure makes it possible to deal satisfactorily with two problematic aspects usually neglected in previous investigations, namely the quantitative impact on productivity measures of (i) allomorphies and lexicalizations and (ii) inner-cycle derivations.
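The idea of comparing affixes at equal token numbers can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the relation between hapax legomena and tokens is taken as the ratio h/N, and that a subcorpus is simply a prefix of the corpus read in order until the affix reaches the target token count. The function name `productivity_at`, the affix labels `A` and `B`, and the word types `a1`, `b1`, etc. are hypothetical placeholders, not data from the paper.

```python
from collections import Counter


def productivity_at(tokens, affix, target_n):
    """Return (hapaxes, hapaxes / target_n) at the point where `affix`
    has reached `target_n` tokens, or None if it never does.

    `tokens` is an iterable of (affix, word_type) pairs in corpus order.
    Assumption: productivity is the ratio of hapax legomena to tokens.
    """
    counts = Counter()  # frequency of each word type formed with `affix`
    seen = 0            # tokens of `affix` read so far
    for tok_affix, word_type in tokens:
        if tok_affix != affix:
            continue
        counts[word_type] += 1
        seen += 1
        if seen == target_n:
            # A hapax is a type that occurs exactly once in this subcorpus.
            hapaxes = sum(1 for c in counts.values() if c == 1)
            return hapaxes, hapaxes / target_n
    return None


# Invented toy corpus: affix A is more frequent than affix B.
corpus = [
    ("A", "a1"), ("B", "b1"), ("A", "a1"), ("A", "a2"), ("B", "b2"),
    ("A", "a1"), ("A", "a3"), ("B", "b1"), ("B", "b3"),
]

# Both affixes are evaluated at the same token count (4), so the more
# frequent affix A is measured on a shorter stretch of the corpus.
print(productivity_at(corpus, "A", 4))
print(productivity_at(corpus, "B", 4))
```

In this sketch the stretch of corpus that is read differs per affix, so a frequent affix reaches the target count sooner than a rare one. This is one simple way to read "variably-sized subcorpora", and the paper's own sampling procedure may differ.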
2006, volume 44, issue 1, pages 57-89
Keywords: morphology; derivation; Italian; corpus linguistics; quantitative productivity; hapax legomena
Use this identifier to cite or link to this document: https://hdl.handle.net/2318/101527