Comparing type counts


Author(s): Säily, Tanja; Suomela, Jukka
Contributor(s)

University of Helsinki, Department of Modern Languages

University of Helsinki, Helsinki Institute for Information Technology HIIT

Date(s)

2009

Abstract

This work is a case study of applying nonparametric statistical methods to corpus data. We show how to use ideas from permutation testing to answer linguistic questions related to morphological productivity and type richness. In particular, we study the use of the suffixes -ity and -ness in the 17th-century part of the Corpus of Early English Correspondence within the framework of historical sociolinguistics. Our hypothesis is that the productivity of -ity, as measured by type counts, is significantly low in letters written by women. To test such hypotheses, and to facilitate exploratory data analysis, we compute accumulation curves for types and hapax legomena. We have developed an open-source computer program that uses Monte Carlo sampling to compute the upper and lower bounds of these curves for one or more levels of statistical significance. By comparing the type accumulation from women’s letters with the bounds, we are able to confirm our hypothesis.
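The permutation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' program: the function names (`accumulate_types`, `types_at`, `permutation_p_value`), the sample representation (each letter as a list of suffixed word forms) and the rule for reading a curve at a given token count are all assumptions made for illustration. The sketch shuffles the order of the samples many times, accumulates the type count along each random order, and estimates how often a random selection of comparable size has no more types than the subset of interest.

```python
import random


def accumulate_types(samples, order):
    """Return (tokens, types) after adding each sample in the given order."""
    seen = set()
    tokens = 0
    curve = []
    for i in order:
        tokens += len(samples[i])
        seen.update(samples[i])
        curve.append((tokens, len(seen)))
    return curve


def types_at(curve, token_limit):
    """Type count at the last curve point that does not exceed token_limit."""
    types = 0
    for tokens, n_types in curve:
        if tokens > token_limit:
            break
        types = n_types
    return types


def permutation_p_value(samples, subset, rounds=2000, seed=0):
    """Estimate how often a random ordering yields no more types than the subset.

    `samples` is a list of token lists (one per text); `subset` holds the
    indices of the texts of interest. Both are hypothetical inputs.
    """
    rng = random.Random(seed)
    obs_tokens = sum(len(samples[i]) for i in subset)
    obs_types = len(set().union(*(samples[i] for i in subset)))
    order = list(range(len(samples)))
    hits = 0
    for _ in range(rounds):
        rng.shuffle(order)  # random order of whole texts, not of single tokens
        curve = accumulate_types(samples, order)
        if types_at(curve, obs_tokens) <= obs_types:
            hits += 1
    return hits / rounds


# Invented toy data, not taken from the corpus.
toy = [["ability"], ["ability"], ["ability"],
       ["city"], ["rarity"], ["vanity"], ["quality"], ["charity"]]
p = permutation_p_value(toy, subset=[0, 1, 2])
```

A small value of `p` would suggest that the chosen subset has fewer types than random selections of similar size. Because `types_at` reads each random curve at or just below the observed token count, the random selections are never larger than the subset, so the estimate errs toward a larger `p`. Whole texts are shuffled rather than individual tokens so that the clustering of words within a text is preserved.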

Format

23

Identifier

http://hdl.handle.net/10138/27924

Language(s)

eng

Relation

Corpus Linguistics: Refinements and Reassessments

Language and Computers – Studies in Practical Linguistics

Source

Säily, T & Suomela, J 2009, 'Comparing type counts: The case of women, men and -ity in early English letters', in Corpus Linguistics: Refinements and Reassessments, pp. 87-109. Language and Computers – Studies in Practical Linguistics, vol. 69.

Type

A4 Article in conference publication (refereed)

info:eu-repo/semantics/conferencePaper

info:eu-repo/semantics/acceptedVersion