Statistically-constrained shallow text marking: Techniques, evaluation paradigm and results


Author(s): Murphy, B.; Vogel, C.
Date(s)

01/01/2007

Abstract

We present three natural language marking strategies based on fast and reliable shallow parsing techniques and on widely available lexical resources: lexical substitution, adjective conjunction swaps, and relativiser switching. We test these techniques on a random sample of the British National Corpus. Individual candidate marks are checked for goodness of structural and semantic fit, using both lexical resources and the web as a corpus. A representative sample of marks is given to 25 human judges to evaluate for acceptability and preservation of meaning. This establishes a correlation between corpus-based felicity measures and perceived quality, and allows qualified predictions to be made. Grammatical acceptability correlates strongly with our automatic measure (Pearson's r = 0.795, p = 0.001), allowing us to account for about two thirds of the variability in human judgements (r² ≈ 0.63). A moderate but statistically non-significant correlation (Pearson's r = 0.422, p = 0.356) is found with judgements of meaning preservation, suggesting that the contextual window of five content words used for our automatic measure may need to be extended. © 2007 SPIE-IS&T.
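The adjective conjunction swap named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tagged-input format, the Penn Treebank tags, the helper names (`find_adjective_conjunctions`, `read_bit`, `embed_bit`), and the convention that alphabetical versus reverse order of the two adjectives encodes one bit are all assumptions made here for illustration. The corpus-based goodness-of-fit check described in the abstract is omitted.

```python
from typing import List, Tuple

# Hypothetical input format: a sentence as (word, tag) pairs, with tags assumed
# to follow the Penn Treebank convention (JJ = adjective, CC = coordinating
# conjunction). A real pipeline would obtain these from a shallow tagger.
Token = Tuple[str, str]


def find_adjective_conjunctions(tokens: List[Token]) -> List[int]:
    """Return every index i where tokens[i:i+3] matches ADJ 'and' ADJ.

    Pairs of identical adjectives are skipped, since swapping them
    would not change the text and so could not carry a mark.
    """
    sites = []
    for i in range(len(tokens) - 2):
        (w1, t1), (w2, t2), (w3, t3) = tokens[i:i + 3]
        if (t1 == "JJ" and t2 == "CC" and w2.lower() == "and"
                and t3 == "JJ" and w1.lower() != w3.lower()):
            sites.append(i)
    return sites


def read_bit(tokens: List[Token], i: int) -> int:
    """Hypothetical convention: alphabetical order encodes 0, reverse order encodes 1."""
    first, second = tokens[i][0].lower(), tokens[i + 2][0].lower()
    return 0 if first < second else 1


def embed_bit(tokens: List[Token], i: int, bit: int) -> List[Token]:
    """Return a copy of tokens whose adjective pair at site i encodes `bit`."""
    out = list(tokens)
    if read_bit(out, i) != bit:
        # Swap the two conjoined adjectives; the conjunction stays in place.
        # Sentence-initial capitalisation is not handled in this sketch.
        out[i], out[i + 2] = out[i + 2], out[i]
    return out


if __name__ == "__main__":
    sentence = [("a", "DT"), ("cold", "JJ"), ("and", "CC"),
                ("wet", "JJ"), ("morning", "NN")]
    site = find_adjective_conjunctions(sentence)[0]
    marked = embed_bit(sentence, site, 1)
    print(" ".join(w for w, _ in marked))  # a wet and cold morning
    print(read_bit(marked, site))          # 1
```

Not every pair swaps freely (fixed binomials such as "black and white" have a strongly preferred order), so in practice a candidate site would only be used after a felicity check like the corpus-based one the abstract describes.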

Identifier

http://pure.qub.ac.uk/portal/en/publications/statisticallyconstrained-shallow-text-marking(b4373446-9063-4714-a235-3d4bbe663d91).html

http://www.scopus.com/inward/record.url?eid=2-s2.0-34548217225&partnerID=8YFLogxK

Language(s)

eng

Rights

info:eu-repo/semantics/restrictedAccess

Source

Murphy, B. & Vogel, C. 2007, Statistically-constrained shallow text marking: Techniques, evaluation paradigm and results. In Proceedings of SPIE - The International Society for Optical Engineering, vol. 6505.

Type

contributionToPeriodical