Statistically-constrained shallow text marking: Techniques, evaluation paradigm and results
| Date(s) | 01/01/2007 |
| --- | --- |
| Abstract | We present three natural language marking strategies based on fast and reliable shallow parsing techniques, and on widely available lexical resources: lexical substitution, adjective conjunction swaps, and relativiser switching. We test these techniques on a random sample of the British National Corpus. Individual candidate marks are checked for goodness of structural and semantic fit, using both lexical resources and the web as a corpus. A representative sample of marks is given to 25 human judges to evaluate for acceptability and preservation of meaning. This establishes a correlation between corpus-based felicity measures and perceived quality, and makes qualified predictions. Grammatical acceptability correlates strongly with our automatic measure (Pearson's r = 0.795, p = 0.001), allowing us to account for about two thirds of the variability in human judgements. A moderate but statistically insignificant correlation (Pearson's r = 0.422, p = 0.356) is found with judgements of meaning preservation, indicating that the contextual window of five content words used for our automatic measure may need to be extended. © 2007 SPIE-IS&T. |
| Identifier | http://www.scopus.com/inward/record.url?eid=2-s2.0-34548217225&partnerID=8YFLogxK |
| Language(s) | eng |
| Rights | info:eu-repo/semantics/restrictedAccess |
| Source | Murphy, B & Vogel, C 2007, Statistically-constrained shallow text marking: Techniques, evaluation paradigm and results. In Proceedings of SPIE - The International Society for Optical Engineering, vol. 6505. |
| Type | contributionToPeriodical |
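
The abstract's claim that a Pearson correlation of r = 0.795 "accounts for about two thirds of variability" follows from the coefficient of determination, r². The sketch below is illustrative only: the score pairs are invented, not taken from the paper's evaluation data, and it simply shows how such a correlation and its explained-variance interpretation would be computed.

```python
# Illustrative sketch: the scores below are made up, not the paper's data.
from scipy.stats import pearsonr

# Hypothetical automatic felicity scores and mean human acceptability ratings
automatic_scores = [0.12, 0.35, 0.41, 0.58, 0.63, 0.72, 0.80, 0.91]
human_ratings = [1.8, 2.4, 2.1, 3.0, 3.4, 3.1, 4.2, 4.5]

r, p = pearsonr(automatic_scores, human_ratings)
print(f"Pearson's r = {r:.3f}, p = {p:.3f}")

# The share of variance in human judgements explained by the automatic
# measure is r squared; for the reported r = 0.795 this gives
# 0.795 ** 2 ≈ 0.632, i.e. roughly two thirds of the variability.
print(f"r^2 = {r**2:.3f}; reported r = 0.795 -> r^2 = {0.795**2:.3f}")
```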