861 results for Fiction Authorship


Relevance: 20.00%

Abstract:

In this paper we explore the use of text-mining methods for the identification of the author of a text. We apply the support vector machine (SVM) to this problem, as it is able to cope with half a million inputs, requires no feature selection, and can process the frequency vector of all words of a text. We performed a number of experiments with texts from a German newspaper. With nearly perfect reliability the SVM was able to reject other authors, and it detected the target author in 60–80% of the cases. In a second experiment, we ignored nouns, verbs and adjectives and replaced them with grammatical tags and bigrams. This resulted in slightly reduced performance. Author detection with SVMs on full word forms was remarkably robust even if the author wrote about different topics.
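The idea of feeding the frequency vector of all word forms to an SVM, with no feature selection, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy sentences, the function names, the hyperparameters, and the choice of a linear SVM without a bias term trained by plain sub-gradient descent on the regularised hinge loss. The abstract does not state which kernel, solver, or software the authors used.

```python
from collections import Counter


def word_frequencies(text):
    """Relative frequency of every word form in the text (no feature selection)."""
    words = text.lower().split()
    counts = Counter(words)
    return {word: n / len(words) for word, n in counts.items()}


def score(weights, features):
    """Linear decision value w . x over a sparse feature dictionary."""
    return sum(weights.get(word, 0.0) * value for word, value in features.items())


def train_linear_svm(texts, labels, epochs=100, lr=0.5, reg=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss.

    labels: +1 for the target author, -1 for any other author.
    The hyperparameter values are illustrative assumptions.
    """
    weights = {}
    samples = [word_frequencies(t) for t in texts]
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            margin = label * score(weights, features)
            # Regularisation step: shrink every weight slightly.
            for word in weights:
                weights[word] *= 1.0 - lr * reg
            # Hinge-loss step: only samples inside the margin move the weights.
            if margin < 1.0:
                for word, value in features.items():
                    weights[word] = weights.get(word, 0.0) + lr * label * value
    return weights


def predict(weights, text):
    """Return +1 (target author) or -1 (other author)."""
    return 1 if score(weights, word_frequencies(text)) >= 0 else -1


# Hypothetical toy corpus: same topics, different function-word habits.
texts = [
    "indeed the council met and indeed the vote was close",
    "however the council met but however the vote was close",
    "indeed the mayor spoke and the crowd indeed listened",
    "however the mayor spoke but the crowd however listened",
]
labels = [1, -1, 1, -1]

weights = train_linear_svm(texts, labels)
print(predict(weights, "indeed the team won and the fans cheered"))
print(predict(weights, "however the team won but the fans cheered"))
```

In this toy setting the topic words are shared by both authors, so the classifier has to rely on the function words that differ between them. That loosely mirrors the abstract's observation about robustness across topics, but it is not evidence for it.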

Relevance: 20.00%

Abstract:

In response to Chaski’s article (published in this volume), this paper examines the methodological understanding necessary to identify dependable markers for forensic (and general) authorship attribution work. The examination concentrates on three methodological areas of concern that researchers intending to identify markers of authorship must address: sampling linguistic data, establishing the reliability of authorship markers, and establishing the validity of authorship markers. It is suggested that the complexity of sampling problems in linguistic data is often underestimated and that theoretical issues in this area are both difficult and unresolved. It is further argued that the concepts of reliability and validity must be well understood and accounted for in any attempt to identify authorship markers, and that this is largely not done. Finally, Principal Component Analysis is identified as an alternative approach which avoids some of the methodological problems inherent in identifying reliable, valid markers of authorship.

Relevance: 20.00%

Abstract:

The judicial interest in ‘scientific’ evidence has driven recent work to quantify results for forensic linguistic authorship analysis. Through a methodological discussion and a worked example, this paper examines the issues which complicate attempts to quantify results in such work. The solution suggested to some of the difficulties is a sampling and testing strategy which helps to identify potentially useful, valid and reliable markers of authorship. An important feature of the sampling strategy is that markers identified as being generally valid and reliable are retested for use in specific authorship analysis cases. The suggested approach for drawing quantified conclusions combines discriminant function analysis and Bayesian likelihood measures. The worked example starts with twenty comparison texts for each of three potential authors and then uses a progressively smaller comparison corpus, reducing to fifteen, ten, five and finally three texts per author. This worked example demonstrates how reducing the amount of data affects the way conclusions can be drawn. With greater numbers of reference texts, quantified and safe attributions are shown to be possible, but as the number of reference texts falls, the analysis shows that the conclusion which should be reached is that no attribution can be made. At no point does the testing process result in a misattribution.
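As a rough illustration of how a Bayesian likelihood measure can turn a classifier's evidence into a quantified conclusion, the sketch below applies the standard odds form of Bayes' rule (posterior odds = likelihood ratio × prior odds). The function name and the numbers are hypothetical. The abstract does not say how the paper derives its likelihood values from the discriminant function analysis, so this is not a reconstruction of that procedure.

```python
def posterior_probability(likelihood_ratio, prior_probability):
    """Update a prior probability with a likelihood ratio (odds form of Bayes' rule)."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Three candidate authors, so an assumed uniform prior of 1/3 for each.
prior = 1.0 / 3.0

# Hypothetical likelihood ratios: strong evidence versus uninformative evidence.
strong = posterior_probability(10.0, prior)
uninformative = posterior_probability(1.0, prior)
print(round(strong, 3), round(uninformative, 3))
```

A likelihood ratio close to 1 leaves the posterior at the prior, which is the numerical counterpart of concluding that no attribution can be made.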