A Copy detection Method for Malayalam Text Documents using N-grams Model


Author(s): Sumam, Mary Idicula; Bindu, Baby Thomas; Sindhu, L
Date(s)

18/07/2014

18/07/2014

09/02/2013

Abstract

This paper proposes a method for copy detection in short Malayalam text passages. Given two passages, one treated as the source text and the other as the suspected copy, the method determines whether the second passage is a plagiarized version of the first. A plagiarism detection algorithm based on the n-gram model for word retrieval is developed, and tri-grams are found to be the best model for comparing Malayalam text. The texts are compared using n-grams of variable length (n = 2, 3, 4), and each text is categorized by applying a threshold to the probability and resemblance measures calculated from the n-gram comparison. The experiments show that the tri-gram model gives acceptable average performance at an affordable cost in terms of complexity.
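
The abstract does not define its probability and resemblance measures, so the following is only a minimal sketch of a word n-gram comparison under stated assumptions: passages are tokenized on whitespace, resemblance is taken to be the Jaccard overlap of the two n-gram sets, a containment score stands in for the probability measure, and the 0.5 threshold is a hypothetical value. All function names are illustrative, and the English sentences are placeholders for Malayalam passages.

```python
def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) of a whitespace-tokenized text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def resemblance(source, suspect, n):
    """Jaccard overlap of the two n-gram sets (assumed definition of resemblance)."""
    a, b = word_ngrams(source, n), word_ngrams(suspect, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(source, suspect, n):
    """Fraction of the suspect's n-grams that also occur in the source (assumed measure)."""
    a, b = word_ngrams(source, n), word_ngrams(suspect, n)
    return len(a & b) / len(b) if b else 0.0


def is_copy(source, suspect, n=3, threshold=0.5):
    """Flag the suspect passage when containment reaches the hypothetical threshold."""
    return containment(source, suspect, n) >= threshold


# Placeholder passages; Malayalam text would be passed in the same way.
source = "the quick brown fox jumps over the lazy dog"
suspect = "a quick brown fox jumps over a sleeping dog"

# Compare with the three n-gram lengths named in the abstract.
for n in (2, 3, 4):
    print(n, round(resemblance(source, suspect, n), 3),
          round(containment(source, suspect, n), 3))
```

Longer n-grams match less often after small edits, so the scores fall as n grows; this illustrates why the choice of n trades sensitivity against strictness, although it does not reproduce the paper's experiments.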

Cochin University of Science and Technology

Identifier

http://dyuthi.cusat.ac.in/purl/4104

Language(s)

en

Keywords #Copy detection #N-gram Model #Bi-gram #Tri-gram #Malayalam #Plagiarism
Type

Article