3 resultados para text vector space model

em Cochin University of Science


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes about an English-Malayalam Cross-Lingual Information Retrieval system. The system retrieves Malayalam documents in response to query given in English or Malayalam. Thus monolingual information retrieval is also supported in this system. Malayalam is one of the most prominent regional languages of Indian subcontinent. It is spoken by more than 37 million people and is the native language of Kerala state in India. Since we neither had any full-fledged online bilingual dictionary nor any parallel corpora to build the statistical lexicon, we used a bilingual dictionary developed in house for translation. Other language specific resources like Malayalam stemmer, Malayalam morphological root analyzer etc developed in house were used in this work

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is a recent trend to describe physical phenomena without the use of infinitesimals or infinites. This has been accomplished replacing differential calculus by the finite difference theory. Discrete function theory was first introduced in l94l. This theory is concerned with a study of functions defined on a discrete set of points in the complex plane. The theory was extensively developed for functions defined on a Gaussian lattice. In 1972 a very suitable lattice H: {Ci qmxO,I qnyo), X0) 0, X3) 0, O < q < l, m, n 5 Z} was found and discrete analytic function theory was developed. Very recently some work has been done in discrete monodiffric function theory for functions defined on H. The theory of pseudoanalytic functions is a generalisation of the theory of analytic functions. When the generator becomes the identity, ie., (l, i) the theory of pseudoanalytic functions reduces to the theory of analytic functions. Theugh the theory of pseudoanalytic functions plays an important role in analysis, no discrete theory is available in literature. This thesis is an attempt in that direction. A discrete pseudoanalytic theory is derived for functions defined on H.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper a method of copy detection in short Malayalam text passages is proposed. Given two passages one as the source text and another as the copied text it is determined whether the second passage is plagiarized version of the source text. An algorithm for plagiarism detection using the n-gram model for word retrieval is developed and found tri-grams as the best model for comparing the Malayalam text. Based on the probability and the resemblance measures calculated from the n-gram comparison , the text is categorized on a threshold. Texts are compared by variable length n-gram(n={2,3,4}) comparisons. The experiments show that trigram model gives the average acceptable performance with affordable cost in terms of complexity