56 results for Malayalam language

in Cochin University of Science


Relevance:

100.00%

Abstract:

This work presents a Named Entity based Question Answering System for the Malayalam language. Although a vast amount of information is available today in digital form, no effective mechanism exists to give people convenient access to it. Information Retrieval and Question Answering systems are the two mechanisms currently available for information access. Information Retrieval systems typically return a long list of documents in response to a user's query, which the user must skim to determine whether they contain an answer. A Question Answering System, in contrast, allows the user to state an information need as a natural language question and returns the most appropriate answer as a word, a sentence, or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents, which is then used to find the answer to the question. Question Classification extracts useful information from the question to determine its type and the way in which it is to be answered. Various Machine Learning methods are used to tag the documents, and a Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India, with official language status in the state of Kerala, and is spoken by 40 million people. Malayalam is a morphologically rich, agglutinative language with relatively free word order. It also has a productive morphology that allows the creation of complex words, which are often highly ambiguous. Document tagging tools such as a Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter were developed as part of this research work; no such tools were previously available for Malayalam. Finite State Transducers, High Order Conditional Random Fields, Artificial Immunity System Principles, and Support Vector Machines are the techniques used to design these document preprocessing tools. This research work describes how Named Entities are used to represent the documents. Single sentence questions are used to test the system. The overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions: the coverage of non-factoid questions can be increased, and the system can be extended to open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as future enhancements.
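
The rule-based question classification step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual rules, so the rule table, the answer-type labels, and the function names below are all hypothetical, and English patterns stand in for the Malayalam ones.

```python
import re

# Hypothetical rule table: (pattern, expected answer type). These English
# patterns are illustrative only; the system in the abstract works on
# Malayalam questions and its real rules are not given there.
RULES = [
    (re.compile(r"^\s*who\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"^\s*where\b", re.IGNORECASE), "LOCATION"),
    (re.compile(r"^\s*when\b", re.IGNORECASE), "DATE"),
    (re.compile(r"^\s*how (many|much)\b", re.IGNORECASE), "NUMBER"),
]


def classify_question(question: str) -> str:
    """Return the answer type of the first rule that matches, else OTHER."""
    for pattern, answer_type in RULES:
        if pattern.search(question):
            return answer_type
    return "OTHER"


def candidate_answers(tagged_tokens, answer_type):
    """Keep only the tokens whose named-entity tag equals the answer type."""
    return [token for token, tag in tagged_tokens if tag == answer_type]
```

In this sketch the question type selects which named-entity tag to look for in the tagged documents, which is one plausible way the two components could be connected.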

Relevance:

100.00%

Abstract:

Malayalam is one of the 22 scheduled languages of India, with more than 30 million speakers. This paper reports on the development of a speaker-independent continuous speech transcription system for Malayalam. The system employs Hidden Markov Models (HMM) for acoustic modeling and Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. It is trained with 21 male and female speakers aged between 20 and 40 years. When tested with a set of continuous speech data, the system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84%.
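
MFCC feature extraction places its filter bank on the mel scale. The sketch below shows only that frequency warping, using one common convention for the mel formula; the abstract does not say which variant, frequency range, or number of filters the system uses, so those values and the function names are assumptions.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centres(low_hz: float, high_hz: float, n_filters: int):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_filters)]
```

Calling `mel_filter_centres(0.0, 8000.0, 26)` gives centres that are dense at low frequencies and sparse at high ones, which is the property MFCC relies on.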

Relevance:

100.00%

Abstract:

Connected digit speech recognition is important in many applications, such as automated banking systems, catalogue dialing, and automatic data entry. This paper presents an optimum speaker-independent connected digit recognizer for the Malayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficients for speech parameterization and continuous density Hidden Markov Models (HMM) in the recognition process. The Viterbi algorithm is used for decoding. The training database contains utterances from 21 speakers aged between 20 and 40 years, recorded in a normal office environment, where each speaker was asked to read 20 sets of continuous digits. The system obtained an accuracy of 99.5% on unseen data.
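
As an illustration of the decoding step, here is a minimal Viterbi sketch. It assumes a toy HMM with discrete emissions and strictly positive probabilities, whereas the recognizer in the abstract uses continuous density HMMs over PLP features; the function signature and the model layout are assumptions made for this example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a discrete-emission HMM.

    All probabilities are assumed to be strictly positive, because the
    scores are accumulated in log space.
    """
    first = observations[0]
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][first]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximises the score of reaching s.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

In a connected digit recognizer the states would belong to digit models chained together, so the decoded path also yields the digit sequence.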

Relevance:

100.00%

Abstract:

The scope of writer identification extends to broad domains such as digital rights administration, forensic expert decision-making systems, and document analysis systems. As the success rate of a writer identification scheme is highly dependent on the features extracted from the documents, the feature extraction and selection phase is highly significant for such schemes. In this paper, writer identification in the Malayalam language is addressed using a feature extraction technique, the Scale Invariant Feature Transform (SIFT). The schemes are tested on a test bed of 280 writers and their performance is evaluated.

Relevance:

70.00%

Abstract:

This thesis summarizes the results of studies on a syntax-based approach to translation between Malayalam, one of the Dravidian languages, and English, and on the development of the major modules of a prototype machine translation system from Malayalam to English. The development of the system is a pioneering effort for the Malayalam language, unattempted by previous researchers, and the computational models chosen for the system are the first of their kind for Malayalam. An in-depth study has been carried out on the design of the computational models and data structures needed for the different modules required for the prototype system: a morphological analyzer, a parser, a syntactic structure transfer module and a target language sentence generator. The list of part-of-speech tags, the chunk tags, and the hierarchical dependencies among the chunks required for the translation process have also been generated. The major goals in the development process are (a) accuracy of translation, (b) speed, and (c) space. For accuracy, smart tools are to be developed for handling the transfer grammar and translation standards, including equivalent words, expressions, phrases and styles in the target language. The grammar should be optimized with a view to obtaining a single correct parse and hence a single translated output. For speed, innovative use of corpus analysis, an efficient parsing algorithm, efficient data structures, and run-time frequency-based rearrangement of the grammar, which substantially reduces parsing and generation time, are required. The space requirement also has to be minimised.

Relevance:

70.00%

Abstract:

This paper outlines a methodology for translating text from English into the Dravidian language Malayalam using statistical models. Using a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase, the machine automatically generates Malayalam translations of English sentences. The paper also discusses a technique to improve the alignment model by incorporating part-of-speech information into the bilingual corpus. Removing insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop word elimination in the bilingual corpus also proved effective in training. Various handcrafted rules designed for the suffix separation process, which can be used as a guideline for implementing suffix separation in Malayalam, are also presented. The structural difference between the English-Malayalam pair is resolved in the decoder by applying order conversion rules. Experiments conducted on a sample corpus generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and WER evaluation metrics.
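
Of the three evaluation metrics named in this abstract, WER is the simplest to show. The sketch below computes word error rate as the word-level edit distance divided by the reference length; whitespace tokenization and the function name are assumptions, and the abstract does not describe the paper's own evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better, and the value can exceed 1.0 when the hypothesis is much longer than the reference.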

Relevance:

70.00%

Abstract:

This paper discusses a methodology for translating text from English into the Dravidian language Malayalam using statistical models. The translator uses a monolingual Malayalam corpus and a bilingual English/Malayalam corpus in the training phase and automatically generates the Malayalam translation of an unseen English sentence. Various techniques to improve the alignment model by incorporating morphological inputs into the bilingual corpus are discussed. Removing insignificant alignments from the sentence pairs by this approach has ensured better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop word elimination in the bilingual corpus also proved effective in producing better alignments. Difficulties in the translation process that arise from the structural difference between the English-Malayalam pair are resolved in the decoding phase by applying order conversion rules. The handcrafted rules designed for the suffix separation process, which can be used as a guideline for implementing suffix separation in Malayalam, are also presented. Experiments conducted on a sample corpus generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and WER evaluation metrics.

Relevance:

70.00%

Abstract:

Suffix separation plays a vital role in improving the quality of training in Statistical Machine Translation (SMT) from English into Malayalam. The morphological richness and agglutinative nature of Malayalam make it necessary to retrieve the root word from its inflected form in the training process. The suffix separation process accomplishes this by scrutinizing the Malayalam words and applying sandhi rules. In this paper, various handcrafted rules designed for the suffix separation process in English-Malayalam SMT are presented. These rules are classified according to the Malayalam syllable that precedes the suffix in the inflected form of the word (check_letter). Suffixes beginning with vowel sounds, such as ആൽ, ഉടെ, ഇൽ, etc., are mainly considered in this process. By examining the check_letter in a word, the suffix separation rules can be applied directly to extract the root word. The quick lookup table provided in the paper can be used as a guideline for implementing suffix separation in Malayalam.
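
The check_letter lookup described in this abstract can be sketched as a small table-driven function. The paper's actual table is not reproduced in the abstract, so the two rules below are hypothetical examples written in a simplified Latin transliteration: `marattil` (in the tree) yields `maram` plus `il`, and `kuttiyute` (the child's) yields `kutti` plus `ute`.

```python
# Hypothetical lookup table keyed on (check_letter, suffix). The value is
# the text restored at the end of the root once the check_letter and the
# suffix are removed. These entries are illustrative, not the paper's rules.
SUFFIX_RULES = {
    ("tt", "il"): "m",   # marattil  -> maram + il
    ("y", "ute"): "",    # kuttiyute -> kutti + ute
}


def split_suffix(word: str):
    """Return (root, suffix), or (word, None) when no rule applies."""
    # Try longer endings first so that a more specific rule wins.
    ordered = sorted(
        SUFFIX_RULES.items(),
        key=lambda item: -len(item[0][0] + item[0][1]),
    )
    for (check_letter, suffix), restored in ordered:
        ending = check_letter + suffix
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)] + restored, suffix
    return word, None
```

Keying the table on the check_letter keeps each sandhi reversal a single dictionary entry, which matches the quick-lookup idea in the abstract.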

Relevance:

70.00%

Abstract:

Optical Character Recognition (OCR) plays an important role in Digital Image Processing and Pattern Recognition. Even though ample study has been performed on foreign scripts such as Chinese and Japanese, work on Indian scripts is still immature. OCR in Malayalam is more complex, as the script has the largest number of characters among all Indian languages. The challenge of character recognition is even higher in the handwritten domain, owing to the varying writing style of each individual. In this paper we propose a system for the recognition of offline handwritten Malayalam vowels. The proposed method uses Chain Code and Image Centroid features and a two-layer feed-forward network trained with scaled conjugate gradient for classification.
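
The two features named in this abstract can be sketched in a few lines. The example assumes a Freeman 8-direction chain code over an already traced boundary and a centroid taken as the mean of foreground pixel coordinates; the abstract does not specify the paper's exact variants, and the function names are hypothetical.

```python
# Freeman 8-direction codes for a step (dx, dy) in image coordinates,
# where x is the column and y is the row (y grows downward).
# Code 0 is east and the codes proceed counter-clockwise on screen.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(boundary):
    """Encode consecutive 8-connected boundary points as direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(x0, y0)} and {(x1, y1)} are not adjacent")
        codes.append(DIRECTIONS[step])
    return codes


def centroid(pixels):
    """Mean (x, y) position of the foreground pixels."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)
```

A histogram of the chain codes, possibly computed per zone around the centroid, is one way such features could be turned into a fixed-length vector for the classifier.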

Relevance:

70.00%

Abstract:

In this paper, we propose a handwritten character recognition system for the Malayalam language. The feature extraction phase consists of gradient and curvature calculation followed by dimensionality reduction using Principal Component Analysis. Directional information from the arc tangent of the gradient is used as the gradient feature, and the strength of the gradient in the curvature direction is used as the curvature feature. The proposed system uses a combination of the gradient and curvature features, in reduced dimension, as the feature vector. For classification, the discriminative power of the Support Vector Machine (SVM) is evaluated. The results reveal that an SVM with a Radial Basis Function (RBF) kernel yields the best performance, with accuracies of 96.28% and 97.96% on two different datasets. This is the highest accuracy reported on these datasets.

Relevance:

70.00%

Abstract:

The performance of any continuous speech recognition system depends on the accuracy of its acoustic model. Hence, a robust and accurate acoustic model leads to satisfactory recognition performance. In the acoustic modeling of phonetic units, context information is of prime importance, as phonemes are found to vary according to their place of occurrence in a word. In this paper we compare and evaluate the effect of context dependent tied (CD tied) models, context dependent (CD) models and context independent (CI) models for continuous speech recognition of the Malayalam language. The database for the speech recognition system contains utterances from 21 speakers, 11 female and 10 male. Our evaluation results show that CD tied models outperform CI models by over 21%.
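
The difference between context independent and context dependent units can be illustrated by expanding a phone sequence into triphones. The sketch below uses the widely used `left-phone+right` label notation; whether the paper uses this notation is an assumption, and the phone symbols in the usage note are placeholders.

```python
def to_triphones(phones):
    """Expand a context-independent phone sequence into triphone labels.

    Each label has the form left-phone+right; the left or right context
    is omitted at the start or end of the sequence.
    """
    triphones = []
    for i, phone in enumerate(phones):
        left = f"{phones[i - 1]}-" if i > 0 else ""
        right = f"+{phones[i + 1]}" if i < len(phones) - 1 else ""
        triphones.append(f"{left}{phone}{right}")
    return triphones
```

For example, `to_triphones(["k", "a", "l"])` gives three distinct context dependent units where a CI system would reuse the same three phone models everywhere. Tying then shares parameters among similar triphones so that rare contexts can still be trained.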

Relevance:

70.00%

Abstract:

The objective of the study is to develop a handwritten character recognition system that can recognise all the characters in the modern script of the Malayalam language at a high recognition rate.

Relevance:

70.00%

Abstract:

This paper presents a writer identification scheme for Malayalam documents. As the success rate of a scheme is highly dependent on the features extracted from the documents, the process of feature selection and extraction is highly relevant. The paper describes a set of novel features designed exclusively for the Malayalam language. The features were studied in detail, resulting in a comparative study of all the features. The features are fused to form the feature vector, or knowledge vector, which is then used in all phases of the writer identification scheme. The scheme has been tested on a test bed of 280 writers, of whom 50 have only one page, 215 have at least 2 pages, and 15 have at least 4 pages. For a comparative evaluation, the test is also conducted using the WD-LBP method. A recognition rate of around 95% was obtained for the proposed approach.

Relevance:

60.00%

Abstract:

The medical field requires fast, simple and noninvasive diagnostic techniques. Several methods are available, made possible by the growth of technology that provides the necessary means of collecting and processing signals. The present thesis details work done in the field of voice signals. New methods of analysis, such as nonlinear dynamics, have been developed to understand the complexity of voice signals and to explore their dynamic nature. The purpose of this thesis is to characterize the complexity of pathological voices relative to healthy signals and to differentiate stuttering signals from healthy signals. The efficiency of various acoustic as well as nonlinear time series methods is analysed. Three groups of samples are used: healthy individuals, subjects with vocal pathologies, and stuttering subjects. Individual vowels and continuous speech data for the utterance of the Malayalam sentence "iruvarum changatimaranu" (in English, "Both are good friends") are recorded using a microphone. The recorded audio is converted to digital signals and subjected to analysis. Acoustic perturbation measures such as fundamental frequency (F0), jitter, shimmer, and Zero Crossing Rate (ZCR) are computed, and nonlinear measures such as the maximum Lyapunov exponent (lambda max), correlation dimension (D2), Kolmogorov exponent (K2), and a new measure of entropy, Permutation Entropy (PE), are evaluated for all three groups of subjects. Permutation Entropy is a nonlinear complexity measure that can efficiently distinguish the regular and complex nature of any signal and extract information about changes in the dynamics of the process by indicating sudden changes in its value. The results show that nonlinear dynamical methods seem to be a suitable technique for voice signal analysis, owing to the chaotic component of the human voice. Permutation entropy is well suited owing to its sensitivity to uncertainties, since the pathologies are characterized by an increase in signal complexity and unpredictability. Pathological groups have higher entropy values than the normal group, while stuttering signals have lower entropy values than normal signals. PE is effective in characterising the level of improvement after two weeks of speech therapy in stuttering subjects. PE is also effective in characterizing the dynamical difference between healthy and pathological subjects. This suggests that PE can improve and complement the voice analysis methods currently available to clinicians. The work establishes the application of the simple, inexpensive and fast PE algorithm to the diagnosis of vocal disorders and stuttering.
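
Permutation entropy, the central measure in this abstract, can be computed with a short routine. The sketch below follows the usual ordinal-pattern definition; the embedding order, the delay, and the normalization to the range 0 to 1 are assumptions, since the abstract does not state the parameters used in the thesis.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Shannon entropy (in bits) of the ordinal patterns found in a signal."""
    n_windows = len(signal) - (order - 1) * delay
    if order < 2 or delay < 1 or n_windows < 1:
        raise ValueError("signal too short for the given order and delay")
    patterns = Counter()
    for i in range(n_windows):
        window = [signal[i + k * delay] for k in range(order)]
        # Ordinal pattern: positions sorted by value, ties broken by position.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1
    entropy = -sum(
        (count / n_windows) * math.log2(count / n_windows)
        for count in patterns.values()
    )
    if normalize:
        # Divide by the maximum entropy, log2(order!).
        entropy /= math.log2(math.factorial(order))
    return entropy
```

A strictly increasing signal has a single ordinal pattern and therefore zero entropy, while an irregular signal spreads over many patterns and approaches 1 after normalization.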

Relevance:

60.00%

Abstract:

Farm communication and extension programs are a vital part of farm development efforts, and electronic media plays a major role in farm extension activities. Kerala, now a consumer state, was a fully agricultural state in the pre-independence period and is where agricultural extension and publication activities in print media first took root. Later, AIR (All India Radio) farm programs and the farm broadcasts of Doordarshan enriched the role of electronic media in farm extension activities. This media-saturated southern state of India received the new electronic media farm communication revolution wholeheartedly. After 1990, however, Kerala witnessed a flood of private TV channels, and there are currently 24 channels in the regional language, Malayalam. All major news and entertainment channels broadcast farm programs. The farm programs of AIR and Doordarshan, broadcast in Malayalam, have been well accepted by farmers in Kerala. The post-independence period witnessed the formation of Kerala state in the Indian Union, and the first ballot-elected communist Government started its administration. After the land reform bills, the state witnessed a gradual decrease in agricultural production. Even if this is not reflected much in the attitudes and practices of the farm community or in the farm broadcasts of traditional electronic media, a change is observable after the liberalization of India. Private television channels, which were focused on the entertainment value of programs, started broadcasting farm programs, and the parameters of program production went through certain changes. In this situation, there is ample relevance for a study of the farm programs of electronic media in the form of a comparative study of audience perception. The study is limited to the state of Kerala, as it is the most media-saturated state in India. The study analyzes the rate, nature and scope of adoption of farming methods transmitted through electronic media (TV and radio) in Malayalam. All kinds of farm programs, including comprehensive program serials, success stories, seasonal cropping methods, and expert opinions, have been analyzed on the basis of the following objectives:

- To find whether propagating new farm methods through farm programs in electronic media, or the availability of adequate infrastructure and economic factors, makes a farmer adopt a new farming method.
- To find which electronic medium has more influence on farmers' adoption of agricultural programs.
- To find which form of electronic media gets better feedback from farmers.
- To find whether TV or radio programs are more acceptable to farmers than print media.
- To find whether farmers get the message through their preferred medium.

The researcher recorded opinions from a panel of agricultural officers, farm information officers, agro-extension researchers and experts. Following their opinions and guidelines, a pilot study was designed and conducted in Kanjikuzhy Panchayath, in Alappuzha district, Kerala. The Panchayath was selected for its suitability as a sample for social science research. In addition, the nature of farming in the Panchayath, which is devoid of cash crop cultivation, supported its value as a sample. Based on the observations from the pilot study, the researcher adopted triangulation as the research methodology. The questionnaire survey, the primary component, contained 42 questions with 6 independent and 32 dependent variables. The survey was conducted among 400 respondents in Idukki, Alappuzha and Pathanamthitta districts, taking into account geographical differences and the distribution of different types of crops.
Responses from a total of 360 respondents, 120 from each district, were finally selected for tabulation and data analysis. The data analysis, based on percentage analysis, together with the results of a focus group discussion among a selected group of 20 farmers, produced the following results. Farmers, who are the audience of farm programs, take a very serious approach towards the medium and maintain a critical view of the content of the programs. Farmers are reasonably aware of the financial side of the programs and the monetary aspirations of both private and Government-owned television channels. Even though farmers are not familiar with the technical terminology and jargon, they have ideas about success stories and program serials, and they are even aware that the channels do not maintain an audience research section as AIR does. Though farmers accept Doordarshan as a credible source of farm information and methods, they are also drawn to the entertainment value of programs, and they would prefer Doordarshan's programs to have more entertainment value. Surprisingly, they have very solid suggestions, even about the shots that would add entertainment value to Doordarshan's farm broadcasts. Farmers are very much aware that media is just an instrument for inspiration and persuasion. They strongly believe that the source of information and new methods is agricultural research, and that effective change happens only when there are adequate infrastructure and marketing facilities, along with proper support from Government agricultural guidance and support systems such as Krishi Bhavans. They strongly believe that media alone cannot create any magic in increasing agricultural production.
Farmers point to the lack of response to their feedback and queries on farming methods as evidence of the difference in the levels of commitment of Government and privately owned television channels. Farmers still perceive AIR farm programs as far more committed to farmers and farming than any other electronic medium. However, they seriously lack radio receivers with medium wave reception. Farmers perceive that the farming methods for new crops are more adoptable than those for traditional crops on both private and Government-owned television channels; there are multiple factors behind this observation. Farmers' viewing habits have changed: they prefer success stories, even though these are totally irrelevant, and they think that such stories encourage people to take up farming and are good sources of inspiration. They are also very sure about the importance of an entertainment factor, and particular about its presence, even in farm programs. Farmers expect direct interaction with an expert in a new farming method before implementing it in their agricultural practices. Though the introduction of a new idea on TV is acceptable, farmers need direct instruction from an expert in the field to start implementing new farming practices. Farmers still have an affinity for print media reports and agricultural pages, and they complain to print media about the removal of agricultural information pages from newspapers. They prefer reports in print media because they can collect articles and refer to them when needed. Farmers have doubts about the credibility of farm programs on private TV channels. Even if they prefer private television channels for learning about and adopting new farming methods and other farm information, they scrutinize programs to find out whether they are sponsored by agrochemical or agro-fertilizer manufacturers.