3 results for Verb phrase ellipsis

in Cochin University of Science


Relevance: 20.00%

Abstract:

Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. In SMT, translation rules are acquired automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), such corpora are available only in very limited quantities, so a large portion of the phrases encountered at run time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated into English by conventional phrase-based statistical machine translation systems. Each OOV word in the source sentence is pre-processed to obtain its root word and its suffix. Different inflected forms of the OOV root are generated, and a match for each word variant is looked up in the phrase translation table of the translation model. A vocabulary filter then chooses the best among the translations of these word variants using their unigram counts. A match for the OOV suffix is also looked up in the phrase entries, and the corresponding target translations are filtered. The filtered phrases are then structured, and the SMT translation model is extended by adding each OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably.
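The OOV-handling steps described above can be illustrated with a small Python sketch. It is not the authors' implementation: the suffix list, the toy phrase table, the unigram counts, the romanized Malayalam forms and every function name (`split_oov`, `generate_variants`, `vocabulary_filter`, `translate_oov`) are hypothetical, and the "structuring" step is reduced to placing the suffix's English translation (e.g. a preposition) before the root's translation.

```python
from collections import Counter

# Hypothetical romanized Malayalam suffixes, for illustration only.
SUFFIXES = ["inte", "ukal", "ine", "il"]

# Toy phrase table: source token -> candidate English translations (hypothetical).
PHRASE_TABLE = {
    "marathinte": ["tree", "wood"],
    "marathine": ["tree"],
    "il": ["in", "at", "on"],  # entry for a bare suffix
}

# Toy English unigram counts used by the vocabulary filter (hypothetical).
UNIGRAM_COUNTS = Counter({"tree": 120, "wood": 40, "in": 900, "at": 300, "on": 450})


def split_oov(word):
    """Strip the longest known suffix, returning (root, suffix)."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def generate_variants(root):
    """Bare root plus every root+suffix combination."""
    return [root] + [root + s for s in SUFFIXES]


def vocabulary_filter(candidates):
    """Pick the candidate whose least frequent word has the highest unigram count."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: min(UNIGRAM_COUNTS[w] for w in c.split()))


def translate_oov(word):
    """Translate an OOV word via its variants and suffix, then extend the table."""
    root, suffix = split_oov(word)
    candidates = []
    for variant in generate_variants(root):
        candidates.extend(PHRASE_TABLE.get(variant, []))
    candidates = list(dict.fromkeys(candidates))  # de-duplicate, keep order
    root_translation = vocabulary_filter(candidates)
    if root_translation is None:
        return None  # no variant of the root is known either
    suffix_translation = vocabulary_filter(PHRASE_TABLE.get(suffix, [])) if suffix else None
    # Naive structuring: the suffix's translation is placed before the root's.
    phrase = f"{suffix_translation} {root_translation}" if suffix_translation else root_translation
    PHRASE_TABLE[word] = [phrase]  # extend the translation model with the new entry
    return phrase
```

Scoring a multi-word candidate by its least frequent word is an assumption; the abstract only says that unigram counts are used. With this toy data, `translate_oov("marathil")` returns `"in tree"` and adds that entry to the phrase table; a realistic system would also need to handle sandhi changes at the root-suffix boundary and insert English articles.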

Relevance: 10.00%

Abstract:

This thesis is entitled "Fish habitats and species assemblage in the selected rivers of Kerala and investigation on life history traits of Puntius carnaticus (Jerdon, 1849)". Ecology is a new and exceedingly complex field of study, even though its concept was recognized by the Apostles in their use of the phrase ‘all flesh is grass’. [...] central role to play, both in order to understand the biodiversity phenomenon better and to be able to draw up clear guidelines for careful resource management. In a review by WWF, IUCN and UNEP on ways of conserving the genetic diversity of freshwater fish, it was recommended that the best way to conserve species diversity is to conserve habitat. Habitat studies in freshwater ecosystems are essential for properly understanding and managing human impact on fish diversity, for studying the relationship between habitat variables and fish species assemblage structure, for quantifying ecosystem degradation, habitat quality and the biotic integrity of ecosystems, for developing habitat suitability index (HSI) models, and for classifying river reaches based on their physico-chemical properties. Therefore, the present study attempted to assess the biodiversity potential, and the relationship between habitat variables and fish species assemblage structure, in six major river systems of Kerala; this would be very useful in impressing upon the seriousness of the habitat degradation and biotic devastation undergone by the major river systems of Kerala. In the Kabbini river system, 15 locations lying between 721 and 946 m above MSL were surveyed. During the present study, the Habitat Quality Score (HQ) developed by the Ohio EPA was applied for the first time in India. The results revealed that, among the various variables analysed, altitude has a very significant influence on fish diversity in the six major river systems of Kerala.
Fish diversity, assessed with the Shannon-Wiener and Simpson diversity indices, showed that although some minor variations occur with the suitability and complexity of habitats, altitude has an inverse relationship with fish diversity. The study also revealed that the National Policy on the interlinking of rivers would permanently alter the HSI values of the above-mentioned fish species, which are now protected solely by the individuality of the rivers where their limited occurrence was noticed.
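The two diversity indices mentioned above can be computed from per-species counts at a sampling site. The sketch below uses the standard definitions, H' = -sum(p_i ln p_i) for Shannon-Wiener and D = sum(p_i^2) for Simpson, where p_i is the proportion of individuals belonging to species i. The abstract does not state which Simpson variant (D, 1 - D or 1/D) or which logarithm base was used, so the choices here are assumptions, and the site counts are made up.

```python
import math
from typing import Sequence


def shannon_wiener(counts: Sequence[int]) -> float:
    """Shannon-Wiener index H' = -sum(p_i * ln(p_i)), using the natural log."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def simpson_dominance(counts: Sequence[int]) -> float:
    """Simpson's D = sum(p_i ** 2): chance two fish drawn with replacement share a species."""
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts)


def gini_simpson(counts: Sequence[int]) -> float:
    """1 - D: higher values mean higher diversity."""
    return 1.0 - simpson_dominance(counts)


if __name__ == "__main__":
    # Hypothetical species counts for two sampling sites.
    even_site = [10, 10, 10, 10]
    dominated_site = [37, 1, 1, 1]
    for name, counts in [("even", even_site), ("dominated", dominated_site)]:
        print(name, round(shannon_wiener(counts), 3), round(gini_simpson(counts), 3))
```

With these made-up counts, the site where individuals are spread evenly across species scores higher on both indices than the site dominated by a single species.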

Relevance: 10.00%

Abstract:

This is a Named Entity Based Question Answering System for the Malayalam language. Although a vast amount of information is available today in digital form, no effective mechanism exists to provide humans with convenient access to it. Information Retrieval and Question Answering systems are the two mechanisms currently available for information access. Information Retrieval systems typically return a long list of documents in response to a user's query, which the user must skim to determine whether they contain an answer. A Question Answering System, by contrast, allows users to state their information need as a natural language question and returns the most appropriate answer as a word, a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents, which is then used to find the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and how it should be answered. Various Machine Learning methods are used to tag the documents, and a Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India, with official language status in the state of Kerala, and is spoken by 40 million people. Malayalam is a morphologically rich, agglutinative language with relatively free word order. Its productive morphology also allows the creation of complex words, which are often highly ambiguous. Document tagging tools such as a Parts-of-Speech Tagger, a Phrase Chunker, a Named Entity Tagger and a Compound Word Splitter were developed as part of this research work, since no such tools were available for Malayalam.
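Rule-based question classification of the kind described above can be sketched as a lookup from question words to the named-entity type the answer should carry. The romanized Malayalam cue words, the type labels and the function name `classify_question` are hypothetical; the abstract does not list the actual rules. Because Malayalam is agglutinative, question words often appear fused with other morphemes (e.g. a "where is" form), so this sketch matches cues as token prefixes, which can also produce false positives.

```python
import string

# Hypothetical rules: romanized Malayalam question-word prefix -> expected answer type.
QUESTION_RULES = {
    "aar": "PERSON",      # who
    "evide": "LOCATION",  # where
    "eppo": "TIME",       # when
    "ethra": "NUMBER",    # how many / how much
}


def classify_question(question: str) -> str:
    """Return the answer type of the first token that starts with a known cue."""
    for token in question.lower().split():
        word = token.strip(string.punctuation)
        for cue, answer_type in QUESTION_RULES.items():
            if word.startswith(cue):
                return answer_type
    return "OTHER"  # no rule fired


print(classify_question("Taj Mahal evideyaanu?"))  # LOCATION
```

The returned type tells the answer-extraction stage which named-entity tag to look for in the tagged documents.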
Finite State Transducers, higher-order Conditional Random Fields, Artificial Immune System principles and Support Vector Machines are the techniques used to design these document preprocessing tools. This research work describes how Named Entities are used to represent the documents. The system is tested with single-sentence questions, giving an overall precision of 88.5% and an overall recall of 85.9%. The work can be extended in several directions: the coverage of non-factoid questions can be increased, and the system can be extended to open-domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as future enhancements.
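Once documents are represented by their named entities, answering a factoid question can be sketched as picking, from the sentence that shares the most keywords with the question, an entity whose tag matches the expected answer type. This keyword-overlap ranking is an assumption for illustration, not the ranking method reported in the abstract; the tag names, the function `find_answer` and the English stand-in sentences (used in place of tagged Malayalam text) are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# A tagged sentence is a list of (word, named-entity tag) pairs; "O" = not an entity.
TaggedSentence = List[Tuple[str, str]]


def find_answer(
    sentences: List[TaggedSentence], keywords: Set[str], answer_type: str
) -> Optional[str]:
    """Return the first entity of answer_type in the best keyword-matching sentence.

    keywords are expected in lowercase.
    """
    best_answer, best_score = None, 0
    for sentence in sentences:
        entities = [word for word, tag in sentence if tag == answer_type]
        if not entities:
            continue  # this sentence cannot supply an answer of the right type
        words = {word.lower() for word, _ in sentence}
        score = len(words & keywords)
        if score > best_score:  # require at least one shared keyword
            best_answer, best_score = entities[0], score
    return best_answer


# Hypothetical pre-tagged sentences (English stand-ins for tagged Malayalam text).
documents = [
    [("Kochi", "LOCATION"), ("is", "O"), ("a", "O"), ("port", "O"), ("city", "O")],
    [("Malayalam", "O"), ("is", "O"), ("spoken", "O"), ("in", "O"), ("Kerala", "LOCATION")],
]
print(find_answer(documents, {"malayalam", "spoken"}, "LOCATION"))  # Kerala
```

A real system would take the expected answer type from question classification and the keywords from the tagged question, and would need morphological analysis so that inflected Malayalam forms still match.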