6 resultados para Java language
em Cochin University of Science
Resumo:
This work is aimed at building an adaptable frame-based system for processing Dravidian languages. There are about 17 languages in this family and they are spoken by the people of South India.Karaka relations are one of the most important features of Indian languages. They are the semabtuco-syntactic relations between verbs and other related constituents in a sentence. The karaka relations and surface case endings are analyzed for meaning extraction. This approach is comparable with the borad class of case based grammars.The efficiency of this approach is put into test in two applications. One is machine translation and the other is a natural language interface (NLI) for information retrieval from databases. The system mainly consists of a morphological analyzer, local word grouper, a parser for the source language and a sentence generator for the target language. This work make contributios like, it gives an elegant account of the relation between vibhakthi and karaka roles in Dravidian languages. This mapping is elegant and compact. The same basic thing also explains simple and complex sentence in these languages. This suggests that the solution is not just ad hoc but has a deeper underlying unity. This methodology could be extended to other free word order languages. Since the frame designed for meaning representation is general, they are adaptable to other languages coming in this group and to other applications.
Resumo:
This work aims to study the variation in subduction zone geometry along and across the arc and the fault pattern within the subducting plate. Depth of penetration as well as the dip of the Benioff zone varies considerably along the arc which corresponds to the curvature of the fold- thrust belt which varies from concave to convex in different sectors of the arc. The entire arc is divided into 27 segments and depth sections thus prepared are utilized to investigate the average dip of the Benioff zone in the different parts of the entire arc, penetration depth of the subducting lithosphere, the subduction zone geometry underlying the trench, the arctrench gap, etc.The study also describes how different seismogenic sources are identified in the region, estimation of moment release rate and deformation pattern. The region is divided into broad seismogenic belts. Based on these previous studies and seismicity Pattern, we identified several broad distinct seismogenic belts/sources. These are l) the Outer arc region consisting of Andaman-Nicobar islands 2) the back-arc Andaman Sea 3)The Sumatran fault zone(SFZ)4)Java onshore region termed as Jave Fault Zone(JFZ)5)Sumatran fore arc silver plate consisting of Mentawai fault(MFZ)6) The offshore java fore arc region 7)The Sunda Strait region.As the Seismicity is variable,it is difficult to demarcate individual seismogenic sources.Hence, we employed a moving window method having a window length of 3—4° and with 50% overlapping starting from one end to the other. We succeeded in defining 4 sources each in the Andaman fore arc and Back arc region, 9 such sources (moving windows) in the Sumatran Fault zone (SFZ), 9 sources in the offshore SFZ region and 7 sources in the offshore Java region. Because of the low seismicity along JFZ, it is separated into three seismogenic sources namely West Java, Central Java and East Java. The Sunda strait is considered as a single seismogenic source.The deformation rates for each of the seismogenic zones have been computed. A detailed error analysis of velocity tensors using Monte—Carlo simulation method has been carried out in order to obtain uncertainties. The eigen values and the respective eigen vectors of the velocity tensor are computed to analyze the actual deformation pattem for different zones. The results obtained have been discussed in the light of regional tectonics, and their implications in terms of geodynamics have been enumerated.ln the light of recent major earthquakes (26th December 2004 and 28th March 2005 events) and the ongoing seismic activity, we have recalculated the variation in the crustal deformation rates prior and after these earthquakes in Andaman—Sumatra region including the data up to 2005 and the significant results has been presented.ln this chapter, the down going lithosphere along the subduction zone is modeled using the free air gravity data by taking into consideration the thickness of the crustal layer, the thickness of the subducting slab, sediment thickness, presence of volcanism, the proximity of the continental crust etc. Here a systematic and detailed gravity interpretation constrained by seismicity and seismic data in the Andaman arc and the Andaman Sea region in order to delineate the crustal structure and density heterogeneities a Io nagnd across the arc and its correlation with the seismogenic behaviour is presented.
Resumo:
This is a Named Entity Based Question Answering System for Malayalam Language. Although a vast amount of information is available today in digital form, no effective information access mechanism exists to provide humans with convenient information access. Information Retrieval and Question Answering systems are the two mechanisms available now for information access. Information systems typically return a long list of documents in response to a user’s query which are to be skimmed by the user to determine whether they contain an answer. But a Question Answering System allows the user to state his/her information need as a natural language question and receives most appropriate answer in a word or a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents which will be used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which the question is to be answered. Various Machine Learning methods are used to tag the documents. Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language and relatively of free word order. Also Malayalam has a productive morphology that allows the creation of complex words which are often highly ambiguous. Document tagging tools such as Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter are developed as a part of this research work. No such tools were available for Malayalam language. Finite State Transducer, High Order Conditional Random Field, Artificial Immunity System Principles, and Support Vector Machines are the techniques used for the design of these document preprocessing tools. This research work describes how the Named Entity is used to represent the documents. Single sentence questions are used to test the system. Overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased and also it can be extended to include open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as the future enhancements
Resumo:
Malayalam is one of the 22 scheduled languages in India with more than 130 million speakers. This paper presents a report on the development of a speaker independent, continuous transcription system for Malayalam. The system employs Hidden Markov Model (HMM) for acoustic modeling and Mel Frequency Cepstral Coefficient (MFCC) for feature extraction. It is trained with 21 male and female speakers in the age group ranging from 20 to 40 years. The system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84%, when tested with a set of continuous speech data.
Resumo:
A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer forMalayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficient for speech parameterization and continuous density Hidden Markov Model (HMM) in the recognition process. Viterbi algorithm is used for decoding. The training data base has the utterance of 21 speakers from the age group of 20 to 40 years and the sound is recorded in the normal office environment where each speaker is asked to read 20 set of continuous digits. The system obtained an accuracy of 99.5 % with the unseen data.
Resumo:
The span of writer identification extends to broad domes like digital rights administration, forensic expert decisionmaking systems, and document analysis systems and so on. As the success rate of a writer identification scheme is highly dependent on the features extracted from the documents, the phase of feature extraction and therefore selection is highly significant for writer identification schemes. In this paper, the writer identification in Malayalam language is sought for by utilizing feature extraction technique such as Scale Invariant Features Transform (SIFT).The schemes are tested on a test bed of 280 writers and performance evaluated