19 results for Natural language techniques, Semantic spaces, Random projection, Documents

at Cochin University of Science


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This is a Named Entity based Question Answering System for the Malayalam language. Although a vast amount of information is available today in digital form, no effective mechanism exists to give people convenient access to it. Information Retrieval and Question Answering systems are the two mechanisms currently available for information access. Information Retrieval systems typically return a long list of documents in response to a user’s query, which the user must skim to determine whether they contain an answer. A Question Answering System, by contrast, allows the user to state an information need as a natural language question and receive the most appropriate answer as a word, a sentence, or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents, which is then used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which it is to be answered. Various Machine Learning methods are used to tag the documents. A Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India, with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language with relatively free word order. Malayalam also has a productive morphology that allows the creation of complex words, which are often highly ambiguous. Document tagging tools such as a Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter were developed as part of this research work. No such tools were previously available for the Malayalam language.
Finite State Transducers, High Order Conditional Random Fields, Artificial Immunity System Principles, and Support Vector Machines are the techniques used in the design of these document preprocessing tools. This research work describes how Named Entities are used to represent the documents. Single sentence questions are used to test the system. The overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased, and the system can be extended to open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as future enhancements.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work is aimed at building an adaptable frame-based system for processing Dravidian languages. There are about 17 languages in this family, and they are spoken by the people of South India. Karaka relations are one of the most important features of Indian languages. They are the semantico-syntactic relations between verbs and other related constituents in a sentence. The karaka relations and surface case endings are analyzed for meaning extraction. This approach is comparable with the broad class of case-based grammars. The efficiency of this approach is put to the test in two applications: one is machine translation and the other is a natural language interface (NLI) for information retrieval from databases. The system mainly consists of a morphological analyzer, a local word grouper, a parser for the source language, and a sentence generator for the target language. Among its contributions, this work gives an elegant and compact account of the relation between vibhakthi and karaka roles in Dravidian languages. The same basic mechanism also explains simple and complex sentences in these languages. This suggests that the solution is not just ad hoc but has a deeper underlying unity. This methodology could be extended to other free word order languages. Since the frames designed for meaning representation are general, they are adaptable to other languages in this group and to other applications.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The goal of this work was to develop a query processing system using software agents. The Open Agent Architecture framework is used for system development. The system supports queries in both Hindi and Malayalam, two prominent regional languages of India. Natural language processing techniques are used to extract meaning from the plain query, and information from the database is returned to the user in his or her native language. The system architecture is structured so that it can be adapted to other regional languages of India. The system can be used effectively in application areas such as e-governance, agriculture, rural health, education, national resource planning, disaster management, and information kiosks, where people from all walks of life are involved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The goal of this work is to develop an Open Agent Architecture for multilingual information retrieval from relational databases. The query for information retrieval can be given in plain Hindi or Malayalam, two prominent regional languages of India. The system supports distributed processing of user requests through collaborating agents. Natural language processing techniques are used to extract meaning from the plain query, and information is returned to the user in his or her native language. The system architecture is structured so that it can be adapted to other regional languages of India.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. The translation process in SMT is carried out by acquiring translation rules automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), such corpora are available only in very limited quantities. Therefore, for these language pairs a large portion of the phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated to English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and a match for these word variants is looked up in the phrase translation table of the translation model. A vocabulary filter chooses the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries, and the target translations are filtered out. The filtered phrases are then structured, and the SMT translation model is extended by adding each OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably.
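The OOV pipeline described in this abstract (split into root and suffix, generate inflected variants, look them up in the phrase table, filter by unigram count) can be sketched roughly as follows. This is only an illustrative sketch: the phrase table, suffix list, unigram counts, and all function names are hypothetical toy values, the romanized forms are not linguistically accurate Malayalam, and the paper's actual morphological processing is not specified here.

```python
from collections import Counter

# Hypothetical toy resources; none of these entries come from the paper.
PHRASE_TABLE = {"veedu": ["house", "home"]}          # source phrase -> translations
SUFFIX_TABLE = {"ude": ["of"], "il": ["in"]}         # suffix -> translations
UNIGRAM_COUNTS = Counter({"house": 50, "home": 30, "of": 300, "in": 200})
SUFFIXES = ("kal", "ude", "il")


def split_root_suffix(word):
    """Strip the longest known suffix and return (root, suffix)."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def inflected_variants(root):
    """Generate the bare root plus naive root+suffix forms (assumed scheme)."""
    return [root] + [root + suffix for suffix in SUFFIXES]


def best_by_unigram(candidates):
    """Vocabulary filter: keep the candidate with the highest unigram count."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: UNIGRAM_COUNTS[c])


def translate_oov(word):
    """Look up translations for an OOV word via its root variants and suffix."""
    root, suffix = split_root_suffix(word)
    root_candidates = []
    for variant in inflected_variants(root):
        root_candidates.extend(PHRASE_TABLE.get(variant, []))
    return {
        "root": root,
        "suffix": suffix,
        "root_translation": best_by_unigram(root_candidates),
        "suffix_translation": best_by_unigram(SUFFIX_TABLE.get(suffix, [])),
    }
```

In a real system the resulting pairs would be restructured into phrase entries and appended to the translation model; that step is omitted here because the abstract does not describe its details.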

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This paper compares statistical techniques of paraphrase identification with a semantic technique of paraphrase identification. The statistical techniques used for comparison are word-set and word-order based methods, whereas the semantic technique used is the WordNet similarity matrix method described by Stevenson and Fernando in [3].
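To make the two statistical baselines concrete, here is a minimal sketch of a word-set measure (Jaccard overlap) and a word-order measure (comparison of word-position vectors). The abstract does not give the exact formulas used in the paper, so both definitions, the whitespace tokenizer, and the function names are assumptions chosen for illustration.

```python
import math


def tokens(sentence):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return sentence.lower().split()


def word_set_similarity(a, b):
    """Jaccard overlap of the two sentences' word sets."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def word_order_similarity(a, b):
    """1 - |r_a - r_b| / |r_a + r_b| over word-position vectors.

    Each vector holds, for every word in the joint vocabulary, the 1-based
    position of its first occurrence in the sentence, or 0 if it is absent.
    """
    tok_a, tok_b = tokens(a), tokens(b)
    joint = sorted(set(tok_a) | set(tok_b))

    def order_vector(toks):
        first_pos = {}
        for i, word in enumerate(toks, start=1):
            first_pos.setdefault(word, i)
        return [first_pos.get(word, 0) for word in joint]

    r_a, r_b = order_vector(tok_a), order_vector(tok_b)
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r_a, r_b)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r_a, r_b)))
    return 1.0 if total == 0 else 1.0 - diff / total
```

The pair "dog bites man" and "man bites dog" shows why both measures are useful: the word sets are identical, so the word-set score is 1.0, while the word-order score drops below 1.0 because the positions differ.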

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This study is about the stability of random sums and extremes. The difficulty of finding exact sampling distributions has resulted in considerable problems in computing probabilities concerning sums that involve a large number of terms. Functions of sample observations that are of natural interest, other than the sum, are the extremes, that is, the minimum and the maximum of the observations. Extreme value distributions also arise in problems such as the study of the size effect on material strengths, the reliability of parallel and series systems made up of a large number of components, record values, and assessing the levels of air pollution. It may be noticed that the theories of sums and extremes are mutually connected. For instance, in the search for asymptotic normality of sums, it is assumed that at least the variance of the population is finite. In such cases the contribution of the extremes to the sum of independent and identically distributed (i.i.d.) random variables is negligible.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Polymers exhibit low electron density and are therefore radiolucent. They can be made radiopaque by different techniques. We report a method for the preparation of a radiopaque material from natural rubber (NR). NR in its latex form was iodinated. Iodinated natural rubber (INR) was characterized using UV, thermogravimetric analysis (TGA), and X-ray images. INR was compounded at high and low temperatures and its physical properties were measured. The low temperature cured samples show good radiopacity and conductivity. The optical density of the low temperature cured samples was measured.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Compounding of styrene-butadiene copolymer/polybutadiene, natural rubber/ethylene-propylene-diene terpolymer and natural rubber/butadiene-acrylonitrile copolymer blends was done in three different ways and their curing behaviour and the tensile properties of the es are compared.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Zinc salts of ethyl, isopropyl, and butyl xanthates were prepared in the laboratory. They were purified by reprecipitation and were characterized by IR, NMR, and thermogravimetric analysis techniques. The melting points were also determined. The rubber compounds with different xanthate accelerators were cured at temperatures from 30 to 150°C. The sheets were molded and properties such as tensile strength, tear strength, crosslink density, elongation at break, and modulus at 300% elongation were evaluated. The properties showed that all three xanthate accelerators are effective for room temperature curing.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Chemically modified novel thermo-reversible zinc sulphonated ionomers based on natural rubber (NR), radiation induced styrene grafted natural rubber (RI-SGNR), and chemically induced styrene grafted natural rubber (CI-SGNR) were synthesized using an acetyl sulphate/zinc acetate reagent system. Evidence for the attachment of sulphonate groups has been furnished by FTIR spectra, supplemented by FTNMR results. Estimation of the zinc sulphonate group was done using spectroscopic techniques such as XRFS and ICPAES. The TGA results show an improvement in the thermo-oxidative stability of the modified natural rubber. Both DSC and DMTA studies show that the incorporation of the ionic groups affects the thermal transition of the base polymer. Retention of the improved physical properties of the novel ionomers even after three repeated cycles of mastication and molding at 120 °C may be considered evidence for the reprocessability of the ionomer. The effect of both particulate fillers (carbon black, silica, and zinc stearate) and fibrous fillers (nylon and glass) on the properties of the radiation induced styrene grafted natural rubber ionomer has been evaluated. Incorporation of HAF carbon black results in the maximum improvement in physical properties. Silica reinforces the backbone chain and weakens the ionic associations. Zinc stearate plays the dual role of reinforcement and plasticization. The nylon and glass filled ionomer compounds show good improvement in physical properties in comparison with the neat ionomer. Dispersion and adhesion of the fillers in the ionomer matrix are amply supported by their SEM micrographs. Microwave probing of the electrical behavior of the 26.5 ZnS-RISGNR ionomer reveals that the maximum relative complex conductivity and complex permittivity appear at a frequency of 2.6 GHz. The complex conductivity of the base polymer increases from 1.8 × 10^-12 S/cm to 3.3 × 10^-4 S/cm.
The influence of fillers on the dielectric constant and conductivity of the new ionic thermoplastic elastomer has been studied. The ionomer/nylon compound shows the highest microwave conductivity. Use of the 26.5 ZnS-RISGNR ionomer as a compatibilizer for obtaining technologically compatible blends from the immiscible SBR/NBR system has been verified. The heat fugitive ionic cross-linked natural rubber may therefore be useful as an alternative to vulcanized rubber and thermoplastic elastomers.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The present research problem is to study the existing encryption methods and to develop a new technique that outperforms existing techniques and, at the same time, can be readily incorporated into the communication channels of Fault Tolerant Hard Real Time systems alongside existing Error Checking / Error Correcting codes, so that attempts at eavesdropping can be defeated. Many encryption methods are available now. Each method has its own merits and demerits. Similarly, many cryptanalysis techniques used by adversaries are also available.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The medical field requires fast, simple, and noninvasive diagnostic techniques. Several methods are available and possible because of the growth of technology that provides the necessary means of collecting and processing signals. The present thesis details the work done in the field of voice signals. New methods of analysis, such as nonlinear dynamics, have been developed to understand the complexity of voice signals and to explore their dynamic nature. The purpose of this thesis is to distinguish the complexity of pathological voice signals from that of healthy signals and to differentiate stuttering signals from healthy signals. The efficiency of various acoustic as well as nonlinear time series methods is analysed. Three groups of samples are used: healthy individuals, subjects with vocal pathologies, and stuttering subjects. Individual vowels and continuous speech data for the Malayalam sentence "iruvarum changatimaranu" (in English, "Both are good friends") were recorded using a microphone. The recorded audio is converted to digital signals and subjected to analysis. Acoustic perturbation measures such as fundamental frequency (F0), jitter, shimmer, and Zero Crossing Rate (ZCR) were computed, and nonlinear measures such as the maximum Lyapunov exponent (λmax), correlation dimension (D2), Kolmogorov exponent (K2), and a new measure of entropy, Permutation Entropy (PE), were evaluated for all three groups of subjects. Permutation Entropy is a nonlinear complexity measure that can efficiently distinguish the regular and complex nature of any signal and extract information about changes in the dynamics of the process by indicating a sudden change in its value. The results show that nonlinear dynamical methods are a suitable technique for voice signal analysis, owing to the chaotic component of the human voice.
Permutation entropy is well suited because of its sensitivity to uncertainties, since the pathologies are characterized by an increase in signal complexity and unpredictability. Pathological groups have higher entropy values than the normal group. The stuttering signals have lower entropy values than the normal signals. PE is effective in characterising the level of improvement after two weeks of speech therapy in the case of stuttering subjects. PE is also effective in characterizing the dynamical difference between healthy and pathological subjects. This suggests that PE can improve and complement the voice analysis methods currently available to clinicians. The work establishes the application of the simple, inexpensive, and fast PE algorithm for diagnosis of vocal disorders and stuttering.
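Since permutation entropy is central to this abstract, a minimal sketch of the standard ordinal-pattern computation may help. The embedding order, delay, and normalization used in the thesis are not stated in the abstract, so the defaults below (order 3, delay 1, normalized to [0, 1]) and the function name are assumptions.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Shannon entropy (in bits) of the ordinal patterns found in `signal`.

    Each window of `order` samples, spaced `delay` apart, is replaced by the
    permutation that sorts it; the entropy of the resulting pattern
    distribution is returned, optionally divided by log2(order!).
    """
    n_windows = len(signal) - (order - 1) * delay
    if order < 2 or delay < 1 or n_windows <= 0:
        raise ValueError("signal too short for the chosen order and delay")

    patterns = Counter()
    for i in range(n_windows):
        window = [signal[i + j * delay] for j in range(order)]
        # Ordinal pattern: indices of the window sorted by sample value.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1

    entropy = sum(
        (count / n_windows) * math.log2(n_windows / count)
        for count in patterns.values()
    )
    if normalize:
        entropy /= math.log2(math.factorial(order))
    return entropy
```

A perfectly regular signal such as a monotonic ramp produces a single ordinal pattern and therefore zero entropy, while white noise spreads almost evenly over all `order!` patterns and gives a normalized value close to 1. Real voice recordings would fall somewhere in between.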

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Handwriting is an acquired tool used for communicating one's observations or feelings. The factors that influence a person's handwriting include not only the individual's bio-mechanical constraints, handwriting education received, writing instrument, type of paper, and background, but also stress, motivation, and the purpose of the handwriting. Despite the high variation in a person's handwriting, recent results from different writer identification studies have shown that it possesses sufficient individual traits to be used as an identification method. Handwriting as a behavioral biometric has interested researchers for a long time, but it has recently attracted new interest due to an increased need and effort to deal with problems ranging from white-collar crime to terrorist threats. The identification of a writer based on a piece of handwriting is a challenging task for pattern recognition. The main objective of this thesis is to develop a text independent writer identification system for Malayalam handwriting. The study also extends to developing a framework for online character recognition of Grantha script and Malayalam characters.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Speech is a natural mode of communication for people, and speech recognition is an intensive area of research due to its versatile applications. This paper presents a comparative study of various wavelet-based feature extraction methods for recognizing isolated spoken words. Isolated words from Malayalam, one of the four major Dravidian languages of southern India, are chosen for recognition. This work includes two speech recognition methods. The first is a hybrid approach using Discrete Wavelet Transforms and Artificial Neural Networks, and the second uses a combination of Wavelet Packet Decomposition and Artificial Neural Networks. Features are extracted using Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD). Training, testing, and pattern recognition are performed using Artificial Neural Networks (ANN). The proposed method is implemented for 50 speakers uttering 20 isolated words each. The experimental results show the efficiency of these techniques in recognizing speech.
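As a rough illustration of DWT-based feature extraction, the sketch below runs a multi-level Haar decomposition and uses the energy of each sub-band as a feature vector that could be fed to a neural network. The abstract does not say which wavelet family, decomposition depth, or feature statistics the paper used, so the Haar wavelet, the three levels, the energy features, and the function names are all assumptions.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    samples = list(signal)
    if len(samples) % 2:
        samples.append(samples[-1])  # pad odd-length input by repeating the last sample
    scale = math.sqrt(2.0)
    approx = [(samples[i] + samples[i + 1]) / scale for i in range(0, len(samples), 2)]
    detail = [(samples[i] - samples[i + 1]) / scale for i in range(0, len(samples), 2)]
    return approx, detail


def wavelet_energy_features(signal, levels=3):
    """Energy of each detail band (finest first) followed by the final approximation."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_dwt(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features
```

Because the Haar transform is orthonormal, the band energies of an even-length signal sum to the energy of the original signal, which gives a quick sanity check. A wavelet packet variant would additionally split the detail bands at each level rather than only the approximation.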