26 results for computer-based
at Cochin University of Science
Abstract:
Photoplethysmography (PPG) is a simple and inexpensive optical technique that can be used to detect blood volume changes in the microvascular bed of tissue. Interest in the technique has resurged in recent years, driven by the demand for low-cost, simple, and portable technology suited to primary care and community-based clinical settings, by the wide availability of small, inexpensive semiconductor components, and by the advancement of computer-based pulse wave analysis techniques. The present research work deals with the design of a PPG sensor for recording blood volume pulse signals and with selected cardiovascular studies based on these signals. The interaction of light with tissue, the early and recent history of PPG, instrumentation, measurement protocol, and pulse wave analysis are also discussed. The effects of aging, mild cold exposure, and variation in body posture on the PPG signal have been experimentally studied.
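To make "computer-based pulse wave analysis" concrete, the following is a minimal sketch of one standard step: estimating heart rate from a PPG trace by band-pass filtering and peak detection. The sampling rate, filter band, and the synthetic signal are illustrative assumptions, not the thesis's actual setup.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fs = 100.0                                    # assumed sampling rate (Hz)
t = np.arange(0.0, 10.0, 1.0 / fs)
rng = np.random.default_rng(0)
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(t.size)  # ~72 bpm pulse

# Band-pass filter to isolate the cardiac component (0.5-8 Hz is a common choice)
b, a = butter(3, [0.5 / (fs / 2), 8.0 / (fs / 2)], btype="bandpass")
filtered = filtfilt(b, a, ppg)

# Detect systolic peaks, enforcing a refractory gap of ~0.4 s (max ~150 bpm)
peaks, _ = find_peaks(filtered, distance=int(0.4 * fs), prominence=0.5)

ibi = np.diff(peaks) / fs                     # inter-beat intervals (s)
print(f"Estimated heart rate: {60.0 / ibi.mean():.1f} bpm")
```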
Abstract:
Ship recycling is considered the best means of disposing of an obsolete ship. The current state of the art of technology, combined with the global maritime sector's demand for sustainable development, has transformed the erstwhile 'ship breaking' scrap business into a modern industry that dismantles ships and recycles or reuses the dismantled products in the supply chain of the pre-owned product market, following the principles of recycling. Industries have to formulate a set of best practices and blend them with their engineering activities to produce better-quality products, improve productivity, and achieve improved performance related to sustainable development. Improved performance from a sustainable development perspective is accomplished only by implementing the 4E principles, i.e., eco-friendliness, engineering efficiency, energy conservation, and ergonomics, in core operations. The present study is a comprehensive investigation into various ship recycling operations aimed at formulating a set of best practices.

Being the ultimate life cycle stage of a ship, ship recycling incorporates certain commercial procedures well in advance to facilitate the dismantling and the recycling or reuse of various parts of the vessel. Thorough knowledge of these background procedures is essential for examining and understanding the associated industrial business operations. As a first step, the practices followed in merchant shipping operations in deciding on decommissioning have been reviewed and made available in the thesis. A brief description of the positioning methods and the important preparations for the most feasible ship recycling method, i.e., the beach method, is provided as part of the background information. Available sources of guidelines, codes, rules, and regulations for ship recycling have been compiled and included in the discussion. A very brief summary of practices in major ship recycling destinations has been prepared to provide an overview of global ship recycling activity. The present status of ship recycling, treated as a full-fledged engineering industry, is brought out to establish the need for developing best practices. The major engineering attributes of the ship as a unique engineering product, and the significant factors influencing her life cycle stage operations, have been studied and added to the information base on ship recycling. The role of the ship recycling industry as an important player in global sustainable development efforts has been reviewed by analysing the benefits of ship recycling. A brief synopsis of the state of the art in major international ship recycling centres has also been incorporated into this background knowledgebase on ship recycling processes.

Publications available in this field have been reviewed and classified into five subject categories, viz., infrastructure for recycling yards and methods of dismantling; rules regarding ship recycling activities; environmental and safety aspects of ship recycling; the role of naval architects and ship classification societies; and the application of information technology and demand forecasting. The inferences from the literature survey have been summarised and recorded.
Noticeable observations among these inferences include the need to create a comprehensive knowledgebase on ship recycling and implement it effectively in the industry, and the insignificant involvement of naval architects and shipbuilding engineers in the ship recycling industry. These two important inferences, and the message they convey, are addressed with due importance in the subsequent parts of the present study.

As a part of the study, the importance of demand forecasting in ship recycling is introduced and presented, along with a sample input of ship recycling data for the implementation of computer-based methods of demand forecasting. The interdisciplinary nature of the engineering processes involved in ship recycling has been identified as one of the important features of this industry. The present study has identified more than a dozen major stakeholders in ship recycling, each with their own interests and roles. It has also been observed that most ship recycling is carried out in South East Asian countries, where beach-based ship recycling is done in yards without proper infrastructure support. A model of beach-based ship recycling has been developed, and the roles, responsibilities, and mutual interactions of the elements of the system have been documented as part of the study.

Subsequently, the need to generate a wide knowledgebase on ship recycling activities, as pointed out by the literature survey, is addressed. The information base and the sources of expertise required to build a broad knowledgebase on ship recycling operations have been identified and tabulated. Eleven important ship recycling processes have been identified, and the steps involved in these processes have been examined and addressed in detail. Based on these findings, a detailed sequential disassembly process plan for ship recycling has been prepared and charted.

Having established the need for best practices in ship recycling, the present study identifies the development of a user-friendly expert system for the ship recycling process as one of the constituents of the proposed best practices. A user-friendly expert system, named the Ship Recycling Recommender (SRR), has been developed for beach-based ship recycling processes. Two important functions of SRR, one for the 'Administrators', the stakeholders at the helm of ship recycling affairs, and one for the 'Users', the stakeholders who execute the actual dismantling, are presented by highlighting the steps involved in the execution of the software. The important outputs generated, i.e., recommended practices for ship dismantling processes and safe handling information on materials present onboard, are presented with the help of ship recycling reports generated by the expert system. A brief account of the necessity of ship recycling work content estimation as part of the best practices is also presented, supported by a detailed work estimation schedule in one of the appendices.

As mentioned earlier, a definite lack of involvement of naval architects has been observed in the development of methodologies for improving the status of the ship recycling industry. The present study puts forward a holistic approach that reviews ship recycling not simply as the end-of-life activity of 'time expired' vessels, but as a focal point integrating all life cycle activities.
A new engineering design philosophy targeting sustainable development of the marine industrial domain, named design for ship recycling, has been identified, formulated, and presented. A new model of the ship life cycle has been proposed by adding a few stages to the traditional life cycle, after analysing their critical role in accomplishing the clean and safe end-of-life and partial dismantling of ships. Two applications of design for ship recycling, viz., the recyclability of ships and their products, and the allotment of a Green Safety Index for ships, are presented as part of implementing the philosophy in actual practice.
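To give a flavour of what a recommender such as the SRR described above might do, here is a loose, hypothetical sketch of a rule lookup that maps a dismantling process and an onboard material to a recommended practice. The rule contents and function names are invented placeholders, not the thesis's actual knowledgebase.

```python
# Hypothetical rule table: (process, material) -> recommended practice.
RULES = {
    ("hull cutting", "steel plate"): "Use gas cutting with a fire watch; segregate plates for re-rolling.",
    ("hull cutting", "asbestos insulation"): "Stop cutting; call a licensed asbestos-removal crew with PPE.",
    ("tank cleaning", "fuel oil residue"): "Gas-free and certify the tank before any hot work.",
}

def recommend(process: str, material: str) -> str:
    """Return the recommended practice for a (process, material) pair."""
    return RULES.get((process.lower(), material.lower()),
                     "No rule found: refer the case to the yard safety administrator.")

print(recommend("hull cutting", "asbestos insulation"))
```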
Abstract:
The interfacing of various subjects generates new fields of study and research that advance human knowledge. One of the latest such fields is neurotechnology, an effective amalgamation of neuroscience, physics, biomedical engineering, and computational methods. Neurotechnology provides a platform for physicists, neurologists, and engineers to interact and to break down methodology- and terminology-related barriers. Advancements in computational capability and the wider scope of applications of nonlinear dynamics and chaos in complex systems have enhanced the study of neurodynamics. However, there is still a need for an effective dialogue among physicists, neurologists, and engineers. Applications of computer-based technology in the field of medicine, through signal and image processing, the creation of clinical databases for helping clinicians, etc., are widely acknowledged. Such synergy between widely separated disciplines may help enhance the effectiveness of existing diagnostic methods. One recent method in this direction is the analysis of the electroencephalogram (EEG) with methods from nonlinear dynamics. This thesis is an effort to understand the functional aspects of the human brain by studying the electroencephalogram. The algorithms and related methods developed in the present work can be interfaced with a digital EEG machine to unfold the information hidden in the signal. Ultimately this can be used as a diagnostic tool.
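As one example of the kind of nonlinear-dynamics measure often applied to EEG, here is a minimal sketch of sample entropy; lower values indicate more regular, less complex dynamics. The signal is synthetic and the measure is a common choice in the literature, not necessarily the one the thesis develops.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy of a 1-D series: embedding dimension m, tolerance r."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()                    # conventional tolerance choice

    def count_matches(dim):
        # All overlapping templates of length `dim`
        templates = np.array([x[i:i + dim] for i in range(len(x) - dim)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance from template i to all later templates
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(dist < r)
        return count

    b, a = count_matches(m), count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

rng = np.random.default_rng(0)
eeg_like = np.sin(np.linspace(0, 40 * np.pi, 2000)) + 0.3 * rng.standard_normal(2000)
print(f"Sample entropy: {sample_entropy(eeg_like):.3f}")
```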
Abstract:
Econometrics is a young science. It developed during the twentieth century, from the mid-1930s, and primarily after World War II. Econometrics is the unification of statistical analysis, economic theory, and mathematics. Its history can be traced to the use of statistical and mathematical analysis in economics. The most prominent contributions of the initial period are found in the works of Tinbergen and Frisch, and of Haavelmo from the 1940s through the mid-1950s. From the rudimentary application of statistics to economic data, such as the use of the laws of error through the development of least squares by Legendre, Laplace, and Gauss, the discipline later witnessed the applied works of Edgeworth and Mitchell. A very significant milestone in its evolution was the work of Tinbergen, Frisch, and Haavelmo in developing multiple regression and correlation analysis, techniques they used to test economic theories against time series data. Even though some predictions based on econometric methodology have gone wrong, the sound scientific nature of the discipline cannot be ignored: it is reflected in the economic rationale underlying any econometric model and in the statistical and mathematical reasoning behind the inferences drawn. The relevance of econometrics as an academic discipline therefore assumes high significance. Because of its interdisciplinary nature (a unification of economics, statistics, and mathematics), the subject can be taught across all these broad areas, notwithstanding the fact that most often only economics students are offered the subject, since students of other disciplines might lack the economics background to follow it. In fact, econometrics is also quite relevant for technical courses (like engineering), business management courses (like the MBA), and professional accountancy courses, and even more so for research students of the various social sciences, commerce, and management. In the ongoing scenario of globalization and economic deregulation, there is a need to give added thrust to econometrics in higher education across the social science, commerce, management, and professional accountancy streams. This would sharpen students' analytical ability, improve their capacity to approach socio-economic problems mathematically, and enable them to derive scientific inferences and solutions to such problems. The utmost significance of hands-on practical training in the use of computer-based econometric packages, especially at the postgraduate and research levels, must be pointed out here: merely learning econometric methodology or the underlying theories would have little practical utility for students in their future careers, whether in academics, industry, or practice. This paper seeks to trace the historical development of econometrics and to study its current status as an academic discipline in higher education. Besides, the paper looks into the problems faced by teachers in teaching econometrics and by students in learning the subject, including the effective application of the methodology in real-life situations. Accordingly, the paper offers some meaningful suggestions for the effective teaching of econometrics in higher education.
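As a small illustration of the hands-on exercise a computer-based econometric package enables, here is an ordinary least squares regression using the statsmodels package. The variables and data are made up for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
income = rng.uniform(20, 100, 200)                      # explanatory variable
consumption = 5 + 0.8 * income + rng.normal(0, 5, 200)  # true relation plus noise

X = sm.add_constant(income)            # add an intercept column
model = sm.OLS(consumption, X).fit()   # estimate by least squares
print(model.params)                    # estimated intercept and slope
print(model.rsquared)                  # goodness of fit
```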
Abstract:
The thesis introduces the octree and addresses the full range of problems encountered while building an imaging system based on octrees. An efficient bottom-up recursive algorithm, and its iterative counterpart, is presented for the raster-to-octree conversion of CAT scan slices. To improve the speed of generating the octree from the slices, the possibility of exploiting the inherent parallelism in the conversion program is explored. The octree node, which stores the volume information of a cube, often holds only the average density, which can lead to a "patchy" distribution of density during image reconstruction. In an attempt to alleviate this problem, the thesis explores the use of vector quantization (VQ) to represent the information contained within a cube. Considering the ease of compressing the information during the generation of octrees from CAT scan slices, the use of wavelet transforms to generate the compressed information in a cube is proposed, and the modified algorithm for generating octrees from the slices is shown to accommodate wavelet compression easily. Rendering the information stored in the octree is a complex task, essentially because of the requirement to display volumetric information. Rays traced from each cube in the octree sum up the density en route, accounting for the opacities and transparencies produced by variations in density.
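To illustrate the bottom-up idea, here is a minimal sketch of an octree build from a cubic voxel volume: eight child octants are merged into a single leaf whenever the cube is homogeneous. The node layout and the homogeneity test are simplifying assumptions, not the thesis's exact data structures.

```python
import numpy as np

def build_octree(vol):
    """Recursively build an octree node for a 2^k-sided voxel volume."""
    if np.all(vol == vol.flat[0]):
        return ("leaf", vol.flat[0])           # homogeneous cube: one leaf
    h = vol.shape[0] // 2
    children = [build_octree(vol[x:x + h, y:y + h, z:z + h])
                for x in (0, h) for y in (0, h) for z in (0, h)]
    return ("node", children)

vol = np.zeros((8, 8, 8), dtype=np.uint8)
vol[:4, :4, :4] = 1                            # one solid octant
tree = build_octree(vol)
print(tree[0], len(tree[1]))                   # "node" with 8 children
```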
Abstract:
This work is aimed at building an adaptable frame-based system for processing Dravidian languages. There are about 17 languages in this family, spoken by the people of South India. Karaka relations are one of the most important features of Indian languages: they are the semantico-syntactic relations between verbs and the other related constituents in a sentence. The karaka relations and surface case endings are analyzed for meaning extraction, an approach comparable with the broad class of case-based grammars. The efficiency of this approach is put to the test in two applications: machine translation, and a natural language interface (NLI) for information retrieval from databases. The system mainly consists of a morphological analyzer, a local word grouper, a parser for the source language, and a sentence generator for the target language. This work makes several contributions. It gives an elegant and compact account of the relation between vibhakthi and karaka roles in Dravidian languages, and the same basic mapping explains both simple and complex sentences in these languages, suggesting that the solution is not merely ad hoc but has a deeper underlying unity. The methodology could be extended to other free word order languages. Since the frames designed for meaning representation are general, they are adaptable to other languages in this group and to other applications.
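A hypothetical sketch of the vibhakthi-to-karaka mapping idea follows. The suffixes and the one-to-one table are simplified placeholders for illustration; the thesis's actual mapping covers far more forms and contexts.

```python
# Illustrative case-suffix -> karaka-role table (placeholders, not the thesis's table).
VIBHAKTHI_TO_KARAKA = {
    "e":   "karma",       # accusative suffix -> object
    "al":  "karana",      # instrumental suffix -> instrument
    "kku": "sampradana",  # dative suffix -> recipient
    "il":  "adhikarana",  # locative suffix -> location
}

def karaka_role(word: str) -> str:
    """Guess the karaka role of a noun from its case suffix (longest match first)."""
    for suffix in sorted(VIBHAKTHI_TO_KARAKA, key=len, reverse=True):
        if word.endswith(suffix):
            return VIBHAKTHI_TO_KARAKA[suffix]
    return "karta"        # unmarked nominative -> agent, by default

print(karaka_role("avarkku"))   # dative ending -> "sampradana"
```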
Abstract:
Sharing information with those who need it has always been an idealistic goal of networked environments. With the proliferation of computer networks, information is so widely distributed among systems that well-organized schemes for retrieval, and also for discovery, are imperative. This thesis investigates the problems associated with such schemes and suggests a software architecture aimed at achieving meaningful discovery. The use of information elements as a modelling base for efficient information discovery in distributed systems is demonstrated with the aid of a novel conceptual entity called the infotron.

The investigations focus on distributed systems and their associated problems. The study was directed towards identifying a suitable software architecture and incorporating it in an environment where information growth is phenomenal, so that a proper mechanism for carrying out information discovery becomes feasible. An empirical study undertaken with the aid of an election database of geographically distributed constituencies provided the required insights. This is manifested in the Election Counting and Reporting Software (ECRS) System, an essentially distributed software system designed to prepare reports for district administrators about the election counting process and to generate other miscellaneous statutory reports. Most distributed systems of the nature of ECRS possess a "fragile architecture" which makes them liable to collapse at the occurrence of minor faults. This is resolved with the help of the proposed penta-tier architecture, which places five different technologies at the different tiers of the architecture. The results of the experiments conducted, and their analysis, show that such an architecture helps keep the different components of the software insulated from internal and external faults.

The architecture thus evolved needed a mechanism to support information processing and discovery, which necessitated the introduction of the novel concept of the infotron. Further, when a computing machine has to perform any meaningful extraction of information, it is guided by what is termed an infotron dictionary. A second empirical study examined which of the two prominent markup languages, HTML and XML, is better suited for the incorporation of infotrons; a comparative study of 200 documents in HTML and XML decided in favor of XML.

The concepts of the infotron and the infotron dictionary were applied to implement an Information Discovery System (IDS). IDS is essentially a system that starts with the infotron(s) supplied as clue(s) and brews the information required to satisfy the information discoverer's need from the documents available at its disposal (the information space). The various components of the system and their interactions follow the penta-tier architectural model and can therefore be considered fault-tolerant. IDS is generic in nature, and its characteristics and specifications were drawn up accordingly. Many subsystems interact with the multiple infotron dictionaries maintained in the system. To demonstrate the working of IDS, and to discover information without modifying a typical Library Information System (LIS), an Information Discovery in Library Information System (IDLIS) application was developed.
IDLIS is essentially a wrapper for the LIS, which maintains all the databases of the library. The purpose was to demonstrate that the functionality of a legacy system can be enhanced by augmenting it with IDS, leading to an information discovery service. IDLIS demonstrates IDS in action and proves that any legacy system can be effectively augmented with IDS to provide the additional functionality of an information discovery service. Possible applications of IDS and the scope for further research in the field are also covered.
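A loose, hypothetical illustration of the infotron-dictionary idea follows: a clue is tied to patterns that locate the corresponding information element in the document space. The dictionary entries, field names, and matching logic are invented for illustration; the thesis defines infotrons far more generally.

```python
import re

# Hypothetical infotron dictionary: clue -> pattern locating the information element.
INFOTRON_DICTIONARY = {
    "constituency": re.compile(r"Constituency:\s*(\w[\w\s]*)"),
    "votes_counted": re.compile(r"Votes counted:\s*(\d+)"),
}

def discover(clue: str, documents: list[str]) -> list[str]:
    """Brew the information matching a clue from the available document space."""
    pattern = INFOTRON_DICTIONARY[clue]
    return [m.group(1) for doc in documents for m in pattern.finditer(doc)]

docs = ["Constituency: Ernakulam. Votes counted: 184223."]
print(discover("votes_counted", docs))   # ['184223']
```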
Abstract:
The theme of the thesis centres around one important aspect of wireless sensor networks: energy efficiency. The limited energy source of the sensor nodes calls for the design of energy-efficient routing protocols, and protocol designs should try to minimize the number of communications among the nodes to save energy. Cluster-based techniques have been found energy-efficient: clusters are formed, data from the nodes of each cluster are collected by its cluster head, and the cluster heads forward the data to the base station. An appropriate cluster head selection process, and the generation of a desirable distribution of clusters, can reduce the energy consumption of the network and prolong the network lifetime. In this work, two such schemes were developed for static wireless sensor networks.

The first scheme addresses the energy wasted in rebuilding clusters across all the nodes. A tree-based scheme is presented that alleviates this problem by rebuilding only sub-clusters of the network. An analytical model of the energy consumption of the proposed scheme is developed, and the scheme is compared with an existing cluster-based scheme; the simulation study confirmed the expected energy savings.

The second scheme concentrates on building load-balanced, energy-efficient clusters to prolong the lifetime of the network. A voting-based approach is proposed that utilises neighbor node information in the cluster head selection process. The number of nodes joining a cluster is restricted so as to obtain equal-sized, optimal clusters, and multi-hop communication among the cluster heads is introduced to further reduce energy consumption. The simulation study has shown that the scheme produces balanced clusters and that the network achieves a reduction in energy consumption. A main conclusion from the study is that a routing scheme should pay attention to successful data delivery from node to base station in addition to energy efficiency.

Cluster-based protocols have been extended from the static scenario to the mobile scenario by various authors, but none of the proposals addresses cluster head election appropriately in view of mobility. An elegant scheme for electing cluster heads is presented to meet the challenge of maintaining cluster durability when all the nodes in the network are moving. The scheme has been simulated and compared with a similar approach.

The proliferation of sensor networks provides users with large sets of sensor information to utilise in various applications. Sensor network programming is inherently difficult for various reasons, so there must be an elegant way to collect the data gathered by sensor networks without worrying about the underlying structure of the network. The final work presented addresses a way to collect data from a sensor network and present it to users in a flexible way. A service-oriented architecture based application is built, and the data collection task is exposed as a web service; this enables the composition of sensor data from different sensor networks to build interesting applications. The main objective of the thesis was to design energy-efficient routing schemes for both static and mobile sensor networks, and a progressive approach was followed to achieve this goal.
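The following is a simplified sketch of voting-based cluster head selection: each node votes for the neighbour within radio range having the highest residual energy, and the most-voted nodes become heads. The parameters and the exact voting rule are illustrative assumptions; the thesis's scheme additionally restricts cluster sizes and uses multi-hop links between heads.

```python
import numpy as np

rng = np.random.default_rng(1)
n, radio_range, num_heads = 50, 0.3, 5
pos = rng.random((n, 2))                  # node positions in a unit square
energy = rng.uniform(0.5, 1.0, n)         # residual energy of each node

votes = np.zeros(n, dtype=int)
for i in range(n):
    d = np.linalg.norm(pos - pos[i], axis=1)
    neighbours = np.where((d > 0) & (d < radio_range))[0]
    if neighbours.size:
        # Vote for the neighbour with the highest residual energy
        votes[neighbours[np.argmax(energy[neighbours])]] += 1

heads = np.argsort(votes)[-num_heads:]    # the most-voted nodes become heads
print("Elected cluster heads:", sorted(heads.tolist()))
```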
Abstract:
Modern computer systems are plagued with stability and security problems: applications lose data, web servers are hacked, and systems crash under heavy load. Many of these problems, or anomalies, arise from rare program behavior caused by attacks or errors. A substantial percentage of web-based attacks are due to buffer overflows, and many methods have been devised to detect and prevent the anomalous situations that arise from them. The current state of the art in anomaly detection systems is relatively primitive and depends mainly on static code checking to take care of buffer overflow attacks; for protection, stack guards and heap guards are also in wide use.

This dissertation proposes an anomaly detection system based on the frequencies of system calls in the system call trace. System call traces, represented as frequency sequences, are profiled using sequence sets; a sequence set is identified by its starting sequence and the frequencies of specific system calls. The deviation of the current input sequence from the corresponding normal profile in the frequency pattern of system calls is computed and expressed as an anomaly score, and a simple Bayesian model is used for accurate detection.

Experimental results are reported which show that the frequencies of system calls, represented using sequence sets, capture the normal behavior of programs under normal conditions of usage. This captured behavior allows the system to detect anomalies with a low rate of false positives. Data are presented which show that the Bayesian network over frequency variations responds effectively to induced buffer overflows; it can also help administrators detect deviations in program flow introduced by errors.
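A minimal sketch of the frequency idea follows: profile normal system-call frequencies, then score a new trace by its deviation from that profile. The scoring rule here is a plain normalized L1 deviation and the traces are toy data; the dissertation couples such deviations with sequence sets and a Bayesian model for the final decision.

```python
from collections import Counter

def frequency_profile(trace):
    """Relative frequency of each system call in a trace."""
    total = len(trace)
    return {call: n / total for call, n in Counter(trace).items()}

def anomaly_score(profile, trace):
    """Sum of absolute frequency deviations between a trace and the normal profile."""
    observed = frequency_profile(trace)
    calls = set(profile) | set(observed)
    return sum(abs(profile.get(c, 0.0) - observed.get(c, 0.0)) for c in calls)

normal = ["open", "read", "read", "write", "close"] * 40
profile = frequency_profile(normal)
attack = ["open", "read", "mmap", "mprotect", "exec"] * 40   # overflow-like trace
print(f"normal score: {anomaly_score(profile, normal):.3f}")
print(f"attack score: {anomaly_score(profile, attack):.3f}")
```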
Abstract:
Computational biology is the research area that contributes to the analysis of biological data through the development of algorithms addressing significant research problems. The data from molecular biology include DNA, RNA, protein, and gene expression data. Gene expression data provide the expression levels of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which are later translated into proteins; the number of copies of mRNA produced is called the expression level of a gene. Gene expression data are organized in the form of a matrix: rows represent genes, columns represent experimental conditions (different tissue types or time points), and the entries are real values.

Through the analysis of gene expression data it is possible to determine behavioral patterns of genes, such as the similarity of their behavior, the nature of their interactions, their respective contributions to the same pathways, and so on. Genes participating in the same biological process exhibit similar expression patterns. These patterns have immense relevance and application in bioinformatics and clinical research: in the medical domain they aid more accurate diagnosis, prognosis, treatment planning, drug discovery, and protein network analysis. Data mining techniques are essential for identifying such patterns, and clustering is an important data mining technique for the analysis of gene expression data. To overcome the problems associated with clustering, biclustering has been introduced. Biclustering refers to the simultaneous clustering of both rows and columns of a data matrix: clustering is a global model, whereas biclustering is a local one. Discovering local expression patterns is essential for identifying many genetic pathways that are not apparent otherwise, so it is necessary to move beyond the clustering paradigm towards approaches capable of discovering local patterns in gene expression data.

A bicluster is a submatrix of the gene expression data matrix; its rows and columns need not be contiguous as in the original matrix, and biclusters are not disjoint. The computation of biclusters is costly because all combinations of rows and columns must be considered to find all the biclusters: the search space for the biclustering problem is 2^(m+n), where m and n are the numbers of genes and conditions respectively, and usually m+n exceeds 3000. The biclustering problem is NP-hard, yet biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data, all of which make use of a measure called the mean squared residue (MSR) to search for biclusters; the objective is to identify biclusters of maximum size with mean squared residue lower than a given threshold.
All these algorithms begin the search from tightly coregulated submatrices called seeds, which are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint-based, greedy, and metaheuristic. The constraint-based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In the metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used for the identification of biclusters. These algorithms were implemented on the Yeast and Lymphoma datasets, and all of them identify biologically relevant and statistically significant biclusters, as validated against the Gene Ontology database. All the algorithms are compared with other biclustering algorithms, and the algorithms developed in this work overcome some of the problems associated with existing ones. Some of the algorithms developed here identify, from both the Yeast and Lymphoma data sets, biclusters with a very high row variance, higher than that of any other algorithm using mean squared residue. Such biclusters, which reflect significant changes in expression level, are highly relevant biologically.
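For reference, the mean squared residue of Cheng and Church, the measure all ten algorithms use, is the mean of (a_ij - a_iJ - a_Ij + a_IJ)^2 over the bicluster, where a_iJ, a_Ij, and a_IJ are the row, column, and overall means. A short sketch of its computation, on toy data, follows.

```python
import numpy as np

def mean_squared_residue(b):
    """MSR of a bicluster given as a 2-D numpy array; low MSR means coherent."""
    row_means = b.mean(axis=1, keepdims=True)   # a_iJ
    col_means = b.mean(axis=0, keepdims=True)   # a_Ij
    overall = b.mean()                          # a_IJ
    residue = b - row_means - col_means + overall
    return float((residue ** 2).mean())

coherent = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # additive pattern
noisy = np.random.default_rng(0).random((3, 3))
print(mean_squared_residue(coherent))   # 0.0 -> perfectly coherent
print(mean_squared_residue(noisy))      # larger -> less coherent
```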
Abstract:
After skin cancer, breast cancer accounts for the second greatest number of cancer diagnoses in women. The etiologies of breast cancer are currently unknown, and there is no generally accepted therapy for preventing it; therefore, the best way to improve the prognosis is early detection and treatment. Computer-aided detection (CAD) systems for detecting masses or micro-calcifications in mammograms have already been used and proven to be a potentially powerful tool, so radiologists are attracted by the effectiveness of the clinical application of CAD systems. Fractal geometry is well suited to describing the complex physiological structures that defy traditional Euclidean geometry, which is based on smooth shapes. The major contributions of this research include the development of:
• a new fractal feature to accurately classify mammograms as normal or abnormal, (i) with masses (benign or malignant) or (ii) with microcalcifications (benign or malignant);
• a novel fast fractal modeling method to identify the presence of microcalcifications, by fractal modeling of the mammogram and then subtracting the modeled image from the original.
The performance of these methods was evaluated using standard statistical analysis methods, and the results obtained indicate that the developed methods are highly beneficial for assisting radiologists in making diagnostic decisions. The mammograms for the study were obtained from two online databases, namely MIAS (Mammographic Image Analysis Society) and DDSM (Digital Database for Screening Mammography).
Abstract:
The application of computer vision based quality control has been slowly but steadily gaining importance, mainly due to the speed with which it achieves results and also to the non-destructive nature of the testing; in food applications it also does not contribute to contamination. However, applying computer vision to quality control requires appropriate software for image analysis. Even though computer vision based quality control has several advantages, its application is limited as to the type of work that can be done, particularly in the food industries; selective applications, however, can be highly advantageous and very accurate.

Computer vision based image analysis can be used for morphometric measurements of fish with the same accuracy as the existing conventional method. The method is non-destructive and non-contaminating, an advantage in seafood processing, and the images can be stored in archives and retrieved at any time for morphometric studies by biologists. Computer vision and subsequent image analysis can also be used to measure various food products for uniformity of size. One product, namely the cutlet, and product ingredients, namely coating materials such as bread crumbs and rava, were selected for the study. Computer vision based image analysis was used to measure the length, width, and area of cutlets, and the particle width of coating materials such as bread crumbs. Computer imaging and subsequent image analysis can be very effectively used in the quality evaluation of product ingredients in food processing: measurement of the width of coating materials can establish the uniformity of particles, or the lack of it. The application of image analysis to bacteriological work was also explored.
Abstract:
This is a named entity based question answering system for the Malayalam language. Although a vast amount of information is available today in digital form, no effective information access mechanism exists to provide humans with convenient access to it. Information retrieval and question answering systems are the two mechanisms now available for information access. Information retrieval systems typically return a long list of documents in response to a user's query, which the user must skim to determine whether they contain an answer; a question answering system, by contrast, allows the user to state an information need as a natural language question and returns the most appropriate answer as a word, a sentence, or a paragraph.

This system is based on named entity tagging and question classification. Document tagging extracts from the documents useful information that is used in finding the answer to the question, while question classification extracts from the question the information needed to determine its type and the way in which it should be answered. Various machine learning methods are used to tag the documents, and a rule-based approach is used for question classification.

Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 scheduled languages of India, with official language status in the state of Kerala, and is spoken by 40 million people. Malayalam is a morphologically rich, agglutinative language with a relatively free word order, and its productive morphology allows the creation of complex words which are often highly ambiguous. Document tagging tools such as a parts-of-speech tagger, a phrase chunker, a named entity tagger, and a compound word splitter were developed as part of this research work; no such tools were previously available for the Malayalam language. Finite state transducers, higher order conditional random fields, artificial immune system principles, and support vector machines are the techniques used in the design of these document preprocessing tools.

This research work describes how named entities are used to represent the documents. Single-sentence questions are used to test the system, and the overall precision and recall obtained are 88.5% and 85.9% respectively. The work can be extended in several directions: the coverage of non-factoid questions can be increased, and the system can be extended to open domain applications. Reference resolution and word sense disambiguation techniques are suggested as future enhancements.
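A small sketch of rule-based question classification follows: interrogative cues are mapped to expected answer (named-entity) types. The real rules operate on Malayalam question words; English stand-ins and the rule table are used here purely for illustration.

```python
# Illustrative interrogative-cue -> answer-type rules (English stand-ins).
RULES = [
    ("how many", "NUMBER"),
    ("who",      "PERSON"),
    ("where",    "LOCATION"),
    ("when",     "DATE"),
    ("what",     "THING"),
]

def classify_question(question: str) -> str:
    """Return the expected answer type for a question, or UNKNOWN."""
    q = question.lower()
    for cue, answer_type in RULES:
        if cue in q:
            return answer_type
    return "UNKNOWN"

print(classify_question("Where is the Cochin University located?"))  # LOCATION
```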
Abstract:
This thesis summarizes the results of studies on a syntax-based approach for translation between Malayalam, one of the Dravidian languages, and English, and on the development of the major modules of a prototype Malayalam-to-English machine translation system. The development of the system is a pioneering effort for the Malayalam language, unattempted by previous researchers, and the computational models chosen for the system are the first of their kind for Malayalam. An in-depth study has been carried out in the design of the computational models and data structures needed for the different modules: a morphological analyzer, a parser, a syntactic structure transfer module, and a target language sentence generator. The generation of the list of part-of-speech tags, chunk tags, and the hierarchical dependencies among the chunks required for the translation process has also been done. In the development process, the major goals are (a) accuracy of translation, (b) speed, and (c) space. Accuracy-wise, smart tools must be developed for handling the transfer grammar and the translation standards, including equivalent words, expressions, phrases, and styles in the target language; the grammar should be optimized with a view to obtaining a single correct parse and hence a single translated output. Speed-wise, innovative use of corpus analysis, an efficient parsing algorithm, the design of efficient data structures, and run-time frequency-based rearrangement of the grammar, which substantially reduces parsing and generation time, are required. The space requirement also has to be minimised.
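A schematic sketch of how the four modules named above compose into a transfer pipeline follows. Every stage body is a stub invented for illustration; the real modules embody the grammars, lexicons, and data structures the abstract describes.

```python
def morphological_analyzer(sentence):
    return sentence.split()                  # stub: the real module also splits suffixes

def parser(tokens):
    return {"verb": tokens[-1], "args": tokens[:-1]}  # stub: Malayalam is verb-final

def structure_transfer(tree):
    # Stub: reorder SOV constituents toward English SVO
    return [tree["args"][0], tree["verb"]] + tree["args"][1:]

def generator(ordered):
    return " ".join(ordered)                 # stub: the real module inflects words

def translate(sentence):
    return generator(structure_transfer(parser(morphological_analyzer(sentence))))

print(translate("kutti pustakam vaayichu"))  # schematic SOV -> SVO reordering
```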