18 resultados para Word Classification

em Cochin University of Science


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Suffix separation plays a vital role in improving the quality of training in the Statistical Machine Translation from English into Malayalam. The morphological richness and the agglutinative nature of Malayalam make it necessary to retrieve the root word from its inflected form in the training process. The suffix separation process accomplishes this task by scrutinizing the Malayalam words and by applying sandhi rules. In this paper, various handcrafted rules designed for the suffix separation process in the English Malayalam SMT are presented. A classification of these rules is done based on the Malayalam syllable preceding the suffix in the inflected form of the word (check_letter). The suffixes beginning with the vowel sounds like ആല, ഉെെ, ഇല etc are mainly considered in this process. By examining the check_letter in a word, the suffix separation rules can be directly applied to extract the root words. The quick look up table provided in this paper can be used as a guideline in implementing suffix separation in Malayalam language

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A new procedure for the classification of lower case English language characters is presented in this work . The character image is binarised and the binary image is further grouped into sixteen smaller areas ,called Cells . Each cell is assigned a name depending upon the contour present in the cell and occupancy of the image contour in the cell. A data reduction procedure called Filtering is adopted to eliminate undesirable redundant information for reducing complexity during further processing steps . The filtered data is fed into a primitive extractor where extraction of primitives is done . Syntactic methods are employed for the classification of the character . A decision tree is used for the interaction of the various components in the scheme . 1ike the primitive extraction and character recognition. A character is recognized by the primitive by primitive construction of its description . Openended inventories are used for including variants of the characters and also adding new members to the general class . Computer implementation of the proposal is discussed at the end using handwritten character samples . Results are analyzed and suggestions for future studies are made. The advantages of the proposal are discussed in detail .

Relevância:

20.00% 20.00%

Publicador:

Resumo:

After skin cancer, breast cancer accounts for the second greatest number of cancer diagnoses in women. Currently the etiologies of breast cancer are unknown, and there is no generally accepted therapy for preventing it. Therefore, the best way to improve the prognosis for breast cancer is early detection and treatment. Computer aided detection systems (CAD) for detecting masses or micro-calcifications in mammograms have already been used and proven to be a potentially powerful tool , so the radiologists are attracted by the effectiveness of clinical application of CAD systems. Fractal geometry is well suited for describing the complex physiological structures that defy the traditional Euclidean geometry, which is based on smooth shapes. The major contribution of this research include the development of • A new fractal feature to accurately classify mammograms into normal and normal (i)With masses (benign or malignant) (ii) with microcalcifications (benign or malignant) • A novel fast fractal modeling method to identify the presence of microcalcifications by fractal modeling of mammograms and then subtracting the modeled image from the original mammogram. The performances of these methods were evaluated using different standard statistical analysis methods. The results obtained indicate that the developed methods are highly beneficial for assisting radiologists in making diagnostic decisions. The mammograms for the study were obtained from the two online databases namely, MIAS (Mammographic Image Analysis Society) and DDSM (Digital Database for Screening Mammography.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Magnetic Resonance Imaging (MRI) is a multi sequence medical imaging technique in which stacks of images are acquired with different tissue contrasts. Simultaneous observation and quantitative analysis of normal brain tissues and small abnormalities from these large numbers of different sequences is a great challenge in clinical applications. Multispectral MRI analysis can simplify the job considerably by combining unlimited number of available co-registered sequences in a single suite. However, poor performance of the multispectral system with conventional image classification and segmentation methods makes it inappropriate for clinical analysis. Recent works in multispectral brain MRI analysis attempted to resolve this issue by improved feature extraction approaches, such as transform based methods, fuzzy approaches, algebraic techniques and so forth. Transform based feature extraction methods like Independent Component Analysis (ICA) and its extensions have been effectively used in recent studies to improve the performance of multispectral brain MRI analysis. However, these global transforms were found to be inefficient and inconsistent in identifying less frequently occurred features like small lesions, from large amount of MR data. The present thesis focuses on the improvement in ICA based feature extraction techniques to enhance the performance of multispectral brain MRI analysis. Methods using spectral clustering and wavelet transforms are proposed to resolve the inefficiency of ICA in identifying small abnormalities, and problems due to ICA over-completeness. Effectiveness of the new methods in brain tissue classification and segmentation is confirmed by a detailed quantitative and qualitative analysis with synthetic and clinical, normal and abnormal, data. In comparison to conventional classification techniques, proposed algorithms provide better performance in classification of normal brain tissues and significant small abnormalities.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The term reliability of an equipment or device is often meant to indicate the probability that it carries out the functions expected of it adequately or without failure and within specified performance limits at a given age for a desired mission time when put to use under the designated application and operating environmental stress. A broad classification of the approaches employed in relation to reliability studies can be made as probabilistic and deterministic, where the main interest in the former is to device tools and methods to identify the random mechanism governing the failure process through a proper statistical frame work, while the latter addresses the question of finding the causes of failure and steps to reduce individual failures thereby enhancing reliability. In the probabilistic attitude to which the present study subscribes to, the concept of life distribution, a mathematical idealisation that describes the failure times, is fundamental and a basic question a reliability analyst has to settle is the form of the life distribution. It is for no other reason that a major share of the literature on the mathematical theory of reliability is focussed on methods of arriving at reasonable models of failure times and in showing the failure patterns that induce such models. The application of the methodology of life time distributions is not confined to the assesment of endurance of equipments and systems only, but ranges over a wide variety of scientific investigations where the word life time may not refer to the length of life in the literal sense, but can be concieved in its most general form as a non-negative random variable. Thus the tools developed in connection with modelling life time data have found applications in other areas of research such as actuarial science, engineering, biomedical sciences, economics, extreme value theory etc.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This is a Named Entity Based Question Answering System for Malayalam Language. Although a vast amount of information is available today in digital form, no effective information access mechanism exists to provide humans with convenient information access. Information Retrieval and Question Answering systems are the two mechanisms available now for information access. Information systems typically return a long list of documents in response to a user’s query which are to be skimmed by the user to determine whether they contain an answer. But a Question Answering System allows the user to state his/her information need as a natural language question and receives most appropriate answer in a word or a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents which will be used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which the question is to be answered. Various Machine Learning methods are used to tag the documents. Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language and relatively of free word order. Also Malayalam has a productive morphology that allows the creation of complex words which are often highly ambiguous. Document tagging tools such as Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter are developed as a part of this research work. No such tools were available for Malayalam language. Finite State Transducer, High Order Conditional Random Field, Artificial Immunity System Principles, and Support Vector Machines are the techniques used for the design of these document preprocessing tools. This research work describes how the Named Entity is used to represent the documents. Single sentence questions are used to test the system. Overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased and also it can be extended to include open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as the future enhancements

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Image processing has been a challenging and multidisciplinary research area since decades with continuing improvements in its various branches especially Medical Imaging. The healthcare industry was very much benefited with the advances in Image Processing techniques for the efficient management of large volumes of clinical data. The popularity and growth of Image Processing field attracts researchers from many disciplines including Computer Science and Medical Science due to its applicability to the real world. In the meantime, Computer Science is becoming an important driving force for the further development of Medical Sciences. The objective of this study is to make use of the basic concepts in Medical Image Processing and develop methods and tools for clinicians’ assistance. This work is motivated from clinical applications of digital mammograms and placental sonograms, and uses real medical images for proposing a method intended to assist radiologists in the diagnostic process. The study consists of two domains of Pattern recognition, Classification and Content Based Retrieval. Mammogram images of breast cancer patients and placental images are used for this study. Cancer is a disaster to human race. The accuracy in characterizing images using simplified user friendly Computer Aided Diagnosis techniques helps radiologists in detecting cancers at an early stage. Breast cancer which accounts for the major cause of cancer death in women can be fully cured if detected at an early stage. Studies relating to placental characteristics and abnormalities are important in foetal monitoring. The diagnostic variability in sonographic examination of placenta can be overlooked by detailed placental texture analysis by focusing on placental grading. The work aims on early breast cancer detection and placental maturity analysis. This dissertation is a stepping stone in combing various application domains of healthcare and technology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Treating e-mail filtering as a binary text classification problem, researchers have applied several statistical learning algorithms to email corpora with promising results. This paper examines the performance of a Naive Bayes classifier using different approaches to feature selection and tokenization on different email corpora

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Underwater target localization and tracking attracts tremendous research interest due to various impediments to the estimation task caused by the noisy ocean environment. This thesis envisages the implementation of a prototype automated system for underwater target localization, tracking and classification using passive listening buoy systems and target identification techniques. An autonomous three buoy system has been developed and field trials have been conducted successfully. Inaccuracies in the localization results, due to changes in the environmental parameters, measurement errors and theoretical approximations are refined using the Kalman filter approach. Simulation studies have been conducted for the tracking of targets with different scenarios even under maneuvering situations. This system can as well be used for classifying the unknown targets by extracting the features of the noise emanations from the targets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Author identification is the problem of identifying the author of an anonymous text or text whose authorship is in doubt from a given set of authors. The works by different authors are strongly distinguished by quantifiable features of the text. This paper deals with the attempts made on identifying the most likely author of a text in Malayalam from a list of authors. Malayalam is a Dravidian language with agglutinative nature and not much successful tools have been developed to extract syntactic & semantic features of texts in this language. We have done a detailed study on the various stylometric features that can be used to form an authors profile and have found that the frequencies of word collocations can be used to clearly distinguish an author in a highly inflectious language such as Malayalam. In our work we try to extract the word level and character level features present in the text for characterizing the style of an author. Our first step was towards creating a profile for each of the candidate authors whose texts were available with us, first from word n-gram frequencies and then by using variable length character n-gram frequencies. Profiles of the set of authors under consideration thus formed, was then compared with the features extracted from anonymous text, to suggest the most likely author.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam translation using statistical models like translation model, language model and a decoder. A parallel corpus of English-Malayalam is used in the training phase. Word to word alignments has to be set up among the sentence pairs of the source and target language before subjecting them for training. This paper is deals with the techniques which can be adopted for improving the alignment model of SMT. Incorporating the parts of speech information into the bilingual corpus has eliminated many of the insignificant alignments. Also identifying the name entities and cognates present in the sentence pairs has proved to be advantageous while setting up the alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in prediction of learning disabilities (LD) of school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in Data mining. Different rules extracted from the decision tree are used for prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD affected child. In this paper, J48 algorithm is used for constructing the decision tree and K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A spectral angle based feature extraction method, Spectral Clustering Independent Component Analysis (SC-ICA), is proposed in this work to improve the brain tissue classification from Magnetic Resonance Images (MRI). SC-ICA provides equal priority to global and local features; thereby it tries to resolve the inefficiency of conventional approaches in abnormal tissue extraction. First, input multispectral MRI is divided into different clusters by a spectral distance based clustering. Then, Independent Component Analysis (ICA) is applied on the clustered data, in conjunction with Support Vector Machines (SVM) for brain tissue analysis. Normal and abnormal datasets, consisting of real and synthetic T1-weighted, T2-weighted and proton density/fluid-attenuated inversion recovery images, were used to evaluate the performance of the new method. Comparative analysis with ICA based SVM and other conventional classifiers established the stability and efficiency of SC-ICA based classification, especially in reproduction of small abnormalities. Clinical abnormal case analysis demonstrated it through the highest Tanimoto Index/accuracy values, 0.75/98.8%, observed against ICA based SVM results, 0.17/96.1%, for reproduced lesions. Experimental results recommend the proposed method as a promising approach in clinical and pathological studies of brain diseases

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we propose a multispectral analysis system using wavelet based Principal Component Analysis (PCA), to improve the brain tissue classification from MRI images. Global transforms like PCA often neglects significant small abnormality details, while dealing with a massive amount of multispectral data. In order to resolve this issue, input dataset is expanded by detail coefficients from multisignal wavelet analysis. Then, PCA is applied on the new dataset to perform feature analysis. Finally, an unsupervised classification with Fuzzy C-Means clustering algorithm is used to measure the improvement in reproducibility and accuracy of the results. A detailed comparative analysis of classified tissues with those from conventional PCA is also carried out. Proposed method yielded good improvement in classification of small abnormalities with high sensitivity/accuracy values, 98.9/98.3, for clinical analysis. Experimental results from synthetic and clinical data recommend the new method as a promising approach in brain tissue analysis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Multispectral analysis is a promising approach in tissue classification and abnormality detection from Magnetic Resonance (MR) images. But instability in accuracy and reproducibility of the classification results from conventional techniques keeps it far from clinical applications. Recent studies proposed Independent Component Analysis (ICA) as an effective method for source signals separation from multispectral MR data. However, it often fails to extract the local features like small abnormalities, especially from dependent real data. A multisignal wavelet analysis prior to ICA is proposed in this work to resolve these issues. Best de-correlated detail coefficients are combined with input images to give better classification results. Performance improvement of the proposed method over conventional ICA is effectively demonstrated by segmentation and classification using k-means clustering. Experimental results from synthetic and real data strongly confirm the positive effect of the new method with an improved Tanimoto index/Sensitivity values, 0.884/93.605, for reproduced small white matter lesions