21 results for Discriminant
at Indian Institute of Science - Bangalore - India
Abstract:
The problem of learning correct decision rules to minimize the probability of misclassification is a long-standing problem of supervised learning in pattern recognition. The problem of learning such optimal discriminant functions is considered for the class of problems where the statistical properties of the pattern classes are completely unknown. The problem is posed as a game with common payoff played by a team of mutually cooperating learning automata. This essentially results in a probabilistic search through the space of classifiers. The approach is inherently capable of learning discriminant functions that are nonlinear in their parameters also. A learning algorithm is presented for the team and convergence is established. It is proved that the team can obtain the optimal classifier to an arbitrary approximation. Simulation results with a few examples are presented where the team learns the optimal classifier.
Abstract:
Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach, we evaluated the optimal number of species and calling song characteristics for both methods that lead to the most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification.
Abstract:
We present two discriminative language modelling techniques for a Lempel-Ziv-Welch (LZW) based language identification (LID) system. The previous approach to LID using the LZW algorithm used the LZW pattern tables directly for language modelling. However, since the patterns in one language's pattern table are shared by other language pattern tables, confusability prevailed in the LID task. To overcome this, we present two pruning techniques: (i) Language Specific (LS-LZW), in which patterns common to more than one pattern table are removed; and (ii) Length-Frequency product based (LF-LZW), in which patterns having their length-frequency product below a threshold are removed. These approaches reduce the classification score (Compression Ratio [LZW-CR] or the weighted discriminant score [LZW-WDS]) for non-native languages and increase the LID performance considerably. Also, the memory and computational requirements of these techniques are much lower than those of the basic LZW techniques.
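The two pruning rules named in this abstract can be illustrated with a minimal sketch. It assumes each language's pattern table is a dictionary mapping a pattern (a tuple of tokens) to its frequency; the table layout, the function names and the threshold value are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import Counter

def prune_language_specific(tables):
    """LS-style pruning: drop patterns that occur in more than one table."""
    # Count in how many language tables each pattern appears.
    table_count = Counter(p for table in tables.values() for p in table)
    return {
        lang: {p: f for p, f in table.items() if table_count[p] == 1}
        for lang, table in tables.items()
    }

def prune_length_frequency(table, threshold):
    """LF-style pruning: drop patterns whose length * frequency is below threshold."""
    return {p: f for p, f in table.items() if len(p) * f >= threshold}

# Toy pattern tables (assumed data, for illustration only).
tables = {
    "lang_a": {("a", "b"): 5, ("b", "c"): 1, ("a", "b", "c"): 2},
    "lang_b": {("a", "b"): 3, ("c", "d"): 4},
}
ls_tables = prune_language_specific(tables)
lf_table = prune_length_frequency(tables["lang_a"], threshold=4)
```

In this toy data the pattern `("a", "b")` is shared by both tables, so the language-specific rule removes it from both, while the length-frequency rule keeps it in `lang_a` because its product is well above the assumed threshold.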
Abstract:
Sensory nerve action potentials (SNAPs) and compound nerve action potentials (CNAPs) were recorded from 25 normal subjects and 21 hanseniasis patients following electrical stimulation of the median nerve at the wrist. The various nerve conduction parameters from the affected nerves of the patients were compared with those from the clinically normal nerves of patients as well as data from healthy individuals. Analysis of the data and clinical correlation studies indicate the suitability of amplitudes of the SNAPs and CNAPs rather than the nerve conduction velocities in better characterizing the neuropathy of the patients. Significantly reduced amplitudes of responses from clinically unaffected nerves of patients indicate an early stage of neuropathy, thus being of predictive value. Further, a discriminant classifier, trained on data from clinically affected nerves of patients, classified most of the data from clinically unaffected nerves of patients as abnormal. This indicates that clinical neurophysiological studies can reveal leprous neuropathy much before it becomes clinically evident by means of sensory or motor loss. A discriminant score involving only the parameters of motor threshold, amplitude of digit potential and palm nerve conduction velocity is able to classify almost all of the normal and abnormal responses. The authors hope that further confirmative studies might ultimately lead to the use of the study of distal sensory conduction in the upper limbs in possible screening of a population exposed to Mycobacterium leprae. On the other hand, misclassification of a normal person occurred, which suggests that further refinement of the methods is necessary in order to facilitate wider use of the methods under field conditions.
Abstract:
We report a hierarchical blind script identifier for 11 different Indian scripts. An initial grouping of the 11 scripts is accomplished at the first level of this hierarchy. At the subsequent level, we recognize the script in each group. The various nodes of this tree use different feature-classifier combinations. A database of 20,000 words of different font styles and sizes is collected and used for each script. The effectiveness of Gabor and Discrete Cosine Transform features has been independently evaluated using nearest neighbor, linear discriminant, and support vector machine classifiers. The minimum and maximum accuracies obtained using this hierarchical mechanism are 92.2% and 97.6%, respectively.
Abstract:
Queens and workers are not morphologically differentiated in the primitively eusocial wasp, Ropalidia marginata. Upon removal of the queen, one of the workers becomes extremely aggressive, but immediately drops her aggression if the queen is returned. If the queen is not returned, this hyper-aggressive individual, the potential queen (PQ), will develop her ovaries, lose her hyper-aggression, and become the next colony queen. Because of the non-aggressive nature of the queen, and because the PQ loses her aggression by the time she starts laying eggs, we hypothesized that regulation of worker reproduction in R. marginata is mediated by pheromones rather than by physical aggression. Based on the immediate loss of aggression by the PQ upon return of the queen, we developed a bioassay to test whether the queen's Dufour's gland is at least one of the sources of the queen pheromone. Macerates of the queen's Dufour's gland, but not that of the worker's Dufour's gland, mimic the queen in making the PQ decrease her aggression. We also correctly distinguished queens and workers of R. marginata nests by a discriminant function analysis based on the chemical composition of their respective Dufour's glands.
Abstract:
Separation of printed text blocks from the non-text areas, containing signatures, handwritten text, logos and other such symbols, is a necessary first step for an OCR involving printed text recognition. In the present work, we compare the efficacy of some feature-classifier combinations to carry out this separation task. We have selected the length-normalized horizontal projection profile (HPP) as the starting point of such a separation task. This is with the assumption that the printed text blocks contain lines of text which generate HPPs with some regularity. Such an assumption is demonstrated to be valid. Our features are the HPP and its two transformed versions, namely, eigen and Fisher profiles. Four well-known classifiers, namely, nearest neighbor, linear discriminant function, SVMs and artificial neural networks, have been considered, and the efficiency of the combination of these classifiers with the above features is compared. A sequential floating feature selection technique has been adopted to enhance the efficiency of this separation task. The results give an average accuracy of about 96.
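A horizontal projection profile can be sketched in a few lines. The version below assumes a block is a binary image given as a list of rows (1 for ink, 0 for background) and assumes that "length-normalized" means resampling the profile to a fixed number of values by linear interpolation; the abstract does not spell out the normalization, so this is only one plausible reading, and the function name and `n_bins` parameter are hypothetical.

```python
def horizontal_projection_profile(block, n_bins=8):
    """Row-wise ink counts of a binary block, resampled to n_bins values."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    profile = [sum(row) for row in block]      # ink pixels in each row
    if len(profile) == 1:
        return [float(profile[0])] * n_bins
    out = []
    for k in range(n_bins):
        # Position of output sample k on the original row axis.
        pos = k * (len(profile) - 1) / (n_bins - 1)
        lo = int(pos)
        hi = min(lo + 1, len(profile) - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring rows.
        out.append((1 - frac) * profile[lo] + frac * profile[hi])
    return out

# Toy block: text rows alternate with blank rows (assumed data).
block = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
hpp = horizontal_projection_profile(block, n_bins=4)
```

The alternating high and low values in `hpp` are the kind of regularity that the abstract attributes to printed text lines.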
Abstract:
While plants of a single species emit a diversity of volatile organic compounds (VOCs) to attract or repel interacting organisms, these specific messages may be lost in the midst of the hundreds of VOCs produced by sympatric plants of different species, many of which may have no signal content. Receivers must be able to reduce the babel or noise in these VOCs in order to correctly identify the message. For chemical ecologists faced with vast amounts of data on volatile signatures of plants in different ecological contexts, it is imperative to employ accurate methods of classifying messages, so that suitable bioassays may then be designed to understand message content. We demonstrate the utility of `Random Forests' (RF), a machine-learning algorithm, for the task of classifying volatile signatures and choosing the minimum set of volatiles for accurate discrimination, using data from sympatric Ficus species as a case study. We demonstrate the advantages of RF over conventional classification methods such as principal component analysis (PCA), as well as data-mining algorithms such as support vector machines (SVM), diagonal linear discriminant analysis (DLDA) and k-nearest neighbour (KNN) analysis. We show why a tree-building method such as RF, which is increasingly being used by the bioinformatics, food technology and medical community, is particularly advantageous for the study of plant communication using volatiles, dealing, as it must, with abundant noise.
Abstract:
A new scheme is proposed for the detection of premature ventricular beats, which is a vital function in rhythm monitoring of cardiac patients. A transformation based on the first difference of the digitized electrocardiogram (ECG) signal is developed for the detection and delineation of QRS complexes. The method for classifying the abnormal complexes from the normal ones is based on the concepts of minimum phase and signal length. The parameters of a linear discriminant function obtained from a training feature vector set are used to classify the complexes. Results of application of the scheme to ECG of two arrhythmia patients are presented.
Abstract:
Unlike queens of typical primitively eusocial species, Ropalidia marginata queens are docile and non-interactive, and hence cannot be using dominance to maintain their status. It appears that the queen maintains reproductive monopoly through a pheromone, of which the Dufour's gland is at least one source. Here, we reconfirm earlier results showing that queens and workers can be correctly classified on a discriminant function using the compositions of their respective Dufour's glands, and also demonstrate consistent queen-worker differences based on categories of compounds and on single compounds also in some cases. Since the queen pheromone is expected to be an honest signal of the fecundity of a queen, we investigate the correlation of Dufour's gland compounds with ovarian activation of queens. Our study shows that Dufour's gland compounds in R. marginata correlate with the state of ovarian activation of queens, suggesting that such compounds may portray the fecundity of a queen, and may indeed function as honest signals of fertility.
Abstract:
Queens of the primitively eusocial wasp Ropalidia marginata appear to maintain reproductive monopoly through pheromone rather than through physical aggression. Upon queen removal, one of the workers (potential queen, PQ) becomes extremely aggressive but drops her aggression immediately upon returning the queen. If the queen is not returned, the PQ gradually drops her aggression and becomes the next queen of the colony. In a previous study, the Dufour's gland was found to be at least one source of the queen pheromone. Queen-worker classification could be done with 100% accuracy in a discriminant analysis, using the compositions of their respective Dufour's glands. In a bioassay, the PQ dropped her aggression in response to the queen's Dufour's gland macerate, suggesting that the queen's Dufour's gland contents mimicked the queen herself. In the present study, we found that the PQ also dropped her aggression in response to the macerate of a foreign queen's Dufour's gland. This suggests that the queen signal is perceived across colonies. This also suggests that the Dufour's gland in R. marginata does not contain information about nestmateship, because queens are attacked when introduced into foreign colonies, and hence PQ is not expected to reduce her aggression in response to a foreign queen's signal. The latter conclusion is especially significant because the Dufour's gland chemicals are adequate to classify individuals correctly not only on the basis of fertility status (queen versus worker) but also according to their colony membership, using discriminant analysis. This leads to the additional conclusion (and precaution) that the ability to statistically discriminate organisms using their chemical profiles does not necessarily imply that the organisms themselves can make such discrimination. (C) 2010 Elsevier Ltd. All rights reserved.
Abstract:
Models for electricity planning require inclusion of demand. Depending on the type of planning, the demand is usually represented as an annual demand for electricity (GWh), a peak demand (MW) or in the form of annual load-duration curves. The demand for electricity varies with the seasons, economic activities, etc. Existing schemes do not capture the dynamics of demand variations that are important for planning. For this purpose, we introduce the concept of representative load curves (RLCs). Advantages of RLCs are demonstrated in a case study for the state of Karnataka in India. Multiple discriminant analysis is used to cluster the 365 daily load curves for 1993-94 into nine RLCs. Further analyses of these RLCs help to identify important factors, namely, seasonal, industrial, agricultural, and residential (water heating and air-cooling) demand variations besides rationing by the utility. (C) 1999 Elsevier Science Ltd. All rights reserved.
Abstract:
We present an improved language modeling technique for a Lempel-Ziv-Welch (LZW) based LID scheme. The previous approach to LID using the LZW algorithm prepares the language pattern table using the LZW algorithm. Because of the sequential nature of the LZW algorithm, several language-specific patterns of the language were missing from the pattern table. To overcome this, we build a universal pattern table, which contains all patterns of different lengths. For each language, its corresponding language-specific pattern table is constructed by retaining the patterns of the universal table whose frequency of appearance in the training data is above the threshold. This approach reduces the classification score (Compression Ratio [LZW-CR] or the weighted discriminant score [LZW-WDS]) for non-native languages and increases the LID performance considerably.
Abstract:
We present a new approach to spoken language modeling for language identification (LID) using the Lempel-Ziv-Welch (LZW) algorithm. The LZW technique is applicable to any kind of tokenization of the speech signal. Because of the efficiency of the LZW algorithm in obtaining variable-length symbol strings from the training data, the LZW codebook captures the essentials of a language effectively. We develop two new deterministic measures for LID based on the LZW algorithm, namely: (i) compression ratio score (LZW-CR) and (ii) weighted discriminant score (LZW-WDS). To assess these measures, we consider error-free tokenization of speech as well as artificially induced noise in the tokenization. It is shown that for a 6-language LID task on the OGI-TS database with clean tokenization, the new model (LZW-WDS) performs slightly better than the conventional bigram model. For noisy tokenization, which is the more realistic case, LZW-WDS significantly outperforms the bigram technique.
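The idea of scoring a test utterance by how well a language's LZW codebook compresses it can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's definition of LZW-CR: tokens are single characters, the codebook is held fixed at test time, parsing is greedy longest-match, and the score is taken to be input tokens per emitted code. The function names and toy training strings are hypothetical.

```python
def build_lzw_table(tokens):
    """Collect the variable-length patterns LZW adds while parsing tokens."""
    table = {(t,) for t in tokens}       # start from all single tokens
    current = ()
    for t in tokens:
        candidate = current + (t,)
        if candidate in table:
            current = candidate          # keep extending the known pattern
        else:
            table.add(candidate)         # new pattern enters the table
            current = (t,)
    return table

def compression_ratio(tokens, table):
    """Greedy longest-match parse with a fixed table: tokens per emitted code."""
    max_len = max(len(p) for p in table)
    i, codes = 0, 0
    while i < len(tokens):
        # Try the longest pattern first; a single token is always emitted
        # as a fallback, even if it was never seen in training.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if n == 1 or tuple(tokens[i:i + n]) in table:
                i += n
                codes += 1
                break
    return len(tokens) / codes           # assumes a non-empty test sequence

# Toy training data per language (assumed, for illustration only).
train = {"lang_a": list("abababab"), "lang_b": list("bbbbbbbb")}
tables = {lang: build_lzw_table(seq) for lang, seq in train.items()}

test_tokens = list("abab")
scores = {lang: compression_ratio(test_tokens, tbl) for lang, tbl in tables.items()}
best = max(scores, key=scores.get)       # language whose table compresses best
```

Under these assumptions a test sequence is attributed to the language whose table yields the highest ratio, which matches the abstracts' remark that the classification score should be lower for non-native languages.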
Abstract:
In this paper, we give a brief review of pattern classification algorithms based on discriminant analysis. We then apply these algorithms to classify movement direction based on multivariate local field potentials recorded from a microelectrode array in the primary motor cortex of a monkey performing a reaching task. We obtain prediction accuracies between 55% and 90% using different methods, which are significantly above the chance level of 12.5%.
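For readers unfamiliar with discriminant analysis, the sketch below shows the textbook two-class Fisher linear discriminant on 2-D features. It is not one of the specific methods evaluated in the paper, which addresses a multi-class problem on multivariate recordings; the helper names and toy data are hypothetical, and the within-class scatter matrix is assumed to be invertible.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    """2x2 scatter matrix of points around the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_fisher(class0, class1):
    """Return (w, b) such that w.x + b > 0 predicts class 1."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter: sum of the per-class scatter matrices.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]   # assumed non-zero
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Fisher direction: w = Sw^{-1} (m1 - m0).
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Place the decision boundary at the midpoint of the class means.
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Toy, well-separated classes (assumed data).
class0 = [(0, 0), (1, 0), (0, 1), (1, 1)]
class1 = [(3, 3), (4, 3), (3, 4), (4, 4)]
w, b = fit_fisher(class0, class1)
```

A multi-class task such as the one in the abstract would need either one discriminant per class or a multi-class formulation; the two-class case is shown only because it makes the projection and threshold easy to follow.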