24 results for discriminant analysis and cluster analysis
at Indian Institute of Science - Bangalore - India
Abstract:
Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. 
Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification.
Abstract:
Queens of the primitively eusocial wasp Ropalidia marginata appear to maintain reproductive monopoly through pheromone rather than through physical aggression. Upon queen removal, one of the workers (potential queen, PQ) becomes extremely aggressive but drops her aggression immediately upon returning the queen. If the queen is not returned, the PQ gradually drops her aggression and becomes the next queen of the colony. In a previous study, the Dufour's gland was found to be at least one source of the queen pheromone. Queen-worker classification could be done with 100% accuracy in a discriminant analysis, using the compositions of their respective Dufour's glands. In a bioassay, the PQ dropped her aggression in response to the queen's Dufour's gland macerate, suggesting that the queen's Dufour's gland contents mimicked the queen herself. In the present study, we found that the PQ also dropped her aggression in response to the macerate of a foreign queen's Dufour's gland. This suggests that the queen signal is perceived across colonies. This also suggests that the Dufour's gland in R. marginata does not contain information about nestmateship, because queens are attacked when introduced into foreign colonies, and hence PQ is not expected to reduce her aggression in response to a foreign queen's signal. The latter conclusion is especially significant because the Dufour's gland chemicals are adequate to classify individuals correctly not only on the basis of fertility status (queen versus worker) but also according to their colony membership, using discriminant analysis. This leads to the additional conclusion (and precaution) that the ability to statistically discriminate organisms using their chemical profiles does not necessarily imply that the organisms themselves can make such discrimination. (C) 2010 Elsevier Ltd. All rights reserved.
Abstract:
Models for electricity planning require inclusion of demand. Depending on the type of planning, the demand is usually represented as an annual demand for electricity (GWh), a peak demand (MW) or in the form of annual load-duration curves. The demand for electricity varies with the seasons, economic activities, etc. Existing schemes do not capture the dynamics of demand variations that are important for planning. For this purpose, we introduce the concept of representative load curves (RLCs). Advantages of RLCs are demonstrated in a case study for the state of Karnataka in India. Multiple discriminant analysis is used to cluster the 365 daily load curves for 1993-94 into nine RLCs. Further analyses of these RLCs help to identify important factors, namely, seasonal, industrial, agricultural, and residential (water heating and air-cooling) demand variations besides rationing by the utility. (C) 1999 Elsevier Science Ltd. All rights reserved.
Abstract:
Myopathies are among the major causes of mortality in the world. There is no complete cure for this heterogeneous group of diseases, but a sensitive, specific, and fast diagnostic tool may improve therapy effectiveness. In this study, Raman spectroscopy is applied to discriminate between muscle mutants in Drosophila on the basis of associated changes at the molecular level. Raman spectra were collected from indirect flight muscles of the mutants upheld1 (up1), heldup2 (hdp2), myosin heavy chain7 (Mhc7), actin88F(KM88) (Act88F(KM88)), and upheld101 (up101), and of the Canton-S (CS) control group, for both 2- and 12-day-old flies. Difference spectra (mutant minus control) of all the mutants showed an increase in nucleic acid and beta-sheet and/or random coil protein content along with a decrease in alpha-helix protein. Interestingly, the 12th day samples of up1 and Act88F(KM88) showed significantly higher levels of glycogen and carotenoids than CS. A principal components based linear discriminant analysis classification model was developed based on multidimensional Raman spectra, which classified the mutants according to their pathophysiology and yielded an overall accuracy of 97% and 93% for 2- and 12-day-old flies, respectively. The up1 and Act88F(KM88) (nemaline-myopathy) mutants form a group that is clearly separated in a linear discriminant plane from up101 and hdp2 (cardiomyopathy) mutants. Notably, Raman spectra from a human sample with nemaline-myopathy formed a cluster with the corresponding Drosophila mutant (up1). In conclusion, this is the first demonstration in which myopathies, despite their heterogeneity, were screened on the basis of biochemical differences using Raman spectroscopy.
Abstract:
The DNA polymorphism among 22 isolates of Sclerospora graminicola, the causal agent of downy mildew disease of pearl millet, was assessed using 20 inter simple sequence repeat (ISSR) primers. The objective of the study was to examine the effectiveness of using ISSR markers for unravelling the extent and pattern of genetic diversity in 22 S. graminicola isolates collected from different host cultivars in different states of India. The 19 functional ISSR primers generated 410 polymorphic bands, revealed 89% polymorphism, and were able to distinguish all 22 isolates. Polymorphic bands, used to construct an unweighted pair group method of averages (UPGMA) dendrogram based on Jaccard's coefficient of similarity and a principal coordinate analysis, resulted in the formation of four major clusters of the 22 isolates. The standardized Nei genetic distance among the 22 isolates ranged from 0.0050 to 0.0206. The UPGMA clustering using the standardized genetic distance matrix resulted in the identification of four clusters of the 22 isolates with bootstrap values ranging from 15 to 100. The 3D-scale data supported the UPGMA results, which resulted in four clusters amounting to 70% variation among each other. However, a comparison of the two methods shows that the sub-clustering given by the dendrogram and by the multidimensional scaling plot differs slightly. All the S. graminicola isolates had distinct ISSR genotypes and cluster analysis origin. The ISSR fingerprints revealed a significant level of genetic diversity among the isolates and indicated that ISSR markers could be a powerful tool for fingerprinting and diversity analysis in fungal pathogens.
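The dendrogram construction described in the abstract above (Jaccard similarity on band presence, followed by UPGMA) can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' pipeline: the isolate names and band profiles are made up, and the functions `jaccard_similarity` and `upgma` are hypothetical helpers written for this example.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard coefficient for two binary band profiles (1 = band present)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 1.0


def upgma(profiles):
    """Average-linkage (UPGMA) clustering on 1 - Jaccard distances.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    names = list(profiles)
    dist = {
        frozenset((p, q)): 1.0 - jaccard_similarity(profiles[p], profiles[q])
        for p, q in combinations(names, 2)
    }
    clusters = [(name,) for name in names]
    merges = []

    def avg_dist(c1, c2):
        # UPGMA distance: mean over all between-cluster isolate pairs.
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical band-presence data for four made-up isolates (not from the study).
toy_profiles = {
    "iso_A": [1, 1, 0, 0, 1, 0],
    "iso_B": [1, 1, 0, 0, 1, 1],
    "iso_C": [0, 0, 1, 1, 0, 1],
    "iso_D": [0, 0, 1, 1, 0, 0],
}
toy_merges = upgma(toy_profiles)
```

Each entry of `toy_merges` records one join of the dendrogram; bootstrap support, as reported in the abstract, would require resampling the band columns and is not shown here.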
Abstract:
While plants of a single species emit a diversity of volatile organic compounds (VOCs) to attract or repel interacting organisms, these specific messages may be lost in the midst of the hundreds of VOCs produced by sympatric plants of different species, many of which may have no signal content. Receivers must be able to reduce the babel or noise in these VOCs in order to correctly identify the message. For chemical ecologists faced with vast amounts of data on volatile signatures of plants in different ecological contexts, it is imperative to employ accurate methods of classifying messages, so that suitable bioassays may then be designed to understand message content. We demonstrate the utility of 'Random Forests' (RF), a machine-learning algorithm, for the task of classifying volatile signatures and choosing the minimum set of volatiles for accurate discrimination, using data from sympatric Ficus species as a case study. We demonstrate the advantages of RF over conventional classification methods such as principal component analysis (PCA), as well as data-mining algorithms such as support vector machines (SVM), diagonal linear discriminant analysis (DLDA) and k-nearest neighbour (KNN) analysis. We show why a tree-building method such as RF, which is increasingly being used by the bioinformatics, food technology and medical communities, is particularly advantageous for the study of plant communication using volatiles, dealing, as it must, with abundant noise.
Abstract:
In this paper, we give a brief review of pattern classification algorithms based on discriminant analysis. We then apply these algorithms to classify movement direction based on multivariate local field potentials recorded from a microelectrode array in the primary motor cortex of a monkey performing a reaching task. We obtain prediction accuracies between 55% and 90% using different methods which are significantly above the chance level of 12.5%.
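A chance level of 12.5% is what one would expect from guessing among eight equally likely classes (1/8); the number of classes is an inference from that figure, not something the abstract states. The sketch below shows how such a chance level and a one-sided binomial test against it could be computed. The trial counts are hypothetical, and `p_value_above_chance` is an illustrative helper, not the test used in the paper.

```python
from math import comb


def chance_level(n_classes):
    """Expected accuracy when guessing uniformly among n_classes labels."""
    return 1.0 / n_classes


def p_value_above_chance(n_correct, n_trials, n_classes):
    """One-sided binomial tail: P(X >= n_correct) under random guessing."""
    p = chance_level(n_classes)
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical example: 55 correct predictions out of 100 trials, 8 classes.
example_p = p_value_above_chance(55, 100, 8)
```

With these made-up counts the tail probability is vanishingly small, which is the sense in which an accuracy of 55% can sit far above a 12.5% chance level.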
Abstract:
Myopathies are muscular diseases in which muscle fibers degenerate due to many factors such as nutrient deficiency, infection and mutations in myofibrillar etc. The objective of this study is to identify the bio-markers to distinguish various muscle mutants in Drosophila (fruit fly) using Raman Spectroscopy. Principal Components based Linear Discriminant Analysis (PC-LDA) classification model yielding >95% accuracy was developed to classify such different mutants representing various myopathies according to their physiopathology.
Abstract:
Rice landraces are lineages developed by farmers through artificial selection during the long-term domestication process. Despite huge potential for crop improvement, they are largely understudied in India. Here, we analyse a suite of phenotypic characters from large numbers of Indian landraces comprising both aromatic and non-aromatic varieties. Our primary aim was to investigate the major determinants of diversity, the strength of segregation among aromatic and non-aromatic landraces as well as that within aromatic landraces. Using principal component analysis, we found that grain length, width and weight, panicle weight and leaf length make the most substantial contributions. Discriminant analysis can effectively distinguish the majority of aromatic from non-aromatic landraces. More interestingly, within aromatic landraces long-grain traditional Basmati and short-grain non-Basmati aromatics remain morphologically well differentiated. The present research emphasizes the general patterns of phenotypic diversity and identifies the most important characters. It also confirms the existence of unique short-grain aromatic landraces, perhaps carrying signatures of independent origin of an additional aroma quantitative trait locus in the indica group, unlike introgression of specific alleles of the BADH2 gene from the japonica group as in Basmati. We presume that this parallel origin and evolution of aroma in short-grain indica landraces are linked to the long history of rice domestication that involved inheritance of several traits from Oryza nivara, in addition to O. rufipogon. We conclude with a note that the insights from the phenotypic analysis essentially comprise the first part, which will likely be validated with subsequent molecular analysis.
Abstract:
Classical and non-classical isomers of both neutral and dianionic BC2P2H3 species, which are isolobal to Cp+ and Cp-, are studied at both B3LYP/6-311++G(d,p) and G3B3 levels of theory. The global minimum structure given by B3LYP/6-311++G(d,p) for BC2P2H3 is based on a vinylcyclopropenyl-type structure, whereas BC2P2H3(2-) has a planar aromatic cyclopentadienyl-ion-like structure. However, at the G3B3 level, there are three low-energy isomers for BC2P2H3: 1) tricyclopentane, 2) nido and 3) vinylcyclopropenyl-type structures, all within 1.7 kcal mol(-1) of each other. On the contrary, for the dianionic species the cyclic planar structure is still the minimum. In comparison to the isolobal Cp+ and HnCnP(5-n)+ isomers, BC2P2H3 shows a competition between pi-delocalised vinylcyclopropenyl- and cluster-type structures (nido and tricyclopentane). Substitution of H on C by tBu, and H on B by Ph, in BC2P2H3 increases the energy difference between the low-lying isomers, giving the lowest energy structure as a tricyclopentane type. Similar substitution in BC2P2H3(2-) merely favours different positional isomers of the cyclic planar geometry, as observed in 1) isoelectronic neutral heterodiphospholes EtBu2C2P2 (E=S, Se, Te), 2) monoanionic heterophospholyl rings EtBu2C2P2 (E=P-, As-, Sb-) and 3) polyphospholyl ring anions tBu(5-n)C(n)P(5-n) (n=0-5). The principal factors that affect the stability of three-, four-, and five-membered ring and acyclic geometrical and positional isomers of neutral and dianionic BC2P2H3 isomers appear to be: 1) relative bond strengths, 2) availability of electrons for the empty 2p boron orbital and 3) steric effects of the tBu groups in the HBC(2)P(2)tBu(2) systems.
Abstract:
Core-level binding energies of the component metals in bimetallic clusters of various compositions in the Ni-Cu, Au-Ag, Ni-Pd, and Cu-Pd systems have been measured as functions of coverage or cluster size, after having characterized the clusters with respect to sizes and compositions. The core-level binding energy shifts, relative to the bulk metals, at large coverages or cluster sizes, Delta E(a), are found to be identical to those of bulk alloys. By subtracting the Delta E(a) values from the observed binding energy shifts, Delta E, we obtain the shifts, Delta E(c), due to cluster size. The Delta E(c) values in all the alloy systems increase with the decrease in cluster size. These results establish the additivity of the binding energy shifts due to alloying and cluster size effects in bimetallic clusters.
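The decomposition used in the abstract above amounts to Delta E = Delta E(a) + Delta E(c), so the cluster-size contribution is obtained by subtraction. A minimal sketch of that arithmetic follows; the numerical values are purely hypothetical placeholders, not measurements from the study, and `cluster_size_shift` is an illustrative name.

```python
def cluster_size_shift(delta_e_observed, delta_e_alloy):
    """Return Delta E(c) = Delta E - Delta E(a); all quantities in eV."""
    return delta_e_observed - delta_e_alloy


# Hypothetical numbers: one alloying shift, observed shifts at three coverages.
delta_e_alloy = 0.3                          # Delta E(a), large-coverage limit
observed_by_coverage = [0.9, 0.6, 0.3]       # Delta E, smallest cluster first
size_shifts = [cluster_size_shift(de, delta_e_alloy) for de in observed_by_coverage]
```

In this toy series the size-dependent part shrinks to zero as coverage grows, mirroring the stated trend that Delta E(c) increases as cluster size decreases.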
Abstract:
This paper considers the problem of identifying the footprints of communication of multiple transmitters in a given geographical area. To do this, a number of sensors are deployed at arbitrary but known locations in the area, and their individual decisions regarding the presence or absence of the transmitters' signal are combined at a fusion center to reconstruct the spatial spectral usage map. One straightforward scheme to construct this map is to query each of the sensors and cluster the sensors that detect the primary's signal. However, using the fact that a typical transmitter footprint map is a sparse image, two novel compressive sensing based schemes are proposed, which require significantly fewer transmissions compared to the querying scheme. A key feature of the proposed schemes is that the measurement matrix is constructed from a pseudo-random binary phase shift applied to the decision of each sensor prior to transmission. The measurement matrix is thus a binary ensemble which satisfies the restricted isometry property. The number of measurements needed for accurate footprint reconstruction is determined using compressive sampling theory. The three schemes are compared through simulations in terms of a performance measure that quantifies the accuracy of the reconstructed spatial spectral usage map. It is found that the proposed sparse reconstruction technique-based schemes significantly outperform the round-robin scheme.
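The measurement step described above, in which each sensor's binary decision is multiplied by a pseudo-random binary phase (+1 or -1) and the results are summed, can be illustrated as below. This is only a sketch of forming compressed measurements y = Phi x. The sensor count, number of measurements, seed, and function names are assumptions made for the example, and the sparse reconstruction step of the paper is not shown.

```python
import random


def binary_measurement_matrix(n_measurements, n_sensors, seed=0):
    """Pseudo-random +/-1 matrix: one row per measurement, one column per sensor."""
    rng = random.Random(seed)
    return [
        [rng.choice((-1, 1)) for _ in range(n_sensors)]
        for _ in range(n_measurements)
    ]


def compress(decisions, matrix):
    """Compressed measurements: each is the phase-weighted sum of sensor decisions."""
    return [sum(phi * d for phi, d in zip(row, decisions)) for row in matrix]


# Hypothetical setup: 20 sensors, of which 3 detect a signal (a sparse vector).
n_sensors = 20
decisions = [0] * n_sensors
for idx in (2, 7, 11):
    decisions[idx] = 1

phi = binary_measurement_matrix(n_measurements=8, n_sensors=n_sensors, seed=42)
measurements = compress(decisions, phi)
```

Here 8 measurements stand in for 20 individual queries; how many measurements are actually sufficient is the question the abstract says is settled by compressive sampling theory.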
Abstract:
We report the results of Monte Carlo simulation of the phase diagram and oxygen ordering in YBa2Cu3O6+x for low intra-sublattice repulsion. At low temperatures, apart from tetragonal (T), orthorhombic (OI) and 'double cell' ortho II phases, there is evidence for two additional orthorhombic phases labelled here as OIBAR and OIII. At high temperatures, there was no evidence for the decomposition of the OI phase into the T and OI phases. We find qualitative agreement with experimental observations and cluster-variation method results.
Abstract:
We propose a new approach to clustering. Our idea is to map cluster formation to coalition formation in cooperative games, and to use the Shapley value of the patterns to identify clusters and cluster representatives. We show that the underlying game is convex and this leads to an efficient biobjective clustering algorithm that we call BiGC. The algorithm yields high-quality clustering with respect to average point-to-center distance (potential) as well as average intracluster point-to-point distance (scatter). We demonstrate the superiority of BiGC over state-of-the-art clustering algorithms (including the center based and the multiobjective techniques) through a detailed experimentation using standard cluster validity criteria on several benchmark data sets. We also show that BiGC satisfies key clustering properties such as order independence, scale invariance, and richness.
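To make the Shapley-value idea concrete, the sketch below computes exact Shapley values for a small cooperative game over data points. The characteristic function used here (the total pairwise similarity inside a coalition) is an assumption chosen for illustration and is not necessarily the game defined in the paper. The similarity matrix is made up, and this is not the BiGC algorithm itself.

```python
from itertools import permutations
from math import factorial


def coalition_value(members, similarity):
    """Assumed characteristic function: sum of pairwise similarities in a coalition."""
    members = list(members)
    return sum(
        similarity[i][j]
        for a, i in enumerate(members)
        for j in members[a + 1:]
    )


def shapley_values(n_points, similarity):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    totals = [0.0] * n_points
    for order in permutations(range(n_points)):
        coalition = []
        value_before = 0.0
        for player in order:
            coalition.append(player)
            value_after = coalition_value(coalition, similarity)
            totals[player] += value_after - value_before
            value_before = value_after
    return [t / factorial(n_points) for t in totals]


# Hypothetical symmetric similarity matrix for four points (point 3 is an outlier).
toy_similarity = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.7, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
toy_shapley = shapley_values(4, toy_similarity)
```

For this particular pairwise game with non-negative similarities, each point's Shapley value works out to half the sum of its similarities to all other points, so a closed form can replace the factorial-time enumeration. Points with high values are natural candidates for cluster representatives, while the outlier receives a low value.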