53 resultados para Unsupervised classification


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Specially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or group of pixels in order to perform segmentation tasks. The first important contribution of this paper refers to the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications in the Independent Component Analysis Mixture Model (ICAMM). Such improvements were proposed by considering some of the model limitations as well as by analyzing how it should be improved in order to become more efficient. Moreover, a pre-processing methodology was also proposed, which is based on combining the Sparse Code Shrinkage (SCS) for image denoising and the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied for segmenting images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein. (C) 2008 Published by Elsevier B.V.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Multitemporal Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) imagery was used to assess coastline morphological changes in southeastern Brazil. A spectral linear mixing approach (SLMA) was used to estimate fraction imagery representing amounts of vegetation, clean water (a proxy for shade) and soil. Fraction abundances were related to erosive and depositional features. Shoreline, sandy banks (including emerged and submerged banks) and sand spits were highlighted mainly by clean water and soil fraction imagery. To evaluate changes in the coastline geomorphic features, the fraction imagery generated for each data set was classified in a contextual approach using a segmentation technique and ISOSEG, an unsupervised classification. Evaluation of the classifications was performed visually and by an error matrix relating ground-truth data to classification results. Comparison of the classification results revealed an intense transformation in the coastline, and that erosive and depositional features are extremely dynamic and subject to change in short periods of time.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Microarray gene expression profiling is a high-throughput system used to identify differentially expressed genes and regulation patterns, and to discover new tumor markers. As the molecular pathogenesis of meningiomas and schwannomas, characterized by NF2 gene alterations, remains unclear and suitable molecular targets need to be identified, we used low density cDNA microarrays to establish expression patterns of 96 cancer-related genes on 23 schwannomas, 42 meningiomas and 3 normal cerebral meninges. We also performed a mutational analysis of the NF2 gene (PCR, dHPLC, Sequencing and MLPA), a search for 22q LOH and an analysis of gene silencing by promoter hypermethylation (MS-MLPA). Results showed a high frequency of NF2 gene mutations (40%), increased 22q LOH as aggressiveness increased, frequent losses and gains by MLPA in benign meningiomas, and gene expression silencing by hypermethylation. Array analysis showed decreased expression of 7 genes in meningiomas. Unsupervised analyses identified 2 molecular subgroups for both meningiomas and schwannomas showing 38 and 20 differentially expressed genes, respectively, and 19 genes differentially expressed between the two tumor types. These findings provide a molecular subgroup classification for meningiomas and schwannomas with possible implications for clinical practice.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Saving our science from ourselves: the plight of biological classification. Biological classification ( nomenclature, taxonomy, and systematics) is being sold short. The desire for new technologies, faster and cheaper taxonomic descriptions, identifications, and revisions is symptomatic of a lack of appreciation and understanding of classification. The problem of gadget-driven science, a lack of best practice and the inability to accept classification as a descriptive and empirical science are discussed. The worst cases scenario is a future in which classifications are purely artificial and uninformative.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contaminations in laboratorial samples. It is the case of gene expression data, where the equipments and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

PURPOSE: The main goal of this study was to develop and compare two different techniques for classification of specific types of corneal shapes when Zernike coefficients are used as inputs. A feed-forward artificial Neural Network (NN) and discriminant analysis (DA) techniques were used. METHODS: The inputs both for the NN and DA were the first 15 standard Zernike coefficients for 80 previously classified corneal elevation data files from an Eyesys System 2000 Videokeratograph (VK), installed at the Departamento de Oftalmologia of the Escola Paulista de Medicina, São Paulo. The NN had 5 output neurons which were associated with 5 typical corneal shapes: keratoconus, with-the-rule astigmatism, against-the-rule astigmatism, "regular" or "normal" shape and post-PRK. RESULTS: The NN and DA responses were statistically analyzed in terms of precision ([true positive+true negative]/total number of cases). Mean overall results for all cases for the NN and DA techniques were, respectively, 94% and 84.8%. CONCLUSION: Although we used a relatively small database, results obtained in the present study indicate that Zernike polynomials as descriptors of corneal shape may be a reliable parameter as input data for diagnostic automation of VK maps, using either NN or DA.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a molecular phylogenetic analysis of caenophidian (advanced) snakes using sequences from two mitochondrial genes (12S and 16S rRNA) and one nuclear (c-mos) gene (1681 total base pairs), and with 131 terminal taxa sampled from throughout all major caenophidian lineages but focussing on Neotropical xenodontines. Direct optimization parsimony analysis resulted in a well-resolved phylogenetic tree, which corroborates some clades identified in previous analyses and suggests new hypotheses for the composition and relationships of others. The major salient points of our analysis are: (1) placement of Acrochordus, Xenodermatids, and Pareatids as successive outgroups to all remaining caenophidians (including viperids, elapids, atractaspidids, and all other "colubrid" groups); (2) within the latter group, viperids and homalopsids are sucessive sister clades to all remaining snakes; (3) the following monophyletic clades within crown group caenophidians: Afro-Asian psammophiids (including Mimophis from Madagascar), Elapidae (including hydrophiines but excluding Homoroselaps), Pseudoxyrhophiinae, Colubrinae, Natricinae, Dipsadinae, and Xenodontinae. Homoroselaps is associated with atractaspidids. Our analysis suggests some taxonomic changes within xenodontines, including new taxonomy for Alsophis elegans, Liophis amarali, and further taxonomic changes within Xenodontini and the West Indian radiation of xenodontines. Based on our molecular analysis, we present a revised classification for caenophidians and provide morphological diagnoses for many of the included clades; we also highlight groups where much more work is needed. We name as new two higher taxonomic clades within Caenophidia, one new subfamily within Dipsadidae, and, within Xenodontinae five new tribes, six new genera and two resurrected genera. We synonymize Xenoxybelis and Pseudablabes with Philodryas; Erythrolamprus with Liophis; and Lystrophis and Waglerophis with Xenodon.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes a new food classification which assigns foodstuffs according to the extent and purpose of the industrial processing applied to them. Three main groups are defined: unprocessed or minimally processed foods (group 1), processed culinary and food industry ingredients (group 2), and ultra-processed food products (group 3). The use of this classification is illustrated by applying it to data collected in the Brazilian Household Budget Survey which was conducted in 2002/2003 through a probabilistic sample of 48,470 Brazilian households. The average daily food availability was 1,792 kcal/person being 42.5% from group 1 (mostly rice and beans and meat and milk), 37.5% from group 2 (mostly vegetable oils, sugar, and flours), and 20% from group 3 (mostly breads, biscuits, sweets, soft drinks, and sausages). The share of group 3 foods increased with income, and represented almost one third of all calories in higher income households. The impact of the replacement of group 1 foods and group 2 ingredients by group 3 products on the overall quality of the diet, eating patterns and health is discussed.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work proposes a new approach using a committee machine of artificial neural networks to classify masses found in mammograms as benign or malignant. Three shape factors, three edge-sharpness measures, and 14 texture measures are used for the classification of 20 regions of interest (ROIs) related to malignant tumors and 37 ROIs related to benign masses. A group of multilayer perceptrons (MLPs) is employed as a committee machine of neural network classifiers. The classification results are reached by combining the responses of the individual classifiers. Experiments involving changes in the learning algorithm of the committee machine are conducted. The classification accuracy is evaluated using the area A. under the receiver operating characteristics (ROC) curve. The A, result for the committee machine is compared with the A, results obtained using MLPs and single-layer perceptrons (SLPs), as well as a linear discriminant analysis (LDA) classifier Tests are carried out using the student's t-distribution. The committee machine classifier outperforms the MLP SLP, and LDA classifiers in the following cases: with the shape measure of spiculation index, the A, values of the four methods are, in order 0.93, 0.84, 0.75, and 0.76; and with the edge-sharpness measure of acutance, the values are 0.79, 0.70, 0.69, and 0.74. Although the features with which improvement is obtained with the committee machines are not the same as those that provided the maximal value of A(z) (A(z) = 0.99 with some shape features, with or without the committee machine), they correspond to features that are not critically dependent on the accuracy of the boundaries of the masses, which is an important result. (c) 2008 SPIE and IS&T.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aims. In this work, we describe the pipeline for the fast supervised classification of light curves observed by the CoRoT exoplanet CCDs. We present the classification results obtained for the first four measured fields, which represent a one-year in-orbit operation. Methods. The basis of the adopted supervised classification methodology has been described in detail in a previous paper, as is its application to the OGLE database. Here, we present the modifications of the algorithms and of the training set to optimize the performance when applied to the CoRoT data. Results. Classification results are presented for the observed fields IRa01, SRc01, LRc01, and LRa01 of the CoRoT mission. Statistics on the number of variables and the number of objects per class are given and typical light curves of high-probability candidates are shown. We also report on new stellar variability types discovered in the CoRoT data. The full classification results are publicly available.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Online music databases have increased significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a respective complex network representation. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed by using two multi-variate statistical approaches: principal components analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied in order to identify the category of rhythms: parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). Qualitative results obtained by using the kappa coefficient and the obtained clusters corroborated the effectiveness of the proposed method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et at. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The problem of semialgebraic Lipschitz classification of quasihomogeneous polynomials on a Holder triangle is studied. For this problem, the ""moduli"" are described completely in certain combinatorial terms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Quality control of toys for avoiding children exposure to potentially toxic elements is of utmost relevance and it is a common requirement in national and/or international norms for health and safety reasons. Laser-induced breakdown spectroscopy (LIBS) was recently evaluated at authors` laboratory for direct analysis of plastic toys and one of the main difficulties for the determination of Cd. Cr and Pb was the variety of mixtures and types of polymers. As most norms rely on migration (lixiviation) protocols, chemometric classification models from LIBS spectra were tested for sampling toys that present potential risk of Cd, Cr and Pb contamination. The classification models were generated from the emission spectra of 51 polymeric toys and by using Partial Least Squares - Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogy (SIMCA) and K-Nearest Neighbor (KNN). The classification models and validations were carried out with 40 and 11 test samples, respectively. Best results were obtained when KNN was used, with corrected predictions varying from 95% for Cd to 100% for Cr and Pb. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Objective: We carry out a systematic assessment on a suite of kernel-based learning machines while coping with the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of the criteria of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely. Gaussian and exponential radial basis functions) were considered as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy (i.e., cross-validation error) values delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations whereby one can visually inspect their levels of sensitiveness to the type of feature and to the kernel function/parameter value. Conclusions: Overall, the results evidence that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value as well as the choice of the feature extractor are critical decisions to be taken, albeit the choice of the wavelet family seems not to be so relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile has emerged among all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.