936 results for Pattern recognition, cluster finding, calibration and fitting methods
Abstract:
Data clustering is applied in various fields, such as data mining, image processing, and pattern recognition. Clustering algorithms split a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means algorithm (FCM) is the fuzzy clustering algorithm most used and discussed in the literature. The performance of FCM is strongly affected by the selection of the initial cluster centers, so choosing a good set of initial centers is very important. However, in FCM the initial centers are chosen randomly, which makes it difficult to find a good set. This paper proposes three new methods for obtaining initial cluster centers deterministically for the FCM algorithm; they can also be used in variants of FCM. In this work, these initialization methods were applied to the ckMeans variant. With the proposed methods, we intend to obtain a set of initial centers that are close to the real cluster centers. With these new initialization approaches, we aim to reduce the number of iterations these algorithms need to converge, as well as their processing time, without affecting the quality of the clustering, or even improving it in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets.
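The abstract above does not describe the three proposed initialization methods, so the following is only a minimal sketch of the general idea: standard FCM started from deterministically chosen centers. The `maximin_init` function (a farthest-point rule) is a hypothetical stand-in chosen for illustration, not one of the paper's methods, and the toy points are invented.

```python
import math


def maximin_init(points, c):
    """Pick c initial centers deterministically (illustrative stand-in, not the paper's methods)."""
    dim = len(points[0])
    mean = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    # First center: the data point closest to the overall mean.
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    # Each further center: the point farthest from its nearest already-chosen center.
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(math.dist(p, q) for q in centers)))
    return [list(p) for p in centers]


def memberships(points, centers, m):
    """Standard FCM membership update: u[i][j] is the degree of point i in cluster j."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, q) for q in centers]
        if min(d) == 0.0:
            # The point sits on a center: split full membership among coinciding centers.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d])
    return u


def fcm(points, centers, m=2.0, tol=1e-6, max_iter=100):
    """Alternate membership and center updates until no center moves more than tol."""
    dim = len(points[0])
    for iteration in range(1, max_iter + 1):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            new_centers.append([sum(wi * p[k] for wi, p in zip(w, points)) / sum(w) for k in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m), iteration


# Hypothetical toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, u, iterations = fcm(points, maximin_init(points, 2))
```

Because the initialization is deterministic, repeated runs on the same data give the same centers and the same iteration count, which makes it straightforward to compare iteration counts across initialization strategies.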
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Background: The E-cadherin gene (CDH1) maps to chromosome 16q22.1, a region often associated with loss of heterozygosity (LOH) in human breast cancer. LOH at this site is thought to lead to loss of function of this tumor suppressor gene and has been correlated with decreased disease-free survival, poor prognosis, and metastasis. Differential CpG island methylation in the promoter region of the CDH1 gene might be an alternative route to the loss of expression and function of E-cadherin, leading to loss of tissue integrity, an essential step in tumor progression. Methods: The aim of our study was to assess, by Methylation-Specific Polymerase Chain Reaction (MSP), the methylation pattern of the CDH1 gene and its possible correlation with the expression of E-cadherin and other standard immunohistochemical parameters (Her-2, ER, PgR, p53, and Ki-67) in a series of 79 primary breast cancers (71 infiltrating ductal, 5 infiltrating lobular, 1 metaplastic, 1 apocrine, and 1 papillary carcinoma). Results: CDH1 hypermethylation was observed in 72% of the cases, including 52/71 ductal carcinomas, 4/5 lobular carcinomas, and 1 apocrine carcinoma. Reduced levels of E-cadherin protein were observed in 85% of our samples. Although the trend was not statistically significant, the levels of E-cadherin expression tended to diminish with CDH1 promoter region methylation. In the group of 71 ductal carcinomas, most of the cases showing CDH1 hypermethylation also presented reduced levels of expression of ER and PgR proteins, and a possible association was observed between CDH1 methylation and ER expression (p = 0.0301, Fisher's exact test). However, this finding was not considered significant after Bonferroni correction of the p-value. Conclusion: Our preliminary findings suggest that abnormal CDH1 methylation occurs at high frequency in infiltrating breast cancers and is associated with a decrease in E-cadherin expression in a subgroup of cases characterized by loss of expression of other genes important to the mammary carcinogenesis process, probably due to disruption of the mechanism that maintains DNA methylation in tumor cells.
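To illustrate the statistics named in the abstract above, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table, followed by a Bonferroni adjustment. The abstract gives neither the contingency table nor the number of comparisons, so the table below is hypothetical and the count of five tests is an assumption made only to show the arithmetic.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)


# Hypothetical 2x2 table (NOT data from the study).
p_toy = fisher_exact_two_sided(3, 1, 1, 3)

# The reported p-value, adjusted for an ASSUMED five comparisons.
p_adjusted = bonferroni(0.0301, 5)
```

Under that assumed count, the adjusted value is 0.0301 × 5 = 0.1505, which is above 0.05 and so would be consistent with the abstract's statement that the association did not remain significant after correction.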
Abstract:
A set of 25 quinone compounds with trypanocidal activity was studied using the density functional theory (DFT) method in order to calculate atomic and molecular properties to be correlated with the biological activity. The chemometric methods principal component analysis (PCA), hierarchical cluster analysis (HCA), stepwise discriminant analysis (SDA), Kth nearest neighbor (KNN), and soft independent modeling of class analogy (SIMCA) were used to obtain possible relationships between the calculated descriptors and the biological activity studied, and to predict the trypanocidal activity of new quinone compounds from a prediction set. Four descriptors were responsible for the separation between the active and inactive compounds: T-5 (torsion angle), QTS1 (sum of absolute values of the atomic charges), VOLS2 (volume of the substituent at region B), and HOMO-1 (energy of the molecular orbital below HOMO). These descriptors give information on the kind of interaction that occurs between the compounds and the biological receptor. The prediction study was done with a set of three new compounds using the PCA, HCA, SDA, KNN, and SIMCA methods, and two of them were predicted as active against Trypanosoma cruzi. © 2005 Elsevier SAS. All rights reserved.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The applications of Automatic Vowel Recognition (AVR), a component of fundamental importance in most speech processing systems, range from automatic interpretation of spoken language to biometrics. State-of-the-art systems for AVR are based on traditional machine learning models such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs); however, such classifiers cannot deliver efficiency and effectiveness at the same time, leaving a gap to be explored when real-time processing is required. In this work, we present an algorithm for AVR based on the Optimum-Path Forest (OPF), an emerging pattern recognition technique recently introduced in the literature. Adopting a supervised training procedure and using speech tags from two public datasets, we observed that OPF outperformed ANNs, SVMs, and other classifiers in terms of training time and accuracy. ©2010 IEEE.
Abstract:
The Optimum-Path Forest (OPF) classifier is a recent and promising method for pattern recognition, with a fast training algorithm and good accuracy. Therefore, investigating a method for combining classifiers of this kind can be important for many applications. In this paper we report a fast method to combine OPF-based classifiers trained with disjoint training subsets. Given a fixed number of subsets, the algorithm chooses random samples, without replacement, from the original training set. Each subset's accuracy is improved by a learning procedure. The final decision is given by majority vote. Experiments with simulated and real data sets showed that the proposed combining method is more efficient and effective than the naive approach, provided some conditions are met. It was also shown that the OPF training step runs faster for a series of small subsets than for the whole training set. The combining scheme was also designed to support parallel or distributed processing, speeding up the procedure even more. © 2011 Springer-Verlag.
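The combining scheme described above (disjoint subsets drawn without replacement, one classifier per subset, majority vote) can be sketched as follows. This is an illustration under stated assumptions: a 1-nearest-neighbor rule stands in for the OPF classifier, the per-subset learning procedure mentioned in the abstract is omitted, and all function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def disjoint_subsets(samples, n_subsets, seed=0):
    """Shuffle once, then deal the samples into n_subsets disjoint subsets (no replacement)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def nearest_neighbor_label(subset, x):
    """Stand-in base classifier (1-NN), used here in place of OPF."""
    features, label = min(subset, key=lambda sample: math.dist(sample[0], x))
    return label


def combined_predict(subsets, x):
    """Final decision by majority vote over the per-subset classifiers."""
    votes = Counter(nearest_neighbor_label(s, x) for s in subsets)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature vector, label) pairs from two well-separated classes.
train = [((float(i), 0.0), "a") for i in range(9)] + [((100.0 + i, 0.0), "b") for i in range(9)]
subsets = disjoint_subsets(train, n_subsets=3)
prediction = combined_predict(subsets, (1.0, 0.0))
```

Since each per-subset classifier depends only on its own subset, the calls inside `combined_predict` are independent of one another, which is what makes a scheme of this shape amenable to the parallel or distributed processing the abstract mentions.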
Abstract:
Research on multiple classifier systems includes the creation of an ensemble of classifiers and the proper combination of their decisions. To combine the decisions given by classifiers, methods based on fixed rules and decision templates are often used. As a result, the influence of and relationship between classifier decisions are often not considered in the combination schemes. In this paper we propose a framework to combine classifiers using a decision graph under a random field model and a game strategy approach to obtain the final decision. The results of combining Optimum-Path Forest (OPF) classifiers using the proposed model are reported, showing good performance in experiments using simulated and real data sets. The results encourage the combination of OPF ensembles and the use of the framework to design multiple classifier systems. © 2011 Springer-Verlag.
Abstract:
Tropical rain forest conservation requires a good understanding of plant-animal interactions. Seed dispersal provides a means for plant seeds to escape competition and density-dependent seed predators and pathogens and to colonize new habitats. This makes the role and effectiveness of frugivorous species in the seed dispersal process an important topic. Northern pigtailed macaques (Macaca leonina) may be effective seed dispersers because they have a diverse diet and process seeds in several ways (swallowing, spitting out, or dropping them). To investigate the seed dispersal effectiveness of a habituated group of pigtailed macaques in Khao Yai National Park, Thailand, we examined seed dispersal quantity (number of fruit species eaten, proportion in the diet, number of feces containing seeds, and number of seeds processed) and quality (processing methods used, seed viability and germination success, habitat type and distance from parent tree for the deposited seeds, and dispersal patterns) via focal and scan sampling, seed collection, and germination tests. We found thousands of seeds per feces, including seeds up to 58 mm in length and from 88 fruit species. Importantly, the macaques dispersed seeds from primary to secondary forests, via swallowing, spitting, and dropping. Of 21 species, the effect of swallowing and spitting was positive for two species (i.e., processed seeds had a higher % germination and % viability than control seeds), neutral for 13 species (no difference in % germination or viability), and negative (processed seeds had lower % germination and viability) for five species. For the final species, the effect was neutral for spat-out seeds but negative for swallowed seeds. We conclude that macaques are effective seed dispersers in both quantitative and qualitative terms and that they are of potential importance for tropical rain forest regeneration. © 2013 Springer Science+Business Media New York.
Abstract:
The AM1 molecular orbital method was employed to calculate a set of molecular descriptors for twenty synthetic neolignans with anti-schistosomiasis activity. Pattern recognition methods (principal component analysis, PCA; cluster analysis, CA; and discriminant analysis) were used to obtain the relationship between molecular structure and biological activity. The set of molecules was classified into two groups according to their degree of biological activity. These results allow the rational design of new compounds that are potential candidates for synthesis and biological evaluation.
Abstract:
Prostate cancer is a serious public health problem, accounting for up to 30% of clinical tumors in men. The disease is diagnosed with clinical, laboratory, and radiological exams, which may indicate the need for transrectal biopsy. Prostate biopsies are carefully evaluated by pathologists in an attempt to determine the most appropriate clinical management. This paper presents a set of techniques for identifying and quantifying regions of interest in prostatic images. Analyses were performed using multi-scale lacunarity and distinct classification methods: decision tree, support vector machine, and polynomial classifier. The performance evaluation measures were based on the area under the receiver operating characteristic curve (AUC). The most appropriate region for distinguishing the different tissues (normal, hyperplastic, and neoplastic) was defined; the corresponding lacunarity values and a rule-based model were obtained considering combinations commonly explored by specialists in clinical practice. The best discriminative values (AUC) were 0.906, 0.891, and 0.859 for the neoplastic versus normal, neoplastic versus hyperplastic, and hyperplastic versus normal groups, respectively. The proposed protocol offers the advantage of making the findings comprehensible to pathologists. © 2014 Elsevier Ltd. All rights reserved.
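As a rough illustration of what a multi-scale lacunarity feature looks like, the sketch below uses the gliding-box formulation, where lacunarity at box size r is the ratio of the mean squared box mass to the squared mean box mass. The abstract does not say which lacunarity variant or which box sizes were used, so the formulation, the binary-image input, the function names, and the toy image are all assumptions.

```python
def gliding_box_lacunarity(image, r):
    """Lacunarity of a binary image at box size r: <S^2> / <S>^2 over all r-by-r boxes."""
    rows, cols = len(image), len(image[0])
    masses = []
    for top in range(rows - r + 1):
        for left in range(cols - r + 1):
            # Box mass S: number of foreground pixels inside the box.
            masses.append(sum(image[top + i][left + j] for i in range(r) for j in range(r)))
    mean = sum(masses) / len(masses)
    if mean == 0:
        raise ValueError("lacunarity is undefined for an empty image")
    mean_sq = sum(s * s for s in masses) / len(masses)
    return mean_sq / (mean * mean)


def multiscale_lacunarity(image, box_sizes):
    """Feature vector: one lacunarity value per box size."""
    return [gliding_box_lacunarity(image, r) for r in box_sizes]


# Hypothetical 4x4 binary image: left half foreground, right half background.
image = [[1, 1, 0, 0] for _ in range(4)]
features = multiscale_lacunarity(image, [1, 2])
```

A completely filled image gives a lacunarity of 1 at every scale, and values grow as the foreground becomes more unevenly distributed, so a vector of such values over several box sizes can be passed to a classifier as a texture descriptor.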
Abstract:
Thiosemicarbazones are cruzain inhibitors that have been identified as potential antitrypanosomal agents. In this work, several molecular properties were calculated at the density functional theory (DFT)/B3LYP/6-311G* level for a set of 44 thiosemicarbazones. Unsupervised and supervised pattern recognition techniques (hierarchical cluster analysis, principal component analysis, kth-nearest neighbors, and soft independent modeling by class analogy) were used to obtain structure-activity relationship models that are able to classify unknown compounds according to their activities. The chemometric analyses performed here revealed that 12 descriptors can be considered responsible for the discrimination between high- and low-activity compounds. Classification models were validated with an external test set, showing that predictive classifications were achieved with the selected variable set. The results obtained here are in good agreement with previous findings from the literature, suggesting that our models can be useful in further investigations of the molecular determinants of antichagasic activity. © 2012 Wiley Periodicals, Inc.
Abstract:
The development of new statistical and computational methods is increasingly making it possible to bridge the gap between the hard sciences and the humanities. In this study, we propose an approach based on a quantitative evaluation of attributes of objects in fields of the humanities, from which concepts such as dialectics and opposition are formally defined mathematically. As case studies, we analyzed the temporal evolution of classical music and philosophy by obtaining data for 8 features characterizing the corresponding fields for 7 well-known composers and philosophers, which were treated with multivariate statistics and pattern recognition methods. A bootstrap method was applied to avoid statistical bias caused by the small sample data set, with which hundreds of artificial composers and philosophers were generated, influenced by the 7 names originally chosen. Upon defining indices for opposition, skewness, and counter-dialectics, we confirmed the intuitive analysis of historians that classical music evolved according to a master-apprentice tradition, while in philosophy changes were driven by opposition. Though these case studies were meant only to show the possibility of treating phenomena in the humanities quantitatively, including a quantitative measure of concepts such as dialectics and opposition, the results are encouraging for further application of the approach presented here to many other areas, since it is entirely generic.
Abstract:
In this paper, we report our initial research to obtain hexagonal rod-like elongated silver tungstate (alpha-Ag2WO4) microcrystals by different methods [sonochemistry (SC), coprecipitation (CP), and conventional hydrothermal (CH)] and to study their cluster coordination and optical properties. These microcrystals were structurally characterized by X-ray diffraction (XRD), Rietveld refinements, Fourier transform infrared (FT-IR), X-ray absorption near-edge structure (XANES), and extended X-ray absorption fine structure (EXAFS) spectroscopies. The shape and average size of these alpha-Ag2WO4 microcrystals were observed by field-emission scanning electron microscopy (FE-SEM). The optical properties of these microcrystals were investigated by ultraviolet-visible (UV-vis) spectroscopy and photoluminescence (PL) measurements. XRD patterns and Rietveld refinement data confirmed that alpha-Ag2WO4 microcrystals have an orthorhombic structure. FT-IR spectra exhibited four IR-active modes in the range from 250 to 1000 cm⁻¹. XANES spectra at the W L3-edge showed distorted octahedral [WO6] clusters in the lattice, while EXAFS analyses confirmed that W atoms are coordinated by six O atoms. FE-SEM images suggest that the alpha-Ag2WO4 microcrystals grow by aggregation and the Ostwald ripening process. PL properties of alpha-Ag2WO4 microcrystals decrease with an increase in the optical band-gap values (3.19-3.23 eV). Finally, we observed that large hexagonal rod-like alpha-Ag2WO4 microcrystals prepared by the SC method exhibited a higher PL emission intensity than alpha-Ag2WO4 microcrystals prepared by the CP and CH methods.
Abstract:
Gunshot residues (GSR) can be used in forensic evaluations to obtain information about the type of gun and ammunition used in a crime. In this work, we present our efforts to develop a promising new method to discriminate the type of gun [four different guns were used: two handguns (0.38 revolver and 0.380 pistol) and two long-barrelled guns (12-calibre pump-action shotgun and 0.38 repeating rifle)] and ammunition (five different types: normal, semi-jacketed, full-jacketed, green, and 3T) used by a suspect. The proposed approach is based on information obtained from cyclic voltammograms recorded, using a gold microelectrode, in solutions containing GSR collected from the hands of the shooters; the information was further analysed by unsupervised pattern-recognition methods [Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA)]. In all cases (gun and ammunition discrimination), good separation among different samples in the score plots and dendrograms was achieved. © 2012 Elsevier B.V. All rights reserved.