10 results for Classification Methods
in the Digital Library of Intellectual Production of the University of São Paulo (BDPI/USP)
Abstract:
Predictive performance evaluation is a fundamental issue in the design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as error rate, although quite convenient due to their simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. For this reason, various graphical performance evaluation methods are increasingly drawing the attention of the machine learning, data mining, and pattern recognition communities. The main advantage of these types of methods resides in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing these aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to appropriately select a suitable graphical method for a given task, it is crucial to identify its strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods in the same framework, we hope this paper may shed some light on deciding which methods are more suitable to use in different situations.
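As an illustration of the kind of graphical method this abstract refers to, the following is a minimal sketch of how the points of a ROC curve can be computed from classifier scores. The abstract does not name specific methods, so the choice of ROC analysis and the function names `roc_points` and `auc` are assumptions made for this example only.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    `labels` holds 1 for positive and 0 for negative examples (an assumed encoding).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule (a scalar summary of the plot)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(pts, auc(pts))
```

Each point shows the trade-off between true positive rate and false positive rate at one threshold, which a single error rate would collapse into one number.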
Abstract:
The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has usually been assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.
Abstract:
In this paper, we present a study on a deterministic partially self-avoiding walk (tourist walk), which provides a novel method for texture feature extraction. The method is able to explore an image on all scales simultaneously. Experiments were conducted using different dynamics concerning the tourist walk. A new strategy, based on histograms, to extract information from its joint probability distribution is presented. The promising results are discussed and compared to the best-known methods for texture description reported in the literature. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
The topology of real-world complex networks, such as in transportation and communication, is always changing with time. Such changes can arise not only as a natural consequence of their growth, but also due to major modifications in their intrinsic organization. For instance, the network of transportation routes between cities and towns (hence locations) of a given country undergoes a major change with the progressive implementation of commercial air transportation. While the locations could be originally interconnected through highways (paths, giving rise to geographical networks), transportation between those sites progressively shifted to, or was complemented by, air transportation, with scale-free characteristics. In the present work we introduce the path-star transformation (in its uniform and preferential versions) as a means to model such network transformations where paths give rise to stars of connectivity. It is also shown, through optimal multivariate statistical methods (i.e. canonical projections and maximum likelihood classification), that while the US highways network adheres closely to a geographical network model, its path-star transformation yields a network whose topological properties closely resemble those of the respective airport transportation network.
Abstract:
Shape provides some of the most relevant information about an object. This makes shape one of the most important visual attributes used to characterize objects. This paper introduces a novel approach for shape characterization, which combines modeling shape into a complex network and the analysis of its complexity in a dynamic evolution context. Descriptors computed through this approach prove to be efficient in shape characterization, incorporating many characteristics, such as scale and rotation invariance. Experiments using two different shape databases (an artificial shapes database and a leaf shape database) are presented in order to evaluate the method, and its results are compared to traditional shape analysis methods found in the literature. (C) 2009 Published by Elsevier B.V.
Abstract:
Unlike theoretical scale-free networks, most real networks present multi-scale behavior, with nodes structured in different types of functional groups and communities. While the majority of approaches for classification of nodes in a complex network have relied on local measurements of the topology/connectivity around each node, valuable information about node functionality can be obtained by concentric (or hierarchical) measurements. This paper extends previous methodologies based on concentric measurements by studying the possibility of using agglomerative clustering methods to obtain a set of functional groups of nodes, considering particular institutional collaboration network nodes, including various known communities (departments of the University of São Paulo). Among the interesting findings obtained, we emphasize the scale-free nature of the network obtained, as well as the identification of different patterns of authorship emerging from different areas (e.g. human and exact sciences). Another interesting result concerns the relatively uniform distribution of hubs along concentric levels, in contrast to the non-uniform pattern found in theoretical scale-free networks such as the BA model. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
In this paper we present a novel approach for multispectral image contextual classification by combining iterative combinatorial optimization algorithms. The pixel-wise decision rule is defined using a Bayesian approach to combine two MRF models: a Gaussian Markov Random Field (GMRF) for the observations (likelihood) and a Potts model for the a priori knowledge, to regularize the solution in the presence of noisy data. Hence, the classification problem is stated according to a Maximum a Posteriori (MAP) framework. In order to approximate the MAP solution we apply several combinatorial optimization methods using multiple simultaneous initializations, making the solution less sensitive to the initial conditions and reducing both computational cost and time in comparison to Simulated Annealing, which is often unfeasible in many real image processing applications. Markov Random Field model parameters are estimated by a Maximum Pseudo-Likelihood (MPL) approach, avoiding manual adjustments in the choice of the regularization parameters. Asymptotic evaluations assess the accuracy of the proposed parameter estimation procedure. To test and evaluate the proposed classification method, we adopt metrics for quantitative performance assessment (Cohen's Kappa coefficient), allowing a robust and accurate statistical analysis. The obtained results clearly show that combining sub-optimal contextual algorithms significantly improves the classification performance, indicating the effectiveness of the proposed methodology. (C) 2010 Elsevier B.V. All rights reserved.
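For readers unfamiliar with the metric named in this abstract, Cohen's Kappa compares the observed agreement p_o between predicted and reference labels with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal illustration of that standard definition. The function name `cohens_kappa` and the list-based label format are assumptions, and nothing here reflects the authors' actual implementation.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's Kappa between reference labels and predicted labels."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    # Observed agreement: fraction of items labelled identically.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: product of the marginal class frequencies, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when both label lists use a single class")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    reference = ["water", "water", "forest", "forest"]
    predicted = ["water", "forest", "forest", "forest"]
    print(cohens_kappa(reference, predicted))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why Kappa is often preferred over raw accuracy for classification maps with unbalanced classes.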
Abstract:
BACKGROUND: A major problem in Chagas disease donor screening is the high frequency of samples with inconclusive results. The objective of this study was to describe patterns of serologic results among donors to the three Brazilian REDS-II blood centers and correlate them with epidemiologic characteristics. STUDY DESIGN AND METHODS: The centers screened donor samples with one Trypanosoma cruzi lysate enzyme immunoassay (EIA). EIA-reactive samples were tested with a second lysate EIA, a recombinant-antigen based EIA, and an immunofluorescence assay. Based on the serologic results, samples were classified as confirmed positive (CP), probable positive (PP), possible other parasitic infection (POPI), and false positive (FP). RESULTS: In 2007 to 2008, a total of 877 of 615,433 donations were discarded due to Chagas assay reactivity. The prevalences (95% confidence intervals [CIs]) among first-time donors for CP, PP, POPI, and FP patterns were 114 (99-129), 26 (19-34), 10 (5-14), and 96 (82-110) per 100,000 donations, respectively. CP and PP had similar patterns of prevalence when analyzed by age, sex, education, and location, suggesting that PP cases represent true T. cruzi infections; in contrast, the demographics of donors with POPI were distinct and likely unrelated to Chagas disease. No CP cases were detected among 218,514 repeat donors followed for a total of 718,187 person-years. CONCLUSION: We have proposed a classification algorithm that may have practical importance for donor counseling and epidemiologic analyses of T. cruzi-seroreactive donors. The absence of incident T. cruzi infections is reassuring with respect to risk of window phase infections within Brazil and travel-related infections in nonendemic countries such as the United States.
Abstract:
Coconut water is a natural isotonic, nutritive, and low-caloric drink. A preservation process is necessary to increase its shelf life outside the fruit and to improve commercialization. However, the influence of the conservation processes, antioxidant addition, maturation time, and soil where the coconut is cultivated on the chemical composition of coconut water has received little discussion and study. For these reasons, an evaluation of coconut waters (unprocessed and processed) was carried out using Ca, Cu, Fe, K, Mg, Mn, Na, Zn, chloride, sulfate, phosphate, malate, and ascorbate concentrations and chemometric tools. The quantitative determinations were performed by electrothermal atomic absorption spectrometry, inductively coupled plasma optical emission spectrometry, and capillary electrophoresis. The results showed that Ca, K, and Zn concentrations did not present significant alterations between the samples. The ranges of Cu, Fe, Mg, Mn, PO₄³⁻, and SO₄²⁻ concentrations were as follows: Cu (3.1-120 µg L⁻¹), Fe (60-330 µg L⁻¹), Mg (48-123 mg L⁻¹), Mn (0.4-4.0 mg L⁻¹), PO₄³⁻ (55-212 mg L⁻¹), and SO₄²⁻ (19-136 mg L⁻¹). Principal component analysis (PCA) and hierarchical cluster analysis (HCA) were applied to differentiate unprocessed and processed samples. The multivariate analyses (PCA and HCA) were compared through one-way analysis of variance with the Tukey-Kramer multiple comparisons test, and p values less than 0.05 were considered to be significant.
Abstract:
Chemometric methods can contribute to soil research by permitting the extraction of more information from the data. The aim of this work was to use Principal Component Analysis to evaluate data obtained through chemical and spectroscopic methods on the changes in the humification process of soil organic matter from two tropical soils after sewage sludge application. In this case, humic acids extracted from Typic Eutrorthox and Typic Haplorthox soils with and without sewage sludge application for 7 consecutive years were studied. The results obtained for all of the samples and methods showed two clusters: samples extracted from the two soil types. These expected results indicated that the textural difference between the two soils was more significant than the differences between treatments (control and sewage sludge application) or between depths. In this case, an individual chemometric treatment was made for each type of soil. It was noted that the characterization of the humic acids extracted from soils with and without sewage sludge application after 7 consecutive years using several methods supplies important results about changes in the humification degree of soil organic matter. These important results obtained by Principal Component Analysis justify further research using these methods to characterize the changes in the humic acids extracted from sewage sludge-amended soils. (C) 2009 Elsevier B.V. All rights reserved.