32 results for Data classification
in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"
Abstract:
This project aims to develop methods for data classification in a Data Warehouse for decision-making purposes. A second goal is to reduce the attribute set of a Data Warehouse such that the reduced set keeps the same properties as the original one. A reduced set lowers the computational cost of processing, lets us identify attributes that are not relevant to certain kinds of situations, and helps us recognize patterns in the database that support decision making. To achieve these objectives, the Rough Sets algorithm will be implemented. We chose PostgreSQL as our database management system because of its efficiency and maturity, and because it is an open-source (freely distributed) system.
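As an illustration of the attribute-reduction idea described in this abstract, the following is a minimal sketch of a brute-force rough-set reduct search. It is not the project's implementation: the function names, the toy decision table, and the attribute names (`outlook`, `windy`, `temp`, `play`) are all hypothetical, and the exhaustive search over subsets is only practical for a handful of attributes.

```python
from collections import defaultdict
from itertools import combinations

def partition(rows, attrs):
    """Group row indices that are indiscernible on the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region(rows, attrs, decision):
    """Rows whose indiscernibility class maps to a single decision value."""
    pos = set()
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos

def smallest_reducts(rows, attrs, decision):
    """Minimum-size attribute subsets that preserve the full positive region."""
    full = positive_region(rows, attrs, decision)
    for size in range(1, len(attrs) + 1):
        found = [c for c in combinations(attrs, size)
                 if positive_region(rows, c, decision) == full]
        if found:
            return found
    return []

# Hypothetical toy decision table (not data from the project).
rows = [
    {"outlook": "sunny", "windy": "no",  "temp": "hot",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "temp": "hot",  "play": "no"},
    {"outlook": "rainy", "windy": "no",  "temp": "mild", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "temp": "cool", "play": "no"},
]
result = smallest_reducts(rows, ["outlook", "windy", "temp"], "play")
print(result)
```

In this toy table no single attribute separates the decisions, but two different pairs do, so the search returns both pairs; either one could stand in for the full attribute set.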
Abstract:
Graduate Program in Agronomy (Energy in Agriculture) - FCA
Abstract:
In general, pattern recognition techniques carry a high computational burden for learning the discriminating functions that are responsible for separating samples from distinct classes. As such, several studies have sought to employ machine learning algorithms in the context of big data classification problems. Research in this area ranges from Graphics Processing Unit-based implementations to mathematical optimizations, the main drawback of the former being their dependence on the graphics card. Here, we propose an architecture-independent optimization approach for the optimum-path forest (OPF) classifier, designed using a theoretical formulation that relates the minimum spanning tree to the minimum spanning forest generated by the OPF over the training dataset. The experiments have shown that the proposed approach can be faster than the traditional one on five public datasets, while being as accurate as the original OPF. (C) 2014 Elsevier B.V. All rights reserved.
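To make the link between the minimum spanning tree and OPF training more concrete, here is a minimal sketch of the prototype-selection step as it is usually described for the traditional OPF: build an MST over the complete training graph and mark as prototypes the endpoints of MST edges that join different classes. This is an assumption about the baseline, not the optimization proposed in the abstract; the function names and the toy points are hypothetical.

```python
import math

def prim_mst(points):
    """Edges (i, j) of a minimum spanning tree over the complete Euclidean graph."""
    n = len(points)
    in_tree = [False] * n
    cost = [math.inf] * n      # cheapest known connection cost to the tree
    parent = [-1] * n          # tree node that offers that cheapest connection
    cost[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=cost.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax connection costs of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < cost[v]:
                    cost[v], parent[v] = d, u
    return edges

def opf_prototypes(points, labels):
    """Endpoints of MST edges whose two samples carry different labels."""
    protos = set()
    for i, j in prim_mst(points):
        if labels[i] != labels[j]:
            protos.update((i, j))
    return protos

# Hypothetical toy training set: two well-separated classes on a line.
points = [(0, 0), (1, 0), (2, 0), (5, 0), (6, 0), (7, 0)]
labels = [0, 0, 0, 1, 1, 1]
mst_edges = prim_mst(points)
prototypes = opf_prototypes(points, labels)
print(mst_edges, prototypes)
```

In the toy set the only MST edge that crosses the class boundary joins samples 2 and 3, so those two samples become the prototypes from which the optimum-path trees would then grow.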
Abstract:
From the outset, some pattern recognition techniques have faced the problem of a high computational burden for learning from datasets. Among the most widely used techniques, we may highlight Support Vector Machines (SVM), which have obtained very promising results for data classification. However, this classifier requires an expensive training phase, which is dominated by a parameter optimization that aims to make the SVM less prone to errors over the training set. In this paper, we model the problem of finding such parameters as a metaheuristic-based optimization task, which is performed through Harmony Search (HS) and some of its variants. The experimental results have shown the robustness of HS-based approaches for this task in comparison with an exhaustive (grid) search and with a Particle Swarm Optimization-based implementation.
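The parameter search described in this abstract can be illustrated with a minimal sketch of basic Harmony Search. Several things here are assumptions rather than details from the paper: the parameter names (`hmcr`, `par`, `bandwidth`), their default values, and above all the objective, which is a hypothetical quadratic stand-in for an SVM validation error over two hyperparameters instead of a real SVM training run.

```python
import random

def harmony_search(objective, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.1, iterations=200, seed=0):
    """Minimise `objective` over box `bounds` with a basic Harmony Search."""
    rng = random.Random(seed)
    # Harmony memory: random candidate solutions inside the bounds.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    scores = [objective(h) for h in memory]
    history = []  # best score after each iteration
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value for this dimension.
                value = rng.choice(memory)[d]
                if rng.random() < par:
                    # Pitch adjustment: small random shift around that value.
                    value += rng.uniform(-1, 1) * bandwidth * (hi - lo)
            else:
                # Random consideration: draw a fresh value.
                value = rng.uniform(lo, hi)
            new.append(min(max(value, lo), hi))  # clamp to the bounds
        new_score = objective(new)
        worst = max(range(memory_size), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score
        history.append(min(scores))
    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best], history

# Hypothetical stand-in for a validation error over (log2 C, log2 gamma).
def toy_error(params):
    log_c, log_gamma = params
    return (log_c - 3.0) ** 2 + (log_gamma + 5.0) ** 2

bounds = [(-5.0, 15.0), (-15.0, 3.0)]
best, best_score, history = harmony_search(toy_error, bounds)
print(best, best_score)
```

To tune a real classifier, `toy_error` would be replaced by a function that trains the SVM with the candidate parameters and returns its validation error, which is where the training cost mentioned in the abstract would be incurred.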
Abstract:
This article deals with classification problems involving unequal probabilities in each class and discusses metrics for systems that use multilayer perceptron (MLP) neural networks for the task of classifying new patterns. In addition, we propose three new pruning methods, which were compared with seven other existing methods from the literature for MLP networks. All pruning algorithms presented in this paper have been modified by the authors to prune neurons, in order to produce MLP networks that are fully connected but small in their intermediate layer. Experiments were carried out involving the E. coli unbalanced classification problem and ten pruning methods. The proposed methods obtained good results, in fact better results than other pruning methods previously defined in the MLP neural network area. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
The purpose of this paper was to evaluate attributes derived from fully polarimetric PALSAR data to discriminate and map macrophyte species in the Amazon floodplain wetlands. Fieldwork was carried out almost simultaneously with the radar acquisition, and macrophyte biomass and morphological variables were measured in the field. Attributes were calculated from the covariance matrix [C] derived from the single-look complex data. Image attributes and macrophyte variables were compared and analyzed to investigate the sensitivity of the attributes for discriminating among species. Based on these analyses, a rule-based classification was applied to map macrophyte species. Other classification approaches were tested and compared to the rule-based method: a classification based on the Freeman-Durden and Cloude-Pottier decomposition models, a hybrid classification (Wishart classifier with the input classes based on the H/α plane), and a statistical-based classification (supervised classification using Wishart distance measures). The findings show that attributes derived from fully polarimetric L-band data have good potential for discriminating herbaceous plant species based on morphology and that estimation of plant biomass and productivity could be improved by using these polarimetric attributes.
Abstract:
This paper reports on a sensor array able to distinguish tastes and used to classify red wines. The array comprises sensing units made from Langmuir-Blodgett (LB) films of conducting polymers and lipids and layer-by-layer (LBL) films from chitosan deposited onto gold interdigitated electrodes. Using impedance spectroscopy as the principle of detection, we show that distinct clusters can be identified in principal component analysis (PCA) plots for six types of red wine. Distinction can be made with regard to vintage, vineyard and brands of the red wine. Furthermore, if the data are treated with artificial neural networks (ANNs), this artificial tongue can identify wine samples stored under different conditions. This is illustrated by considering 900 wine samples, obtained with 30 measurements for each of the five bottles of the six wines, which could be recognised with 100% accuracy using the algorithms Standard Backpropagation and Backpropagation momentum in the ANNs. (C) 2003 Elsevier B.V. All rights reserved.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Objective: To describe onset features, classification and treatment of juvenile dermatomyositis (JDM) and juvenile polymyositis (JPM) from a multicentre registry. Methods: Inclusion criteria were onset age lower than 18 years and a diagnosis of any idiopathic inflammatory myopathy (IIM) by the attending physician. Bohan & Peter (1975) criteria categorisation was established by a scoring algorithm to define JDM and JPM based on clinical protocol data. Results: Of the 189 cases included, 178 were classified as JDM, 9 as JPM (19.8:1) and 2 did not fit the criteria; 6.9% had features of chronic arthritis and connective tissue disease overlap. Diagnosis classification agreement occurred in 66.1%. Median onset age was 7 years and median follow-up duration was 3.6 years. Malignancy was described in 2 (1.1%) cases. Muscle weakness occurred in 95.8%; heliotrope rash in 83.5%; Gottron plaques in 83.1%; 92% had at least one abnormal muscle enzyme result. Muscle biopsy, performed in 74.6%, was abnormal in 91.5%, and electromyogram, performed in 39.2%, was abnormal in 93.2%. Logistic regression analysis was done in 66 cases with all parameters assessed, and only aldolase was significant as an independent variable for definite JDM (OR=5.4, 95% CI 1.2-24.4, p=0.03). Regarding treatment, 97.9% received steroids; 72% had in addition at least one of: methotrexate (75.7%), hydroxychloroquine (64.7%), cyclosporine A (20.6%), IV immunoglobulin (20.6%), azathioprine (10.3%) or cyclophosphamide (9.6%). In this series 24.3% developed calcinosis and the mortality rate was 4.2%. Conclusion: Evaluation of the predefined criteria set for a valid diagnosis indicated aldolase as the most important parameter associated with de [...] methotrexate combination was the most indicated treatment.
Abstract:
The pipe flow of a viscous-oil-gas-water mixture such as that involved in heavy oil production is a rather complex thermo-fluid dynamical problem. Given the complexity of three-phase flow, the introduction of a flow pattern classification tool is of fundamental importance for obtaining useful information about the flow structure. Flow patterns are important because they indicate the degree of mixing during flow and the spatial distribution of phases. In particular, the pressure drop and temperature evolution along the pipe are highly dependent on the spatial configuration of the phases. In this work we investigate the three-phase water-assisted flow patterns, i.e. those configurations where water is injected in order to reduce the friction caused by the viscous oil. Phase flow rates and pressure drop data from previous laboratory experiments in a horizontal pipe are used for flow pattern identification by means of the 'support vector machine' (SVM) technique.
Abstract:
This paper describes the application of artificial neural nets as an alternative and efficient method for the classification of botanical taxa based on chemical data (chemosystematics). A total of 28,000 botanical occurrences of chemical compounds isolated from the Asteraceae family were chosen from the literature, and grouped by chemical class for each species. Four tests were carried out to differentiate and classify different botanical taxa. The qualifying capacity of the artificial neural nets was dichotomically tested at different hierarchical levels of the family, such as subfamilies and groups of Heliantheae subtribes. Furthermore, two specific subtribes of the Heliantheae and two genera of one of these subtribes were also tested. In general, the artificial neural net gave rise to good results, with multiple-correlation values R > 0.90. Hence, it was possible to differentiate the dichotomic character of the botanical taxa studied.
Abstract:
A total of 2400 samples of commercial Brazilian C gasoline were collected over a 6-month period from different gas stations in São Paulo state, Brazil, and analysed with respect to 12 physicochemical parameters according to regulation 309 of the Brazilian Government Petroleum, Natural Gas and Biofuels Agency (ANP). The percentages (v/v) of hydrocarbons (olefins, aromatics and saturated) were also determined. Hierarchical cluster analysis (HCA) was employed to select 150 representative samples that exhibited the least similarity on the basis of their physicochemical parameters and hydrocarbon compositions. The chromatographic profiles of the selected samples were measured by gas chromatography with flame ionisation detection and analysed using the soft independent modelling of class analogy (SIMCA) method in order to create a classification scheme to identify gasolines that conform to ANP regulation 309. Following the optimisation of the SIMCA algorithm, it was possible to classify correctly 96% of the commercial gasoline samples present in the training set of 100. In order to check the quality of the model, an external group of 50 gasoline samples (the prediction set) was analysed, and the developed SIMCA model classified 94% of these correctly. The developed chemometric method is recommended for screening commercial gasoline quality and detecting potential adulteration. (c) 2007 Elsevier B.V. All rights reserved.
Abstract:
In this paper we focus on providing coordinated visual strategies to assist users in performing tasks driven by the presence of temporal and spatial attributes. We introduce temporal visualization techniques targeted at such tasks, and illustrate their use with an application involving a climate classification process. The climate classification requires extensive processing of a database containing daily rain precipitation values collected over more than fifty years at several spatial locations in São Paulo state, Brazil. We identify user exploration tasks typically conducted as part of the data preparation required in this process, and then describe how such tasks may be assisted by the multiple visual techniques provided. Issues related to the use of the multiple techniques by an end-user are also discussed.