841 results for Imbalanced datasets
Abstract:
There is an increasing interest in the application of Evolutionary Algorithms (EAs) to induce classification rules. This hybrid approach can benefit areas where classical methods for rule induction have not been very successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when one or more classes heavily outnumber other classes. Frequently, classical machine learning (ML) classifiers are not able to learn in the presence of imbalanced data sets, inducing classification models that always predict the most numerous classes. In this work, we propose a novel hybrid approach to deal with this problem. We create several balanced data sets with all minority class cases and a random sample of majority class cases. These balanced data sets are fed to classical ML systems that produce rule sets. The rule sets are combined creating a pool of rules and an EA is used to build a classifier from this pool of rules. This hybrid approach has some advantages over undersampling, since it reduces the amount of discarded information, and some advantages over oversampling, since it avoids overfitting. The proposed approach was experimentally analysed and the experimental results show an improvement in the classification performance measured as the area under the receiver operating characteristics (ROC) curve.
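The resampling step described in this abstract (every minority-class case plus a random sample of majority-class cases, repeated several times) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `make_balanced_subsets`, the 1:1 class ratio, and the toy data are assumptions, and the rule-induction and evolutionary steps are not shown.

```python
import random


def make_balanced_subsets(samples, labels, minority_label, n_subsets, seed=0):
    """Build n_subsets balanced data sets (hypothetical helper).

    Each subset keeps all minority-class cases and adds an equally sized
    random sample (without replacement) of majority-class cases.
    """
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(labels) if lab == minority_label]
    majority = [i for i, lab in enumerate(labels) if lab != minority_label]
    k = min(len(minority), len(majority))  # assumed 1:1 class ratio

    subsets = []
    for _ in range(n_subsets):
        chosen = minority + rng.sample(majority, k)
        rng.shuffle(chosen)
        subsets.append(
            ([samples[i] for i in chosen], [labels[i] for i in chosen])
        )
    return subsets


# Toy data (assumed): 5 minority cases (label 1) and 95 majority cases (label 0).
labels = [1] * 5 + [0] * 95
samples = list(range(100))
subsets = make_balanced_subsets(samples, labels, minority_label=1, n_subsets=4)
```

Because each subset draws a different majority sample, the pool of rules learned across subsets sees more of the majority class than a single undersampled set would, which is the advantage the abstract claims over plain undersampling.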
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Zones of mixing between shallow groundwaters of different composition were unravelled by two-way regionalized classification, a technique based on correspondence analysis (CA), cluster analysis (ClA) and discriminant analysis (DA), aided by gridding, map-overlay and contouring tools. The shallow groundwaters are from a granitoid plutonite in the Fundão region (central Portugal). Correspondence analysis detected three natural clusters in the working dataset: 1, weathering; 2, domestic effluents; 3, fertilizers. Cluster analysis set an alternative distribution of the samples among the three clusters. Group memberships obtained by correspondence analysis and by cluster analysis were optimized by discriminant analysis and gridded, and the gridded memberships were coded as follows: codes 1, 2 or 3 were used when classification by correspondence analysis and cluster analysis produced the same results; code 0 when the grid node was first assigned to cluster 1 and then to cluster 2 or vice versa (mixing between weathering and effluents); code 4 in the other cases (mixing between agriculture and the other influences). Code-3 areas were systematically surrounded by code-4 areas, an observation attributed to hydrodynamic dispersion. Accordingly, the extent of code-4 areas in two orthogonal directions was assumed proportional to the longitudinal and transverse dispersivities of local soils. The results (0.7-16.8 and 0.4-4.3 m, respectively) are acceptable at the macroscopic scale. The ratios between longitudinal and transverse dispersivities (1.2-11.1) are also in agreement with results obtained by other studies.
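The grid-node coding rule stated in this abstract can be written as a small function. This is only a sketch of the rule as worded above: the function name `mixing_code` and the use of integer cluster labels 1 to 3 as inputs are assumptions, and the discriminant-analysis and gridding steps are not reproduced.

```python
def mixing_code(ca_cluster, cla_cluster):
    """Return the map code for one grid node (hypothetical helper).

    ca_cluster  -- cluster (1, 2 or 3) assigned via correspondence analysis
    cla_cluster -- cluster (1, 2 or 3) assigned via cluster analysis
    """
    valid = {1, 2, 3}
    if ca_cluster not in valid or cla_cluster not in valid:
        raise ValueError("cluster labels must be 1, 2 or 3")
    if ca_cluster == cla_cluster:
        return ca_cluster  # both classifications agree: codes 1, 2 or 3
    if {ca_cluster, cla_cluster} == {1, 2}:
        return 0           # mixing between weathering and effluents
    return 4               # mixing between agriculture and the other influences


# One code per (CA, ClA) pair for a few example nodes.
codes = [mixing_code(a, b) for a, b in [(1, 1), (1, 2), (2, 1), (3, 1), (2, 3)]]
```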
Abstract:
Traditional pattern recognition techniques cannot handle the classification of large datasets with both efficiency and effectiveness. In this context, the Optimum-Path Forest (OPF) classifier was recently introduced, trying to achieve high recognition rates and low computational cost. Although OPF was much faster than Support Vector Machines for training, it was slightly slower for classification. In this paper, we present the Efficient OPF (EOPF), which is an enhanced and faster version of the traditional OPF, and validate it for the automatic recognition of white matter and gray matter in magnetic resonance images of the human brain. © 2010 IEEE.
Abstract:
In this work, a new approach for supervised pattern recognition is presented which improves the learning algorithm of the Optimum-Path Forest classifier (OPF), centered on the detection and elimination of outliers in the training set. Identification of outliers is based on a penalty computed for each sample in the training set from the corresponding number of imputable false positive and false negative classifications of samples. This approach enhances the accuracy of OPF while still gaining in classification time, at the expense of a slight increase in training time. © 2010 Springer-Verlag.
Abstract:
The main purpose of this work is to report the presence of spurious discontinuities in the pattern of diurnal variation of sea level pressure in three reanalysis datasets: the National Centers for Environmental Prediction (NCEP) and National Center for Atmospheric Research (R1), the NCEP and Department of Energy (R2), and the European Centre for Medium-Range Weather Forecasts (ERA-40). Such discontinuities can be connected to the major changes in the global observing system that have occurred throughout the reanalysis years. In the R1, the richest period in discontinuities is 1956-1958, coinciding with the start of the modern radiosonde observation network. The rapid increase in the density of surface-based observations from 1967 also had an important impact on both R1 and ERA-40, with a larger impact on R1. The reanalyses show discontinuities in the 1970s related to the assimilation of radiances measured by the Vertical Temperature Profile Radiometer and TIROS-N Operational Vertical Sounders onboard satellites. In the ERA-40, which additionally assimilated Special Sensor Microwave/Imager data, there are discontinuities in 1987-1989. The R1 also presents further discontinuities in 1988-1993, likely connected to the replacement/introduction of NOAA-series satellites with different biases, and to the volcanic eruption of Mount Pinatubo in June 1991, which is known to have severely affected measurements of infrared radiances for several years. The discontinuities in 1996-1998 might be partially connected to a change in the type of radiosonde, from VIZ-B to VIZ-B2. The R2, which covers only the satellite era (1979 onward), shows discontinuities mainly in 1992, 1996-1997, and 2001. The discontinuities in 1992 and 2001 might have been caused by changes in the satellite measurements and those in 1996-1997 by changes in the land-based observation network. © 2012 Springer-Verlag.
Abstract:
We consider three-body systems in two dimensions with zero-range interactions for general masses and interaction strengths. The momentum-space Schrödinger equation is solved numerically and in the Born-Oppenheimer (BO) approximation. The BO expression is derived using separable potentials and yields a concise adiabatic potential between the two heavy particles. The BO potential is Coulomb-like and exponentially decreasing at small and large distances, respectively. While we find similar qualitative features to previous studies, we find important quantitative differences. Our results demonstrate that mass-imbalanced systems that are accessible in the field of ultracold atomic gases can have a rich three-body bound state spectrum in two-dimensional geometries. Small light-heavy mass ratios increase the number of bound states. For 87Rb-87Rb-6Li and 133Cs-133Cs-6Li we find three and four bound states, respectively. © 2013 IOP Publishing Ltd.
Abstract:
Includes bibliography
Abstract:
As part of ongoing efforts to strengthen the statistical capacities of National Statistical Offices (NSOs) in the region, the Economic Commission for Latin America and the Caribbean (ECLAC) convened a two-day Regional Training Workshop on Data Sharing, Data Ownership and Harmonization of Survey Datasets on 26-27 August 2009 at the Cascadia Hotel, Trinidad and Tobago. This workshop was one of the concluding activities of the Project on Improving Household Surveys in the Caribbean, which has been implemented by the ECLAC Subregional office since 2007.
Abstract:
The pathogenic mechanisms of thromboangiitis obliterans (TAO) are not entirely known and the imbalance of matrix metalloproteinases (MMPs) plays a role in vascular diseases. We evaluated the MMP-2 and MMP-9 circulating levels and their endogenous tissue inhibitors of metalloproteinases (TIMP-1 and TIMP-2) in TAO patients with clinical manifestations. The study included 20 TAO patients (n = 10 female, n = 10 male) aged 38-59 years under clinical follow-up. The patients were classified into two groups: (1) TAO former smokers (n = 11) and (2) TAO active smokers (n = 9); the control group included normal volunteer non-smokers (n = 10) and active smokers without peripheral artery disease (n = 10). Patient plasma samples were used to analyze MMP-2 and MMP-9 levels using zymography, and TIMP-1 and TIMP-2 concentrations were determined by enzyme-linked immunosorbent assays. The analysis of MMP-2/TIMP-2 and MMP-9/TIMP-1 ratios (which were used as indices of net MMP-2 and MMP-9 activity, respectively) showed significantly higher MMP-9/TIMP-1 ratios in TAO patients (p < 0.05). We found no significant differences in MMP-2/TIMP-2 ratios (p > 0.05). We found higher MMP-9 levels and decreased levels of TIMP-1 in the TAO groups (active smokers and former smokers), especially in active smokers compared with the other groups (all p < 0.05). MMP-2 and TIMP-2 were not significantly different in patients with TAO as compared to the control group (p > 0.05). In conclusion, our results showed increased MMP-9 and reduced TIMP-1 activity in TAO patients, especially in active smokers compared with non-TAO patients. These data suggest that smoke compounds could activate MMP-9 production or inhibit TIMP-1 activity.
Abstract:
[EN] Gender recognition has achieved impressive results based on face appearance in controlled datasets. Its application in the wild and on large datasets is still a challenging task for researchers. In this paper, we make use of classical techniques to analyze their performance in controlled and uncontrolled conditions, respectively, with the LFW and MORPH datasets. For both sets, the benchmarking protocol follows the 5-fold cross-validation proposed by the BEFIT challenge.
Abstract:
[EN] This paper analyzes the detection and localization performance of the participating face and eye algorithms compared with the Viola-Jones detector and four leading commercial face detectors. Performance is characterized under the different conditions and parameterized by per-image brightness and contrast. In localization accuracy for eyes, the groups/companies focusing on long-range face detection outperform leading commercial applications.