106 results for Misclassification
Abstract:
A two-stage linear-in-the-parameter model construction algorithm is proposed for noisy two-class classification problems. The purpose of the first stage is to produce a prefiltered signal that is used as the desired output for the second stage, which constructs a sparse linear-in-the-parameter classifier. The prefiltering stage is a two-level process aimed at maximizing the model's generalization capability: a new elastic-net model identification algorithm using singular value decomposition is employed at the lower level, and two regularization parameters are then optimized at the upper level by a particle-swarm-optimization algorithm that minimizes the leave-one-out (LOO) misclassification rate. It is shown that the LOO misclassification rate based on the resultant prefiltered signal can be computed analytically without splitting the data set, and the associated computational cost is minimal due to orthogonality. The second stage of sparse classifier construction is based on orthogonal forward regression with the D-optimality algorithm. Extensive simulations on noisy data sets illustrate the competitiveness of the approach for classifying noisy data.
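The analytic LOO computation rests on a standard identity for linear-in-the-parameters least-squares models: the leave-one-out residual equals the full-sample residual divided by one minus the leverage. Below is a minimal sketch of that identity, using a plain ridge penalty as a stand-in for the paper's elastic-net/PSO pipeline; the function name and the regularization parameter `lam` are illustrative assumptions.

```python
import numpy as np

def loo_misclassification_rate(Phi, y, lam=1e-3):
    """LOO misclassification rate for a regularized linear-in-the-parameters
    classifier, computed without explicitly splitting the data.

    Relies on the identity e_loo_i = e_i / (1 - h_ii), where h_ii is the
    i-th leverage (diagonal of the hat matrix). Labels y are in {-1, +1}.
    Ridge regularization stands in for the paper's elastic net (assumption).
    """
    p = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(p)
    theta = np.linalg.solve(A, Phi.T @ y)     # regularized least-squares fit
    H = Phi @ np.linalg.solve(A, Phi.T)       # hat (smoothing) matrix
    resid = y - Phi @ theta
    loo_resid = resid / (1.0 - np.diag(H))    # leave-one-out residuals
    loo_out = y - loo_resid                   # LOO model outputs
    return np.mean(np.sign(loo_out) != y)
```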
Abstract:
Optimal estimation (OE) and probabilistic cloud screening were developed to provide lake surface water temperature (LSWT) estimates from the series of (advanced) along-track scanning radiometers (ATSRs). Variations in physical properties such as elevation, salinity, and atmospheric conditions are accounted for through forward modelling of the observed radiances, so the OE retrieval scheme developed is generic (i.e., applicable to all lakes). LSWTs were obtained for 258 of Earth's largest lakes from ATSR-2 and AATSR imagery from 1995 to 2009. Comparison with in situ observations from several lakes yields satellite-minus-in-situ differences of −0.2 ± 0.7 K for daytime and −0.1 ± 0.5 K for nighttime observations (mean ± standard deviation). This compares with −0.05 ± 0.8 K for daytime and −0.1 ± 0.9 K for nighttime observations for previous methods based on operational sea surface temperature algorithms. The new approach also increases coverage (reducing misclassification of clear sky as cloud) and exhibits greater consistency between retrievals using different channel–view combinations. Empirical orthogonal function (EOF) techniques were applied to the LSWT retrievals (which contain gaps due to cloud cover) to reconstruct spatially and temporally complete time series of LSWT. The new LSWT observations and the EOF-based reconstructions offer benefits to numerical weather prediction and lake model validation, and improve our knowledge of the climatology of lakes globally. Both observations and reconstructions are publicly available from http://hdl.handle.net/10283/88.
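The EOF-based reconstruction of gappy fields can be illustrated with an iterative SVD fill in the spirit of DINEOF-type methods; this is a sketch under assumed conventions (a time-by-location matrix with NaNs for cloud-obscured scenes), not the authors' exact procedure, and `n_modes`/`n_iter` are illustrative parameters.

```python
import numpy as np

def eof_fill(X, n_modes=5, n_iter=50):
    """Fill gaps (NaNs) in a time x location LSWT matrix by iteratively
    reconstructing it from its leading EOF modes (DINEOF-style sketch)."""
    mask = np.isnan(X)
    col_mean = np.nanmean(X, axis=0)
    Xf = np.where(mask, col_mean, X)          # initialise gaps with location means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Xf, full_matrices=False)
        recon = (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes]
        Xf[mask] = recon[mask]                # update only the missing entries
    return Xf
```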
Abstract:
This contribution proposes a novel probability density function (PDF) estimation-based over-sampling (PDFOS) approach for two-class imbalanced classification problems. The classical Parzen-window kernel function is adopted to estimate the PDF of the positive class. Synthetic instances are then generated according to the estimated PDF and used as additional training data. The essential idea is to re-balance the class distribution of the original imbalanced data set under the principle that the synthetic samples follow the same statistical properties as the positive class. Based on the over-sampled training data, a radial basis function (RBF) classifier is constructed by applying an orthogonal forward selection procedure, in which the classifier's structure and the parameters of the RBF kernels are determined by a particle swarm optimisation algorithm that minimises the leave-one-out misclassification rate. The effectiveness of the proposed PDFOS approach is demonstrated by an empirical study on several imbalanced data sets.
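The core sampling step can be sketched with a Gaussian kernel density estimate of the minority class; note that scipy's `gaussian_kde` uses Scott's-rule bandwidth by default, whereas PDFOS selects its own smoothing parameter, so this is a simplified illustration rather than the paper's method.

```python
import numpy as np
from scipy.stats import gaussian_kde

def pdf_oversample(X_minority, n_new, seed=0):
    """Draw synthetic minority-class samples from a Parzen-window (Gaussian
    KDE) estimate of the class density, in the spirit of PDFOS."""
    kde = gaussian_kde(X_minority.T)          # scipy expects (n_dims, n_samples)
    return kde.resample(n_new, seed=seed).T

# Usage: rebalance by appending synthetic positives to the training set
# X_balanced = np.vstack([X_minority, pdf_oversample(X_minority, n_needed)])
```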
Abstract:
In this paper, artificial neural networks (ANNs) based on supervised and unsupervised algorithms were investigated for use in the study of rheological parameters of solid pharmaceutical excipients, in order to develop computational tools for manufacturing solid dosage forms. Among the four supervised neural networks investigated, the best learning performance was achieved by a feedforward multilayer perceptron whose architecture comprised eight neurons in the input layer, sixteen neurons in the hidden layer and one neuron in the output layer. Learning and predictive performance was poor for the angle of repose, whereas the Carr index and Hausner ratio (CI and HR, respectively) showed very good fitting capacity and learning; HR and CI were therefore considered suitable descriptors for the next stage of development of supervised ANNs. Clustering capacity was evaluated for five unsupervised strategies. Networks based on purely competitive unsupervised strategies, the classic "Winner-Take-All", "Frequency-Sensitive Competitive Learning" and "Rival-Penalized Competitive Learning" (WTA, FSCL and RPCL, respectively), were able to perform clustering from the database, but this classification was very poor, showing severe classification errors by grouping data with conflicting properties into the same cluster or even the same neuron; moreover, the criteria these networks adopted for clustering could not be established. Self-Organizing Map (SOM) and Neural Gas (NG) networks showed better clustering capacity. Both recognized the two major groupings of data, corresponding to lactose (LAC) and cellulose (CEL). However, SOM made some errors in classifying data from the minority excipients magnesium stearate (EMG), talc (TLC) and attapulgite (ATP). The NG network, in turn, performed a very consistent classification of the data and resolved the misclassifications of SOM, making it the most appropriate network for classifying the data in this study. The use of the NG network in pharmaceutical technology had not previously been reported. NG therefore has great potential for use in the development of software for automated classification systems of pharmaceutical powders and as a new tool for mining and clustering data in drug development.
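Since Neural Gas is the abstract's preferred method, a minimal NumPy sketch of the algorithm may help: prototypes are adapted toward each drawn sample in proportion to their distance rank (the classic rank-based update). The unit counts, decay schedules and seed below are illustrative assumptions, not the study's settings.

```python
import numpy as np

def neural_gas(X, n_units=10, n_iter=5000, eps=(0.5, 0.01), lam=(10.0, 0.5), seed=0):
    """Minimal Neural Gas clustering of the rows of X: every prototype moves
    toward each drawn sample with a strength that decays with its rank."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        eps_t = eps[0] * (eps[1] / eps[0]) ** frac   # learning-rate decay
        lam_t = lam[0] * (lam[1] / lam[0]) ** frac   # neighbourhood-range decay
        x = X[rng.integers(len(X))]
        dists = np.linalg.norm(W - x, axis=1)
        ranks = np.argsort(np.argsort(dists))        # 0 = closest prototype
        W += (eps_t * np.exp(-ranks / lam_t))[:, None] * (x - W)
    return W
```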
Abstract:
RePART (Reward/Punishment ART) is a neural model that constitutes a variation of the Fuzzy ARTMAP model. This network was proposed in order to minimize problems inherent in ARTMAP-based models, such as category proliferation and misclassification. RePART makes use of additional mechanisms, such as an instance-counting parameter, a reward/punishment process and a variable vigilance parameter. The instance-counting parameter, for instance, aims to minimize the misclassification problem, which is a consequence of the sensitivity to noise frequently present in ARTMAP-based models. The variable vigilance parameter, on the other hand, tries to smooth out the category proliferation problem inherent to ARTMAP-based models, decreasing the complexity of the net. RePART was originally proposed to minimize the aforementioned problems and was shown to have better performance (higher accuracy and lower complexity) than ARTMAP-based models. This work investigates the performance of the RePART model in classifier ensembles. Different ensemble sizes, learning strategies and structures will be used in this investigation. The aim is to identify the main advantages and drawbacks of this model when used as a component in classifier ensembles, providing a broader foundation for the use of RePART in other pattern recognition applications.
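As background for the ensemble experiments, a generic majority-vote combiner can be sketched as follows; the `predict` interface of the base learners (RePART members would be one instance) is a hypothetical assumption, since no public RePART implementation is referenced in the abstract.

```python
import numpy as np

def majority_vote(votes):
    """Combine hard labels from an ensemble; votes has shape
    (n_members, n_samples) with integer-coded class labels."""
    votes = np.asarray(votes)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

def ensemble_error(members, X_test, y_test):
    """Misclassification rate of a majority-vote ensemble of trained
    members, each exposing a predict(X) method (assumed interface)."""
    votes = np.stack([m.predict(X_test) for m in members])
    return np.mean(majority_vote(votes) != y_test)
```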
Abstract:
Objective: To identify potential prognostic factors for pulmonary thromboembolism (PTE) and establish a mathematical model to predict the risk of fatal and nonfatal PTE. Method: The reports of 4,813 consecutive autopsies performed from 1979 to 1998 in a Brazilian tertiary referral medical school were reviewed for a retrospective study. From the medical records and autopsy reports of the 512 patients found with macroscopically and/or microscopically documented PTE, data on demographics, underlying diseases, and probable PTE site of origin were gathered and studied by multiple logistic regression. Thereafter, the jackknife method, a statistical cross-validation technique that uses the original study patients to validate a clinical prediction rule, was applied. Results: The autopsy rate was 50.2%, and PTE prevalence was 10.6%. In 212 cases, PTE was the main cause of death (fatal PTE). The independent variables selected by the regression significance criteria as most likely to be associated with fatal PTE were age (odds ratio [OR], 1.02; 95% confidence interval [CI], 1.00 to 1.03), trauma (OR, 8.5; 95% CI, 2.20 to 32.81), right-sided cardiac thrombi (OR, 1.96; 95% CI, 1.02 to 3.77), and pelvic vein thrombi (OR, 3.46; 95% CI, 1.19 to 10.05); those most likely to be associated with nonfatal PTE were systemic arterial hypertension (OR, 0.51; 95% CI, 0.33 to 0.80), pneumonia (OR, 0.46; 95% CI, 0.30 to 0.71), and sepsis (OR, 0.16; 95% CI, 0.06 to 0.40). Application of the equation to the 512 cases studied suggests that logit p > 0.336 favors the occurrence of fatal PTE, logit p < −1.142 favors nonfatal PTE, and intermediate values of logit p are inconclusive. The cross-validation misclassification rate was 25.6%, meaning that the prediction equation correctly classified the majority of cases (74.4%). Conclusions: Although the usefulness of this method in everyday medical practice needs to be confirmed by a prospective study, our results suggest that, concerning prevention, diagnosis, and treatment of PTE, strict attention should be given to patients presenting the variables that are significant in the logistic regression model.
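The reported decision rule can be made concrete with a small sketch: the fitted logistic regression yields a linear predictor (logit p) per patient, which is then compared against the published cut-offs. The function below only encodes the thresholds quoted in the abstract; the coefficients that produce `logit_p` would come from the fitted model and are not reproduced here.

```python
def classify_pte(logit_p, upper=0.336, lower=-1.142):
    """Apply the reported thresholds: logit p > 0.336 favors fatal PTE,
    logit p < -1.142 favors nonfatal PTE, and intermediate values are
    inconclusive (thresholds as quoted in the abstract)."""
    if logit_p > upper:
        return "fatal PTE"
    if logit_p < lower:
        return "nonfatal PTE"
    return "inconclusive"
```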
Abstract:
Deer-vehicle collisions (DVCs) impact the economic and social well-being of humans. We examined large-scale patterns behind DVCs across 3 ecoregions in Michigan: the Southern Lower Peninsula (SLP), Northern Lower Peninsula (NLP), and Upper Peninsula (UP). A 3-component conceptual model of DVCs with drivers, deer, and a landscape was the framework of analysis. The conceptual model was parameterized into a parsimonious mathematical model. The dependent variable was DVCs by county by ecoregion, and the independent variables were percent forest cover, percent crop cover, mean annual vehicle miles traveled (VMT), and mean deer density index (DDI) by county. A discriminant function analysis of the 4 independent variables by county by ecoregion indicated low misclassification and provided support for the groupings by ecoregion. The global model and all sub-models were run for the 3 ecoregions and evaluated using information-theoretic approaches. Adjusted R² values for the global model increased substantially from the SLP (0.21) to the NLP (0.54) to the UP (0.72). VMT and DDI were important variables across all 3 ecoregions. Percent crop cover played an important role in DVCs in the SLP and UP. The scale at which causal factors of DVCs operate appears to be finer in southern Michigan than in northern Michigan. Reduction of DVCs will likely occur only through a reduction in deer density, a reduction in traffic volume, or modification of site-specific factors, such as driver behavior, sight distance, highway features, or speed limits.
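The information-theoretic evaluation of the global model and its sub-models can be sketched as an exhaustive fit-and-rank over predictor subsets; the data frame and column names below (`dvcs`, `forest`, `crop`, `vmt`, `ddi`) are hypothetical placeholders for the county-level variables described above.

```python
from itertools import combinations
import statsmodels.api as sm

def rank_submodels(df, response="dvcs", predictors=("forest", "crop", "vmt", "ddi")):
    """Fit every non-empty subset of predictors by OLS and rank by AIC
    (lowest first); also report adjusted R-squared for each sub-model."""
    results = []
    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            X = sm.add_constant(df[list(subset)])
            fit = sm.OLS(df[response], X).fit()
            results.append((fit.aic, subset, fit.rsquared_adj))
    return sorted(results)
```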
Abstract:
The purpose of this study was to examine the reliability, validity and classification accuracy of the South Oaks Gambling Screen (SOGS) in a sample of the Brazilian population. Participants were drawn from three sources: 71 men and women from the general population interviewed at a metropolitan train station; 116 men and women encountered at a bingo venue; and 54 men and women undergoing treatment for gambling. The SOGS and a DSM-IV-based instrument were administered by trained researchers. The internal consistency of the SOGS was 0.75 according to the Cronbach's alpha model, and construct validity was good. A significant difference among groups was demonstrated by ANOVA (F(2,238) = 221.3, P < 0.001). The SOGS items and DSM-IV symptoms were highly correlated (r = 0.854, P < 0.01). The SOGS also presented satisfactory psychometric properties: sensitivity (100), specificity (74.7), positive predictive rate (60.7), negative predictive rate (100) and misclassification rate (0.18). However, a cut-off score of eight improved classification accuracy and reduced the rate of false positives: sensitivity (95.4), specificity (89.8), positive predictive rate (78.5), negative predictive rate (98) and misclassification rate (0.09). Thus, the SOGS was found to be reliable and valid in the Brazilian population.
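The psychometric indices quoted above follow directly from the 2x2 table of screen results against the DSM-IV criterion; a small helper makes the definitions explicit (the cell counts are the standard true/false positives and negatives, supplied by the user).

```python
def screening_metrics(tp, fp, fn, tn):
    """Accuracy measures for a screening instrument (e.g. the SOGS)
    evaluated against a gold standard such as a DSM-IV-based diagnosis."""
    n = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_rate": tp / (tp + fp),
        "negative_predictive_rate": tn / (tn + fn),
        "misclassification_rate": (fp + fn) / n,
    }
```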
Abstract:
Background: Genotyping of hepatitis C virus (HCV) has become an essential tool for prognosis and prediction of treatment duration. The aim of this study was to compare two HCV genotyping methods: reverse hybridization line probe assay (LiPA v.1) and partial sequencing of the NS5B region. Methods: Plasma samples from 171 patients with chronic hepatitis C were screened using both a commercial method (LiPA HCV Versant, Siemens, Tarrytown, NY, USA) and different primers targeting the NS5B region for PCR amplification and sequencing analysis. Results: Comparison of the HCV genotyping methods showed no difference in classification at the genotype level. However, a total of 82/171 samples (47.9%), comprising misclassified, non-subtypable, discrepant and inconclusive results, were not classified by LiPA at the subtype level but could be discriminated by NS5B sequencing. Of these, 34 samples of genotype 1a and 6 samples of genotype 1b were classified at the subtype level by sequencing of NS5B. Conclusions: Sequence analysis of NS5B for genotyping HCV provides precise genotype and subtype identification and an accurate epidemiological representation of circulating viral strains.
Abstract:
This work proposes a system for the classification of industrial steel pieces by means of a magnetic nondestructive device. The proposed classification system comprises two main stages: an online stage and an off-line optimization stage. In the online stage, the system classifies inputs and saves misclassification information in order to perform posterior analyses. In the off-line optimization stage, the topology of a Probabilistic Neural Network is optimized by a Feature Selection algorithm combined with the Probabilistic Neural Network to increase the classification rate. The proposed Feature Selection algorithm searches the signal spectrogram by combining three basic elements: a Sequential Forward Selection algorithm, a Feature Cluster Grow algorithm with classification-rate gradient analysis, and a Sequential Backward Selection algorithm. In addition, a trash-data recycling algorithm is proposed to obtain optimal feedback samples selected from the misclassified ones.
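Of the three building blocks of the proposed Feature Selection algorithm, Sequential Forward Selection is the easiest to sketch: greedily add whichever spectrogram feature most improves the classification score. The `score` callable (e.g. cross-validated accuracy of the Probabilistic Neural Network) is an assumed interface, not the paper's exact criterion.

```python
import numpy as np

def sequential_forward_selection(X, y, score, max_features):
    """Greedy SFS: repeatedly add the single feature (column of X) that
    most improves score(X_subset, y); stop when nothing improves."""
    selected, remaining = [], list(range(X.shape[1]))
    best = -np.inf
    while remaining and len(selected) < max_features:
        s, f = max((score(X[:, selected + [j]], y), j) for j in remaining)
        if s <= best:
            break                              # no remaining feature helps
        best = s
        selected.append(f)
        remaining.remove(f)
    return selected, best
```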
Abstract:
Misclassification of the electrocardiogram (ECG) contributes to treatment errors in patients with acute coronary syndrome. We hypothesized that cardiology ECG review could reduce these errors.
Abstract:
To enhance understanding of the metabolic indicators of type 2 diabetes mellitus (T2DM) disease pathogenesis and progression, the urinary metabolomes of well-characterized rhesus macaques (normal or spontaneously and naturally diabetic) were examined. High-resolution ultra-performance liquid chromatography coupled with the accurate mass determination of time-of-flight mass spectrometry was used to analyze spot urine samples from normal (n = 10) and T2DM (n = 11) male monkeys. The machine-learning algorithm random forests was used to classify urine samples as coming from either normal or T2DM monkeys, and the metabolites important for developing the classifier were further examined for their biological significance. Random forests models had a misclassification error of less than 5%. Metabolites were identified based on accurate masses (<10 ppm) and confirmed by tandem mass spectrometry of authentic compounds. Urinary compounds significantly increased (p < 0.05) in the T2DM group compared with the normal group included glycine betaine (9-fold), citric acid (2.8-fold), kynurenic acid (1.8-fold), glucose (68-fold), and pipecolic acid (6.5-fold). When compared with the conventional definition of T2DM, the metabolites were also useful in defining the T2DM condition, and the urinary elevations in glycine betaine and pipecolic acid (as well as proline) indicated defective re-absorption in the kidney proximal tubules by SLC6A20, a Na(+)-dependent transporter. The mRNA levels of SLC6A20 were significantly reduced in the kidneys of monkeys with T2DM. These observations were validated in the db/db mouse model of T2DM. This study provides convincing evidence of the power of metabolomics for identifying functional changes at many levels in the omics pipeline.
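A random forests classifier with an out-of-bag error estimate, as a loose analogue of the misclassification error reported above, can be sketched with scikit-learn; `X`, `y` and `metabolite_names` are placeholder inputs, and the hyperparameters are illustrative, not the authors' settings.

```python
from sklearn.ensemble import RandomForestClassifier

# X: samples x metabolite-intensity features; y: 0 = normal, 1 = T2DM
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB misclassification error:", 1 - rf.oob_score_)

# Feature importances highlight the metabolites driving the classifier
ranked = sorted(zip(rf.feature_importances_, metabolite_names), reverse=True)
```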
Abstract:
Indoor radon is regularly measured in Switzerland. However, a nationwide model to predict residential radon levels has not been developed. The aim of this study was to develop a prediction model to assess indoor radon concentrations in Switzerland. The model was based on 44,631 measurements from the nationwide Swiss radon database collected between 1994 and 2004. Of these, a randomly selected 80% of measurements were used for model development and the remaining 20% for an independent model validation. A multivariable log-linear regression model was fitted, and relevant predictors were selected according to evidence from the literature, the adjusted R², Akaike's information criterion (AIC), and the Bayesian information criterion (BIC). The prediction model was evaluated by calculating the Spearman rank correlation between measured and predicted values. Additionally, the predicted values were categorised into three categories (<50th, 50th-90th and >90th percentile) and compared with the measured categories using a weighted Kappa statistic. The most relevant predictors for indoor radon levels were tectonic units and year of construction of the building, followed by soil texture, degree of urbanisation, floor of the building where the measurement was taken, and housing type (P-values <0.001 for all). Mean predicted radon values (geometric mean) were 66 Bq/m³ (interquartile range 40-111 Bq/m³) in the lowest exposure category, 126 Bq/m³ (69-215 Bq/m³) in the medium category, and 219 Bq/m³ (108-427 Bq/m³) in the highest category. Spearman correlation between predictions and measurements was 0.45 (95%-CI: 0.44; 0.46) for the development dataset and 0.44 (95%-CI: 0.42; 0.46) for the validation dataset. Kappa coefficients were 0.31 for the development dataset and 0.30 for the validation dataset. The model explained 20% of the overall variability (adjusted R²). In conclusion, this residential radon prediction model, based on a large number of measurements, was demonstrated to be robust through validation with an independent dataset. The model is appropriate for predicting the radon exposure of the Swiss population in epidemiological research. Nevertheless, some exposure misclassification and regression to the mean is unavoidable and should be taken into account in future applications of the model.
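The two evaluation statistics used above can be reproduced in a few lines; `measured` and `predicted` are placeholder arrays of radon concentrations (Bq/m³), and the linear weighting for Kappa is an assumption, since the abstract does not state the weighting scheme.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

# Categorise at the 50th and 90th percentiles of the measurements
edges = np.percentile(measured, [50, 90])
cat_measured = np.digitize(measured, edges)    # 0/1/2 = low/medium/high
cat_predicted = np.digitize(predicted, edges)

rho, _ = spearmanr(predicted, measured)        # rank correlation
kappa = cohen_kappa_score(cat_measured, cat_predicted, weights="linear")
```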