165 results for Classification error rate
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
Predictive performance evaluation is a fundamental issue in the design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as the error rate, although quite convenient due to their simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. Because of this, various graphical performance evaluation methods are increasingly drawing the attention of the machine learning, data mining, and pattern recognition communities. The main advantage of these methods resides in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing these aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to select a suitable graphical method for a given task, it is crucial to identify its strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods within the same framework, we hope this paper may shed some light on deciding which methods are more suitable to use in different situations.
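As a concrete, hedged illustration of the contrast this abstract draws between a single scalar error rate and a graphical trade-off view, the sketch below computes both on synthetic scores using an ROC curve. ROC analysis is only one of the graphical methods such surveys cover; the data, threshold, and AUC summary here are illustrative choices, not taken from the paper.

```python
# Minimal sketch: scalar error rate at one arbitrary threshold vs. an ROC curve
# traced over all thresholds. Labels and scores are synthetic.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)              # ground-truth labels (0/1)
scores = y_true + rng.normal(0.0, 1.0, size=1000)   # noisy classifier scores

# Scalar summary: error rate at a single, arbitrarily chosen threshold.
threshold = 0.5
y_pred = (scores >= threshold).astype(int)
error_rate = np.mean(y_pred != y_true)

# Graphical summary: sweep all thresholds and trace the ROC curve.
order = np.argsort(-scores)
tp = np.cumsum(y_true[order] == 1)
fp = np.cumsum(y_true[order] == 0)
tpr = np.concatenate(([0.0], tp / max(tp[-1], 1)))  # true positive rate
fpr = np.concatenate(([0.0], fp / max(fp[-1], 1)))  # false positive rate
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))  # trapezoid rule

print(f"error rate @ {threshold}: {error_rate:.3f}, ROC AUC: {auc:.3f}")
```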
Abstract:
Background: Genome-wide association studies (GWAS) are becoming the approach of choice to identify genetic determinants of complex phenotypes and common diseases. The astonishing amount of generated data and the use of distinct genotyping platforms with variable genomic coverage are still analytical challenges. Imputation algorithms combine information from directly genotyped markers with the haplotypic structure of the population of interest to infer poorly genotyped or missing markers, and are considered a near-zero-cost approach to allow the comparison and combination of data generated in different studies. Several reports have stated that imputed markers have an overall acceptable accuracy, but no published report has performed a pairwise comparison of imputed and empirical association statistics for a complete set of GWAS markers. Results: In this report we identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10⁻⁵ for type 2 diabetes mellitus and compared them with results obtained from empirical allelic frequencies. Interestingly, despite their overall high correlation, association statistics based on imputed frequencies were discordant for 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers, and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers. Conclusions: Our results suggest that association statistics from imputed markers within specific MAF (minor allele frequency) ranges, located in weak linkage disequilibrium blocks, or strongly deviating from local patterns of association are prone to inflated false positive association signals. The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.
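As a hedged illustration of the pairwise comparison just described, the sketch below runs the same allelic chi-square association test on one marker twice: once with empirical genotype-derived allele counts and once with counts a hypothetical imputation might yield. All counts are synthetic and the specific test is an assumption for illustration, not the study's exact pipeline.

```python
# Hedged sketch: allelic 2x2 chi-square test computed from empirical vs.
# (hypothetical) imputed allele counts for a single marker. Numbers are synthetic.
import numpy as np
from scipy.stats import chi2_contingency

def allelic_test(case_alleles, control_alleles):
    """2x2 allelic test: rows = case/control, cols = (minor, major) allele counts."""
    table = np.array([case_alleles, control_alleles])
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    return p

# Empirically genotyped counts (minor, major) in cases and controls.
p_empirical = allelic_test(case_alleles=(420, 1580), control_alleles=(350, 1650))

# The same marker after imputation: dosages shift the apparent minor allele count.
p_imputed = allelic_test(case_alleles=(455, 1545), control_alleles=(340, 1660))

print(f"empirical P = {p_empirical:.2e}, imputed P = {p_imputed:.2e}")
# Discordance between the two P-values is what inflates the type I error rate
# of imputed markers described in the abstract.
```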
Abstract:
The purpose of this article is to present a quantitative analysis of the human failure contribution in the collision and/or grounding of oil tankers, considering the recommendations of the "Guidelines for Formal Safety Assessment" of the International Maritime Organization. Initially, the employed methodology is presented, emphasizing the use of the technique for human error prediction to reach the desired objective. Later, this methodology is applied to a ship operating on the Brazilian coast and, thereafter, the procedure to isolate the human actions with the greatest potential to reduce the risk of an accident is described. Finally, the management and organizational factors presented in the "International Safety Management Code" are associated with these selected actions. Therefore, an operator will be able to decide where to act in order to obtain an effective reduction in the probability of accidents. Even though this study does not present a new methodology, it can be considered a reference for human reliability analysis in the maritime industry, which, in spite of having some guides for risk analysis, has few studies on human reliability effectively applied to the sector.
Abstract:
Age-related changes in running kinematics have been reported in the literature using classical inferential statistics. However, this approach has been hampered by the increased number of biomechanical gait variables reported and, subsequently, the lack of differences presented in these studies. Data mining techniques have been applied in recent biomedical studies to solve this problem using a more general approach. In the present work, we re-analyzed lower extremity running kinematic data of 17 young and 17 elderly male runners using the Support Vector Machine (SVM) classification approach. In total, 31 kinematic variables were extracted to train the classification algorithm and test the generalized performance. The results revealed different accuracy rates across the three kernel methods adopted in the classifier, with the linear kernel performing best. A subsequent forward feature selection algorithm demonstrated that, with only six features, the linear kernel SVM achieved a 100% classification rate, showing that these features provided powerful combined information to distinguish the age groups. The results of the present work demonstrate the potential of this approach to improve knowledge about age-related differences in running gait biomechanics and encourage the use of the SVM in other clinical contexts. (C) 2010 Elsevier Ltd. All rights reserved.
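A minimal sketch of the pipeline the abstract describes, i.e. a linear-kernel SVM followed by forward feature selection down to six variables, is shown below using scikit-learn. The 31-variable, 17+17-runner shapes mirror the abstract, but the data are random stand-ins and the study's exact cross-validation protocol is not reproduced.

```python
# Sketch: linear-kernel SVM for age-group classification plus forward feature
# selection, on synthetic stand-in data shaped like the study (34 x 31).
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(34, 31))          # 17 young + 17 elderly runners, 31 variables
y = np.array([0] * 17 + [1] * 17)      # age-group labels

svm = SVC(kernel="linear", C=1.0)

# Baseline cross-validated accuracy with all 31 kinematic variables.
baseline = cross_val_score(svm, X, y, cv=5).mean()

# Forward selection down to six features, as in the abstract.
selector = SequentialFeatureSelector(svm, n_features_to_select=6,
                                     direction="forward", cv=5)
selector.fit(X, y)
selected = cross_val_score(svm, selector.transform(X), y, cv=5).mean()

print(f"all features: {baseline:.2f}, six selected features: {selected:.2f}")
```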
Abstract:
The authors present a comparative analysis between a triple-band S-C-L erbium-doped fibre amplifier and a commercial semiconductor optical amplifier in a CWDM application scenario. Both technologies were characterised for gain and noise figure from 1480 to 1610 nm (S, C and L bands), and their system-level performance was evaluated in terms of bit error rate measurements over a wide range of optical power levels.
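For readers less familiar with the figure of merit, the snippet below applies the standard Gaussian-noise relation between receiver Q-factor and bit error rate for on-off keying. This relation is general textbook background, not taken from the paper, and the Q values are illustrative.

```python
# Background relation (assumed, not from the paper): BER = 0.5 * erfc(Q / sqrt(2))
# for on-off keying with Gaussian noise.
from math import erfc, sqrt

def ber_from_q(q):
    """Bit error rate as a function of the receiver Q-factor."""
    return 0.5 * erfc(q / sqrt(2))

for q in (3, 6, 7):
    print(f"Q = {q}: BER ~ {ber_from_q(q):.2e}")
# Q = 6 corresponds to BER ~ 1e-9, a common reference point in such measurements.
```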
Abstract:
In this work, a wide-ranging analysis of local search multiuser detection (LS-MUD) for direct-sequence code-division multiple-access (DS/CDMA) systems under multipath channels is carried out considering the performance-complexity trade-off. The robustness of the LS-MUD to variations in loading, Eb/N0, the near-far effect, the number of fingers of the Rake receiver, and errors in the channel coefficient estimates is verified. A comparative analysis of the bit error rate (BER) versus complexity trade-off is carried out among LS, genetic algorithm (GA), and particle swarm optimization (PSO) detectors. Based on the deterministic behavior of the LS algorithm, simplifications of the cost function calculation are also proposed, yielding more efficient algorithms (simplified and combined LS-MUD versions) and creating new perspectives for MUD implementation. The computational complexity is expressed in terms of the number of operations required to converge. Our conclusion is that the simplified LS (s-LS) method is always more efficient, independent of the system conditions, achieving better performance with lower complexity than the other heuristic detectors. Together with this, the deterministic strategy and absence of input parameters make the s-LS algorithm the most appropriate for the MUD problem. (C) 2008 Elsevier GmbH. All rights reserved.
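To make the local-search idea concrete, here is a hedged sketch of a 1-opt LS detector for a synchronous DS/CDMA model: start from the conventional matched-filter decision and flip one user's bit whenever the maximum-likelihood metric improves. The spreading gain, loading, and noise level are illustrative choices, not the paper's simulation setup.

```python
# Sketch of local-search multiuser detection (1-opt bit flips) for synchronous CDMA.
import numpy as np

rng = np.random.default_rng(2)
K, N = 8, 31                                            # users, spreading gain
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)   # signature matrix
b_true = rng.choice([-1, 1], size=K)                    # transmitted bits
r = S @ b_true + 0.3 * rng.normal(size=N)               # received chip vector

R = S.T @ S                                             # correlation matrix
y = S.T @ r                                             # matched-filter outputs

def metric(b):
    """Log-likelihood metric to maximize: 2 y^T b - b^T R b."""
    return 2 * y @ b - b @ R @ b

b = np.sign(y).astype(int)                              # conventional detector as start
improved = True
while improved:                                         # 1-opt local search (LS-MUD)
    improved = False
    for k in range(K):
        current = metric(b)
        b[k] = -b[k]                                    # tentatively flip user k's bit
        if metric(b) > current:
            improved = True                             # keep the flip
        else:
            b[k] = -b[k]                                # revert the flip

print("bit errors after local search:", int(np.sum(b != b_true)))
```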
Abstract:
This paper analyzes the complexity-performance trade-off of several heuristic near-optimum multiuser detection (MuD) approaches applied to the uplink of synchronous single/multiple-input multiple-output multicarrier code-division multiple-access (S/MIMO MC-CDMA) systems. Genetic algorithm (GA), short-term tabu search (STTS) and reactive tabu search (RTS), simulated annealing (SA), particle swarm optimization (PSO), and 1-opt local search (1-LS) heuristic multiuser detection algorithms (Heur-MuDs) are analyzed in detail, using a single-objective antenna-diversity-aided optimization approach. Monte Carlo simulations show that, after convergence, the performances reached by all near-optimum Heur-MuDs are similar. However, the computational complexities may differ substantially, depending on the system operating conditions. Their complexities are carefully analyzed in order to obtain a general complexity-performance comparison framework and to show that unitary Hamming distance search MuD (uH-ds) approaches (1-LS, SA, RTS and STTS) reach the best convergence rates and that, among them, the 1-LS-MuD provides the best trade-off between implementation complexity and bit error rate (BER) performance.
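As a rough, illustrative tally (not the paper's complexity analysis), the loop below contrasts how the number of cost-function evaluations scales for a unitary-Hamming-distance search such as 1-LS versus exhaustive maximum-likelihood detection as the number of users K grows.

```python
# Illustrative scaling only: evaluations per 1-LS iteration grow linearly in K,
# whereas exhaustive ML detection must score all 2**K candidate bit vectors.
for K in (4, 8, 16, 32):
    ls_per_iteration = K        # one metric evaluation per single-bit flip
    exhaustive = 2 ** K         # all candidate bit vectors
    print(f"K = {K:2d}: 1-LS evals/iteration = {ls_per_iteration:3d}, exhaustive = {exhaustive}")
```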
Abstract:
The coexistence of different types of templates has been the solution of choice to the information crisis of prebiotic evolution, triggered by the finding that a single RNA-like template cannot carry enough information to code for any useful replicase. In principle, confining d distinct templates of length L in a package or protocell, whose survival depends on the coexistence of the templates it holds, could resolve this crisis provided that d is made sufficiently large. Here we review the prototypical package model of Niesert et al. [1981. Origin of life between Scylla and Charybdis. J. Mol. Evol. 17, 348-353], which guarantees the greatest possible region of viability of the protocell population, and show that this model, and hence the entire package approach, does not resolve the information crisis. In particular, we show that the total information stored in a viable protocell (Ld) tends to a constant value that depends only on the spontaneous error rate per nucleotide of the template replication mechanism. As a result, an increase of d must be followed by a decrease of L, so that the net information gain is null. (C) 2008 Elsevier Ltd. All rights reserved.
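A hedged, back-of-the-envelope way to see why the product Ld can depend only on the per-nucleotide error rate u is a generic error-threshold argument (an illustration, not the paper's derivation): require the probability of copying all Ld nucleotides without error to stay above a fixed viability threshold.

```latex
% Illustrative error-threshold bound, assuming per-nucleotide error rate u and
% a fixed viability threshold \theta (not the paper's exact derivation).
\begin{align*}
  Q &= (1-u)^{Ld} \approx e^{-u L d}
      && \text{probability of an error-free copy of all } Ld \text{ nucleotides,} \\
  Q \ge \theta &\;\Longrightarrow\; L d \le \frac{\ln(1/\theta)}{u}
      && \text{so the admissible total information } Ld \text{ is capped by } u \text{ alone.}
\end{align*}
```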
Abstract:
Objective: We carry out a systematic assessment of a suite of kernel-based learning machines for the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely, Gaussian and exponential radial basis functions) were considered, as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the extracted features. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy values (estimated via cross-validation) delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached a 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations, whereby one can visually inspect their levels of sensitivity to the type of feature and to the kernel function/parameter value. Conclusions: Overall, the results show that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of kernel function and parameter value, as well as the choice of feature extractor, are critical decisions, although the choice of wavelet family seems not to be so relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile emerged across all types of machines, involving regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.
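A minimal sketch of the kernel-parameter sensitivity analysis described above: sweep the radius (gamma) of a Gaussian RBF kernel for a standard SVM and record cross-validated accuracy at each value. The EEG features are replaced by synthetic data, and the 26-value sweep, wavelet feature extraction, and other kernel machines of the original study are not reproduced.

```python
# Sketch: cross-validated accuracy of an RBF-kernel SVM as the kernel parameter
# (gamma) is swept, yielding a "sensitivity profile" over synthetic features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 20)),
               rng.normal(0.8, 1.0, size=(50, 20))])   # stand-in for EEG feature vectors
y = np.array([0] * 50 + [1] * 50)                       # normal vs. epileptic labels

gammas = np.logspace(-4, 1, 11)                         # kernel radius sweep
profile = [cross_val_score(SVC(kernel="rbf", gamma=g), X, y, cv=5).mean()
           for g in gammas]

for g, acc in zip(gammas, profile):
    print(f"gamma = {g:8.4f}  cv accuracy = {acc:.2f}")
# Plateaus separated by sharp transitions in this profile are the kind of
# sensitivity behaviour the abstract refers to.
```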
Abstract:
In this paper we show the results of a simulation study comparing three classification techniques: Multinomial Logistic Regression (MLR), Non-Metric Discriminant Analysis (NDA), and Linear Discriminant Analysis (LDA). The measure used to compare the performance of the three techniques was the Error Classification Rate (ECR). We found that the MLR and LDA techniques have similar performance and that both are better than NDA when the population multivariate distribution is normal or logit-normal. For log-normal and sinh⁻¹-normal multivariate distributions, we found that MLR had the best performance.
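A hedged sketch of this kind of simulation comparison is given below: estimate the classification error rate of logistic regression and linear discriminant analysis on multivariate-normal data via cross-validation. Non-metric discriminant analysis is not available off the shelf in scikit-learn and is omitted; the sample sizes, means, and covariances are illustrative.

```python
# Sketch: estimated error rates of (binary) logistic regression and LDA on
# simulated multivariate-normal populations. With two classes, multinomial
# logistic regression reduces to ordinary logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n, p = 300, 5
X = np.vstack([rng.multivariate_normal(np.zeros(p), np.eye(p), n),
               rng.multivariate_normal(np.full(p, 1.0), np.eye(p), n)])
y = np.repeat([0, 1], n)

for name, model in [("MLR", LogisticRegression(max_iter=1000)),
                    ("LDA", LinearDiscriminantAnalysis())]:
    ecr = 1.0 - cross_val_score(model, X, y, cv=10).mean()
    print(f"{name}: estimated error classification rate = {ecr:.3f}")
# Under normal populations the two error rates should be close, mirroring the
# finding that MLR and LDA perform similarly in this setting.
```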
Abstract:
This paper presents a new statistical algorithm to estimate rainfall over the Amazon Basin region using the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI). The algorithm relies on empirical relationships derived for different raining-type systems between coincident measurements of surface rainfall rate and 85-GHz polarization-corrected brightness temperature as observed by the precipitation radar (PR) and TMI on board the TRMM satellite. The scheme includes rain/no-rain area delineation (screening) and system-type classification routines for rain retrieval. The algorithm is validated against independent measurements of the TRMM-PR and S-band dual-polarization Doppler radar (S-Pol) surface rainfall data for two different periods. Moreover, the performance of this rainfall estimation technique is evaluated against well-known methods, namely, the TRMM-2A12 [the Goddard profiling algorithm (GPROF)], the Goddard scattering algorithm (GSCAT), and the National Environmental Satellite, Data, and Information Service (NESDIS) algorithms. The proposed algorithm shows a normalized bias of approximately 23% for both PR and S-Pol ground truth datasets and a mean error of 0.244 mm h⁻¹ (PR) and -0.157 mm h⁻¹ (S-Pol). For rain volume estimates using PR as reference, a correlation coefficient of 0.939 and a normalized bias of 0.039 were found. With respect to rainfall distributions and rain area comparisons, the results showed that the proposed formulation is efficient and compatible with the physics and dynamics of the observed systems over the area of interest. The performance of the other algorithms showed that GSCAT presented low normalized bias for rain areas and rain volume [0.346 (PR) and 0.361 (S-Pol)], and GPROF showed a rainfall distribution similar to that of the PR and S-Pol but with a bimodal shape. Last, the five algorithms were evaluated during the TRMM-Large-Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) 1999 field campaign to verify the precipitation characteristics observed during the easterly and westerly Amazon wind flow regimes. The proposed algorithm presented a cumulative rainfall distribution similar to the observations during the easterly regime, but underestimated for the westerly period at rainfall rates above 5 mm h⁻¹. NESDIS-1 overestimated for both wind regimes but presented the best westerly representation. NESDIS-2, GSCAT, and GPROF underestimated in both regimes, but GPROF was closer to the observations during the easterly flow.
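The two-step retrieval structure the abstract describes (screening followed by an empirical, system-type-dependent relation between 85-GHz PCT and rain rate) can be sketched as below. The threshold and coefficients are illustrative assumptions only; the paper's actual screening rules and regression relations are not reproduced.

```python
# Hedged sketch: rain/no-rain screening on 85-GHz PCT, then an illustrative
# linear PCT-depression relation per system type. All values are synthetic.
import numpy as np

rng = np.random.default_rng(5)
pct85 = rng.uniform(180.0, 290.0, size=10)      # synthetic 85-GHz PCT values (K)

RAIN_THRESHOLD_K = 255.0                        # screening: colder PCT -> raining
slope_mm_per_K = {"convective": 0.45,           # hypothetical coefficients per
                  "stratiform": 0.15}           # classified system type

def retrieve(pct, system="convective"):
    """Rain rate (mm/h) from an illustrative relation after screening."""
    if pct >= RAIN_THRESHOLD_K:
        return 0.0                              # screened out: no rain
    return slope_mm_per_K[system] * (RAIN_THRESHOLD_K - pct)

for pct in pct85:
    print(f"PCT = {pct:6.1f} K -> R = {retrieve(pct):5.2f} mm/h")
```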
Abstract:
Despite modern weed control practices, weeds continue to be a threat to agricultural production. Considering the variability of weeds, a classification methodology for the risk of infestation in agricultural zones using fuzzy logic is proposed. The inputs for the classification are attributes extracted from maps of weed seed production and weed coverage estimated using kriging and map analysis, and from the percentage of the surface infested by grass weeds, in order to account for the presence of weed species with a high rate of development and proliferation. The output of the classification predicts the risk of infestation of regions of the field for the next crop. The risk classification methodology described in this paper integrates analysis techniques which may help to reduce costs and improve weed control practices. Results for the risk classification of the infestation in a maize crop field are presented. To illustrate the effectiveness of the proposed system, the risk of infestation over the entire field is checked against the yield loss map estimated by kriging and also against the average yield loss estimated from a hyperbolic model.
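To make the fuzzy classification idea concrete, here is a minimal sketch with triangular membership functions over two illustrative inputs (weed coverage and seed production) and two hand-written rules producing a crisp risk score. The membership limits, rules, and defuzzification are assumptions for illustration, not the rule base of the paper.

```python
# Hedged sketch of a fuzzy risk-of-infestation classifier with two inputs.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                 (c - x) / (c - b + 1e-9)), 0.0)

def risk_of_infestation(coverage_pct, seed_density):
    cov_high = tri(coverage_pct, 30, 70, 110)       # "high coverage" membership
    cov_low = tri(coverage_pct, -10, 0, 40)         # "low coverage" membership
    seed_high = tri(seed_density, 200, 600, 1000)   # "high seed production"

    # Rule 1: high coverage AND high seed production -> high risk (min as AND).
    # Rule 2: low coverage -> low risk.
    high_risk = min(cov_high, seed_high)
    low_risk = cov_low
    # Crisp risk score (0 = low, 1 = high) by weighted-average defuzzification.
    return (1.0 * high_risk + 0.0 * low_risk) / (high_risk + low_risk + 1e-9)

print(f"risk = {risk_of_infestation(coverage_pct=55, seed_density=480):.2f}")
```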
Abstract:
An accurate estimate of machining time is very important for predicting delivery times and manufacturing costs, and also helps production process planning. Most commercial CAM software systems estimate the machining time in milling operations simply by dividing the entire tool path length by the programmed feed rate. This time estimate differs drastically from the real process time because the feed rate is not always constant, due to machine and computer numerical control (CNC) limitations. This study presents a practical mechanistic method for milling time estimation when machining free-form geometries. The method considers a variable called machine response time (MRT), which characterizes the real CNC machine's capacity to move at high feed rates in free-form geometries. MRT is a global performance feature which can be obtained for any type of CNC machine configuration by carrying out a simple test. To validate the methodology, a workpiece was used to generate NC programs for five different types of CNC machines. A practical industrial case study was also carried out to validate the method. The results indicated that MRT, and consequently the real machining time, depends on the CNC machine's capabilities; furthermore, the greater the MRT, the larger the difference between predicted milling time and real milling time. The proposed method achieved an error ranging from 0.3% to 12% of the real machining time, whereas the CAM estimates had errors ranging from 211% to 1244%. The MRT-based procedure is also suggested as an instrument for helping in machine tool benchmarking.
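The contrast between the naive CAM estimate (path length divided by programmed feed rate) and an MRT-aware estimate can be sketched as below. The per-segment rule used here (each segment takes at least the machine response time) is an illustrative assumption, not the paper's mechanistic model; segment lengths, feed rate, and MRT are invented numbers.

```python
# Hedged sketch: naive CAM time estimate vs. an MRT-style per-segment correction.
segments_mm = [0.5] * 2000 + [5.0] * 200         # a free-form path: many short segments
feed_mm_per_min = 3000.0                         # programmed feed rate
mrt_s = 0.03                                     # assumed machine response time per segment (s)

path_length = sum(segments_mm)
cam_estimate_s = path_length / feed_mm_per_min * 60.0

# Each segment cannot be executed faster than the machine response time.
mrt_estimate_s = sum(max(length / feed_mm_per_min * 60.0, mrt_s)
                     for length in segments_mm)

print(f"CAM estimate: {cam_estimate_s:.1f} s, MRT-based estimate: {mrt_estimate_s:.1f} s")
# Short segments dominated by MRT are what make the real time far exceed the
# naive estimate, mirroring the large CAM errors reported above.
```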
Abstract:
Objective: To describe onset features, classification, and treatment of juvenile dermatomyositis (JDM) and juvenile polymyositis (JPM) from a multicentre registry. Methods: Inclusion criteria were onset age lower than 18 years and a diagnosis of any idiopathic inflammatory myopathy (IIM) by the attending physician. Bohan & Peter (1975) criteria categorisation was established by a scoring algorithm to define JDM and JPM based on clinical protocol data. Results: Of the 189 cases included, 178 were classified as JDM, 9 as JPM (19.8:1) and 2 did not fit the criteria; 6.9% had features of chronic arthritis and connective tissue disease overlap. Diagnosis classification agreement occurred in 66.1%. Median onset age was 7 years and median follow-up duration was 3.6 years. Malignancy was described in 2 (1.1%) cases. Muscle weakness occurred in 95.8%, heliotrope rash in 83.5%, and Gottron plaques in 83.1%; 92% had at least one abnormal muscle enzyme result. Muscle biopsy, performed in 74.6%, was abnormal in 91.5%, and electromyography, performed in 39.2%, was abnormal in 93.2%. Logistic regression analysis was done in 66 cases with all parameters assessed, and only aldolase was significant as an independent variable for definite JDM (OR=5.4, 95%CI 1.2-24.4, p=0.03). Regarding treatment, 97.9% received steroids; 72% additionally received at least one of: methotrexate (75.7%), hydroxychloroquine (64.7%), cyclosporine A (20.6%), IV immunoglobulin (20.6%), azathioprine (10.3%) or cyclophosphamide (9.6%). In this series, 24.3% developed calcinosis and the mortality rate was 4.2%. Conclusion: Evaluation of the predefined criteria set for a valid diagnosis indicated aldolase as the most important parameter associated with definite JDM, and the combination with methotrexate was the most indicated treatment.
Abstract:
The Biopharmaceutics Classification System (BCS) is a tool created to categorize drugs into different groups according to their solubility and permeability characteristics. Through a combination of these factors and physiological parameters, it is possible to understand the absorption behavior of a drug in the gastrointestinal tract, thus contributing to cost and time reductions in drug development, as well as reducing the exposure of human subjects during in vivo trials. Solubility is determined from equilibrium measurements under physiological pH conditions, while different methods may be employed for evaluating permeability. On the other hand, the intrinsic dissolution rate (IDR), which is defined as the rate of dissolution of a pure substance under constant temperature, pH, and surface area conditions, among others, may correlate better with the in vivo dissolution dynamics than the solubility test. The purpose of this work is to discuss the intrinsic dissolution test as a tool for determining the solubility of drugs within the scope of the Biopharmaceutics Classification System (BCS).
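Following the definition above, an IDR value is obtained from a constant-surface-area dissolution experiment as the slope of cumulative dissolved amount versus time divided by the exposed area. The short sketch below shows that calculation on synthetic data; the amounts, times, and disk area are illustrative, not experimental values from the paper.

```python
# Sketch: intrinsic dissolution rate (IDR) from a constant-surface-area run.
import numpy as np

time_min = np.array([5, 10, 15, 20, 30, 45, 60], dtype=float)
dissolved_mg = np.array([0.9, 1.8, 2.6, 3.5, 5.3, 7.9, 10.6])   # cumulative amount
surface_area_cm2 = 0.5                                           # constant exposed disk area

slope_mg_per_min, _ = np.polyfit(time_min, dissolved_mg, 1)      # dissolution rate
idr = slope_mg_per_min / surface_area_cm2                        # mg per (min * cm^2)
print(f"IDR ~ {idr:.3f} mg min^-1 cm^-2")
```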