923 resultados para Classification errors
Resumo:
In The Conduct of Inquiry in International Relations, Patrick Jackson situates methodologies in International Relations in relation to their underlying philosophical assumptions. One of his aims is to map International Relations debates in a way that ‘capture[s] current controversies’ (p. 40). This ambition is overstated: whilst Jackson’s typology is useful as a clarificatory tool, (re)classifying existing scholarship in International Relations is more problematic. One problem with Jackson’s approach is that he tends to run together the philosophical assumptions which decisively differentiate his methodologies (by stipulating a distinctive warrant for knowledge claims) and the explanatory strategies that are employed to generate such knowledge claims, suggesting that the latter are entailed by the former. In fact, the explanatory strategies which Jackson associates with each methodology reflect conventional practice in International Relations just as much as they reflect philosophical assumptions. This makes it more difficult to identify each methodology at work than Jackson implies. I illustrate this point through a critical analysis of Jackson’s controversial reclassification of Waltz as an analyticist, showing that whilst Jackson’s typology helps to expose inconsistencies in Waltz’s approach, it does not fully support the proposed reclassification. The conventional aspect of methodologies in International Relations also raises questions about the limits of Jackson’s ‘engaged pluralism’.
Resumo:
A novel two-stage construction algorithm for linear-in-the-parameters classifier is proposed, aiming at noisy two-class classification problems. The purpose of the first stage is to produce a prefiltered signal that is used as the desired output for the second stage to construct a sparse linear-in-the-parameters classifier. For the first stage learning of generating the prefiltered signal, a two-level algorithm is introduced to maximise the model's generalisation capability, in which an elastic net model identification algorithm using singular value decomposition is employed at the lower level while the two regularisation parameters are selected by maximising the Bayesian evidence using a particle swarm optimization algorithm. Analysis is provided to demonstrate how “Occam's razor” is embodied in this approach. The second stage of sparse classifier construction is based on an orthogonal forward regression with the D-optimality algorithm. Extensive experimental results demonstrate that the proposed approach is effective and yields competitive results for noisy data sets.
Resumo:
Introduction: Care home residents are at particular risk from medication errors, and our objective was to determine the prevalence and potential harm of prescribing, monitoring, dispensing and administration errors in UK care homes, and to identify their causes. Methods: A prospective study of a random sample of residents within a purposive sample of homes in three areas. Errors were identified by patient interview, note review, observation of practice and examination of dispensed items. Causes were understood by observation and from theoretically framed interviews with home staff, doctors and pharmacists. Potential harm from errors was assessed by expert judgement. Results: The 256 residents recruited in 55 homes were taking a mean of 8.0 medicines. One hundred and seventy-eight (69.5%) of residents had one or more errors. The mean number per resident was 1.9 errors. The mean potential harm from prescribing, monitoring, administration and dispensing errors was 2.6, 3.7, 2.1 and 2.0 (0 = no harm, 10 = death), respectively. Contributing factors from the 89 interviews included doctors who were not accessible, did not know the residents and lacked information in homes when prescribing; home staff’s high workload, lack of medicines training and drug round interruptions; lack of team work among home, practice and pharmacy; inefficient ordering systems; inaccurate medicine records and prevalence of verbal communication; and difficult to fill (and check) medication administration systems. Conclusions: That two thirds of residents were exposed to one or more medication errors is of concern. The will to improve exists, but there is a lack of overall responsibility. Action is required from all concerned.
Resumo:
Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach that is alternative to the rule induction approach using decision trees also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact noise tolerant set of classification rules. As with other classification rule generation methods, a principle problem arising with Prism is that of overfitting due to over-specialised rules. In addition, over-specialised rules increase the associated computational complexity. These problems can be solved by pruning methods. For the Prism method, two pruning algorithms have been introduced recently for reducing overfitting of classification rules - J-pruning and Jmax-pruning. Both algorithms are based on the J-measure, an information theoretic means for quantifying the theoretical information content of a rule. Jmax-pruning attempts to exploit the J-measure to its full potential because J-pruning does not actually achieve this and may even lead to underfitting. A series of experiments have proved that Jmax-pruning may outperform J-pruning in reducing overfitting. However, Jmax-pruning is computationally relatively expensive and may also lead to underfitting. This paper reviews the Prism method and the two existing pruning algorithms above. It also proposes a novel pruning algorithm called Jmid-pruning. The latter is based on the J-measure and it reduces overfitting to a similar level as the other two algorithms but is better in avoiding underfitting and unnecessary computational effort. The authors conduct an experimental study on the performance of the Jmid-pruning algorithm in terms of classification accuracy and computational efficiency. The algorithm is also evaluated comparatively with the J-pruning and Jmax-pruning algorithms.
Resumo:
Bayesian analysis is given of an instrumental variable model that allows for heteroscedasticity in both the structural equation and the instrument equation. Specifically, the approach for dealing with heteroscedastic errors in Geweke (1993) is extended to the Bayesian instrumental variable estimator outlined in Rossi et al. (2005). Heteroscedasticity is treated by modelling the variance for each error using a hierarchical prior that is Gamma distributed. The computation is carried out by using a Markov chain Monte Carlo sampling algorithm with an augmented draw for the heteroscedastic case. An example using real data illustrates the approach and shows that ignoring heteroscedasticity in the instrument equation when it exists may lead to biased estimates.
Resumo:
Using annual observations on industrial production over the last three centuries, and on GDP over a 100-year period, we seek an historical perspective on the forecastability of these UK output measures. The series are dominated by strong upward trends, so we consider various specifications of this, including the local linear trend structural time-series model, which allows the level and slope of the trend to vary. Our results are not unduly sensitive to how the trend in the series is modelled: the average sizes of the forecast errors of all models, and the wide span of prediction intervals, attests to a great deal of uncertainty in the economic environment. It appears that, from an historical perspective, the postwar period has been relatively more forecastable.
Resumo:
Question: What plant properties might define plant functional types (PFTs) for the analysis of global vegetation responses to climate change, and what aspects of the physical environment might be expected to predict the distributions of PFTs? Methods: We review principles to explain the distribution of key plant traits as a function of bioclimatic variables. We focus on those whole-plant and leaf traits that are commonly used to define biomes and PFTs in global maps and models. Results: Raunkiær's plant life forms (underlying most later classifications) describe different adaptive strategies for surviving low temperature or drought, while satisfying requirements for reproduction and growth. Simple conceptual models and published observations are used to quantify the adaptive significance of leaf size for temperature regulation, leaf consistency for maintaining transpiration under drought, and phenology for the optimization of annual carbon balance. A new compilation of experimental data supports the functional definition of tropical, warm-temperate, temperate and boreal phanerophytes based on mechanisms for withstanding low temperature extremes. Chilling requirements are less well quantified, but are a necessary adjunct to cold tolerance. Functional traits generally confer both advantages and restrictions; the existence of trade-offs contributes to the diversity of plants along bioclimatic gradients. Conclusions: Quantitative analysis of plant trait distributions against bioclimatic variables is becoming possible; this opens up new opportunities for PFT classification. A PFT classification based on bioclimatic responses will need to be enhanced by information on traits related to competition, successional dynamics and disturbance.
Resumo:
We present a Bayesian image classification scheme for discriminating cloud, clear and sea-ice observations at high latitudes to improve identification of areas of clear-sky over ice-free ocean for SST retrieval. We validate the image classification against a manually classified dataset using Advanced Along Track Scanning Radiometer (AATSR) data. A three way classification scheme using a near-infrared textural feature improves classifier accuracy by 9.9 % over the nadir only version of the cloud clearing used in the ATSR Reprocessing for Climate (ARC) project in high latitude regions. The three way classification gives similar numbers of cloud and ice scenes misclassified as clear but significantly more clear-sky cases are correctly identified (89.9 % compared with 65 % for ARC). We also demonstrate the poetential of a Bayesian image classifier including information from the 0.6 micron channel to be used in sea-ice extent and ice surface temperature retrieval with 77.7 % of ice scenes correctly identified and an overall classifier accuracy of 96 %.
Resumo:
Understanding the sources of systematic errors in climate models is challenging because of coupled feedbacks and errors compensation. The developing seamless approach proposes that the identification and the correction of short term climate model errors have the potential to improve the modeled climate on longer time scales. In previous studies, initialised atmospheric simulations of a few days have been used to compare fast physics processes (convection, cloud processes) among models. The present study explores how initialised seasonal to decadal hindcasts (re-forecasts) relate transient week-to-month errors of the ocean and atmospheric components to the coupled model long-term pervasive SST errors. A protocol is designed to attribute the SST biases to the source processes. It includes five steps: (1) identify and describe biases in a coupled stabilized simulation, (2) determine the time scale of the advent of the bias and its propagation, (3) find the geographical origin of the bias, (4) evaluate the degree of coupling in the development of the bias, (5) find the field responsible for the bias. This strategy has been implemented with a set of experiments based on the initial adjustment of initialised simulations and exploring various degrees of coupling. In particular, hindcasts give the time scale of biases advent, regionally restored experiments show the geographical origin and ocean-only simulations isolate the field responsible for the bias and evaluate the degree of coupling in the bias development. This strategy is applied to four prominent SST biases of the IPSLCM5A-LR coupled model in the tropical Pacific, that are largely shared by other coupled models, including the Southeast Pacific warm bias and the equatorial cold tongue bias. Using the proposed protocol, we demonstrate that the East Pacific warm bias appears in a few months and is caused by a lack of upwelling due to too weak meridional coastal winds off Peru. The cold equatorial bias, which surprisingly takes 30 years to develop, is the result of an equatorward advection of midlatitude cold SST errors. Despite large development efforts, the current generation of coupled models shows only little improvement. The strategy proposed in this study is a further step to move from the current random ad hoc approach, to a bias-targeted, priority setting, systematic model development approach.
Resumo:
Radar refractivity retrievals can capture near-surface humidity changes, but noisy phase changes of the ground clutter returns limit the accuracy for both klystron- and magnetron-based systems. Observations with a C-band (5.6 cm) magnetron weather radar indicate that the correction for phase changes introduced by local oscillator frequency changes leads to refractivity errors no larger than 0.25 N units: equivalent to a relative humidity change of only 0.25% at 20°C. Requested stable local oscillator (STALO) frequency changes were accurate to 0.002 ppm based on laboratory measurements. More serious are the random phase change errors introduced when targets are not at the range-gate center and there are changes in the transmitter frequency (ΔfTx) or the refractivity (ΔN). Observations at C band with a 2-μs pulse show an additional 66° of phase change noise for a ΔfTx of 190 kHz (34 ppm); this allows the effect due to ΔN to be predicted. Even at S band with klystron transmitters, significant phase change noise should occur when a large ΔN develops relative to the reference period [e.g., ~55° when ΔN = 60 for the Next Generation Weather Radar (NEXRAD) radars]. At shorter wavelengths (e.g., C and X band) and with magnetron transmitters in particular, refractivity retrievals relative to an earlier reference period are even more difficult, and operational retrievals may be restricted to changes over shorter (e.g., hourly) periods of time. Target location errors can be reduced by using a shorter pulse or identified by a new technique making alternate measurements at two closely spaced frequencies, which could even be achieved with a dual–pulse repetition frequency (PRF) operation of a magnetron transmitter.
Resumo:
We propose a new class of neurofuzzy construction algorithms with the aim of maximizing generalization capability specifically for imbalanced data classification problems based on leave-one-out (LOO) cross validation. The algorithms are in two stages, first an initial rule base is constructed based on estimating the Gaussian mixture model with analysis of variance decomposition from input data; the second stage carries out the joint weighted least squares parameter estimation and rule selection using orthogonal forward subspace selection (OFSS)procedure. We show how different LOO based rule selection criteria can be incorporated with OFSS, and advocate either maximizing the leave-one-out area under curve of the receiver operating characteristics, or maximizing the leave-one-out Fmeasure if the data sets exhibit imbalanced class distribution. Extensive comparative simulations illustrate the effectiveness of the proposed algorithms.
Resumo:
Scene classification based on latent Dirichlet allocation (LDA) is a more general modeling method known as a bag of visual words, in which the construction of a visual vocabulary is a crucial quantization process to ensure success of the classification. A framework is developed using the following new aspects: Gaussian mixture clustering for the quantization process, the use of an integrated visual vocabulary (IVV), which is built as the union of all centroids obtained from the separate quantization process of each class, and the usage of some features, including edge orientation histogram, CIELab color moments, and gray-level co-occurrence matrix (GLCM). The experiments are conducted on IKONOS images with six semantic classes (tree, grassland, residential, commercial/industrial, road, and water). The results show that the use of an IVV increases the overall accuracy (OA) by 11 to 12% and 6% when it is implemented on the selected and all features, respectively. The selected features of CIELab color moments and GLCM provide a better OA than the implementation over CIELab color moment or GLCM as individuals. The latter increases the OA by only ∼2 to 3%. Moreover, the results show that the OA of LDA outperforms the OA of C4.5 and naive Bayes tree by ∼20%. © 2014 Society of Photo-Optical Instrumentation Engineers (SPIE) [DOI: 10.1117/1.JRS.8.083690]
Resumo:
Automatic generation of classification rules has been an increasingly popular technique in commercial applications such as Big Data analytics, rule based expert systems and decision making systems. However, a principal problem that arises with most methods for generation of classification rules is the overfit-ting of training data. When Big Data is dealt with, this may result in the generation of a large number of complex rules. This may not only increase computational cost but also lower the accuracy in predicting further unseen instances. This has led to the necessity of developing pruning methods for the simplification of rules. In addition, classification rules are used further to make predictions after the completion of their generation. As efficiency is concerned, it is expected to find the first rule that fires as soon as possible by searching through a rule set. Thus a suit-able structure is required to represent the rule set effectively. In this chapter, the authors introduce a unified framework for construction of rule based classification systems consisting of three operations on Big Data: rule generation, rule simplification and rule representation. The authors also review some existing methods and techniques used for each of the three operations and highlight their limitations. They introduce some novel methods and techniques developed by them recently. These methods and techniques are also discussed in comparison to existing ones with respect to efficient processing of Big Data.
Resumo:
Recent studies showed that features extracted from brain MRIs can well discriminate Alzheimer’s disease from Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods for findings the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies, have been then used for solving a multi-class problem by the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the prediction power of features has not yet investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) the IntraCranial Volume (ICV) normalization can lead to overfitting and worst the accuracy prediction of test set and (ii) the combined use of a Random Forest-based filter with a Support Vector Machines-based wrapper, improves accuracy of binary classification.
Resumo:
The use of virtualization in high-performance computing (HPC) has been suggested as a means to provide tailored services and added functionality that many users expect from full-featured Linux cluster environments. The use of virtual machines in HPC can offer several benefits, but maintaining performance is a crucial factor. In some instances the performance criteria are placed above the isolation properties. This selective relaxation of isolation for performance is an important characteristic when considering resilience for HPC environments that employ virtualization. In this paper we consider some of the factors associated with balancing performance and isolation in configurations that employ virtual machines. In this context, we propose a classification of errors based on the concept of “error zones”, as well as a detailed analysis of the trade-offs between resilience and performance based on the level of isolation provided by virtualization solutions. Finally, a set of experiments are performed using different virtualization solutions to elucidate the discussion.