Abstract:
In this work we investigate knowledge acquisition performed by multiple interacting agents that infer, in the presence of observation errors, respective models of a complex system. We focus on the specific case in which, at each time step, each agent takes into account its current observation as well as the average of the models of its neighbors. The agents are connected by an interaction network of Erdős-Rényi or Barabási-Albert type. First, we investigate situations in which one of the agents has a different probability of observation error (higher or lower). It is shown that the influence of this special agent over the quality of the models inferred by the rest of the network can be substantial, varying linearly with the degree of the agent with different estimation error. When the degree of this agent is taken as a respective fitness parameter, the effect of the different estimation error is even more pronounced, becoming superlinear. To complement our analysis, we provide the analytical solution of the overall performance of the system. We also investigate the knowledge acquisition dynamics when the agents are grouped into communities. We verify that the inclusion of edges between agents within a community having a higher probability of observation error degrades the quality of the estimates obtained by the agents in the other communities.
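As a minimal sketch of this update rule (assumptions: a scalar parameter to be learned, equal weighting between own observation and neighbor average, and illustrative network and noise sizes; this is not the authors' code):

```python
# Agents on an Erdos-Renyi graph estimate a scalar parameter by mixing their
# own noisy observation with the average model of their network neighbors.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
true_value = 1.0                      # parameter every agent tries to learn
G = nx.erdos_renyi_graph(n=100, p=0.05, seed=0)
sigma = np.full(100, 0.5)             # per-agent observation-error level
sigma[0] = 2.0                        # one "special" agent with larger error

model = np.zeros(100)                 # current model held by each agent
for t in range(200):
    obs = true_value + sigma * rng.standard_normal(100)
    new = np.empty(100)
    for i in G.nodes:
        nbrs = list(G.neighbors(i))
        nbr_avg = model[nbrs].mean() if nbrs else model[i]
        new[i] = 0.5 * obs[i] + 0.5 * nbr_avg   # observation + neighbor average
    model = new

print("mean abs error of inferred models:", np.abs(model - true_value).mean())
```

Raising `sigma[0]` while rewiring node 0 to have a higher degree gives a quick feel for how a single high-error, high-degree agent degrades the network-wide estimate.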
Abstract:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum, over the space of trees, of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.
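The computational core is the reduction of the supremum over trees to a max-flow computation. The toy sketch below shows only that step, on an invented capacitated graph, using NetworkX's max-flow solver (Ford-Fulkerson family); the graph and capacities are placeholders, not the paper's construction.

```python
import networkx as nx

# Toy capacitated graph; in the paper the graph is derived from the
# context-tree test statistic, which is not reproduced here.
G = nx.DiGraph()
for u, v, c in [("s", "a", 4), ("s", "b", 2), ("a", "b", 1),
                ("a", "t", 2), ("b", "t", 3)]:
    G.add_edge(u, v, capacity=c)

flow_value, _ = nx.maximum_flow(G, "s", "t")  # polynomial-time max-flow
print("max flow:", flow_value)                # prints 5
```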
Abstract:
Background: There are several studies in the literature depicting measurement error in gene expression data and also several others about regulatory network models. However, only a small fraction combines measurement error with mathematical regulatory network models and shows how to identify these networks under different noise levels. Results: This article investigates the effects of measurement error on the estimation of the parameters in regulatory networks. Simulation studies indicate that, in both time series (dependent) and non-time series (independent) data, the measurement error strongly affects the estimated parameters of the regulatory network models, biasing them as predicted by the theory. Moreover, when testing the parameters of the regulatory network models, p-values computed by ignoring the measurement error are not reliable, since the false-positive rate is not controlled under the null hypothesis. In order to overcome these problems, we present an improved version of the Ordinary Least Squares estimator for independent (regression models) and dependent (autoregressive models) data when the variables are subject to noise. Moreover, measurement error estimation procedures for microarrays are also described. Simulation results also show that both corrected methods perform better than the standard ones (i.e., those ignoring measurement error). The proposed methodologies are illustrated using microarray data from lung cancer patients and mouse liver time series data. Conclusions: Measurement error severely affects the identification of regulatory network models; thus, it must be reduced or taken into account in order to avoid erroneous conclusions. This could be one of the reasons for the high biological false-positive rates identified in actual regulatory network models.
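As a sketch of the correction idea (not the authors' exact estimator), the classical method-of-moments fix for additive measurement error in a regressor replaces the OLS denominator x'x by x'x - n*sigma_u^2, where sigma_u^2 is the measurement-noise variance, assumed known or estimated separately:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta_true, sigma_u2 = 2000, 2.0, 0.5
x_true = rng.standard_normal(n)                       # latent expression level
x_obs = x_true + np.sqrt(sigma_u2) * rng.standard_normal(n)  # noisy measurement
y = beta_true * x_true + 0.1 * rng.standard_normal(n)

beta_naive = (x_obs @ y) / (x_obs @ x_obs)                # attenuated toward 0
beta_corr = (x_obs @ y) / (x_obs @ x_obs - n * sigma_u2)  # bias-corrected OLS
print(f"naive: {beta_naive:.2f}, corrected: {beta_corr:.2f}")  # ~1.33 vs ~2.0
```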
Abstract:
The problem of semialgebraic Lipschitz classification of quasihomogeneous polynomials on a Hölder triangle is studied. For this problem, the "moduli" are completely described in certain combinatorial terms.
Abstract:
Quality control of toys to avoid children's exposure to potentially toxic elements is of utmost relevance and is a common requirement in national and/or international norms for health and safety reasons. Laser-induced breakdown spectroscopy (LIBS) was recently evaluated at the authors' laboratory for direct analysis of plastic toys, and one of the main difficulties for the determination of Cd, Cr and Pb was the variety of mixtures and types of polymers. As most norms rely on migration (lixiviation) protocols, chemometric classification models built from LIBS spectra were tested for screening toys that present a potential risk of Cd, Cr and Pb contamination. The classification models were generated from the emission spectra of 51 polymeric toys using Partial Least Squares Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogy (SIMCA) and K-Nearest Neighbors (KNN). The models were built with 40 samples and validated with 11 test samples. Best results were obtained with KNN, with correct predictions varying from 95% for Cd to 100% for Cr and Pb.
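A hedged sketch of the KNN step only, with synthetic spectra standing in for the real LIBS measurements (the sample counts mirror the abstract; everything else is invented):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X = rng.standard_normal((51, 300))   # 51 toys x 300 emission-spectrum channels
y = rng.integers(0, 2, 51)           # 1 = potential Cd/Cr/Pb contamination risk

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=11, random_state=0)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)  # 40 training samples
print("test accuracy:", knn.score(X_te, y_te))             # 11 test samples
```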
Abstract:
Objective: We carry out a systematic assessment of a suite of kernel-based learning machines for the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely, Gaussian and exponential radial basis functions) were considered, as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy values (estimated via cross-validation) delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs, respectively, reached a 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations, whereby one can visually inspect their levels of sensitivity to the type of feature and to the kernel function/parameter value. Conclusions: Overall, the results show that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value, as well as the choice of the feature extractor, are critical decisions to be taken, although the choice of the wavelet family seems not to be so relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile emerged among all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality).
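A minimal sketch of the kind of sweep reported here: a standard SVM with a Gaussian (RBF) kernel, cross-validated over a grid of kernel radii. The features are synthetic placeholders for the wavelet/Lyapunov features extracted from the EEG segments.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 20))   # 200 EEG segments x 20 extracted features
y = rng.integers(0, 2, 200)          # 0 = normal subject, 1 = epileptic patient

for gamma in [0.01, 0.1, 1.0, 10.0]:          # kernel parameter (radius) sweep
    acc = cross_val_score(SVC(kernel="rbf", gamma=gamma), X, y, cv=5).mean()
    print(f"gamma={gamma}: mean CV accuracy = {acc:.3f}")
```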
Abstract:
Traditionally, chronotype classification is based on the Morningness-Eveningness Questionnaire (MEQ). It is implicit in the classification that intermediate individuals give intermediate answers to most of the MEQ questions. However, a small group of individuals shows a different pattern of answers: in some questions they answer as "morning-types" and in some others as "evening-types," resulting in an intermediate total score. "Evening-type" and "morning-type" answers were coded as A1 and A4, respectively; intermediate answers were coded as A2 and A3. The following index was then computed: Bimodality Index = (ΣA1 × ΣA4)² - (ΣA2 × ΣA3)². Neither-types with positive bimodality scores were classified as bimodal. If our hypothesis is validated by objective data, an update of chronotype classification will be required.
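A direct transcription of the index, reading each ΣAk as the count of answers of type Ak (one plausible interpretation of the notation above):

```python
def bimodality_index(answers):
    """answers: one code per MEQ question, 1 = evening-type answer,
    4 = morning-type answer, 2 and 3 = intermediate answers."""
    counts = {k: sum(1 for a in answers if a == k) for k in (1, 2, 3, 4)}
    return (counts[1] * counts[4]) ** 2 - (counts[2] * counts[3]) ** 2

# A neither-type with extreme answers on both ends scores positive (bimodal):
print(bimodality_index([1, 1, 4, 4, 2, 3]))   # (2*2)^2 - (1*1)^2 = 15
```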
Abstract:
This paper proposes a three-stage offline approach to detect, identify, and correct series and shunt branch parameter errors. In Stage 1, the branches suspected of having parameter errors are identified through an Identification Index (II). The II of a branch is the ratio between the number of measurements adjacent to that branch whose normalized residuals are higher than a specified threshold value and the total number of measurements adjacent to that branch. Using several measurement snapshots, in Stage 2 the suspicious parameters are estimated, in a simultaneous multiple-state-and-parameter estimation, via an augmented state and parameter estimator that enlarges the V-theta state vector to include the suspicious parameters. Stage 3 validates the estimates obtained in Stage 2 and is performed via a conventional weighted least squares estimator. Several simulation results (with IEEE bus systems) demonstrate the reliability of the proposed approach in dealing with single and multiple parameter errors in adjacent and non-adjacent branches, as well as in parallel transmission lines with series compensation. Finally, the proposed approach is confirmed by tests performed on the Hydro-Quebec TransEnergie network.
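A hedged sketch of the Stage 1 index: for each branch, the II is the fraction of adjacent measurements whose normalized residuals exceed the threshold. The residuals and the measurement-to-branch adjacency below are invented toy data.

```python
import numpy as np

threshold = 3.0                                        # normalized-residual cutoff
residuals = np.array([0.5, 4.2, 3.5, 1.0, 0.2, 5.1])   # |r_N| per measurement
adjacent = {"branch_1-2": [0, 1, 2],                   # branch -> adjacent
            "branch_2-3": [3, 4, 5]}                   # measurement indices

for branch, meas in adjacent.items():
    ii = np.mean(residuals[meas] > threshold)          # flagged fraction = II
    print(branch, "II =", ii)                          # high II -> suspect branch
```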
Abstract:
Oropharyngeal dysphagia is characterized by any alteration in swallowing dynamics, which may lead to malnutrition and aspiration pneumonia. Early diagnosis is crucial for the prognosis of patients with dysphagia, and the best method for assessing swallowing dynamics is swallowing videofluoroscopy, an exam performed with X-rays. Because it exposes patients to radiation, videofluoroscopy should not be performed frequently nor be prolonged. This study presents a non-invasive method for the pre-diagnosis of dysphagia based on the analysis of swallowing acoustics, in which the discrete wavelet transform plays an important role in increasing sensitivity and specificity in the identification of dysphagic patients.
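A sketch of the wavelet step under assumed data: a discrete wavelet decomposition of a (synthetic) swallowing sound with PyWavelets, with sub-band energies as candidate features. The sampling rate, wavelet, and decomposition depth are illustrative choices.

```python
import numpy as np
import pywt

fs = 8000                                        # assumed sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 150 * t) + 0.3 * np.random.randn(t.size)

coeffs = pywt.wavedec(signal, "db4", level=4)    # approximation + 4 detail bands
features = [float(np.sum(c ** 2)) for c in coeffs]  # sub-band energies
print(features)
```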
Abstract:
Despite modern weed control practices, weeds continue to be a threat to agricultural production. Considering the variability of weeds, a fuzzy-logic methodology is proposed for classifying the risk of infestation in agricultural zones. The inputs for the classification are attributes extracted from maps of weed seed production and weed coverage estimated by kriging and map analysis, and from the percentage of surface infested by grass weeds, in order to account for the presence of weed species with a high rate of development and proliferation. The output of the classification predicts the risk of infestation of regions of the field for the next crop. The risk classification methodology described in this paper integrates analysis techniques that may help to reduce costs and improve weed control practices. Results for the risk classification of the infestation in a maize crop field are presented. To illustrate the effectiveness of the proposed system, the risk of infestation over the entire field is checked against the yield loss map estimated by kriging and against the average yield loss estimated from a hyperbolic model.
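A toy sketch of the fuzzy classification idea (not the paper's rule base): one input attribute is mapped to low/medium/high memberships with triangular functions, and the memberships are combined into a single risk score.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

coverage = 42.0                        # % of surface infested (example value)
mu_low = tri(coverage, -1, 0, 30)
mu_med = tri(coverage, 20, 50, 80)
mu_high = tri(coverage, 60, 100, 101)

# Centroid-like defuzzification into one risk level in [0, 1]:
risk = (0.1 * mu_low + 0.5 * mu_med + 0.9 * mu_high) / (mu_low + mu_med + mu_high)
print("risk of infestation:", risk)
```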
Abstract:
This work presents an automated system for the measurement of form errors of mechanical components using an industrial robot. A three-probe error separation technique was employed to decouple the measured form error from errors introduced by the robotic system. A mathematical model of the measuring system was developed to provide inspection results by solving a system of linear equations. A new self-calibration procedure, which employs redundant data from several runs, minimizes the influence of the probes' zero-adjustment on the final result. Experimental tests applied to the measurement of straightness errors of mechanical components were carried out and demonstrated the effectiveness of the methodology.
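Only the linear-algebra step is sketched here: with redundant readings from several runs, the form-error coefficients and the probe zero-offsets can be estimated jointly by least squares. The design matrix below is a random placeholder for the paper's actual three-probe model.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 8))      # 30 redundant readings, 8 unknowns
x_true = rng.standard_normal(8)       # form-error coefficients + zero offsets
b = A @ x_true + 0.01 * rng.standard_normal(30)   # readings with probe noise

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)     # least-squares solution
print("max estimation error:", np.abs(x_hat - x_true).max())
```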
Abstract:
The properties of recycled aggregate produced from mixed (masonry and concrete) construction and demolition (C&D) waste are highly variable, and this restricts the use of such aggregate in structural concrete production. The development of classification techniques capable of reducing this variability is instrumental for quality control purposes and for the production of high-quality C&D aggregate. This paper investigates how the classification of C&D mixed coarse aggregate according to porosity influences the mechanical performance of concrete. Concretes using a variety of C&D aggregate porosity classes and different water/cement ratios were produced and their mechanical properties measured. For concretes produced with constant volume fractions of water, cement, natural sand and coarse aggregate from recycled mixed C&D waste, the compressive strength and Young's modulus are direct exponential functions of the aggregate porosity. The sink-and-float technique is a simple laboratory density-separation tool that facilitates the separation of cement particles with lower porosity, a difficult task when done only by visual sorting. For this experiment, separation using a 2.2 kg/dm³ suspension produced recycled aggregate (porosity less than 17%) which yielded good performance in concrete production. Industrial gravity separators may lead to the production of high-quality recycled aggregate from mixed C&D waste for structural concrete applications.
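A sketch, with invented data points, of fitting the reported direct exponential relation between compressive strength and aggregate porosity, f(p) = a * exp(b * p):

```python
import numpy as np
from scipy.optimize import curve_fit

porosity = np.array([8, 10, 12, 14, 16, 18.0])   # aggregate porosity (%)
strength = np.array([42, 38, 34, 30, 27, 24.0])  # compressive strength (MPa)

def model(p, a, b):
    return a * np.exp(b * p)                     # direct exponential relation

(a, b), _ = curve_fit(model, porosity, strength, p0=(50.0, -0.05))
print(f"strength ~ {a:.1f} * exp({b:.3f} * porosity)")
```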
Abstract:
The procedure for online process control by attributes consists of inspecting a single item at every m-th produced item. On the basis of the inspection result, it is decided whether the process is in control (the conforming fraction is stable) or out of control (the conforming fraction has decreased, for example). Most articles about online process control consider stopping the production process for an adjustment when the inspected item is non-conforming (production then restarts in control; this is here termed a corrective adjustment). Moreover, the articles on this subject present only semi-economical designs (which may yield large quantities of non-conforming items), as they do not include a policy of preventive adjustments (in which case no item is inspected), a policy that can be more economical, mainly when the inspected item can be misclassified. In this article, the possibility of a preventive or corrective adjustment of the process is decided at every m-th produced item. If a preventive adjustment is decided upon, no item is inspected; otherwise, the m-th item is inspected, and if it conforms, production goes on, while if it does not, an adjustment takes place and the process restarts in control. This approach is economically feasible for some practical situations, and the parameters of the proposed procedure are determined by minimizing an average cost function subject to statistical restrictions (for example, to ensure a minimal level, fixed in advance, of conforming items in the production process). Numerical examples illustrate the proposal.
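A rough Monte Carlo sketch of the policy (the shift model, rates, and the preventive-adjustment schedule are all invented; the paper instead optimizes an average cost function):

```python
import random

random.seed(0)
m = 10                    # inspection interval (items)
p_in, p_out = 0.99, 0.90  # conforming fractions in/out of control
p_shift = 0.02            # per-item chance of shifting out of control
preventive_every = 5      # preventive adjustment every 5th cycle (no inspection)

in_control, nonconforming = True, 0
for cycle in range(10_000):
    last_conforming = True
    for _ in range(m):                        # produce m items
        if in_control and random.random() < p_shift:
            in_control = False
        p = p_in if in_control else p_out
        last_conforming = random.random() < p
        nonconforming += not last_conforming
    if cycle % preventive_every == preventive_every - 1:
        in_control = True                     # preventive adjustment
    elif not last_conforming:                 # m-th item inspected: non-conforming
        in_control = True                     # corrective adjustment
print("non-conforming fraction:", nonconforming / (10_000 * m))
```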
Abstract:
Higher order (2,4) FDTD schemes used for numerical solutions of Maxwell's equations focus on diminishing the truncation errors caused by the Taylor series expansion of the spatial derivatives. These schemes use a larger computational stencil, which generally makes use of two constant coefficients, C1 and C2, for the four-point central-difference operators. In this paper we propose a novel way to diminish these truncation errors in order to obtain more accurate numerical solutions of Maxwell's equations. For this purpose, we present a method to individually optimize the pair of coefficients, C1 and C2, based on any desired grid resolution and time-step size. In particular, we are interested in using coarser grid discretizations to be able to simulate electrically large domains. The results of our optimization algorithm show a significant reduction in dispersion error and numerical anisotropy for all modeled grid resolutions. Numerical simulations of free-space propagation verify the very promising theoretical results. The model is also shown to perform well in more complex, realistic scenarios.
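For reference, the standard (2,4) operator this paper re-tunes is the four-point staggered difference f'(x) ≈ [C1(f(x+h/2) - f(x-h/2)) + C2(f(x+3h/2) - f(x-3h/2))]/h, with Taylor-optimal C1 = 9/8 and C2 = -1/24. The sketch below checks its fourth-order convergence; the paper's contribution is optimizing C1 and C2 per grid resolution and time step instead.

```python
import numpy as np

C1, C2 = 9 / 8, -1 / 24          # standard Taylor-derived (2,4) coefficients

def d_dx(f, x, h):
    """Four-point staggered central difference of f at x with spacing h."""
    return (C1 * (f(x + h / 2) - f(x - h / 2))
            + C2 * (f(x + 3 * h / 2) - f(x - 3 * h / 2))) / h

x0 = 0.3
for h in [0.1, 0.05, 0.025]:
    err = abs(d_dx(np.sin, x0, h) - np.cos(x0))
    print(f"h={h}: error = {err:.2e}")   # error shrinks roughly as h**4
```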
Abstract:
Objective: To describe onset features, classification and treatment of juvenile dermatomyositis (JDM) and juvenile polymyositis (JPM) from a multicentre registry. Methods: Inclusion criteria were onset age lower than 18 years and a diagnosis of any idiopathic inflammatory myopathy (IIM) by the attending physician. Categorisation by the Bohan & Peter (1975) criteria was established through a scoring algorithm to define JDM and JPM based on clinical protocol data. Results: Of the 189 cases included, 178 were classified as JDM, 9 as JPM (19.8:1) and 2 did not fit the criteria; 6.9% had features of chronic arthritis and connective tissue disease overlap. Diagnosis classification agreement occurred in 66.1%. Median onset age was 7 years, and median follow-up duration was 3.6 years. Malignancy was described in 2 (1.1%) cases. Muscle weakness occurred in 95.8%; heliotrope rash in 83.5%; Gottron plaques in 83.1%; 92% had at least one abnormal muscle enzyme result. Muscle biopsy, performed in 74.6%, was abnormal in 91.5%, and electromyography, performed in 39.2%, was abnormal in 93.2%. Logistic regression analysis was done in 66 cases with all parameters assessed, and only aldolase was significant as an independent variable for definite JDM (OR=5.4, 95% CI 1.2-24.4, p=0.03). Regarding treatment, 97.9% received steroids; 72% additionally received at least one of: methotrexate (75.7%), hydroxychloroquine (64.7%), cyclosporine A (20.6%), IV immunoglobulin (20.6%), azathioprine (10.3%) or cyclophosphamide (9.6%). In this series, 24.3% developed calcinosis and the mortality rate was 4.2%. Conclusion: Evaluation of the predefined criteria set for a valid diagnosis indicated aldolase as the most important parameter associated with definite JDM; steroids, in combination with methotrexate, were the most frequently indicated treatment.