20 results for Truncated robust multivariate outlier detection
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
This work covers two aspects. First, it compares and summarizes the similarities and differences of state-of-the-art feature detectors and descriptors; second, it presents a novel approach to detecting intestinal content (in particular bubbles) in capsule endoscopy images. Feature detectors and descriptors that provide invariance to changes of perspective, scale, signal-to-noise ratio and lighting conditions are important topics in current research, and the number of possible applications seems countless. After analysing a selection of approaches presented in the literature, this work investigates their suitability for information extraction in capsule endoscopy images. Finally, a well-performing detector of intestinal content in capsule endoscopy images is presented. Accurate detection of intestinal content is crucial for all kinds of machine learning approaches and other analyses of capsule endoscopy studies, because intestinal content occludes the field of view of the capsule camera and the affected frames therefore need to be excluded from analysis. As a by-product of this investigation, a Feature Analysis Tool with a graphical user interface is presented, which executes and compares the discussed feature detectors and descriptors on arbitrary images with configurable parameters and visualizes their output. The presented bubble classifier is also part of this tool, and if a ground truth is available (it can also be generated using this tool), a detailed visualization of the validation result is produced.
Abstract:
The practical performance of analytical redundancy for fault detection and diagnosis is often decreased by uncertainties prevailing not only in the system model, but also in the measurements. In this paper, the problem of fault detection is stated as a constraint satisfaction problem over continuous domains with a large number of variables and constraints. This problem can be solved using modal interval analysis and consistency techniques. Consistency techniques are then shown to be particularly efficient for checking the consistency of the analytical redundancy relations (ARRs) when dealing with uncertain measurements and parameters. The work presented in this paper shows that consistency techniques can be used to increase the performance of a robust fault detection tool based on interval arithmetic. The proposed method is illustrated using a nonlinear dynamic model of a hydraulic system.
Abstract:
We introduce a simple new hypothesis testing procedure which, based on an independent sample drawn from a certain density, detects which of $k$ nominal densities the true density is closest to under the total variation ($L_1$) distance. We obtain a density-free uniform exponential bound for the probability of false detection.
Abstract:
In this paper we propose an endpoint detection system based on several features extracted from each speech frame, followed by a robust classifier (i.e., AdaBoost and bagging of decision trees, and a multilayer perceptron) and a finite state automaton (FSA). The FSA module consists of a 4-state decision logic that filters false alarms and false positives. We present and compare results for four different classifiers in this task. The look-ahead of the proposed method is 7 frames, the number of frames that maximized the accuracy of the system. The system was tested with real signals recorded inside a car, with signal-to-noise ratios ranging from 6 dB to 30 dB. Finally, we present experimental results demonstrating that the system yields robust endpoint detection.
Abstract:
Cognitive radio is a wireless technology aimed at improving the efficient use of the radio-electric spectrum, thus facilitating a reduction in the load on the free frequency bands. Cognitive radio networks can scan the spectrum and adapt their parameters to operate in the unoccupied bands. To avoid interfering with licensed users operating on a given channel, the networks need to be highly sensitive, which is achieved by using cooperative sensing methods. Current cooperative sensing methods are not robust enough against occasional or continuous attacks. This article outlines a Group Fusion method that takes into account the behavior of users over the short and long term. When fusing the data, the method gives more weight to user groups that are more unanimous in their decisions. Simulations have been performed in a dynamic environment with interference. Results show that when attackers are present (both reiterative and sporadic), the proposed Group Fusion method has superior sensing capability compared with other methods.
Abstract:
One of the techniques used to detect faults in dynamic systems is analytical redundancy. An important difficulty in applying this technique to real systems is dealing with the uncertainties associated with the system itself and with the measurements. In this paper, this uncertainty is taken into account by using intervals for the parameters of the model and for the measurements. The proposed method checks the consistency between the system's behavior, obtained from the measurements, and the model's behavior; if they are inconsistent, then there is a fault. The problem of detecting faults is stated as a quantified real constraint satisfaction problem, which can be solved using modal interval analysis (MIA). MIA is used because it provides powerful tools to extend calculations over real functions to intervals. To improve the fault detection results, the simultaneous use of several sliding time windows is proposed. The result of implementing this method is semiqualitative tracking (SQualTrack), a fault-detection tool that is robust in the sense that it does not generate false alarms, i.e., if there are false alarms, they indicate either that the interval model does not represent the system adequately or that the interval measurements do not represent the true values of the variables adequately. SQualTrack is currently being used to detect faults in real processes. Some of these applications using real data have been developed within the European project Advanced Decision Support System for Chemical/Petrochemical Manufacturing Processes and are also described in this paper.
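The consistency check described in this abstract can be illustrated with a minimal sketch. It uses classical interval arithmetic rather than modal interval analysis, and the static model y = gain * u, the `Interval` class and all numeric values are hypothetical, chosen only for illustration; this is not the SQualTrack implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with classical (not modal) arithmetic."""
    lo: float
    hi: float

    def __mul__(self, other: "Interval") -> "Interval":
        # The product interval is bounded by the extreme endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def intersects(self, other: "Interval") -> bool:
        return self.lo <= other.hi and other.lo <= self.hi


def is_consistent(gain: Interval, u: Interval, y_measured: Interval) -> bool:
    """Hypothetical static model y = gain * u with interval quantities.

    The measurement is consistent with the model when the predicted
    interval and the measured interval overlap; otherwise a fault is flagged.
    """
    y_predicted = gain * u
    return y_predicted.intersects(y_measured)


gain = Interval(1.9, 2.1)    # uncertain model parameter (assumed values)
u = Interval(0.95, 1.05)     # uncertain input measurement (assumed values)

ok = is_consistent(gain, u, Interval(1.9, 2.0))          # overlaps prediction
fault = not is_consistent(gain, u, Interval(2.5, 2.6))   # no overlap
```

Because a fault is flagged only when the two intervals are disjoint, the check cannot raise an alarm as long as the intervals really do enclose the true values, which is the sense of robustness the abstract describes.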
Abstract:
Factor analysis, a frequently used technique for multivariate data inspection, is also widely used for compositional data analysis. The usual way is to use a centered logratio (clr) transformation to obtain the random vector $y$ of dimension $D$. The factor model is then

$y = \Lambda f + e$  (1)

with the factors $f$ of dimension $k < D$, the error term $e$, and the loadings matrix $\Lambda$. Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as

$\mathrm{Cov}(y) = \Lambda \Lambda^T + \psi$  (2)

where $\psi = \mathrm{Cov}(e)$ has a diagonal form. The diagonal elements of $\psi$ as well as the loadings matrix $\Lambda$ are estimated from an estimate of $\mathrm{Cov}(y)$.

Given observed clr transformed data $Y$ as realizations of the random vector $y$, outliers or deviations from the idealized model assumptions of factor analysis can severely affect the parameter estimation. As a way out, robust estimation of the covariance matrix of $Y$ will lead to robust estimates of $\Lambda$ and $\psi$ in (2), see Pison et al. (2003). Well-known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g., Maronna et al., 2006), rely on a full-rank data matrix $Y$, which is not the case for clr transformed data (see, e.g., Aitchison, 1986).

The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix $Y$ is transformed to a matrix $Z$ by using an orthonormal basis of lower dimension. Using the ilr transformed data, a robust covariance matrix $C(Z)$ can be estimated. The result can be back-transformed to the clr space by

$C(Y) = V C(Z) V^T$

where the matrix $V$ with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in the model (2) can be estimated (Basilevsky, 1994), and the results have a direct interpretation since the links to the original variables are still preserved.

The above procedure will be applied to data from geochemistry. Our special interest is in comparing the results with those of Reimann et al. (2002) for the Kola project data.
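The clr/ilr round trip in this abstract can be sketched in a few lines of NumPy. The sketch below is an illustration under stated assumptions: the Helmert-type basis `V`, the simulated Dirichlet data and the function names are hypothetical choices, and the classical sample covariance `np.cov` stands in for the robust estimator (such as the MCD) that the abstract calls for.

```python
import numpy as np


def clr(X: np.ndarray) -> np.ndarray:
    """Centered logratio transform of compositions stored row-wise."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)


def ilr_basis(D: int) -> np.ndarray:
    """D x (D-1) matrix with orthonormal columns, each summing to zero.

    One possible (Helmert-type) choice; any orthonormal basis of the
    clr subspace would serve.
    """
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V


rng = np.random.default_rng(0)
X = rng.dirichlet([2.0, 3.0, 4.0, 5.0], size=50)  # simulated 4-part compositions

Y = clr(X)                 # clr data: rows sum to zero, so Cov(Y) is singular
V = ilr_basis(X.shape[1])
Z = Y @ V                  # ilr coordinates: full rank, dimension D - 1

# Assumption: np.cov is a placeholder; a robust estimator would go here.
C_Z = np.cov(Z, rowvar=False)
C_Y = V @ C_Z @ V.T        # back-transformation to clr space
```

With the classical covariance as the stand-in, `C_Y` coincides with the covariance computed directly on the clr data, which is a convenient sanity check before swapping in a robust estimator on `Z`.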
Abstract:
The R-package “compositions” is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, containing now some facilities to work with and represent ilr bases built from balances, and an elaborated subsystem for dealing with several kinds of irregular data: (rounded or structural) zeroes, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a “regular” subcomposition (where all parts are actually observed and the datum behaves typically) and a “problematic” subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros), focusing on the nature of irregularities in the datum subcomposition(s).

To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcomposition where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach.

To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occurrence and Monte-Carlo-based tests for their characterization. Furthermore, the package provides special plots helping to understand the nature of outliers in the dataset.

Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros
Abstract:
Three multivariate statistical tools (principal component analysis, factor analysis and discriminant analysis) have been tested to characterize and model the sags registered in distribution substations. The models use several features to represent the magnitude, duration and degree of unbalance of sags. These features have been obtained from voltage and current waveforms. The techniques are tested and compared using 69 sag records. The advantages and drawbacks of each technique are listed.
Abstract:
Uncertainties that are not considered in the analytical model of the plant always dramatically decrease the performance of the fault detection task in practice. To cope better with this prevalent problem, in this paper we develop a methodology using Modal Interval Analysis which takes those uncertainties into account in the plant model. Based on this model, a fault detection method is developed which is quite robust to uncertainty and results in no false alarms. As soon as a fault is detected, an ANFIS model is trained online to capture the major behavior of the fault that has occurred, which can be used for fault accommodation. The simulation results demonstrate the capability of the proposed method to accomplish both tasks appropriately.
Abstract:
Structural equation models are widely used in economic, social and behavioral studies to analyze linear interrelationships among variables, some of which may be unobservable or subject to measurement error. Alternative estimation methods that exploit different distributional assumptions are now available. The present paper deals with issues of asymptotic statistical inference, such as the evaluation of standard errors of estimates and chi-square goodness-of-fit statistics, in the general context of mean and covariance structures. The emphasis is on drawing correct statistical inferences regardless of the distribution of the data and the method of estimation employed. A (distribution-free) consistent estimate of $\Gamma$, the matrix of asymptotic variances of the vector of sample second-order moments, will be used to compute robust standard errors and a robust chi-square goodness-of-fit statistic. Simple modifications of the usual estimate of $\Gamma$ will also permit correct inferences in the case of multi-stage complex samples. We will also discuss the conditions under which, regardless of the distribution of the data, one can rely on the usual (non-robust) inferential statistics. Finally, a multivariate regression model with errors-in-variables will be used to illustrate, by means of simulated data, various theoretical aspects of the paper.
Abstract:
Cognitive radio networks sense spectrum occupancy and manage themselves to operate in unused bands without disturbing licensed users. The detection capability of a radio system can be enhanced if the sensing process is performed jointly by a group of nodes so that the effects of wireless fading and shadowing can be minimized. However, taking a collaborative approach poses new security threats to the system, as nodes can report false sensing data to force a wrong decision. This paper reviews secure cooperative spectrum sensing in cognitive radio networks. The main objective of these protocols is to provide an accurate resolution about the availability of some spectrum channels, ensuring that the contribution from incapable users as well as malicious ones is discarded. Issues, advantages and disadvantages of such protocols are investigated and summarized.
Abstract:
Income distribution in Spain experienced a substantial improvement towards equalisation during the second half of the seventies and the eighties, a period during which most OECD countries experienced the opposite trend. In spite of the many recent papers on the Spanish income distribution, the period they cover stops in 1990. The aim of this paper is to extend the analysis to 1996, employing the same methodology and the same data set (ECPF). Our results not only corroborate the (decreasing inequality) trend found by others during the second half of the eighties, but also suggest that this trend extends over the first half of the nineties. We also show that our main conclusions are robust to changes in the equivalence scale, to changes in the definition of income and to potential data contamination. Finally, we analyse some of the causes which may be driving the overall picture of income inequality using two decomposition techniques. From these analyses, three variables emerge as the main factors responsible for the observed improvement in the income distribution: education, household composition and socioeconomic situation of the household head.
Abstract:
This paper uses sequential stochastic dominance procedures to compare the joint distribution of health and income across space and time. It is the first application of which we are aware of methods to compare multidimensional distributions of income and health using procedures that are robust to aggregation techniques. The paper's approach is more general than comparisons of health gradients and does not require the estimation of health equivalent incomes. We illustrate the approach by contrasting Canada and the US using comparable data. Canada dominates the US over the lower bidimensional welfare distribution of health and income, though not generally in terms of the uni-dimensional distribution of health or income. The paper also finds that welfare for both Canadians and Americans has not unambiguously improved during the last decade over the joint distribution of income and health, in spite of the fact that the uni-dimensional distributions of income have clearly improved during that period.
Abstract:
As chip implementation technologies evolve to deliver more performance, computer chips use smaller components, with a higher density of transistors, and work at lower supply voltages. All these factors make computer chips less robust and increase the probability of a transient fault. A transient fault may occur once and never happen again in the same way during the lifetime of a computer system. A transient fault can have distinct consequences: the operating system might abort the execution if the change produced by the fault is detected through bad behavior of the application, but the biggest risk is that the fault produces an undetected data corruption that modifies the final result of the application without warning (for example, a bit flip in some crucial data). With the objective of researching transient faults in the processor registers and memory of a computer system, we have developed an extension of COTSon, the full system simulation environment developed jointly by HP and AMD. This extension allows the injection of faults that change a single bit in the processor registers and memory of the simulated computer. The developed fault injection system makes it possible to evaluate the effects of single bit flip transient faults on an application, analyze an application's robustness against single bit flip transient faults, and validate fault detection mechanisms and strategies.
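The kind of single bit flip this abstract refers to can be illustrated with a small sketch. It works on a plain Python `bytearray` standing in for simulated memory; the `flip_bit` helper and the values are hypothetical, and nothing here uses or reflects the COTSon API.

```python
def flip_bit(memory: bytearray, byte_index: int, bit_index: int) -> None:
    """Invert one bit of a byte in place, emulating a transient fault."""
    if not 0 <= bit_index < 8:
        raise ValueError("bit_index must be in the range 0..7")
    memory[byte_index] ^= 1 << bit_index


# A 32-bit little-endian integer holding some "crucial data" (assumed value).
memory = bytearray((1000).to_bytes(4, "little"))

flip_bit(memory, byte_index=1, bit_index=6)   # flips bit 14 of the integer
corrupted = int.from_bytes(memory, "little")  # 1000 + 2**14

flip_bit(memory, byte_index=1, bit_index=6)   # flipping again restores the value
restored = int.from_bytes(memory, "little")
```

The corrupted value is still a valid integer, so nothing crashes; this is the silent data corruption case that the abstract identifies as the biggest risk.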