11 results for OUTLIERS
at Indian Institute of Science - Bangalore - India
Abstract:
The rapid growth in the field of data mining has led to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
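To make the idea of ranking by attribute value frequencies concrete, here is a minimal sketch. It is not the two-phase algorithm of the paper: the clustering phase is omitted, and the scoring rule (mean relative frequency of a record's attribute values) and the names `frequency_scores` and `rank_outliers` are assumptions chosen for illustration.

```python
from collections import Counter


def frequency_scores(records):
    """Score each record by the mean relative frequency of its attribute values.

    Assumed scoring rule for illustration; a low score means the record is
    made up of rare values and is therefore a more likely outlier.
    """
    n = len(records)
    n_attrs = len(records[0])
    # One frequency table per attribute (column).
    counts = [Counter(r[j] for r in records) for j in range(n_attrs)]
    return [
        sum(counts[j][r[j]] for j in range(n_attrs)) / (n_attrs * n)
        for r in records
    ]


def rank_outliers(records, k):
    """Return the indices of the k lowest-scoring records, most unusual first."""
    scores = frequency_scores(records)
    return sorted(range(len(records)), key=lambda i: scores[i])[:k]


# Hypothetical toy data: three categorical attributes per record.
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "round"),
    ("blue", "large", "square"),
]
print(rank_outliers(records, 2))
```

In this sketch the cost of scoring does not depend on `k`; only the final slice does, which loosely mirrors the abstract's remark that complexity is unaffected by the number of outliers requested.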
Abstract:
The removal of noise and outliers from health signals is an important problem in jet engine health monitoring. Typically, health signals are time series of damage indicators, which can be sensor measurements or features derived from such measurements. Sharp or sudden changes in health signals can represent abrupt faults, whereas long-term deterioration in the system is typical of gradual faults. Simple linear filters tend to smooth out the sharp trend shifts in jet engine signals and are also poor at outlier removal. We propose new optimally designed nonlinear weighted recursive median filters for noise removal from typical health signals of jet engines. Signals for abrupt and gradual faults and with transient data are considered. Numerical results are obtained for a jet engine and show that preprocessing of health signals using the proposed filter significantly removes Gaussian noise and outliers and could therefore greatly improve the accuracy of diagnostic systems. [DOI: 10.1115/1.3200907].
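A weighted recursive median filter can be sketched as follows, using the standard textbook form: each output is the median of a window in which past positions hold already-filtered outputs, the current and future positions hold raw inputs, and each sample is replicated according to its integer weight. The weights below are placeholders, not the optimized weights from the paper, and the edge handling (clamping indices to the signal ends) is an assumption.

```python
from statistics import median


def wrm_filter(x, weights):
    """Weighted recursive median filter with integer weights.

    `weights` has odd length 2N+1; weights[0..N-1] apply to the N previous
    outputs, weights[N] to the current input, and the rest to future inputs.
    Choose weights with an odd sum so the median is always an actual sample.
    """
    if len(weights) % 2 == 0:
        raise ValueError("weights must have odd length")
    half = len(weights) // 2
    last = len(x) - 1
    # y starts as a copy of x and is overwritten left to right, so indices
    # below n already hold filtered outputs (this is the recursion).
    y = list(x)
    for n in range(len(x)):
        window = []
        for k, w in zip(range(-half, half + 1), weights):
            idx = min(max(n + k, 0), last)  # clamp at the signal ends (assumed)
            window.extend([y[idx]] * w)     # replicate the sample w times
        y[n] = median(window)
    return y


# Hypothetical test signal: one spike (outlier) followed by a step (abrupt fault).
signal = [0, 0, 0, 9, 0, 0, 5, 5, 5, 5]
print(wrm_filter(signal, (1, 1, 1)))
```

With unit weights this reduces to a three-point recursive median: the isolated spike is removed while the step edge is kept sharp, which is the behaviour the abstract contrasts with linear smoothing.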
Abstract:
Increased emphasis on rotorcraft performance and operational capabilities has resulted in accurate computation of aerodynamic stability and control parameters. System identification is one such tool in which the model structure and parameters such as aerodynamic stability and control derivatives are derived. In the present work, the rotorcraft aerodynamic parameters are computed using radial basis function neural networks (RBFN) in the presence of both state and measurement noise. The effect of the presence of outliers in the data is also considered. RBFN is found to give superior results compared to finite difference derivatives for noisy data. (C) 2010 Elsevier Inc. All rights reserved.
Abstract:
The problem of denoising damage indicator signals for improved operational health monitoring of systems is addressed by applying soft computing methods to design filters. Since measured data in operational settings is contaminated with noise and outliers, pattern recognition algorithms for fault detection and isolation can give false alarms. A direct approach to improving the fault detection and isolation is to remove noise and outliers from time series of measured data or damage indicators before performing fault detection and isolation. Many popular signal-processing approaches do not work well with damage indicator signals, which can contain sudden changes due to abrupt faults and non-Gaussian outliers. Signal-processing algorithms based on radial basis function (RBF) neural network and weighted recursive median (WRM) filters are explored for denoising simulated time series. The RBF neural network filter is developed using a K-means clustering algorithm and is much less computationally expensive to develop than feedforward neural networks trained using backpropagation. The nonlinear multimodal integer-programming problem of selecting optimal integer weights of the WRM filter is solved using a genetic algorithm. Numerical results are obtained for helicopter rotor structural damage indicators based on simulated frequencies. Test signals consider low-order polynomial growth of damage indicators with time to simulate gradual or incipient faults and step changes in the signal to simulate abrupt faults. Noise and outliers are added to the test signals. The WRM and RBF filters result in a noise reduction of 54-71% and 59-73%, respectively, for the test signals considered in this study. Their performance is much better than that of the moving average FIR filter, which causes significant feature distortion and has poor outlier removal capabilities, and it shows the potential of soft computing methods for specific signal-processing applications.
Abstract:
The problem of denoising damage indicator signals for improved operational health monitoring of systems is addressed by applying soft computing methods to design filters. Since measured data in operational settings is contaminated with noise and outliers, pattern recognition algorithms for fault detection and isolation can give false alarms. A direct approach to improving the fault detection and isolation is to remove noise and outliers from time series of measured data or damage indicators before performing fault detection and isolation. Many popular signal-processing approaches do not work well with damage indicator signals, which can contain sudden changes due to abrupt faults and non-Gaussian outliers. Signal-processing algorithms based on radial basis function (RBF) neural network and weighted recursive median (WRM) filters are explored for denoising simulated time series. The RBF neural network filter is developed using a K-means clustering algorithm and is much less computationally expensive to develop than feedforward neural networks trained using backpropagation. The nonlinear multimodal integer-programming problem of selecting optimal integer weights of the WRM filter is solved using a genetic algorithm. Numerical results are obtained for helicopter rotor structural damage indicators based on simulated frequencies. Test signals consider low-order polynomial growth of damage indicators with time to simulate gradual or incipient faults and step changes in the signal to simulate abrupt faults. Noise and outliers are added to the test signals. The WRM and RBF filters result in a noise reduction of 54-71% and 59-73%, respectively, for the test signals considered in this study. Their performance is much better than that of the moving average FIR filter, which causes significant feature distortion and has poor outlier removal capabilities, and it shows the potential of soft computing methods for specific signal-processing applications. (C) 2005 Elsevier B.V. All rights reserved.
Abstract:
Measured health signals incorporate significant details about any malfunction in a gas turbine. The attenuation of noise and removal of outliers from these health signals while preserving important features is an important problem in gas turbine diagnostics. The measured health signals are a time series of sensor measurements such as the low rotor speed, high rotor speed, fuel flow, and exhaust gas temperature in a gas turbine. In this article, a comparative study is done by varying the window length of acausal and unsymmetrical weighted recursive median filters and numerical results for error minimization are obtained. It is found that optimal filters exist, which can be used for engines where data are available slowly (three-point filter) and rapidly (seven-point filter). These smoothing filters are proposed as preprocessors of measurement delta signals before subjecting them to fault detection and isolation algorithms.
Abstract:
The removal of noise and outliers from measurement signals is a major problem in jet engine health monitoring. Typical measurement signals found in most jet engines include low rotor speed, high rotor speed, fuel flow and exhaust gas temperature. Deviations in these measurements from a baseline 'good' engine are often called measurement deltas and are the health signals used for fault detection, isolation, trending and data mining. Linear filters such as the FIR moving average filter and IIR exponential average filter are used in the industry to remove noise and outliers from the jet engine measurement deltas. However, the use of linear filters can lead to loss of critical features in the signal that can contain information about maintenance and repair events that could be used by fault isolation algorithms to determine engine condition or by data mining algorithms to learn valuable patterns in the data. Non-linear filters such as the median and weighted median hybrid filters offer the opportunity to remove noise and gross outliers from signals while preserving features. In this study, a comparison of traditional linear filters popular in the jet engine industry is made with the median filter and the subfilter weighted FIR median hybrid (SWFMH) filter. Results using simulated data with implanted faults show that the SWFMH filter results in a noise reduction of over 60 per cent compared to only 20 per cent for FIR filters and 30 per cent for IIR filters. Preprocessing jet engine health signals using the SWFMH filter would greatly improve the accuracy of diagnostic systems. (C) 2002 Published by Elsevier Science Ltd.
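The contrast between the linear filters and a plain median filter can be illustrated with a small sketch. It does not implement the SWFMH filter studied in the paper; the window length, the smoothing constant `alpha`, the edge handling and the toy signal are all assumptions made for illustration.

```python
from statistics import median


def fir_moving_average(x, length=3):
    """Causal FIR moving average over the last `length` samples (fewer at the start)."""
    out = []
    for n in range(len(x)):
        window = x[max(0, n - length + 1): n + 1]
        out.append(sum(window) / len(window))
    return out


def iir_exponential_average(x, alpha=0.5):
    """First-order IIR exponential average: y[n] = alpha*x[n] + (1-alpha)*y[n-1]."""
    y = [float(x[0])]
    for v in x[1:]:
        y.append(alpha * v + (1 - alpha) * y[-1])
    return y


def median_filter(x, length=3):
    """Centred (non-recursive) median filter; indices are clamped at the ends."""
    half = length // 2
    last = len(x) - 1
    return [
        median(x[min(max(n + k, 0), last)] for k in range(-half, half + 1))
        for n in range(len(x))
    ]


# Hypothetical measurement delta: one gross outlier, then a step change.
delta = [0, 0, 0, 9, 0, 0, 5, 5, 5, 5]
print(fir_moving_average(delta))
print(iir_exponential_average(delta))
print(median_filter(delta))
```

On this toy signal both linear filters spread the outlier over neighbouring samples and soften the step, whereas the median filter discards the outlier and leaves the step intact.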
Abstract:
We address the problem of robust formant tracking in continuous speech in the presence of additive noise. We propose a new approach based on mixture modeling of the formant contours. Our approach consists of two main steps: (i) Computation of a pyknogram based on multiband amplitude-modulation/frequency-modulation (AM/FM) decomposition of the input speech; and (ii) Statistical modeling of the pyknogram using mixture models. We experiment with both Gaussian mixture model (GMM) and Student's-t mixture model (tMM) and show that the latter is robust with respect to handling outliers in the pyknogram data, parameter selection, accuracy, and smoothness of the estimated formant contours. Experimental results on simulated data as well as noisy speech data show that the proposed tMM-based approach is also robust to additive noise. We present performance comparisons with a recently developed adaptive filterbank technique proposed in the literature and the classical Burg's spectral estimator technique, which show that the proposed technique is more robust to noise.
Abstract:
Outlier detection in high dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is actively being explored. As outlier detection is generally considered as an unsupervised learning problem due to lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with the information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.
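The two quantities the abstract relies on, entropy and mutual information, can be computed for categorical features directly from value counts. The sketch below shows only these measures; it is not the feature selection algorithm proposed in the paper, and the function names and toy features are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a categorical feature, from empirical frequencies."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), using empirical joint frequencies."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


# Hypothetical features: f2 duplicates the information in f1, f3 is independent of it.
f1 = ["x", "x", "y", "y"]
f2 = ["p", "p", "q", "q"]
f3 = ["p", "q", "p", "q"]
print(mutual_information(f1, f2))  # high: f2 is redundant given f1
print(mutual_information(f1, f3))  # zero: f3 carries no information about f1
```

A pair of features with high mutual information is redundant in the sense the abstract describes, so a selection procedure would have reason to keep only one of them.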
Abstract:
Lipocalins constitute a superfamily of extracellular proteins that are found in all three kingdoms of life. Although very divergent in their sequences and functions, they show remarkable similarity in 3-D structures. Lipocalins bind and transport small hydrophobic molecules. Earlier sequence-based phylogenetic studies of lipocalins highlighted that they have a long evolutionary history. However, the molecular and structural basis of their functional diversity is not completely understood. The main objective of the present study is to understand the functional diversity of the lipocalins using a structure-based phylogenetic approach. The present study with 39 protein domains from the lipocalin superfamily suggests that the clusters of lipocalins obtained by structure-based phylogeny correspond well with the functional diversity. The detailed analysis of each of the clusters and sub-clusters reveals that the 39 lipocalin domains cluster based on their mode of ligand binding, though the clustering was performed on the basis of gross domain structure. The outliers in the phylogenetic tree are often from single-member families. Also, the structure-based phylogenetic approach has provided pointers to assign putative function for the domains of unknown function in the lipocalin family. The approach employed in the present study can be used in the future for the functional identification of new lipocalin proteins and may be extended to other protein families where members show poor sequence similarity but high structural similarity.
Abstract:
We propose a distributed sequential algorithm for quick detection of spectral holes in a Cognitive Radio setup. Two or more local nodes make decisions and inform the fusion centre (FC) over a reporting Multiple Access Channel (MAC), which then makes the final decision. The local nodes use energy detection and the FC uses mean detection in the presence of fading, heavy-tailed electromagnetic interference (EMI) and outliers. The statistics of the primary signal, channel gain and the EMI are not known. Different nonparametric sequential algorithms are compared to choose appropriate algorithms to be used at the local nodes and the FC. A modification of a recently developed random walk test is selected for the local nodes for energy detection as well as at the fusion centre for mean detection. We show via simulations and analysis that the nonparametric distributed algorithm developed performs well in the presence of fading, EMI and outliers. The algorithm is iterative in nature, making the computation and storage requirements minimal.