958 resultados para Truncated robust multivariate outlier detection
Resumo:
We propose robust estimators of the generalized log-gamma distribution and, more generally, of location-shape-scale families of distributions. A (weighted) Q tau estimator minimizes a tau scale of the differences between empirical and theoretical quantiles. It is n(1/2) consistent; unfortunately, it is not asymptotically normal and, therefore, inconvenient for inference. However, it is a convenient starting point for a one-step weighted likelihood estimator, where the weights are based on a disparity measure between the model density and a kernel density estimate. The one-step weighted likelihood estimator is asymptotically normal and fully efficient under the model. It is also highly robust under outlier contamination. Supplementary materials are available online.
Resumo:
This study developed and validated a method for moisture determination in artisanal Minas cheese, using near-infrared spectroscopy and partial-least-squares. The model robustness was assured by broad sample diversity, real conditions of routine analysis, variable selection, outlier detection and analytical validation. The model was built from 28.5-55.5% w/w, with a root-mean-square-error-of-prediction of 1.6%. After its adoption, the method stability was confirmed over a period of two years through the development of a control chart. Besides this specific method, the present study sought to provide an example multivariate metrological methodology with potential for application in several areas, including new aspects, such as more stringent evaluation of the linearity of multivariate methods.
Resumo:
The objective of this work was to develop a free access exploratory data analysis software application for academic use that is easy to install and can be handled without user-level programming due to extensive use of chemometrics and its association with applications that require purchased licenses or routines. The developed software, called Chemostat, employs Hierarchical Cluster Analysis (HCA), Principal Component Analysis (PCA), intervals Principal Component Analysis (iPCA), as well as correction methods, data transformation and outlier detection. The data can be imported from the clipboard, text files, ASCII or FT-IR Perkin-Elmer “.sp” files. It generates a variety of charts and tables that allow the analysis of results that can be exported in several formats. The main features of the software were tested using midinfrared and near-infrared spectra in vegetable oils and digital images obtained from different types of commercial diesel. In order to validate the software results, the same sets of data were analyzed using Matlab© and the results in both applications matched in various combinations. In addition to the desktop version, the reuse of algorithms allowed an online version to be provided that offers a unique experience on the web. Both applications are available in English.
Resumo:
In questa tesi vengono analizzati gli algoritmi DistributedSolvingSet e LazyDistributedSolvingSet e verranno mostrati dei risultati sperimentali relativi al secondo.
Resumo:
In this paper, we propose a new method for fully-automatic landmark detection and shape segmentation in X-ray images. To detect landmarks, we estimate the displacements from some randomly sampled image patches to the (unknown) landmark positions, and then we integrate these predictions via a voting scheme. Our key contribution is a new algorithm for estimating these displacements. Different from other methods where each image patch independently predicts its displacement, we jointly estimate the displacements from all patches together in a data driven way, by considering not only the training data but also geometric constraints on the test image. The displacements estimation is formulated as a convex optimization problem that can be solved efficiently. Finally, we use the sparse shape composition model as the a priori information to regularize the landmark positions and thus generate the segmented shape contour. We validate our method on X-ray image datasets of three different anatomical structures: complete femur, proximal femur and pelvis. Experiments show that our method is accurate and robust in landmark detection, and, combined with the shape model, gives a better or comparable performance in shape segmentation compared to state-of-the art methods. Finally, a preliminary study using CT data shows the extensibility of our method to 3D data.
Resumo:
In the framework of the European Project for Ice Coring in Antarctica (EPICA), a comprehensive glaciological pre-site survey has been carried out on Amundsenisen, Dronning Maud Land, East Antarctica, in the past decade. Within this survey, four intermediate-depth ice cores and 13 snow pits were analyzed for their ionic composition and interpreted with respect to the spatial and temporal variability of volcanic sulphate deposition. The comparison of the non-sea-salt (nss)-sulphate peaks that are related to the well-known eruptions of Pinatubo and Cerro Hudson in AD 1991 revealed sulphate depositions of comparable size (15.8 ± 3.4 kg/km**2) in 11 snow pits. There is a tendency to higher annual concentrations for smaller snow-accumulation rates. The combination of seasonal sodium and annually resolved nss-sulphate records allowed the establishment of a time-scale derived by annual-layer counting over the last 2000 years and thus a detailed chronology of annual volcanic sulphate deposition. Using a robust outlier detection algorithm, 49 volcanic eruptions were identified between AD 165 and 1997. The dating uncertainty is ±3 years between AD 1997 and 1601, around ±5 years between AD 1601 and 1257, and increasing to ±24 years at AD 165, improving the accuracy of the volcanic chronology during the penultimate millennium considerably.
Resumo:
A novel and high-quality system for moving object detection in sequences recorded with moving cameras is proposed. This system is based on the collaboration between an automatic homography estimation module for image alignment, and a robust moving object detection using an efficient spatiotemporal nonparametric background modeling.
Resumo:
Outliers are objects that show abnormal behavior with respect to their context or that have unexpected values in some of their parameters. In decision-making processes, information quality is of the utmost importance. In specific applications, an outlying data element may represent an important deviation in a production process or a damaged sensor. Therefore, the ability to detect these elements could make the difference between making a correct and an incorrect decision. This task is complicated by the large sizes of typical databases. Due to their importance in search processes in large volumes of data, researchers pay special attention to the development of efficient outlier detection techniques. This article presents a computationally efficient algorithm for the detection of outliers in large volumes of information. This proposal is based on an extension of the mathematical framework upon which the basic theory of detection of outliers, founded on Rough Set Theory, has been constructed. From this starting point, current problems are analyzed; a detection method is proposed, along with a computational algorithm that allows the performance of outlier detection tasks with an almost-linear complexity. To illustrate its viability, the results of the application of the outlier-detection algorithm to the concrete example of a large database are presented.
Resumo:
The dissertation starts by providing a description of the phenomena related to the increasing importance recently acquired by satellite applications. The spread of such technology comes with implications, such as an increase in maintenance cost, from which derives the interest in developing advanced techniques that favor an augmented autonomy of spacecrafts in health monitoring. Machine learning techniques are widely employed to lay a foundation for effective systems specialized in fault detection by examining telemetry data. Telemetry consists of a considerable amount of information; therefore, the adopted algorithms must be able to handle multivariate data while facing the limitations imposed by on-board hardware features. In the framework of outlier detection, the dissertation addresses the topic of unsupervised machine learning methods. In the unsupervised scenario, lack of prior knowledge of the data behavior is assumed. In the specific, two models are brought to attention, namely Local Outlier Factor and One-Class Support Vector Machines. Their performances are compared in terms of both the achieved prediction accuracy and the equivalent computational cost. Both models are trained and tested upon the same sets of time series data in a variety of settings, finalized at gaining insights on the effect of the increase in dimensionality. The obtained results allow to claim that both models, combined with a proper tuning of their characteristic parameters, successfully comply with the role of outlier detectors in multivariate time series data. Nevertheless, under this specific context, Local Outlier Factor results to be outperforming One-Class SVM, in that it proves to be more stable over a wider range of input parameter values. This property is especially valuable in unsupervised learning since it suggests that the model is keen to adapting to unforeseen patterns.
Resumo:
Electrocardiographic (ECG) signals are emerging as a recent trend in the field of biometrics. In this paper, we propose a novel ECG biometric system that combines clustering and classification methodologies. Our approach is based on dominant-set clustering, and provides a framework for outlier removal and template selection. It enhances the typical workflows, by making them better suited to new ECG acquisition paradigms that use fingers or hand palms, which lead to signals with lower signal to noise ratio, and more prone to noise artifacts. Preliminary results show the potential of the approach, helping to further validate the highly usable setups and ECG signals as a complementary biometric modality.
Resumo:
Visual data mining, multi-dimensional scaling, POLARMAP, Sammon's mapping, clustering, outlier detection
Resumo:
It is generally accepted that most plant populations are locally adapted. Yet, understanding how environmental forces give rise to adaptive genetic variation is a challenge in conservation genetics and crucial to the preservation of species under rapidly changing climatic conditions. Environmental variation, phylogeographic history, and population demographic processes all contribute to spatially structured genetic variation, however few current models attempt to separate these confounding effects. To illustrate the benefits of using a spatially-explicit model for identifying potentially adaptive loci, we compared outlier locus detection methods with a recently-developed landscape genetic approach. We analyzed 157 loci from samples of the alpine herb Gentiana nivalis collected across the European Alps. Principle coordinates of neighbor matrices (PCNM), eigenvectors that quantify multi-scale spatial variation present in a data set, were incorporated into a landscape genetic approach relating AFLP frequencies with 23 environmental variables. Four major findings emerged. 1) Fifteen loci were significantly correlated with at least one predictor variable (R (adj) (2) > 0.5). 2) Models including PCNM variables identified eight more potentially adaptive loci than models run without spatial variables. 3) When compared to outlier detection methods, the landscape genetic approach detected four of the same loci plus 11 additional loci. 4) Temperature, precipitation, and solar radiation were the three major environmental factors driving potentially adaptive genetic variation in G. nivalis. Techniques presented in this paper offer an efficient method for identifying potentially adaptive genetic variation and associated environmental forces of selection, providing an important step forward for the conservation of non-model species under global change.
Resumo:
PURPOSE: To combine weighted iterative reconstruction with self-navigated free-breathing coronary magnetic resonance angiography for retrospective reduction of respiratory motion artifacts. METHODS: One-dimensional self-navigation was improved for robust respiratory motion detection and the consistency of the acquired data was estimated on the detected motion. Based on the data consistency, the data fidelity term of iterative reconstruction was weighted to reduce the effects of respiratory motion. In vivo experiments were performed in 14 healthy volunteers and the resulting image quality of the proposed method was compared to a navigator-gated reference in terms of acquisition time, vessel length, and sharpness. RESULT: Although the sampling pattern of the proposed method contained 60% more samples with respect to the reference, the scan efficiency was improved from 39.5 ± 10.1% to 55.1 ± 9.1%. The improved self-navigation showed a high correlation to the standard navigator signal and the described weighting efficiently reduced respiratory motion artifacts. Overall, the average image quality of the proposed method was comparable to the navigator-gated reference. CONCLUSION: Self-navigated coronary magnetic resonance angiography was successfully combined with weighted iterative reconstruction to reduce the total acquisition time and efficiently suppress respiratory motion artifacts. The simplicity of the experimental setup and the promising image quality are encouraging toward future clinical evaluation. Magn Reson Med 73:1885-1895, 2015. © 2014 Wiley Periodicals, Inc.
Resumo:
QSAR modeling is a novel computer program developed to generate and validate QSAR or QSPR (quantitative structure- activity or property relationships) models. With QSAR modeling, users can build partial least squares (PLS) regression models, perform variable selection with the ordered predictors selection (OPS) algorithm, and validate models by using y-randomization and leave-N-out cross validation. An additional new feature is outlier detection carried out by simultaneous comparison of sample leverage with the respective Studentized residuals. The program was developed using Java version 6, and runs on any operating system that supports Java Runtime Environment version 6. The use of the program is illustrated. This program is available for download at lqta.iqm.unicamp.br.
Resumo:
In this work, a new approach for supervised pattern recognition is presented which improves the learning algorithm of the Optimum-Path Forest classifier (OPF), centered on detection and elimination of outliers in the training set. Identification of outliers is based on a penalty computed for each sample in the training set from the corresponding number of imputable false positive and false negative classification of samples. This approach enhances the accuracy of OPF while still gaining in classification time, at the expense of a slight increase in training time. © 2010 Springer-Verlag.