991 resultados para TC, Perfusione, Outlier detection, RANSAC


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study developed and validated a method for moisture determination in artisanal Minas cheese, using near-infrared spectroscopy and partial-least-squares. The model robustness was assured by broad sample diversity, real conditions of routine analysis, variable selection, outlier detection and analytical validation. The model was built from 28.5-55.5% w/w, with a root-mean-square-error-of-prediction of 1.6%. After its adoption, the method stability was confirmed over a period of two years through the development of a control chart. Besides this specific method, the present study sought to provide an example multivariate metrological methodology with potential for application in several areas, including new aspects, such as more stringent evaluation of the linearity of multivariate methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this work was to develop a free access exploratory data analysis software application for academic use that is easy to install and can be handled without user-level programming due to extensive use of chemometrics and its association with applications that require purchased licenses or routines. The developed software, called Chemostat, employs Hierarchical Cluster Analysis (HCA), Principal Component Analysis (PCA), intervals Principal Component Analysis (iPCA), as well as correction methods, data transformation and outlier detection. The data can be imported from the clipboard, text files, ASCII or FT-IR Perkin-Elmer “.sp” files. It generates a variety of charts and tables that allow the analysis of results that can be exported in several formats. The main features of the software were tested using midinfrared and near-infrared spectra in vegetable oils and digital images obtained from different types of commercial diesel. In order to validate the software results, the same sets of data were analyzed using Matlab© and the results in both applications matched in various combinations. In addition to the desktop version, the reuse of algorithms allowed an online version to be provided that offers a unique experience on the web. Both applications are available in English.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Linear mixed effects models have been widely used in analysis of data where responses are clustered around some random effects, so it is not reasonable to assume independence between observations in the same cluster. In most biological applications, it is assumed that the distributions of the random effects and of the residuals are Gaussian. This makes inferences vulnerable to the presence of outliers. Here, linear mixed effects models with normal/independent residual distributions for robust inferences are described. Specific distributions examined include univariate and multivariate versions of the Student-t, the slash and the contaminated normal. A Bayesian framework is adopted and Markov chain Monte Carlo is used to carry out the posterior analysis. The procedures are illustrated using birth weight data on rats in a texicological experiment. Results from the Gaussian and robust models are contrasted, and it is shown how the implementation can be used for outlier detection. The thick-tailed distributions provide an appealing robust alternative to the Gaussian process in linear mixed models, and they are easily implemented using data augmentation and MCMC techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this work, a new approach for supervised pattern recognition is presented which improves the learning algorithm of the Optimum-Path Forest classifier (OPF), centered on detection and elimination of outliers in the training set. Identification of outliers is based on a penalty computed for each sample in the training set from the corresponding number of imputable false positive and false negative classification of samples. This approach enhances the accuracy of OPF while still gaining in classification time, at the expense of a slight increase in training time. © 2010 Springer-Verlag.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pós-graduação em Medicina Veterinária - FMVZ

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Given a large image set, in which very few images have labels, how to guess labels for the remaining majority? How to spot images that need brand new labels different from the predefined ones? How to summarize these data to route the user’s attention to what really matters? Here we answer all these questions. Specifically, we propose QuMinS, a fast, scalable solution to two problems: (i) Low-labor labeling (LLL) – given an image set, very few images have labels, find the most appropriate labels for the rest; and (ii) Mining and attention routing – in the same setting, find clusters, the top-'N IND.O' outlier images, and the 'N IND.R' images that best represent the data. Experiments on satellite images spanning up to 2.25 GB show that, contrasting to the state-of-the-art labeling techniques, QuMinS scales linearly on the data size, being up to 40 times faster than top competitors (GCap), still achieving better or equal accuracy, it spots images that potentially require unpredicted labels, and it works even with tiny initial label sets, i.e., nearly five examples. We also report a case study of our method’s practical usage to show that QuMinS is a viable tool for automatic coffee crop detection from remote sensing images.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage forming tumor of uncertain cellular origin, according to similarities with surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines from chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, as well as several approaches for data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosacoma cells could be grouped in two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but clustering together with cell lines from epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could lead to uncover new cancer subtypes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Model-based calibration of steady-state engine operation is commonly performed with highly parameterized empirical models that are accurate but not very robust, particularly when predicting highly nonlinear responses such as diesel smoke emissions. To address this problem, and to boost the accuracy of more robust non-parametric methods to the same level, GT-Power was used to transform the empirical model input space into multiple input spaces that simplified the input-output relationship and improved the accuracy and robustness of smoke predictions made by three commonly used empirical modeling methods: Multivariate Regression, Neural Networks and the k-Nearest Neighbor method. The availability of multiple input spaces allowed the development of two committee techniques: a 'Simple Committee' technique that used averaged predictions from a set of 10 pre-selected input spaces chosen by the training data and the "Minimum Variance Committee" technique where the input spaces for each prediction were chosen on the basis of disagreement between the three modeling methods. This latter technique equalized the performance of the three modeling methods. The successively increasing improvements resulting from the use of a single best transformed input space (Best Combination Technique), Simple Committee Technique and Minimum Variance Committee Technique were verified with hypothesis testing. The transformed input spaces were also shown to improve outlier detection and to improve k-Nearest Neighbor performance when predicting dynamic emissions with steady-state training data. An unexpected finding was that the benefits of input space transformation were unaffected by changes in the hardware or the calibration of the underlying GT-Power model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

When considering data from many trials, it is likely that some of them present a markedly different intervention effect or exert an undue influence on the summary results. We develop a forward search algorithm for identifying outlying and influential studies in meta-analysis models. The forward search algorithm starts by fitting the hypothesized model to a small subset of likely outlier-free studies and proceeds by adding studies into the set one-by-one that are determined to be closest to the fitted model of the existing set. As each study is added to the set, plots of estimated parameters and measures of fit are monitored to identify outliers by sharp changes in the forward plots. We apply the proposed outlier detection method to two real data sets; a meta-analysis of 26 studies that examines the effect of writing-to-learn interventions on academic achievement adjusting for three possible effect modifiers, and a meta-analysis of 70 studies that compares a fluoride toothpaste treatment to placebo for preventing dental caries in children. A simple simulated example is used to illustrate the steps of the proposed methodology, and a small-scale simulation study is conducted to evaluate the performance of the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the framework of the European Project for Ice Coring in Antarctica (EPICA), a comprehensive glaciological pre-site survey has been carried out on Amundsenisen, Dronning Maud Land, East Antarctica, in the past decade. Within this survey, four intermediate-depth ice cores and 13 snow pits were analyzed for their ionic composition and interpreted with respect to the spatial and temporal variability of volcanic sulphate deposition. The comparison of the non-sea-salt (nss)-sulphate peaks that are related to the well-known eruptions of Pinatubo and Cerro Hudson in AD 1991 revealed sulphate depositions of comparable size (15.8 ± 3.4 kg/km**2) in 11 snow pits. There is a tendency to higher annual concentrations for smaller snow-accumulation rates. The combination of seasonal sodium and annually resolved nss-sulphate records allowed the establishment of a time-scale derived by annual-layer counting over the last 2000 years and thus a detailed chronology of annual volcanic sulphate deposition. Using a robust outlier detection algorithm, 49 volcanic eruptions were identified between AD 165 and 1997. The dating uncertainty is ±3 years between AD 1997 and 1601, around ±5 years between AD 1601 and 1257, and increasing to ±24 years at AD 165, improving the accuracy of the volcanic chronology during the penultimate millennium considerably.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

En este Trabajo de Fin de Máster se desarrollará un sistema de detección de fraude en pagos con tarjeta de crédito en tiempo real utilizando tecnologías de procesamiento distribuido. Concretamente se considerarán dos tecnologías: TIBCO, un conjunto de herramientas comerciales diseñadas para el procesamiento de eventos complejos, y Apache Spark, un sistema abierto para el procesamiento de datos en tiempo real. Además de implementar el sistema utilizando las dos tecnologías propuestas, un objetivo, otro objetivo de este Trabajo de Fin de Máster consiste en analizar y comparar estos dos sistemas implementados usados para procesamiento en tiempo real. Para la detección de fraude en pagos con tarjeta de crédito se aplicarán técnicas de aprendizaje máquina, concretamente del campo de anomaly/outlier detection. Como fuentes de datos que alimenten los sistemas, haremos uso de tecnologías de colas de mensajes como TIBCO EMS y Kafka. Los datos generados son enviados a estas colas para que los respectivos sistemas puedan procesarlos y aplicar el algoritmo de aprendizaje máquina, determinando si una nueva instancia es fraude o no. Ambos sistemas hacen uso de una base de datos MongoDB para almacenar los datos generados de forma pseudoaleatoria por los generadores de mensajes, correspondientes a movimientos de tarjetas de crédito. Estos movimientos posteriormente serán usados como conjunto de entrenamiento para el algoritmo de aprendizaje máquina.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mathematical models are increasingly used in environmental science thus increasing the importance of uncertainty and sensitivity analyses. In the present study, an iterative parameter estimation and identifiability analysis methodology is applied to an atmospheric model – the Operational Street Pollution Model (OSPMr). To assess the predictive validity of the model, the data is split into an estimation and a prediction data set using two data splitting approaches and data preparation techniques (clustering and outlier detection) are analysed. The sensitivity analysis, being part of the identifiability analysis, showed that some model parameters were significantly more sensitive than others. The application of the determined optimal parameter values was shown to succesfully equilibrate the model biases among the individual streets and species. It was as well shown that the frequentist approach applied for the uncertainty calculations underestimated the parameter uncertainties. The model parameter uncertainty was qualitatively assessed to be significant, and reduction strategies were identified.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The dissertation starts by providing a description of the phenomena related to the increasing importance recently acquired by satellite applications. The spread of such technology comes with implications, such as an increase in maintenance cost, from which derives the interest in developing advanced techniques that favor an augmented autonomy of spacecrafts in health monitoring. Machine learning techniques are widely employed to lay a foundation for effective systems specialized in fault detection by examining telemetry data. Telemetry consists of a considerable amount of information; therefore, the adopted algorithms must be able to handle multivariate data while facing the limitations imposed by on-board hardware features. In the framework of outlier detection, the dissertation addresses the topic of unsupervised machine learning methods. In the unsupervised scenario, lack of prior knowledge of the data behavior is assumed. In the specific, two models are brought to attention, namely Local Outlier Factor and One-Class Support Vector Machines. Their performances are compared in terms of both the achieved prediction accuracy and the equivalent computational cost. Both models are trained and tested upon the same sets of time series data in a variety of settings, finalized at gaining insights on the effect of the increase in dimensionality. The obtained results allow to claim that both models, combined with a proper tuning of their characteristic parameters, successfully comply with the role of outlier detectors in multivariate time series data. Nevertheless, under this specific context, Local Outlier Factor results to be outperforming One-Class SVM, in that it proves to be more stable over a wider range of input parameter values. This property is especially valuable in unsupervised learning since it suggests that the model is keen to adapting to unforeseen patterns.