311 results for OUTLIERS


Relevance:

10.00%

Publisher:

Abstract:

Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
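The robustness argument can be illustrated numerically: under a Student-t component the log-density of a gross outlier falls off only logarithmically, whereas under a normal component it falls off quadratically, so a t component is far less distorted when estimating its parameters. A minimal Python sketch (synthetic values, not from the study):

```python
from scipy import stats

# Log-density of a far outlier under a standard normal versus a
# Student-t with 3 degrees of freedom: the normal log-density falls
# off quadratically, the t only logarithmically.
outlier = 10.0
logp_normal = stats.norm.logpdf(outlier)   # ~ -51
logp_t = stats.t.logpdf(outlier, df=3)     # ~ -8
print(logp_normal, logp_t)
```

In an EM fit, this difference means a single aberrant point barely shifts the t component's location and scale, which is the practical sense in which the t mixture is "more robust".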

Relevance:

10.00%

Publisher:

Abstract:

Interest in work engagement has been growing, as it is a positive motivational construct characterized by vigour, dedication and absorption, always related to work, implying a sense of accomplishment that involves a positive cognitive state and is persistent over time. Understanding what generates positive feelings in workers in the workplace is a recurring theme in organizational behaviour research. The construct of well-being at work comprises three dimensions, namely job satisfaction, job involvement and affective organizational commitment, associated with positive affects directed at work; psychological capital, in turn, is related to work performance outcomes such as optimism, efficacy, hope and resilience. The present study aimed to analyse the relationships between work engagement, well-being at work and psychological capital among human resource management professionals. The participants were 159 professionals working in human resource management in a variety of organizations. Data were collected through an electronic questionnaire created in the SurveyMonkey environment. The self-administered questionnaire contained the following measures. Data were analysed with SPSS 19.0, computing descriptive statistics and correlation coefficients. An exploratory data analysis was first carried out to check data-entry accuracy, outliers and missing responses. The results revealed positive, significant correlations between work engagement, psychological capital and the dimensions of well-being at work (job satisfaction, job involvement and affective organizational commitment).
It is concluded that individuals who show vigour and absorption also show high levels of optimism, resilience, hope and efficacy, as well as of the well-being dimensions: job satisfaction, job involvement and affective organizational commitment.
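The analysis pipeline described above (screening outliers, then computing correlations) can be sketched on synthetic data; the variable names, score scales and effect size below are hypothetical stand-ins, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 159  # sample size reported in the study
# Hypothetical Likert-style scores: engagement correlated with
# psychological capital by construction.
psycap = rng.normal(4.0, 0.6, n)
engagement = 0.6 * psycap + rng.normal(0, 0.5, n)

# Screen univariate outliers with z-scores before correlating,
# mirroring the exploratory data check described above.
z = (engagement - engagement.mean()) / engagement.std()
keep = np.abs(z) < 3
r = np.corrcoef(psycap[keep], engagement[keep])[0, 1]
print(round(r, 2))
```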

Relevance:

10.00%

Publisher:

Abstract:

The present study contributes to the knowledge of Na-feldspar and the plagioclases, extending the database of Raman spectra of plagioclases with different chemical compositions and structural orders. This information may be used in future planetary exploration by rovers, in the investigation of ceramic nanocrystal materials, and in mineralogical phase identification in sediments. Na-feldspar and the plagioclase solid solution have been investigated by Raman spectroscopy in order to determine the relationships between the vibrational changes and the plagioclase crystal chemistry and structure. We focused on Raman micro-spectroscopy, a non-destructive method suited for contactless analysis with high spatial resolution. Chemical and structural analyses have been performed on natural samples to test the usefulness of Raman spectroscopy as a tool in the study of pressure-induced structural deformations, of the disordering processes due to changes in the Al-Si distribution in the tetrahedral sites and, finally, in the determination of the anorthite content (Anx) in plagioclase minerals. All 39 predicted Ag Raman-active modes have been identified and assigned to specific patterns of atomic vibrational motion. A detailed comparison between experimental and computed Raman spectra has been performed and previous assignments have been revised, resolving some discrepancies reported in the recent literature. Ab initio calculation at the hybrid HF/DFT level with the WC1LYP Hamiltonian has proven to give excellent agreement between calculated and experimentally measured Raman wavenumbers and intensities in triclinic minerals. A short digression on the 36 infrared-active modes of Na-feldspar is also included.
The identification of all 39 computed Raman modes in the experimentally measured spectra of the fully ordered Na-feldspar, known as low albite, along with the detailed description of each vibrational mode, has been essential to extend the comparative analysis to the high-pressure and high-temperature structural forms of albite, which reflect the physical-chemical conditions of the host rocks. Understanding the response of the feldspar structure to pressure and temperature is crucial in order to constrain crustal behaviour. The compressional behaviour of Na-feldspar has been investigated for the first time by Raman spectroscopy. The absence of phase transitions and the occurrence of two secondary compression mechanisms acting at different pressures have been confirmed. Moreover, the Raman data suggest that the internal structural changes are confined to a small pressure interval, localized around 6 GPa, not spread out from 4 to 8 GPa as suggested by previous X-ray studies on elasticity. The dominant compression mechanisms act via tetrahedral tilting, while the T-O bond lengths remain nearly constant at moderate compressional regimes. At the spectroscopic level, this leads to the strong pressure dependence of the T-O-T bending modes, as found for the four modes at 478, 508, 578 and 815 cm-1. The Al-Si distribution in the tetrahedral sites also affects the Raman spectrum of Na-feldspar. In particular, peak broadening is more sensitive than peak position to changes in the degree of order. Raman spectroscopy is found to be a good probe for local ordering, being particularly sensitive to the first annealing steps, when the macroscopic order parameter is still high.
Even though the Raman data are scattered and there are outliers in the estimated values of the degree of order, the average peak linewidths of the characteristic Na-feldspar doublet band, labelled here υa and υb, show interesting trends as a function of the order parameter Qod: both peak linewidths increase linearly until saturation. For Qod values lower than 0.6, peak broadening is no longer affected by the Al-Si distribution. Moreover, the disordering process is found to be heterogeneous. SC-XRD and Raman data suggest an inter-crystalline inhomogeneity of the samples, i.e., the presence of regions with different defect density on the micrometric scale. Finally, the influence of the Ca-Na substitution on the plagioclase Raman spectra has been investigated. Raman spectra have been collected on a series of well-characterized natural plagioclases with low structural state. The variations of the Raman modes as a function of chemical composition and structural order have been determined. The number of observed Raman bands at each composition gives information about the unit-cell symmetry: moving away from the C1 structures, the number of Raman bands increases, as the number of formula units in the unit cell increases. The modification from an "albite-like" Raman spectrum to a more "anorthite-like" spectrum occurs from sample An78 onwards, which coincides with the appearance of c reflections in the diffraction patterns of the samples. The evolution of the Raman bands υa and υb displays two changes in slope, at ~An45 and ~An75: the first occurs between e2 and e1 plagioclases; the second separates e1 and I1 plagioclases with only b reflections in their diffraction patterns from I1 and P1 samples having both b and c reflections. The first variation corresponds exactly to the e2→e1 phase transition, whereas the second corresponds, to a good approximation, to the C1→I1 transition, which has been placed at ~An70 by previous work.
The I1→P1 phase transition on the anorthite-rich side of the solid solution is not highlighted in the collected Raman spectra. Variations in peak broadening provide insights into the behaviour of the order parameter on a local scale, suggesting an increase in structural disorder within the solid solution, as the structures have to incorporate more Al atoms to balance the change from monovalent to divalent cations. All the information acquired on these natural plagioclases has been used to produce a protocol able to give a preliminary estimate of the chemical composition of an unknown plagioclase from its Raman spectrum. Two calibration curves, one for albite-rich and one for anorthite-rich plagioclases, have been proposed by relating the peak linewidth of the most intense Raman band, υa, to the An content. It has been pointed out that the dependence of composition on linewidth can be obtained only for plagioclases with a low structural state and a degree of order not far from that of the reference samples. The proposed tool has been tested on three mineralogical samples, two of meteoritic origin and one of volcanic origin. Chemical compositions obtained by Raman spectroscopy compare well, within an error of about 10%, with those obtained by elemental techniques. Further analyses on plagioclases of unknown composition will be necessary to validate the suggested method and introduce it as a routine tool for determining chemical composition from Raman data in planetary missions.
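The proposed calibration idea, relating the υa linewidth to An content, amounts to a simple fitted curve. A minimal sketch with a linear fit; the numbers below are hypothetical, not the published calibration:

```python
import numpy as np

# Hypothetical (linewidth, An%) calibration points for albite-rich
# plagioclases; a real curve must be fitted to measured spectra.
fwhm = np.array([6.0, 7.5, 9.0, 10.5, 12.0])  # cm-1, band "va"
an = np.array([5.0, 20.0, 35.0, 50.0, 65.0])  # anorthite content, %

slope, intercept = np.polyfit(fwhm, an, 1)

def an_from_linewidth(w):
    """Preliminary An estimate from the va peak linewidth."""
    return slope * w + intercept

print(round(an_from_linewidth(9.0), 1))  # -> 35.0 on this synthetic data
```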

Relevance:

10.00%

Publisher:

Abstract:

This paper addresses the problem of novelty detection in the case where the observed data are a mixture of a known 'background' process contaminated with an unknown other process, which generates the outliers, or novel observations. The framework we describe here is quite general, employing univariate classification with incomplete information, based on knowledge of the distribution (the probability density function, 'pdf') of the data generated by the 'background' process. The relative proportion of this 'background' component (the prior 'background' probability), and the pdfs and prior probabilities of all other components, are all assumed unknown. The main contribution is a new classification scheme that identifies the maximum proportion of observed data following the known 'background' distribution. The method exploits the Kolmogorov-Smirnov test to estimate the proportions, after which the data are Bayes-optimally separated. Results, demonstrated with synthetic data, show that this approach can produce more reliable results than a standard novelty detection scheme. The classification algorithm is then applied to the problem of identifying outliers in the SIC2004 data set, in order to detect the radioactive release simulated in the 'joker' data set. We propose this method as a reliable means of novelty detection in the emergency situation, which can also be used to identify outliers prior to the application of a more general automatic mapping algorithm. © Springer-Verlag 2007.
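A crude sketch of the central idea (not the paper's exact estimator): search for the largest candidate proportion whose most background-like points a Kolmogorov-Smirnov test still accepts as coming from the known background. The background N(0, 1), the contaminant and the ranking by |x| are all assumptions of this toy example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2000
p_true = 0.8  # true background proportion
x = np.concatenate([
    rng.normal(0.0, 1.0, int(p_true * n)),      # known background N(0,1)
    rng.normal(6.0, 1.0, n - int(p_true * n)),  # unknown contaminant
])

# Rank points by background-likeness (small |x| under N(0,1)), then
# find the largest fraction that still passes a KS test against the
# known background distribution.
order = np.argsort(np.abs(x))
p_hat = 0.0
for p in np.arange(1.0, 0.4, -0.01):
    subset = x[order[: int(p * n)]]
    if stats.kstest(subset, "norm").pvalue > 0.05:
        p_hat = p
        break

print(round(p_hat, 2))
```

On this well-separated synthetic mixture the estimate lands close to the true 0.8; the paper's scheme is more careful about the estimation, and then separates the data Bayes-optimally.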

Relevance:

10.00%

Publisher:

Abstract:

This study examines the forecasting accuracy of alternative vector autoregressive models, each in a seven-variable system comprising, in turn, daily, weekly and monthly foreign exchange (FX) spot rates. The vector autoregressions (VARs) are in non-stationary, stationary and error-correction forms and are estimated using OLS. Imposing Bayesian priors in the OLS estimations also allowed us to obtain another set of results. We find some tendency for the Bayesian estimation method to generate superior forecast measures relative to the OLS method. This result holds whether or not the data sets contain outliers. Also, the best forecasts under the non-stationary specification outperformed those of the stationary and error-correction specifications, particularly at long forecast horizons, while the best forecasts under the stationary and error-correction specifications are generally similar. The findings for the OLS forecasts are consistent with recent simulation results. The predictive ability of the VARs is very weak.
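The core estimation step, fitting a VAR by OLS, can be sketched for a first-order system on synthetic data (the study used seven-variable systems and several specifications; the three-variable setup and coefficients below are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 300, 3  # length and number of synthetic return series
A_true = np.array([[0.5, 0.1, 0.0],
                   [0.0, 0.4, 0.1],
                   [0.1, 0.0, 0.3]])
y = np.zeros((T, k))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.normal(0, 0.1, k)

# OLS estimate of a VAR(1): regress y_t on y_{t-1}, equation by equation.
X, Y = y[:-1], y[1:]
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = B.T  # so that y_t ~ A_hat @ y_{t-1}

forecast = A_hat @ y[-1]  # one-step-ahead forecast
print(np.round(A_hat, 2))
```

A Bayesian variant would shrink `A_hat` toward a prior (e.g. a Minnesota-style random-walk prior), which is the comparison the study reports.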

Relevance:

10.00%

Publisher:

Abstract:

PURPOSE: To assess the accuracy of three wavefront analyzers versus a validated binocular open-view autorefractor in determining refractive error in non-cycloplegic eyes. METHODS: Eighty eyes were examined using the SRW-5000 open-view infrared autorefractor and, in randomized sequence, three wavefront analyzers: 1) OPD-Scan (NIDEK, Gamagori, Japan), 2) WASCA (Zeiss/Meditec, Jena, Germany), and 3) Allegretto (WaveLight Laser Technologies AG, Erlangen, Germany). Subjects were healthy adults (19 men and 21 women; mean age: 20.8 +/- 2.5 years). Refractive errors ranged from +1.5 to -9.75 diopters (D) (mean: -1.83 +/- 2.74 D) with up to 1.75 D cylinder (mean: 0.58 +/- 0.53 D). Three readings were collected per instrument by one examiner without anticholinergic agents. Refraction values were decomposed into vector components for analysis, yielding the mean spherical equivalent refraction (M) and the vectors of cylindrical power at 0 degrees and 45 degrees, J0 and J45, respectively. RESULTS: Positive correlation was observed between wavefront analyzers and the SRW-5000 for spherical equivalent refraction (OPD-Scan, r=0.959, P<.001; WASCA, r=0.981, P<.001; Allegretto, r=0.942, P<.001). Mean differences and limits of agreement showed more negative spherical equivalent refraction with wavefront analyzers (OPD-Scan, 0.406 +/- 0.768 D [range: 0.235 to 0.580 D] [P<.001]; WASCA, 0.511 +/- 0.550 D [range: 0.390 to 0.634 D] [P<.001]; and Allegretto, 0.434 +/- 0.904 D [range: 0.233 to 0.635 D] [P<.001]). A second analysis eliminating outliers showed the same trend but lower differences: OPD-Scan (n=75), 0.24 +/- 0.41 D (range: 0.15 to 0.34 D) (P<.001); WASCA (n=78), 0.46 +/- 0.47 D (range: 0.36 to 0.57 D) (P<.001); and Allegretto (n=77), 0.30 +/- 0.62 D (range: 0.16 to 0.44 D) (P<.001). No statistically significant differences were noted for J0 and J45.
CONCLUSIONS: Wavefront analyzer refraction resulted in 0.30 D more myopia compared to SRW-5000 refraction in eyes without cycloplegia. This is the result of the accommodation excess attributable to instrument myopia. For the relatively low degrees of astigmatism in this study (<2.0 D), good agreement was noted between wavefront analyzers and the SRW-5000. Copyright (C) 2006 SLACK Incorporated
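The vector decomposition used in the analysis follows the standard power-vector formulas (M = S + C/2, J0 = -(C/2)·cos 2θ, J45 = -(C/2)·sin 2θ), sketched below for a hypothetical prescription:

```python
import math

def power_vectors(sphere, cylinder, axis_deg):
    """Convert a sphere/cylinder/axis refraction into the power-vector
    components (M, J0, J45) used for statistical comparison."""
    theta = math.radians(axis_deg)
    M = sphere + cylinder / 2.0
    J0 = -(cylinder / 2.0) * math.cos(2 * theta)
    J45 = -(cylinder / 2.0) * math.sin(2 * theta)
    return M, J0, J45

# Hypothetical prescription -2.00 DS / -1.00 DC x 180:
# spherical equivalent M = -2.50 D.
M, J0, J45 = power_vectors(-2.00, -1.00, 180)
print(M, J0, J45)
```

Working in (M, J0, J45) makes refractions averageable and comparable component-wise, which is why the study reports M, J0 and J45 separately.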

Relevance:

10.00%

Publisher:

Abstract:

Automatically generating maps of a measured variable of interest can be problematic. In this work we focus on the monitoring network context where observations are collected and reported by a network of sensors, and are then transformed into interpolated maps for use in decision making. Using traditional geostatistical methods, estimating the covariance structure of data collected in an emergency situation can be difficult. Variogram determination, whether by method-of-moment estimators or by maximum likelihood, is very sensitive to extreme values. Even when a monitoring network is in a routine mode of operation, sensors can sporadically malfunction and report extreme values. If this extreme data destabilises the model, causing the covariance structure of the observed data to be incorrectly estimated, the generated maps will be of little value, and the uncertainty estimates in particular will be misleading. Marchant and Lark [2007] propose a REML estimator for the covariance, which is shown to work on small data sets with a manual selection of the damping parameter in the robust likelihood. We show how this can be extended to allow treatment of large data sets together with an automated approach to all parameter estimation. The projected process kriging framework of Ingram et al. [2007] is extended to allow the use of robust likelihood functions, including the two component Gaussian and the Huber function. We show how our algorithm is further refined to reduce the computational complexity while at the same time minimising any loss of information. To show the benefits of this method, we use data collected from radiation monitoring networks across Europe. We compare our results to those obtained from traditional kriging methodologies and include comparisons with Box-Cox transformations of the data. 
We discuss the issue of whether to treat or ignore extreme values, making the distinction between the robust methods, which ignore outliers, and transformation methods, which treat them as part of the (transformed) process. Using a case study based on an extreme radiological event over a large area, we show how radiation data collected from monitoring networks can be analysed automatically and then used to generate reliable maps to inform decision making. We show the limitations of the methods and discuss potential extensions to remedy these.
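The Huber function mentioned above downweights, rather than discards, extreme residuals. A minimal sketch of the corresponding weights (the tuning constant 1.345 is the conventional choice for 95% efficiency under normal errors):

```python
import numpy as np

def huber_weights(residuals, c=1.345):
    """Huber weights: 1 for |r| <= c, c/|r| beyond, so extreme
    residuals are progressively downweighted rather than dropped."""
    r = np.abs(residuals)
    return np.where(r <= c, 1.0, c / np.maximum(r, 1e-12))

r = np.array([0.2, -0.8, 1.0, 5.0, -20.0])
w = huber_weights(r)
print(np.round(w, 3))  # small residuals keep weight 1, gross ones shrink
```

Inside a robust likelihood, weights of this kind keep a sporadically malfunctioning sensor from destabilising the estimated covariance structure.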

Relevance:

10.00%

Publisher:

Abstract:

Satellite-borne scatterometers are used to measure backscattered microwave radiation from the ocean surface. These data may be used to infer surface wind vectors where no direct measurements exist. Inherent in these data are outliers, owing to aberrations on the water surface and measurement errors within the equipment. We present two techniques for identifying outliers using neural networks; the outliers may then be removed to improve models derived from the data. First, the generative topographic mapping (GTM) is used to create a probability density model; data with low probability under the model may be classed as outliers. In the second part of the paper, a sensor model with input-dependent noise is used and outliers are identified based on their probability under this model. GTM was successfully modified to incorporate prior knowledge of the shape of the observation manifold; however, GTM could not learn the double-skinned nature of the observation manifold. Learning this double-skinned manifold necessitated the use of a sensor model which imposes strong constraints on the mapping. The results using GTM with a fixed noise level suggested the noise level may vary as a function of wind speed. This was confirmed by experiments using a sensor model with input-dependent noise, where the variation in noise is most sensitive to the wind speed input. Both models successfully identified gross outliers, with the largest differences between the models occurring at low wind speeds. © 2003 Elsevier Science Ltd. All rights reserved.
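GTM itself is beyond a short example, but the flagging principle, classing low-probability points under a fitted density model as outliers, can be sketched with a simple kernel density estimate on synthetic 2-D data (the data, KDE choice and 1% threshold are all assumptions of this sketch):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
inliers = rng.normal(0, 1, (2, 500))                 # 2-D "clean" data
outliers = np.array([[8.0, -8.0], [8.0, -8.0]])      # two gross outliers
data = np.hstack([inliers, outliers])

kde = gaussian_kde(inliers)        # density model of normal behaviour
logp = np.log(kde(data) + 1e-300)  # log-probability of every point
flagged = logp < np.quantile(logp, 0.01)  # lowest-probability points

print(flagged[-2:])  # the two gross outliers are flagged
```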

Relevance:

10.00%

Publisher:

Abstract:

This paper addresses the problem of obtaining detailed 3D reconstructions of human faces in real time and with inexpensive hardware. We present an algorithm based on a monocular multi-spectral photometric-stereo setup. This system is known to capture highly detailed deforming 3D surfaces at high frame rates without any expensive hardware or synchronized light stage. However, the main challenge of such a setup is the calibration stage, which depends on the light setup and how the lights interact with the specific material being captured, in this case human faces. For this purpose we develop a self-calibration technique in which the person being captured is asked to perform a rigid motion in front of the camera while maintaining a neutral expression. Rigidity constraints are then used to compute the head's motion with a structure-from-motion algorithm. Once the motion is obtained, a multi-view stereo algorithm reconstructs a coarse 3D model of the face. This coarse model is then used to estimate the lighting parameters with a stratified approach: in the first step we use a RANSAC search to identify purely diffuse points on the face and to simultaneously estimate the diffuse reflectance model. In the second step we apply non-linear optimization to fit a non-Lambertian reflectance model to the outliers of the previous step. The calibration procedure is validated with synthetic and real data.
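The RANSAC step can be sketched for a deliberately simplified one-parameter Lambertian model I = ρ·(n·l), with specular pixels acting as outliers; the data, noise levels and inlier threshold below are synthetic assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(4)
n_pts = 300
shading = rng.uniform(0.2, 1.0, n_pts)      # n·l per point (assumed known)
rho_true = 0.7                              # diffuse albedo to recover
intensity = rho_true * shading + rng.normal(0, 0.01, n_pts)
spec = rng.choice(n_pts, 60, replace=False)  # specular points = outliers
intensity[spec] += rng.uniform(0.2, 0.8, 60)

# RANSAC for the 1-parameter model I = rho * (n·l): each hypothesis
# comes from a single sampled point; inliers vote for it.
best_rho, best_inliers = None, 0
for _ in range(200):
    i = rng.integers(n_pts)
    rho = intensity[i] / shading[i]
    inliers = np.abs(intensity - rho * shading) < 0.03
    if inliers.sum() > best_inliers:
        best_rho, best_inliers = rho, int(inliers.sum())

print(round(best_rho, 2))  # close to 0.7; specular points are rejected
```

The rejected points (the specular outliers) are exactly what the second, non-Lambertian fitting stage would then model.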

Relevance:

10.00%

Publisher:

Abstract:

DEA literature continues apace, but software has lagged behind. This session uses suitably selected data to present newly developed software which includes many of the most recent DEA models. The software enables the user to address a variety of issues not frequently found in existing DEA software, such as:
- assessments under a variety of possible returns-to-scale assumptions, including NIRS and NDRS;
- scale elasticity computations;
- numerous input/output variables and a truly unlimited number of assessment units (DMUs);
- panel data analysis;
- analysis of categorical data (multiple categories);
- the Malmquist Index and its decompositions;
- computation of super-efficiency;
- automated removal of super-efficient outliers under user-specified criteria;
- graphical presentation of results;
- integrated statistical tests.
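The efficiency scores underlying such assessments can be sketched: below is a minimal input-oriented CCR (constant returns to scale) envelopment model solved as a linear program on toy data, the building block on which super-efficiency screening is based. The data and the single-input/single-output setup are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

# Inputs X (m x n) and outputs Y (s x n) for n DMUs (toy data).
X = np.array([[2.0, 4.0, 3.0, 5.0]])   # one input
Y = np.array([[2.0, 3.0, 3.0, 2.0]])   # one output

def ccr_efficiency(o):
    """Input-oriented CCR efficiency of DMU o (envelopment form):
    minimise theta s.t. X @ lam <= theta * x_o, Y @ lam >= y_o, lam >= 0."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]                # variables: [theta, lam]
    A_ub = np.block([[-X[:, [o]], X],          # X @ lam - theta*x_o <= 0
                     [np.zeros((s, 1)), -Y]])  # -Y @ lam <= -y_o
    b_ub = np.r_[np.zeros(m), -Y[:, o]]
    bounds = [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[0]

scores = [round(ccr_efficiency(o), 3) for o in range(X.shape[1])]
print(scores)  # DMUs on the frontier score 1.0
```

Super-efficiency scoring excludes DMU o from its own reference set, letting efficient units score above 1; automated outlier removal then drops units whose super-efficiency exceeds a user-specified cut-off.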

Relevance:

10.00%

Publisher:

Abstract:

Graph embedding is a general framework for subspace learning. However, because of the well-known outlier sensitivity of the L2-norm, conventional graph embedding is not robust to the outliers which occur in many practical applications. In this paper, an improved graph embedding algorithm (termed LPP-L1) is proposed by replacing the L2-norm with the L1-norm. In addition to its robustness, LPP-L1 avoids the small-sample-size problem. Experimental results on both synthetic and real-world data demonstrate these advantages. © 2009 Elsevier B.V. All rights reserved.
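The outlier sensitivity of the L2-norm versus the robustness of the L1-norm can be seen in the simplest possible setting, locating the centre of a sample: the L2-optimal centre is the mean, the L1-optimal centre is the median.

```python
import numpy as np

x = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 50.0])  # one gross outlier

l2_center = x.mean()      # minimises sum of squared deviations; dragged to ~9.2
l1_center = np.median(x)  # minimises sum of absolute deviations; stays near 1
print(l2_center, l1_center)
```

The same effect carries over to projection directions: an L2-norm objective lets a single outlier dominate the learned subspace, which is the failure mode LPP-L1 is designed to avoid.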

Relevance:

10.00%

Publisher:

Abstract:

Data fluctuation in multiple measurements of Laser Induced Breakdown Spectroscopy (LIBS) greatly affects the accuracy of quantitative analysis. A new LIBS quantitative analysis method based on the Robust Least Squares Support Vector Machine (RLS-SVM) regression model is proposed. The usual way to enhance the analysis accuracy is to improve the quality and consistency of the emission signal, such as by averaging the spectral signals or spectrum standardization over a number of laser shots. The proposed method focuses more on how to enhance the robustness of the quantitative analysis regression model. The proposed RLS-SVM regression model originates from the Weighted Least Squares Support Vector Machine (WLS-SVM) but has an improved segmented weighting function and residual error calculation according to the statistical distribution of measured spectral data. Through the improved segmented weighting function, the information on the spectral data in the normal distribution will be retained in the regression model while the information on the outliers will be restrained or removed. Copper elemental concentration analysis experiments of 16 certified standard brass samples were carried out. The average value of relative standard deviation obtained from the RLS-SVM model was 3.06% and the root mean square error was 1.537%. The experimental results showed that the proposed method achieved better prediction accuracy and better modeling robustness compared with the quantitative analysis methods based on Partial Least Squares (PLS) regression, standard Support Vector Machine (SVM) and WLS-SVM. It was also demonstrated that the improved weighting function had better comprehensive performance in model robustness and convergence speed, compared with the four known weighting functions.
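A segmented weighting function of the general kind described above can be sketched as follows; this Hampel-style three-part form is an illustration of the principle (retain normal residuals, restrain moderate ones, remove extreme ones), not the paper's exact function:

```python
import numpy as np

def segmented_weights(residuals, a=2.0, b=3.0):
    """Hampel-style segmented weighting: full weight for small
    standardised residuals, linear taper between a and b, zero beyond b
    (outliers are removed from the regression)."""
    r = np.abs(residuals)
    w = np.ones_like(r)
    taper = (r > a) & (r <= b)
    w[taper] = (b - r[taper]) / (b - a)
    w[r > b] = 0.0
    return w

r = np.array([0.5, 1.9, 2.5, 3.5])
w = segmented_weights(r)
print(w)  # weights 1, 1, 0.5, 0
```

In a weighted LS-SVM these weights rescale each sample's error term, so spectra in the normal part of the distribution keep their influence while outlying shots are suppressed.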

Relevance:

10.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: Primary 62F35; Secondary 62P99

Relevance:

10.00%

Publisher:

Abstract:

This paper presents the results of our data mining study of Pb-Zn (lead-zinc) ore assay records from a mining enterprise in Bulgaria. We examined the dataset, cleaned outliers, visualized the data, and created dataset statistics. A Pb-Zn cluster data mining model was created for segmentation and prediction of Pb-Zn ore assay data. The Pb-Zn cluster data model consists of five clusters and DMX queries. We analyzed the Pb-Zn cluster content, size, structure, and characteristics. The set of DMX queries allows for browsing and managing the clusters, as well as predicting ore assay records. Testing and validation of the Pb-Zn cluster data mining model were carried out in order to show its reasonable accuracy before being used in a production environment. The Pb-Zn cluster data mining model can be used to change the mine grinding and flotation processing parameters in near real time, which is important for the efficiency of the Pb-Zn ore beneficiation process. ACM Computing Classification System (1998): H.2.8, H.3.3.
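The segmentation step can be sketched with a plain Lloyd's-algorithm k-means on synthetic Pb/Zn grade pairs; the study's model used five clusters and Microsoft DMX queries, so everything below (three clusters, the grade values) is a stand-in:

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic (Pb%, Zn%) grade pairs around three centres.
centres_true = np.array([[1.0, 2.0], [3.0, 1.0], [5.0, 4.0]])
data = np.vstack([c + rng.normal(0, 0.2, (100, 2)) for c in centres_true])

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assign each point to the nearest centre,
    then move each centre to the mean of its points."""
    r = np.random.default_rng(seed)
    centres = x[r.choice(len(x), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centres = np.array([x[labels == j].mean(axis=0)
                            if np.any(labels == j) else centres[j]
                            for j in range(k)])
    return centres, labels

centres, labels = kmeans(data, 3)
print(np.round(centres[np.argsort(centres[:, 0])], 1))
```

Prediction then amounts to assigning a new assay record to its nearest cluster centre, the role played by the DMX prediction queries in the paper.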

Relevance:

10.00%

Publisher:

Abstract:

Analysis of risk measures associated with price series data movements, and their prediction, is of strategic importance in the financial markets as well as to policy makers, in particular for short- and long-term planning for setting up economic growth targets. For example, oil-price risk management focuses primarily on when and how an organization can best prevent costly exposure to price risk. Value-at-Risk (VaR) is the commonly practised instrument to measure risk and is evaluated by analysing the negative/positive tail of the probability distributions of the returns (profit or loss). In modelling applications, least-squares estimation (LSE)-based linear regression models are often employed for modelling and analysing correlated data. These linear models are optimal and perform relatively well under conditions such as errors following normal or approximately normal distributions, being free of large outliers and satisfying the Gauss-Markov assumptions. However, in practical situations the LSE-based linear regression models often fail to provide optimal results, for instance in non-Gaussian situations, especially when the errors follow distributions with fat tails and the error terms may not possess a finite variance. This is the situation in risk analysis, which involves analysing tail distributions. Thus, applications of the LSE-based regression models may be questioned for appropriateness and may have limited applicability. We have carried out a risk analysis of Iranian crude oil price data based on Lp-norm regression models and have noted that the LSE-based models do not always perform best. We discuss results from the L1-, L2- and L∞-norm based linear regression models. ACM Computing Classification System (1998): B.1.2, F.1.3, F.2.3, G.3, J.2.
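The VaR evaluation described above, reading a risk threshold off the tail of the return distribution, can be sketched with a historical quantile estimate; the fat-tailed synthetic returns below are a stand-in for actual oil-price data:

```python
import numpy as np

rng = np.random.default_rng(6)
# Synthetic daily returns with fat tails (Student-t, df=3), mimicking
# the heavy-tailed behaviour of commodity price returns.
returns = 0.02 * rng.standard_t(df=3, size=2000)

# Historical 1-day Value-at-Risk at the 95% level: the loss threshold
# exceeded on only 5% of days (a positive number by convention).
var_95 = -np.quantile(returns, 0.05)
print(round(var_95, 4))
```

Because the estimate depends entirely on the tail, fat tails inflate VaR well beyond the Gaussian value, which is exactly why tail-sensitive, non-LSE methods matter here.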