941 results for Mean squared error method


Relevance: 100.00%

Abstract:

This master's thesis presents Bernstein estimators, which are recent alternatives to the classical estimators of distribution and density functions. More precisely, we study their various properties and compare them with those of the empirical distribution function and of the kernel estimator. We derive an asymptotic expression for the first two moments of the Bernstein estimator of the distribution function. As for the classical estimators, we show that this estimator satisfies the Chung-Smirnov property under certain conditions. We then show that the Bernstein estimator is better than the empirical distribution function in terms of mean squared error. Turning to the asymptotic behaviour of the Bernstein estimators, we show that, for a suitable choice of the degree of the polynomial, these estimators are asymptotically normal. Numerical studies on a few classical distributions confirm that the Bernstein estimators can be preferable to the classical estimators.
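A minimal sketch of the Bernstein estimator of a distribution function on [0, 1] (written in Python purely for illustration; the degree m, the toy data and the function name are my own choices, not code from the thesis): it smooths the empirical CDF, evaluated on the grid k/m, with Bernstein polynomial (binomial) weights.

```python
import numpy as np
from scipy.stats import binom

def bernstein_cdf(x, sample, m):
    """Bernstein estimator of the CDF on [0, 1]:
    F_hat(x) = sum_k F_n(k/m) * C(m, k) * x**k * (1 - x)**(m - k),
    where F_n is the empirical CDF of the sample."""
    sample = np.sort(np.asarray(sample, dtype=float))
    k = np.arange(m + 1)
    # empirical CDF evaluated on the grid k/m
    ecdf = np.searchsorted(sample, k / m, side="right") / len(sample)
    # Bernstein weights are Binomial(m, x) probabilities at k
    weights = binom.pmf(k, m, x)
    return float(np.sum(ecdf * weights))

# toy check on data supported on [0, 1]
rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, size=200)
print(bernstein_cdf(0.3, data, m=30))
```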

Relevance: 100.00%

Abstract:

This thesis comprises three articles, one published and two in preparation. Its central subject is the treatment of representative outliers in two important aspects of surveys: small area estimation and imputation in the presence of item non-response. Regarding small areas, robust estimators under unit-level models have been studied. Sinha & Rao (2009) propose a robust version of the empirical best linear unbiased predictor for the small area means. Their robust estimator is of the plug-in type and, in light of the work of Chambers (1986), it can be biased in certain situations. Chambers et al. (2014) propose a bias-corrected estimator. In addition, an estimator of the mean squared error has been associated with these point estimators. Sinha & Rao (2009) propose a parametric bootstrap procedure to estimate the mean squared error. Analytical methods are proposed in Chambers et al. (2014); however, their theoretical validity has not been established and their empirical performance is not fully satisfactory. Here, we examine two new approaches for obtaining a robust version of the empirical best linear unbiased predictor: the first builds on the work of Chambers (1986), and the second is based on the concept of conditional bias as a measure of the influence of a population unit. Both classes of robust small area estimators also include a bias-correction term, and both use the information available in all areas, unlike the estimator of Chambers et al. (2014), which uses only the information available in the area of interest. In some situations, a non-negligible bias is possible for the Sinha & Rao (2009) estimator, whereas the proposed estimators exhibit a small bias for an appropriate choice of the influence function and of the robustness constant. Monte Carlo simulations are carried out, and the proposed estimators are compared with those of Sinha & Rao (2009) and Chambers et al. (2014). The results show that the estimators of Sinha & Rao (2009) and Chambers et al. (2014) can have a substantial bias, whereas the proposed estimators perform better in terms of bias and mean squared error. In addition, we propose a new bootstrap procedure for estimating the mean squared error of robust small area estimators. Unlike existing procedures, we formally establish the asymptotic validity of the proposed bootstrap method. Moreover, the proposed method is semi-parametric: it does not rely on any assumption about the distributions of the errors or of the random effects, which makes it particularly attractive and more widely applicable. We examine the performance of our bootstrap procedure through Monte Carlo simulations. The results show that it performs well and, in particular, better than all the competitors considered. An application of the proposed method is illustrated by analysing the real data set of Battese, Harter & Fuller (1988), which contains outliers. Regarding imputation in the presence of item non-response, some forms of single imputation have been studied.
Deterministic regression imputation within classes, which includes ratio imputation and mean imputation, is often used in surveys. These imputation methods can lead to biased imputed estimators if the imputation model or the non-response model is misspecified. Doubly robust estimators have been developed in recent years; they are unbiased if at least one of the imputation or non-response models is correctly specified. However, in the presence of outliers, doubly robust imputed estimators can be very unstable. Using the concept of conditional bias, we propose an outlier-robust version of the doubly robust estimator. Simulation results show that the proposed estimator performs well for an appropriate choice of the robustness constant.
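As a rough sketch of the starting point only, here is a textbook doubly robust estimator of a mean under item non-response, not the outlier-robust version proposed in the thesis; the model choices, data layout and function name are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def doubly_robust_mean(x, y, responded):
    """Textbook doubly robust estimator of a mean under item non-response:
    consistent if either the response-propensity model or the imputation
    model is correctly specified. `y` is used only where `responded` is True."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float)
    r = np.asarray(responded, dtype=bool)

    # response-propensity model: P(respond | x) by logistic regression
    p_hat = LogisticRegression().fit(x, r).predict_proba(x)[:, 1]

    # imputation (outcome) model fitted on respondents only
    m_hat = LinearRegression().fit(x[r], y[r]).predict(x)

    # augmented estimator: model prediction plus propensity-weighted residual
    residual = np.where(r, y - m_hat, 0.0)
    return float(np.mean(m_hat + residual / p_hat))
```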

Relevance: 100.00%

Abstract:

We consider the problem of conducting inference on nonparametric high-frequency estimators without knowing their asymptotic variances. We prove that a multivariate subsampling method achieves this goal under general conditions that were not previously available in the literature. We suggest a procedure for a data-driven choice of the bandwidth parameters. Our simulation study indicates that the subsampling method is much more robust than the plug-in method based on the asymptotic expression for the variance. Importantly, the subsampling method reliably estimates the variability of the Two Scale estimator even when its parameters are chosen to minimize the finite-sample mean squared error; in contrast, the plug-in estimator substantially underestimates the sampling uncertainty. By construction, the subsampling method delivers estimates of the variance-covariance matrices that are always positive semi-definite. We use the subsampling method to study the dynamics of financial betas of six stocks on the NYSE. We document significant variation in betas within the year 2006, and find that tick data capture more variation in betas than data sampled at moderate frequencies such as every five or twenty minutes. To capture this variation we estimate a simple dynamic model for betas. The variance estimation is also important for the correction of the errors-in-variables bias in such models. We find that the bias corrections are substantial, and that betas are more persistent than the naive estimators would lead one to believe.
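To give a flavour of the general idea, a Politis-Romano style subsampling variance estimate for a root-n consistent statistic might look as follows; this is a simplified univariate sketch under assumptions of my own, not the paper's multivariate high-frequency procedure.

```python
import numpy as np

def subsample_variance(series, statistic, block_len, step=1):
    """Subsampling estimate of the sampling variance of `statistic` for a
    root-n consistent estimator: compute the statistic on overlapping blocks
    of length `block_len` and rescale its variance across blocks by
    block_len / n."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    block_stats = np.array([
        statistic(series[i:i + block_len])
        for i in range(0, n - block_len + 1, step)
    ])
    return (block_len / n) * np.var(block_stats, ddof=1)

# toy illustration: variance of the sample mean of a weakly dependent series
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
x = 0.5 * np.concatenate(([0.0], x[:-1])) + x   # simple MA(1)-type dependence
print(subsample_variance(x, np.mean, block_len=100))
```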

Relevance: 100.00%

Abstract:

Computational biology is the research area that contributes to the analysis of biological data through the development of algorithms which address significant research problems. The data from molecular biology include DNA, RNA, protein and gene expression data. Gene expression data provide the expression level of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which in turn are later translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data are organized in the form of a matrix: rows represent genes and columns represent experimental conditions. Experimental conditions can be different tissue types or time points, and the entries of the gene expression matrix are real values. Through the analysis of gene expression data it is possible to determine the behavioural patterns of genes, such as the similarity of their behaviour, the nature of their interaction, their respective contributions to the same pathways, and so on. Similar expression patterns are exhibited by genes participating in the same biological process. These patterns have immense relevance and application in bioinformatics and clinical research; in the medical domain they are used to aid more accurate diagnosis, prognosis, treatment planning, drug discovery and protein network analysis. To identify such patterns from gene expression data, data mining techniques are essential. Clustering is an important data mining technique for the analysis of gene expression data; to overcome the problems associated with clustering, biclustering was introduced. Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global model, whereas biclustering is a local model. Discovering local expression patterns is essential for identifying many genetic pathways that are not apparent otherwise. It is therefore necessary to move beyond the clustering paradigm towards developing approaches capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix; its rows and columns need not be contiguous as in the gene expression data matrix, and biclusters are not disjoint. Computation of biclusters is costly because one has to consider all combinations of columns and rows in order to find all the biclusters. The search space for the biclustering problem is 2^(m+n), where m and n are the numbers of genes and conditions respectively, and usually m+n is more than 3000; the biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data. All these algorithms make use of a measure called the mean squared residue to search for biclusters. The objective is to identify biclusters of maximum size whose mean squared residue is below a given threshold.
All these algorithms begin the search from tightly co-regulated submatrices called seeds, generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint based, greedy and metaheuristic. Constraint based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In the metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used for the identification of biclusters. These algorithms are implemented on the Yeast and Lymphoma datasets. Biologically relevant and statistically significant biclusters are identified by all these algorithms and are validated against the Gene Ontology database. All the algorithms are compared with other biclustering algorithms, and the algorithms developed in this work overcome some of the problems associated with existing algorithms. With some of the algorithms developed in this work, biclusters with very high row variance, higher than that obtained by any other algorithm using the mean squared residue, are identified from both the Yeast and Lymphoma data sets. Such biclusters, which capture significant changes in expression level, are highly relevant biologically.
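For reference, the mean squared residue used by these algorithms is the Cheng-and-Church coherence score of a submatrix; a small self-contained sketch (variable names are mine):

```python
import numpy as np

def mean_squared_residue(expr, rows, cols):
    """Mean squared residue (Cheng & Church) of the bicluster defined by
    index sets `rows` and `cols` of the expression matrix `expr`. A low MSR
    means the selected genes behave coherently across the selected conditions."""
    sub = expr[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)   # a_iJ
    col_means = sub.mean(axis=0, keepdims=True)   # a_Ij
    overall = sub.mean()                          # a_IJ
    residues = sub - row_means - col_means + overall
    return float(np.mean(residues ** 2))

# toy example: a perfectly additive (coherent) bicluster has MSR 0
expr = np.add.outer(np.array([0.0, 1.0, 2.0]), np.array([5.0, 6.0, 7.0, 8.0]))
print(mean_squared_residue(expr, rows=[0, 1, 2], cols=[0, 1, 2, 3]))  # -> 0.0
```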

Relevance: 100.00%

Abstract:

In this paper, a new directionally adaptive, learning-based, single-image super-resolution method using the multiple-direction wavelet transform called Directionlets is presented. The method uses directionlets to effectively capture directional features and to extract edge information along different directions from a set of available high-resolution images. This information is used as the training set for super-resolving a low-resolution input image: the directionlet coefficients at finer scales of its high-resolution image are learned locally from this training set, and the inverse directionlet transform recovers the super-resolved high-resolution image. The simulation results show that the proposed approach outperforms standard interpolation techniques such as cubic spline interpolation, as well as standard wavelet-based learning, both visually and in terms of mean squared error (MSE). The method also gives good results with aliased images.
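As a hedged illustration of the baseline comparison only (not the paper's directionlet method; the decimation scheme and function name are my own choices), one can measure how far cubic-spline interpolation falls from a reference image in terms of MSE:

```python
import numpy as np
from scipy import ndimage

def cubic_spline_baseline_mse(high_res, factor):
    """Baseline against which learning-based super-resolution is judged:
    decimate a reference image, upscale it back with cubic-spline
    interpolation (order-3 zoom) and report the MSE against the original."""
    high_res = np.asarray(high_res, dtype=float)
    low_res = high_res[::factor, ::factor]             # naive decimation
    upscaled = ndimage.zoom(low_res, factor, order=3)   # cubic spline
    h = min(high_res.shape[0], upscaled.shape[0])
    w = min(high_res.shape[1], upscaled.shape[1])
    return float(np.mean((high_res[:h, :w] - upscaled[:h, :w]) ** 2))
```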

Relevance: 100.00%

Abstract:

The Gram-Schmidt (GS) orthogonalisation procedure has been used to improve the convergence speed of least mean square (LMS) adaptive code-division multiple-access (CDMA) detectors. However, this algorithm updates two sets of parameters, namely the GS transform coefficients and the tap weights, simultaneously. Because of the additional adaptation noise introduced by the former, it is impossible to achieve the same performance as the ideal orthogonalised LMS filter, unlike the result implied in an earlier paper. The authors provide a lower bound on the minimum achievable mean squared error (MSE) as a function of the forgetting factor λ used in finding the GS transform coefficients, and propose a variable-λ algorithm to balance the conflicting requirements of good tracking and low misadjustment.
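For context, here is a plain (non-orthogonalised) LMS tap-weight update, the baseline whose convergence the GS transform is intended to accelerate; this is a generic sketch, not the authors' CDMA detector, and the signal layout is an assumption.

```python
import numpy as np

def lms_filter(x, d, n_taps, mu):
    """Plain LMS adaptive filter: adapt tap weights w so that w . x_n tracks
    the desired signal d, driving the mean squared error down with stochastic
    gradient steps of size mu."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(n_taps)
    e = np.zeros(len(d))
    for n in range(n_taps, len(d)):
        x_n = x[n - n_taps:n][::-1]   # most recent samples first
        y_n = w @ x_n                 # filter output
        e[n] = d[n] - y_n             # instantaneous error
        w += mu * e[n] * x_n          # LMS weight update
    return w, e
```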

Relevance: 100.00%

Abstract:

The ground-based Atmospheric Radiation Measurement Program (ARM) and NASA Aerosol Robotic Network (AERONET) routinely monitor clouds using zenith radiances at visible and near-infrared wavelengths. Using the transmittance calculated from such measurements, we have developed a new retrieval method for cloud effective droplet size and conducted extensive tests for non-precipitating liquid water clouds. The underlying principle is to combine a liquid-water-absorbing wavelength (i.e., 1640 nm) with a non-water-absorbing wavelength for acquiring information on cloud droplet size and optical depth. For simulated stratocumulus clouds with liquid water path less than 300 g m−2 and horizontal resolution of 201 m, the retrieval method underestimates the mean effective radius by 0.8 μm, with a root-mean-squared error of 1.7 μm and a relative deviation of 13%. For actual observations with a liquid water path less than 450 g m−2 at the ARM Oklahoma site during 2007–2008, our 1.5-min-averaged retrievals are generally larger by around 1 μm than those from combined ground-based cloud radar and microwave radiometer at a 5-min temporal resolution. We also compared our retrievals to those from combined shortwave flux and microwave observations for relatively homogeneous clouds, showing that the bias between these two retrieval sets is negligible, but the error of 2.6 μm and the relative deviation of 22% are larger than those found in our simulation case. Finally, the transmittance-based cloud effective droplet radii agree to better than 11% with satellite observations and have a negative bias of 1 μm. Overall, the retrieval method provides reasonable cloud effective radius estimates, which can enhance the cloud products of both ARM and AERONET.

Relevance: 100.00%

Abstract:

We study the reconstruction of visual stimuli from spike trains, representing the reconstructed stimulus by a Volterra series up to second order. We illustrate this procedure in a prominent example of spiking neurons, recording simultaneously from the two H1 neurons located in the lobula plate of the fly Chrysomya megacephala. The fly views two types of stimuli, corresponding to rotational and translational displacements. Second-order reconstructions require the manipulation of potentially very large matrices, which obstructs the use of this approach when there are many neurons. We avoid computing and inverting these matrices by expanding our variables in a convenient set of basis functions. This requires approximating the spike-train four-point functions by combinations of two-point functions, similar to the relations that would hold for Gaussian stochastic processes. In our test case, this approximation does not reduce the quality of the reconstruction. The overall contribution of the second-order kernels to stimulus reconstruction, measured by the mean squared error, is only about 5% of the first-order contribution. Yet at specific stimulus-dependent instants, the addition of second-order kernels represents up to 100% improvement, but only for rotational stimuli. We present a perturbative scheme to facilitate the application of our method to weakly correlated neurons.
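The Gaussian-process relation alluded to is the standard Isserlis (Wick) factorisation of a four-point function into two-point functions; in notation of my own choosing, for zero-mean jointly Gaussian variables:

```latex
% Isserlis/Wick factorisation for zero-mean jointly Gaussian variables,
% the kind of relation used to approximate the spike-train four-point
% functions by products of two-point functions:
\langle x_1 x_2 x_3 x_4 \rangle
  = \langle x_1 x_2 \rangle \langle x_3 x_4 \rangle
  + \langle x_1 x_3 \rangle \langle x_2 x_4 \rangle
  + \langle x_1 x_4 \rangle \langle x_2 x_3 \rangle .
```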

Relevance: 100.00%

Abstract:

Predictors of random effects are usually based on the popular mixed effects (ME) model, developed under the assumption that the sample is obtained from a conceptual infinite population; such predictors are employed even when the actual population is finite. Two alternatives that incorporate the finite nature of the population are obtained from the superpopulation model proposed by Scott and Smith (1969. Estimation in multi-stage surveys. J. Amer. Statist. Assoc. 64, 830-840) or from the finite population mixed model recently proposed by Stanek and Singer (2004. Predicting random effects from finite population clustered samples with response error. J. Amer. Statist. Assoc. 99, 1119-1130). Predictors derived under the latter model, with the additional assumptions that all variance components are known and that within-cluster variances are equal, have smaller mean squared error (MSE) than the competitors based on either the ME or Scott and Smith's models. As population variances are rarely known, we propose method of moments estimators to obtain empirical predictors and conduct a simulation study to evaluate their performance. The results suggest that the finite population mixed model empirical predictor is more stable than its competitors since, in terms of MSE, it is either the best or the second best and, when second best, its performance lies within acceptable limits. When both cluster and unit intra-class correlation coefficients are very high (e.g., 0.95 or more), the performance of the empirical predictors derived under the three models is similar.

Relevance: 100.00%

Abstract:

Our focus is on information in expectation surveys that can now be built on thousands (or millions) of respondents on an almost continuous-time basis (big data) and in continuous macroeconomic surveys with a limited number of respondents. We show that, under standard microeconomic and econometric techniques, survey forecasts are an affine function of the conditional expectation of the target variable. This is true whether or not the survey respondent knows the data-generating process (DGP) of the target variable or the econometrician knows the respondent's individual loss function. If the econometrician has a mean-squared-error risk function, we show that asymptotically efficient forecasts of the target variable can be built using Hansen's (Econometrica, 1982) generalized method of moments in a panel-data context, when N and T diverge or when T diverges with N fixed. Sequential asymptotic results are obtained using Phillips and Moon's (Econometrica, 1999) framework. Possible extensions are also discussed.
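In symbols (my notation, as one way of writing the statement): under mean-squared-error loss the optimal point forecast is the conditional expectation, and each survey response is affine in it.

```latex
% The abstract's affine-function statement, in illustrative notation:
% f_{i,t} is the survey forecast of respondent i at time t for the
% target variable y_{t+h}, and I_t is the information set at time t.
f_{i,t} \;=\; \alpha_i \;+\; \beta_i \,
  \mathbb{E}\!\left[\, y_{t+h} \mid \mathcal{I}_t \,\right],
\qquad i = 1, \dots, N .
```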

Relevance: 100.00%

Abstract:

The increase in ultraviolet radiation (UV) at the surface, the high incidence of non-melanoma skin cancer (NMSC) on the coast of Northeast Brazil (NEB) and the reduction of total ozone were the motivation for the present study. The overall objective was to identify and understand the variability of UV, expressed as the Ultraviolet Index (UV Index), in the capitals of the east coast of the NEB and to fit stochastic models to the UV index time series in order to make predictions (interpolations) and forecasts/projections (extrapolations), followed by trend analysis. The methodology consisted of applying multivariate analysis (principal component analysis and cluster analysis), the Predictive Mean Matching method for filling gaps in the data, autoregressive distributed lag (ADL) models and the Mann-Kendall test. Modelling via ADL consisted of parameter estimation, diagnostics, residual analysis and evaluation of the quality of the predictions and forecasts via the mean squared error and the Pearson correlation coefficient. The results indicated that the annual variability of UV in the capital of Rio Grande do Norte (Natal) has a feature in September and October consisting of a stabilization or reduction of the UV index because of the larger annual concentration of total ozone; the increased amount of aerosol during this period contributes to this event with lesser intensity. The cluster analysis applied to the east coast of the NEB showed that this event also occurs in the capitals of Paraíba (João Pessoa) and Pernambuco (Recife). Extreme UV events in the NEB were analysed from the city of Natal and were associated with the absence of cloud cover and with total ozone levels below the annual average; they do not occur over the entire region because of the uneven spatial distribution of these variables. The ADL(4, 1) model, fitted to UV index and total ozone data for the period 2001-2012, produced a projection/extrapolation for the next 30 years (2013-2043) indicating, at the end of that period, an increase of approximately one unit in the UV index, provided total ozone maintains the downward trend observed during the study period.
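For reference, one common way of writing an autoregressive distributed lag model ADL(p, q), of which the fitted ADL(4, 1) above is an instance (notation mine; in this application y_t would be the UV index and x_t total ozone):

```latex
% ADL(p, q) in a common convention: p lags of the dependent variable and
% the current value plus q lags of the explanatory variable.
y_t \;=\; c \;+\; \sum_{i=1}^{p} \phi_i \, y_{t-i}
        \;+\; \sum_{j=0}^{q} \beta_j \, x_{t-j} \;+\; \varepsilon_t .
```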

Relevance: 100.00%

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance: 100.00%

Abstract:

This dissertation proposes a supervised algorithm for the classification of remote sensing images, composed of three stages: cloud removal or smoothing, segmentation, and classification. The cloud removal method uses homomorphic filtering to treat the obstructions caused by thin clouds and the Inpainting method to remove or attenuate shadows and dense clouds. For the segmentation and classification stages, a method based on the AC energy of the Discrete Cosine Transform (DCT) coefficients is proposed; the classification mode adopted is supervised. To evaluate the algorithm, a set of 14 images captured by several sensors was used, 12 of which contain some kind of obstruction. The cloud and shadow removal or smoothing stage is evaluated using the peak signal-to-noise ratio (PSNR) and the Kappa coefficient; in this phase, several high-pass filters were compared in order to select the most efficient one. Image segmentation is evaluated by the edge border coincidence (EBC) method, and classification is evaluated by the relative entropy and the mean squared error (MSE). As important as the metrics, the resulting images are presented so as to allow subjective evaluation by visual comparison. The results show the efficiency of the proposed algorithm, especially when compared to the Spring software distributed by the Instituto Nacional de Pesquisas Espaciais (INPE).
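As a generic illustration of one of the evaluation metrics mentioned (not the dissertation's code; names and the toy data are illustrative), the Kappa coefficient measures chance-corrected agreement between a reference label map and a classified one:

```python
import numpy as np

def kappa_coefficient(reference_labels, predicted_labels):
    """Cohen's Kappa: chance-corrected agreement between two label maps."""
    ref = np.asarray(reference_labels).ravel()
    pred = np.asarray(predicted_labels).ravel()
    classes = np.unique(np.concatenate([ref, pred]))
    # confusion matrix: rows = reference class, columns = predicted class
    conf = np.array([[np.sum((ref == a) & (pred == b)) for b in classes]
                     for a in classes], dtype=float)
    n = conf.sum()
    p_observed = np.trace(conf) / n
    p_expected = np.sum(conf.sum(axis=1) * conf.sum(axis=0)) / n ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)

# toy example with two classes
print(kappa_coefficient([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]))
```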

Relevance: 100.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)