856 resultados para regression algorithm
Resumo:
A new sparse kernel density estimator with tunable kernels is introduced within a forward constrained regression framework whereby the nonnegative and summing-to-unity constraints of the mixing weights can easily be satisfied. Based on the minimum integrated square error criterion, a recursive algorithm is developed to select significant kernels one at time, and the kernel width of the selected kernel is then tuned using the gradient descent algorithm. Numerical examples are employed to demonstrate that the proposed approach is effective in constructing very sparse kernel density estimators with competitive accuracy to existing kernel density estimators.
Resumo:
In this paper, we develop a novel constrained recursive least squares algorithm for adaptively combining a set of given multiple models. With data available in an online fashion, the linear combination coefficients of submodels are adapted via the proposed algorithm.We propose to minimize the mean square error with a forgetting factor, and apply the sum to one constraint to the combination parameters. Moreover an l1-norm constraint to the combination parameters is also applied with the aim to achieve sparsity of multiple models so that only a subset of models may be selected into the final model. Then a weighted l2-norm is applied as an approximation to the l1-norm term. As such at each time step, a closed solution of the model combination parameters is available. The contribution of this paper is to derive the proposed constrained recursive least squares algorithm that is computational efficient by exploiting matrix theory. The effectiveness of the approach has been demonstrated using both simulated and real time series examples.
Resumo:
Current commercially available Doppler lidars provide an economical and robust solution for measuring vertical and horizontal wind velocities, together with the ability to provide co- and cross-polarised backscatter profiles. The high temporal resolution of these instruments allows turbulent properties to be obtained from studying the variation in radial velocities. However, the instrument specifications mean that certain characteristics, especially the background noise behaviour, become a limiting factor for the instrument sensitivity in regions where the aerosol load is low. Turbulent calculations require an accurate estimate of the contribution from velocity uncertainty estimates, which are directly related to the signal-to-noise ratio. Any bias in the signal-to-noise ratio will propagate through as a bias in turbulent properties. In this paper we present a method to correct for artefacts in the background noise behaviour of commercially available Doppler lidars and reduce the signal-to-noise ratio threshold used to discriminate between noise, and cloud or aerosol signals. We show that, for Doppler lidars operating continuously at a number of locations in Finland, the data availability can be increased by as much as 50 % after performing this background correction and subsequent reduction in the threshold. The reduction in bias also greatly improves subsequent calculations of turbulent properties in weak signal regimes.
Resumo:
The primary objective of this research study is to determine which form of testing, the PEST algorithm or an operator-controlled condition is most accurate and time efficient for administration of the gaze stabilization test
The SARS algorithm: detrending CoRoT light curves with Sysrem using simultaneous external parameters
Resumo:
Surveys for exoplanetary transits are usually limited not by photon noise but rather by the amount of red noise in their data. In particular, although the CoRoT space-based survey data are being carefully scrutinized, significant new sources of systematic noises are still being discovered. Recently, a magnitude-dependant systematic effect was discovered in the CoRoT data by Mazeh et al. and a phenomenological correction was proposed. Here we tie the observed effect to a particular type of effect, and in the process generalize the popular Sysrem algorithm to include external parameters in a simultaneous solution with the unknown effects. We show that a post-processing scheme based on this algorithm performs well and indeed allows for the detection of new transit-like signals that were not previously detected.
Genetic algorithm inversion of the average 1D crustal structure using local and regional earthquakes
Resumo:
Knowing the best 1D model of the crustal and upper mantle structure is useful not only for routine hypocenter determination, but also for linearized joint inversions of hypocenters and 3D crustal structure, where a good choice of the initial model can be very important. Here, we tested the combination of a simple GA inversion with the widely used HYPO71 program to find the best three-layer model (upper crust, lower crust, and upper mantle) by minimizing the overall P- and S-arrival residuals, using local and regional earthquakes in two areas of the Brazilian shield. Results from the Tocantins Province (Central Brazil) and the southern border of the Sao Francisco craton (SE Brazil) indicated an average crustal thickness of 38 and 43 km, respectively, consistent with previous estimates from receiver functions and seismic refraction lines. The GA + HYPO71 inversion produced correct Vp/Vs ratios (1.73 and 1.71, respectively), as expected from Wadati diagrams. Tests with synthetic data showed that the method is robust for the crustal thickness, Pn velocity, and Vp/Vs ratio when using events with distance up to about 400 km, despite the small number of events available (7 and 22, respectively). The velocities of the upper and lower crusts, however, are less well constrained. Interestingly, in the Tocantins Province, the GA + HYPO71 inversion showed a secondary solution (local minimum) for the average crustal thickness, besides the global minimum solution, which was caused by the existence of two distinct domains in the Central Brazil with very different crustal thicknesses. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
A large amount of biological data has been produced in the last years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis, by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its bias, being more adequate for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover. it shows a clustering algorithm to solve this formulation that uses GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three known other algorithms. The proposed algorithm presented the best clustering results confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
In this paper, we compare the performance of two statistical approaches for the analysis of data obtained from the social research area. In the first approach, we use normal models with joint regression modelling for the mean and for the variance heterogeneity. In the second approach, we use hierarchical models. In the first case, individual and social variables are included in the regression modelling for the mean and for the variance, as explanatory variables, while in the second case, the variance at level 1 of the hierarchical model depends on the individuals (age of the individuals), and in the level 2 of the hierarchical model, the variance is assumed to change according to socioeconomic stratum. Applying these methodologies, we analyze a Colombian tallness data set to find differences that can be explained by socioeconomic conditions. We also present some theoretical and empirical results concerning the two models. From this comparative study, we conclude that it is better to jointly modelling the mean and variance heterogeneity in all cases. We also observe that the convergence of the Gibbs sampling chain used in the Markov Chain Monte Carlo method for the jointly modeling the mean and variance heterogeneity is quickly achieved.
Resumo:
Nesse artigo, tem-se o interesse em avaliar diferentes estratégias de estimação de parâmetros para um modelo de regressão linear múltipla. Para a estimação dos parâmetros do modelo foram utilizados dados de um ensaio clínico em que o interesse foi verificar se o ensaio mecânico da propriedade de força máxima (EM-FM) está associada com a massa femoral, com o diâmetro femoral e com o grupo experimental de ratas ovariectomizadas da raça Rattus norvegicus albinus, variedade Wistar. Para a estimação dos parâmetros do modelo serão comparadas três metodologias: a metodologia clássica, baseada no método dos mínimos quadrados; a metodologia Bayesiana, baseada no teorema de Bayes; e o método Bootstrap, baseado em processos de reamostragem.
Resumo:
The purpose of this paper is to develop a Bayesian analysis for nonlinear regression models under scale mixtures of skew-normal distributions. This novel class of models provides a useful generalization of the symmetrical nonlinear regression models since the error distributions cover both skewness and heavy-tailed distributions such as the skew-t, skew-slash and the skew-contaminated normal distributions. The main advantage of these class of distributions is that they have a nice hierarchical representation that allows the implementation of Markov chain Monte Carlo (MCMC) methods to simulate samples from the joint posterior distribution. In order to examine the robust aspects of this flexible class, against outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. Further, some discussions on the model selection criteria are given. The newly developed procedures are illustrated considering two simulations study, and a real data previously analyzed under normal and skew-normal nonlinear regression models. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
In this paper we deal with robust inference in heteroscedastic measurement error models Rather than the normal distribution we postulate a Student t distribution for the observed variables Maximum likelihood estimates are computed numerically Consistent estimation of the asymptotic covariance matrices of the maximum likelihood and generalized least squares estimators is also discussed Three test statistics are proposed for testing hypotheses of interest with the asymptotic chi-square distribution which guarantees correct asymptotic significance levels Results of simulations and an application to a real data set are also reported (C) 2009 The Korean Statistical Society Published by Elsevier B V All rights reserved
A bivariate regression model for matched paired survival data: local influence and residual analysis
Resumo:
The use of bivariate distributions plays a fundamental role in survival and reliability studies. In this paper, we consider a location scale model for bivariate survival times based on the proposal of a copula to model the dependence of bivariate survival data. For the proposed model, we consider inferential procedures based on maximum likelihood. Gains in efficiency from bivariate models are also examined in the censored data setting. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and compared to the performance of the bivariate regression model for matched paired survival data. Sensitivity analysis methods such as local and total influence are presented and derived under three perturbation schemes. The martingale marginal and the deviance marginal residual measures are used to check the adequacy of the model. Furthermore, we propose a new measure which we call modified deviance component residual. The methodology in the paper is illustrated on a lifetime data set for kidney patients.
Resumo:
The multivariate skew-t distribution (J Multivar Anal 79:93-113, 2001; J R Stat Soc, Ser B 65:367-389, 2003; Statistics 37:359-363, 2003) includes the Student t, skew-Cauchy and Cauchy distributions as special cases and the normal and skew-normal ones as limiting cases. In this paper, we explore the use of Markov Chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of repeated measures, pretest/post-test data, under multivariate null intercept measurement error model (J Biopharm Stat 13(4):763-771, 2003) where the random errors and the unobserved value of the covariate (latent variable) follows a Student t and skew-t distribution, respectively. The results and methods are numerically illustrated with an example in the field of dentistry.
Resumo:
The purpose of this paper is to develop a Bayesian approach for log-Birnbaum-Saunders Student-t regression models under right-censored survival data. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the considered model. In order to attenuate the influence of the outlying observations on the parameter estimates, we present in this paper Birnbaum-Saunders models in which a Student-t distribution is assumed to explain the cumulative damage. Also, some discussions on the model selection to compare the fitted models are given and case deletion influence diagnostics are developed for the joint posterior distribution based on the Kullback-Leibler divergence. The developed procedures are illustrated with a real data set. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
In this paper we present a genetic algorithm with new components to tackle capacitated lot sizing and scheduling problems with sequence dependent setups that appear in a wide range of industries, from soft drink bottling to food manufacturing. Finding a feasible solution to highly constrained problems is often a very difficult task. Various strategies have been applied to deal with infeasible solutions throughout the search. We propose a new scheme of classifying individuals based on nested domains to determine the solutions according to the level of infeasibility, which in our case represents bands of additional production hours (overtime). Within each band, individuals are just differentiated by their fitness function. As iterations are conducted, the widths of the bands are dynamically adjusted to improve the convergence of the individuals into the feasible domain. The numerical experiments on highly capacitated instances show the effectiveness of this computational tractable approach to guide the search toward the feasible domain. Our approach outperforms other state-of-the-art approaches and commercial solvers. (C) 2009 Elsevier Ltd. All rights reserved.