929 resultados para Ridge Regression
Forward Stepwise Ridge Regression (FSRR) based variable selection for highly correlated input spaces
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
This paper addresses the investment decisions considering the presence of financial constraints of 373 large Brazilian firms from 1997 to 2004, using panel data. A Bayesian econometric model was used considering ridge regression for multicollinearity problems among the variables in the model. Prior distributions are assumed for the parameters, classifying the model into random or fixed effects. We used a Bayesian approach to estimate the parameters, considering normal and Student t distributions for the error and assumed that the initial values for the lagged dependent variable are not fixed, but generated by a random process. The recursive predictive density criterion was used for model comparisons. Twenty models were tested and the results indicated that multicollinearity does influence the value of the estimated parameters. Controlling for capital intensity, financial constraints are found to be more important for capital-intensive firms, probably due to their lower profitability indexes, higher fixed costs and higher degree of property diversification.
Resumo:
A large number of ridge regression estimators have been proposed and used with little knowledge of their true distributions. Because of this lack of knowledge, these estimators cannot be used to test hypotheses or to form confidence intervals.^ This paper presents a basic technique for deriving the exact distribution functions for a class of generalized ridge estimators. The technique is applied to five prominent generalized ridge estimators. Graphs of the resulting distribution functions are presented. The actual behavior of these estimators is found to be considerably different than the behavior which is generally assumed for ridge estimators.^ This paper also uses the derived distributions to examine the mean squared error properties of the estimators. A technique for developing confidence intervals based on the generalized ridge estimators is also presented. ^
Resumo:
One of the difficulties in the practical application of ridge regression is that, for a given data set, it is unknown whether a selected ridge estimator has smaller squared error than the least squares estimator. The concept of the improvement region is defined, and a technique is developed which obtains approximate confidence intervals for the value of ridge k which produces the maximum reduction in mean squared error. Two simulation experiments were conducted to investigate how accurate these approximate confidence intervals might be. ^
Resumo:
Increasingly semiconductor manufacturers are exploring opportunities for virtual metrology (VM) enabled process monitoring and control as a means of reducing non-value added metrology and achieving ever more demanding wafer fabrication tolerances. However, developing robust, reliable and interpretable VM models can be very challenging due to the highly correlated input space often associated with the underpinning data sets. A particularly pertinent example is etch rate prediction of plasma etch processes from multichannel optical emission spectroscopy data. This paper proposes a novel input-clustering based forward stepwise regression methodology for VM model building in such highly correlated input spaces. Max Separation Clustering (MSC) is employed as a pre-processing step to identify a reduced srt of well-conditioned, representative variables that can then be used as inputs to state-of-the-art model building techniques such as Forward Selection Regression (FSR), Ridge regression, LASSO and Forward Selection Ridge Regression (FCRR). The methodology is validated on a benchmark semiconductor plasma etch dataset and the results obtained are compared with those achieved when the state-of-art approaches are applied directly to the data without the MSC pre-processing step. Significant performance improvements are observed when MSC is combined with FSR (13%) and FSRR (8.5%), but not with Ridge Regression (-1%) or LASSO (-32%). The optimal VM results are obtained using the MSC-FSR and MSC-FSRR generated models. © 2012 IEEE.
Resumo:
Este estudo tem como objetivo analisar o desempenho de vários modelos econométricos ao prever Inflação . Iniciamos o trabalho utilizando como base de comparação para todos os modelos a tradicional curva de Phillips que usa a taxa de desemprego como variável explicativa para diferenças de preço. Dentre os modelos analisados temos univariados e bivariados, sendo estes últimos uma curva de Phillips alternativa já que apenas sustitui a variável desemprego por outra variável macroeconômica. Além destes modelos também comparamos o desempenho de previsão de modelos que usam como covariadas uma combinação das previsões dos modelos anteriores (univariados e bivariados). O resultado deste estudo aponta a combinação de modelos por "ridge regression" como uma técnica - dentre as analisadas para combinação de previsões - de menor erro de previsão sempre; sendo alcançado pela combinação da média em apenas um dos casos analisados. No entanto, a combinação de previsões não apresentou melhor resultado que algumas das covariadas testadas em modelos bivariados
Resumo:
Multiple linear regression model plays a key role in statistical inference and it has extensive applications in business, environmental, physical and social sciences. Multicollinearity has been a considerable problem in multiple regression analysis. When the regressor variables are multicollinear, it becomes difficult to make precise statistical inferences about the regression coefficients. There are some statistical methods that can be used, which are discussed in this thesis are ridge regression, Liu, two parameter biased and LASSO estimators. Firstly, an analytical comparison on the basis of risk was made among ridge, Liu and LASSO estimators under orthonormal regression model. I found that LASSO dominates least squares, ridge and Liu estimators over a significant portion of the parameter space for large dimension. Secondly, a simulation study was conducted to compare performance of ridge, Liu and two parameter biased estimator by their mean squared error criterion. I found that two parameter biased estimator performs better than its corresponding ridge regression estimator. Overall, Liu estimator performs better than both ridge and two parameter biased estimator.
Resumo:
Salinity, sodicity, acidity, and phytotoxic levels of chloride (Cl) in subsoils are major constraints to crop production in many soils of north-eastern Australia because they reduce the ability of crop roots to extract water and nutrients from the soil. The complex interactions and correlations among soil properties result in multi-colinearity between soil properties and crop yield that makes it difficult to determine which constraint is the major limitation. We used ridge-regression analysis to overcome colinearity to evaluate the contribution of soil factors and water supply to the variation in the yields of 5 winter crops on soils with various levels and combinations of subsoil constraints in the region. Subsoil constraints measured were soil Cl, electrical conductivity of the saturation extract (ECse), and exchangeable sodium percentage (ESP). The ridge regression procedure selected several of the variables used in a descriptive model, which included in-crop rainfall, plant-available soil water at sowing in the 0.90-1.10 m soil layer, and soil Cl in the 0.90-1.10 m soil layer, and accounted for 77-85% of the variation in the grain yields of the 5 winter crops. Inclusion of ESP of the top soil (0.0-0.10 m soil layer) marginally increased the descriptive capability of the models for bread wheat, barley and durum wheat. Subsoil Cl concentration was found to be an effective substitute for subsoil water extraction. The estimates of the critical levels of subsoil Cl for a 10% reduction in the grain yield were 492 mg cl/kg for chickpea, 662 mg Cl/kg for durum wheat, 854 mg Cl/kg for bread wheat, 980 mg Cl/kg for canola, and 1012 mg Cl/kg for barley, thus suggesting that chickpea and durum wheat were more sensitive to subsoil Cl than bread wheat, barley, and canola.
Resumo:
Environmental heat can reduce conception rates (the proportion of services that result in pregnancy) in lactating dairy cows. The study objectives were to identify periods of exposure relative to the service date in which environmental heat is most closely associated with conception rates, and to assess whether the total time cows are exposed to high environmental heat within each 24-h period is more closely associated with conception rates than is the maximum environmental heat for each 24-h period. A retrospective observational study was conducted in 25 predominantly Holstein-Friesian commercial dairy herds located in Australia. Associations between weather and conception rates were assessed using 16,878 services performed over a 21-mo period. Services were classified as successful based on rectal palpation. Two measures of heat load were defined for each 24-h period: the maximum temperature-humidity index (THI) for the period, and the number of hours in the 24-h period when the THI was >72. Conception rates were reduced when cows were exposed to a high heat load from the day of service to 6 d after service, and in wk -1. Heat loads in wk -3 to -5 were also associated with reduced conception rates. Thus, management interventions to ameliorate the effects of heat load on conception rates should be implemented at least 5 wk before anticipated service and should continue until at least 1 wk after service. High autocorrelations existed between successive daily values in both measures, and associations between day of heat load relative to service day and conception rates differed substantially when ridge regression was used to account for this autocorrelation. This indicates that when assessing the effects of heat load on conception rates, the autocorrelation in heat load between days should be accounted for in analyses. The results suggest that either weekly averages or totals summarizing the daily heat load are adequate to describe heat load when assessing effects on conception rates in lactating dairy cows.
Resumo:
Métodos para GWS; Teoria dos métodos de regressão; Computação do método Random (Ridge) Regression BLUP (RR-BLUP/GWS); Fenótipos corrigidos; Frequências alélicas, variância dos marcadores e herdabilidade; Marcadores codominantes (SNP) ? Modelo genotípico; Marcadores dominantes (DArT) - Modelo genotípico; Marcadores codominantes (SNP) ? Modelo gamético ou alélico; Número de marcadores com efeitos significativos; Populações de estimação, validação e seleção; População de validação e Jacknife; Correlação e regressão entre valores genéticos preditos e fenótipos na população de validação; Análise de associação na GWAS; Software Selegen Genômica: Random (Ridge) Regression BLUP: RR-BLUP/GWS; Exemplo aplicado ao melhoramento do eucalipto.
Resumo:
Semiconductor fabrication involves several sequential processing steps with the result that critical production variables are often affected by a superposition of affects over multiple steps. In this paper a Virtual Metrology (VM) system for early stage measurement of such variables is presented; the VM system seeks to express the contribution to the output variability that is due to a defined observable part of the production line. The outputs of the processed system may be used for process monitoring and control purposes. A second contribution of this work is the introduction of Elastic Nets, a regularization and variable selection technique for the modelling of highly-correlated datasets, as a technique for the development of VM models. Elastic Nets and the proposed VM system are illustrated using real data from a multi-stage etch process used in the fabrication of disk drive read/write heads. © 2013 IEEE.
Resumo:
Virtual metrology (VM) aims to predict metrology values using sensor data from production equipment and physical metrology values of preceding samples. VM is a promising technology for the semiconductor manufacturing industry as it can reduce the frequency of in-line metrology operations and provide supportive information for other operations such as fault detection, predictive maintenance and run-to-run control. Methods with minimal user intervention are required to perform VM in a real-time industrial process. In this paper we propose extreme learning machines (ELM) as a competitive alternative to popular methods like lasso and ridge regression for developing VM models. In addition, we propose a new way to choose the hidden layer weights of ELMs that leads to an improvement in its prediction performance.
Resumo:
As técnicas estatísticas são fundamentais em ciência e a análise de regressão linear é, quiçá, uma das metodologias mais usadas. É bem conhecido da literatura que, sob determinadas condições, a regressão linear é uma ferramenta estatística poderosíssima. Infelizmente, na prática, algumas dessas condições raramente são satisfeitas e os modelos de regressão tornam-se mal-postos, inviabilizando, assim, a aplicação dos tradicionais métodos de estimação. Este trabalho apresenta algumas contribuições para a teoria de máxima entropia na estimação de modelos mal-postos, em particular na estimação de modelos de regressão linear com pequenas amostras, afetados por colinearidade e outliers. A investigação é desenvolvida em três vertentes, nomeadamente na estimação de eficiência técnica com fronteiras de produção condicionadas a estados contingentes, na estimação do parâmetro ridge em regressão ridge e, por último, em novos desenvolvimentos na estimação com máxima entropia. Na estimação de eficiência técnica com fronteiras de produção condicionadas a estados contingentes, o trabalho desenvolvido evidencia um melhor desempenho dos estimadores de máxima entropia em relação ao estimador de máxima verosimilhança. Este bom desempenho é notório em modelos com poucas observações por estado e em modelos com um grande número de estados, os quais são comummente afetados por colinearidade. Espera-se que a utilização de estimadores de máxima entropia contribua para o tão desejado aumento de trabalho empírico com estas fronteiras de produção. Em regressão ridge o maior desafio é a estimação do parâmetro ridge. Embora existam inúmeros procedimentos disponíveis na literatura, a verdade é que não existe nenhum que supere todos os outros. Neste trabalho é proposto um novo estimador do parâmetro ridge, que combina a análise do traço ridge e a estimação com máxima entropia. Os resultados obtidos nos estudos de simulação sugerem que este novo estimador é um dos melhores procedimentos existentes na literatura para a estimação do parâmetro ridge. O estimador de máxima entropia de Leuven é baseado no método dos mínimos quadrados, na entropia de Shannon e em conceitos da eletrodinâmica quântica. Este estimador suplanta a principal crítica apontada ao estimador de máxima entropia generalizada, uma vez que prescinde dos suportes para os parâmetros e erros do modelo de regressão. Neste trabalho são apresentadas novas contribuições para a teoria de máxima entropia na estimação de modelos mal-postos, tendo por base o estimador de máxima entropia de Leuven, a teoria da informação e a regressão robusta. Os estimadores desenvolvidos revelam um bom desempenho em modelos de regressão linear com pequenas amostras, afetados por colinearidade e outliers. Por último, são apresentados alguns códigos computacionais para estimação com máxima entropia, contribuindo, deste modo, para um aumento dos escassos recursos computacionais atualmente disponíveis.
Resumo:
Les logiciels utilisés sont Splus et R.