980 results for Robust regression
Abstract:
Globalization of dairy cattle breeding has created a need for international sire proofs. Some early methods for converting proofs from one population to another are based on simple linear regression. An alternative robust regression method based on the t-distribution is presented, and maximum likelihood and Bayesian techniques for analysis are described, including the situation in which some proofs are missing. Procedures were used to investigate the relationship between Holstein sire proofs obtained by two Uruguayan genetic evaluation programs. The results suggest that conversion equations developed from data including only sires having proofs in both populations can lead to distorted results, relative to estimates obtained using techniques for incomplete data. There was evidence of non-normality of regression residuals, which constitutes an additional source of bias. A robust estimator may not solve all problems, but can provide simple conversion equations that are less sensitive to outlying proofs and to departures from assumptions.
Abstract:
robreg provides a number of robust estimators for linear regression models. Among them are the high breakdown-point and high efficiency MM-estimator, the Huber and bisquare M-estimator, and the S-estimator, each supporting classic or robust standard errors. Furthermore, basic versions of the LMS/LQS (least median of squares) and LTS (least trimmed squares) estimators are provided. Note that the moremata package, also available from SSC, is required.
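To make the idea of a Huber M-estimator concrete, the following is a minimal sketch in Python, not the robreg implementation (robreg is a Stata package and its internals are not described here). It assumes a single predictor, iteratively reweighted least squares, a scale estimate from the normalized median absolute deviation, and the conventional tuning constant k = 1.345; the function name `huber_line_fit` is hypothetical.

```python
import statistics


def huber_line_fit(x, y, k=1.345, iters=50, tol=1e-10):
    """Fit y = a + b*x with a Huber M-estimator via iteratively
    reweighted least squares (illustrative sketch, single predictor)."""
    w = [1.0] * len(x)          # first pass is ordinary least squares
    a = b = 0.0
    for _ in range(iters):
        # Weighted least squares in closed form.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b_new = sxy / sxx
        a_new = my - b_new * mx

        # Robust residual scale: normalized median absolute deviation.
        resid = [yi - (a_new + b_new * xi) for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745

        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged or scale == 0.0:
            break

        # Huber weights: 1 inside the threshold, shrinking outside it.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    return a, b


# Hypothetical toy data: a clean line y = 2 + 3x with one gross outlier.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[9] = 100.0
a_ols, b_ols = huber_line_fit(x, y, iters=1)   # one pass = plain least squares
a_hub, b_hub = huber_line_fit(x, y)
```

Calling the function with `iters=1` returns the ordinary least squares fit, which gives a baseline for judging how much the reweighting reduces the pull of the outlier. A Huber M-estimator does not have a high breakdown point, which is the reason packages such as robreg also offer MM-, S-, LMS/LQS and LTS estimators.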
Abstract:
Fractal and multifractal concepts have grown increasingly popular in soil analysis in recent years, along with the development of fractal models. A common step is to calculate the slope of a linear fit, usually by the least squares method. This should not be a particular problem; however, in many situations involving experimental data the researcher has to select the range of scales over which to work, neglecting the remaining points, in order to achieve the linearity that this type of analysis requires. Robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and non-parametric methods. With this method we do not have to assume that an outlying point is simply an extreme observation drawn from the tail of a normal distribution that does not compromise the validity of the regression results. In this work we have evaluated the capacity of robust regression to select the points in the experimental data, trying to avoid subjective choices. Based on this analysis we have developed a new working methodology that involves two basic steps: (1) evaluation of the improvement in the linear fit when consecutive points are eliminated, based on the R p-value, so that the implications of reducing the number of points are considered; (2) evaluation of the significance of the difference in slope between the fit with the two extreme points and the fit with the available points. We compare the results of applying this methodology with those of the commonly used least squares method. The data selected for these comparisons come from experimental soil roughness transects and from transects simulated with the midpoint displacement method, with added trends and noise. The results are discussed, indicating the advantages and disadvantages of each methodology.
Abstract:
We use sunspot group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups RB above a variable cut-off threshold of observed total whole-spot area (uncorrected for foreshortening) to simulate what a lower acuity observer would have seen. The synthesised annual means of RB are then re-scaled to the full observed RGO group number RA using a variety of regression techniques. It is found that a very high correlation between RA and RB (rAB > 0.98) does not prevent large errors in the intercalibration (for example, sunspot maximum values can be over 30 % too large even for such levels of rAB). In generating the backbone sunspot number (RBB), Svalgaard and Schatten (2015, this issue) force regression fits to pass through the origin of the scatter plot, which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile (“Q-Q”) plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method used is shown to be different when matching peak and average sunspot group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.
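The effect of forcing a fit through the origin can be illustrated with a small Python sketch. It uses hypothetical toy numbers, not the RGO data, and the helper names `fit_through_origin` and `fit_with_intercept` are assumptions made for this illustration. When the true relation has a nonzero offset, the origin-constrained slope absorbs that offset and overestimates the scaling at large values.

```python
def fit_through_origin(x, y):
    """Least-squares slope for y = b*x (intercept forced to zero)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical toy data with a true offset: y = 5 + 2*x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0 + 2.0 * xi for xi in x]

b_origin = fit_through_origin(x, y)        # slope inflated by the ignored offset
a_free, b_free = fit_with_intercept(x, y)  # recovers the offset and the slope
```

In this toy case the residuals of the origin-constrained fit also show a systematic trend with x, which is the kind of departure from normality that a Q-Q plot of the residuals would reveal.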
Abstract:
2002 Mathematics Subject Classification: 62J05, 62G35.
Abstract:
Context: Anti-Müllerian hormone (AMH) concentration reflects ovarian aging and is argued to be a useful predictor of age at menopause (AMP). It is hypothesized that AMH falling below a critical threshold corresponds to follicle depletion, which results in menopause. With this threshold, theoretical predictions of AMP can be made. Comparisons of such predictions with observed AMP from population studies support the role for AMH as a forecaster of menopause. Objective: The objective of the study was to investigate whether previous relationships between AMH and AMP are valid using a much larger data set. Setting: AMH was measured in 27 563 women attending fertility clinics. Study Design: From these data a model of age-related AMH change was constructed using a robust regression analysis. Data on AMP from subfertile women were obtained from the population-based Prospect-European Prospective Investigation into Cancer and Nutrition (Prospect-EPIC) cohort (n = 2249). By constructing a probability distribution of age at which AMH falls below a critical threshold and fitting this to Prospect-EPIC menopausal age data using maximum likelihood, such a threshold was estimated. Main Outcome: The main outcome was conformity between observed and predicted AMP. Results: To get a distribution of AMH-predicted AMP that fit the Prospect-EPIC data, we found the critical AMH threshold should vary among women in such a way that women with low age-specific AMH would have lower thresholds, whereas women with high age-specific AMH would have higher thresholds (mean 0.075 ng/mL; interquartile range 0.038–0.15 ng/mL). Such a varying AMH threshold for menopause is a novel and biologically plausible finding. AMH became undetectable (<0.2 ng/mL) approximately 5 years before the occurrence of menopause, in line with a previous report.
Conclusions: The conformity of the observed and predicted distributions of AMP supports the hypothesis that declining population averages of AMH are associated with menopause, making AMH an excellent candidate biomarker for AMP prediction. Further research will help establish the accuracy of AMH levels to predict AMP within individuals.
Abstract:
We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice.
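Abstract:
The fundamental matrix, as a powerful tool for analyzing the epipolar geometry of two views, occupies an important position in the field of computer vision. This paper analyzes the shortcomings of traditional robust methods for estimating the fundamental matrix, introduces the LQS method from robust regression analysis and, combining it with the bucket partitioning technique, proposes a new method for robust estimation of the fundamental matrix that overcomes the drawbacks of the RANSAC and LMedS methods. Experimental results on simulated data and real images show that the proposed method achieves higher robustness and accuracy.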
Abstract:
The flower industry has a reputation for heavy use of toxic chemicals, environmental pollution, enormous water consumption, and poor working conditions and low wages in various parts of the world. Unfortunately, the industry is resistant to change and is repeating the same mistakes in Ethiopia. As a result, there is growing concern among the general public and the international community about the sustainability of the Ethiopian flower industry. Consequently, working conditions in the flower industry, the impacts of wage income on employees’ livelihoods, the coping strategies of low-wage flower farm workers, the impacts of flower farms on the livelihoods of local people, and environmental pollution and conflict were analysed. Both qualitative and quantitative research methods were employed. Four quantitative data sets (labour practice, employees’ income and expenditure, displaced households, and a survey of flower growers’ views) were collected between 2010 and 2012. Robust regression was used to identify the determinants of wage levels, and multinomial logit to identify the determinants of the coping strategies of flower farm workers and displaced households. The findings show that working conditions on flower farms are characterized by low wages, job insecurity, frequent violation of employees’ rights, and poor safety measures. To ensure the survival of their families, households dispossessed of land adopt a wide range of strategies, including reducing food consumption, sharing oxen, renting land, sharecropping, and shifting staple food crops. Most experienced scarcity of water resources, lack of grazing areas, death of herds and reduced numbers of livestock due to pollution of water sources. Despite the Ethiopian government’s investment in attracting investors and creating a conducive environment for them, little has been accomplished in enforcing labour laws and environmental policies.
Flower farm expansion in Ethiopia, as it is now, can be viewed as part of the global land and water grab and is not all inclusive and sustainable. Several recommendations are made to improve working conditions, maximize the benefits of flower industry to the society, and to the country at large.
Abstract:
In this work, we address the thermal properties of selected members of a homologous series of alkyltriethylammonium bis{(trifluoromethyl)sulfonyl}imide ionic liquids. Their phase and glass transition behavior, as well as their standard isobaric heat capacities at 298.15 K, were studied using differential scanning calorimetry (DSC), whereas their decomposition temperature was determined by thermal gravimetry analysis. DSC was further used to measure standard molar heat capacities of the studied ionic liquids and standard molar heat capacity as a function of temperature for hexyltriethylammonium, octyltriethylammonium, and dodecyltriethylammonium bis{(trifluoromethyl)sulfonyl}imide ionic liquids. Based on the data obtained, we discuss the influence of the alkyl chain length of the cation of the studied ionic liquids on the measured properties. Using viscosity data obtained in a previous work, the liquid fragility of the ionic liquids is then discussed. Viscosity data were correlated by the VTF equation using a robust regression along a gnostic influence function. In this way, more reliable VTF model parameters were obtained than in our previous work and a good estimate of the liquid fragility of the ionic liquids was made.
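For orientation, the VTF (Vogel-Tammann-Fulcher) equation is commonly written in the form sketched below. The abstract does not state which parameterization the authors fitted, so the symbols A, B and T_0 are assumptions, and some variants include an additional temperature-dependent prefactor.

```latex
\eta(T) = A \exp\!\left( \frac{B}{T - T_0} \right)
```

Here \eta is the viscosity, T the absolute temperature, and A, B and T_0 are the fitted model parameters.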
Abstract:
Statistical techniques are fundamental in science, and linear regression analysis is perhaps one of the most widely used methodologies. It is well known from the literature that, under certain conditions, linear regression is an extremely powerful statistical tool. Unfortunately, in practice, some of these conditions are rarely satisfied and the regression models become ill-posed, which precludes the application of traditional estimation methods. This work presents some contributions to maximum entropy theory in the estimation of ill-posed models, in particular the estimation of linear regression models with small samples, affected by collinearity and outliers. The research is developed along three lines, namely the estimation of technical efficiency with state-contingent production frontiers, the estimation of the ridge parameter in ridge regression and, finally, new developments in maximum entropy estimation. In the estimation of technical efficiency with state-contingent production frontiers, the work shows that maximum entropy estimators perform better than the maximum likelihood estimator. This good performance is notable in models with few observations per state and in models with a large number of states, which are commonly affected by collinearity. It is hoped that the use of maximum entropy estimators will contribute to the much-desired increase in empirical work with these production frontiers. In ridge regression the greatest challenge is the estimation of the ridge parameter. Although numerous procedures are available in the literature, none of them outperforms all the others. This work proposes a new estimator of the ridge parameter that combines analysis of the ridge trace with maximum entropy estimation.
The results obtained in the simulation studies suggest that this new estimator is one of the best procedures available in the literature for estimating the ridge parameter. The Leuven maximum entropy estimator is based on the least squares method, Shannon entropy and concepts from quantum electrodynamics. This estimator overcomes the main criticism levelled at the generalized maximum entropy estimator, since it dispenses with the supports for the parameters and errors of the regression model. This work presents new contributions to maximum entropy theory in the estimation of ill-posed models, based on the Leuven maximum entropy estimator, information theory and robust regression. The estimators developed perform well in linear regression models with small samples, affected by collinearity and outliers. Finally, some computer code for maximum entropy estimation is presented, thereby adding to the scarce computational resources currently available.
Abstract:
Objective: To determine the occurrence of adverse drug reactions (ADRs) as a cause of admission to an intermediate care unit of a university hospital. Materials and Methods: The medical records of patients admitted to the Emergency Room – Intermediate Care unit (SALEM) between September and December 2012 who met the inclusion criteria were reviewed, and suspected cases of adverse drug reaction were identified. These cases were then assessed by four investigators for causality using the Naranjo algorithm, for preventability using the Schumock and Thornton criteria, and for clinical classification using the DoTS system. Results: 96 patients were found, presenting 108 ADR cases. The most frequent ADRs were arrhythmias and upper gastrointestinal bleeding (12.04%); 20.3% of cases were therapeutic failures; and the drugs most often involved were acetylsalicylic acid (15.74%) and losartan (10.19%). 46 cases were classified as possible and only one as definite. Using the DoTS classification, it was established that in 82.4% of cases the dose was collateral (within the therapeutic dose range), 89.8% were time-independent, and the factors most associated with susceptibility to ADRs included comorbidities (41.7%) and age (49%). 44% of the ADRs were preventable. Conclusion: ADRs are a non-negligible cause of admission to an intermediate care unit; different evaluation systems exist for them, and a significant proportion are preventable. More studies are needed at the national level to assess their incidence and to establish classification standards and measures to mitigate their effect.