921 results for random coefficient regression model
Abstract:
The paper considers various extended asymmetric multivariate conditional volatility models, and derives appropriate regularity conditions and associated asymptotic theory. This enables checking of internal consistency and allows valid statistical inferences to be drawn based on empirical estimation. For this purpose, we use an underlying vector random coefficient autoregressive process, for which we show the equivalent representation for the asymmetric multivariate conditional volatility model, to derive asymptotic theory for the quasi-maximum likelihood estimator. As an extension, we develop a new multivariate asymmetric long memory volatility model, and discuss the associated asymptotic properties.
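The vector random coefficient autoregressive representation invoked here can be illustrated, in its simplest univariate form, by simulating an RCA(1) process. This is only a minimal sketch with illustrative parameter values, not the paper's multivariate model:

```python
import numpy as np

def simulate_rca1(n, phi, sigma_eta, sigma_eps, seed=0):
    """Simulate a univariate RCA(1) process
    x_t = (phi + eta_t) * x_{t-1} + eps_t,
    where eta_t ~ N(0, sigma_eta^2) is the random coefficient and
    eps_t ~ N(0, sigma_eps^2) is the innovation."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = (phi + rng.normal(0.0, sigma_eta)) * x[t - 1] \
               + rng.normal(0.0, sigma_eps)
    return x

# Second-order stationarity requires phi**2 + sigma_eta**2 < 1;
# here 0.25 + 0.09 = 0.34, so the simulated path is stable.
x = simulate_rca1(10_000, phi=0.5, sigma_eta=0.3, sigma_eps=1.0)
```

Conditional on the past, Var(x_t | x_{t-1}) = sigma_eps^2 + sigma_eta^2 * x_{t-1}^2, which is the link between random coefficient processes and conditional volatility models that such equivalence arguments exploit.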
Abstract:
This thesis studies the cross-validation criterion for choosing among small area models. The study is restricted to unit-level small area models. The basic small area model was introduced by Battese, Harter and Fuller in 1988. It is a linear mixed regression model with a random intercept. It comprises a number of parameters: the fixed-effect parameter β, the random component, and the variances of the residual error. The Battese et al. model is used, in a survey setting, to predict the mean of a variable of interest y in each small area using an administrative auxiliary variable x known for the entire population. The estimation method uses a normal distribution to model the residual component of the model. Allowing a general residual dependence, i.e. other than the normal distribution, yields a more flexible methodology. This generalization leads to a new class of exchangeable models. Indeed, the generalization lies in the modelling of the residual dependence, which may be either normal (the case of the Battese et al. model) or non-normal. The objective is to determine the small-area parameters as precisely as possible. This hinges on choosing the right residual dependence to use in the model. The cross-validation criterion is studied to this end.
Abstract:
Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model or, in the case of non-normally distributed trait values, the Kruskal-Wallis test. While linear regression models assume an additive mode of inheritance via equidistant genotype scores, the Kruskal-Wallis test merely tests global differences in trait values across the three genotype groups. Both approaches thus exhibit suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests do not perform well in the common situations where only a few trait values are available in a rare genotype category (imbalance), or where the values associated with the three genotype categories exhibit unequal variance (variance heterogeneity). We propose a maximum test based on a Marcus-type multiple contrast test for relative effect sizes. This test allows model-specific testing of a dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. Further, we discuss the use of a related all-pairwise-comparisons contrast test with range-preserving confidence intervals as an alternative to the Kruskal-Wallis heterogeneity test. We applied the proposed maximum test to the Bogalusa Heart Study dataset and gained a remarkable increase in the power to detect association, particularly for rare genotypes. Our simulation study also demonstrated that, contrary to the standard parametric approaches, the proposed non-parametric tests control the family-wise error rate in the presence of non-normality and variance heterogeneity.
We provide a publicly available R library nparcomp that can be used to estimate simultaneous confidence intervals or compatible multiplicity-adjusted p-values associated with the proposed maximum test.
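The maximum-over-modes idea can be illustrated with a deliberately simplified parametric sketch: score the genotype under each candidate inheritance mode and take the largest test statistic. The abstract's actual test is nonparametric (relative effects with simultaneous contrasts, as implemented in the nparcomp R package); the codings, simulated data and t-statistics below are illustrative assumptions only:

```python
import numpy as np

# Genotype codings for the three inheritance modes (0 = aa, 1 = Aa, 2 = AA).
CODINGS = {
    "dominant":  np.array([0.0, 1.0, 1.0]),
    "additive":  np.array([0.0, 0.5, 1.0]),
    "recessive": np.array([0.0, 0.0, 1.0]),
}

def mode_t_stats(genotype, trait):
    """t-statistic of the trait against each mode-specific genotype
    scoring; taking the maximum over modes is a parametric caricature
    of a maximum test."""
    out = {}
    n = len(trait)
    for mode, scores in CODINGS.items():
        x = scores[genotype]
        r = np.corrcoef(x, trait)[0, 1]
        out[mode] = r * np.sqrt((n - 2) / (1 - r ** 2))
    return out

# Simulate a dominant-mode SNP effect under Hardy-Weinberg with p = 0.5.
rng = np.random.default_rng(6)
g = rng.choice([0, 1, 2], size=2000, p=[0.25, 0.5, 0.25])
y = 1.0 * (g >= 1) + rng.normal(0, 1, 2000)
t = mode_t_stats(g, y)
```

For this simulated dominant effect, the dominant coding yields the largest statistic, which is exactly the gain over a single additive score that the abstract describes.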
Abstract:
This paper explores the factors associated with the place of death in Burkina Faso, based on mortality data from the Kaya Health and Demographic Surveillance System (Kaya HDSS). A multilevel logistic regression model with a random intercept is used to determine the factors associated with the place of death. More than half of the deaths (55%) occur at home. Age, place of residence, distance to the health care centre and cause of death are statistically associated with the place of death. Seniors (50 and over) are more likely to die at home than other age groups (66.81% against 35.9% for the 5-14 age group and 44.9% among children under 5, p = 0.001). The multivariate results confirm the effect of age, place of residence, living-standards quintile and cause of death. The high proportion of deaths occurring at home challenges policy makers in the health care system and calls for programs to adapt the supply of health care.
Abstract:
The long-term adverse health effects associated with air pollution exposure can be estimated using either cohort or spatio-temporal ecological designs. In a cohort study, the health status of a cohort of people is assessed periodically over a number of years, and then related to estimated ambient pollution concentrations in the cities in which they live. However, such cohort studies are expensive and time-consuming to implement, due to the long-term follow-up required for the cohort. Therefore, spatio-temporal ecological studies are also being used to estimate the long-term health effects of air pollution, as they are easy to implement due to the routine availability of the required data. Spatio-temporal ecological studies estimate the health impact of air pollution by utilising geographical and temporal contrasts in air pollution and disease risk across $n$ contiguous small areas, such as census tracts or electoral wards, for multiple time periods. The disease data are counts of the numbers of disease cases occurring in each areal unit and time period, and thus Poisson log-linear models are typically used for the analysis. The linear predictor includes pollutant concentrations and known confounders such as socio-economic deprivation. However, as the disease data typically contain residual spatial or spatio-temporal autocorrelation after the covariate effects have been accounted for, these known covariates are augmented by a set of random effects. One key problem in these studies is estimating spatially representative pollution concentrations in each areal unit, which are typically estimated by applying Kriging to data from a sparse monitoring network, or by computing averages over modelled concentrations (grid level) from an atmospheric dispersion model. The aim of this thesis is to investigate the health effects of long-term exposure to nitrogen dioxide (NO2) and particulate matter (PM10) in mainland Scotland, UK.
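The Poisson log-linear disease model described above can be fitted by iteratively reweighted least squares (IRLS). The sketch below, on simulated area-level counts with made-up coefficients, is a bare-bones illustration and not the thesis code:

```python
import numpy as np

def poisson_irls(y, X, n_iter=25):
    """Fit a Poisson log-linear model log E[y] = X @ beta by IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu      # working response
        XtW = X.T * mu               # working weights are mu itself
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Simulated small-area counts: log-risk depends on a pollutant and a
# deprivation covariate (illustrative coefficients, not real data).
rng = np.random.default_rng(7)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat = poisson_irls(y, X)   # close to beta_true
```

In the thesis setting the linear predictor would additionally carry an expected-count offset and spatio-temporal random effects; this sketch shows only the fixed-effect core of the model.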
In order to gain an initial impression of the air pollution health effects in mainland Scotland, chapter 3 presents a standard epidemiological study using a benchmark method. The remaining main chapters (4, 5, 6) cover the main methodological focus of this thesis, which is threefold: (i) how to better estimate pollution by developing a multivariate spatio-temporal fusion model that relates monitored and modelled pollution data over space, time and pollutant; (ii) how to simultaneously estimate the joint effects of multiple pollutants; and (iii) how to allow for the uncertainty in the estimated pollution concentrations when estimating their health effects. Specifically, chapters 4 and 5 address (i), while chapter 6 focuses on (ii) and (iii). In chapter 4, I propose an integrated model for estimating the long-term health effects of NO2 that fuses modelled and measured pollution data to provide improved predictions of areal-level pollution concentrations and hence health effects. The proposed air pollution fusion model is a Bayesian space-time linear regression model relating the measured concentrations to the modelled concentrations for a single pollutant, whilst allowing for additional covariate information such as site type (e.g. roadside, rural, etc.) and temperature. However, it is known that some pollutants may be correlated because they are generated by common processes or driven by similar factors such as meteorology. The correlation between pollutants can help to predict one pollutant by borrowing strength from the others. Therefore, in chapter 5, I propose a multi-pollutant model: a multivariate spatio-temporal fusion model that extends the single-pollutant model of chapter 4 and relates monitored and modelled pollution data over space, time and pollutant to predict pollution across mainland Scotland.
Considering that the air we breathe contains a complex mixture of particle- and gas-phase pollutants, so that we are exposed to multiple pollutants simultaneously, the health effects of exposure to multiple pollutants are investigated in chapter 6. This is a natural extension of the single-pollutant health effects in chapter 4. Given that NO2 and PM10 are highly correlated (a multicollinearity issue) in my data, I first propose a temporally-varying linear model to regress one pollutant (e.g. NO2) against another (e.g. PM10), and then use the residuals in the disease model alongside PM10, thus investigating the health effects of exposure to both pollutants simultaneously. Another issue considered in chapter 6 is allowing for the uncertainty in the estimated pollution concentrations when estimating their health effects. In total, four approaches are developed to adjust for exposure uncertainty. Finally, chapter 7 summarises the work contained within this thesis and discusses the implications for future research.
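The residual approach described for handling collinear pollutants can be sketched in a few lines: regress one pollutant on the other by least squares and keep the residuals, which are then orthogonal to the retained pollutant. This is a minimal static illustration on simulated data (the thesis version is temporally varying):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
pm10 = rng.normal(20, 5, n)                # simulated PM10 concentrations
no2 = 0.8 * pm10 + rng.normal(0, 3, n)     # NO2 highly correlated with PM10

# Regress NO2 on PM10 (with intercept) and keep the residuals.
X = np.column_stack([np.ones(n), pm10])
beta, *_ = np.linalg.lstsq(X, no2, rcond=None)
resid = no2 - X @ beta

# The residuals carry the part of NO2 not explained by PM10 and are
# numerically uncorrelated with PM10, so both can enter the disease
# model together without the multicollinearity problem.
corr = np.corrcoef(pm10, resid)[0, 1]      # essentially zero
```

The health effect attached to the residual term is then interpretable as the effect of NO2 over and above what PM10 already explains.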
Abstract:
Classical regression analysis can be used to model time series. However, the assumption that model parameters are constant over time is not necessarily adapted to the data. In phytoplankton ecology, the relevance of time-varying parameter values has been shown using a dynamic linear regression model (DLRM). DLRMs, belonging to the class of Bayesian dynamic models, assume the existence of a non-observable time series of model parameters, which are estimated on-line, i.e. after each observation. The aim of this paper was to show how DLRM results could be used to explain variation of a time series of phytoplankton abundance. We applied DLRM to daily concentrations of Dinophysis cf. acuminata, determined in Antifer harbour (French coast of the English Channel), along with physical and chemical covariates (e.g. wind velocity, nutrient concentrations). A single model was built using 1989 and 1990 data, and then applied separately to each year. Equivalent static regression models were investigated for the purpose of comparison. Results showed that most of the Dinophysis cf. acuminata concentration variability was explained by the configuration of the sampling site, the wind regime and tide residual flow. Moreover, the relationships of these factors with the concentration of the microalga varied with time, a fact that could not be detected with static regression. Application of dynamic models to phytoplankton time series, especially in a monitoring context, is discussed.
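A dynamic linear regression model of this kind can be estimated on-line with the standard Kalman filter recursions. The sketch below assumes a random-walk evolution for the coefficients and uses illustrative noise variances and simulated data, not the Antifer series:

```python
import numpy as np

def dlrm_filter(y, X, V, W, m0, C0):
    """On-line estimation of y_t = X[t] @ theta_t + v_t, v_t ~ N(0, V),
    with random-walk coefficients theta_t = theta_{t-1} + w_t,
    w_t ~ N(0, W). Returns the filtered coefficient means after each
    observation."""
    m, C = m0.copy(), C0.copy()
    out = []
    for t in range(len(y)):
        x = X[t]
        R = C + W                      # prior covariance at time t
        f = x @ m                      # one-step-ahead forecast
        Q = x @ R @ x + V              # forecast variance
        K = R @ x / Q                  # Kalman gain
        m = m + K * (y[t] - f)         # posterior mean update
        C = R - np.outer(K, x @ R)     # posterior covariance update
        out.append(m.copy())
    return np.array(out)

# Simulated example: the slope drifts from 1 to 2 halfway through,
# which a static regression would average away.
rng = np.random.default_rng(2)
n = 400
xcov = rng.normal(0, 1, n)
slope = np.where(np.arange(n) < n // 2, 1.0, 2.0)
y = slope * xcov + rng.normal(0, 0.3, n)
X = np.column_stack([np.ones(n), xcov])
ms = dlrm_filter(y, X, V=0.09, W=0.001 * np.eye(2),
                 m0=np.zeros(2), C0=np.eye(2))
```

The filtered slope tracks the change from 1 to 2, illustrating how DLRMs detect time-varying relationships that static regression cannot.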
Abstract:
Pesticide residues in food and the environment pose serious health risks to human beings. Plant protection laws, among other things, regulate the misuse of agricultural pesticides. Compliance with such laws consequently reduces the risks of pesticide residues in food and the environment. Studies were conducted to assess compliance with plant protection laws among tomato farmers in Mvomero District, Morogoro Region, Tanzania. Compliance was assessed by examining pesticide use practices that are regulated by the Tanzanian Plant Protection Act (PPA) of 1997. A total of 91 tomato farmers were interviewed using a structured questionnaire. Purposive sampling was used to select at least 30 respondent farmers from each of the three villages of Msufini, Mlali and Doma in Mvomero District, Morogoro Region. Simple random sampling was used to obtain respondents from the sampling frame. Individual and social factors were examined for their effect on pesticide use practices regulated by the law. Descriptive statistics, mainly frequencies, were used to analyze the data, while associations between variables were determined using chi-square tests and a logistic regression model. The results showed that respondents were generally aware of the existence of laws on agriculture, the environment and consumer health, although none of them could name a specific Act. The results revealed further that 94.5% of the farmers read instructions on the pesticide label. However, only 21% used the correct doses of pesticides, 40.7% stored pesticides in special stores, 68.1% used protective gear, while 94.5% always read instructions on the label before using a pesticide product. Training influenced the application rate of pesticides (p < 0.001), while awareness of agricultural laws significantly influenced farmers’ tendency to read information on the labels (p < 0.001). The results showed further that education significantly influenced the use of protective gear by farmers (p = 0.042).
Education also significantly affected the manner in which farmers stored pesticide-application equipment (p = 0.024). Furthermore, farmers’ awareness of environmental laws significantly (p = 0.03) affected their disposal of empty pesticide containers. The results of this study suggest the need for express provisions in the Act on the safe use and handling of pesticides and on related offences, and that compliance should be achieved through education rather than coercion. The results also suggest establishing pesticide disposal mechanisms and structures to reduce unsafe disposal of pesticide containers. It is recommended that farmers be educated and trained in the proper use of pesticides, and that farmers’ awareness of laws affecting food, the environment and agriculture be improved.
Abstract:
In this study, cross-section data were used to analyze the effect of farmers’ demographic, socioeconomic and institutional setting, market access and physical attributes on the probability and intensity of tissue culture banana (TCB) adoption. The study was carried out between July 2011 and November 2011. Both descriptive statistics (mean, variance, proportions) and regression analysis were used. A double-hurdle regression model was fitted to the data. Using a multistage sampling technique, four counties and eight sub-locations were randomly selected. Using random sampling, three hundred and thirty farmers were selected from a list of banana-growing households in the selected sub-locations. The adoption level of tissue culture banana (TCB) was about 32%. The results revealed that the likelihood of TCB adoption was significantly influenced by: availability of TCB planting material, proportion of banana income in total farm income, per capita household expenditure, and the location of the farmer in Kisii County. The intensity of TCB adoption was significantly influenced by: occupation of farmers, family size, labour source, farm size, soil fertility, availability of and access to TCB plantlets, distance to the banana market, use of manure in planting bananas, access to agricultural extension services, and an index of TCB/non-TCB banana cultivar attributes scored by farmers. Compared to West Pokot County, farmers located in Bungoma County were significantly more likely to adopt TCB technology. The results of the study therefore suggest that the probability and intensity of TCB adoption should be enhanced. This can be done by taking cognizance of these variables in order to meet the priority needs of the smallholder farmers who were the target group. This would help alleviate the banana shortage in the region for enhanced food security.
Subsequently, actors along the banana value chain are encouraged to target the intervention strategies based on the identified farmer, farm and institutional characteristics for enhanced impact on food provision. Opening up more TCB multiplication centres in different regions will make farmers access the TCB technology for enhanced impact on the target population.
Abstract:
Tourist accommodation expenditure is a widely investigated topic, as it represents a major contribution to total tourist expenditure. The identification of its determinant factors is commonly based on supply-driven applications, while little research has been done on important travel characteristics. This paper proposes a demand-driven analysis of tourist accommodation price by focusing on data generated from room bookings. The investigation focuses on modeling the relationship between key travel characteristics and the price paid to book the accommodation. To accommodate the distributional characteristics of the expenditure variable, the analysis is based on the estimation of a quantile regression model. The findings support the econometric approach used and enable the elaboration of relevant managerial implications.
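Quantile regression replaces the squared-error loss of OLS with the asymmetric "pinball" (check) loss; minimising it over a constant recovers the empirical quantile, which is the building block of models like the one used here. A generic numerical illustration on simulated skewed "price" data, not the paper's specification:

```python
import numpy as np

def pinball_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

# Minimising the mean pinball loss over a constant c yields the
# empirical tau-quantile of the data.
rng = np.random.default_rng(3)
y = rng.lognormal(4.0, 0.6, 1000)          # skewed "price paid" data
tau = 0.9
grid = np.linspace(y.min(), y.max(), 4001)
losses = np.array([pinball_loss(y - c, tau).mean() for c in grid])
c_star = grid[int(np.argmin(losses))]      # ~ np.quantile(y, 0.9)
```

In the regression case the constant is replaced by a linear predictor, so each quantile of the price distribution gets its own coefficient vector, which is what makes the approach attractive for skewed expenditure data.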
Abstract:
Free-riding behaviors exist in tourism and they should be analyzed from a comprehensive perspective; while the literature has mainly focused on free riders operating in a destination, the destinations themselves might also free ride when they are under the umbrella of a collective brand. The objective of this article is to detect potential free-riding destinations by estimating the contribution of the different individual destinations to their collective brands, from the point of view of consumer perception. We argue that these individual contributions can be better understood by reflecting the various stages that tourists follow to reach their final decision. A hierarchical choice process is proposed in which the following choices are nested (not independent): “whether to buy,” “what collective brand to buy,” and “what individual brand to buy.” A Mixed Logit model confirms this sequence, which permits estimation of individual contributions and detection of free riders.
Abstract:
The Dirichlet process mixture model (DPMM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibbs sampling are required. As a result, DPMM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference are plentiful. For example, they would not be practical for digital signal processing on embedded hardware, where computational resources are at a serious premium. Here, we develop a simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithm for DPMMs. This algorithm is as simple as DP-means clustering and solves the MAP problem as well as Gibbs sampling does, while requiring only a fraction of the computational effort. (For freely available code that implements the MAP-DP algorithm for Gaussian mixtures see http://www.maxlittle.net/.) Unlike related small variance asymptotics (SVA), our method is non-degenerate and so inherits the “rich get richer” property of the Dirichlet process. It also retains a non-degenerate closed-form likelihood which enables out-of-sample calculations and the use of standard tools such as cross-validation. We illustrate the benefits of our algorithm on a range of examples and contrast it to variational, SVA and sampling approaches from both a computational complexity perspective as well as in terms of clustering performance. We demonstrate the wide applicability of our approach by presenting an approximate MAP inference method for the infinite hidden Markov model whose performance contrasts favorably with a recently proposed hybrid SVA approach.
Similarly, we show how our algorithm can be applied to a semiparametric mixed-effects regression model where the random-effects distribution is modelled using an infinite mixture model, as used in longitudinal progression modelling in population health science. Finally, we propose directions for future research on approximate MAP inference in Bayesian nonparametrics.
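The abstract says the proposed MAP-DP algorithm is "as simple as DP-means clustering". To give a feel for that level of simplicity, here is a sketch of DP-means itself (the small-variance relative of MAP-DP, not the MAP-DP algorithm): like k-means, but a point further than a penalty radius from every centre spawns a new cluster, so the number of clusters is inferred rather than fixed.

```python
import numpy as np

def dp_means(X, lam, n_iter=50):
    """DP-means clustering: k-means with a penalty lam; a point whose
    squared distance to every centre exceeds lam starts a new cluster."""
    centres = [X.mean(axis=0)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment: nearest centre, or spawn a new one if all are
        # further than sqrt(lam) away.
        for i, xi in enumerate(X):
            d2 = [np.sum((xi - c) ** 2) for c in centres]
            j = int(np.argmin(d2))
            if d2[j] > lam:
                centres.append(xi.copy())
                j = len(centres) - 1
            labels[i] = j
        # Update: cluster means; drop clusters that lost all points.
        kept = [j for j in range(len(centres)) if np.any(labels == j)]
        centres = [X[labels == j].mean(axis=0) for j in kept]
        labels = np.array([kept.index(j) for j in labels])
    return np.array(centres), labels

# Two well-separated blobs: with a suitable penalty, DP-means discovers
# both clusters without being told their number.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
               rng.normal(10.0, 0.5, (100, 2))])
centres, labels = dp_means(X, lam=16.0)
```

Unlike DP-means, the paper's MAP-DP retains a non-degenerate likelihood (and hence the "rich get richer" property), but the iteration structure is of comparable simplicity.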
Abstract:
This study focuses on multiple linear regression models relating six climate indices (temperature-humidity index THI, environmental stress index ESI, equivalent temperature index ETI, heat load index HLI, modified HLI (HLI new), and respiratory rate predictor RRP) to three main components of cow’s milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage and selection operator (LASSO) and the Akaike information criterion (AIC) are applied to select the best model for the milk predictands with the smallest number of climate predictors. Uncertainty is estimated by bootstrapping through resampling, and cross-validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months April to September, 2002 to 2010, are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors, with p-value < 0.001 and R2 of 0.50 and 0.49 respectively. In summer, milk yield with the independent variables THI, ETI, and ESI shows the strongest relationship (p-value < 0.001), with R2 of 0.69. For fat and protein the results are only marginal. This method is suggested for studies of the impact of climate variability/change in the agriculture and food science fields when short time series or data with large uncertainty are available.
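The AIC-based selection step can be sketched generically: fit OLS models for candidate predictor subsets and keep the subset with the smallest AIC. The simulated data and the reuse of the index names below are illustrative assumptions, not the study's data or code:

```python
import numpy as np
from itertools import combinations

def aic_ols(y, X):
    """Gaussian AIC of an OLS fit: n*log(RSS/n) + 2k
    (additive constants dropped)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(5)
n = 300
# Hypothetical standardized climate indices; only the first two
# actually drive the simulated "milk yield" response.
names = ["THI", "ESI", "ETI"]
Z = rng.normal(size=(n, 3))
y = 1.5 * Z[:, 0] - 1.0 * Z[:, 1] + rng.normal(0, 1, n)

best = None
for r in range(1, 4):
    for subset in combinations(range(3), r):
        X = np.column_stack([np.ones(n)] + [Z[:, j] for j in subset])
        score = aic_ols(y, X)
        if best is None or score < best[0]:
            best = (score, subset)
selected = [names[j] for j in best[1]]
```

With more predictors an exhaustive search becomes expensive, which is where a LASSO path (shrinking irrelevant coefficients to exactly zero) is the natural complement, as in the study.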
Abstract:
Background: Post-LASIK corneal ectasia is an infrequent but devastating complication of LASIK surgery (excimer-laser-assisted keratomileusis) for the treatment of myopia with or without astigmatism. Based on Scheimpflug-imaging corneal elevation tomography (Pentacam HR system, Oculus, Wetzlar, Germany), a novel cumulative risk index is proposed for use as a screening diagnostic test to prevent this complication. Methods: An analytical, observational, cross-sectional diagnostic-test study was carried out to evaluate the operating characteristics of the NICE index, with the Belin-Ambrosio display (Pentacam HR) as the reference standard, using a binary logistic regression model, contingency tables, and estimation of the area under the ROC curve. Results: A total of 361 eyes were evaluated, of which 59.3% came from female patients; the overall mean age was 30 years (IQR 11.0). The binary logistic model was built on four quantitative independent variables (K2, PAQUI, EP and I-S) and one qualitative variable (SEXO), and its relationship with the dependent variable, NICE (final score), was determined. The predictor variables were statistically significant, correctly classifying 92.9% of the eyes evaluated by presence or absence of risk. Nagelkerke's coefficient was 74.4%. Conclusions: The NICE cumulative risk index is a novel diagnostic tool for evaluating candidates for LASIK refractive surgery in order to prevent secondary ectasia.
Abstract:
Logistic regression is a statistical tool widely used for predicting species’ potential distributions starting from presence/absence data and a set of independent variables. However, logistic regression equations compute probability values based not only on the values of the predictor variables but also on the relative proportion of presences and absences in the dataset, which does not adequately describe the environmental favourability for or against species presence. A few strategies have been used to circumvent this, but they usually imply an alteration of the original data or the discarding of potentially valuable information. We propose a way to obtain from logistic regression an environmental favourability function whose results are not affected by an uneven proportion of presences and absences. We tested the method on the distribution of virtual species in an imaginary territory. The favourability models yielded similar values regardless of the variation in the presence/absence ratio. We also illustrate the method with the example of the Pyrenean desman’s (Galemys pyrenaicus) distribution in Spain. The favourability model yielded more realistic potential distribution maps than the logistic regression model. Favourability values can be regarded as the degree of membership of the fuzzy set of sites whose environmental conditions are favourable to the species, which enables applying the rules of fuzzy logic to distribution modelling. They also allow for direct comparisons between models for species with different presence/absence ratios in the study area. This makes them more useful for estimating the conservation value of areas, designing ecological corridors, or selecting appropriate areas for species reintroductions.
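A favourability transformation of this kind (commonly written, following Real et al., as F = (P/(1-P)) / (n1/n0 + P/(1-P)), with n1 presences and n0 absences in the training data) can be sketched as a one-line rescaling of the logistic-regression probabilities. The formula and the example counts below are my reading of the standard favourability function, not code from the paper:

```python
import numpy as np

def favourability(p, n1, n0):
    """Favourability from logistic-regression probabilities p:
    F = (p/(1-p)) / (n1/n0 + p/(1-p)),
    where n1, n0 are the numbers of presences and absences in the
    training data. F = 0.5 exactly when p equals the prevalence, so
    values above 0.5 mark conditions more favourable than average,
    regardless of how unbalanced the presence/absence ratio is."""
    odds = p / (1.0 - p)
    return odds / (n1 / n0 + odds)

# With 100 presences and 900 absences (prevalence 0.1), a predicted
# probability of exactly 0.1 maps to favourability 0.5.
n1, n0 = 100, 900
F = favourability(np.array([0.05, 0.10, 0.50]), n1, n0)
```

Because the prevalence term cancels out of the comparison, favourability values from models trained on different presence/absence ratios are directly comparable, which is the property the abstract emphasises.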
Abstract:
The papers included in this thesis deal with a few aspects of insurance economics that have seldom been dealt with in the applied literature. In the first paper I apply for the first time the tools of the economics of crime to study the determinants of fraud, using data on Italian provinces. The contributions to the literature are manifold:
- The price of insurance has a positive correlation with the propensity to defraud.
- Social norms constrain fraudulent behavior, but their strength is curtailed in economic downturns.
- I apply a simple extension of the Random Coefficient model, which allows for the presence of time-invariant covariates and asymmetries in the impact of the regressors.
The second paper assesses how the evolution of macro-prudential regulation of insurance companies has been reflected in their equity prices. I employ a standard event-study methodology, deriving the definition of the “control” and “treatment” groups from what is implied by the regulatory framework. The main results are:
- Markets care about the evolution of the legislation. Their perception has shifted from an initial positive assessment of a possible implicit “too big to fail” subsidy to a more negative one related to its cost in terms of stricter capital requirements.
- The size of this phenomenon is positively related to the leverage, size and geographical location of the insurance companies.
The third paper introduces a novel methodology to forecast non-life insurance premiums and profitability as a function of macroeconomic variables, using the simultaneous-equation framework traditionally employed in macroeconometric models and a simple theoretical model of insurance pricing to derive a long-term relationship between premiums, claims expenses and short-term rates. The model is shown to provide better forecasts of premiums and profitability than the single-equation specifications commonly used in applied analysis.