923 resultados para Generalised Linear Models


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In applied work economists often seek to relate a given response variable y to some causal parameter mu* associated with it. This parameter usually represents a summarization based on some explanatory variables of the distribution of y, such as a regression function, and treating it as a conditional expectation is central to its identification and estimation. However, the interpretation of mu* as a conditional expectation breaks down if some or all of the explanatory variables are endogenous. This is not a problem when mu* is modelled as a parametric function of explanatory variables because it is well known how instrumental variables techniques can be used to identify and estimate mu*. In contrast, handling endogenous regressors in nonparametric models, where mu* is regarded as fully unknown, presents di±cult theoretical and practical challenges. In this paper we consider an endogenous nonparametric model based on a conditional moment restriction. We investigate identification related properties of this model when the unknown function mu* belongs to a linear space. We also investigate underidentification of mu* along with the identification of its linear functionals. Several examples are provided in order to develop intuition about identification and estimation for endogenous nonparametric regression and related models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods to provide accurate estimates of test characteristics for diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variance of sensitivity, specificity and correlation. We also applied an inverse Wishart prior to check the sensitivity of the results. The third model was a multinomial model where the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out using the 'Bayesian inference using Gibbs sampling' implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent among Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from Bayesian bivariate models are not as good as those obtained from frequentist estimation regardless of which prior distribution was used for the covariance matrix. The Bayesian multinomial model consistently underestimated the sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of its following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously as well as the intercorrelation between the two; and (3) it can be directly applied to sparse data without ad hoc correction. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex diseases, such as cancer, are caused by various genetic and environmental factors, and their interactions. Joint analysis of these factors and their interactions would increase the power to detect risk factors but is statistically. Bayesian generalized linear models using student-t prior distributions on coefficients, is a novel method to simultaneously analyze genetic factors, environmental factors, and interactions. I performed simulation studies using three different disease models and demonstrated that the variable selection performance of Bayesian generalized linear models is comparable to that of Bayesian stochastic search variable selection, an improved method for variable selection when compared to standard methods. I further evaluated the variable selection performance of Bayesian generalized linear models using different numbers of candidate covariates and different sample sizes, and provided a guideline for required sample size to achieve a high power of variable selection using Bayesian generalize linear models, considering different scales of number of candidate covariates. ^ Polymorphisms in folate metabolism genes and nutritional factors have been previously associated with lung cancer risk. In this study, I simultaneously analyzed 115 tag SNPs in folate metabolism genes, 14 nutritional factors, and all possible genetic-nutritional interactions from 1239 lung cancer cases and 1692 controls using Bayesian generalized linear models stratified by never, former, and current smoking status. SNPs in MTRR were significantly associated with lung cancer risk across never, former, and current smokers. In never smokers, three SNPs in TYMS and three gene-nutrient interactions, including an interaction between SHMT1 and vitamin B12, an interaction between MTRR and total fat intake, and an interaction between MTR and alcohol use, were also identified as associated with lung cancer risk. These lung cancer risk factors are worthy of further investigation.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Scholars have found that socioeconomic status was one of the key factors that influenced early-stage lung cancer incidence rates in a variety of regions. This thesis examined the association between median household income and lung cancer incidence rates in Texas counties. A total of 254 individual counties in Texas with corresponding lung cancer incidence rates from 2004 to 2008 and median household incomes in 2006 were collected from the National Cancer Institute Surveillance System. A simple linear model and spatial linear models with two structures, Simultaneous Autoregressive Structure (SAR) and Conditional Autoregressive Structure (CAR), were used to link median household income and lung cancer incidence rates in Texas. The residuals of the spatial linear models were analyzed with Moran's I and Geary's C statistics, and the statistical results were used to detect similar lung cancer incidence rate clusters and disease patterns in Texas.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Department of Structural Analysis of the University of Santander has been for a longtime involved in the solution of the country´s practical engineering problems. Some of these have required the use of non-conventional methods of analysis, in order to achieve adequate engineering answers. As an example of the increasing application of non-linear computer codes in the nowadays engineering practice, some cases will be briefly presented. In each case, only the main features of the problem involved and the solution used to solve it will be shown

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Standard factorial designs sometimes may be inadequate for experiments that aim to estimate a generalized linear model, for example, for describing a binary response in terms of several variables. A method is proposed for finding exact designs for such experiments that uses a criterion allowing for uncertainty in the link function, the linear predictor, or the model parameters, together with a design search. Designs are assessed and compared by simulation of the distribution of efficiencies relative to locally optimal designs over a space of possible models. Exact designs are investigated for two applications, and their advantages over factorial and central composite designs are demonstrated.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To provide biological insights into transcriptional regulation, a couple of groups have recently presented models relating the promoter DNA-bound transcription factors (TFs) to downstream gene’s mean transcript level or transcript production rates over time. However, transcript production is dynamic in response to changes of TF concentrations over time. Also, TFs are not the only factors binding to promoters; other DNA binding factors (DBFs) bind as well, especially nucleosomes, resulting in competition between DBFs for binding at same genomic location. Additionally, not only TFs, but also some other elements regulate transcription. Within core promoter, various regulatory elements influence RNAPII recruitment, PIC formation, RNAPII searching for TSS, and RNAPII initiating transcription. Moreover, it is proposed that downstream from TSS, nucleosomes resist RNAPII elongation.

Here, we provide a machine learning framework to predict transcript production rates from DNA sequences. We applied this framework in the S. cerevisiae yeast for two scenarios: a) to predict the dynamic transcript production rate during the cell cycle for native promoters; b) to predict the mean transcript production rate over time for synthetic promoters. As far as we know, our framework is the first successful attempt to have a model that can predict dynamic transcript production rates from DNA sequences only: with cell cycle data set, we got Pearson correlation coefficient Cp = 0.751 and coefficient of determination r2 = 0.564 on test set for predicting dynamic transcript production rate over time. Also, for DREAM6 Gene Promoter Expression Prediction challenge, our fitted model outperformed all participant teams, best of all teams, and a model combining best team’s k-mer based sequence features and another paper’s biologically mechanistic features, in terms of all scoring metrics.

Moreover, our framework shows its capability of identifying generalizable fea- tures by interpreting the highly predictive models, and thereby provide support for associated hypothesized mechanisms about transcriptional regulation. With the learned sparse linear models, we got results supporting the following biological insights: a) TFs govern the probability of RNAPII recruitment and initiation possibly through interactions with PIC components and transcription cofactors; b) the core promoter amplifies the transcript production probably by influencing PIC formation, RNAPII recruitment, DNA melting, RNAPII searching for and selecting TSS, releasing RNAPII from general transcription factors, and thereby initiation; c) there is strong transcriptional synergy between TFs and core promoter elements; d) the regulatory elements within core promoter region are more than TATA box and nucleosome free region, suggesting the existence of still unidentified TAF-dependent and cofactor-dependent core promoter elements in yeast S. cerevisiae; e) nucleosome occupancy is helpful for representing +1 and -1 nucleosomes’ regulatory roles on transcription.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mixtures of Zellner's g-priors have been studied extensively in linear models and have been shown to have numerous desirable properties for Bayesian variable selection and model averaging. Several extensions of g-priors to Generalized Linear Models (GLMs) have been proposed in the literature; however, the choice of prior distribution of g and resulting properties for inference have received considerably less attention. In this paper, we extend mixtures of g-priors to GLMs by assigning the truncated Compound Confluent Hypergeometric (tCCH) distribution to 1/(1+g) and illustrate how this prior distribution encompasses several special cases of mixtures of g-priors in the literature, such as the Hyper-g, truncated Gamma, Beta-prime, and the Robust prior. Under an integrated Laplace approximation to the likelihood, the posterior distribution of 1/(1+g) is in turn a tCCH distribution, and approximate marginal likelihoods are thus available analytically. We discuss the local geometric properties of the g-prior in GLMs and show that specific choices of the hyper-parameters satisfy the various desiderata for model selection proposed by Bayarri et al, such as asymptotic model selection consistency, information consistency, intrinsic consistency, and measurement invariance. We also illustrate inference using these priors and contrast them to others in the literature via simulation and real examples.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Includes index.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Species occurrence and abundance models are important tools that can be used in biodiversity conservation, and can be applied to predict or plan actions needed to mitigate the environmental impacts of hydropower dams. In this study our objectives were: (i) to model the occurrence and abundance of threatened plant species, (ii) to verify the relationship between predicted occurrence and true abundance, and (iii) to assess whether models based on abundance are more effective in predicting species occurrence than those based on presence–absence data. Individual representatives of nine species were counted within 388 randomly georeferenced plots (10 m × 50 m) around the Barra Grande hydropower dam reservoir in southern Brazil. We modelled their relationship with 15 environmental variables using both occurrence (Generalised Linear Models) and abundance data (Hurdle and Zero-Inflated models). Overall, occurrence models were more accurate than abundance models. For all species, observed abundance was significantly, although not strongly, correlated with the probability of occurrence. This correlation lost significance when zero-abundance (absence) sites were excluded from analysis, but only when this entailed a substantial drop in sample size. The same occurred when analysing relationships between abundance and probability of occurrence from previously published studies on a range of different species, suggesting that future studies could potentially use probability of occurrence as an approximate indicator of abundance when the latter is not possible to obtain. This possibility might, however, depend on life history traits of the species in question, with some traits favouring a relationship between occurrence and abundance. Reconstructing species abundance patterns from occurrence could be an important tool for conservation planning and the management of threatened species, allowing scientists to indicate the best areas for collection and reintroduction of plant germplasm or choose conservation areas most likely to maintain viable populations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Measuring perceptions of customers can be a major problem for marketers of tourism and travel services. Much of the problem is to determine which attributes carry most weight in the purchasing decision. Older travellers weigh many travel features before making their travel decisions. This paper presents a descriptive analysis of neural network methodology and provides a research technique that assesses the weighting of different attributes and uses an unsupervised neural network model to describe a consumer-product relationship. The development of this rich class of models was inspired by the neural architecture of the human brain. These models mathematically emulate the neurophysical structure and decision making of the human brain, and, from a statistical perspective, are closely related to generalised linear models. Artificial neural networks or neural networks are, however, nonlinear and do not require the same restrictive assumptions about the relationship between the independent variables and dependent variables. Using neural networks is one way to determine what trade-offs older travellers make as they decide their travel plans. The sample of this study is from a syndicated data source of 200 valid cases from Western Australia. From senior groups, active learner, relaxed family body, careful participants and elementary vacation were identified and discussed. (C) 2003 Published by Elsevier Science Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Summary Landscapes are continuously changing. Natural forces of change such as heavy rainfall and fires can exert lasting influences on their physical form. However, changes related to human activities have often shaped landscapes more distinctly. In Western Europe, especially modern agricultural practices and the expanse of overbuilt land have left their marks in the landscapes since the middle of the 20th century. In the recent years men realised that mare and more changes that were formerly attributed to natural forces might indirectly be the result of their own action. Perhaps the most striking landscape change indirectly driven by human activity we can witness in these days is the large withdrawal of Alpine glaciers. Together with the landscapes also habitats of animal and plant species have undergone vast and sometimes rapid changes that have been hold responsible for the ongoing loss of biodiversity. Thereby, still little knowledge is available about probable effects of the rate of landscape change on species persistence and disappearance. Therefore, the development and speed of land use/land cover in the Swiss communes between the 1950s and 1990s were reconstructed using 10 parameters from agriculture and housing censuses, and were further correlated with changes in butterfly species occurrences. Cluster analyses were used to detect spatial patterns of change on broad spatial scales. Thereby, clusters of communes showing similar changes or transformation rates were identified for single decades and put into a temporally dynamic sequence. The obtained picture on the changes showed a prevalent replacement of non-intensive agriculture by intensive practices, a strong spreading of urban communes around city centres, and transitions towards larger farm sizes in the mountainous areas. Increasing transformation rates toward more intensive agricultural managements were especially found until the 1970s, whereas afterwards the trends were commonly negative. However, transformation rates representing the development of residential buildings showed positive courses at any time. The analyses concerning the butterfly species showed that grassland species reacted sensitively to the density of livestock in the communes. This might indicate the augmented use of dry grasslands as cattle pastures that show altered plant species compositions. Furthermore, these species also decreased in communes where farms with an agricultural area >5ha have disappeared. The species of the wetland habitats were favoured in communes with smaller fractions of agricultural areas and lower densities of large farms (>10ha) but did not show any correlation to transformation rates. It was concluded from these analyses that transformation rates might influence species disappearance to a certain extent but that states of the environmental predictors might generally outweigh the importance of the corresponding rates. Information on the current distribution of species is evident for nature conservation. Planning authorities that define priority areas for species protection or examine and authorise construction projects need to know about the spatial distribution of species. Hence, models that simulate the potential spatial distribution of species have become important decision tools. The underlying statistical analyses such as the widely used generalised linear models (GLM) often rely on binary species presence-absence data. However, often only species presence data have been colleted, especially for vagrant, rare or cryptic species such as butterflies or reptiles. Modellers have thus introduced randomly selected absence data to design distribution models. Yet, selecting false absence data might bias the model results. Therefore, we investigated several strategies to select more reliable absence data to model the distribution of butterfly species based on historical distribution data. The results showed that better models were obtained when historical data from longer time periods were considered. Furthermore, model performance was additionally increased when long-term data of species that show similar habitat requirements as the modelled species were used. This successful methodological approach was further applied to assess consequences of future landscape changes on the occurrence of butterfly species inhabiting dry grasslands or wetlands. These habitat types have been subjected to strong deterioration in the recent decades, what makes their protection a future mission. Four spatially explicit scenarios that described (i) ongoing land use changes as observed between 1985 and 1997, (ii) liberalised agricultural markets, and (iii) slightly and (iv) strongly lowered agricultural production provided probable directions of landscape change. Current species-environment relationships were derived from a statistical model and used to predict future occurrence probabilities in six major biogeographical regions in Switzerland, comprising the Jura Mountains, the Plateau, the Northern and Southern Alps, as well as the Western and Eastern Central Alps. The main results were that dry grasslands species profited from lowered agricultural production, whereas overgrowth of open areas in the liberalisation scenario might impair species occurrence. The wetland species mostly responded with decreases in their occurrence probabilities in the scenarios, due to a loss of their preferred habitat. Further analyses about factors currently influencing species occurrences confirmed anthropogenic causes such as urbanisation, abandonment of open land, and agricultural intensification. Hence, landscape planning should pay more attention to these forces in areas currently inhabited by these butterfly species to enable sustainable species persistence. In this thesis historical data were intensively used to reconstruct past developments and to make them useful for current investigations. Yet, the availability of historical data and the analyses on broader spatial scales has often limited the explanatory power of the conducted analyses. Meaningful descriptors of former habitat characteristics and abundant species distribution data are generally sparse, especially for fine scale analyses. However, this situation can be ameliorated by broadening the extent of the study site and the used grain size, as was done in this thesis by considering the whole of Switzerland with its communes. Nevertheless, current monitoring projects and data recording techniques are promising data sources that might allow more detailed analyses about effects of long-term species reactions on landscape changes in the near future. This work, however, also showed the value of historical species distribution data as for example their potential to locate still unknown species occurrences. The results might therefore contribute to further research activities that investigate current and future species distributions considering the immense richness of historical distribution data. Résumé Les paysages changent continuellement. Des farces naturelles comme des pluies violentes ou des feux peuvent avoir une influence durable sur la forme du paysage. Cependant, les changements attribués aux activités humaines ont souvent modelé les paysages plus profondément. Depuis les années 1950 surtout, les pratiques agricoles modernes ou l'expansion des surfaces d'habitat et d'infrastructure ont caractérisé le développement du paysage en Europe de l'Ouest. Ces dernières années, l'homme a commencé à réaliser que beaucoup de changements «naturels » pourraient indirectement résulter de ses propres activités. Le changement de paysage le plus apparent dont nous sommes témoins de nos jours est probablement l'immense retraite des glaciers alpins. Avec les paysages, les habitats des animaux et des plantes ont aussi été exposés à des changements vastes et quelquefois rapides, tenus pour coresponsable de la continuelle diminution de la biodiversité. Cependant, nous savons peu des effets probables de la rapidité des changements du paysage sur la persistance et la disparition des espèces. Le développement et la rapidité du changement de l'utilisation et de la couverture du sol dans les communes suisses entre les années 50 et 90 ont donc été reconstruits au moyen de 10 variables issues des recensements agricoles et résidentiels et ont été corrélés avec des changements de présence des papillons diurnes. Des analyses de groupes (Cluster analyses) ont été utilisées pour détecter des arrangements spatiaux de changements à l'échelle de la Suisse. Des communes avec des changements ou rapidités comparables ont été délimitées pour des décennies séparées et ont été placées en séquence temporelle, en rendrent une certaine dynamique du changement. Les résultats ont montré un remplacement répandu d'une agriculture extensive des pratiques intensives, une forte expansion des faubourgs urbains autour des grandes cités et des transitions vers de plus grandes surfaces d'exploitation dans les Alpes. Dans le cas des exploitations agricoles, des taux de changement croissants ont été observés jusqu'aux années 70, alors que la tendance a généralement été inversée dans les années suivantes. Par contre, la vitesse de construction des nouvelles maisons a montré des courbes positives pendant les 50 années. Les analyses sur la réaction des papillons diurnes ont montré que les espèces des prairies sèches supportaient une grande densité de bétail. Il est possible que dans ces communes beaucoup des prairies sèches aient été fertilisées et utilisées comme pâturages, qui ont une autre composition floristique. De plus, les espèces ont diminué dans les communes caractérisées par une rapide perte des fermes avec une surface cultivable supérieure à 5 ha. Les espèces des marais ont été favorisées dans des communes avec peu de surface cultivable et peu de grandes fermes, mais n'ont pas réagi aux taux de changement. Il en a donc été conclu que la rapidité des changements pourrait expliquer les disparitions d'espèces dans certains cas, mais que les variables prédictives qui expriment des états pourraient être des descripteurs plus importants. Des informations sur la distribution récente des espèces sont importantes par rapport aux mesures pour la conservation de la nature. Pour des autorités occupées à définir des zones de protection prioritaires ou à autoriser des projets de construction, ces informations sont indispensables. Les modèles de distribution spatiale d'espèces sont donc devenus des moyens de décision importants. Les méthodes statistiques courantes comme les modèles linéaires généralisés (GLM) demandent des données de présence et d'absence des espèces. Cependant, souvent seules les données de présence sont disponibles, surtout pour les animaux migrants, rares ou cryptiques comme des papillons ou des reptiles. C'est pourquoi certains modélisateurs ont choisi des absences au hasard, avec le risque d'influencer le résultat en choisissant des fausses absences. Nous avons établi plusieurs stratégies, basées sur des données de distribution historique des papillons diurnes, pour sélectionner des absences plus fiables. Les résultats ont démontré que de meilleurs modèles pouvaient être obtenus lorsque les données proviennent des périodes de temps plus longues. En plus, la performance des modèles a pu être augmentée en considérant des données de distribution à long terme d'espèces qui occupent des habitats similaires à ceux de l'espèce cible. Vu le succès de cette stratégie, elle a été utilisée pour évaluer les effets potentiels des changements de paysage futurs sur la distribution des papillons des prairies sèches et marais, deux habitats qui ont souffert de graves détériorations. Quatre scénarios spatialement explicites, décrivant (i) l'extrapolation des changements de l'utilisation de sol tels qu'observés entre 1985 et 1997, (ii) la libéralisation des marchés agricoles, et une production agricole (iii) légèrement amoindrie et (iv) fortement diminuée, ont été utilisés pour générer des directions de changement probables. Les relations actuelles entre la distribution des espèces et l'environnement ont été déterminées par le biais des modèles statistiques et ont été utilisées pour calculer des probabilités de présence selon les scénarios dans six régions biogéographiques majeures de la Suisse, comportant le Jura, le Plateau, les Alpes du Nord, du Sud, centrales orientales et centrales occidentales. Les résultats principaux ont montré que les espèces des prairies sèches pourraient profiter d'une diminution de la production agricole, mais qu'elles pourraient aussi disparaître à cause de l'embroussaillement des terres ouvertes dû à la libéralisation des marchés agricoles. La probabilité de présence des espèces de marais a décrû à cause d'une perte générale des habitats favorables. De plus, les analyses ont confirmé que des causes humaines comme l'urbanisation, l'abandon des terres ouvertes et l'intensification de l'agriculture affectent actuellement ces espèces. Ainsi ces forces devraient être mieux prises en compte lors de planifications paysagères, pour que ces papillons diurnes puissent survivre dans leurs habitats actuels. Dans ce travail de thèse, des données historiques ont été intensivement utilisées pour reconstruire des développements anciens et pour les rendre utiles à des recherches contemporaines. Cependant, la disponibilité des données historiques et les analyses à grande échelle ont souvent limité le pouvoir explicatif des analyses. Des descripteurs pertinents pour caractériser les habitats anciens et des données suffisantes sur la distribution des espèces sont généralement rares, spécialement pour des analyses à des échelles fores. Cette situation peut être améliorée en augmentant l'étendue du site d'étude et la résolution, comme il a été fait dans cette thèse en considérant toute la Suisse avec ses communes. Cependant, les récents projets de surveillance et les techniques de collecte de données sont des sources prometteuses, qui pourraient permettre des analyses plus détaillés sur les réactions à long terme des espèces aux changements de paysage dans le futur. Ce travail a aussi montré la valeur des anciennes données de distribution, par exemple leur potentiel pour aider à localiser des' présences d'espèces encore inconnues. Les résultats peuvent contribuer à des activités de recherche à venir, qui étudieraient les distributions récentes ou futures d'espèces en considérant l'immense richesse des données de distribution historiques.