29 results for Glmm


Relevance:

20.00%

Abstract:

This paper presents a two-step pseudo-likelihood estimation technique for generalized linear mixed models in which the random effects are correlated between groups. The core idea is to handle the intractable integrals in the likelihood function with a multivariate Taylor approximation. The accuracy of the estimation technique is assessed in a Monte Carlo study. An application with a binary response variable is presented using a real data set on credit defaults from two Swedish banks. Thanks to the two-step estimation technique, the proposed algorithm outperforms conventional pseudo-likelihood algorithms in terms of computational time.
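
For orientation, the intractable integral mentioned in this abstract can be written in generic GLMM notation. The symbols below ($y_{ij}$, $u$, $\beta$, $\Sigma$) and the normality of the random effects are assumptions made for illustration; they are not taken from the paper itself.

```latex
L(\beta, \Sigma)
  = \int \Bigl[ \prod_{i} \prod_{j} f\bigl(y_{ij} \mid u;\, \beta\bigr) \Bigr]
    \, \phi(u;\, 0, \Sigma) \, du
```

Here $f$ is the conditional exponential-family density of observation $j$ in group $i$ and $\phi$ is a multivariate normal density for the stacked random effects $u$. When $\Sigma$ links the random effects of different groups, the integral no longer splits into one low-dimensional integral per group, which is presumably why an approximation such as a Taylor expansion becomes attractive.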

Relevance:

10.00%

Abstract:

A two-component survival mixture model is proposed to analyse a set of ischaemic stroke-specific mortality data. The survival experience of stroke patients after index stroke may be described by a subpopulation of patients in the acute condition and another subpopulation of patients in the chronic phase. To adjust for the inherent correlation of observations due to random hospital effects, a mixture model of two survival functions with random effects is formulated. Assuming a Weibull hazard in both components, an EM algorithm is developed for the estimation of fixed effect parameters and variance components. A simulation study is conducted to assess the performance of the two-component survival mixture model estimators. Simulation results confirm the applicability of the proposed model in a small sample setting. Copyright (C) 2004 John Wiley & Sons, Ltd.
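
As a rough illustration of the model structure described in this abstract, the sketch below evaluates a two-component Weibull mixture survival function. It is not the authors' implementation: the random hospital effects and the EM estimation are omitted, and the function names, the shape/scale parameterization and all parameter values are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def mixture_survival(t, p_acute, acute, chronic):
    """Two-component mixture: p * S_acute(t) + (1 - p) * S_chronic(t)."""
    return (p_acute * weibull_survival(t, *acute)
            + (1.0 - p_acute) * weibull_survival(t, *chronic))


# Hypothetical (shape, scale in days) pairs, chosen only for illustration.
ACUTE = (0.8, 30.0)
CHRONIC = (1.2, 2000.0)

# Survival probability at 0 days, 30 days, 1 year and 10 years.
curve = [mixture_survival(t, 0.3, ACUTE, CHRONIC) for t in (0, 30, 365, 3650)]
```

The acute component drops quickly while the chronic component decays slowly, so the mixture curve shows a steep early decline followed by a long tail.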

Relevance:

10.00%

Abstract:

A mixture model incorporating long-term survivors has been adopted in the field of biostatistics where some individuals may never experience the failure event under study. The surviving fractions may be considered as cured. In most applications, the survival times are assumed to be independent. However, when the survival data are obtained from a multi-centre clinical trial, it is conceivable that the environmental conditions and facilities shared within a clinic affect the proportion cured as well as the failure risk for the uncured individuals. This necessitates a long-term survivor mixture model with random effects. In this paper, the long-term survivor mixture model is extended for the analysis of multivariate failure time data using the generalized linear mixed model (GLMM) approach. The proposed model is applied to analyse a numerical data set from a multi-centre clinical trial of carcinoma as an illustration. Some simulation experiments are performed to assess the applicability of the model based on the average biases of the estimates formed. Copyright (C) 2001 John Wiley & Sons, Ltd.
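
The long-term survivor (cure) mixture idea in this abstract is commonly written as follows. The notation ($\pi$ for the cured proportion, $S_u$ for the survival function of uncured individuals) is assumed here for illustration and may differ from the paper's.

```latex
S_{\mathrm{pop}}(t) = \pi + (1 - \pi)\, S_u(t), \qquad 0 \le \pi \le 1
```

Since $S_u(t) \to 0$ as $t \to \infty$, the population survival curve levels off at $\pi$ rather than at zero. In a random-effects version, clinic-level effects would typically enter the linear predictors for both $\pi$ and the hazard behind $S_u$, matching the abstract's remark that shared conditions affect both the proportion cured and the failure risk of the uncured.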

Relevance:

10.00%

Abstract:

In many occupational safety interventions, the objective is to reduce the injury incidence as well as the mean claims cost once injury has occurred. The claims cost data within a period typically contain a large proportion of zero observations (no claim). The distribution thus comprises a point mass at 0 mixed with a non-degenerate parametric component. Essentially, the likelihood function can be factorized into two orthogonal components. These two components relate respectively to the effect of covariates on the incidence of claims and the magnitude of claims, given that claims are made. Furthermore, the longitudinal nature of the intervention inherently imposes some correlation among the observations. This paper introduces a zero-augmented gamma random effects model for analysing longitudinal data with many zeros. Adopting the generalized linear mixed model (GLMM) approach reduces the original problem to the fitting of two independent GLMMs. The method is applied to evaluate the effectiveness of a workplace risk assessment teams program, trialled within the cleaning services of a Western Australian public hospital.
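
The factorization described in this abstract can be sketched in a few lines. The example below is a simplified illustration, not the paper's model: it ignores covariates and random effects, uses a single claim probability and a single gamma distribution, and all names and data values are hypothetical.

```python
import math


def gamma_logpdf(y, shape, mean):
    """Log-density of a gamma distribution parameterized by shape and mean."""
    scale = mean / shape
    return ((shape - 1.0) * math.log(y) - y / scale
            - shape * math.log(scale) - math.lgamma(shape))


def zag_loglik_parts(costs, p_claim, shape, mean):
    """Return (occurrence part, magnitude part) of the log-likelihood."""
    # Bernoulli part: was a claim made in the period or not?
    occurrence = sum(math.log(p_claim) if y > 0 else math.log(1.0 - p_claim)
                     for y in costs)
    # Gamma part: size of the claim, using positive costs only.
    magnitude = sum(gamma_logpdf(y, shape, mean) for y in costs if y > 0)
    return occurrence, magnitude


# Hypothetical claim costs; zeros mean "no claim" in the period.
costs = [0.0, 0.0, 120.0, 0.0, 950.0, 0.0]
occ, mag = zag_loglik_parts(costs, p_claim=2 / 6, shape=2.0, mean=535.0)
```

The total log-likelihood is `occ + mag`. Because the claim probability appears only in the first term and the gamma parameters only in the second, each term can be maximized separately, which is the sense in which the problem reduces to fitting two independent models.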

Relevance:

10.00%

Abstract:

The volume of data from genomics- and proteomics-based experiments is large and complex in structure. Only through efficient bioinformatic/biostatistical analysis is it possible to identify and characterize expression profiles of genes and proteins that are differentially expressed under different experimental conditions (EC). The main objective is to extend the computational and analytical capabilities of the software available for analysing this type of data, especially software applicable to two-dimensional difference gel electrophoresis (2D-DIGE) data. In DIGE, the most widely used statistical method is Student's t-test, whose application presupposes a single source of variation and the fulfilment of certain distributional assumptions about the data (such as independence and homogeneity of variances). These assumptions do not always hold in practice, which can lead to errors in the estimates of, and inferences about, the effects of interest. Generalized linear mixed models (GLMM) not only allow the incorporation of the effects assumed to affect the variation of the response, but also model covariance and correlation structures closer to those that arise in reality, removing the assumptions of independence and normality. These models, although more complex in essence, will simplify the analysis because the raw data are modelled directly, without applying transformations to obtain more symmetric distributions. They will also yield statistically more efficient estimates of the effects present and therefore a more accurate detection of the genes/proteins involved in biological processes of interest. The relevant feature of this technology is that the proteins present are not known a priori. They are identified by other, more costly techniques once a set of differential spots has been detected on the 2DE gels.
Reducing false positives is therefore fundamental in identifying such spots, since they lead to erroneous results and fictitious biological associations. This will be achieved not only by developing normalization techniques that explicitly incorporate the EC, but also by developing methods that move away from the Gaussian assumption and evaluate other distributional assumptions better suited to this type of data. Machine learning techniques will also be developed which, by optimizing specific cost functions, allow us to identify the subset of proteins with the greatest diagnostic potential. This project has a strong statistical/bioinformatic component, but we believe that the field of application, namely genomics and proteomics, will benefit most from the expected results. To this end, several databases from different experiments provided by various national and international research centres will be used.

Relevance:

10.00%

Abstract:

Understanding the drivers of population divergence, speciation and species persistence is of great interest to molecular ecology, especially for species-rich radiations inhabiting the world's biodiversity hotspots. The toolbox of population genomics holds great promise for addressing these key issues, especially if genomic data are analysed within a spatially and ecologically explicit context. We have studied the earliest stages of the divergence continuum in the Restionaceae, a species-rich and ecologically important plant family of the Cape Floristic Region (CFR) of South Africa, using the widespread CFR endemic Restio capensis (L.) H.P. Linder & C.R. Hardy as an example. We studied diverging populations of this morphotaxon for plastid DNA sequences and >14 400 nuclear DNA polymorphisms from Restriction site Associated DNA (RAD) sequencing and analysed the results jointly with spatial, climatic and phytogeographic data, using a Bayesian generalized linear mixed modelling (GLMM) approach. The results indicate that population divergence across the extreme environmental mosaic of the CFR is mostly driven by isolation by environment (IBE) rather than isolation by distance (IBD) for both neutral and non-neutral markers, consistent with genome hitchhiking or coupling effects during early stages of divergence. Mixed modelling of plastid DNA and single divergent outlier loci from a Bayesian genome scan confirmed the predominant role of climate and pointed to additional drivers of divergence, such as drift and ecological agents of selection captured by phytogeographic zones. Our study demonstrates the usefulness of population genomics for disentangling the effects of IBD and IBE along the divergence continuum often found in species radiations across heterogeneous ecological landscapes.

Relevance:

10.00%

Abstract:

Aim: To evaluate the effects of using distinct alternative sets of climatic predictor variables on the performance, spatial predictions and future projections of species distribution models (SDMs) for rare plants in an arid environment. Location: Atacama and Peruvian Deserts, South America (18°30'S - 31°30'S, 0 - 3 000 m). Methods: We modelled the present and future potential distributions of 13 species of Heliotropium sect. Cochranea, a plant group with a centre of diversity in the Atacama Desert. We developed and applied a sequential procedure, starting from monthly climate variables, to derive six alternative sets of climatic predictor variables. We used them to fit models with eight modelling techniques within an ensemble forecasting framework, and derived climate change projections for each of them. We evaluated the effects of using these alternative sets of predictor variables on performance, spatial predictions and projections of SDMs using Generalised Linear Mixed Models (GLMM). Results: The use of distinct sets of climatic predictor variables did not have a significant effect on overall metrics of model performance, but had significant effects on present and future spatial predictions. Main conclusion: Using different sets of climatic predictors can yield the same model fits but different spatial predictions of current and future species distributions. This represents a new form of uncertainty in model-based estimates of extinction risk that may need to be better acknowledged and quantified in future SDM studies.

Relevance:

10.00%

Abstract:

1. Digital elevation models (DEMs) are often used in landscape ecology to retrieve elevation or first derivative terrain attributes such as slope or aspect in the context of species distribution modelling. However, DEM-derived variables are scale-dependent and, given the increasing availability of very high-resolution (VHR) DEMs, their ecological relevance must be assessed for different spatial resolutions. 2. In a study area located in the Swiss Western Alps, we computed VHR DEM-derived variables related to morphometry, hydrology and solar radiation. Based on an original spatial resolution of 0.5 m, we generated DEM-derived variables at 1, 2 and 4 m spatial resolutions, applying a Gaussian Pyramid. Their associations with local climatic factors, measured by sensors (direct and ambient air temperature, air humidity and soil moisture) as well as ecological indicators derived from species composition, were assessed with multivariate generalized linear models (GLM) and mixed models (GLMM). 3. Specific VHR DEM-derived variables showed significant associations with climatic factors. In addition to slope, aspect and curvature, the underused wetness and ruggedness indices modelled measured ambient humidity and soil moisture, respectively. Remarkably, spatial resolution of VHR DEM-derived variables had a significant influence on models' strength, with coefficients of determination decreasing with coarser resolutions or showing a local optimum with a 2 m resolution, depending on the variable considered. 4. These results support the relevance of using multi-scale DEM variables to provide surrogates for important climatic variables such as humidity, moisture and temperature, offering suitable alternatives to direct measurements for evolutionary ecology studies at a local scale.

Relevance:

10.00%

Abstract:

This thesis presents methods for handling count data in particular and discrete data in general. It is part of a strategic project of the CRNSG, named CC-Bio, whose objective is to assess the impact of climate change on the distribution of animal and plant species. After a brief introduction to the concepts of biogeography and to generalized linear mixed models in Chapters 1 and 2 respectively, the thesis is organized around three main ideas. First, in Chapter 3 we introduce a new form of distribution whose components have Poisson or Skellam marginal distributions. This new specification makes it possible to incorporate relevant information about the nature of the correlations among all the components. We also present some properties of this distribution. Unlike the multivariate Poisson distribution that it generalizes, it can handle variables with positive and/or negative correlations. A simulation illustrates the estimation methods in the bivariate case. The results obtained with Bayesian Markov chain Monte Carlo (MCMC) methods indicate a fairly low relative bias of less than 5% for the regression coefficients of the means, unlike those of the covariance term, which appear somewhat more volatile. Second, Chapter 4 presents an extension of multivariate Poisson regression with random effects having a gamma density. Aware that species abundance data exhibit strong dispersion, which would make the resulting estimators and standard errors misleading, we favour an approach based on Monte Carlo integration using importance sampling.
The approach remains the same as in the previous chapter: the idea is to simulate independent latent variables and thereby return to the framework of a conventional generalized linear mixed model (GLMM) with random effects having a gamma density. Although the assumption that the dispersion parameters are known a priori seems too strong, a sensitivity analysis based on goodness of fit demonstrates the robustness of our method. Third, in the last chapter, we address the definition and construction of a measure of concordance, and hence of correlation, for zero-inflated data through Gaussian copula modelling. Unlike Kendall's tau, whose values lie in an interval whose bounds vary with the frequency of ties among pairs, this measure has the advantage of taking its values in (-1, 1). Originally introduced to model correlations between continuous variables, its extension to the discrete case entails certain restrictions. The new measure can be interpreted as the correlation between the continuous random variables whose discretization gives our non-negative discrete observations. Two methods for estimating zero-inflated models will be presented, in the frequentist and Bayesian settings, based respectively on maximum likelihood and Gauss-Hermite integration. Finally, a simulation study shows the robustness and the limits of our approach.

Relevance:

10.00%

Abstract:

Survival studies are concerned with the time that elapses from the start of the study (diagnosis of the disease, start of treatment, ...) until the event of interest occurs (death, cure, improvement, ...). However, this event is often observed more than once in the same individual during the follow-up period (multivariate survival data). In this case, a methodology different from that used in standard survival analysis is needed. The main problem posed by this type of data is that the observations may not be independent. Until now, this problem has been addressed in two different ways depending on the dependent variable. If this variable follows a distribution in the exponential family, generalized linear mixed models (GLMM) are used; if this variable is time, whose probability distribution does not belong to that family, multivariate survival analysis is used. The aim of this thesis is to unify these two approaches, that is, to use time as the dependent variable, with clusters of individuals or observations, within a GLMM, in order to introduce new methods for handling this type of data.

Relevance:

10.00%

Abstract:

A significant challenge in the prediction of climate change impacts on ecosystems and biodiversity is quantifying the sources of uncertainty that emerge within and between different models. Statistical species niche models have grown in popularity, yet no single best technique has been identified, reflecting differing performance in different situations. Our aim was to quantify uncertainties associated with the application of two complementary modelling techniques. Generalised linear mixed models (GLMM) and generalised additive mixed models (GAMM) were used to model the realised niche of ombrotrophic Sphagnum species in British peatlands. These models were then used to predict changes in Sphagnum cover between 2020 and 2050 based on projections of climate change and atmospheric deposition of nitrogen and sulphur. Over 90% of the variation in the GLMM predictions was due to niche model parameter uncertainty, dropping to 14% for the GAMM. After having covaried out other factors, average variation in predicted values of Sphagnum cover across UK peatlands was the next largest source of variation (8% for the GLMM and 86% for the GAMM). The better performance of the GAMM needs to be weighed against its tendency to overfit the training data. While our niche models are only a first approximation, we used them to undertake a preliminary evaluation of the relative importance of climate change and nitrogen and sulphur deposition and the geographic locations of the largest expected changes in Sphagnum cover. Predicted changes in cover were all small (generally <1% in an average 4 m² unit area) but also highly uncertain. Peatlands expected to be most affected by climate change in combination with atmospheric pollution were Dartmoor, Brecon Beacons and the western Lake District.

Relevance:

10.00%

Abstract:

The objective of this study was to evaluate the use of probit and logit link functions for the genetic evaluation of early pregnancy using simulated data. The following simulation/analysis structures were constructed: logit/logit, logit/probit, probit/logit, and probit/probit. The percentages of precocious females were 5, 10, 15, 20, 25 and 30% and were adjusted based on a change in the mean of the latent variable. The parametric heritability (h²) was 0.40. Simulation and genetic evaluation were implemented in the R software. Heritability estimates (ĥ²) were compared with h² using the mean squared error. Pearson correlations between predicted and true breeding values and the percentage of coincidence between true and predicted ranking, considering the 10% of bulls with the highest breeding values (TOP10), were calculated. The mean ĥ² values were under- and overestimated for all percentages of precocious females when the logit/probit and probit/logit models were used. In addition, the mean squared errors of these models were high when compared with those obtained with the probit/probit and logit/logit models. Considering ĥ², probit/probit and logit/logit were also superior to logit/probit and probit/logit, providing values close to the parametric heritability. Logit/probit and probit/logit presented low Pearson correlations, whereas the correlations obtained with probit/probit and logit/logit ranged from moderate to high. With respect to the TOP10 bulls, logit/probit and probit/logit presented much lower percentages than probit/probit and logit/logit. The genetic parameter estimates and predictions of breeding values of the animals obtained with the logit/logit and probit/probit models were similar. In contrast, the results obtained with probit/logit and logit/probit were not satisfactory. There is a need to compare the estimation and prediction ability of logit and probit link functions.
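
To make the phrase "adjusted based on a change in the mean of the latent variable" concrete, the sketch below computes the latent-scale mean that produces a target incidence under each link, assuming a threshold of 0 and unit-scale residuals. The study itself was carried out in R; this Python version, including the function names and the threshold convention, is an assumption made for illustration and not the authors' code.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probit link


def latent_mean(incidence, link):
    """Latent-scale mean that yields the target incidence at threshold 0."""
    if link == "probit":
        return STD_NORMAL.inv_cdf(incidence)
    if link == "logit":
        return math.log(incidence / (1.0 - incidence))
    raise ValueError(f"unknown link: {link}")


def incidence_from_mean(mu, link):
    """Inverse link: probability that the latent variable exceeds 0."""
    if link == "probit":
        return STD_NORMAL.cdf(mu)
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-mu))
    raise ValueError(f"unknown link: {link}")


# Percentages of precocious females listed in the abstract.
targets = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
shifts = {link: [latent_mean(p, link) for p in targets]
          for link in ("probit", "logit")}
```

For the same incidence, the two links place the latent mean at different values because the logistic and normal distributions have different scales and tails. Analysing data simulated under one link with the other link therefore misreads the latent scale, which is consistent with the poorer results the abstract reports for the mismatched logit/probit and probit/logit structures.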

Relevance:

10.00%

Abstract:

Patterns of spatio-temporal distribution of Brachyura are determined by the interaction among life history traits, inter- and intraspecific relationships, as well as by the variation of abiotic factors. This study aimed to characterize patterns of spatio-temporal distribution of Persephona lichtensteinii, Persephona mediterranea and Persephona punctata in two regions of the northern coast of São Paulo State, southeastern region of Brazil. Collections were done monthly from July 2001 to June 2003 in Caraguatatuba and Ubatuba, using a shrimp fishery boat equipped with double-rig nets. The patterns of species distribution were tested by means of redundancy analysis (RDA) and generalized linear mixed models (GLMM) in relation to the recorded environmental factors (BT: bottom temperature, BS: bottom salinity, OM: organic matter and granulometry (Phi)). The most influential environmental factor for species distribution was Phi, and the ascending order of influence was P. lichtensteinii, P. punctata and P. mediterranea. The greater abundance of P. mediterranea showed a conservative pattern of distribution for the genus in the sampled region. The greater occurrence of P. punctata and P. lichtensteinii in transects distinct from those occupied by P. mediterranea seems to be a strategy to avoid competition among congeneric species, which is related to the substratum specificity.