961 results for Generalized Additive Models


Relevância:

100.00%
Resumo:

We compare Bayesian methodology utilizing the freeware package BUGS (Bayesian Inference Using Gibbs Sampling) with the traditional structural equation modelling approach based on another freeware package, Mx. Dichotomous and ordinal (three-category) twin data were simulated according to different additive genetic and common environment models for phenotypic variation. Practical issues are discussed in using Gibbs sampling as implemented by BUGS to fit subject-specific Bayesian generalized linear models, where the components of variation may be estimated directly. The simulation study (based on 2000 twin pairs) indicated a consistent advantage in using the Bayesian method to detect a correct model under certain specifications of additive genetic and common environmental effects. For binary data, both methods had difficulty detecting the correct model when the additive genetic effect was low (between 10 and 20%) or moderate (between 20 and 40%). Furthermore, neither method could adequately detect a correct model that included a modest common environmental effect (20%), even when the additive genetic effect was large (50%). Power was significantly improved with ordinal data for most scenarios, except in the case of low heritability under a true ACE model. We illustrate and compare both methods using data from 1239 twin pairs over the age of 50 years who were registered with the Australian National Health and Medical Research Council Twin Registry (ATR) and presented symptoms associated with osteoarthritis occurring in joints of the hand.
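As a rough illustration of the Gibbs-sampling approach the abstract describes, the sketch below runs a minimal hand-rolled Gibbs sampler for a Gaussian random-intercept model y_ij = mu + a_i + e_ij, drawing the cluster effects, grand mean and the two variance components from conjugate full conditionals. This is a simplified Gaussian analogue, not the authors' BUGS/Mx code or their threshold models for binary/ordinal data; the data, priors and dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated clustered data: 200 clusters ("pairs") of size 2 (invented values)
n_g, n_per = 200, 2
true_mu, s_a, s_e = 1.0, 0.8, 0.5
a_true = rng.normal(0.0, s_a, n_g)
g = np.repeat(np.arange(n_g), n_per)
y = true_mu + a_true[g] + rng.normal(0.0, s_e, n_g * n_per)

def gibbs(y, g, n_g, iters=3000, burn=1000, seed=1):
    rng = np.random.default_rng(seed)
    n = len(y)
    counts = np.bincount(g, minlength=n_g)
    mu, s2a, s2e = y.mean(), 1.0, 1.0
    keep = []
    for it in range(iters):
        # cluster effects | rest: conjugate normal full conditional
        rsum = np.bincount(g, weights=y - mu, minlength=n_g)
        prec = counts / s2e + 1.0 / s2a
        a = rng.normal((rsum / s2e) / prec, np.sqrt(1.0 / prec))
        # grand mean | rest (flat prior)
        r = y - a[g]
        mu = rng.normal(r.mean(), np.sqrt(s2e / n))
        # variance components | rest: conjugate inverse-gamma, weak priors
        e = y - mu - a[g]
        s2e = 1.0 / rng.gamma(0.5 * n + 1.0, 1.0 / (0.5 * e @ e + 1e-3))
        s2a = 1.0 / rng.gamma(0.5 * n_g + 1.0, 1.0 / (0.5 * a @ a + 1e-3))
        if it >= burn:
            keep.append((mu, s2a, s2e))
    return np.array(keep)

post = gibbs(y, g, n_g)
print(post.mean(axis=0).round(2))  # posterior means of (mu, s2a, s2e)
```

The point of the exercise is the one the abstract makes: the variance components (here s2a and s2e) are sampled directly, rather than recovered indirectly from a structural model fit.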

Relevância:

100.00%
Resumo:

Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence-environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear/additive models, tree-based models, maximum entropy, etc.), and the variety of ways that such models can be implemented, permits substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence-environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building 'under fit' models, having insufficient flexibility to describe observed occurrence-environment relationships, we risk misunderstanding the factors shaping species distributions. By building 'over fit' models, with excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on study objective, attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing under fitting with over fitting and consequently how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinions that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM building approaches best advances our knowledge of current and future species ranges.

Relevância:

100.00%
Resumo:

Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Because the likelihood has no closed form, GLMMs are often fit by computational procedures such as penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iteratively weighted least squares (IWLS). High computational costs and memory constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases. This paper proposes a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM to subsetted versions of the data. Additional gains in efficiency are achieved for Poisson models, commonly used in disease mapping problems, because their special collapsibility property allows data reduction through summaries. Convergence of the proposed iterative procedure is guaranteed for canonical link functions. The strategy is applied to investigate the relationship between ischemic heart disease, socioeconomic status and age/gender category in New South Wales, Australia, based on outcome data consisting of approximately 33 million records. A simulation study demonstrates the algorithm's reliability in analyzing a data set with 12 million records for a (non-collapsible) logistic regression model.
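The collapsibility property mentioned above can be demonstrated in a small sketch (invented data and a minimal hand-rolled IWLS, not the paper's Gauss-Seidel algorithm): for a Poisson log-linear model with categorical covariates, aggregating counts by covariate pattern and fitting with a log cell-size offset reproduces the individual-level maximum-likelihood estimates exactly, which is what permits data reduction through summaries.

```python
import numpy as np

def fit_poisson(X, y, offset=None, iters=25):
    """Minimal IWLS (iteratively weighted least squares) for a Poisson GLM."""
    if offset is None:
        offset = np.zeros(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu          # working response, log link
        beta = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))
    return beta

rng = np.random.default_rng(42)
n = 10_000
age = rng.integers(0, 3, n)                        # 3 age bands (invented)
ses = rng.integers(0, 2, n)                        # 2 SES levels (invented)
X = np.column_stack([np.ones(n), age == 1, age == 2, ses == 1]).astype(float)
y = rng.poisson(np.exp(X @ np.array([-1.0, 0.3, 0.6, -0.4])))

# Individual-level fit: 10,000 rows
b_full = fit_poisson(X, y)

# Collapsed fit: one row per covariate pattern, offset = log(cell size)
patterns, idx, counts = np.unique(X, axis=0, return_inverse=True,
                                  return_counts=True)
y_sum = np.bincount(idx.ravel(), weights=y)
b_coll = fit_poisson(patterns, y_sum, offset=np.log(counts))

print(np.abs(b_full - b_coll).max())  # agrees to numerical precision
```

Both fits maximize the same likelihood because the cell sums are sufficient statistics, so the collapsed fit, here on 6 rows instead of 10,000, is not an approximation.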

Relevância:

100.00%
Resumo:

We introduce a new class of generalized isotropic Lipkin–Meshkov–Glick models with su(m+1) spin and long-range non-constant interactions, whose non-degenerate ground state is a Dicke state of su(m+1) type. We evaluate in closed form the reduced density matrix of a block of L spins when the whole system is in its ground state, and study the corresponding von Neumann and Rényi entanglement entropies in the thermodynamic limit. We show that both of these entropies scale as a log L when L tends to infinity, where the coefficient a is equal to (m − k)/2 in the ground state phase with k vanishing magnon densities. In particular, our results show that none of these generalized Lipkin–Meshkov–Glick models are critical, since when L → ∞ their Rényi entropy R_q becomes independent of the parameter q. We have also computed the Tsallis entanglement entropy of the ground state of these generalized su(m+1) Lipkin–Meshkov–Glick models, finding that it can be made extensive by an appropriate choice of its parameter only when m − k ≥ 3. Finally, in the su(3) case we construct in detail the phase diagram of the ground state in parameter space, showing that it is determined in a simple way by the weights of the fundamental representation of su(3). This is also true in the su(m+1) case; for instance, we prove that the region for which all the magnon densities are non-vanishing is an (m + 1)-simplex in R^m whose vertices are the weights of the fundamental representation of su(m+1).

Relevância:

100.00%
Resumo:

The authors would like to thank the College of Life Sciences of Aberdeen University and Marine Scotland Science, which funded CP's PhD project. Skate tagging experiments were undertaken as part of Scottish Government project SP004. We thank Ian Burrett for help in catching the fish, and the other fishermen and anglers who returned tags. We thank José Manuel Gonzalez-Irusta for extracting and making available the environmental layers used as covariates in the environmental suitability modelling procedure. We also thank Jason Matthiopoulos for insightful suggestions on habitat utilization metrics, as well as Stephen C.F. Palmer and three anonymous reviewers for useful suggestions that improved the clarity and quality of the manuscript.

Relevância:

100.00%
Resumo:

Human use of the oceans is increasingly in conflict with conservation of endangered species. Methods for managing the spatial and temporal placement of industries such as the military, fishing, transportation and offshore energy have historically been post hoc; i.e. the time and place of human activity is often already determined before environmental impacts are assessed. In this dissertation, I build robust species distribution models in two case study areas, the US Atlantic (Best et al. 2012) and British Columbia (Best et al. 2015), predicting presence and abundance, respectively, from scientific surveys. These models are then applied to novel decision frameworks for preemptively suggesting optimal placement of human activities in space and time to minimize ecological impacts: siting offshore wind energy development, and routing ships to minimize the risk of striking whales. Both decision frameworks relate the tradeoff between conservation risk and industry profit with synchronized variable and map views as online spatial decision support systems.

For siting offshore wind energy development (OWED) in the U.S. Atlantic (chapter 4), bird density maps are combined across species, weighted by each species' sensitivity to OWED collision and displacement, and 10 km² sites are compared against OWED profitability based on average annual wind speed at 90 m hub height and distance to the transmission grid. A spatial decision support system enables toggling between the map and tradeoff plot views by site. A selected site can be inspected for sensitivity to cetaceans throughout the year, so as to identify months that minimize episodic impacts of pre-operational activities such as seismic airgun surveying and pile driving.

Routing ships to avoid whale strikes (chapter 5) can similarly be viewed as a tradeoff, but is a different problem spatially. A cumulative cost surface is generated from density surface maps and the conservation status of cetaceans, and then applied as a resistance surface to calculate least-cost routes between start and end locations, i.e. ports and entrance locations to the study areas. Varying a multiplier on the cost surface enables calculation of multiple routes that trade off cost to cetacean conservation against cost to the transportation industry, measured as distance. As in the siting chapter, a spatial decision support system enables toggling between the map and tradeoff plot views of proposed routes. The user can also input arbitrary start and end locations to calculate the tradeoff on the fly.
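The least-cost routing idea can be sketched on a toy grid (invented numbers and plain Dijkstra, not the dissertation's implementation): a risk surface is scaled by a conservation-weight multiplier w and added to the per-step distance cost, so raising w buys lower strike risk at the price of a longer route.

```python
import heapq
import numpy as np

def least_cost_route(risk, start, end, w):
    """Dijkstra over a grid; entering a cell costs 1 + w * risk[cell].
    Larger w trades extra distance for lower strike risk."""
    n, m = risk.shape
    dist = np.full((n, m), np.inf)
    prev = {}
    dist[start] = 0.0
    pq = [(0.0, start)]
    while pq:
        d, (i, j) = heapq.heappop(pq)
        if (i, j) == end:
            break
        if d > dist[i, j]:
            continue
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < m:
                nd = d + 1.0 + w * risk[ni, nj]
                if nd < dist[ni, nj]:
                    dist[ni, nj] = nd
                    prev[(ni, nj)] = (i, j)
                    heapq.heappush(pq, (nd, (ni, nj)))
    # walk back through predecessors to recover the route
    path, node = [end], end
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Hypothetical whale-density hot spot blocking the direct shipping lane
risk = np.zeros((20, 20))
risk[6:15, 8:12] = 5.0                   # high-density block, rows 6-14
short = least_cost_route(risk, (10, 0), (10, 19), w=0.0)
safe = least_cost_route(risk, (10, 0), (10, 19), w=10.0)
print(len(short) - 1, len(safe) - 1)     # 19 29: detour costs 10 extra steps
```

With w = 0 the route runs straight through the hot spot in 19 steps; with w = 10 the optimal route detours around it in 29 steps while touching no high-risk cell, which is exactly the risk-versus-distance tradeoff the decision support system exposes.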

Essential inputs to these decision frameworks are the species' distributions. The two preceding chapters comprise species distribution models for the two case study areas, the U.S. Atlantic (chapter 2) and British Columbia (chapter 3), predicting presence and density, respectively. Although density is preferred for estimating potential biological removal, per Marine Mammal Protection Act requirements in the U.S., the parameters needed to estimate it, especially distance and angle of observation, are less readily available across publicly mined datasets.

In the case of predicting cetacean presence in the U.S. Atlantic (chapter 2), I extracted datasets from the online OBIS-SEAMAP geo-database, and integrated scientific surveys conducted by ship (n=36) and aircraft (n=16), weighting a Generalized Additive Model by minutes surveyed within space-time grid cells to harmonize effort between the two survey platforms. For each of 16 cetacean species guilds, I predicted the probability of occurrence from static environmental variables (water depth, distance to shore, distance to continental shelf break) and time-varying conditions (monthly sea-surface temperature). To generate maps of presence vs. absence, Receiver Operator Characteristic (ROC) curves were used to define the optimal threshold that minimizes false positive and false negative error rates. I integrated model outputs, including tables (species in guilds, input surveys) and plots (fit of environmental variables, ROC curve), into an online spatial decision support system, allowing for easy navigation of models by taxon, region, season, and data provider.
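The threshold-selection step described above can be illustrated generically (simulated scores, not the chapter's data): scan candidate cutoffs and keep the one maximizing sensitivity + specificity − 1 (Youden's J), which jointly minimizes the false positive and false negative rates along the ROC curve.

```python
import numpy as np

def best_threshold(scores, labels):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    order = np.argsort(-scores)
    s, lab = scores[order], labels[order]
    P = lab.sum()
    N = len(lab) - P
    tp = np.cumsum(lab)        # presences predicted present at cutoff s[i]
    fp = np.cumsum(1 - lab)    # absences wrongly predicted present
    J = tp / P - fp / N        # sensitivity - (1 - specificity)
    i = int(np.argmax(J))
    return s[i], float(J[i])

rng = np.random.default_rng(7)
pres = rng.normal(0.7, 0.15, 500).clip(0, 1)  # scores at presence cells
absn = rng.normal(0.3, 0.15, 500).clip(0, 1)  # scores at absence cells
scores = np.concatenate([pres, absn])
labels = np.concatenate([np.ones(500), np.zeros(500)])
thr, J = best_threshold(scores, labels)
print(round(float(thr), 2), round(J, 2))
```

The chosen cutoff then converts the continuous probability-of-occurrence surface into the presence/absence maps the abstract describes.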

For predicting cetacean density within the inner waters of British Columbia (chapter 3), I calculated density from systematic line-transect marine mammal surveys conducted by Raincoast Conservation Foundation over multiple years and seasons (summer 2004, 2005, 2008, and spring/autumn 2007). Abundance estimates were calculated using two different methods: Conventional Distance Sampling (CDS) and Density Surface Modelling (DSM). CDS generates a single density estimate for each stratum, whereas DSM explicitly models spatial variation and offers the potential for greater precision by incorporating environmental predictors. Although DSM yields a more relevant product for the purposes of marine spatial planning, CDS has proven useful where fewer observations are available for seasonal and inter-annual comparison, particularly for the rarely observed elephant seal. Abundance estimates are provided on a stratum-specific basis. Steller sea lions and harbour seals are further differentiated by 'hauled out' and 'in water'. This analysis updates previous estimates (Williams & Thomas 2007) by including additional years of effort, providing greater spatial precision with the DSM method over CDS, reporting for spring and autumn seasons (rather than summer alone), and providing new abundance estimates for Steller sea lion and northern elephant seal. In addition to providing a baseline of marine mammal abundance and distribution against which future changes can be compared, this information offers the opportunity to assess the risks posed to marine mammals by existing and emerging threats, such as fisheries bycatch, ship strikes, and increased oil spill and ocean noise risks associated with growth in container ship and oil tanker traffic in British Columbia's continental shelf waters.

Starting with marine animal observations at specific coordinates and times, I combine these data with environmental data, often satellite derived, to produce seascape predictions generalizable in space and time. These habitat-based models enable prediction of encounter rates and, in the case of density surface models, abundance, which can then be applied to management scenarios. Specific human activities, OWED and shipping, are then compared within a tradeoff decision support framework, enabling interchangeable map and tradeoff plot views. These products make complex processes transparent, allowing conservation interests, industry and other stakeholders to game scenarios towards optimal marine spatial management, fundamental to the tenets of marine spatial planning, ecosystem-based management and dynamic ocean management.

Relevância:

90.00%
Resumo:

Joint generalized linear models and double generalized linear models (DGLMs) were designed to model outcomes for which the variability can be explained using factors and/or covariates. When such factors operate, the usual normal regression models, which inherently assume constant variance, under-represent variation in the data and hence may lead to erroneous inferences. For count and proportion data, such noise factors can generate a so-called overdispersion effect, and the use of binomial and Poisson models underestimates the variability and, consequently, incorrectly indicates significant effects. In this manuscript, we propose a DGLM from a Bayesian perspective, focusing on the case of proportion data, where the overdispersion can be modeled using a random effect that depends on some noise factors. The joint posterior density was sampled using Markov chain Monte Carlo algorithms, allowing inference on the model parameters. An application to a data set on apple tissue culture is presented, showing that the Bayesian approach is quite feasible even when limited prior information is available, thereby generating valuable insight for the researcher about the experimental results.
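The overdispersion mechanism invoked here, a random effect on the linear predictor inflating variability beyond the binomial, is easy to demonstrate by simulation (all numbers invented; this is the data-generating idea, not the authors' MCMC fit):

```python
import numpy as np

rng = np.random.default_rng(5)
n_trials, n_units = 20, 2000
p_mean = 0.3

# Pure binomial counts vs. counts with a unit-level logit random effect
y_bin = rng.binomial(n_trials, p_mean, n_units)
u = rng.normal(0, 0.8, n_units)                    # "noise factor" effect
p_i = 1 / (1 + np.exp(-(np.log(p_mean / (1 - p_mean)) + u)))
y_od = rng.binomial(n_trials, p_i)                 # overdispersed counts

binom_var = n_trials * p_mean * (1 - p_mean)       # nominal binomial variance
print(round(float(y_bin.var()), 2), round(float(y_od.var()), 2), binom_var)
```

The empirical variance of the random-effect counts is several times the nominal binomial variance, which is exactly why a plain binomial model would understate uncertainty and flag spurious effects.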

Relevância:

90.00%
Resumo:

We present new population growth models: generalized logistic models which are proportional to beta densities with shape parameters p and 2, where p > 1, with Malthusian parameter r. The complex dynamical behaviour of these models is investigated in the parameter space (r, p), in terms of topological entropy, using explicit methods, as the Malthusian parameter r increases. This parameter space is split into different regions according to the chaotic behaviour of the models.
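Up to normalization, a map proportional to a Beta(p, 2) density has the form f(x) = r·x^(p−1)(1−x). A quick numerical probe of its regular versus chaotic regimes is the Lyapunov exponent, a cruder companion to the topological entropy the paper uses; the sketch below (invented parameters) iterates the map and averages log|f′(x)| along the orbit.

```python
import numpy as np

def lyapunov(r, p, x0=0.3, n=20_000, burn=1_000):
    """Average log|f'(x)| along an orbit of f(x) = r * x**(p-1) * (1 - x)."""
    x, acc = x0, 0.0
    for i in range(n):
        deriv = r * ((p - 1) * x**(p - 2) * (1 - x) - x**(p - 1))
        if i >= burn:
            acc += np.log(abs(deriv) + 1e-300)  # guard against deriv == 0
        x = r * x**(p - 1) * (1 - x)
    return acc / (n - burn)

# p = 2 recovers the classical logistic map r*x*(1-x)
print(lyapunov(2.5, 2))  # negative: orbit settles on a stable fixed point
print(lyapunov(4.0, 2))  # positive: chaotic regime (theory: ln 2 ≈ 0.693)
```

A negative exponent signals convergence to a periodic attractor, a positive one sensitive dependence on initial conditions; sweeping (r, p) with this function gives a rough picture of the chaotic regions of the parameter space.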

Relevância:

90.00%
Resumo:

OBJECTIVE: To assess the lag structure between air pollution exposure and hospital admissions for cardiovascular disease among the elderly, by gender. METHODS: Health data for people aged 64 years or older in São Paulo city, Southeastern Brazil, from 1996 to 2001, were stratified by gender. Daily levels of air pollutants (CO, PM10, O3, NO2 and SO2), minimum temperature, and relative humidity were also analyzed. Generalized additive Poisson regressions were fitted, using constrained distributed lag models adjusted for long-term trend, weekdays, weather and holidays, to assess the lagged effects of air pollutants on hospital admissions up to 20 days after exposure. RESULTS: Interquartile range increases in PM10 (26.21 µg/m³) and SO2 (10.73 µg/m³) were associated with a 3.17% (95% CI: 2.09-4.25) increase in congestive heart failure admissions and a 0.89% (95% CI: 0.18-1.61) increase in total cardiovascular disease admissions at lag 0, respectively. Effects were higher among the female group for most of the analyzed outcomes. Effects of air pollutants on the different outcomes and gender groups were predominantly acute, and some "harvesting" was found. CONCLUSIONS: The results show that cardiovascular diseases in São Paulo are strongly affected by air pollution.
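A constrained (polynomial, Almon-type) distributed lag structure of the kind used in such studies can be sketched generically: instead of estimating 21 free lag coefficients (lags 0-20), they are constrained to a low-order polynomial in lag number, so only a handful of parameters are fit. Everything below is invented for illustration — simulated exposure, a minimal hand-rolled Poisson IRLS, and no weather or trend terms.

```python
import numpy as np

def fit_poisson(X, y, iters=30):
    """Minimal IRLS for a Poisson log-linear model (X column 0 = intercept)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-9)     # stable start at the mean rate
    for _ in range(iters):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        beta = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))
    return beta

def lag_matrix(x, max_lag):
    """Column j holds the exposure series lagged j days."""
    n = len(x)
    L = np.full((n, max_lag + 1), np.nan)
    for j in range(max_lag + 1):
        L[j:, j] = x[:n - j]
    return L

rng = np.random.default_rng(0)
x = rng.gamma(2.0, 10.0, 1500)                     # daily pollutant levels
L = lag_matrix(x, 20)
ok = ~np.isnan(L).any(axis=1)                      # drop the first 20 days
true_coef = np.array([0.004, 0.003, 0.002, 0.001] + [0.0] * 17)
y = rng.poisson(np.exp(1.0 + L[ok] @ true_coef))   # daily admission counts

# Constrain the 21 lag coefficients to a degree-4 polynomial in lag number
B = np.vander(np.arange(21.0) / 20.0, 5, increasing=True)   # 21 x 5 basis
Z = np.column_stack([np.ones(ok.sum()), L[ok] @ B])
theta = fit_poisson(Z, y)
lag_coef = B @ theta[1:]                           # back-transform to 21 lags
print(lag_coef.round(4))
```

The constraint smooths the estimated lag curve and concentrates the recovered effect at the short lags where it was simulated, mirroring the "predominantly acute" pattern reported above.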

Relevância:

90.00%
Resumo:

The aim is to examine temporal trends in hip fracture incidence in Portugal by sex and age group, and to explore the relation with anti-osteoporotic medication. From the National Hospital Discharge Database, we selected, from 1st January 2000 to 31st December 2008, 77,083 hospital admissions (77.4% women) caused by osteoporotic hip fractures (low energy, patients over 49 years of age), with diagnosis codes 820.x of ICD-9-CM. The 2001 Portuguese population was used as the standard to calculate direct age-standardized incidence rates (ASIR) per 100,000 inhabitants. Generalized additive and linear models were used to evaluate and quantify temporal trends in age-specific rates (AR), by sex. We identified 2003 as a turning point in the trend of the ASIR of hip fractures in women. After 2003, the ASIR in women decreased on average by 10.3 cases/100,000 inhabitants, 95% CI (−15.7 to −4.8), per 100,000 anti-osteoporotic medication packages sold. For women aged 65–69 and 75–79 we identified the same turning point. However, for women aged over 80, the year 2004 marked a change in the trend, from an increase to a decrease. Among the population aged 70–74 a linear decrease in incidence rate (95% CI) was observed in both sexes, greater for women: −28.0% (−36.2 to −19.5) change vs −18.8% (−32.6 to −2.3). The abrupt turning point in the trend of the ASIR of hip fractures in women is compatible with an intervention, such as a medication. The trends differed by gender and age group, but are compatible with the pattern of bisphosphonate sales.

Relevância:

90.00%
Resumo:

1. Model-based approaches have been used increasingly in conservation biology over recent years. Species presence data used for predictive species distribution modelling are abundant in natural history collections, whereas reliable absence data are sparse, most notably for vagrant species such as butterflies and snakes. As predictive methods such as generalized linear models (GLM) require absence data, various strategies have been proposed to select pseudo-absence data. However, only a few studies exist that compare different approaches to generating these pseudo-absence data. 2. Natural history collection data are usually available for long periods of time (decades or even centuries), thus allowing historical considerations. However, this historical dimension has rarely been assessed in studies of species distribution, although there is great potential for understanding current patterns, i.e. the past is the key to the present. 3. We used GLM to model the distributions of three 'target' butterfly species, Melitaea didyma, Coenonympha tullia and Maculinea teleius, in Switzerland. We developed and compared four strategies for defining pools of pseudo-absence data and applied them to natural history collection data from the last 10, 30 and 100 years. Pools included: (i) sites without target species records; (ii) sites where butterfly species other than the target species were present; (iii) sites without butterfly species but with habitat characteristics similar to those required by the target species; and (iv) a combination of the second and third strategies. Models were evaluated and compared by the total deviance explained, the maximized Kappa and the area under the curve (AUC). 4. Among the four strategies, model performance was best for strategy 3. Contrary to expectations, strategy 2 resulted in even lower model performance than models with pseudo-absence data simulated totally at random (strategy 1). 5. Independent of the strategy, model performance was enhanced when sites with historical species presence data were not used as pseudo-absence data. Therefore, the combination of strategy 3 with species records from the last 100 years achieved the highest model performance. 6. Synthesis and applications. The protection of suitable habitat for species survival or reintroduction in rapidly changing landscapes is a high priority among conservationists. Model-based approaches offer planning authorities the possibility of delimiting priority areas for species detection or habitat protection. The performance of these models can be enhanced by fitting them with pseudo-absence data drawn from large archives of natural history collection species presence data rather than with randomly sampled pseudo-absence data.
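The AUC used above to compare the pseudo-absence strategies has a simple rank interpretation: it is the probability that a randomly chosen presence site receives a higher model score than a randomly chosen (pseudo-)absence site. A generic sketch with simulated scores (not the study's models):

```python
import numpy as np

def auc(pres_scores, abs_scores):
    """AUC via the Mann-Whitney statistic: P(presence score > absence score).
    Assumes continuous scores (no ties)."""
    all_scores = np.concatenate([pres_scores, abs_scores])
    ranks = all_scores.argsort().argsort() + 1   # 1-based ranks
    n1, n2 = len(pres_scores), len(abs_scores)
    r1 = ranks[:n1].sum()
    return float((r1 - n1 * (n1 + 1) / 2) / (n1 * n2))

rng = np.random.default_rng(1)
pres = rng.beta(5, 2, 300)   # model scores at known presence sites (invented)
pabs = rng.beta(2, 5, 300)   # scores at sampled pseudo-absence sites
print(round(auc(pres, pabs), 2))
```

An AUC near 0.5 means the model ranks pseudo-absences no lower than presences (no discrimination), which is why strategies yielding more informative pseudo-absence pools score higher on this measure.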

Relevância:

90.00%
Resumo:

Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that the potential for confounding in time series regression is very high; thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression, especially those allowing for serial correlation, which make them potentially useful for selected epidemiological purposes. In particular, we discuss the use of time-series regression for counts using a wide range of Generalised Linear Models as well as Generalised Additive Models. In addition, critical points in using statistical software for GAMs have recently been raised, and time series data on air pollution and health were reanalysed in order to update previously published results. Applications are illustrated through an example on the relationship between asthma emergency admissions and photochemical air pollutants.

Relevância:

90.00%
Resumo:

The EMECAM Project demonstrated the short-term effect of air pollution on mortality in 14 Spanish cities over the 1990-1995 period. The Spanish Multicentre Study on Health Effects of Air Pollution (EMECAS) broadens these objectives by incorporating more recent data and information on hospital disease admissions, and covers 16 Spanish cities in total. This is an ecological time series study in which the response variables are daily deaths and emergency hospitalizations due to circulatory system diseases and respiratory diseases among the residents of each city. Pollutants analysed: suspended particles, SO2, NO2, CO and O3. Control variables: meteorological and calendar variables, seasonality, trend and influenza incidence. Statistical analysis: estimation of the association in each city by means of generalized additive Poisson regression models, and meta-analysis to obtain combined estimators. The EMECAS Project began with the creation of three working groups (Exposure, Epidemiology and Analysis Methodology), which defined the protocol. The average levels of pollutants were below those established under the current regulations for sulfur dioxide, carbon monoxide and ozone. The NO2 and PM10 values were around the regulatory limit (40 µg/m³). This is the first study of the relationship between air pollution and morbidity in a group of Spanish cities. The pollution levels studied are moderate for some pollutants, although for others, especially NO2 and particles, they could pose a problem for complying with the regulations in force.

Relevância:

90.00%
Resumo:

The jumbo squid Dosidicus gigas (d'Orbigny, 1835) is an important predator in the Peruvian ecosystem. We postulate that the role of the jumbo squid varies with size, season, time of day, temperature and spatial distribution. To test this hypothesis, a generalized additive model (GAM) was applied to feeding data from 4178 jumbo squid caught by the industrial fishing fleet along the Peruvian coast (3°S to 18°S), from 2 to 299 nautical miles (nm) offshore, between 2004 and 2009, collected by the Trophic Ecology Laboratory of the Instituto del Mar del Perú (IMARPE). The squid studied ranged from 14 to 112 cm in mantle length (ML). In total, 43 prey items were recorded; the most important groups were cephalopods (Dosidicus gigas), teleosts (Photichthyidae, Myctophidae and Nomeidae) and malacostracan crustaceans (Euphausiidae). The main prey were D. gigas itself (indicating cannibalism) in gravimetric terms (%W = 35.4), other cephalopods by frequency of occurrence (FO = 14.4), and euphausiids in terms of relative abundance (%N = 62.2). These results reflect high dietary variability and a trophic spectrum similar to that reported at other latitudes in both hemispheres (Mexico and Chile). The GAMs show that all predictor variables were significant in relation to the response variable, stomach fullness (p < 0.0001). Stomach fullness was greater in juveniles, and consumption was higher at night; no feeding trends were observed in relation to sea surface temperature (SST), but spatial changes in the diet were observed, with fullness increasing with distance from the coast. We therefore conclude that the diet of the jumbo squid depends on size and on its spatio-temporal distribution.

Relevância:

90.00%
Resumo:

Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate them using simulation studies. Our comparative analysis involves methods including generalized least squares, spatial filters, wavelet revised models, conditional autoregressive models and generalized additive mixed models, used to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures, along with statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional autoregressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection seen with the above methods. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.
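The generalized least squares estimator at the core of the best-performing family simply whitens the regression with the error covariance: β̂ = (XᵀΣ⁻¹X)⁻¹ XᵀΣ⁻¹y. A toy version with a known exponential spatial covariance (simulated locations and data, not the paper's study design; in practice Σ is itself estimated):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
coords = rng.uniform(0, 10, (n, 2))            # random site locations
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
Sigma = np.exp(-d / 2.0)                       # exponential spatial covariance
C = np.linalg.cholesky(Sigma + 1e-8 * np.eye(n))

X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + C @ rng.normal(size=n)     # spatially correlated errors

# OLS ignores the correlation; GLS whitens with the (here known) covariance
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
Si = np.linalg.inv(Sigma + 1e-8 * np.eye(n))
b_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(b_ols.round(2), b_gls.round(2))
```

Both estimators are unbiased for the slope here, but only GLS uses the correct error covariance, which is what yields the properly calibrated standard errors, and hence the controlled Type I error rates, that the simulation study credits to the GLS family.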