948 results for generalized additive model
Abstract:
The objective of this study was to characterize the floristic composition and structure of the tree component in a fragment of Upper Montane Mixed Ombrophilous Forest and to evaluate the influence of the edge effect on the organization, structure, richness, and diversity of species. Fifty permanent plots of 10 x 20 m were allocated, divided among five transects spaced at least 100 m apart, in a forest fragment in the municipality of Bom Jardim da Serra, SC. Trees with a circumference at breast height (CBH) ≥ 15.7 cm were measured (CBH and total height), identified, and classified by regeneration guild (pioneers, light-demanding climax species, and shade-tolerant climax species). The data were analyzed using importance value indices (IVI), NMDS (Nonmetric Multidimensional Scaling), a generalized additive model, and simple linear regressions. A total of 1,457 individual trees were recorded, distributed across 29 families, 43 genera, and 55 species. The species with the highest importance value was Dicksonia sellowiana Hook. No influence of the edge effect was observed on the organization or structure (mean diameter, mean height, and density) of the community, or on the relative participation of the regeneration guilds. However, higher values of diversity, richness, and evenness were found in the edge areas. It is therefore concluded that part of the variation in tree species diversity in the Upper Montane Mixed Ombrophilous Forest was determined by distance from the edge.
Abstract:
The objective of this study was to estimate the spatial distribution of work accident risk in the informal work market in the urban zone of an industrialized city in southeast Brazil and to examine concomitant effects of age, gender, and type of occupation after controlling for spatial risk variation. The basic methodology adopted was that of a population-based case-control study with particular interest focused on the spatial location of work. Cases were all casual workers in the city suffering work accidents during a one-year period; controls were selected from the source population of casual laborers by systematic random sampling of urban homes. The spatial distribution of work accidents was estimated via a semiparametric generalized additive model with a nonparametric bidimensional spline of the geographical coordinates of cases and controls as the nonlinear spatial component, and including age, gender, and occupation as linear predictive variables in the parametric component. We analyzed 1,918 cases and 2,245 controls between 1/11/2003 and 31/10/2004 in Piracicaba, Brazil. Areas of significantly high and low accident risk were identified in relation to mean risk in the study region (p < 0.01). Work accident risk for informal workers varied significantly in the study area. Significant age, gender, and occupational group effects on accident risk were identified after correcting for this spatial variation. A good understanding of high-risk groups and high-risk regions underpins the formulation of hypotheses concerning accident causality and the development of effective public accident prevention policies.
Abstract:
The first objective of this study is to classify socio-demographic components at the city-section level in the City of Lisbon. To provide a suitable basis for a restaurant potentiality map, the socio-demographic components were selected to produce a map of spatial clusters according to restaurant suitability. The second objective is to obtain a potentiality map showing underestimation and overestimation in the number of restaurants. To the best of our knowledge, no identical methodology for estimating restaurant potentiality has been reported. The results were obtained by combining a SOM (Self-Organizing Map), which provides a segmentation map, with a GAM (Generalized Additive Model) with a spatial component for restaurant potentiality. The final results indicate that the strongest influences on restaurant potentiality are tourist sites, spatial autocorrelation in terms of neighboring restaurants (the spatial component), and tax value, whereas households with 1 or 2 members and the employed population are of lower importance. In addition, an important conclusion is that the most attractive market sites show no change or moderate underestimation in restaurant potentiality.
Abstract:
Coastal lagoons are semi-isolated ecosystems exposed to wide fluctuations of environmental conditions and showing habitat fragmentation. These features may play an important role in separating species into different populations, even at small spatial scales. In this study, we evaluate the concordance between mitochondrial (previously published) and nuclear data by analyzing the genetic variability of Pomatoschistus marmoratus in five localities, inside and outside the Mar Menor coastal lagoon (SE Spain), using eight microsatellites. High genetic diversity and similar levels of allele richness were observed across all loci and localities, although significant genic and genotypic differentiation was found between populations inside and outside the lagoon. In contrast to the FST values obtained from previous mitochondrial DNA analyses (control region), the microsatellite data exhibited significant differentiation among samples inside the Mar Menor and between lagoonal and marine samples. This pattern was corroborated using Cavalli-Sforza genetic distances. The habitat fragmentation inside the coastal lagoon and between lagoon and marine localities could be acting as a barrier to gene flow and contributing to the observed genetic structure. Our results from generalized additive models point to a significant link between extreme lagoonal environmental conditions (mainly maximum salinity) and P. marmoratus genetic composition. These environmental features could therefore also be acting on the genetic structure of coastal lagoon populations of P. marmoratus, favoring their genetic divergence. The mating strategy of P. marmoratus could also be influencing the results obtained from mitochondrial and nuclear DNA. Therefore, special consideration must be given to the selection of DNA markers depending on the reproductive strategy of the species.
Abstract:
The Belt and Road Initiative (BRI) is a project launched by the Chinese Government whose main goal is to connect more than 65 countries in Asia, Europe, Africa and Oceania by developing infrastructure and facilities. To support the prevention or mitigation of landslide hazards, which may affect the mainland infrastructure of the BRI, a landslide susceptibility analysis has been carried out in the countries involved. Because of the large study area, the analysis used a multi-scale approach that maps susceptibility first at continental scale and then at national scale. The study area selected for the continental assessment is South Asia, where a pixel-based landslide susceptibility map has been produced using the Weight of Evidence method and validated by Receiver Operating Characteristic (ROC) curves. We then selected the regions of west Tajikistan and north-east India for investigation at national scale. Data scarcity is a common condition for many countries involved in the Initiative. Therefore, in addition to the landslide susceptibility assessment of west Tajikistan, which was conducted using a Generalized Additive Model and validated by ROC curves, we examined, in the same study area, the effect of an incomplete landslide dataset on the prediction capacity of statistical models. The entire PhD research activity has been conducted using only open data and open-source software. In this context, an open-source plugin for QGIS has been implemented to support the analysis carried out in recent years. The SZ-tool allows the user to make susceptibility assessments from data preprocessing and susceptibility mapping through to the final classification. All the output data of the analyses conducted are freely available and downloadable. This text describes the research activity of the last three years. Each chapter reports the text of the articles published in international scientific journals during the PhD.
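The ROC-based validation mentioned in the abstract above can be illustrated with a minimal sketch. It computes the area under the ROC curve as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell. The function name and the sample values are hypothetical and are not taken from the thesis or from the SZ-tool.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count how often a positive case outranks a negative case.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up susceptibility scores and observed landslide presence (1) / absence (0).
susceptibility = [0.9, 0.8, 0.4, 0.3, 0.2]
landslide = [1, 0, 1, 0, 0]
auc = roc_auc(susceptibility, landslide)
```

An AUC of 0.5 corresponds to a model no better than chance, and 1.0 to perfect separation of landslide and non-landslide cells.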
Abstract:
In a sample of censored survival times, the presence of an immune proportion of individuals who are not subject to death, failure or relapse, may be indicated by a relatively high number of individuals with large censored survival times. In this paper the generalized log-gamma model is modified for the possibility that long-term survivors may be present in the data. The model attempts to separately estimate the effects of covariates on the surviving fraction, that is, the proportion of the population for which the event never occurs. The logistic function is used for the regression model of the surviving fraction. Inference for the model parameters is considered via maximum likelihood. Some influence methods, such as the local influence and total local influence of an individual are derived, analyzed and discussed. Finally, a data set from the medical area is analyzed under the log-gamma generalized mixture model. A residual analysis is performed in order to select an appropriate model.
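For orientation, a mixture model with a surviving fraction is commonly written as below. This is the standard textbook form under assumed notation, not necessarily the parameterization used in the paper: $p(x)$ denotes the surviving fraction for covariates $x$, $\beta$ a regression coefficient vector, and $S(t)$ the survival function of the susceptible individuals (here a generalized log-gamma model).

```latex
S_{\mathrm{pop}}(t \mid x) = p(x) + \bigl\{1 - p(x)\bigr\}\, S(t),
\qquad
p(x) = \frac{\exp(x^{\top}\beta)}{1 + \exp(x^{\top}\beta)}
```

Since $S(t) \to 0$ as $t \to \infty$, the population survival function levels off at $p(x)$, which is why many large censored survival times suggest an immune proportion.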
Abstract:
In epidemiologic studies, measurement error in dietary variables often attenuates the association between dietary intake and disease occurrence. To adjust for the attenuation caused by error in dietary intake, regression calibration is commonly used. To apply regression calibration, unbiased reference measurements are required. Short-term reference measurements for foods that are not consumed daily contain excess zeroes that pose challenges in the calibration model. We adapted a two-part regression calibration model, initially developed for multiple replicates of reference measurements per individual, to a single-replicate setting. We showed how to handle excess zero reference measurements with a two-step modeling approach, how to explore heteroscedasticity in the consumed amount with a variance-mean graph, how to explore nonlinearity with generalized additive modeling (GAM) and empirical logit approaches, and how to select covariates in the calibration model. The performance of the two-part calibration model was compared with that of its one-part counterpart. We used vegetable intake and mortality data from the European Prospective Investigation on Cancer and Nutrition (EPIC) study. In EPIC, reference measurements were taken with 24-hour recalls. For each of the three vegetable subgroups assessed separately, correcting for error with an appropriately specified two-part calibration model resulted in about a threefold increase in the strength of association with all-cause mortality, as measured by the log hazard ratio. We further found that the standard way of including covariates in the calibration model can lead to overfitting the two-part calibration model. Moreover, the extent of adjustment for error is influenced by the number and forms of covariates in the calibration model. For episodically consumed foods, we advise researchers to pay special attention to response distribution, nonlinearity, and covariate inclusion in specifying the calibration model.
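The idea behind a two-part calibration model can be summarized in one identity. This is a generic sketch under assumed notation, not the paper's own specification: $R$ is the reference measurement (for example a 24-hour recall), $Q$ the error-prone dietary measurement, and $Z$ the other covariates.

```latex
\mathrm{E}[R \mid Q, Z]
  = \Pr(R > 0 \mid Q, Z) \times \mathrm{E}[R \mid R > 0,\, Q, Z]
```

The first factor models whether the food was consumed at all on the recall day, which absorbs the excess zeroes; the second models the amount consumed on consumption days.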
Abstract:
Aim This study used data from temperate forest communities to assess: (1) five different stepwise selection methods with generalized additive models, (2) the effect of weighting absences to ensure a prevalence of 0.5, (3) the effect of limiting absences beyond the environmental envelope defined by presences, (4) four different methods for incorporating spatial autocorrelation, and (5) the effect of integrating an interaction factor defined by a regression tree on the residuals of an initial environmental model. Location State of Vaud, western Switzerland. Methods Generalized additive models (GAMs) were fitted using the grasp package (generalized regression analysis and spatial predictions, http://www.cscf.ch/grasp). Results Model selection based on cross-validation appeared to be the best compromise between model stability and performance (parsimony) among the five methods tested. Weighting absences returned models that perform better than models fitted with the original sample prevalence. This appeared to be mainly due to the impact of very low prevalence values on evaluation statistics. Removing zeroes beyond the range of presences on main environmental gradients changed the set of selected predictors, and potentially their response curve shape. Moreover, removing zeroes slightly improved model performance and stability when compared with the baseline model on the same data set. Incorporating a spatial trend predictor improved model performance and stability significantly. Even better models were obtained when including local spatial autocorrelation. A novel approach to include interactions proved to be an efficient way to account for interactions between all predictors at once. 
Main conclusions Models and spatial predictions of 18 forest communities were significantly improved by using either: (1) cross-validation as a model selection method, (2) weighted absences, (3) limited absences, (4) predictors accounting for spatial autocorrelation, or (5) a factor variable accounting for interactions between all predictors. The final choice of model strategy should depend on the nature of the available data and the specific study aims. Statistical evaluation is useful in searching for the best modelling practice. However, one should not neglect to consider the shapes and interpretability of response curves, as well as the resulting spatial predictions in the final assessment.
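Weighting absences to reach a prevalence of 0.5, point (2) in the abstract above, can be sketched as follows. This is a minimal illustration that assumes presences keep weight 1 and all absences share a single weight; the weighting scheme actually implemented in the grasp package is not described in the abstract, and the function name is hypothetical.

```python
def absence_weights(response):
    """Return per-observation weights giving a weighted prevalence of 0.5.

    response: iterable of 1 (presence) / 0 (absence).
    """
    response = list(response)
    n_pres = sum(response)
    n_abs = len(response) - n_pres
    if n_pres == 0 or n_abs == 0:
        raise ValueError("need at least one presence and one absence")
    # Assumption: presences keep weight 1; absences are scaled so that
    # their total weight equals the total weight of the presences.
    w_abs = n_pres / n_abs
    return [1.0 if y == 1 else w_abs for y in response]


y = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # made-up data: 2 presences, 8 absences
w = absence_weights(y)
weighted_prevalence = sum(wi for wi, yi in zip(w, y) if yi == 1) / sum(w)
```

The resulting weights would then be passed as observation weights to whatever model-fitting routine is used.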
Abstract:
Standard factorial designs sometimes may be inadequate for experiments that aim to estimate a generalized linear model, for example, for describing a binary response in terms of several variables. A method is proposed for finding exact designs for such experiments that uses a criterion allowing for uncertainty in the link function, the linear predictor, or the model parameters, together with a design search. Designs are assessed and compared by simulation of the distribution of efficiencies relative to locally optimal designs over a space of possible models. Exact designs are investigated for two applications, and their advantages over factorial and central composite designs are demonstrated.
Abstract:
This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long-standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two-step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models is promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.
Abstract:
Consider a multihop network comprising Ethernet switches. The traffic is described with flows and each flow is characterized by its source node, its destination node, its route and parameters in the generalized multiframe model. Output queues on Ethernet switches are scheduled by static-priority scheduling and tasks executing on the processor in an Ethernet switch are scheduled by stride scheduling. We present schedulability analysis for this setting.
Abstract:
The aim of this paper is to predict time series of SO2 concentrations emitted by coal-fired power stations in order to estimate emission episodes in advance and to analyze the influence of some meteorological variables on the prediction. An emission episode is said to occur when the series of bi-hourly means of SO2 is greater than a specific level. For coal-fired power stations it is essential to predict emission episodes sufficiently in advance so that appropriate preventive measures can be taken. We proposed a methodology to predict SO2 emission episodes based on an additive model and an algorithm for variable selection. The methodology was applied to the estimation of SO2 emissions registered at sampling locations near a coal-fired power station located in Northern Spain. The results obtained indicate a good performance of the model considering only two terms of the time series, and that the inclusion of the meteorological variables in the model is not significant.
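The episode definition in the abstract above can be illustrated with a minimal sketch. It assumes hourly SO2 readings, non-overlapping two-hour averaging windows, and a hypothetical threshold; the abstract specifies none of these, and the function names are illustrative only.

```python
from statistics import mean


def bihourly_means(hourly):
    """Average consecutive, non-overlapping pairs of hourly readings."""
    # Assumption: hourly input; a trailing unpaired reading is dropped.
    return [mean(hourly[i:i + 2]) for i in range(0, len(hourly) - 1, 2)]


def episode_flags(means, threshold):
    """Flag each bi-hourly mean that exceeds the threshold."""
    return [m > threshold for m in means]


so2 = [10.0, 20.0, 90.0, 130.0, 40.0, 60.0]  # made-up concentrations
means = bihourly_means(so2)
flags = episode_flags(means, threshold=100.0)  # hypothetical threshold
```

A forecasting model such as the additive model described in the abstract would predict the bi-hourly mean ahead of time, and the same thresholding would then be applied to the predictions.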
Abstract:
There is recent interest in the generalization of classical factor models in which the idiosyncratic factors are assumed to be orthogonal and there are identification restrictions on cross-sectional and time dimensions. In this study, we describe and implement a Bayesian approach to generalized factor models. A flexible framework is developed to determine the variations attributed to common and idiosyncratic factors. We also propose a unique methodology to select the (generalized) factor model that best fits a given set of data. Applying the proposed methodology to the simulated data and the foreign exchange rate data, we provide a comparative analysis between the classical and generalized factor models. We find that when there is a shift from classical to generalized, there are significant changes in the estimates of the structures of the covariance and correlation matrices while there are less dramatic changes in the estimates of the factor loadings and the variation attributed to common factors.
Abstract:
This paper introduces local distance-based generalized linear models. These models extend (weighted) distance-based linear models, first with the generalized linear model concept and then by localizing. Distances between individuals are the only predictor information needed to fit these models. They are therefore applicable to mixed (qualitative and quantitative) explanatory variables or when the regressor is of functional type. Models can be fitted and analysed with the R package dbstats, which implements several distance-based prediction methods.
Abstract:
Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves using methods including generalized least squares, spatial filters, wavelet revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included generalized least squares family of models and a Bayesian implementation of the conditional auto-regressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection when using the above methods. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.