932 results for Generalized Linear-models
Abstract:
We investigate the dependence of Bayesian error bars on the distribution of data in input space. For generalized linear regression models we derive an upper bound on the error bars which shows that, in the neighbourhood of the data points, the error bars are substantially reduced from their prior values. For regions of high data density we also show that the contribution to the output variance due to the uncertainty in the weights can exhibit an approximate inverse proportionality to the probability density. Empirical results support these conclusions.
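The claim that error bars shrink near the data can be illustrated with a minimal sketch. It assumes a linear-in-the-parameters model with two Gaussian basis functions, an isotropic Gaussian prior with precision `alpha`, and noise precision `beta`; the basis centres, widths, and all numeric values are illustrative assumptions, and the code does not reproduce the paper's bound.

```python
import math

def basis(x, centres=(-1.0, 1.0), width=1.0):
    """Two Gaussian basis functions phi_j(x); centres and width are illustrative."""
    return [math.exp(-0.5 * ((x - c) / width) ** 2) for c in centres]

def prior_weight_variance(x, alpha=1.0):
    """Output variance due to the weights before any data: phi(x)^T phi(x) / alpha."""
    return sum(p * p for p in basis(x)) / alpha

def posterior_weight_variance(x, data_x, alpha=1.0, beta=25.0):
    """phi(x)^T A^{-1} phi(x), with A = alpha*I + beta * sum_n phi(x_n) phi(x_n)^T."""
    a11 = a22 = alpha
    a12 = 0.0
    for xn in data_x:
        p1, p2 = basis(xn)
        a11 += beta * p1 * p1
        a12 += beta * p1 * p2
        a22 += beta * p2 * p2
    det = a11 * a22 - a12 * a12
    p1, p2 = basis(x)
    # Quadratic form with the explicit inverse of the 2x2 matrix A.
    return (a22 * p1 * p1 - 2.0 * a12 * p1 * p2 + a11 * p2 * p2) / det

data_x = [-1.1, -1.0, -0.9]                      # inputs clustered around x = -1
near = posterior_weight_variance(-1.0, data_x)   # inside the data cluster
far = posterior_weight_variance(1.0, data_x)     # away from the data
```

In this toy setting `near` is far below the prior value at the same input, while `far` stays close to its prior value.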
Abstract:
Background - The binding between peptide epitopes and major histocompatibility complex proteins (MHCs) is an important event in the cellular immune response. Accurate prediction of the binding between short peptides and the MHC molecules has long been a principal challenge for immunoinformatics. Recently, the modeling of MHC-peptide binding has come to emphasize quantitative predictions: instead of categorizing peptides as "binders" or "non-binders" or as "strong binders" and "weak binders", recent methods seek to make predictions about precise binding affinities. Results - We developed a quantitative support vector machine regression (SVR) approach, called SVRMHC, to model peptide-MHC binding affinities. As a non-linear method, SVRMHC was able to generate models that out-performed existing linear models, such as the "additive method". By adopting a new "11-factor encoding" scheme, SVRMHC takes into account similarities in the physicochemical properties of the amino acids constituting the input peptides. When applied to MHC-peptide binding data for three mouse class I MHC alleles, the SVRMHC models produced more accurate predictions than those produced previously. Furthermore, comparisons based on Receiver Operating Characteristic (ROC) analysis indicated that SVRMHC was able to out-perform several prominent methods in identifying strongly binding peptides. Conclusion - As a method with demonstrated performance in the quantitative modeling of MHC-peptide binding and in identifying strong binders, SVRMHC is a promising immunoinformatics tool with not inconsiderable future potential.
Abstract:
Prognostic procedures can be based on ranked linear models. Ranked regression-type models are designed on the basis of feature vectors combined with a set of relations defined on selected pairs of these vectors. Feature vectors are composed of numerical results of measurements on particular objects or events. Ranked relations defined on selected pairs of feature vectors represent additional knowledge and can reflect experts' opinions about the objects considered. Ranked models have the form of linear transformations of feature vectors onto a line which preserve a given set of relations in the best manner possible. Ranked models can be designed through the minimization of a special type of convex and piecewise linear (CPL) criterion function. Some sets of ranked relations cannot be well represented by a single ranked model. Decomposing the global model into a family of local ranked models could improve the representation. A procedure for decomposing ranked models is described in this paper.
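As a rough illustration of a CPL criterion for ranked pairs, the sketch below sums hinge-type penalties that vanish once the projection `w·x` preserves each relation by at least a margin. The exact penalty form, the `margin` parameter, and the function name are assumptions made for illustration, not the criterion defined in the paper.

```python
def cpl_criterion(w, ranked_pairs, margin=1.0):
    """Sum of hinge-type penalties over pairs (x_lo, x_hi), where x_hi should
    be projected above x_lo on the line defined by the weight vector w."""
    total = 0.0
    for x_lo, x_hi in ranked_pairs:
        # Difference of the two projections: w . (x_hi - x_lo)
        diff = sum(wk * (hi - lo) for wk, hi, lo in zip(w, x_hi, x_lo))
        # Piecewise linear and convex in w: zero once the margin is met.
        total += max(0.0, margin - diff)
    return total

pairs = [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (2.0, 1.0))]
good = cpl_criterion((1.0, 0.0), pairs)    # both relations preserved
bad = cpl_criterion((-1.0, 0.0), pairs)    # both relations reversed
```

Because each term is a maximum of two linear functions of `w`, the sum is convex and piecewise linear, which is the property the abstract relies on.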
Abstract:
2000 Mathematics Subject Classification: 62H12, 62P99
Abstract:
The analysis and prediction of risk measures associated with price series movements are of strategic importance in the financial markets as well as to policy makers, in particular for short- and long-term planning when setting economic growth targets. For example, oil-price risk management focuses primarily on when and how an organization can best prevent costly exposure to price risk. Value-at-Risk (VaR) is the commonly practised instrument to measure risk and is evaluated by analysing the negative/positive tail of the probability distributions of the returns (profit or loss). In modelling applications, least-squares estimation (LSE)-based linear regression models are often employed for modelling and analysing correlated data. These linear models are optimal and perform relatively well under conditions such as errors following normal or approximately normal distributions, being free of large outliers and satisfying the Gauss-Markov assumptions. However, in practical situations the LSE-based linear regression models often fail to provide optimal results, for instance in non-Gaussian situations, especially when the errors follow distributions with fat tails and error terms possess a finite variance. This is the situation in risk analysis, which involves analysing tail distributions. Thus, the appropriateness of LSE-based regression models may be questioned, and they may have limited applicability. We have carried out a risk analysis of Iranian crude oil price data based on Lp-norm regression models and have noted that the LSE-based models do not always perform best. We discuss results from the L1-, L2- and L∞-norm based linear regression models. ACM Computing Classification System (1998): B.1.2, F.1.3, F.2.3, G.3, J.2.
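The difference between the three norms is easiest to see in the intercept-only special case, where the minimizers have closed forms: the median (L1), the mean (L2), and the midrange (L∞). The sketch below covers only this special case, with made-up data containing one large outlier; it is not the regression procedure or the data used in the paper.

```python
import statistics

def lp_location_estimates(y):
    """Minimizers of sum|y_i - c| (L1), sum (y_i - c)^2 (L2) and
    max|y_i - c| (L-infinity) for a constant-only model."""
    return {
        "L1": statistics.median(y),
        "L2": statistics.mean(y),
        "Linf": (min(y) + max(y)) / 2.0,
    }

# Illustrative sample with one large outlier in the upper tail.
estimates = lp_location_estimates([1.0, 2.0, 3.0, 4.0, 100.0])
# estimates == {"L1": 3.0, "L2": 22.0, "Linf": 50.5}
```

The L1 fit ignores the size of the outlier, the L2 fit is pulled towards it, and the L∞ fit is driven entirely by the extremes.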
Abstract:
This thesis used four different methods to diagnose precipitation extremes over Northeastern Brazil (NEB): Generalized Linear Models via logistic and Poisson regression, extreme value theory analysis via the generalized extreme value (GEV) and generalized Pareto (GPD) distributions, and Vectorial Generalized Linear Models via GEV (MVLG GEV). The logistic and Poisson regression models were used to identify the interactions between precipitation extremes and other variables based on odds ratios and relative risks. Outgoing longwave radiation was found to be the indicator variable for the occurrence of extreme precipitation over eastern, northern and semi-arid NEB, and relative humidity over southern NEB. The GEV and GPD distributions (based on the 95th percentile) showed that the location and scale parameters reached their maxima on the eastern and northern coast of NEB; the GEV identified a core of maximum values over western Pernambuco, influenced by weather systems and topography. For the GEV and GPD shape parameter, the data for most regions were fitted by the negative Weibull and Beta distributions (ξ < 0), respectively. According to the GEV (GPD) return levels and return periods, northern Maranhão (central Bahia) may experience at least one extreme precipitation event exceeding 160.9 mm/day (192.3 mm/day) in the next 30 years. The MVLG GEV model found that the zonal and meridional wind components, evaporation, and Atlantic and Pacific sea surface temperature boost precipitation extremes. The GEV parameters show the following results: a) location (μ): the highest value was 88.26 ± 6.42 mm, over northern Maranhão; b) scale (σ): most regions showed positive values, except southern Maranhão; and c) shape (ξ): most of the selected regions were fitted by the negative Weibull distribution (ξ < 0). Southern Maranhão and southern Bahia have greater accuracy. 
For the return level, it was estimated that central Bahia may experience at least one extreme precipitation event equal to or exceeding 571.2 mm/day in the next 30 years.
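A T-year return level follows from the GEV quantile function. The sketch below uses the standard GEV parameterization with location μ, scale σ and shape ξ. Only the location value 88.26 comes from the abstract; the scale and shape values are hypothetical placeholders, so the resulting number is not a result of the thesis.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z) = exp(-[1 + xi (z - mu) / sigma]^(-1/xi))."""
    if abs(xi) < 1e-12:                       # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:                              # outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_return_level(period, mu, sigma, xi):
    """Level exceeded on average once every `period` blocks (e.g. years)."""
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

# mu is taken from the abstract; sigma and xi are hypothetical.
z30 = gev_return_level(30, mu=88.26, sigma=20.0, xi=-0.1)
```

With ξ < 0 (the negative Weibull case reported for most regions) the distribution has a finite upper endpoint at μ − σ/ξ, so every return level stays below that value.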
Abstract:
A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
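One plausible way to build such a Gaussian approximation is moment matching: draw from a normal distribution with the mean and variance of PG(b, c). The sketch below uses the closed-form PG(b, c) moments and clips draws at a small positive value; whether the dissertation uses this construction is an assumption, and the function names are hypothetical.

```python
import math
import random

def polya_gamma_moments(b, c):
    """Mean and variance of a Polya-Gamma PG(b, c) random variable."""
    if abs(c) < 1e-3:
        # Limits as c -> 0, used to avoid cancellation in sinh(c) - c.
        return b / 4.0, b / 24.0
    mean = b / (2.0 * c) * math.tanh(c / 2.0)
    var = b / (4.0 * c ** 3) * (math.sinh(c) - c) / math.cosh(c / 2.0) ** 2
    return mean, var

def polya_gamma_gaussian_draw(b, c, rng):
    """Moment-matched normal draw; intended only for large b (high counts)."""
    mean, var = polya_gamma_moments(b, c)
    # PG variables are positive, so clip the rare non-positive normal draw.
    return max(rng.gauss(mean, math.sqrt(var)), 1e-12)

rng = random.Random(0)
omega = polya_gamma_gaussian_draw(b=500, c=1.3, rng=rng)
```

A PG(b, c) variable with integer b is a sum of b independent PG(1, c) variables, so the central limit theorem makes a normal approximation reasonable when b, which corresponds to the count total, is large.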
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
Abstract:
Background: Conifer populations appear disproportionately threatened by global change. Most examples are, however, drawn from the northern hemisphere and long-term rates of population decline are not well documented as historical data are often lacking. We use a large and long-term (1931-2013) repeat photography dataset together with environmental data and fire records to account for the decline of the critically endangered Widdringtonia cedarbergensis. Eighty-seven historical and repeat photo-pairs were analysed to establish 20th century changes in W. cedarbergensis demography. A generalized linear mixed-effects model was fitted to determine the relative importance of environmental factors and fire-return interval on mortality for the species. Results: From an initial total of 1313 live trees in historical photographs, 74% had died and only 44 (3.4%) had recruited in the repeat photographs, leaving 387 live individuals. Juveniles (mature adults) had decreased (increased) from 27% (73%) to 8% (92%) over the intervening period. Our model demonstrates that mortality is related to greater fire frequency, higher temperatures, lower elevations, less rocky habitats and aspect (i.e. east-facing slopes had the least mortality). Conclusions: Our results show that W. cedarbergensis populations have declined significantly over the recorded period, with a pronounced decline in the last 30 years. Individuals that established in open habitats at lower, hotter elevations and experienced a greater fire frequency appear to be more vulnerable to mortality than individuals growing within protected, rocky environments at higher, cooler locations with less frequent fires. Climate models predict increasing temperatures for our study area (and likely increases in wildfires). If these predictions are realised, further declines in the species can be expected. 
Urgent management interventions, including seedling out-planting in fire-protected high elevation sites, reducing fire frequency in higher elevation populations, and assisted migration, should be considered.
Abstract:
Phytoplankton is a sentinel of marine ecosystem change. Composed of many species with different life-history strategies, it responds rapidly to environmental changes. An analysis of the abundance of 54 phytoplankton species in Galicia (NW Spain) between 1989 and 2008, carried out to determine the main components of temporal variability in relation to climate and upwelling, showed that most of this variability was stochastic, as seasonality and long-term trends contributed relatively small fractions of the series. In general, trends appeared to be non-linear, and species clustered in 4 groups according to the trend pattern, but there was no defined pattern for diatoms, dinoflagellates or other groups. While, in general, total abundance increased, no clear trend was found for 23 species, 14 species decreased, 4 species increased during the early 1990s, and only 13 species showed a general increase through the series. In contrast, series of local environmental conditions (temperature, stratification, nutrients) and climate-related variables (atmospheric pressure indices, upwelling winds) showed a high fraction of their variability in deterministic seasonality and trends. As a result, each species responded independently to environmental and climate variability, as measured by generalized additive models. Most species showed a positive relationship with nutrient concentrations but only a few showed a direct relationship with stratification and upwelling. Climate variables had measurable effects on only some species, and no common response emerged. Because of their adaptation to frequent disturbances, phytoplankton communities in upwelling ecosystems appear less sensitive to changes in regional climate than other communities characterized by short and well-defined productive periods.
Abstract:
We analyze a real data set pertaining to reindeer fecal pellet-group counts obtained from a survey conducted in a forest area in northern Sweden. In the data set, over 70% of counts are zeros, and there is high spatial correlation. We use conditionally autoregressive random effects to model spatial correlation in a Poisson generalized linear mixed model (GLMM), a quasi-Poisson hierarchical generalized linear model (HGLM), and zero-inflated Poisson (ZIP) and hurdle models. The quasi-Poisson HGLM allows for both under- and overdispersion with excessive zeros, while the ZIP and hurdle models allow only for overdispersion. In analyzing the real data set, we see that the quasi-Poisson HGLMs can perform better than the other commonly used models, for example, ordinary Poisson HGLMs, spatial ZIP, and spatial hurdle models, and that the underdispersed Poisson HGLMs with spatial correlation fit the reindeer data best. We develop R code for fitting these models using a unified algorithm for the HGLMs. Spatial count responses with an extremely high proportion of zeros and underdispersion can be successfully modeled using the quasi-Poisson HGLM with spatial random effects.
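The overdispersion built into the ZIP model can be checked directly from its probability mass function. This minimal sketch has no covariates or spatial random effects, and the parameter values are illustrative rather than estimates from the reindeer data.

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) count."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson

def zip_mean_var(lam, pi):
    """Closed-form mean and variance of the ZIP distribution."""
    mean = (1.0 - pi) * lam
    var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return mean, var

mean, var = zip_mean_var(lam=2.0, pi=0.7)   # illustrative values
p_zero = zip_pmf(0, lam=2.0, pi=0.7)        # about 0.74, i.e. mostly zeros
```

The variance-to-mean ratio is 1 + pi * lam, which exceeds 1 whenever pi > 0, so a ZIP model cannot represent underdispersed counts.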
Abstract:
For decades, global climate change has directly and indirectly affected the structure and function of ecosystems. Abrupt changes in biodiversity have been observed in response to linear or sudden modifications to the environment. These abrupt shifts can cause long-term reorganizations within ecosystems, with communities exhibiting new functional responses to environmental factors. Over the last 3 decades, the Gironde estuary in southwest France has experienced 2 abrupt shifts in both the physical and chemical environments and the pelagic community. Rather than describing these shifts and their origins, we focused on the 3 inter-shift periods, describing the structure of the fish community and its relationship with the environment during these periods. We described fish biodiversity using a limited set of descriptors, taking into account both species composition and relative species abundances. Inter-shift ecosystem states were defined based on the relationship between this description and the hydro-physico-chemical variables and climatic indices defining the main features of the environment. This relationship was described using generalized linear mixed models on the entire time series and for each inter-shift period. Our results indicate that (1) the fish community structure has been significantly modified, (2) environmental drivers influencing fish diversity have changed during these 3 periods, and (3) the fish-environment relationships have been modified over time. From this, we conclude a regime shift has occurred in the Gironde estuary. We also highlight that anthropogenic influences have increased, which re-emphasizes the importance of local management in maintaining fish diversity and associated goods and services within the context of climate change.
Abstract:
Endemic zoonotic diseases remain a serious but poorly recognised problem in affected communities in developing countries. Despite the overall burden of zoonoses on human and animal health, information about their impacts in endemic settings is lacking and most of these diseases continue to be neglected. The non-specific clinical presentation of these diseases has been identified as a major challenge in their identification (even with good laboratory diagnosis) and control. The signs and symptoms in animals and humans are easily confused with those of other, non-zoonotic diseases, leading to widespread misdiagnosis in areas where diagnostic capacity is limited. The communities most affected by these diseases live in close proximity with the animals on which they depend for their livelihood, which further complicates the understanding of the epidemiology of zoonoses. This thesis reviewed the pattern of reporting of zoonotic pathogens that cause febrile illness in malaria endemic countries, and evaluated the recognition of animal associations among other risk factors in the transmission and management of zoonoses. The findings of the review chapter were further investigated through a laboratory study of risk factors for bovine leptospirosis, and exposure patterns of livestock coxiellosis in the subsequent chapters. A review was undertaken on 840 articles that were part of a bigger review of zoonotic pathogens that cause human fever. The review process involved three main steps: filtering and reference classification, identification of abstracts that describe risk factors, and data extraction and summary analysis of data. Abstracts of the 840 references were transferred into a Microsoft Excel spreadsheet, where several subsets of abstracts were generated using Excel filters and text searches to classify the content of each abstract. 
Data were then extracted and summarised to describe geographical patterns of the pathogens reported, and to determine how frequently animal-related risk factors were considered among studies that investigated risk factors for zoonotic pathogen transmission. Subsequently, a seroprevalence study of bovine leptospirosis in northern Tanzania was undertaken in the second chapter of this thesis. The study involved screening of serum samples, which were obtained from an abattoir survey and cross-sectional study (Bacterial Zoonoses Project), for antibodies against Leptospira serovar Hardjo. The data were analysed using generalised linear mixed models (GLMMs) to identify risk factors for cattle infection. The final chapter was the analysis of Q fever data, which were also obtained from the Bacterial Zoonoses Project, to determine exposure patterns across livestock species using generalised linear mixed models (GLMMs). Leptospira spp. (10.8%, 90/840) and Rickettsia spp. (10.7%, 86/840) were identified as the most frequently reported zoonotic pathogens that cause febrile illness, while Rabies virus (0.4%, 3/840) and Francisella spp. (0.1%, 1/840) were least reported, across malaria endemic countries. The majority of the pathogens were reported in Asia, and the frequency of reporting seems to be higher in areas where outbreaks are mostly reported. It was also observed that animal-related risk factors are not often considered among other risk factors for zoonotic pathogens that cause human fever in malaria endemic countries. The seroprevalence study indicated that Leptospira serovar Hardjo is widespread in the cattle population in northern Tanzania, and that animal husbandry system and age are the two most important risk factors that influence seroprevalence. Cattle in pastoral systems and adult cattle were significantly more likely to be seropositive than non-pastoral and young animals respectively, while there was no significant effect of cattle breed or sex. 
Exposure patterns of Coxiella burnetii appear different for each livestock species. While most risk factors were identified for goats (such as animal husbandry systems, age and sex) and sheep (animal husbandry systems and sex), there were none for cattle. In addition, there was no evidence of a significant influence of mixed livestock-keeping on animal coxiellosis. Zoonotic agents that cause human fever are common in developing countries. The role of animals in the transmission of zoonotic pathogens that cause febrile illness is not fully recognised and appreciated. Since Leptospira spp. and C. burnetii are among the most frequently reported pathogens that cause human fever across malaria endemic countries, and are also prevalent in livestock populations, control and preventive measures that recognise animals as a source of infection would be very important, especially in livestock-keeping communities where people live in close proximity with their animals.
Abstract:
Species occurrence and abundance models are important tools that can be used in biodiversity conservation, and can be applied to predict or plan actions needed to mitigate the environmental impacts of hydropower dams. In this study our objectives were: (i) to model the occurrence and abundance of threatened plant species, (ii) to verify the relationship between predicted occurrence and true abundance, and (iii) to assess whether models based on abundance are more effective in predicting species occurrence than those based on presence–absence data. Individual representatives of nine species were counted within 388 randomly georeferenced plots (10 m × 50 m) around the Barra Grande hydropower dam reservoir in southern Brazil. We modelled their relationship with 15 environmental variables using both occurrence (Generalised Linear Models) and abundance data (Hurdle and Zero-Inflated models). Overall, occurrence models were more accurate than abundance models. For all species, observed abundance was significantly, although not strongly, correlated with the probability of occurrence. This correlation lost significance when zero-abundance (absence) sites were excluded from analysis, but only when this entailed a substantial drop in sample size. The same occurred when analysing relationships between abundance and probability of occurrence from previously published studies on a range of different species, suggesting that future studies could potentially use probability of occurrence as an approximate indicator of abundance when the latter is not possible to obtain. This possibility might, however, depend on life history traits of the species in question, with some traits favouring a relationship between occurrence and abundance. 
Reconstructing species abundance patterns from occurrence could be an important tool for conservation planning and the management of threatened species, allowing scientists to indicate the best areas for collection and reintroduction of plant germplasm or choose conservation areas most likely to maintain viable populations.
Abstract:
Resistant hypertension (RHTN) includes patients with controlled blood pressure (BP) (CRHTN) and uncontrolled BP (UCRHTN). RHTN patients are more likely to have target organ damage (TOD), and resistin, leptin and adiponectin may affect BP control in these subjects. We assessed the relationship between adipokine levels and arterial stiffness, left ventricular hypertrophy (LVH) and microalbuminuria (MA). This cross-sectional study included CRHTN (n=51) and UCRHTN (n=38) patients, in whom we evaluated body mass index, ambulatory blood pressure monitoring, plasma adiponectin, leptin and resistin concentrations, pulse wave velocity (PWV), MA and echocardiography. Leptin and resistin levels were higher in UCRHTN, whereas adiponectin levels were lower in this same subgroup. Similarly, arterial stiffness, LVH and MA were higher in the UCRHTN subgroup. Adiponectin levels were negatively correlated with PWV (r=-0.42, P<0.01) and MA (r=-0.48, P<0.01) only in UCRHTN. Leptin was positively correlated with PWV (r=0.37, P=0.02) in the UCRHTN subgroup, whereas resistin was not correlated with TOD in either subgroup. Adiponectin is associated with arterial stiffness and renal injury in UCRHTN patients, whereas leptin is associated with arterial stiffness in the same subgroup. Taken together, our results showed that these adipokines may contribute to vascular and renal damage in UCRHTN patients.
Abstract:
The use of screening techniques, such as an alternative light source (ALS), is important for finding biological evidence at a crime scene. The objective of this study was to evaluate whether biological fluid (blood, semen, saliva, and urine) deposited on different surfaces changes as a function of the age of the sample. Stains were illuminated with a Megamaxx™ ALS System and photographed with a Canon EOS Utility™ camera. Adobe Photoshop™ was utilized to prepare photographs for analysis, and then ImageJ™ was used to record the brightness values of pixels in the images. Data were submitted to analysis of variance using a generalized linear mixed model with two fixed effects (surface and fluid). Time was treated as a random effect (through repeated measures) with a first-order autoregressive covariance structure. Means of significant effects were compared by the Tukey test. The fluorescence of the analyzed biological material varied depending on the age of the sample. Fluorescence was lower when the samples were moist. Fluorescence remained constant when the sample was dry, up to the maximum period analyzed (60 days), independent of the substrate on which the fluid was deposited, showing the novelty of this study. Therefore, the forensic expert can detect biological fluids at the crime scene using an ALS even several days after a crime has occurred.
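The first-order autoregressive covariance structure mentioned above has a simple closed form: the covariance between measurements i and j is σ²ρ^|i−j|. The sketch below builds that matrix for equally spaced measurement occasions; the variance and correlation values are illustrative assumptions, not estimates from the study.

```python
def ar1_covariance(n_times, sigma2, rho):
    """AR(1) covariance matrix: cov(i, j) = sigma2 * rho ** |i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]

# Four repeated measurements with illustrative variance and correlation.
cov = ar1_covariance(4, sigma2=2.0, rho=0.5)
# cov[0] == [2.0, 1.0, 0.5, 0.25]
```

The structure needs only two parameters regardless of the number of occasions, and it encodes the assumption that measurements taken closer together in time are more strongly correlated.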