932 results for Generalized Linear-models
Abstract:
Magdalina Vasileva Todorova - The article describes an approach to verifying procedural programs by building models of them defined through generalized nets. The approach integrates the “design by contract” concept with verification approaches of the theorem-proving and model-consistency-checking type. To this end, the functions that make up the program are verified separately against specifications that reflect their purpose. A generalized net model is built that specifies the relationships between the functions as correct sequences of calls. A generalized net model is then constructed for the main function of the program, and it is checked whether this model conforms to the net model of the relationships between the program's functions. Each function of the program that uses other functions is also verified against the specification given by the net model of the relationships between the program's functions.
Abstract:
To provide biological insights into transcriptional regulation, a couple of groups have recently presented models relating the promoter DNA-bound transcription factors (TFs) to the downstream gene’s mean transcript level or transcript production rates over time. However, transcript production is dynamic in response to changes in TF concentrations over time. Also, TFs are not the only factors binding to promoters; other DNA binding factors (DBFs) bind as well, especially nucleosomes, resulting in competition between DBFs for binding at the same genomic location. Additionally, not only TFs but also other elements regulate transcription. Within the core promoter, various regulatory elements influence RNAPII recruitment, PIC formation, RNAPII searching for the TSS, and RNAPII initiating transcription. Moreover, it has been proposed that downstream of the TSS, nucleosomes resist RNAPII elongation.
Here, we provide a machine learning framework to predict transcript production rates from DNA sequences. We applied this framework in the yeast S. cerevisiae for two scenarios: a) to predict the dynamic transcript production rate during the cell cycle for native promoters; b) to predict the mean transcript production rate over time for synthetic promoters. As far as we know, our framework is the first successful attempt at a model that can predict dynamic transcript production rates from DNA sequences only: with the cell cycle data set, we obtained a Pearson correlation coefficient Cp = 0.751 and a coefficient of determination r2 = 0.564 on the test set for predicting the dynamic transcript production rate over time. Also, for the DREAM6 Gene Promoter Expression Prediction challenge, our fitted model outperformed all participating teams, including the best team, as well as a model combining the best team’s k-mer based sequence features with another paper’s biologically mechanistic features, in terms of all scoring metrics.
Moreover, our framework shows its capability of identifying generalizable features by interpreting the highly predictive models, and thereby provides support for associated hypothesized mechanisms of transcriptional regulation. With the learned sparse linear models, we obtained results supporting the following biological insights: a) TFs govern the probability of RNAPII recruitment and initiation, possibly through interactions with PIC components and transcription cofactors; b) the core promoter amplifies transcript production, probably by influencing PIC formation, RNAPII recruitment, DNA melting, RNAPII searching for and selecting the TSS, releasing RNAPII from general transcription factors, and thereby initiation; c) there is strong transcriptional synergy between TFs and core promoter elements; d) the regulatory elements within the core promoter region extend beyond the TATA box and the nucleosome-free region, suggesting the existence of still unidentified TAF-dependent and cofactor-dependent core promoter elements in the yeast S. cerevisiae; e) nucleosome occupancy is helpful for representing the regulatory roles of the +1 and -1 nucleosomes in transcription.
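The two test-set metrics quoted in this abstract can be related with a short sketch. It assumes that r2 is the square of the Pearson correlation between observed and predicted rates, which the abstract does not state explicitly; the `observed` and `predicted` arrays are hypothetical values for illustration only.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    yc = y_true - y_true.mean()
    pc = y_pred - y_pred.mean()
    return float((yc @ pc) / np.sqrt((yc @ yc) * (pc @ pc)))

# Hypothetical observed and predicted rates (not data from the study).
observed = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = np.array([1.2, 1.9, 3.4, 3.6, 5.1])
r = pearson_r(observed, predicted)
r_squared = r ** 2  # squared correlation, under the assumption stated above

# Figures quoted in the abstract; squaring the first gives the second.
reported_r, reported_r2 = 0.751, 0.564
```

Under that assumption the reported pair is internally consistent, since 0.751 squared rounds to 0.564.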
Abstract:
Includes index.
Abstract:
Species distribution and ecological niche models are increasingly used in biodiversity management and conservation. However, one thing that is important but rarely done is to follow up on the predictive performance of these models over time, to check if their predictions are fulfilled and maintain accuracy, or if they apply only to the set in which they were produced. In 2003, a distribution model of the Eurasian otter (Lutra lutra) in Spain was published, based on the results of a country-wide otter survey published in 1998. This model was built with logistic regression of otter presence-absence in UTM 10 km2 cells on a diverse set of environmental, human and spatial variables, selected according to statistical criteria. Here we evaluate this model against the results of the most recent otter survey, carried out a decade later and after a significant expansion of the otter distribution area in this country. Despite the time elapsed and the evident changes in this species’ distribution, the model maintained a good predictive capacity, considering both discrimination and calibration measures. Otter distribution did not expand randomly or simply towards vicinity areas, but specifically towards the areas predicted as most favourable by the model based on data from 10 years before. This corroborates the utility of predictive distribution models, at least in the medium term and when they are made with robust methods and relevant predictor variables.
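The discrimination and calibration checks mentioned above can be sketched as follows. This is an illustrative outline, not the authors' procedure: it assumes discrimination is summarised by the area under the ROC curve and calibration by comparing mean predicted probability with observed presence rate per probability bin. The function names and the toy data are hypothetical.

```python
import numpy as np

def auc(y, p):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

def calibration_table(y, p, n_bins=5):
    """Rows of (mean predicted, observed presence rate, count) per bin."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(p[mask].mean()), float(y[mask].mean()), int(mask.sum())))
    return rows

# Toy presence/absence data and model predictions (hypothetical).
presence = np.array([0, 0, 1, 0, 1, 1, 0, 1])
predicted = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])
model_auc = auc(presence, predicted)
table = calibration_table(presence, predicted)
```

A well-calibrated model has observed rates close to the mean predicted probability in every bin, while a high AUC only says that presences tend to receive higher predictions than absences.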
Abstract:
This is an ecological, analytical and retrospective study comprising the 645 municipalities in the State of São Paulo, which aimed to determine the relationship between socioeconomic and demographic variables and the model of care, on the one hand, and infant mortality rates in the period from 1998 to 2008, on the other. The ratio of average annual change for each indicator per coverage stratum was calculated. Infant mortality was analyzed according to the model for repeated measures over time, adjusted for the following correction variables: the city's population, proportion of Family Health Programs (PSFs) deployed, proportion of Growth Acceleration Programs (PACs) deployed, per capita GDP and SPSRI (São Paulo social responsibility index). The analysis was performed with generalized linear models, assuming a gamma distribution. Multiple comparisons were performed using the likelihood ratio test with an approximate chi-square distribution, at a significance level of 5%. There was a decrease in infant mortality over the years (p < 0.05), with no significant difference from 2004 to 2008 (p > 0.05). The proportion of PSFs deployed (p < 0.0001) and per capita GDP (p < 0.0001) were significant in the model. The decline of infant mortality in this period was influenced by the growth of per capita GDP and PSFs.
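A gamma generalized linear model of the kind named above can be fitted by iteratively reweighted least squares. The sketch below assumes a log link, which the abstract does not specify, and ignores the repeated-measures structure. The function name `fit_gamma_glm_log` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_gamma_glm_log(X, y, n_iter=100, tol=1e-10):
    """Fit a gamma GLM with log link by iteratively reweighted least squares.

    With a log link and gamma variance V(mu) = mu**2 the IRLS weights are
    constant, so each step is an ordinary least-squares fit of the working
    response z = eta + (y - mu) / mu on X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # assumes column 0 of X is the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        beta_new = np.linalg.lstsq(X, z, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic data (not from the study): a rate that declines over 11 years.
rng = np.random.default_rng(0)
year = np.tile(np.arange(11.0), 50)
true_mu = np.exp(3.0 - 0.05 * year)
rate = rng.gamma(shape=20.0, scale=true_mu / 20.0)
X = np.column_stack([np.ones_like(year), year])
beta_hat = fit_gamma_glm_log(X, rate)
```

With a log link, `np.exp(beta_hat[1])` reads as the multiplicative change in the mean rate per year, which is one reason this link is a common choice for positive, right-skewed outcomes.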
Abstract:
In many occupational safety interventions, the objective is to reduce the injury incidence as well as the mean claims cost once injury has occurred. The claims cost data within a period typically contain a large proportion of zero observations (no claim). The distribution thus comprises a point mass at 0 mixed with a non-degenerate parametric component. Essentially, the likelihood function can be factorized into two orthogonal components. These two components relate respectively to the effect of covariates on the incidence of claims and the magnitude of claims, given that claims are made. Furthermore, the longitudinal nature of the intervention inherently imposes some correlation among the observations. This paper introduces a zero-augmented gamma random effects model for analysing longitudinal data with many zeros. Adopting the generalized linear mixed model (GLMM) approach reduces the original problem to the fitting of two independent GLMMs. The method is applied to evaluate the effectiveness of a workplace risk assessment teams program, trialled within the cleaning services of a Western Australian public hospital.
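The factorization described above can be written out as a sketch. The notation is an assumption introduced here for illustration: d_ij indicates a positive claim cost y_ij for worker i in period j, p_ij is the probability of a claim, g is a gamma density with mean mu_ij and shape nu, and u_i and v_i are random effects taken to be independent of each other. The logit and log links are also assumptions, since the abstract does not name them.

```latex
% Likelihood conditional on the random effects u_i and v_i
L(\alpha, \beta, \nu \mid u, v)
  = \prod_{i,j} (1 - p_{ij})^{1 - d_{ij}}
      \bigl[ p_{ij}\, g(y_{ij}; \mu_{ij}, \nu) \bigr]^{d_{ij}}
  = \underbrace{\prod_{i,j} p_{ij}^{d_{ij}} (1 - p_{ij})^{1 - d_{ij}}}_{\text{incidence of claims}}
    \times
    \underbrace{\prod_{i,j:\, d_{ij} = 1} g(y_{ij}; \mu_{ij}, \nu)}_{\text{magnitude of claims}},
\qquad
\operatorname{logit}(p_{ij}) = x_{ij}^{\top} \alpha + u_i,
\quad
\log(\mu_{ij}) = x_{ij}^{\top} \beta + v_i .
```

Because the first factor involves only (alpha, u) and the second only (beta, nu, v), each can be maximized separately, which is the sense in which the problem reduces to two independent GLMM fits.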
Abstract:
This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long-standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two-step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models is promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.
Abstract:
OBJECTIVE: Myocardial infarction is an acute and severe cardiovascular disease that generally leads to patient admissions to intensive care units and few cases are initially admitted to infirmaries. The objective of the study was to assess whether estimates of air pollution effects on myocardial infarction morbidity are modified by the source of health information. METHODS: The study was carried out in hospitals of the Brazilian Health System in the city of São Paulo, Southeastern Brazil. A time series study (1998-1999) was performed using two outcomes: infarction admissions to infirmaries and to intensive care units, both for people older than 64 years of age. Generalized linear models controlling for seasonality (long and short-term trends) and weather were used. The eight-day cumulative effects of air pollutants were assessed using third-degree polynomial distributed lag models. RESULTS: Almost 70% of daily hospital admissions due to myocardial infarction were to infirmaries. Despite that, the effects of air pollutants on infarction were higher for intensive care unit admissions. All pollutants were positively associated with the study outcomes but SO2 presented the strongest statistically significant association. An interquartile range increase in SO2 concentration was associated with increases of 13% (95% CI: 6-19) and 8% (95% CI: 2-13) in intensive care unit and infirmary infarction admissions, respectively. CONCLUSIONS: It may be assumed there is a misclassification of myocardial infarction admissions to infirmaries leading to overestimation. Also, despite the absolute number of events, data on admissions to intensive care units provide a more adequate estimate of the magnitude of air pollution effects on infarction admissions.
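A third-degree polynomial distributed lag over eight days can be built by constraining the lag coefficients to lie on a cubic in the lag index. The sketch below is one common construction (often called an Almon lag) and is not taken from the paper: it assumes lags 0 to 7, a log link, and hypothetical function names. The last helper converts a fitted cumulative effect into a percent increase per interquartile-range change in the pollutant.

```python
import numpy as np

def lag_matrix(x, n_lags):
    """Columns are x_t, x_{t-1}, ..., x_{t-n_lags+1}; early rows are dropped."""
    x = np.asarray(x, dtype=float)
    T = x.size
    return np.column_stack([x[n_lags - 1 - l: T - l] for l in range(n_lags)])

def almon_basis(n_lags, degree):
    """Matrix B with B[l, k] = l**k, so the lag weights are B @ gamma."""
    lags = np.arange(n_lags)
    return np.vander(lags, degree + 1, increasing=True).astype(float)

def pdl_design(x, n_lags=8, degree=3):
    """Reduced design: degree + 1 columns replace n_lags lagged columns."""
    return lag_matrix(x, n_lags) @ almon_basis(n_lags, degree)

def percent_increase(gamma, iqr, n_lags=8, degree=3):
    """Percent change in the outcome for an IQR increase, summed over lags."""
    lag_weights = almon_basis(n_lags, degree) @ np.asarray(gamma, dtype=float)
    cumulative = lag_weights.sum()
    return float(100.0 * (np.exp(cumulative * iqr) - 1.0))

# Hypothetical daily pollutant series, for illustration only.
pollutant = np.linspace(10.0, 29.0, 20)
Z = pdl_design(pollutant)  # these columns enter the GLM in place of raw lags
```

The columns of `Z` would be added to the regression alongside the seasonality and weather terms; the four fitted coefficients then determine all eight lag weights.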
Abstract:
Master's dissertation, Integrated Ocean Studies, 25 March 2013, University of the Azores.
Abstract:
OBJECTIVE To analyze vaccination coverage and factors associated with a complete immunization scheme in children < 5 years old. METHODS This cross-sectional household census survey evaluated 1,209 children < 5 years old living in Bom Jesus, Angola, in 2010. Data were obtained from interviews, questionnaires, child immunization histories, and maternal health histories. The statistical analysis used generalized linear models in which the dependent variable followed a binary distribution (vaccinated, unvaccinated), the link function was logarithmic, and the children’s individual, familial, and socioeconomic factors were the independent variables. RESULTS Vaccination coverage was 37.0%, higher in children < 1 year (55.0%) and heterogeneous across neighborhoods; 52.0% of children of both sexes had no immunization records. The prevalence rate of vaccination significantly varied according to child age, mother’s level of education, family size, ownership of household appliances, and destination of domestic waste. CONCLUSIONS Vulnerable groups with vaccination coverage below recommended levels continue to be present. Some factors indicate inequalities that represent barriers to full immunization, indicating the need to implement more equitable policies. The knowledge of these factors contributes to planning immunization promotion measures that focus on the most vulnerable groups.
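With a binary outcome and a logarithmic link, exponentiated coefficients are prevalence ratios rather than odds ratios. For a single binary covariate the maximum-likelihood fit reproduces the two sample prevalences, so the idea can be sketched without an iterative solver. The counts below are hypothetical and the function name is an assumption.

```python
import math

def prevalence_ratio(vacc_group, n_group, vacc_ref, n_ref, z=1.96):
    """Log-link binary model with one binary covariate, in closed form.

    Returns the intercept b0 = log(p_ref), the slope b1 = log(p_group / p_ref),
    the prevalence ratio exp(b1), and a Wald interval for it.
    """
    p1 = vacc_group / n_group
    p0 = vacc_ref / n_ref
    b0 = math.log(p0)
    b1 = math.log(p1) - math.log(p0)
    # Delta-method standard error of the log prevalence ratio
    se = math.sqrt((1 - p1) / vacc_group + (1 - p0) / vacc_ref)
    ci = (math.exp(b1 - z * se), math.exp(b1 + z * se))
    return b0, b1, math.exp(b1), ci

# Hypothetical counts: 60 of 100 vaccinated in one group, 40 of 100 in the reference.
b0, b1, pr, ci = prevalence_ratio(60, 100, 40, 100)
```

With several covariates the same model needs an iterative fit, and the log link can fail to converge when fitted probabilities approach 1.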
Abstract:
Introduction This study aimed to analyze the relationship between the incidence of severe dengue during the 2008 epidemic in Rio de Janeiro, Brazil, and socioeconomic indicators, as well as indicators of health service availability and previous circulation of the dengue virus serotype-3 (DENV-3). Methods In this ecological study, the units of analysis were the districts of Rio de Janeiro. The data were incorporated into generalized linear models, and the incidence of severe dengue in each district was the outcome variable. Results The districts with more cases of dengue fever in the 2001 epidemic and a higher percentage of residents who declared their skin color or race as black had higher incidence rates of severe dengue in the 2008 epidemic [incidence rate ratio (IRR)= 1.21; 95% confidence interval (95%CI)= 1.05-1.40 and IRR= 1.34; 95%CI= 1.16-1.54, respectively]. In contrast, the districts with Family Health Strategy (FHS) clinics were more likely to have lower incidence rates of severe dengue in the 2008 epidemic (IRR= 0.81; 95%CI= 0.70-0.93). Conclusions At the ecological level, our findings suggest the persistence of health inequalities in this region of Brazil that are possibly due to greater social vulnerability among the self-declared black population. Additionally, the protective effect of FHS clinics may be due to the ease of access to other levels of care in the health system or to a reduced vulnerability to dengue transmission that is afforded by local practices to promote health.
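Incidence rate ratios such as those above are typically obtained by exponentiating coefficients of a count model with a log link. The abstract does not name the distribution, so the sketch below assumes only a log link and Wald-type intervals. It shows the conversion in both directions, using the first reported estimate (IRR = 1.21; 95%CI = 1.05-1.40) as input; the function names are hypothetical.

```python
import math

def irr_with_ci(beta, se, z=1.96):
    """Incidence rate ratio and Wald interval from a log-scale coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coef_from_reported(irr, lower, upper, z=1.96):
    """Recover the log-scale coefficient and its standard error from a reported IRR."""
    beta = math.log(irr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Round trip on the first estimate quoted in the abstract.
beta, se = coef_from_reported(1.21, 1.05, 1.40)
irr, lo, hi = irr_with_ci(beta, se)
```

The recovered interval matches the reported one to rounding, which is what one expects if the interval was symmetric on the log scale.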
Abstract:
1. Landscape modification is often considered the principal cause of population decline in many bat species. Thus, schemes for bat conservation rely heavily on knowledge about species-landscape relationships. So far, however, few studies have quantified the possible influence of landscape structure on large-scale spatial patterns in bat communities. 2. This study presents quantitative models that use landscape structure to predict (i) spatial patterns in overall community composition and (ii) individual species' distributions through canonical correspondence analysis and generalized linear models, respectively. A geographical information system (GIS) was then used to draw up maps of (i) overall community patterns and (ii) distribution of potential species' habitats. These models relied on field data from the Swiss Jura mountains. 3. Eight descriptors of landscape structure accounted for 30% of the variation in bat community composition. For some species, more than 60% of the variance in distribution could be explained by landscape structure. Elevation, forest or woodland cover, lakes and suburbs were the most frequent predictors. 4. This study shows that community composition in bats is related to landscape structure through species-specific relationships to resources. Due to their nocturnal activities and the difficulties of remote identification, a comprehensive bat census is rarely possible, and we suggest that predictive modelling of the type described here provides an indispensable conservation tool.
Abstract:
Western European landscapes have drastically changed since the 1950s, with agricultural intensification and the spread of urban settlements considered the most important drivers of this land-use/land-cover change. Losses of habitat for fauna and flora have been a direct consequence of this development. In the present study, we relate butterfly occurrence to land-use/land-cover changes over five decades between 1951 and 2000. The study area covers the entire Swiss territory. The 10 explanatory variables originate from agricultural statistics and censuses. Both states and rates were used as explanatory variables. Species distribution data were obtained from natural history collections. We selected eight butterfly species: four species occur on wetlands and four occur on dry grasslands. We used cluster analysis to track land-use/land-cover changes and to group communes based on similar trajectories of change. Generalized linear models were applied to identify factors that were significantly correlated with the persistence or disappearance of butterfly species. Results showed that decreasing agricultural areas and densities of farms with more than 10 ha of cultivated land are significantly related to wetland species decline, and increasing densities of livestock seem to have favored the disappearance of dry grassland species. Moreover, we show that species declines depend not only on land-use/land-cover states but also on the rates of change; that is, the higher the transformation rate from small to large farms, the higher the loss of dry grassland species. We suggest that more attention should be paid to the rates of landscape change as potential drivers of species change and derive some management suggestions.
Abstract:
Aim To explore the respective power of climate and topography to predict the distribution of reptiles in Switzerland, hence at a mesoscale level. A more detailed knowledge of these relationships, in combination with maps of the potential distribution derived from the models, is a valuable contribution to the design of conservation strategies. Location All of Switzerland. Methods Generalized linear models are used to derive predictive habitat distribution models from eco-geographical predictors in a geographical information system, using species data from a field survey conducted between 1980 and 1999. Results The maximum amount of deviance explained by climatic models is 65%, and 50% by topographical models. Low values were obtained with both sets of predictors for three species that are widely distributed in all parts of the country (Anguis fragilis, Coronella austriaca, and Natrix natrix), a result that suggests that including other important predictors, such as resources, should improve the models in further studies. With respect to topographical predictors, low values were also obtained for two species where we anticipated a strong response to aspect and slope, Podarcis muralis and Vipera aspis. Main conclusions Overall, both models and maps derived from climatic predictors more closely match the actual reptile distributions than those based on topography. These results suggest that the distributional limits of reptile species with a restricted range in Switzerland are largely set by climatic, predominantly temperature-related, factors.
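The "deviance explained" figures above compare a fitted model with an intercept-only model. The sketch below assumes a presence/absence response modelled with a binomial GLM, which the abstract does not state; the function names and the toy values are hypothetical.

```python
import numpy as np

def binomial_deviance(y, p, eps=1e-12):
    """Deviance of a 0/1 response under fitted probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(-2.0 * np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def deviance_explained(y, p_fitted):
    """Proportion of the null deviance removed by the fitted model."""
    y = np.asarray(y, dtype=float)
    null_dev = binomial_deviance(y, np.full(y.shape, y.mean()))
    resid_dev = binomial_deviance(y, p_fitted)
    return 1.0 - resid_dev / null_dev

# Toy presence/absence data and fitted probabilities (hypothetical).
presence = np.array([0, 0, 0, 1, 0, 1, 1, 1])
fitted = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9])
d2 = deviance_explained(presence, fitted)
```

A value of 0 means the predictors add nothing over the overall prevalence, and values approaching 1 mean the fitted probabilities sit close to the observed 0/1 outcomes.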