846 resultados para generalized linear models
Resumo:
This paper presents a modeling effort for developing safety performance models (SPM) for urban intersections for three major Brazilian cities. The proposed methodology for calibrating SPM has been divided into the following steps: defining the safety study objective, choosing predictive variables and sample size, data acquisition, defining model expression and model parameters and model evaluation. Among the predictive variables explored in the calibration phase were exposure variables (AADT), number of lanes, number of approaches and central median status. SPMs were obtained for three cities: Fortaleza, Belo Horizonte and Brasilia. The SPM developed for signalized intersections in Fortaleza and Belo Horizonte had the same structure and the most significant independent variables, which were AADT entering the intersection and number of lanes, and in addition, the coefficient of the best models were in the same range of values. For Brasilia, because of the sample size, the signalized and unsignalized intersections were grouped, and the AADT was split in minor and major approaches, which were the most significant variables. This paper also evaluated SPM transferability to other jurisdiction. The SPM for signalized intersections from Fortaleza and Belo Horizonte have been recalibrated (in terms of the COx) to the city of Porto Alegre. The models were adjusted following the Highway Safety Manual (HSM) calibration procedure and yielded C-x of 0.65 and 2.06 for Fortaleza and Belo Horizonte SPM respectively. This paper showed the experience and future challenges toward the initiatives on development of SPMs in Brazil, that can serve as a guide for other countries that are in the same stage in this subject. (C) 2014 Elsevier Ltd. All rights reserved.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
The choice of an appropriate family of linear models for the analysis of longitudinal data is often a matter of concern for practitioners. To attenuate such difficulties, we discuss some issues that emerge when analyzing this type of data via a practical example involving pretestposttest longitudinal data. In particular, we consider log-normal linear mixed models (LNLMM), generalized linear mixed models (GLMM), and models based on generalized estimating equations (GEE). We show how some special features of the data, like a nonconstant coefficient of variation, may be handled in the three approaches and evaluate their performance with respect to the magnitude of standard errors of interpretable and comparable parameters. We also show how different diagnostic tools may be employed to identify outliers and comment on available software. We conclude by noting that the results are similar, but that GEE-based models may be preferable when the goal is to compare the marginal expected responses.
Resumo:
Spatial linear models have been applied in numerous fields such as agriculture, geoscience and environmental sciences, among many others. Spatial dependence structure modelling, using a geostatistical approach, is an indispensable tool to estimate the parameters that define this structure. However, this estimation may be greatly affected by the presence of atypical observations in the sampled data. The purpose of this paper is to use diagnostic techniques to assess the sensitivity of the maximum-likelihood estimators, covariance functions and linear predictor to small perturbations in the data and/or the spatial linear model assumptions. The methodology is illustrated with two real data sets. The results allowed us to conclude that the presence of atypical values in the sample data have a strong influence on thematic maps, changing the spatial dependence structure.
Resumo:
In the simultaneous estimation of a large number of related quantities, multilevel models provide a formal mechanism for efficiently making use of the ensemble of information for deriving individual estimates. In this article we investigate the ability of the likelihood to identify the relationship between signal and noise in multilevel linear mixed models. Specifically, we consider the ability of the likelihood to diagnose conjugacy or independence between the signals and noises. Our work was motivated by the analysis of data from high-throughput experiments in genomics. The proposed model leads to a more flexible family. However, we further demonstrate that adequately capitalizing on the benefits of a well fitting fully-specified likelihood in the terms of gene ranking is difficult.
Resumo:
In applied work economists often seek to relate a given response variable y to some causal parameter mu* associated with it. This parameter usually represents a summarization based on some explanatory variables of the distribution of y, such as a regression function, and treating it as a conditional expectation is central to its identification and estimation. However, the interpretation of mu* as a conditional expectation breaks down if some or all of the explanatory variables are endogenous. This is not a problem when mu* is modelled as a parametric function of explanatory variables because it is well known how instrumental variables techniques can be used to identify and estimate mu*. In contrast, handling endogenous regressors in nonparametric models, where mu* is regarded as fully unknown, presents di±cult theoretical and practical challenges. In this paper we consider an endogenous nonparametric model based on a conditional moment restriction. We investigate identification related properties of this model when the unknown function mu* belongs to a linear space. We also investigate underidentification of mu* along with the identification of its linear functionals. Several examples are provided in order to develop intuition about identification and estimation for endogenous nonparametric regression and related models.
Resumo:
Scholars have found that socioeconomic status was one of the key factors that influenced early-stage lung cancer incidence rates in a variety of regions. This thesis examined the association between median household income and lung cancer incidence rates in Texas counties. A total of 254 individual counties in Texas with corresponding lung cancer incidence rates from 2004 to 2008 and median household incomes in 2006 were collected from the National Cancer Institute Surveillance System. A simple linear model and spatial linear models with two structures, Simultaneous Autoregressive Structure (SAR) and Conditional Autoregressive Structure (CAR), were used to link median household income and lung cancer incidence rates in Texas. The residuals of the spatial linear models were analyzed with Moran's I and Geary's C statistics, and the statistical results were used to detect similar lung cancer incidence rate clusters and disease patterns in Texas.^
Resumo:
The Department of Structural Analysis of the University of Santander has been for a longtime involved in the solution of the country´s practical engineering problems. Some of these have required the use of non-conventional methods of analysis, in order to achieve adequate engineering answers. As an example of the increasing application of non-linear computer codes in the nowadays engineering practice, some cases will be briefly presented. In each case, only the main features of the problem involved and the solution used to solve it will be shown
Resumo:
Understanding spatial distributions and how environmental conditions influence catch-per-unit-effort (CPUE) is important for increased fishing efficiency and sustainable fisheries management. This study investigated the relationship between CPUE, spatial factors, temperature, and depth using generalized additive models. Combinations of factors, and not one single factor, were frequently included in the best model. Parameters which best described CPUE varied by geographic region. The amount of variance, or deviance, explained by the best models ranged from a low of 29% (halibut, Charlotte region) to a high of 94% (sablefish, Charlotte region). Depth, latitude, and longitude influenced most species in several regions. On the broad geographic scale, depth was associated with CPUE for every species, except dogfish. Latitude and longitude influenced most species, except halibut (Areas 4 A/D), sablefish, and cod. Temperature was important for describing distributions of halibut in Alaska, arrowtooth flounder in British Columbia, dogfish, Alaska skate, and Aleutian skate. The species-habitat relationships revealed in this study can be used to create improved fishing and management strategies.
Resumo:
The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.
Resumo:
Магдалина Василева Тодорова - В статията е описан подход за верификация на процедурни програми чрез изграждане на техни модели, дефинирани чрез обобщени мрежи. Подходът интегрира концепцията “design by contract” с подходи за верификация от тип доказателство на теореми и проверка на съгласуваност на модели. За целта разделно се верифицират функциите, които изграждат програмата относно спецификации според предназначението им. Изгражда се обобщен мрежов модел, специфициащ връзките между функциите във вид на коректни редици от извиквания. За главната функция на програмата се построява обобщен мрежов модел и се проверява дали той съответства на мрежовия модел на връзките между функциите на програмата. Всяка от функциите на програмата, която използва други функции се верифицира и относно спецификацията, зададена чрез мрежовия модел на връзките между функциите на програмата.
Resumo:
To provide biological insights into transcriptional regulation, a couple of groups have recently presented models relating the promoter DNA-bound transcription factors (TFs) to downstream gene’s mean transcript level or transcript production rates over time. However, transcript production is dynamic in response to changes of TF concentrations over time. Also, TFs are not the only factors binding to promoters; other DNA binding factors (DBFs) bind as well, especially nucleosomes, resulting in competition between DBFs for binding at same genomic location. Additionally, not only TFs, but also some other elements regulate transcription. Within core promoter, various regulatory elements influence RNAPII recruitment, PIC formation, RNAPII searching for TSS, and RNAPII initiating transcription. Moreover, it is proposed that downstream from TSS, nucleosomes resist RNAPII elongation.
Here, we provide a machine learning framework to predict transcript production rates from DNA sequences. We applied this framework in the S. cerevisiae yeast for two scenarios: a) to predict the dynamic transcript production rate during the cell cycle for native promoters; b) to predict the mean transcript production rate over time for synthetic promoters. As far as we know, our framework is the first successful attempt to have a model that can predict dynamic transcript production rates from DNA sequences only: with cell cycle data set, we got Pearson correlation coefficient Cp = 0.751 and coefficient of determination r2 = 0.564 on test set for predicting dynamic transcript production rate over time. Also, for DREAM6 Gene Promoter Expression Prediction challenge, our fitted model outperformed all participant teams, best of all teams, and a model combining best team’s k-mer based sequence features and another paper’s biologically mechanistic features, in terms of all scoring metrics.
Moreover, our framework shows its capability of identifying generalizable fea- tures by interpreting the highly predictive models, and thereby provide support for associated hypothesized mechanisms about transcriptional regulation. With the learned sparse linear models, we got results supporting the following biological insights: a) TFs govern the probability of RNAPII recruitment and initiation possibly through interactions with PIC components and transcription cofactors; b) the core promoter amplifies the transcript production probably by influencing PIC formation, RNAPII recruitment, DNA melting, RNAPII searching for and selecting TSS, releasing RNAPII from general transcription factors, and thereby initiation; c) there is strong transcriptional synergy between TFs and core promoter elements; d) the regulatory elements within core promoter region are more than TATA box and nucleosome free region, suggesting the existence of still unidentified TAF-dependent and cofactor-dependent core promoter elements in yeast S. cerevisiae; e) nucleosome occupancy is helpful for representing +1 and -1 nucleosomes’ regulatory roles on transcription.