885 results for negative binomial


Relevance:

60.00%

Publisher:

Abstract:

This paper explains how Poisson regression can be used in studies in which the dependent variable describes the number of occurrences of some rare event, such as suicide. After pointing out why ordinary linear regression is inappropriate for dependent variables of this sort, we present the basic Poisson regression model and show how it fits into the broader class of generalized linear models. We then discuss a major problem of Poisson regression known as overdispersion and suggest possible solutions, including the correction of standard errors and negative binomial regression. The paper ends with a detailed empirical example drawn from our own research on suicide.
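The overdispersion problem described above can be illustrated numerically: a negative binomial count arises as a Poisson whose rate is itself gamma-distributed, which pushes the variance above the mean. A minimal sketch (our own illustration, not the paper's example; the parameter names `mu` and `alpha` are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_negative_binomial(mu, alpha, size, rng):
    """Draw NB counts as a Poisson with gamma-distributed rates.
    The mixture has Var = mu + alpha * mu**2, so alpha > 0 implies
    overdispersion relative to the Poisson (where Var = mu)."""
    rates = rng.gamma(shape=1.0 / alpha, scale=alpha * mu, size=size)
    return rng.poisson(rates)

mu, alpha = 4.0, 0.8
poisson_counts = sample = rng.poisson(mu, size=100_000)
nb_counts = sample_negative_binomial(mu, alpha, 100_000, rng)

# Dispersion ratio (variance / mean): ~1 for Poisson, well above 1 for NB
print(poisson_counts.var() / poisson_counts.mean())   # close to 1
print(nb_counts.var() / nb_counts.mean())             # close to 1 + alpha*mu = 4.2
```

A dispersion ratio far above 1, as in the second line, is the diagnostic that motivates switching from Poisson to negative binomial regression.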


Run-off-road (ROR) crashes have increasingly become a serious concern for transportation officials in the State of Florida. These types of crashes have increased proportionally in recent years statewide and have been the focus of the Florida Department of Transportation. The goal of this research was to develop statistical models that can be used to investigate the possible causal relationships between roadway geometric features and ROR crashes on Florida's rural and urban principal arterials.

In this research, Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) regression models were used to better model the excessive number of roadway segments with no ROR crashes. Since Florida covers a diverse area comprising sixty-seven counties, it was divided into four geographical regions to minimize possible unobserved heterogeneity. Three years of crash data (2000–2002) for principal arterials on the Florida State Highway System were used. Several statistical models based on the ZIP and ZINB regression methods were fitted to predict the expected number of ROR crashes on urban and rural roads in each region. Each region was further divided into urban and rural areas, resulting in a total of eight crash models. A best-fit predictive model was identified for each of the eight cases in terms of AIC values. The ZINB regression was found to be appropriate for seven of the eight models, and the ZIP regression was found to be more appropriate for the remaining one. To achieve model convergence, some explanatory variables that were not statistically significant were included; therefore, strong conclusions cannot be drawn from some of these models.

Given the complex nature of crashes, recommendations for additional research are made. Examining the interaction of weather and human condition would be quite valuable in discerning additional causal relationships for these types of crashes. Additionally, roadside data should be considered and incorporated into future research on ROR crashes.
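The zero-inflation idea behind the ZIP model can be written down compactly: with probability pi a segment produces a structural zero, and otherwise counts follow an ordinary Poisson. A hypothetical sketch (the parameter names and values are ours, not the dissertation's):

```python
from math import exp, factorial

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson pmf: P(0) = pi + (1-pi)e^-lam,
    P(k) = (1-pi) * lam^k e^-lam / k! for k >= 1."""
    poisson = lam ** k * exp(-lam) / factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson

pi, lam = 0.4, 2.0                 # invented values for illustration
p0 = zip_pmf(0, pi, lam)          # inflated zero mass: 0.4 + 0.6*e^-2
total = sum(zip_pmf(k, pi, lam) for k in range(50))
print(round(p0, 4))               # 0.4812
print(round(total, 6))            # 1.0 (pmf sums to one)
```

The extra zero mass (here roughly 48% at zero versus about 14% for a plain Poisson with the same rate) is what lets ZIP and ZINB models absorb the many crash-free segments.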


Crash reduction factors (CRFs) are used to estimate the potential number of traffic crashes expected to be prevented from investment in safety improvement projects. The method used to develop CRFs in Florida has been based on the commonly used before-and-after approach. This approach suffers from a widely recognized problem known as regression-to-the-mean (RTM). The Empirical Bayes (EB) method has been introduced as a means of addressing the RTM problem. This method requires information from both the treatment and reference sites in order to predict the expected number of crashes had the safety improvement projects at the treatment sites not been implemented. The information from the reference sites is estimated from a safety performance function (SPF), which is a mathematical relationship that links crashes to traffic exposure. The objective of this dissertation was to develop the SPFs for different functional classes of the Florida State Highway System. Crash data from years 2001 through 2003, along with traffic and geometric data, were used in the SPF model development. SPFs for both rural and urban roadway categories were developed. The modeling data were based on one-mile segments that contain homogeneous traffic and geometric conditions within each segment. Segments involving intersections were excluded. The scatter plots of the data show that the relationships between crashes and traffic exposure are nonlinear, with crashes increasing with traffic exposure at an increasing rate. Four regression models, namely, Poisson (PRM), Negative Binomial (NBRM), zero-inflated Poisson (ZIP), and zero-inflated Negative Binomial (ZINB), were fitted to the one-mile segment records for individual roadway categories. The best model was selected for each category based on a combination of the Likelihood Ratio test, the Vuong statistical test, and Akaike's Information Criterion (AIC).

The NBRM model was found to be appropriate for only one category and the ZINB model was found to be more appropriate for six other categories. The overall results show that the Negative Binomial distribution model generally provides a better fit for the data than the Poisson distribution model. In addition, the ZINB model was found to give the best fit when the count data exhibit excess zeros and over-dispersion for most of the roadway categories. While model validation shows that most data points fall within the 95% prediction intervals of the models developed, the Pearson goodness-of-fit measure does not show statistical significance. This is expected, as traffic volume is only one of the many factors contributing to the overall crash experience, and the SPFs are to be applied in conjunction with Accident Modification Factors (AMFs) to further account for the safety impacts of major geometric features before arriving at the final crash prediction. However, with improved traffic and crash data quality, the crash prediction power of SPF models may be further improved.
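The AIC-based selection step used above can be sketched in a few lines: each candidate is scored as AIC = 2k - 2*lnL and the lowest score wins. The log-likelihoods and parameter counts below are invented for illustration, not taken from the dissertation:

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: lower is better; the 2k term
    penalizes the extra parameters of the zero-inflated models."""
    return 2 * n_params - 2 * log_likelihood

# (log-likelihood, parameter count) per candidate -- illustrative numbers only
candidates = {
    "PRM":  (-812.4, 5),
    "NBRM": (-640.2, 6),
    "ZIP":  (-655.9, 10),
    "ZINB": (-602.7, 11),
}
aics = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(aics, key=aics.get)
print(best)   # ZINB, on these illustrative values
```

On real data this comparison is typically paired with the Vuong test, since ZIP/ZINB and their non-inflated counterparts are non-nested.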


This dissertation focused on the longitudinal analysis of business start-ups using three waves of data from the Kauffman Firm Survey.

The first essay used the data from years 2004-2008 and examined the simultaneous relationship between a firm's capital structure, its human resource policies, and their impact on the level of innovation. Firm leverage was calculated as debt divided by total financial resources. An index of employee well-being was constructed from a set of nine dichotomous questions asked in the survey. A negative binomial fixed effects model was used to analyze the effect of employee well-being and leverage on the count of patents and copyrights, which was used as a proxy for innovation. The paper demonstrated that employee well-being positively affects a firm's innovation, while a higher leverage ratio has a negative impact on innovation. No significant relation was found between leverage and employee well-being.

The second essay used the data from years 2004-2009 and asked whether a higher entrepreneurial speed of learning is desirable, and whether there is a link between the speed of learning and the growth rate of the firm. The change in the speed of learning was measured using a pooled OLS estimator on repeated cross-sections. There was evidence of a declining speed of learning over time, and it was concluded that a higher speed of learning is not necessarily a good thing, because the speed of learning is contingent on the entrepreneur's initial knowledge and the precision of the signals he receives from the market. Also, there was no reason to expect the speed of learning to be related to the growth of the firm in one direction over another.

The third essay used the data from years 2004-2010 and examined the timing of diversification activities by business start-ups. It captured when a start-up diversified for the first time and explored the association between an early diversification strategy and a firm's survival rate. A semi-parametric Cox proportional hazards model was used to examine the survival pattern. The results demonstrated that firms diversifying at an early stage in their lives show a higher survival rate; however, this effect fades over time.
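The proportional-hazards idea behind the Cox model used in the third essay can be sketched quickly: covariates scale a shared baseline hazard multiplicatively, so a negative coefficient for early diversification translates into a uniformly higher survival curve. The coefficient and baseline hazard below are invented for illustration:

```python
import numpy as np

# Hypothetical log-hazard coefficient for "diversified early" (invented)
beta_early_diversifier = -0.4

# Under proportional hazards, the hazard ratio is constant over time
hazard_ratio = np.exp(beta_early_diversifier)
print(round(float(hazard_ratio), 3))   # ~0.67: a lower failure hazard

# Survival curves under a constant baseline hazard h0 (illustration only;
# the Cox model itself leaves the baseline unspecified)
h0, t = 0.10, np.arange(0, 6)
surv_early = np.exp(-h0 * hazard_ratio * t)
surv_other = np.exp(-h0 * t)
print(np.round(surv_early, 3))
print(np.round(surv_other, 3))
```

Note the "effect fades over time" finding in the abstract is exactly a violation of this constant-ratio assumption, which is typically handled with time-varying coefficients.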


In 2010, the American Association of State Highway and Transportation Officials (AASHTO) released a safety analysis software system known as SafetyAnalyst. SafetyAnalyst implements the empirical Bayes (EB) method, which requires the use of Safety Performance Functions (SPFs). The system is equipped with a set of national default SPFs, and the software calibrates the default SPFs to represent the agency's safety performance. However, it is recommended that agencies generate agency-specific SPFs whenever possible. Many investigators support the view that the agency-specific SPFs represent the agency data better than the national default SPFs calibrated to agency data. Furthermore, it is believed that the crash trends in Florida are different from the states whose data were used to develop the national default SPFs. In this dissertation, Florida-specific SPFs were developed using the 2008 Roadway Characteristics Inventory (RCI) data and crash and traffic data from 2007-2010 for both total and fatal and injury (FI) crashes. The data were randomly divided into two sets, one for calibration (70% of the data) and another for validation (30% of the data). The negative binomial (NB) model was used to develop the Florida-specific SPFs for each of the subtypes of roadway segments, intersections and ramps, using the calibration data. Statistical goodness-of-fit tests were performed on the calibrated models, which were then validated using the validation data set. The results were compared in order to assess the transferability of the Florida-specific SPF models. The default SafetyAnalyst SPFs were calibrated to Florida data by adjusting the national default SPFs with local calibration factors. The performance of the Florida-specific SPFs and SafetyAnalyst default SPFs calibrated to Florida data were then compared using a number of methods, including visual plots and statistical goodness-of-fit tests. 
The plots of SPFs against the observed crash data were used to compare the prediction performance of the two models. Three goodness-of-fit tests, represented by the mean absolute deviance (MAD), the mean square prediction error (MSPE), and the Freeman-Tukey R² (R²FT), were also used for comparison in order to identify the better-fitting model. The results showed that Florida-specific SPFs yielded better prediction performance than the national default SPFs calibrated to Florida data. The performance of Florida-specific SPFs was further compared with that of the full SPFs, which include both traffic and geometric variables, in two major applications of SPFs, i.e., crash prediction and identification of high crash locations. The results showed that both SPF models yielded very similar performance in both applications. These empirical results support the use of the flow-only SPF models adopted in SafetyAnalyst, which require much less effort to develop compared to full SPFs.
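The three comparison measures named above can be sketched in a few lines. The Freeman-Tukey formulation below follows a definition commonly used in crash modeling (residuals on the sqrt(y) + sqrt(y+1) scale); treating that as the dissertation's exact form is our assumption, and the example data are invented:

```python
import numpy as np

def mad(y, yhat):
    """Mean absolute deviance between observed and predicted counts."""
    return np.mean(np.abs(y - yhat))

def mspe(y, yhat):
    """Mean square prediction error."""
    return np.mean((y - yhat) ** 2)

def freeman_tukey_r2(y, yhat):
    """R^2 on the Freeman-Tukey variance-stabilizing scale (assumed form)."""
    f = np.sqrt(y) + np.sqrt(y + 1)    # FT-transformed observed counts
    e = f - np.sqrt(4 * yhat + 1)      # FT residuals
    return 1 - np.sum(e ** 2) / np.sum((f - f.mean()) ** 2)

y = np.array([0, 1, 3, 2, 5, 0, 4])            # invented observed crashes
yhat = np.array([0.4, 1.2, 2.5, 2.1, 4.3, 0.6, 3.8])  # invented SPF predictions

m, s, r2 = mad(y, yhat), mspe(y, yhat), freeman_tukey_r2(y, yhat)
print(round(float(m), 3), round(float(s), 3), round(float(r2), 3))
```

Lower MAD and MSPE, and higher R²FT, would identify the better-fitting SPF in the comparison described above.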


The Highway Safety Manual (HSM) estimates roadway safety performance based on predictive models that were calibrated using national data. Calibration factors are then used to adjust these predictive models to local conditions for local applications. The HSM recommends that local calibration factors be estimated using 30 to 50 randomly selected sites that experienced at least a total of 100 crashes per year. It also recommends that the factors be updated every two to three years, preferably on an annual basis. However, these recommendations are primarily based on expert opinions rather than data-driven research findings. Furthermore, most agencies do not have data for many of the input variables recommended in the HSM. This dissertation is aimed at determining the best way to meet three major data needs affecting the estimation of calibration factors: (1) the required minimum sample sizes for different roadway facilities, (2) the required frequency for calibration factor updates, and (3) the influential variables affecting calibration factors. In this dissertation, statewide segment and intersection data were first collected for most of the HSM recommended calibration variables using a Google Maps application. In addition, eight years (2005-2012) of traffic and crash data were retrieved from existing databases from the Florida Department of Transportation. With these data, the effect of sample size criterion on calibration factor estimates was first studied using a sensitivity analysis. The results showed that the minimum sample sizes not only vary across different roadway facilities, but they are also significantly higher than those recommended in the HSM. In addition, results from paired sample t-tests showed that calibration factors in Florida need to be updated annually. 
To identify influential variables affecting the calibration factors for roadway segments, the variables were prioritized by combining the results from three different methods: negative binomial regression, random forests, and boosted regression trees. Only a few variables were found to explain most of the variation in the crash data. Traffic volume was consistently found to be the most influential. In addition, roadside object density, major and minor commercial driveway densities, and minor residential driveway density were also identified as influential variables.
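The calibration factor at the center of this work is, in the HSM's formulation, a simple ratio of total observed to total predicted crashes over the sampled sites; a sketch with invented counts (the site values below are not from the study):

```python
# Crashes observed at a handful of hypothetical calibration sites
observed = [12, 7, 19, 4, 9]
# Corresponding predictions from the HSM base model (invented values)
predicted = [10.1, 8.3, 15.6, 5.0, 7.5]

# HSM local calibration factor: C = sum(observed) / sum(predicted).
# C > 1 means local sites experience more crashes than the base model predicts.
calibration_factor = sum(observed) / sum(predicted)
print(round(calibration_factor, 3))   # 51 / 46.5 = 1.097
```

Because C is a ratio of sums, its stability depends directly on how many sites (and crashes) enter the sums, which is why the minimum-sample-size question studied here matters.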


Background: Over the past decade, annual health exams have been de-emphasized for the general population but emphasized for adults with intellectual and developmental disabilities (IDD). The purpose of this project was to determine if there has been an increase in the uptake of the health exam among adults with IDD in Ontario, to what extent, and the effect on the quality of preventive care provided. Methods: Using administrative health data, the proportion of adults (18-64 years old) with IDD who received a health exam (long appointment, general assessment, and “true” health exam), a high value on the primary care quality composite score (PCQS), and a health exam or high PCQS each year was compared to the proportion in a propensity score matched sample of the general population. Negative binomial and segmented negative binomial regression controlling for age and sex were used to determine the relative risk of having a health exam, a high PCQS, or a health exam or high PCQS over time. Results: Pre joinpoint, the long appointment and general assessment health exam definitions saw a decrease, and the “true” health exam saw an increase, in the likelihood of adults having a health exam. Post joinpoint, all health exam definitions saw a decrease in the likelihood of adults having a health exam. Pre joinpoint, all PCQS measures (high PCQS, long appointment or high PCQS, “true” health exam or high PCQS) saw an increase in the likelihood for adults to achieve a high PCQS or high PCQS/have a health exam. Post joinpoint, all PCQS measures saw a decrease in the likelihood for adults to achieve a high PCQS or high PCQS/have a health exam. Achieving a high PCQS was strongly associated with having a health exam regardless of health exam definition or IDD status. Conclusions: Despite the publication of guidelines, only a small proportion of adults with IDD are receiving health exams. This indicates that the publication of guidelines alone was not sufficient to change practice.
More targeted measures, such as the implementation of an IDD-specific health exam fee code, should be considered to increase the uptake of the health exam among adults with IDD.
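The "segmented" (joinpoint) structure referred to above can be sketched as a broken-stick term in the regression's linear predictor: the time trend is allowed to change slope at the joinpoint year. The joinpoint and coefficients below are invented, chosen so the rate rises before the joinpoint and falls after it, mirroring the pattern reported for the "true" health exam:

```python
import numpy as np

years = np.arange(2004, 2016)
joinpoint = 2010                          # hypothetical joinpoint year
t = years - years[0]                      # time since start
t_post = np.maximum(0, years - joinpoint) # nonzero only after the joinpoint

# Log expected rate: slope b1 before the joinpoint, b1 + b2 after it.
# b0, b1, b2 are invented illustration values, not study estimates.
b0, b1, b2 = -1.0, 0.08, -0.15
log_rate = b0 + b1 * t + b2 * t_post
rate = np.exp(log_rate)
print(np.round(rate, 3))   # rises through 2010, then declines
```

In the actual analysis this linear predictor sits inside a negative binomial regression with age and sex as additional covariates; the exp(b) terms are then interpretable as relative risks per year.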


Temporal replicate counts are often aggregated to improve model fit by reducing zero-inflation and count variability; in the case of migration counts collected hourly throughout a migration, aggregation also allows one to ignore nonindependence. However, aggregation can represent a loss of potentially useful information on the hourly or seasonal distribution of counts, which might affect our ability to estimate reliable trends. We simulated 20-year hourly raptor migration count datasets with a known rate of change to test the effect of aggregating hourly counts to daily or annual totals on our ability to recover the known trend. We simulated data for three types of species, to test whether results varied with species abundance or migration strategy: a commonly detected species, e.g., Northern Harrier, Circus cyaneus; a rarely detected species, e.g., Peregrine Falcon, Falco peregrinus; and a species typically counted in large aggregations with overdispersed counts, e.g., Broad-winged Hawk, Buteo platypterus. We compared the accuracy and precision of estimated trends across species and count types (hourly/daily/annual) using hierarchical models that assumed a Poisson, negative binomial (NB), or zero-inflated negative binomial (ZINB) count distribution. We found little benefit of modeling zero-inflation or of modeling the hourly distribution of migration counts. For the rare species, trends analyzed using daily totals and an NB or ZINB data distribution resulted in a higher probability of detecting an accurate and precise trend. In contrast, trends of the common and overdispersed species benefited from aggregation to annual totals, and for the overdispersed species in particular, trends estimated using annual totals were more precise and resulted in lower probabilities of estimating a trend (1) in the wrong direction or (2) with credible intervals that excluded the true trend, as compared with hourly and daily counts.
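The aggregation being evaluated above amounts to summing an hourly count array along its trailing axes; a small sketch with invented dimensions and an overdispersed (gamma-Poisson, i.e. negative-binomial-like) hourly count process:

```python
import numpy as np

rng = np.random.default_rng(7)
years, days, hours_per_day = 20, 60, 8    # invented survey dimensions

# Overdispersed hourly counts via a gamma-Poisson mixture
mu, alpha = 1.5, 1.2
rates = rng.gamma(1.0 / alpha, alpha * mu, size=(years, days, hours_per_day))
hourly = rng.poisson(rates)

daily = hourly.sum(axis=2)     # shape (years, days)
annual = daily.sum(axis=1)     # shape (years,)
print(hourly.shape, daily.shape, annual.shape)

# Aggregation preserves totals but discards the within-day distribution
print(int(annual.sum()) == int(hourly.sum()))   # True
```

The modeling question in the study is then which level (hourly, daily, annual) and which count distribution (Poisson, NB, ZINB) best recovers the simulated trend.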


This study presents the validation of the observations made by the fisheries observer program known as the Programa Bitácoras de Pesca (PBP) during the period 2005-2011 in the distribution area where the industrial purse-seine vessels targeting the north-central stock of the Peruvian anchoveta (Engraulis ringens) operate. In addition, for the same period and distribution area, the magnitudes of discards due to excess catch, discards of juveniles, and incidental catch of this fishery were estimated. A total of 3,768 trips were observed out of 302,859, a coverage of 1.2%. The data on discards due to excess catch, juvenile discards, and incidental catch recorded on the observed trips were characterized by a high proportion of zeros. To validate the observations, a simulation study based on Monte Carlo methodology was conducted using a negative binomial distribution model. This makes it possible to infer the optimal coverage level and to determine whether the information obtained by the observation program is reliable. From this analysis, it is concluded that the current observation levels should be increased to a coverage of at least 10% of all trips made each year by the industrial purse-seine vessels targeting the north-central stock of the Peruvian anchoveta. Discards due to excess catch, juvenile discards, and incidental catch were estimated using three methodologies: Bootstrap, Generalized Linear Model (GLM), and Delta Model. Each methodology produced estimates of different magnitudes with similar trends.
The estimated magnitudes were compared using a Bayesian ANOVA, which showed little evidence that the estimated magnitudes of discards due to excess catch differed across methodologies; the same held for incidental catch, whereas for juvenile discards there was substantial evidence of differences. The methodology that met the assumptions and explained the most variability in the modeled variables was the Delta Model, which appears to be the better alternative for estimation given the high proportion of zeros in the data. The average estimates of discards due to excess catch, juvenile discards, and incidental catch under the Delta Model were 252,580, 41,772, and 44,823 tonnes respectively, together representing 5.74% of landings. In addition, using the estimated magnitude of juvenile discards, a biomass projection exercise was carried out under the hypothetical scenario of no fishing mortality, assuming the discarded juveniles measured only 8 to 11 cm; it found that the biomass that will not be available to the fishery is between 52 and 93 thousand tonnes.
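The Monte Carlo logic of the coverage analysis can be sketched as follows: simulate a zero-heavy, overdispersed per-trip discard for the whole fleet, then compare how the error of the fleet-total estimate behaves at roughly the current 1.2% coverage versus the recommended 10%. All quantities below are invented, and the study's negative binomial simulation model is richer than this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trips = 302_859                     # fleet size from the abstract
mu, alpha = 1.0, 2.0                  # invented per-trip discard distribution
rates = rng.gamma(1.0 / alpha, alpha * mu, size=n_trips)
discards = rng.poisson(rates)         # overdispersed, many zeros
true_total = discards.sum()

errors = {}
for coverage in (0.012, 0.10):        # ~current coverage vs recommended
    errs = []
    for _ in range(100):
        idx = rng.choice(n_trips, size=int(coverage * n_trips), replace=False)
        estimate = discards[idx].mean() * n_trips   # scale sample mean to fleet
        errs.append(abs(estimate - true_total) / true_total)
    errors[coverage] = float(np.mean(errs))
print(errors)   # mean relative error shrinks markedly at 10% coverage
```

Repeating this over many simulated fleets is what lets one quantify the coverage level at which the observer-based estimates become stable.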


Building on previous research, the goal of this project was to identify significant influencing factors for the Iowa Department of Transportation (DOT) to consider in future updates of its Instructional Memorandum (I.M.) 3.213, which provides guidelines for determining the need for traffic barriers (guardrail and bridge rail) at secondary roadway bridges—specifically, factors that might be significant for the bridge rail rating system component of I.M. 3.213. A literature review was conducted of policies and guidelines in other states and, specifically, of studies related to traffic barrier safety countermeasures at bridges in several states. In addition, a safety impact study was conducted to evaluate possible non-driver-related behavior characteristics of crashes on secondary road structures in Iowa using road data, structure data, and crash data from 2004 to 2013. Statistical models (negative binomial regression) were used to determine which factors were significant in terms of crash volume and crash severity. The study found that crashes are somewhat more frequent on or at bridges possessing certain characteristics—traffic volume greater than 400 vehicles per day (vpd) (paved) or greater than 50 vpd (unpaved), bridge length greater than 150 ft (paved) or greater than 35 ft (unpaved), bridge width narrower than its approach (paved) or narrower than 20 ft (unpaved), and bridges older than 25 years (both paved and unpaved). No specific roadway or bridge characteristic was found to contribute to more serious crashes. The study also confirmed previous research findings that crashes with bridges on secondary roads are rare, low-severity events. 
Although the findings of the study support the need for appropriate use of bridge rails, it concludes that prescriptive guidelines for bridge rail use on secondary roads may not be necessary, given the limited crash expectancy and lack of differences in crash expectancy among the various combinations of explanatory characteristics.


The relationship between workplace absenteeism and adverse lifestyle factors (smoking, physical inactivity and poor dietary patterns) remains ambiguous. Reliance on self-reported absenteeism and obesity measures may contribute to this uncertainty. Using objective absenteeism and health status measures, the present study aimed to investigate which health status outcomes and lifestyle factors influence workplace absenteeism. Cross-sectional data were obtained from a complex workplace dietary intervention trial, the Food Choice at Work Study, conducted in four multinational manufacturing workplaces in Cork, Republic of Ireland. Participants included 540 randomly selected employees from the four workplaces. Annual count absenteeism data were collected. Physical assessments included objective health status measures (BMI, midway waist circumference and blood pressure). A food frequency questionnaire measured diet quality, from which DASH (Dietary Approaches to Stop Hypertension) scores were constructed. A zero-inflated negative binomial (ZINB) regression model examined associations between health status outcomes, lifestyle characteristics and absenteeism. The mean number of absences was 2·5 (sd 4·5) d. After controlling for sociodemographic and lifestyle characteristics, the ZINB model indicated that absenteeism was positively associated with central obesity, which increased the expected absence rate by 72 %. Consuming a high-quality diet and engaging in moderate levels of physical activity were negatively associated with absenteeism, reducing the expected frequency by 50 % and 36 %, respectively. Being in a managerial/supervisory position also reduced the expected frequency by 50 %. To reduce absenteeism, workplace health promotion policies should incorporate recommendations designed to prevent and manage excess weight, improve diet quality and increase the physical activity levels of employees.



The interactions between host individual, host population, and environmental factors modulate parasite abundance in a given host population. Since adult exophilic ticks are highly aggregated on red deer (Cervus elaphus) and this ungulate exhibits significant sexual size dimorphism, life history traits, and segregation, we hypothesized that tick parasitism on males and hinds would be differentially influenced by each of these factors. To test the hypothesis, ticks from 306 red deer (182 males and 124 females) were collected over 7 years in a red deer population in south-central Spain. Using generalized linear models with a negative binomial error distribution and a logarithmic link function, we modeled tick abundance on deer with 20 potential predictors. Three models were developed: one for red deer males, another for hinds, and one combining data for males and females and including "sex" as a factor. Our rationale was that if tick burdens on males and hinds relate to the explanatory factors in a differential way, it is not possible to precisely and accurately predict the tick burden on one sex using the model fitted on the other sex, or with the model that combines data from both sexes. Our results showed that deer males were the primary target for ticks, that the weight of each factor differed between sexes, and that each sex-specific model was not able to accurately predict burdens on the animals of the other sex. That is, the results support sex-biased differences. The higher weight of host individual and population factors in the model for males shows that intrinsic deer factors explain tick burden more strongly than environmental host-seeking tick abundance does. In contrast, environmental variables predominated in the models explaining tick burdens in hinds.
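The model family used here, a negative binomial GLM with a logarithmic link, ties the expected tick burden to the predictors via mu = exp(X @ beta), so each coefficient acts multiplicatively on the burden. A sketch with invented coefficients illustrating the effect of the "sex" factor in the combined model:

```python
import numpy as np

# Invented coefficients: intercept, sex (1 = male), and a condition score.
# None of these values come from the paper; they only illustrate the link.
beta = np.array([0.2, 0.9, -0.05])

X = np.array([
    [1.0, 1.0, 4.0],   # a male with condition score 4
    [1.0, 0.0, 4.0],   # a hind with the same condition score
])

# Log link: expected burden is the exponential of the linear predictor
mu = np.exp(X @ beta)
print(np.round(mu, 2))   # males carry a higher expected burden

# The sex coefficient has a multiplicative interpretation: exp(0.9) ~ 2.46x
print(round(float(mu[0] / mu[1]), 3))
```

In the fitted negative binomial GLM a dispersion parameter additionally allows the variance to exceed this mean, accommodating the strong aggregation of ticks on a few hosts.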