11 resultados para Negative Binomial Regression Model (NBRM)

em Digital Commons at Florida International University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Crash reduction factors (CRFs) are used to estimate the potential number of traffic crashes expected to be prevented from investment in safety improvement projects. The method used to develop CRFs in Florida has been based on the commonly used before-and-after approach. This approach suffers from a widely recognized problem known as regression-to-the-mean (RTM). The Empirical Bayes (EB) method has been introduced as a means to addressing the RTM problem. This method requires the information from both the treatment and reference sites in order to predict the expected number of crashes had the safety improvement projects at the treatment sites not been implemented. The information from the reference sites is estimated from a safety performance function (SPF), which is a mathematical relationship that links crashes to traffic exposure. The objective of this dissertation was to develop the SPFs for different functional classes of the Florida State Highway System. Crash data from years 2001 through 2003 along with traffic and geometric data were used in the SPF model development. SPFs for both rural and urban roadway categories were developed. The modeling data used were based on one-mile segments that contain homogeneous traffic and geometric conditions within each segment. Segments involving intersections were excluded. The scatter plots of data show that the relationships between crashes and traffic exposure are nonlinear, that crashes increase with traffic exposure in an increasing rate. Four regression models, namely, Poisson (PRM), Negative Binomial (NBRM), zero-inflated Poisson (ZIP), and zero-inflated Negative Binomial (ZINB), were fitted to the one-mile segment records for individual roadway categories. The best model was selected for each category based on a combination of the Likelihood Ratio test, the Vuong statistical test, and the Akaike's Information Criterion (AIC). The NBRM model was found to be appropriate for only one category and the ZINB model was found to be more appropriate for six other categories. The overall results show that the Negative Binomial distribution model generally provides a better fit for the data than the Poisson distribution model. In addition, the ZINB model was found to give the best fit when the count data exhibit excess zeros and over-dispersion for most of the roadway categories. While model validation shows that most data points fall within the 95% prediction intervals of the models developed, the Pearson goodness-of-fit measure does not show statistical significance. This is expected as traffic volume is only one of the many factors contributing to the overall crash experience, and that the SPFs are to be applied in conjunction with Accident Modification Factors (AMFs) to further account for the safety impacts of major geometric features before arriving at the final crash prediction. However, with improved traffic and crash data quality, the crash prediction power of SPF models may be further improved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multiple linear regression model plays a key role in statistical inference and it has extensive applications in business, environmental, physical and social sciences. Multicollinearity has been a considerable problem in multiple regression analysis. When the regressor variables are multicollinear, it becomes difficult to make precise statistical inferences about the regression coefficients. There are some statistical methods that can be used, which are discussed in this thesis are ridge regression, Liu, two parameter biased and LASSO estimators. Firstly, an analytical comparison on the basis of risk was made among ridge, Liu and LASSO estimators under orthonormal regression model. I found that LASSO dominates least squares, ridge and Liu estimators over a significant portion of the parameter space for large dimension. Secondly, a simulation study was conducted to compare performance of ridge, Liu and two parameter biased estimator by their mean squared error criterion. I found that two parameter biased estimator performs better than its corresponding ridge regression estimator. Overall, Liu estimator performs better than both ridge and two parameter biased estimator.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Adaptability and invisibility are hallmarks of modern terrorism, and keeping pace with its dynamic nature presents a serious challenge for societies throughout the world. Innovations in computer science have incorporated applied mathematics to develop a wide array of predictive models to support the variety of approaches to counterterrorism. Predictive models are usually designed to forecast the location of attacks. Although this may protect individual structures or locations, it does not reduce the threat—it merely changes the target. While predictive models dedicated to events or social relationships receive much attention where the mathematical and social science communities intersect, models dedicated to terrorist locations such as safe-houses (rather than their targets or training sites) are rare and possibly nonexistent. At the time of this research, there were no publically available models designed to predict locations where violent extremists are likely to reside. This research uses France as a case study to present a complex systems model that incorporates multiple quantitative, qualitative and geospatial variables that differ in terms of scale, weight, and type. Though many of these variables are recognized by specialists in security studies, there remains controversy with respect to their relative importance, degree of interaction, and interdependence. Additionally, some of the variables proposed in this research are not generally recognized as drivers, yet they warrant examination based on their potential role within a complex system. This research tested multiple regression models and determined that geographically-weighted regression analysis produced the most accurate result to accommodate non-stationary coefficient behavior, demonstrating that geographic variables are critical to understanding and predicting the phenomenon of terrorism. This dissertation presents a flexible prototypical model that can be refined and applied to other regions to inform stakeholders such as policy-makers and law enforcement in their efforts to improve national security and enhance quality-of-life.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In 2010, the American Association of State Highway and Transportation Officials (AASHTO) released a safety analysis software system known as SafetyAnalyst. SafetyAnalyst implements the empirical Bayes (EB) method, which requires the use of Safety Performance Functions (SPFs). The system is equipped with a set of national default SPFs, and the software calibrates the default SPFs to represent the agency's safety performance. However, it is recommended that agencies generate agency-specific SPFs whenever possible. Many investigators support the view that the agency-specific SPFs represent the agency data better than the national default SPFs calibrated to agency data. Furthermore, it is believed that the crash trends in Florida are different from the states whose data were used to develop the national default SPFs. In this dissertation, Florida-specific SPFs were developed using the 2008 Roadway Characteristics Inventory (RCI) data and crash and traffic data from 2007-2010 for both total and fatal and injury (FI) crashes. The data were randomly divided into two sets, one for calibration (70% of the data) and another for validation (30% of the data). The negative binomial (NB) model was used to develop the Florida-specific SPFs for each of the subtypes of roadway segments, intersections and ramps, using the calibration data. Statistical goodness-of-fit tests were performed on the calibrated models, which were then validated using the validation data set. The results were compared in order to assess the transferability of the Florida-specific SPF models. The default SafetyAnalyst SPFs were calibrated to Florida data by adjusting the national default SPFs with local calibration factors. The performance of the Florida-specific SPFs and SafetyAnalyst default SPFs calibrated to Florida data were then compared using a number of methods, including visual plots and statistical goodness-of-fit tests. The plots of SPFs against the observed crash data were used to compare the prediction performance of the two models. Three goodness-of-fit tests, represented by the mean absolute deviance (MAD), the mean square prediction error (MSPE), and Freeman-Tukey R2 (R2FT), were also used for comparison in order to identify the better-fitting model. The results showed that Florida-specific SPFs yielded better prediction performance than the national default SPFs calibrated to Florida data. The performance of Florida-specific SPFs was further compared with that of the full SPFs, which include both traffic and geometric variables, in two major applications of SPFs, i.e., crash prediction and identification of high crash locations. The results showed that both SPF models yielded very similar performance in both applications. These empirical results support the use of the flow-only SPF models adopted in SafetyAnalyst, which require much less effort to develop compared to full SPFs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Highway Safety Manual (HSM) estimates roadway safety performance based on predictive models that were calibrated using national data. Calibration factors are then used to adjust these predictive models to local conditions for local applications. The HSM recommends that local calibration factors be estimated using 30 to 50 randomly selected sites that experienced at least a total of 100 crashes per year. It also recommends that the factors be updated every two to three years, preferably on an annual basis. However, these recommendations are primarily based on expert opinions rather than data-driven research findings. Furthermore, most agencies do not have data for many of the input variables recommended in the HSM. This dissertation is aimed at determining the best way to meet three major data needs affecting the estimation of calibration factors: (1) the required minimum sample sizes for different roadway facilities, (2) the required frequency for calibration factor updates, and (3) the influential variables affecting calibration factors. In this dissertation, statewide segment and intersection data were first collected for most of the HSM recommended calibration variables using a Google Maps application. In addition, eight years (2005-2012) of traffic and crash data were retrieved from existing databases from the Florida Department of Transportation. With these data, the effect of sample size criterion on calibration factor estimates was first studied using a sensitivity analysis. The results showed that the minimum sample sizes not only vary across different roadway facilities, but they are also significantly higher than those recommended in the HSM. In addition, results from paired sample t-tests showed that calibration factors in Florida need to be updated annually. To identify influential variables affecting the calibration factors for roadway segments, the variables were prioritized by combining the results from three different methods: negative binomial regression, random forests, and boosted regression trees. Only a few variables were found to explain most of the variation in the crash data. Traffic volume was consistently found to be the most influential. In addition, roadside object density, major and minor commercial driveway densities, and minor residential driveway density were also identified as influential variables.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In 2010, the American Association of State Highway and Transportation Officials (AASHTO) released a safety analysis software system known as SafetyAnalyst. SafetyAnalyst implements the empirical Bayes (EB) method, which requires the use of Safety Performance Functions (SPFs). The system is equipped with a set of national default SPFs, and the software calibrates the default SPFs to represent the agency’s safety performance. However, it is recommended that agencies generate agency-specific SPFs whenever possible. Many investigators support the view that the agency-specific SPFs represent the agency data better than the national default SPFs calibrated to agency data. Furthermore, it is believed that the crash trends in Florida are different from the states whose data were used to develop the national default SPFs. In this dissertation, Florida-specific SPFs were developed using the 2008 Roadway Characteristics Inventory (RCI) data and crash and traffic data from 2007-2010 for both total and fatal and injury (FI) crashes. The data were randomly divided into two sets, one for calibration (70% of the data) and another for validation (30% of the data). The negative binomial (NB) model was used to develop the Florida-specific SPFs for each of the subtypes of roadway segments, intersections and ramps, using the calibration data. Statistical goodness-of-fit tests were performed on the calibrated models, which were then validated using the validation data set. The results were compared in order to assess the transferability of the Florida-specific SPF models. The default SafetyAnalyst SPFs were calibrated to Florida data by adjusting the national default SPFs with local calibration factors. The performance of the Florida-specific SPFs and SafetyAnalyst default SPFs calibrated to Florida data were then compared using a number of methods, including visual plots and statistical goodness-of-fit tests. The plots of SPFs against the observed crash data were used to compare the prediction performance of the two models. Three goodness-of-fit tests, represented by the mean absolute deviance (MAD), the mean square prediction error (MSPE), and Freeman-Tukey R2 (R2FT), were also used for comparison in order to identify the better-fitting model. The results showed that Florida-specific SPFs yielded better prediction performance than the national default SPFs calibrated to Florida data. The performance of Florida-specific SPFs was further compared with that of the full SPFs, which include both traffic and geometric variables, in two major applications of SPFs, i.e., crash prediction and identification of high crash locations. The results showed that both SPF models yielded very similar performance in both applications. These empirical results support the use of the flow-only SPF models adopted in SafetyAnalyst, which require much less effort to develop compared to full SPFs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Run-off-road (ROR) crashes have increasingly become a serious concern for transportation officials in the State of Florida. These types of crashes have increased proportionally in recent years statewide and have been the focus of the Florida Department of Transportation. The goal of this research was to develop statistical models that can be used to investigate the possible causal relationships between roadway geometric features and ROR crashes on Florida's rural and urban principal arterials. ^ In this research, Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) Regression models were used to better model the excessive number of roadway segments with no ROR crashes. Since Florida covers a diverse area and since there are sixty-seven counties, it was divided into four geographical regions to minimize possible unobserved heterogeneity. Three years of crash data (2000–2002) encompassing those for principal arterials on the Florida State Highway System were used. Several statistical models based on the ZIP and ZINB regression methods were fitted to predict the expected number of ROR crashes on urban and rural roads for each region. Each region was further divided into urban and rural areas, resulting in a total of eight crash models. A best-fit predictive model was identified for each of these eight models in terms of AIC values. The ZINB regression was found to be appropriate for seven of the eight models and the ZIP regression was found to be more appropriate for the remaining model. To achieve model convergence, some explanatory variables that were not statistically significant were included. Therefore, strong conclusions cannot be derived from some of these models. ^ Given the complex nature of crashes, recommendations for additional research are made. The interaction of weather and human condition would be quite valuable in discerning additional causal relationships for these types of crashes. Additionally, roadside data should be considered and incorporated into future research of ROR crashes. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tropical coastal marine ecosystems including mangroves, seagrass beds and coral reef communities are undergoing intense degradation in response to natural and human disturbances, therefore, understanding the causes and mechanisms present challenges for scientist and managers. In order to protect our marine resources, determining the effects of nutrient loads on these coastal systems has become a key management goal. Data from monitoring programs were used to detect trends of macroalgae abundances and develop correlations with nutrient availability, as well as forecast potential responses of the communities monitored. Using eight years of data (1996–2003) from complementary but independent monitoring programs in seagrass beds and water quality of the Florida Keys National Marine Sanctuary (FKNMS), we: (1) described the distribution and abundance of macroalgae groups; (2) analyzed the status and spatiotemporal trends of macroalgae groups; and (3) explored the connection between water quality and the macroalgae distribution in the FKNMS. In the seagrass beds of the FKNMS calcareous green algae were the dominant macroalgae group followed by the red group; brown and calcareous red algae were present but in lower abundance. Spatiotemporal patterns of the macroalgae groups were analyzed with a non-linear regression model of the abundance data. For the period of record, all macroalgae groups increased in abundance (Abi) at most sites, with calcareous green algae increasing the most. Calcareous green algae and red algae exhibited seasonal pattern with peak abundances (Φi) mainly in summer for calcareous green and mainly in winter for red. Macroalgae Abi and long-term trend (mi) were correlated in a distinctive way with water quality parameters. Both the Abi and mi of calcareous green algae had positive correlations with NO3−, NO2−, total nitrogen (TN) and total organic carbon (TOC). Red algae Abi had a positive correlation with NO2−, TN, total phosphorus and TOC, and the mi in red algae was positively correlated with N:P. In contrast brown and calcareous red algae Abi had negative correlations with N:P. These results suggest that calcareous green algae and red algae are responding mainly to increases in N availability, a process that is happening in inshore sites. A combination of spatially variable factors such as local current patterns, nutrient sources, and habitat characteristics result in a complex array of the macroalgae community in the seagrass beds of the FKNMS.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This dissertation focused on the longitudinal analysis of business start-ups using three waves of data from the Kauffman Firm Survey. ^ The first essay used the data from years 2004-2008, and examined the simultaneous relationship between a firm's capital structure, human resource policies, and its impact on the level of innovation. The firm leverage was calculated as, debt divided by total financial resources. Index of employee well-being was determined by a set of nine dichotomous questions asked in the survey. A negative binomial fixed effects model was used to analyze the effect of employee well-being and leverage on the count data of patents and copyrights, which were used as a proxy for innovation. The paper demonstrated that employee well-being positively affects the firm's innovation, while a higher leverage ratio had a negative impact on the innovation. No significant relation was found between leverage and employee well-being.^ The second essay used the data from years 2004-2009, and inquired whether a higher entrepreneurial speed of learning is desirable, and whether there is a linkage between the speed of learning and growth rate of the firm. The change in the speed of learning was measured using a pooled OLS estimator in repeated cross-sections. There was evidence of a declining speed of learning over time, and it was concluded that a higher speed of learning is not necessarily a good thing, because speed of learning is contingent on the entrepreneur's initial knowledge, and the precision of the signals he receives from the market. Also, there was no reason to expect speed of learning to be related to the growth of the firm in one direction over another.^ The third essay used the data from years 2004-2010, and determined the timing of diversification activities by the business start-ups. It captured when a start-up diversified for the first time, and explored the association between an early diversification strategy adopted by a firm, and its survival rate. A semi-parametric Cox proportional hazard model was used to examine the survival pattern. The results demonstrated that firms diversifying at an early stage in their lives show a higher survival rate; however, this effect fades over time.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The dissertation takes a multivariate approach to answer the question of how applicant age, after controlling for other variables, affects employment success in a public organization. In addition to applicant age, there are five other categories of variables examined: organization/applicant variables describing the relationship of the applicant to the organization; organization/position variables describing the target position as it relates to the organization; episodic variables such as applicant age relative to the ages of competing applicants; economic variables relating to the salary needs of older applicants; and cognitive variables that may affect the decision maker's evaluation of the applicant. ^ An exploratory phase of research employs archival data from approximately 500 decisions made in the past three years to hire or promote applicants for positions in one public health administration organization. A logit regression model is employed to examine the probability that the variables modify the effect of applicant age on employment success. A confirmatory phase of the dissertation is a controlled experiment in which hiring decision makers from the same public organization perform a simulated hiring decision exercise to evaluate hypothetical applicants of similar qualifications but of different ages. The responses of the decision makers to a series of bipolar adjective scales add support to the cognitive component of the theoretical model of the hiring decision. A final section contains information gathered from interviews with key informants. ^ Applicant age has tended to have a curvilinear relationship with employment success. For some positions, the mean age of the applicants most likely to succeed varies with the values of the five groups of moderating variables. The research contributes not only to the practice of public personnel administration, but is useful in examining larger public policy issues associated with an aging workforce. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This dissertation focused on the longitudinal analysis of business start-ups using three waves of data from the Kauffman Firm Survey. The first essay used the data from years 2004-2008, and examined the simultaneous relationship between a firm’s capital structure, human resource policies, and its impact on the level of innovation. The firm leverage was calculated as, debt divided by total financial resources. Index of employee well-being was determined by a set of nine dichotomous questions asked in the survey. A negative binomial fixed effects model was used to analyze the effect of employee well-being and leverage on the count data of patents and copyrights, which were used as a proxy for innovation. The paper demonstrated that employee well-being positively affects the firm's innovation, while a higher leverage ratio had a negative impact on the innovation. No significant relation was found between leverage and employee well-being. The second essay used the data from years 2004-2009, and inquired whether a higher entrepreneurial speed of learning is desirable, and whether there is a linkage between the speed of learning and growth rate of the firm. The change in the speed of learning was measured using a pooled OLS estimator in repeated cross-sections. There was evidence of a declining speed of learning over time, and it was concluded that a higher speed of learning is not necessarily a good thing, because speed of learning is contingent on the entrepreneur's initial knowledge, and the precision of the signals he receives from the market. Also, there was no reason to expect speed of learning to be related to the growth of the firm in one direction over another. The third essay used the data from years 2004-2010, and determined the timing of diversification activities by the business start-ups. It captured when a start-up diversified for the first time, and explored the association between an early diversification strategy adopted by a firm, and its survival rate. A semi-parametric Cox proportional hazard model was used to examine the survival pattern. The results demonstrated that firms diversifying at an early stage in their lives show a higher survival rate; however, this effect fades over time.