12 results for NEGATIVE BINOMIAL-DISTRIBUTION
in Digital Commons at Florida International University
Abstract:
Crash reduction factors (CRFs) are used to estimate the number of traffic crashes expected to be prevented by investment in safety improvement projects. The method used to develop CRFs in Florida has been based on the commonly used before-and-after approach. This approach suffers from a widely recognized problem known as regression-to-the-mean (RTM). The Empirical Bayes (EB) method has been introduced as a means of addressing the RTM problem. This method requires information from both the treatment and reference sites in order to predict the expected number of crashes had the safety improvement projects at the treatment sites not been implemented. The information from the reference sites is estimated from a safety performance function (SPF), which is a mathematical relationship that links crashes to traffic exposure. The objective of this dissertation was to develop SPFs for different functional classes of the Florida State Highway System. Crash data from 2001 through 2003, along with traffic and geometric data, were used in the SPF model development. SPFs for both rural and urban roadway categories were developed. The modeling data were based on one-mile segments with homogeneous traffic and geometric conditions within each segment. Segments involving intersections were excluded. Scatter plots of the data show that the relationships between crashes and traffic exposure are nonlinear, with crashes increasing with traffic exposure at an increasing rate. Four regression models, namely Poisson (PRM), Negative Binomial (NBRM), zero-inflated Poisson (ZIP), and zero-inflated Negative Binomial (ZINB), were fitted to the one-mile segment records for individual roadway categories. The best model was selected for each category based on a combination of the likelihood ratio test, the Vuong statistical test, and Akaike's Information Criterion (AIC).
The NBRM model was found to be appropriate for only one category, and the ZINB model was found to be more appropriate for six other categories. The overall results show that the Negative Binomial distribution model generally provides a better fit for the data than the Poisson distribution model. In addition, the ZINB model was found to give the best fit when the count data exhibit excess zeros and over-dispersion, which was the case for most of the roadway categories. While model validation shows that most data points fall within the 95% prediction intervals of the models developed, the Pearson goodness-of-fit measure does not show statistical significance. This is expected, because traffic volume is only one of the many factors contributing to the overall crash experience, and the SPFs are meant to be applied in conjunction with Accident Modification Factors (AMFs) to further account for the safety impacts of major geometric features before arriving at the final crash prediction. However, with improved traffic and crash data quality, the crash prediction power of SPF models may be further improved.
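The way the EB method combines an SPF prediction with a site's observed crash count can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation's implementation: the power-law SPF form, the coefficient values `b0` and `b1`, and the function names `spf_predict` and `eb_estimate` are assumptions made for the example. The weight shown is the form commonly used with negative binomial SPFs, with `k` as the overdispersion parameter and a one-year observation period assumed.

```python
import math

def spf_predict(aadt, length_mi, b0=-7.0, b1=0.85):
    """Hypothetical flow-only SPF: crashes/year = exp(b0) * AADT^b1 * length.

    b0 and b1 are made-up coefficients, not fitted values from the source.
    """
    return math.exp(b0) * aadt ** b1 * length_mi

def eb_estimate(observed, predicted, k):
    """Empirical Bayes estimate for one site and one year of data.

    The estimate is a weighted mix of the SPF prediction and the observed
    count, with weight w = 1 / (1 + k * predicted).
    """
    w = 1.0 / (1.0 + k * predicted)
    return w * predicted + (1.0 - w) * observed

mu = spf_predict(aadt=20000, length_mi=1.0)   # SPF prediction for the site
eb = eb_estimate(observed=9, predicted=mu, k=0.5)
print(round(mu, 2), round(eb, 2))
```

Because the EB estimate always lies between the SPF prediction and the observed count, an unusually high observed count is pulled back toward the reference-site expectation, which is how the method counters regression-to-the-mean.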
Abstract:
Run-off-road (ROR) crashes have increasingly become a serious concern for transportation officials in the State of Florida. These types of crashes have increased proportionally in recent years statewide and have been the focus of the Florida Department of Transportation. The goal of this research was to develop statistical models that can be used to investigate the possible causal relationships between roadway geometric features and ROR crashes on Florida's rural and urban principal arterials. In this research, Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) regression models were used to better model the excessive number of roadway segments with no ROR crashes. Because Florida covers a diverse area comprising sixty-seven counties, the state was divided into four geographical regions to minimize possible unobserved heterogeneity. Three years of crash data (2000–2002) for principal arterials on the Florida State Highway System were used. Several statistical models based on the ZIP and ZINB regression methods were fitted to predict the expected number of ROR crashes on urban and rural roads for each region. Each region was further divided into urban and rural areas, resulting in a total of eight crash models. A best-fit predictive model was identified for each of these eight models in terms of AIC values. The ZINB regression was found to be appropriate for seven of the eight models, and the ZIP regression was found to be more appropriate for the remaining model. To achieve model convergence, some explanatory variables that were not statistically significant were included. Therefore, strong conclusions cannot be derived from some of these models. Given the complex nature of crashes, recommendations for additional research are made. Examining the interaction of weather and human condition would be quite valuable in discerning additional causal relationships for these types of crashes.
Additionally, roadside data should be considered and incorporated into future research on ROR crashes.
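To illustrate why a zero-inflated model can beat a plain Poisson model on AIC when many segments have no crashes, here is a small Python sketch. The counts are made-up toy data, the helper names are hypothetical, and the coarse grid search merely stands in for a proper maximum-likelihood fit; none of this reproduces the study's models or covariates.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise an ordinary Poisson(lam) count."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base

def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood

# Toy segment counts with many zeros (invented for illustration only).
counts = [0] * 14 + [1, 1, 2, 2, 3, 4]

def loglik(pmf, *params):
    return sum(math.log(pmf(y, *params)) for y in counts)

# Poisson: the MLE of lam is the sample mean.
lam_hat = sum(counts) / len(counts)
ll_pois = loglik(poisson_pmf, lam_hat)

# ZIP: crude grid search over (lam, pi) instead of a real optimizer.
ll_zip, lam_zip, pi_zip = max(
    (loglik(zip_pmf, l / 10, p / 20), l / 10, p / 20)
    for l in range(1, 60)
    for p in range(0, 20)
)

print("Poisson AIC:", round(aic(ll_pois, 1), 2))
print("ZIP AIC:    ", round(aic(ll_zip, 2), 2))
```

The lower AIC identifies the preferred model. A ZINB comparison would follow the same pattern with one extra dispersion parameter.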
Abstract:
This dissertation focused on the longitudinal analysis of business start-ups using three waves of data from the Kauffman Firm Survey. The first essay used the data from years 2004-2008 and examined the simultaneous relationship between a firm's capital structure and human resource policies, and their impact on the level of innovation. Firm leverage was calculated as debt divided by total financial resources. An index of employee well-being was determined by a set of nine dichotomous questions asked in the survey. A negative binomial fixed effects model was used to analyze the effect of employee well-being and leverage on the count data of patents and copyrights, which were used as a proxy for innovation. The paper demonstrated that employee well-being positively affects the firm's innovation, while a higher leverage ratio had a negative impact on innovation. No significant relation was found between leverage and employee well-being. The second essay used the data from years 2004-2009 and inquired whether a higher entrepreneurial speed of learning is desirable, and whether there is a linkage between the speed of learning and the growth rate of the firm. The change in the speed of learning was measured using a pooled OLS estimator in repeated cross-sections. There was evidence of a declining speed of learning over time, and it was concluded that a higher speed of learning is not necessarily a good thing, because speed of learning is contingent on the entrepreneur's initial knowledge and the precision of the signals he receives from the market. Also, there was no reason to expect speed of learning to be related to the growth of the firm in one direction over another. The third essay used the data from years 2004-2010 and determined the timing of diversification activities by the business start-ups.
It captured when a start-up diversified for the first time, and explored the association between an early diversification strategy adopted by a firm and its survival rate. A semi-parametric Cox proportional hazard model was used to examine the survival pattern. The results demonstrated that firms diversifying at an early stage in their lives show a higher survival rate; however, this effect fades over time.
Abstract:
In 2010, the American Association of State Highway and Transportation Officials (AASHTO) released a safety analysis software system known as SafetyAnalyst. SafetyAnalyst implements the empirical Bayes (EB) method, which requires the use of Safety Performance Functions (SPFs). The system is equipped with a set of national default SPFs, and the software calibrates the default SPFs to represent the agency's safety performance. However, it is recommended that agencies generate agency-specific SPFs whenever possible. Many investigators support the view that agency-specific SPFs represent the agency data better than the national default SPFs calibrated to agency data. Furthermore, it is believed that the crash trends in Florida are different from those of the states whose data were used to develop the national default SPFs. In this dissertation, Florida-specific SPFs were developed using the 2008 Roadway Characteristics Inventory (RCI) data and crash and traffic data from 2007-2010 for both total and fatal and injury (FI) crashes. The data were randomly divided into two sets, one for calibration (70% of the data) and another for validation (30% of the data). The negative binomial (NB) model was used to develop the Florida-specific SPFs for each of the subtypes of roadway segments, intersections, and ramps, using the calibration data. Statistical goodness-of-fit tests were performed on the calibrated models, which were then validated using the validation data set. The results were compared in order to assess the transferability of the Florida-specific SPF models. The default SafetyAnalyst SPFs were calibrated to Florida data by adjusting the national default SPFs with local calibration factors. The performance of the Florida-specific SPFs was then compared with that of the SafetyAnalyst default SPFs calibrated to Florida data using a number of methods, including visual plots and statistical goodness-of-fit tests.
The plots of SPFs against the observed crash data were used to compare the prediction performance of the two models. Three goodness-of-fit measures, namely the mean absolute deviance (MAD), the mean square prediction error (MSPE), and the Freeman-Tukey R² (R²_FT), were also used for comparison in order to identify the better-fitting model. The results showed that Florida-specific SPFs yielded better prediction performance than the national default SPFs calibrated to Florida data. The performance of Florida-specific SPFs was further compared with that of the full SPFs, which include both traffic and geometric variables, in two major applications of SPFs, i.e., crash prediction and identification of high crash locations. The results showed that both SPF models yielded very similar performance in both applications. These empirical results support the use of the flow-only SPF models adopted in SafetyAnalyst, which require much less effort to develop than full SPFs.
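The three comparison measures named above can be computed as in the following Python sketch. The observed counts and the two sets of predictions are invented for illustration, the function names are hypothetical, and the Freeman-Tukey R² follows the definition commonly given in the crash-modeling literature, which is assumed (not confirmed by the abstract) to match the one used in the dissertation.

```python
import math

def mad(observed, predicted):
    """Mean absolute deviation between predicted and observed counts."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)

def mspe(observed, predicted):
    """Mean squared prediction error."""
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def freeman_tukey_r2(observed, predicted):
    """Freeman-Tukey R^2 using the variance-stabilizing transform
    f = sqrt(y) + sqrt(y + 1) and residual e = f - sqrt(4 * y_hat + 1)."""
    f = [math.sqrt(o) + math.sqrt(o + 1) for o in observed]
    resid = [fi - math.sqrt(4 * p + 1) for fi, p in zip(f, predicted)]
    f_bar = sum(f) / len(f)
    return 1 - sum(e * e for e in resid) / sum((fi - f_bar) ** 2 for fi in f)

# Made-up validation data: observed crashes and two competing SPF predictions.
observed = [0, 2, 1, 5, 3, 8]
model_a = [0.5, 1.8, 1.2, 4.4, 3.1, 7.0]
model_b = [2.0, 2.5, 2.5, 3.0, 3.0, 4.0]

for name, pred in (("A", model_a), ("B", model_b)):
    print(name, round(mad(observed, pred), 3),
          round(mspe(observed, pred), 3),
          round(freeman_tukey_r2(observed, pred), 3))
```

Lower MAD and MSPE and a higher Freeman-Tukey R² indicate the better-fitting model.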
Abstract:
The Highway Safety Manual (HSM) estimates roadway safety performance based on predictive models that were calibrated using national data. Calibration factors are then used to adjust these predictive models to local conditions for local applications. The HSM recommends that local calibration factors be estimated using 30 to 50 randomly selected sites that experienced a total of at least 100 crashes per year. It also recommends that the factors be updated every two to three years, and preferably on an annual basis. However, these recommendations are primarily based on expert opinions rather than data-driven research findings. Furthermore, most agencies do not have data for many of the input variables recommended in the HSM. This dissertation is aimed at determining the best way to meet three major data needs affecting the estimation of calibration factors: (1) the required minimum sample sizes for different roadway facilities, (2) the required frequency for calibration factor updates, and (3) the influential variables affecting calibration factors. In this dissertation, statewide segment and intersection data were first collected for most of the HSM-recommended calibration variables using a Google Maps application. In addition, eight years (2005-2012) of traffic and crash data were retrieved from existing databases from the Florida Department of Transportation. With these data, the effect of the sample size criterion on calibration factor estimates was first studied using a sensitivity analysis. The results showed that the minimum sample sizes not only vary across different roadway facilities, but are also significantly higher than those recommended in the HSM. In addition, results from paired sample t-tests showed that calibration factors in Florida need to be updated annually.
To identify influential variables affecting the calibration factors for roadway segments, the variables were prioritized by combining the results from three different methods: negative binomial regression, random forests, and boosted regression trees. Only a few variables were found to explain most of the variation in the crash data. Traffic volume was consistently found to be the most influential. In addition, roadside object density, major and minor commercial driveway densities, and minor residential driveway density were also identified as influential variables.
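A local calibration factor in the HSM sense is the ratio of total observed crashes to total crashes predicted by the uncalibrated model over the sampled sites. The Python sketch below computes it and gives a rough idea of a sample-size sensitivity check; the site data, the function names, and the resampling scheme are assumptions for illustration and are not taken from the dissertation.

```python
import random
import statistics

def calibration_factor(sites):
    """sites: list of (observed, predicted) pairs for the sampled sites."""
    total_pred = sum(pred for _, pred in sites)
    if total_pred <= 0:
        raise ValueError("total predicted crashes must be positive")
    return sum(obs for obs, _ in sites) / total_pred

def factor_spread(sites, sample_size, n_draws=500, seed=0):
    """Standard deviation of the factor over random sub-samples of sites
    (sampling without replacement), as a crude sensitivity measure."""
    rng = random.Random(seed)
    factors = [calibration_factor(rng.sample(sites, sample_size))
               for _ in range(n_draws)]
    return statistics.pstdev(factors)

# Made-up (observed, predicted) crash counts for twelve hypothetical sites.
sites = [(3, 2.1), (0, 0.8), (5, 3.9), (1, 1.5), (2, 2.4), (7, 4.2),
         (0, 1.1), (4, 2.7), (1, 0.9), (6, 5.0), (2, 1.8), (3, 3.3)]

print("factor from all sites:", round(calibration_factor(sites), 3))
for n in (4, 10):
    print("sample size", n, "spread", round(factor_spread(sites, n), 3))
```

The spread shrinks as the sample grows, which is the kind of behavior a minimum-sample-size analysis looks at.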
Abstract:
In this study, I determined the identity, taxonomic placement, and distribution of digenetic trematodes parasitizing the snails Pomacea paludosa and Planorbella duryi at Pa-hay-okee, Everglades National Park. I also characterized temporal and geographic variation in the probability of parasite infection for these snails based on two years of sampling. Although studies indicate that digenean parasites may have important effects both on individual species and on the structure of communities, there have been no studies of digenean parasitism on snails within the Everglades ecosystem. For example, the endangered Everglade Snail Kite, a specialist that feeds almost exclusively on Pomacea paludosa and is known to be a definitive host of digenean parasites, may suffer direct and indirect effects from consumption of parasitized apple snails. Therefore, information on the diversity and abundance of parasites harbored in snail populations in the Everglades should be of considerable interest for management and conservation of wildlife. Juvenile digeneans (cercariae) representing 20 species were isolated from these two snails, a quadrupling of the number of species known. Species were characterized based on morphological, morphometric, and sequence data (18S rDNA, COI, and ITS). Species richness of shed cercariae was greater from P. duryi than from P. paludosa, with 13 and 7 species, respectively. These species represented 14 families. P. paludosa and P. duryi had no digenean species in common. Probability of digenean infection was higher for P. duryi than for P. paludosa, and adults showed a greater risk of infection than juveniles for both of these snails. Planorbella duryi showed variation in probability of infection between sampling sites and hydrological seasons. The number of unique combinations of multi-species infections was greatest among P. duryi individuals, while the overall percentage of multi-species infections was greatest in P. paludosa.
Analyses of six frequently-observed multiple infections from P. duryi suggest the presence of negative interactions, positive interactions, and neutral associations between larval digeneans. These results should contribute to an understanding of the factors controlling the abundance and distribution of key species in the Everglades ecosystem and may in particular help in the management and recovery planning for the Everglade Snail Kite.
Abstract:
The lateral load distribution factor is a key factor in designing and analyzing curved steel I-girder bridges. In this dissertation, the effects of various parameters on moment and shear distribution for curved steel I-girder bridges were studied using the Finite Element Method (FEM). The parameters considered in the study were: radius of curvature, girder spacing, overhang, span length, number of girders, ratio of girder stiffness to overall bridge stiffness, slab thickness, girder longitudinal stiffness, cross frame spacing, and girder torsional inertia. The variations of these parameters were based on a statistical analysis of a real bridge database, which was created by extracting data from existing or newly designed curved steel I-girder bridge plans collected from across the nation. A hypothetical bridge superstructure model built from the mean values of the data was created and used for the parameter study. The study showed that cross frame spacing and girder torsional inertia had negligible effects. The other parameters were identified as key parameters. Regression analysis was conducted based on the FEM analysis results, and simplified formulas for predicting positive moment, negative moment, and shear distribution factors were developed. Thirty-three real bridges were analyzed using FEM to verify the formulas. The ratio of the distribution factor obtained from the formula to the one obtained from the FEM analysis, referred to as the g-ratio, was examined. The results showed that the standard deviation of the g-ratios was between 0.04 and 0.06 and that the mean value of the g-ratios was greater than unity by one standard deviation. This indicates that the formulas are conservative in most cases but not overly conservative. The final formulas are similar in format to those in the current American Association of State Highway and Transportation Officials (AASHTO) Load and Resistance Factor Design (LRFD) specifications.
The developed formulas were compared with other simplified methods. The outcomes showed that the proposed formulas had the most accurate results among all methods. The formulas developed in this study will assist bridge engineers and researchers in predicting the actual live load distribution in horizontally curved steel I-girder bridges.
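The g-ratio check described above amounts to simple summary statistics, sketched here in Python. The distribution-factor values are invented for illustration and the function name `g_ratio_summary` is hypothetical; the actual formulas and the thirty-three bridge results are not reproduced.

```python
import statistics

def g_ratio_summary(formula_factors, fem_factors):
    """Return (mean, sample std dev, share of ratios >= 1) for the g-ratios,
    where g-ratio = formula distribution factor / FEM distribution factor."""
    ratios = [f / m for f, m in zip(formula_factors, fem_factors)]
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)
    share_conservative = sum(r >= 1.0 for r in ratios) / len(ratios)
    return mean, sd, share_conservative

# Made-up distribution factors for five hypothetical bridges.
formula_factors = [0.68, 0.72, 0.61, 0.80, 0.75]
fem_factors = [0.65, 0.66, 0.60, 0.74, 0.73]

mean, sd, share = g_ratio_summary(formula_factors, fem_factors)
print(round(mean, 3), round(sd, 3), share)
```

A mean slightly above 1 with a small standard deviation corresponds to formulas that are conservative in most cases without being overly so.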
Abstract:
The purpose of this study was to determine whether there was a relationship between pressure to perform on state-mandated, high-stakes tests and the rate of student escape behavior, defined as the number of school suspensions and absences. The state-assigned grade of a school was used as a surrogate measure of pressure, with the assumption that pressure increased as the school grade decreased. Student attendance and suspension data were gathered from all 33 of the regular public high schools in Miami-Dade County Public Schools. The research questions were: Is the number of suspensions highest in the third quarter, when most FCAT preparation takes place, for each of the 3 school years 2007-08 through 2009-10? How accurately does the high school's grade predict the number of suspensions and number of absences during each of the 4 school years 2005-06 through 2008-09? The research questions were answered using repeated measures analysis of variance for research question #1 and non-linear multiple regression for research question #2. No significant difference was found between the numbers of suspensions in each of the grading periods, nor was there a relationship between the number of suspensions and school grade. A statistically significant relationship was found between student attendance and school grade. When plotted, this relationship was found to be quadratic in nature and formed a loose inverted U for each of the four years during which data were collected. This indicated that students in very high- and very low-performing schools had low levels of absences, while those in the midlevel of the distribution of school performance (C schools) had the greatest rates of absence.
Identifying a relationship between the pressures associated with high-stakes testing and student escape behavior suggests that it might be useful for building administrators to reevaluate the test preparation activities and procedures being used in their buildings and to include anxiety-reducing strategies. Because a relationship was found, it sets the foundation for future studies to identify whether testing-related activities are affecting some students emotionally and causing unintended consequences of testing mandates.
Abstract:
Group testing has long been considered a safe and sensible alternative to one-at-a-time testing in applications where the prevalence rate p is small. In this thesis, we applied a Bayesian approach to estimate p using a Beta-type prior distribution. First, we presented two Bayes estimators of p from a prior on p, derived under two different loss functions. Second, we presented two more Bayes estimators of p from a prior on π, again under two loss functions. We also presented credible and HPD intervals for p. In addition, we conducted extensive numerical studies. All results showed that the Bayes estimator was preferred over the usual maximum likelihood estimator (MLE) for small p. We also presented the optimal β for different p, m, and k.
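The contrast between the MLE and a Bayes estimator for small p can be illustrated with a short Python sketch. Everything here is an assumption for illustration: the names `n_groups` and `group_size`, the reading of π as the probability that a pooled group tests positive, the Beta(a, b) prior on π, and the use of the posterior mean (squared-error loss). The thesis's own estimators, loss functions, and notation (m, k, β) are not specified in the abstract and may differ.

```python
import math

def mle_prevalence(positive_groups, n_groups, group_size):
    """MLE of p when each of n_groups pools group_size individuals
    and positive_groups of the pools test positive."""
    return 1.0 - (1.0 - positive_groups / n_groups) ** (1.0 / group_size)

def bayes_prevalence(positive_groups, n_groups, group_size, a=1.0, b=1.0):
    """Posterior mean of p under a Beta(a, b) prior on the group-level
    positive probability pi, where p = 1 - (1 - pi)^(1 / group_size)."""
    a_post = a + positive_groups
    b_post = b + n_groups - positive_groups
    c = 1.0 / group_size
    # E[(1 - pi)^c] for pi ~ Beta(a_post, b_post)
    #   = B(a_post, b_post + c) / B(a_post, b_post)
    log_ratio = (math.lgamma(b_post + c) + math.lgamma(a_post + b_post)
                 - math.lgamma(a_post + b_post + c) - math.lgamma(b_post))
    return 1.0 - math.exp(log_ratio)

# With no positive pools the MLE collapses to zero; the Bayes estimate does not.
print(mle_prevalence(0, 30, 10), bayes_prevalence(0, 30, 10))
print(mle_prevalence(2, 30, 10), bayes_prevalence(2, 30, 10))
```

When the prevalence is small, samples with no positive pools are common, and in that case the MLE returns exactly zero while the Bayes estimate stays strictly positive. This is one way to see why a Bayes estimator can be attractive for small p.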