967 results for Railroad safety, Bayesian methods, Accident modification factor, Countermeasure selection


Relevance: 100.00%

Publisher:

Abstract:

"Summary and analysis of accidents on railroads in the United States."

Relevance: 100.00%

Publisher:

Abstract:

Crash reduction factors (CRFs) are used to estimate the potential number of traffic crashes expected to be prevented by investment in safety improvement projects. The method used to develop CRFs in Florida has been based on the commonly used before-and-after approach. This approach suffers from a widely recognized problem known as regression-to-the-mean (RTM). The Empirical Bayes (EB) method has been introduced as a means of addressing the RTM problem. This method requires information from both the treatment and reference sites in order to predict the expected number of crashes had the safety improvement projects at the treatment sites not been implemented. The information from the reference sites is estimated from a safety performance function (SPF), a mathematical relationship that links crashes to traffic exposure. The objective of this dissertation was to develop SPFs for different functional classes of the Florida State Highway System. Crash data from years 2001 through 2003, along with traffic and geometric data, were used in the SPF model development. SPFs for both rural and urban roadway categories were developed. The modeling data were based on one-mile segments with homogeneous traffic and geometric conditions within each segment. Segments involving intersections were excluded. Scatter plots of the data show that the relationships between crashes and traffic exposure are nonlinear, with crashes increasing with traffic exposure at an increasing rate. Four regression models, namely Poisson (PRM), Negative Binomial (NBRM), zero-inflated Poisson (ZIP), and zero-inflated Negative Binomial (ZINB), were fitted to the one-mile segment records for individual roadway categories. The best model was selected for each category based on a combination of the Likelihood Ratio test, the Vuong statistical test, and Akaike's Information Criterion (AIC).
The NBRM was found to be appropriate for only one category, while the ZINB model was more appropriate for six other categories. The overall results show that the Negative Binomial distribution model generally provides a better fit for the data than the Poisson distribution model. In addition, the ZINB model gave the best fit when the count data exhibited excess zeros and over-dispersion, which was the case for most of the roadway categories. While model validation shows that most data points fall within the 95% prediction intervals of the models developed, the Pearson goodness-of-fit measure does not show statistical significance. This is expected, as traffic volume is only one of many factors contributing to the overall crash experience, and the SPFs are to be applied in conjunction with Accident Modification Factors (AMFs) to further account for the safety impacts of major geometric features before arriving at the final crash prediction. However, with improved traffic and crash data quality, the crash prediction power of SPF models may be further improved.
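
The EB weighting described above can be sketched in a few lines. In this illustration, `spf_predicted_crashes` uses hypothetical coefficients (not the dissertation's fitted values), and the weight follows the common Hauer-style EB formula with an assumed negative binomial dispersion parameter `phi`:

```python
import math

def spf_predicted_crashes(aadt, length_mi, b0=-7.5, b1=0.8):
    """Safety performance function: crashes/year predicted from traffic
    exposure alone. Coefficients are hypothetical, not Florida's fitted values."""
    return math.exp(b0) * (aadt ** b1) * length_mi

def empirical_bayes_estimate(observed, predicted, phi=2.0):
    """Hauer-style EB estimate: a weighted average of the SPF prediction and
    the observed count; phi is the NB inverse-dispersion parameter (assumed)."""
    w = 1.0 / (1.0 + predicted / phi)
    return w * predicted + (1.0 - w) * observed

mu = spf_predicted_crashes(aadt=20_000, length_mi=1.0)   # SPF prediction
eb = empirical_bayes_estimate(observed=9, predicted=mu)  # pulled toward mu
```

With an observed count well above the SPF prediction, the EB estimate falls between the two, which is precisely the RTM correction.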

Relevance: 100.00%

Publisher:

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
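
The reduced-rank tensor factorization mentioned above can be made concrete with a small sketch: a PARAFAC-style latent class model writes the joint pmf of several categorical variables as a mixture of product-multinomials. The dimensions and Dirichlet draws below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)

# PARAFAC / latent class representation of a 3-way probability tensor:
# P(y1, y2, y3) = sum_h nu[h] * prod_j lam[j][h, y_j]
k, levels = 4, [2, 3, 2]          # k latent classes; category counts per variable
nu = rng.dirichlet(np.ones(k))    # latent class weights
lam = [rng.dirichlet(np.ones(d), size=k) for d in levels]  # per-class marginals

# Assemble the full joint pmf tensor via outer products of the class marginals
P = sum(nu[h] * np.einsum('i,j,k->ijk', lam[0][h], lam[1][h], lam[2][h])
        for h in range(k))
# P is a valid pmf (nonnegative, sums to 1); its nonnegative rank is at most k.
```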

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
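
A generic sketch of a Gaussian (Laplace-style) approximation to a posterior for a toy Poisson log-linear model may help fix ideas. Note that for simplicity it uses an independent Gaussian prior, not the Diaconis--Ylvisaker prior analyzed in Chapter 4, and the counts and design matrix are made up:

```python
import numpy as np
from scipy.optimize import minimize

# Toy log-linear (Poisson) model for a 2x2 table with main effects only,
# approximated by a Gaussian centered at the posterior mode.
y = np.array([18.0, 12.0, 7.0, 3.0])                                # cell counts
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], float)   # design

def neg_log_post(beta, tau2=10.0):
    eta = X @ beta
    # Poisson log-likelihood plus an independent N(0, tau2) prior
    # (an assumption here, not the chapter's conjugate prior)
    return -(y @ eta - np.exp(eta).sum()) + beta @ beta / (2 * tau2)

res = minimize(neg_log_post, np.zeros(3), method='BFGS')
mode = res.x
# Hessian of the negative log posterior: X' diag(exp(eta)) X + I / tau2
H = X.T @ (np.exp(X @ mode)[:, None] * X) + np.eye(3) / 10.0
cov = np.linalg.inv(H)   # covariance of the Gaussian approximation
```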

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
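
The waiting-time summary underlying this paradigm is straightforward to compute; a minimal sketch, assuming a scalar series and a fixed high threshold:

```python
import numpy as np

def exceedance_waiting_times(x, threshold):
    """Waiting times (in index units) between successive exceedances of a
    high threshold -- the summary the proposed paradigm builds on."""
    idx = np.flatnonzero(np.asarray(x) > threshold)
    return np.diff(idx)

rng = np.random.default_rng(1)
series = rng.standard_normal(10_000)
u = np.quantile(series, 0.99)                    # a high empirical threshold
waits = exceedance_waiting_times(series, u)
# For serially independent data the waits are roughly geometric; clustering
# of extremes shows up as an excess of short waiting times.
```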

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel, but comparatively little attention has been paid to convergence and estimation error in the resulting approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
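
A minimal sketch of one such approximating kernel: a Metropolis chain whose log-likelihood is evaluated on a random subset of the data. The data-generating model, subset size, and tuning below are illustrative, and the naive scaled subset estimator is used only to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(1.0, 1.0, size=5_000)    # y_i ~ N(theta, 1), theta unknown

def subset_log_lik(theta, y, m):
    """Scaled log-likelihood estimate from a random subset of size m;
    substituting this for the full likelihood yields an approximate kernel."""
    sub = y[rng.integers(len(y), size=m)]
    return len(y) / m * np.sum(-0.5 * (sub - theta) ** 2)

def approx_mh(n_iter=2000, m=500, step=0.05):
    theta, ll = 0.0, subset_log_lik(0.0, data, m)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal()
        ll_prop = subset_log_lik(prop, data, m)
        if np.log(rng.random()) < ll_prop - ll:   # flat prior assumed
            theta, ll = prop, ll_prop
        chain.append(theta)
    return np.array(chain)

chain = approx_mh()
# The stationary distribution is perturbed relative to the exact kernel;
# a framework like Chapter 6's bounds the resulting estimation error.
```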

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
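
Slow mixing of the kind described manifests as high autocorrelation and a small effective sample size. A crude sketch of the standard ESS estimate, n / (1 + 2 Σ ρ_k), compared on an iid chain and a persistent AR(1) surrogate for a poorly mixing sampler:

```python
import numpy as np

def effective_sample_size(chain, max_lag=200):
    """Crude ESS estimate: n / (1 + 2 * sum of positive autocorrelations),
    truncated at the first non-positive sample autocorrelation."""
    x = np.asarray(chain, float) - np.mean(chain)
    n = len(x)
    acf_sum = 0.0
    for k in range(1, max_lag):
        rho = np.dot(x[:-k], x[k:]) / np.dot(x, x)
        if rho <= 0:
            break
        acf_sum += rho
    return n / (1 + 2 * acf_sum)

rng = np.random.default_rng(3)
iid = rng.standard_normal(5_000)
ar = np.empty(5_000)                 # AR(1) with phi = 0.95: slow mixing
ar[0] = 0.0
for t in range(1, 5_000):
    ar[t] = 0.95 * ar[t - 1] + rng.standard_normal()

ess_iid = effective_sample_size(iid)
ess_ar = effective_sample_size(ar)   # far smaller than ess_iid
```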

Relevance: 100.00%

Publisher:

Abstract:

Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.

Relevance: 100.00%

Publisher:

Abstract:

Background and Purpose - Loss of motor function is common after stroke and leads to significant chronic disability. Stem cells are capable of self-renewal and of differentiating into multiple cell types, including neurones, glia, and vascular cells. We assessed the safety of granulocyte-colony-stimulating factor (G-CSF) after stroke and its effect on circulating CD34+ stem cells. Methods - We performed a 2-center, dose-escalation, double-blind, randomized, placebo-controlled pilot trial (ISRCTN 16784092) of G-CSF (6 blocks of 1 to 10 µg/kg SC, 1 or 5 daily doses) in 36 patients with recent ischemic stroke. Circulating CD34+ stem cells were measured by flow cytometry; blood counts and measures of safety and functional outcome were also monitored. All measures were made blinded to treatment. Results - Thirty-six patients, whose mean (SD) age was 76 (8) years and of whom 50% were male, were recruited. G-CSF (5 days of 10 µg/kg) increased the CD34+ count in a dose-dependent manner, from 2.5 to 37.7 at day 5 (area under curve, P<0.005). A dose-dependent rise in white cell count (P<0.001) was also seen. There was no difference between treatment groups in the number of patients with serious adverse events (G-CSF, 7/24 [29%] versus placebo, 3/12 [25%]) or in their dependence (modified Rankin Scale, median 4, interquartile range 3 to 5) at 90 days. Conclusions - G-CSF is effective at mobilizing bone marrow CD34+ stem cells in patients with recent ischemic stroke. Administration is feasible and appears to be safe and well tolerated. The fate of the mobilized cells and their effect on functional outcome remain to be determined. (Stroke. 2006;37:2979-2983.)

Relevance: 100.00%

Publisher:

Abstract:

Hardy-Weinberg Equilibrium (HWE) is an important genetic property that populations should exhibit whenever they are not subject to adverse situations such as a complete lack of panmixia, excess of mutations, excess of selection pressure, etc. HWE has been evaluated for decades; both frequentist and Bayesian methods are in use today. While historically the HWE formula was developed to examine the transmission of alleles in a population from one generation to the next, the use of HWE concepts has expanded in human disease studies to detect genotyping error and disease susceptibility (association); see Ryckman and Williams (2008). Most analyses focus on trying to answer the question of whether a population is in HWE; they do not try to quantify how far from equilibrium the population is. In this paper, we propose the use of a simple disequilibrium coefficient for a locus with two alleles. Based on the posterior density of this disequilibrium coefficient, we show how one can conduct a Bayesian analysis to verify how far from HWE a population is. Other coefficients have been introduced in the literature; the advantage of the one introduced in this paper is that, just like standard correlation coefficients, its range is bounded and it is symmetric around zero (equilibrium) when comparing positive and negative values. To test the hypothesis of equilibrium, we use a simple Bayesian significance test, the Full Bayesian Significance Test (FBST); see Pereira, Stern and Wechsler (2008) for a complete review. The disequilibrium coefficient proposed provides an easy and efficient way to perform the analyses, especially if one uses Bayesian statistics. An R routine (R Development Core Team, 2009) that implements the calculations is provided for readers.
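
For a biallelic locus, the classical disequilibrium coefficient is D = P(AA) − p², where p is the frequency of allele A; the normalized version r = D/(pq) is bounded in [−1, 1] and symmetric around zero, matching the properties the paper emphasizes. This sketch is illustrative and need not be the paper's exact coefficient:

```python
def hwe_disequilibrium(n_AA, n_Aa, n_aa):
    """Classical disequilibrium D = P(AA) - p^2 for a biallelic locus, plus a
    correlation-type normalization r = D / (p * q), bounded in [-1, 1]."""
    n = n_AA + n_Aa + n_aa
    p_AA, p_Aa = n_AA / n, n_Aa / n
    p = p_AA + p_Aa / 2           # frequency of allele A
    q = 1 - p
    d = p_AA - p * p              # departure from the HWE expectation p^2
    r = d / (p * q)               # zero under equilibrium
    return d, r

d0, r0 = hwe_disequilibrium(25, 50, 25)   # near-perfect HWE at p = 0.5
d1, r1 = hwe_disequilibrium(50, 0, 50)    # no heterozygotes: r = 1
```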

Relevance: 100.00%

Publisher:

Abstract:

The PHENIX experiment at the Relativistic Heavy Ion Collider has performed systematic measurements of phi meson production in the K(+)K(-) decay channel at midrapidity in p + p, d + Au, Cu + Cu, and Au + Au collisions at root s(NN) = 200 GeV. Results are presented on the phi invariant yield and the nuclear modification factor R(AA) for Au + Au and Cu + Cu, and R(dA) for d + Au collisions, studied as a function of transverse momentum (1 < p(T) < 7 GeV/c) and centrality. In central and midcentral Au + Au collisions, the R(AA) of the phi exhibits a suppression relative to expectations from binary-scaled p + p results. The amount of suppression is smaller than that of the pi(0) and the eta in the intermediate p(T) range (2-5 GeV/c), whereas, at higher p(T), the phi, pi(0), and eta show similar suppression. The baryon (proton and antiproton) excess observed in central Au + Au collisions at intermediate p(T) is not observed for the phi meson, despite the similar masses of the proton and the phi. This suggests that the excess is linked to the number of valence quarks in the hadron rather than its mass. The difference gradually disappears with decreasing centrality, and, for peripheral collisions, the R(AA) values for both particle species are consistent with binary scaling. Cu + Cu collisions show the same yield and suppression as Au + Au collisions at the same number of participants, N(part). The R(dA) of the phi shows no evidence for cold nuclear effects within uncertainties.
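
The nuclear modification factor used above is defined as R(AA) = yield_AA / (⟨N_coll⟩ · yield_pp), equal to one under binary-collision scaling. A trivial sketch with made-up numbers (not PHENIX measurements):

```python
def nuclear_modification_factor(yield_AA, n_coll, yield_pp):
    """R_AA = (per-event yield in A+A) / (<N_coll> * per-event yield in p+p).
    R_AA = 1 means binary-collision scaling; R_AA < 1 indicates suppression."""
    return yield_AA / (n_coll * yield_pp)

# Illustrative only: a yield suppressed relative to binary scaling
r_aa = nuclear_modification_factor(yield_AA=0.012, n_coll=1000.0, yield_pp=3e-5)
```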

Relevance: 100.00%

Publisher:

Abstract:

In the last decades, the air traffic system has been changing to adapt itself to new social demands, mainly the safe growth of worldwide traffic capacity. Those changes are ruled by the Communication, Navigation, Surveillance/Air Traffic Management (CNS/ATM) paradigm, based on digital communication technologies (mainly satellites) as a way of improving communication, surveillance, navigation and air traffic management services. However, CNS/ATM poses new challenges and needs, mainly related to the safety assessment process. In the face of these new challenges, and considering the main characteristics of CNS/ATM, a methodology is proposed in this work that combines the "absolute" and "relative" safety assessment methods adopted by the International Civil Aviation Organization (ICAO) in ICAO Doc. 9689 [14], using Fluid Stochastic Petri Nets (FSPN) as the modeling formalism, and compares the safety metrics estimated from simulation of both the proposed (under analysis) and the legacy system models. To demonstrate its usefulness, the proposed methodology was applied to the Automatic Dependent Surveillance-Broadcast (ADS-B) based air traffic control system. In conclusion, the proposed methodology proved able to assess CNS/ATM system safety properties: the FSPN formalism provides important modeling capabilities, and discrete event simulation allows estimation of the desired safety metrics. (C) 2011 Elsevier Ltd. All rights reserved.

Relevance: 100.00%

Publisher:

Abstract:

Block factor methods offer an attractive approach to forecasting with many predictors. These extract the information in these predictors into factors reflecting different blocks of variables (e.g. a price block, a housing block, a financial block, etc.). However, a forecasting model which simply includes all blocks as predictors risks being over-parameterized. Thus, it is desirable to use a methodology which allows for different parsimonious forecasting models to hold at different points in time. In this paper, we use dynamic model averaging and dynamic model selection to achieve this goal. These methods automatically alter the weights attached to different forecasting models as evidence comes in about which has forecast well in the recent past. In an empirical study involving forecasting output growth and inflation using 139 UK monthly time series variables, we find that the set of predictors changes substantially over time. Furthermore, our results show that dynamic model averaging and model selection can greatly improve forecast performance relative to traditional forecasting methods.
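
The weight-updating mechanics of dynamic model averaging (in the style of Raftery's forgetting-factor recursion) can be sketched as follows; the candidate models, predictive likelihood values, and forgetting factor are illustrative:

```python
import numpy as np

def dma_weight_update(weights, log_pred_lik, alpha=0.99):
    """One dynamic-model-averaging step (Raftery-style forgetting): flatten
    last period's model weights by alpha, then reweight each model by its
    predictive likelihood for the new observation."""
    w = np.asarray(weights, float) ** alpha
    w /= w.sum()                                       # prediction step
    w = w * np.exp(log_pred_lik - np.max(log_pred_lik))
    return w / w.sum()                                 # update step

# Three candidate forecasting models; model 2 predicts the latest point best
w0 = np.array([1 / 3, 1 / 3, 1 / 3])
w1 = dma_weight_update(w0, np.array([-2.0, -0.5, -3.0]))
# Weight shifts toward model 2; forgetting keeps the weights responsive
# to changes in which model has forecast well recently.
```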

Relevance: 100.00%

Publisher:

Abstract:

In recent years there has been increasing concern about the identification of parameters in dynamic stochastic general equilibrium (DSGE) models. Given the structure of DSGE models, it may be difficult to determine whether a parameter is identified. For the researcher using Bayesian methods, a lack of identification may not be evident, since the posterior of a parameter of interest may differ from its prior even if the parameter is unidentified. We show that this can be the case even if the priors assumed on the structural parameters are independent. We suggest two Bayesian identification indicators that do not suffer from this difficulty and are relatively easy to compute. The first applies to DSGE models where the parameters can be partitioned into those that are known to be identified and the rest, for which it is not known whether they are identified. In such cases, the marginal posterior of an unidentified parameter will equal the posterior expectation of the prior for that parameter conditional on the identified parameters. The second indicator is more generally applicable and considers the rate at which the posterior precision is updated as the sample size (T) is increased. For identified parameters the posterior precision rises with T, whilst for an unidentified parameter its posterior precision may be updated but its rate of update will be slower than T. This result assumes that the identified parameters are √T-consistent, but similar differential rates of update for identified and unidentified parameters can be established in the case of super-consistent estimators. These results are illustrated by means of simple DSGE models.
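
The second indicator can be illustrated with a stylized conjugate-normal toy (not a DSGE model): y_t ~ N(θ1 + θ2, 1) with independent N(0, 1) priors, so only the sum θ1 + θ2 is identified. The identified combination's posterior precision grows linearly in T, while each individual parameter's precision is updated but stays bounded:

```python
import numpy as np

# Stylized toy: the likelihood informs only the direction (1, 1) in
# (theta1, theta2) space, so individual parameters are unidentified.
def marginal_precisions(T):
    prec = np.eye(2) + T * np.array([[1.0, 1.0], [1.0, 1.0]])  # prior + data info
    cov = np.linalg.inv(prec)
    prec_theta1 = 1.0 / cov[0, 0]                              # unidentified alone
    prec_sum = 1.0 / (cov[0, 0] + cov[1, 1] + 2 * cov[0, 1])   # identified sum
    return prec_theta1, prec_sum

p_u_small, p_s_small = marginal_precisions(100)
p_u_big, p_s_big = marginal_precisions(10_000)
# prec_sum grows ~linearly in T; prec_theta1 is updated but stays bounded.
```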

Relevance: 100.00%

Publisher:

Abstract:

This paper extends the Nelson-Siegel linear factor model by developing a flexible macro-finance framework for modeling and forecasting the term structure of US interest rates. Our approach is robust to parameter uncertainty and structural change, as we consider instabilities in parameters and volatilities, and our model averaging method allows for investors' model uncertainty over time. Our time-varying parameter Nelson-Siegel Dynamic Model Averaging (NS-DMA) predicts yields better than standard benchmarks and successfully captures plausible time-varying term premia in real time. The proposed model has significant in-sample and out-of-sample predictability for excess bond returns, and the predictability is of economic value.
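
The Nelson-Siegel factor structure the paper extends writes the yield at maturity τ as β0 + β1(1 − e^(−λτ))/(λτ) + β2[(1 − e^(−λτ))/(λτ) − e^(−λτ)]. A sketch using the λ = 0.0609 convention popularized by Diebold and Li, with arbitrary β values:

```python
import numpy as np

def nelson_siegel_loadings(tau, lam=0.0609):
    """Nelson-Siegel level, slope, and curvature loadings at maturity tau
    (months); lam = 0.0609 is the Diebold-Li convention."""
    x = lam * np.asarray(tau, float)
    level = np.ones_like(x)
    slope = (1 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    return level, slope, curvature

def ns_yield(tau, beta, lam=0.0609):
    """Fitted yield given factors beta = (level, slope, curvature)."""
    l, s, c = nelson_siegel_loadings(tau, lam)
    return beta[0] * l + beta[1] * s + beta[2] * c

taus = np.array([3.0, 12.0, 60.0, 120.0])    # maturities in months
level, slope, curv = nelson_siegel_loadings(taus)
y = ns_yield(taus, beta=(5.0, -2.0, 1.0))
# As tau grows, the slope and curvature loadings vanish, so the fitted
# yield approaches the level factor beta0.
```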

Relevance: 100.00%

Publisher:

Abstract:

This paper tries to resolve some of the main shortcomings in the empirical literature on location decisions for new plants, i.e. spatial effects and overdispersion. Spatial effects are omnipresent, being a source of overdispersion in the data as well as a factor shaping the functional relationship between the variables that explain a firm's location decisions. Using count data models, empirical researchers have dealt with overdispersion and excess zeros through developments of the Poisson regression model. This study aims to take this a step further, by adopting Bayesian methods and models in order to tackle the excess of zeros, spatial and non-spatial overdispersion, and spatial dependence simultaneously. Data for Catalonia are used and location determinants are analysed to that end. The results show that spatial effects are decisive. Additionally, overdispersion is decomposed into an unstructured iid effect and a spatially structured effect.

Keywords: Bayesian Analysis, Spatial Models, Firm Location. JEL Classification: C11, C21, R30.
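
The excess-zeros idea the paper builds on can be sketched with the zero-inflated Poisson pmf, which mixes a point mass at zero with an ordinary Poisson count (the parameter values below are illustrative):

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero; otherwise it is Poisson(lam). Captures the excess zeros common
    in plant-location count data."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson

# E.g., locations where no plant would ever locate (structural zeros, pi = 0.3)
p0 = zip_pmf(0, lam=2.0, pi=0.3)
p0_plain = math.exp(-2.0)     # plain Poisson zero probability, for comparison
# p0 > p0_plain: zero inflation raises the mass at zero.
```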

Relevance: 100.00%

Publisher:

Abstract:

BACKGROUND The objective of this research was to evaluate data from a randomized clinical trial that tested injectable diacetylmorphine (DAM) and oral methadone (MMT) for substitution treatment, using a multi-domain dichotomous index, with a Bayesian approach. METHODS Sixty-two long-term, socially excluded heroin injectors not benefiting from available treatments were randomized to receive either DAM or MMT for 9 months in Granada, Spain. Forty-four participants completed the study, and data at the end of the study period were obtained for 50. Participants were determined to be responders or non-responders using a multi-domain outcome index accounting for their physical and mental health and psychosocial integration, used in a previous trial. Data were analyzed with Bayesian methods, using information from a similar study conducted in The Netherlands to select a priori distributions. On adding the data from the present study to update the a priori information, the distribution of the difference in response rates was obtained and used to build credibility intervals and relevant probability computations. RESULTS In the experimental group (n = 27), the rate of responders to treatment was 70.4% (95% CI 53.2-87.6), and in the control group (n = 23), it was 34.8% (95% CI 15.3-54.3). The probability of success in the experimental group under the a posteriori distributions remained higher after a proper sensitivity analysis. Almost the whole distribution of the difference in rates (diacetylmorphine minus methadone) was located to the right of zero, indicating the superiority of the experimental treatment. CONCLUSION The present analysis suggests a clinical superiority of injectable diacetylmorphine compared to oral methadone in the treatment of severely affected heroin injectors not benefiting sufficiently from the available treatments. TRIAL REGISTRATION Current Controlled Trials ISRCTN52023186.
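
The flavor of the Bayesian comparison can be sketched with the reported counts (19/27 DAM responders, i.e. 70.4%; 8/23 MMT, i.e. 34.8%). For simplicity this sketch assumes flat Beta(1, 1) priors rather than the informative priors the study built from the Dutch trial:

```python
import numpy as np

rng = np.random.default_rng(4)

# Responders / n from the trial: DAM 19/27 (70.4%), MMT 8/23 (34.8%).
# Beta(1, 1) priors assumed here for illustration only.
post_dam = rng.beta(1 + 19, 1 + 27 - 19, size=100_000)
post_mmt = rng.beta(1 + 8, 1 + 23 - 8, size=100_000)

diff = post_dam - post_mmt                  # response-rate difference
p_superior = np.mean(diff > 0)              # P(DAM better | data)
ci = np.quantile(diff, [0.025, 0.975])      # 95% credible interval
# With these counts, almost all posterior mass lies to the right of zero,
# mirroring the paper's conclusion.
```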

Relevance: 100.00%

Publisher:

Abstract:

BACKGROUND: European Surveillance of Congenital Anomalies (EUROCAT) is a network of population-based congenital anomaly registries in Europe surveying more than 1 million births per year, or 25% of the births in the European Union. This paper describes the potential of the EUROCAT collaboration for pharmacoepidemiology and drug safety surveillance. METHODS: The 34 full members and 6 associate members of the EUROCAT network were sent a questionnaire about their data sources on drug exposure and on drug coding. Data on first-trimester drug exposure available in the central EUROCAT database for the years 1996-2000 were summarised for 15 of the 25 responding full members. RESULTS: Of the 40 registries, 29 returned questionnaires (25 full and 4 associate members). Four of these registries do not collect data on maternal drug use. Of the full members, 15 registries use the EUROCAT drug code, 4 use the international ATC drug code, 3 use another coding system, and 7 use a combination of these coding systems. Obstetric records are the most frequently used source of drug information for the registries, followed by interviews with the mother. Only one registry uses pharmacy data. The percentage of cases with drug exposure (excluding vitamins/minerals) varied from 4.4% to 26.0% among registries. The categories of drugs recorded varied widely between registries. CONCLUSIONS: Practices vary widely between registries regarding the recording of drug exposure information. EUROCAT has the potential to be an effective collaborative framework contributing to post-marketing drug surveillance in relation to teratogenic effects, but work is needed to implement ATC drug coding more widely, and to diversify the sources of information used to determine drug exposure in each registry.