244 resultados para regression discrete models
Resumo:
Hot spot identification (HSID) aims to identify potential sites—roadway segments, intersections, crosswalks, interchanges, ramps, etc.—with disproportionately high crash risk relative to similar sites. An inefficient HSID methodology might result in either identifying a safe site as high risk (false positive) or a high risk site as safe (false negative), and consequently lead to the misuse the available public funds, to poor investment decisions, and to inefficient risk management practice. Current HSID methods suffer from issues like underreporting of minor injury and property damage only (PDO) crashes, challenges of accounting for crash severity into the methodology, and selection of a proper safety performance function to model crash data that is often heavily skewed by a preponderance of zeros. Addressing these challenges, this paper proposes a combination of a PDO equivalency calculation and quantile regression technique to identify hot spots in a transportation network. In particular, issues related to underreporting and crash severity are tackled by incorporating equivalent PDO crashes, whilst the concerns related to the non-count nature of equivalent PDO crashes and the skewness of crash data are addressed by the non-parametric quantile regression technique. The proposed method identifies covariate effects on various quantiles of a population, rather than the population mean like most methods in practice, which more closely corresponds with how black spots are identified in practice. The proposed methodology is illustrated using rural road segment data from Korea and compared against the traditional EB method with negative binomial regression. Application of a quantile regression model on equivalent PDO crashes enables identification of a set of high-risk sites that reflect the true safety costs to the society, simultaneously reduces the influence of under-reported PDO and minor injury crashes, and overcomes the limitation of traditional NB model in dealing with preponderance of zeros problem or right skewed dataset.
Resumo:
Due to knowledge gaps in relation to urban stormwater quality processes, an in-depth understanding of model uncertainty can enhance decision making. Uncertainty in stormwater quality models can originate from a range of sources such as the complexity of urban rainfall-runoff-stormwater pollutant processes and the paucity of observed data. Unfortunately, studies relating to epistemic uncertainty, which arises from the simplification of reality are limited and often deemed mostly unquantifiable. This paper presents a statistical modelling framework for ascertaining epistemic uncertainty associated with pollutant wash-off under a regression modelling paradigm using Ordinary Least Squares Regression (OLSR) and Weighted Least Squares Regression (WLSR) methods with a Bayesian/Gibbs sampling statistical approach. The study results confirmed that WLSR assuming probability distributed data provides more realistic uncertainty estimates of the observed and predicted wash-off values compared to OLSR modelling. It was also noted that the Bayesian/Gibbs sampling approach is superior compared to the most commonly adopted classical statistical and deterministic approaches commonly used in water quality modelling. The study outcomes confirmed that the predication error associated with wash-off replication is relatively higher due to limited data availability. The uncertainty analysis also highlighted the variability of the wash-off modelling coefficient k as a function of complex physical processes, which is primarily influenced by surface characteristics and rainfall intensity.
Resumo:
An important aspect of decision support systems involves applying sophisticated and flexible statistical models to real datasets and communicating these results to decision makers in interpretable ways. An important class of problem is the modelling of incidence such as fire, disease etc. Models of incidence known as point processes or Cox processes are particularly challenging as they are ‘doubly stochastic’ i.e. obtaining the probability mass function of incidents requires two integrals to be evaluated. Existing approaches to the problem either use simple models that obtain predictions using plug-in point estimates and do not distinguish between Cox processes and density estimation but do use sophisticated 3D visualization for interpretation. Alternatively other work employs sophisticated non-parametric Bayesian Cox process models, but do not use visualization to render interpretable complex spatial temporal forecasts. The contribution here is to fill this gap by inferring predictive distributions of Gaussian-log Cox processes and rendering them using state of the art 3D visualization techniques. This requires performing inference on an approximation of the model on a discretized grid of large scale and adapting an existing spatial-diurnal kernel to the log Gaussian Cox process context.
Resumo:
This paper presents a method for the estimation of thrust model parameters of uninhabited airborne systems using specific flight tests. Particular tests are proposed to simplify the estimation. The proposed estimation method is based on three steps. The first step uses a regression model in which the thrust is assumed constant. This allows us to obtain biased initial estimates of the aerodynamic coeficients of the surge model. In the second step, a robust nonlinear state estimator is implemented using the initial parameter estimates, and the model is augmented by considering the thrust as random walk. In the third step, the estimate of the thrust obtained by the observer is used to fit a polynomial model in terms of the propeller advanced ratio. We consider a numerical example based on Monte-Carlo simulations to quantify the sampling properties of the proposed estimator given realistic flight conditions.
Resumo:
This paper presents the application of a statistical method for model structure selection of lift-drag and viscous damping components in ship manoeuvring models. The damping model is posed as a family of linear stochastic models, which is postulated based on previous work in the literature. Then a nested test of hypothesis problem is considered. The testing reduces to a recursive comparison of two competing models, for which optimal tests in the Neyman sense exist. The method yields a preferred model structure and its initial parameter estimates. Alternatively, the method can give a reduced set of likely models. Using simulated data we study how the selection method performs when there is both uncorrelated and correlated noise in the measurements. The first case is related to instrumentation noise, whereas the second case is related to spurious wave-induced motion often present during sea trials. We then consider the model structure selection of a modern high-speed trimaran ferry from full scale trial data.
Resumo:
Commodity price modeling is normally approached in terms of structural time-series models, in which the different components (states) have a financial interpretation. The parameters of these models can be estimated using maximum likelihood. This approach results in a non-linear parameter estimation problem and thus a key issue is how to obtain reliable initial estimates. In this paper, we focus on the initial parameter estimation problem for the Schwartz-Smith two-factor model commonly used in asset valuation. We propose the use of a two-step method. The first step considers a univariate model based only on the spot price and uses a transfer function model to obtain initial estimates of the fundamental parameters. The second step uses the estimates obtained in the first step to initialize a re-parameterized state-space-innovations based estimator, which includes information related to future prices. The second step refines the estimates obtained in the first step and also gives estimates of the remaining parameters in the model. This paper is part tutorial in nature and gives an introduction to aspects of commodity price modeling and the associated parameter estimation problem.
Resumo:
In this paper, we present fully Bayesian experimental designs for nonlinear mixed effects models, in which we develop simulation-based optimal design methods to search over both continuous and discrete design spaces. Although Bayesian inference has commonly been performed on nonlinear mixed effects models, there is a lack of research into performing Bayesian optimal design for nonlinear mixed effects models that require searches to be performed over several design variables. This is likely due to the fact that it is much more computationally intensive to perform optimal experimental design for nonlinear mixed effects models than it is to perform inference in the Bayesian framework. In this paper, the design problem is to determine the optimal number of subjects and samples per subject, as well as the (near) optimal urine sampling times for a population pharmacokinetic study in horses, so that the population pharmacokinetic parameters can be precisely estimated, subject to cost constraints. The optimal sampling strategies, in terms of the number of subjects and the number of samples per subject, were found to be substantially different between the examples considered in this work, which highlights the fact that the designs are rather problem-dependent and require optimisation using the methods presented in this paper.
Resumo:
This paper uses a correlated multinomial logit model and a Poisson regression model to measure the factors affecting demand for different types of transportation by elderly and disabled people in rural Virginia. The major results are: (a) A paratransit system providing door-to-door service is highly valued by transportation-handicapped people; (b) Taxis are probably a potential but inferior alternative even when subsidized; (c) Buses are a poor alternative, especially in rural areas where distances to bus stops may be long; (d) Making buses handicap-accessible would have a statistically significant but small effect on mode choice; (e) Demand is price inelastic; and (f) The total number of trips taken is insensitive to mode availability and characteristics. These results suggest that transportation-handicapped people take a limited number of trips. Those they do take are in some sense necessary (given the low elasticity with respect to mode price or availability). People will substitute away from relying upon others when appropriate transportation is available, at least to some degree. But such transportation needs to be flexible enough to meet the needs of the people involved.
Resumo:
This paper demonstrates the use of a spreadsheet in exploring non-linear difference equations that describe digital control systems used in radio engineering, communication and computer architecture. These systems, being the focus of intensive studies of mathematicians and engineers over the last 40 years, may exhibit extremely complicated behaviour interpreted in contemporary terms as transition from global asymptotic stability to chaos through period-doubling bifurcations. The authors argue that embedding advanced mathematical ideas in the technological tool enables one to introduce fundamentals of discrete control systems in tertiary curricula without learners having to deal with complex machinery that rigorous mathematical methods of investigation require. In particular, in the appropriately designed spreadsheet environment, one can effectively visualize a qualitative difference in the behviour of systems with different types of non-linear characteristic.
Resumo:
This paper presents an event-based failure model to predict the number of failures that occur in water distribution assets. Often, such models have been based on analysis of historical failure data combined with pipe characteristics and environmental conditions. In this paper weather data have been added to the model to take into account the commonly observed seasonal variation of the failure rate. The theoretical basis of existing logistic regression models is briefly described in this paper, along with the refinements made to the model for inclusion of seasonal variation of weather. The performance of these refinements is tested using data from two Australian water authorities.
Resumo:
A discrete agent-based model on a periodic lattice of arbitrary dimension is considered. Agents move to nearest-neighbor sites by a motility mechanism accounting for general interactions, which may include volume exclusion. The partial differential equation describing the average occupancy of the agent population is derived systematically. A diffusion equation arises for all types of interactions and is nonlinear except for the simplest interactions. In addition, multiple species of interacting subpopulations give rise to an advection-diffusion equation for each subpopulation. This work extends and generalizes previous specific results, providing a construction method for determining the transport coefficients in terms of a single conditional transition probability, which depends on the occupancy of sites in an influence region. These coefficients characterize the diffusion of agents in a crowded environment in biological and physical processes.
Resumo:
Land-use regression (LUR) is a technique that can improve the accuracy of air pollution exposure assessment in epidemiological studies. Most LUR models are developed for single cities, which places limitations on their applicability to other locations. We sought to develop a model to predict nitrogen dioxide (NO2) concentrations with national coverage of Australia by using satellite observations of tropospheric NO2 columns combined with other predictor variables. We used a generalised estimating equation (GEE) model to predict annual and monthly average ambient NO2 concentrations measured by a national monitoring network from 2006 through 2011. The best annual model explained 81% of spatial variation in NO2 (absolute RMS error=1.4 ppb), while the best monthly model explained 76% (absolute RMS error=1.9 ppb). We applied our models to predict NO2 concentrations at the ~350,000 census mesh blocks across the country (a mesh block is the smallest spatial unit in the Australian census). National population-weighted average concentrations ranged from 7.3 ppb (2006) to 6.3 ppb (2011). We found that a simple approach using tropospheric NO2 column data yielded models with slightly better predictive ability than those produced using a more involved approach that required simulation of surface-to-column ratios. The models were capable of capturing within-urban variability in NO2, and offer the ability to estimate ambient NO2 concentrations at monthly and annual time scales across Australia from 2006–2011. We are making our model predictions freely available for research.
Resumo:
A computationally efficient sequential Monte Carlo algorithm is proposed for the sequential design of experiments for the collection of block data described by mixed effects models. The difficulty in applying a sequential Monte Carlo algorithm in such settings is the need to evaluate the observed data likelihood, which is typically intractable for all but linear Gaussian models. To overcome this difficulty, we propose to unbiasedly estimate the likelihood, and perform inference and make decisions based on an exact-approximate algorithm. Two estimates are proposed: using Quasi Monte Carlo methods and using the Laplace approximation with importance sampling. Both of these approaches can be computationally expensive, so we propose exploiting parallel computational architectures to ensure designs can be derived in a timely manner. We also extend our approach to allow for model uncertainty. This research is motivated by important pharmacological studies related to the treatment of critically ill patients.
Resumo:
Wound healing and tumour growth involve collective cell spreading, which is driven by individual motility and proliferation events within a population of cells. Mathematical models are often used to interpret experimental data and to estimate the parameters so that predictions can be made. Existing methods for parameter estimation typically assume that these parameters are constants and often ignore any uncertainty in the estimated values. We use approximate Bayesian computation (ABC) to estimate the cell diffusivity, D, and the cell proliferation rate, λ, from a discrete model of collective cell spreading, and we quantify the uncertainty associated with these estimates using Bayesian inference. We use a detailed experimental data set describing the collective cell spreading of 3T3 fibroblast cells. The ABC analysis is conducted for different combinations of initial cell densities and experimental times in two separate scenarios: (i) where collective cell spreading is driven by cell motility alone, and (ii) where collective cell spreading is driven by combined cell motility and cell proliferation. We find that D can be estimated precisely, with a small coefficient of variation (CV) of 2–6%. Our results indicate that D appears to depend on the experimental time, which is a feature that has been previously overlooked. Assuming that the values of D are the same in both experimental scenarios, we use the information about D from the first experimental scenario to obtain reasonably precise estimates of λ, with a CV between 4 and 12%. Our estimates of D and λ are consistent with previously reported values; however, our method is based on a straightforward measurement of the position of the leading edge whereas previous approaches have involved expensive cell counting techniques. Additional insights gained using a fully Bayesian approach justify the computational cost, especially since it allows us to accommodate information from different experiments in a principled way.
Resumo:
Tumour microenvironment greatly influences the development and metastasis of cancer progression. The development of three dimensional (3D) culture models which mimic that displayed in vivo can improve cancer biology studies and accelerate novel anticancer drug screening. Inspired by a systems biology approach, we have formed 3D in vitro bioengineered tumour angiogenesis microenvironments within a glycosaminoglycan-based hydrogel culture system. This microenvironment model can routinely recreate breast and prostate tumour vascularisation. The multiple cell types cultured within this model were less sensitive to chemotherapy when compared with two dimensional (2D) cultures, and displayed comparative tumour regression to that displayed in vivo. These features highlight the use of our in vitro culture model as a complementary testing platform in conjunction with animal models, addressing key reduction and replacement goals of the future. We anticipate that this biomimetic model will provide a platform for the in-depth analysis of cancer development and the discovery of novel therapeutic targets.