Digital Library

937 results for Bose-Einstein condensation statistical model

Cluster-based network model for time-course gene expression data

Relevance:

30.00%

Publisher:

Abstract:

We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models of prostate cancer progression. The latter application shows that, with a combined statistical/bioinformatics analysis, we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships.

Empirical investigation of interactive highway safety design model accident prediction algorithm : Rural intersections

Relevance:

30.00%

Publisher:

Abstract:

One major gap in transportation system safety management is the ability to assess the safety ramifications of design changes for both new road projects and modifications to existing roads. To fulfill this need, FHWA and its many partners are developing a safety forecasting tool, the Interactive Highway Safety Design Model (IHSDM). The tool will be used by roadway design engineers, safety analysts, and planners throughout the United States. As such, the statistical models embedded in IHSDM will need to be able to forecast safety impacts under a wide range of roadway configurations and environmental conditions for a wide range of driver populations and will need to be able to capture elements of driving risk across states. One of the IHSDM algorithms developed by FHWA and its contractors is for forecasting accidents on rural road segments and rural intersections. The methodological approach is to use predictive models for specific base conditions, with traffic volume information as the sole explanatory variable for crashes, and then to apply regional or state calibration factors and accident modification factors (AMFs) to estimate the impact on accidents of geometric characteristics that differ from the base model conditions. In the majority of past approaches, AMFs are derived from parameter estimates associated with the explanatory variables. A recent study for FHWA used a multistate database to examine in detail the use of the algorithm with the base model-AMF approach and explored alternative base model forms as well as the use of full models that included nontraffic-related variables and other approaches to estimate AMFs. That research effort is reported. The results support the IHSDM methodology.
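The base model-AMF structure described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the IHSDM algorithm itself: the log-linear base model form, its coefficients, the use of major-road and minor-road volumes, the calibration factor, and the AMF values are all hypothetical placeholders.

```python
import math

def base_model_crashes(aadt_major, aadt_minor, a=-8.0, b=0.6, c=0.4):
    """Hypothetical base model: expected crashes per year under base
    conditions, with traffic volumes as the only explanatory variables."""
    return math.exp(a + b * math.log(aadt_major) + c * math.log(aadt_minor))

def predicted_crashes(aadt_major, aadt_minor, calibration=1.0, amfs=()):
    """Scale the base prediction by a regional calibration factor and by
    one accident modification factor (AMF) per non-base design feature."""
    n = base_model_crashes(aadt_major, aadt_minor) * calibration
    for amf in amfs:
        n *= amf
    return n

# Hypothetical intersection: AMF < 1 lowers, AMF > 1 raises the prediction.
base = predicted_crashes(8000, 1500)
adjusted = predicted_crashes(8000, 1500, calibration=1.1, amfs=(0.9, 1.2))
print(f"base: {base:.3f}, adjusted: {adjusted:.3f}")
```

An AMF of 1.0 leaves the base prediction unchanged, which is why base conditions need no adjustment in this scheme.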

Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory

Relevance:

30.00%

Publisher:

Abstract:

There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states: perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows a Bernoulli trial with unequal probability of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data.
A simulation experiment is then conducted to demonstrate how crash data give rise to the “excess” zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
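The argument that low exposure alone can produce apparent "excess" zeros can be illustrated with a small simulation. This is a minimal sketch, not the study's experiment: the exposure levels, the range of per-trial crash probabilities, and the number of sites are hypothetical values chosen only for illustration. Every site is generated by the same single-state Poisson-trials process; no "perfectly safe" state exists.

```python
import math
import random

def simulate_site(n_trials, rng, max_risk=0.002):
    """Crash count at one site: every trial (e.g. a vehicle passage) is an
    independent Bernoulli event with its own, unequal crash probability."""
    return sum(rng.random() < rng.uniform(0.0, max_risk) for _ in range(n_trials))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical exposure levels; many sites see few trials in the study period.
exposures = [rng.choice([20, 50, 200, 2000]) for _ in range(1000)]
counts = [simulate_site(n, rng) for n in exposures]

mean_count = sum(counts) / len(counts)
observed_zero_fraction = counts.count(0) / len(counts)
# Share of zeros a single Poisson distribution with the same mean would predict.
poisson_zero_fraction = math.exp(-mean_count)

print(f"observed zeros: {observed_zero_fraction:.2f}")
print(f"Poisson-implied zeros: {poisson_zero_fraction:.2f}")
```

In this setup the observed share of zeros exceeds what one pooled Poisson distribution implies, even though no site is in a separate zero-crash state; the surplus comes from mixing low-exposure and high-exposure sites.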

Accident prediction model for railway-highway interfaces

Relevance:

30.00%

Publisher:

Abstract:

Considerable past research has explored relationships between vehicle accidents and geometric design and operation of road sections, but relatively little research has examined factors that contribute to accidents at railway-highway crossings. Between 1998 and 2002 in Korea, about 95% of railway accidents occurred at highway-rail grade crossings, resulting in 402 accidents, of which about 20% resulted in fatalities. These statistics suggest that efforts to reduce crashes at these locations may significantly reduce crash costs. The objective of this paper is to examine factors associated with railroad crossing crashes. Various statistical models are used to examine the relationships between crossing accidents and features of crossings. The paper also compares accident models developed in the United States and the safety effects of crossing elements obtained using Korea data. Crashes were observed to increase with total traffic volume and average daily train volumes. The proximity of crossings to commercial areas and the distance of the train detector from crossings are associated with larger numbers of accidents, as is the time duration between the activation of warning signals and gates. The unique contributions of the paper are the application of the gamma probability model to deal with underdispersion and the insights obtained regarding railroad crossing related vehicle crashes.

Reliability prediction using the non-parametric explicit hazard model : a case study

Relevance:

30.00%

Publisher:

Abstract:

Survival probability prediction using a covariate-based hazard approach is a known statistical methodology in engineering asset health management. We have previously reported the semi-parametric Explicit Hazard Model (EHM), which incorporates three types of information: population characteristics, condition indicators, and operating environment indicators for hazard prediction. This model assumes the baseline hazard has the form of the Weibull distribution. To avoid this assumption, this paper presents the non-parametric EHM, which is a distribution-free covariate-based hazard model. In this paper, an application of the non-parametric EHM is demonstrated via a case study. In this case study, survival probabilities of a set of resistance elements using the non-parametric EHM are compared with the Weibull proportional hazard model and traditional Weibull model. The results show that the non-parametric EHM can effectively predict asset life using the condition indicator, operating environment indicator, and failure history.
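For orientation, the Weibull proportional hazard model named as a comparison in this abstract has a closed-form survival function in its textbook form. The sketch below shows that standard form only; it is not the EHM, and the shape, scale, coefficient, and covariate values are hypothetical.

```python
import math

def weibull_ph_survival(t, shape, scale, coeffs=(), covariates=()):
    """Survival probability S(t | z) = exp(-(t / scale)**shape * exp(coeffs . z))
    for a Weibull baseline hazard scaled by a covariate risk multiplier."""
    baseline_cum_hazard = (t / scale) ** shape
    risk = math.exp(sum(g * z for g, z in zip(coeffs, covariates)))
    return math.exp(-baseline_cum_hazard * risk)

# Hypothetical asset: the same age, with and without an adverse condition indicator.
healthy = weibull_ph_survival(500.0, shape=1.5, scale=1000.0)
degraded = weibull_ph_survival(500.0, shape=1.5, scale=1000.0,
                               coeffs=(0.8,), covariates=(1.0,))
print(f"healthy: {healthy:.3f}, degraded: {degraded:.3f}")
```

With no covariates the function reduces to the traditional Weibull survival curve, the other comparison model mentioned above.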

A model for mesoscale patterns in motile populations

Relevance:

30.00%

Publisher:

Abstract:

Experimental observations of cell migration often describe the presence of mesoscale patterns within motile cell populations. These patterns can take the form of cells moving as aggregates or in chain-like formation. Here we present a discrete model capable of producing mesoscale patterns. These patterns are formed by biasing movements to favor a particular configuration of agent–agent attachments using a binding function f(K), where K is the scaled local coordination number. This discrete model is related to a nonlinear diffusion equation, where we relate the nonlinear diffusivity D(C) to the binding function f. The nonlinear diffusion equation supports a range of solutions which can be either smooth or discontinuous. Aggregation patterns can be produced with the discrete model, and we show that there is a transition between the presence and absence of aggregation depending on the sign of D(C). A combination of simulation and analysis shows that both the existence of mesoscale patterns and the validity of the continuum model depend on the form of f. Our results suggest that there may be no formal continuum description of a motile system with strong mesoscale patterns.

A statistical framework for natural feature representation

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a robust stochastic framework for the incorporation of visual observations into conventional estimation, data fusion, navigation and control algorithms. The representation combines Isomap, a non-linear dimensionality reduction algorithm, with expectation maximization, a statistical learning scheme. The joint probability distribution of this representation is computed offline based on existing training data. The training phase of the algorithm results in a non-linear and non-Gaussian likelihood model of natural features conditioned on the underlying visual states. This generative model can be used online to instantiate likelihoods corresponding to observed visual features in real-time. The instantiated likelihoods are expressed as a Gaussian mixture model and are conveniently integrated within existing non-linear filtering algorithms. Example applications based on real visual data from heterogeneous, unstructured environments demonstrate the versatility of the generative models.

A stochastic model for natural feature representation

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a robust stochastic model for the incorporation of natural features within data fusion algorithms. The representation combines Isomap, a non-linear manifold learning algorithm, with Expectation Maximization, a statistical learning scheme. The representation is computed offline and results in a non-linear, non-Gaussian likelihood model relating visual observations such as color and texture to the underlying visual states. The likelihood model can be used online to instantiate likelihoods corresponding to observed visual features in real-time. The likelihoods are expressed as a Gaussian Mixture Model so as to permit convenient integration within existing nonlinear filtering algorithms. The resulting compactness of the representation is especially suitable to decentralized sensor networks. Real visual data consisting of natural imagery acquired from an Unmanned Aerial Vehicle is used to demonstrate the versatility of the feature representation.

Allowing for the effect of data binning in a Bayesian Normal mixture model

Relevance:

30.00%

Publisher:

Critical gap estimation by numerical and statistical highest likelihood search

Relevance:

30.00%

Publisher:

Abstract:

Many traffic situations require drivers to cross or merge into a stream having higher priority. Gap acceptance theory enables us to model such processes to analyse traffic operation. This discussion demonstrated that numerical search fine-tuned by statistical analysis can be used to determine the most likely critical gap for a sample of drivers, based on their largest rejected gap and accepted gap. This method shares some common features with the Maximum Likelihood Estimation technique (Troutbeck 1992) but lends itself well to contemporary analysis tools such as spreadsheets and is particularly analytically transparent. This method is considered not to bias estimation of critical gap due to very small rejected gaps or very large rejected gaps. However, it requires a sufficiently large sample that there is reasonable representation of largest rejected gap/accepted gap pairs within a fairly narrow highest likelihood search band.

The Bayesian conditional independence model for measurement error: applications in ecology

Relevance:

30.00%

Publisher:

Abstract:

The measurement error model is a well established statistical method for regression problems in medical sciences, although rarely used in ecological studies. While the situations in which it is appropriate may be less common in ecology, there are instances in which there may be benefits in its use for prediction and estimation of parameters of interest. We have chosen to explore this topic using a conditional independence model in a Bayesian framework using a Gibbs sampler, as this gives a great deal of flexibility, allowing us to analyse a number of different models without losing generality. Using simulations and two examples, we show how the conditional independence model can be used in ecology, and when it is appropriate.

An approach to statistical lip modelling for speaker identification via chromatic feature extraction

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a novel technique for the tracking of moving lips for the purpose of speaker identification. In our system, a model of the lip contour is formed directly from chromatic information in the lip region. Iterative refinement of contour point estimates is not required. Colour features are extracted from the lips via concatenated profiles taken around the lip contour. Reduction of order in lip features is obtained via principal component analysis (PCA) followed by linear discriminant analysis (LDA). Statistical speaker models are built from the lip features based on the Gaussian mixture model (GMM). Identification experiments performed on the M2VTS database show encouraging results.

Evaluating multivariate volatility forecasts : how effective are statistical and economic loss functions?

Relevance:

30.00%

Publisher:

Abstract:

Multivariate volatility forecasts are an important input in many financial applications, in particular portfolio optimisation problems. Given the number of models available and the range of loss functions to discriminate between them, it is obvious that selecting the optimal forecasting model is challenging. The aim of this thesis is to thoroughly investigate how effective many commonly used statistical (MSE and QLIKE) and economic (portfolio variance and portfolio utility) loss functions are at discriminating between competing multivariate volatility forecasts. An analytical investigation of the loss functions is performed to determine whether they identify the correct forecast as the best forecast. This is followed by an extensive simulation study that examines the ability of the loss functions to consistently rank forecasts, and their statistical power within tests of predictive ability. For the tests of predictive ability, the model confidence set (MCS) approach of Hansen, Lunde and Nason (2003, 2011) is employed. In addition, an empirical study investigates whether the simulation findings hold in a realistic setting. In light of these earlier studies, a major empirical study seeks to identify the set of superior multivariate volatility forecasting models from 43 models that use either daily squared returns or realised volatility to generate forecasts. This study also assesses how the choice of volatility proxy affects the ability of the statistical loss functions to discriminate between forecasts. Analysis of the loss functions shows that QLIKE, MSE and portfolio variance can discriminate between multivariate volatility forecasts, while portfolio utility cannot. An examination of the effective loss functions shows that they all can identify the correct forecast at a point in time; however, their ability to discriminate between competing forecasts does vary.
That is, QLIKE is identified as the most effective loss function, followed by portfolio variance which is then followed by MSE. The major empirical analysis reports that the optimal set of multivariate volatility forecasting models includes forecasts generated from daily squared returns and realised volatility. Furthermore, it finds that the volatility proxy affects the statistical loss functions’ ability to discriminate between forecasts in tests of predictive ability. These findings deepen our understanding of how to choose between competing multivariate volatility forecasts.
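The two statistical loss functions named in this abstract can be sketched for a 2 x 2 covariance matrix as follows. The forms used here (sum of squared element-wise errors for MSE, and log-determinant plus trace for QLIKE) are commonly used definitions and are an assumption; the thesis may define or scale them differently. The matrices are hypothetical.

```python
import math

def mse_loss(forecast, proxy):
    """Sum of squared element-wise differences between two covariance matrices."""
    return sum((f - p) ** 2
               for f_row, p_row in zip(forecast, proxy)
               for f, p in zip(f_row, p_row))

def qlike_loss(forecast, proxy):
    """QLIKE for 2 x 2 matrices: log det(H) + trace(inverse(H) @ proxy)."""
    (a, b), (c, d) = forecast
    det = a * d - b * c
    if det <= 0:
        raise ValueError("forecast must be positive definite")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    trace = sum(inv[i][k] * proxy[k][i] for i in range(2) for k in range(2))
    return math.log(det) + trace

# Hypothetical volatility proxy and two competing forecasts.
proxy = [[1.0, 0.3], [0.3, 2.0]]
good = [[1.0, 0.3], [0.3, 2.0]]   # matches the proxy exactly
bad = [[2.0, 0.0], [0.0, 1.0]]    # swaps the variances, ignores the covariance

for name, h in (("good", good), ("bad", bad)):
    print(name, round(mse_loss(h, proxy), 3), round(qlike_loss(h, proxy), 3))
```

Both losses are smallest when the forecast equals the proxy, which is the property the analytical part of the study checks; how sharply each loss separates close competitors is a separate question that this sketch does not address.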

New variational Bayesian approaches for statistical data mining : with applications to profiling and differentiating habitual consumption behaviour of customers in the wireless telecommunication industry

Relevance:

30.00%

Publisher:

Abstract:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldom studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it becomes habitual and involves minimal intentions or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits.
Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high-dimensional data based on the use of VB-GMM.

Quantification of particle emission characteristics and development of an emission model for use in transport microenvironments affected by traffic emissions

Relevance:

30.00%

Publisher:

Abstract:

Vehicle emitted particles are of significant concern based on their potential to influence local air quality and human health. Transport microenvironments usually contain higher vehicle emission concentrations compared to other environments, and people spend a substantial amount of time in these microenvironments when commuting. Currently there is limited scientific knowledge on particle concentration, passenger exposure and the distribution of vehicle emissions in transport microenvironments, partially due to the fact that the instrumentation required to conduct such measurements is not available in many research centres. Information on passenger waiting time and location in such microenvironments has also not been investigated, which makes it difficult to evaluate a passenger’s spatial-temporal exposure to vehicle emissions. Furthermore, current emission models are incapable of rapidly predicting emission distribution, given the complexity of variations in emission rates that result from changes in driving conditions, as well as the time spent in driving condition within the transport microenvironment. In order to address these scientific gaps in knowledge, this work conducted, for the first time, a comprehensive statistical analysis of experimental data, along with multi-parameter assessment, exposure evaluation and comparison, and emission model development and application, in relation to traffic interrupted transport microenvironments. The work aimed to quantify and characterise particle emissions and human exposure in the transport microenvironments, with bus stations and a pedestrian crossing identified as suitable research locations representing a typical transport microenvironment. Firstly, two bus stations in Brisbane, Australia, with different designs, were selected to conduct measurements of particle number size distributions, particle number and PM2.5 concentrations during two different seasons. 
Simultaneous traffic and meteorological parameters were also monitored, aiming to quantify particle characteristics and investigate the impact of bus flow rate, station design and meteorological conditions on particle characteristics at stations. The results showed higher concentrations of PN20-30 at the station situated in an open area (open station), which is likely to be attributed to the lower average daily temperature compared to the station with a canyon structure (canyon station). During precipitation events, it was found that particle number concentration in the size range 25-250 nm decreased greatly, and that the average daily reduction in PM2.5 concentration on rainy days compared to fine days was 44.2% and 22.6% at the open and canyon station, respectively. The effect of ambient wind speeds on particle number concentrations was also examined, and no relationship was found between particle number concentration and wind speed for the entire measurement period. In addition, 33 pairs of average half-hourly PN7-3000 concentrations were calculated and identified at the two stations, at the same time of day, and with the same ambient wind speeds and precipitation conditions. The results of a paired t-test showed that the average half-hourly PN7-3000 concentrations at the two stations were not significantly different at the 5% significance level (t = 0.06, p = 0.96), which indicates that the different station designs were not a crucial factor for influencing PN7-3000 concentrations. A further assessment of passenger exposure to bus emissions on a platform was evaluated at another bus station in Brisbane, Australia. The sampling was conducted over seven weekdays to investigate spatial-temporal variations in size-fractionated particle number and PM2.5 concentrations, as well as human exposure on the platform.
For the whole day, the average PN13-800 concentration was 1.3 × 10⁴ and 1.0 × 10⁴ particles/cm³ at the centre and end of the platform, respectively, of which PN50-100 accounted for the largest proportion of the total count. Furthermore, the contribution of exposure at the bus station to the overall daily exposure was assessed using two assumed scenarios of a school student and an office worker. It was found that, although the daily time fraction (the percentage of time spent at a location in a whole day) at the station was only 0.8%, the daily exposure fractions (the percentage of exposures at a location accounting for the daily exposure) at the station were 2.7% and 2.8% for exposure to PN13-800 and 2.7% and 3.5% for exposure to PM2.5 for the school student and the office worker, respectively. A new parameter, “exposure intensity” (the ratio of the daily exposure fraction to the daily time fraction), was also defined and calculated at the station, with values of 3.3 and 3.4 for exposure to PN13-800, and 3.3 and 4.2 for exposure to PM2.5, for the school student and the office worker, respectively. In order to quantify the enhanced emissions at critical locations and define the emission distribution in further dispersion models for traffic interrupted transport microenvironments, a composite line source emission (CLSE) model was developed to specifically quantify exposure levels and describe the spatial variability of vehicle emissions in traffic interrupted microenvironments. This model took into account the complexity of vehicle movements in the queue, as well as different emission rates relevant to various driving conditions (cruise, decelerate, idle and accelerate), and it utilised multi-representative segments to capture the accurate emission distribution for real vehicle flow.
This model not only helped to quantify the enhanced emissions at critical locations, but it also helped to define the emission source distribution of the disrupted steady flow for further dispersion modelling. The model then was applied to estimate particle number emissions at a bidirectional bus station used by diesel and compressed natural gas fuelled buses. It was found that the acceleration distance was of critical importance when estimating particle number emission, since the highest emissions occurred in sections where most of the buses were accelerating and no significant increases were observed at locations where they idled. It was also shown that emissions at the front end of the platform were 43 times greater than at the rear of the platform. The CLSE model was also applied at a signalled pedestrian crossing, in order to assess increased particle number emissions from motor vehicles when forced to stop and accelerate from rest. The CLSE model was used to calculate the total emissions produced by a specific number and mix of light petrol cars and diesel passenger buses including 1 car travelling in 1 direction (/1 direction), 14 cars / 1 direction, 1 bus / 1 direction, 28 cars / 2 directions, 24 cars and 2 buses / 2 directions, and 20 cars and 4 buses / 2 directions. It was found that the total emissions produced during stopping on a red signal were significantly higher than when the traffic moved at a steady speed. Overall, total emissions due to the interruption of the traffic increased by a factor of 13, 11, 45, 11, 41, and 43 for the above 6 cases, respectively. In summary, this PhD thesis presents the results of a comprehensive study on particle number and mass concentration, together with particle size distribution, in a bus station transport microenvironment, influenced by bus flow rates, meteorological conditions and station design.
Passenger spatial-temporal exposure to bus emitted particles was also assessed according to waiting time and location along the platform, as well as the contribution of exposure at the bus station to overall daily exposure. Due to the complexity of the interrupted traffic flow within the transport microenvironments, a unique CLSE model was also developed, which is capable of quantifying emission levels at critical locations within the transport microenvironment, for the purpose of evaluating passenger exposure and conducting simulations of vehicle emission dispersion. The application of the CLSE model at a pedestrian crossing also proved its applicability and simplicity for use in a real-world transport microenvironment.
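The paired t-test used in this abstract to compare the two stations can be sketched as below. The concentration values are hypothetical and far fewer than the 33 pairs in the study; the sketch computes only the test statistic and degrees of freedom, since converting them to a p-value needs a t-distribution table or a statistics library.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """Return (t, degrees of freedom) for matched observations x[i], y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Hypothetical half-hourly concentrations matched by time of day and weather.
open_station = [1.0, 2.0, 3.0, 4.0]
canyon_station = [0.0, 2.0, 2.0, 5.0]
t, df = paired_t_statistic(open_station, canyon_station)
print(f"t = {t:.2f}, df = {df}")
```

With 33 pairs, as in the study, the test would have 32 degrees of freedom; a t value near zero then corresponds to a large p-value, consistent with the reported lack of a significant difference.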
