315 results for Data-driven modelling
Abstract:
Travel time prediction has long been a topic of transportation research, but most prediction models in the literature are limited to motorways. Travel time prediction on arterial networks is challenging because of traffic signals and the significant variability of individual vehicle travel times. The limited availability of traffic data from arterial networks makes travel time prediction even more challenging. Recently, there has been significant interest in exploiting Bluetooth data for travel time estimation. This research analysed real travel time data collected by the Brisbane City Council using Bluetooth technology on arterials. Databases, including experienced average daily travel time, are created and classified for approximately 8 months. Thereafter, based on data characteristics, Seasonal Auto Regressive Integrated Moving Average (SARIMA) modelling is applied to the database for short-term travel time prediction. The SARIMA model not only takes the previous continuous lags into account, but also uses the values from the same time of previous days for travel time prediction. This is carried out by defining a seasonality coefficient, which improves the accuracy of travel time prediction in linear models. The accuracy, robustness and transferability of the model are evaluated by comparing the real and predicted values at three sites within the Brisbane network. The results contain the detailed validation for different prediction horizons (5 minutes to 90 minutes). The model performance is evaluated mainly on congested periods and compared to the naive technique of considering the historical average.
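The idea of combining recent lags with the value from the same time on a previous day can be illustrated with a much simpler model than SARIMA. The sketch below is not the authors' model: it fits a plain linear regression with one previous lag and one seasonal lag by least squares, and the function names and the toy series are assumptions made for illustration.

```python
import numpy as np


def fit_seasonal_ar(series, season):
    """Fit y[t] = c + a * y[t-1] + b * y[t-season] by ordinary least squares.

    A simplified stand-in for a seasonal model: one previous lag plus the
    value observed one season (for example, one day) earlier.
    """
    y = np.asarray(series, dtype=float)
    target = y[season:]                      # y[t] for t = season .. n-1
    design = np.column_stack([
        np.ones(len(target)),                # intercept c
        y[season - 1:-1],                    # previous lag y[t-1]
        y[:-season],                         # seasonal lag y[t-season]
    ])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_next(series, season, coef):
    """One-step-ahead prediction from the fitted coefficients."""
    c, a, b = coef
    return c + a * series[-1] + b * series[-season]


# Hypothetical travel times (seconds) repeating with a period of 4 intervals.
pattern = [1.0, 3.0, 2.0, 5.0]
history = pattern * 6
coef = fit_seasonal_ar(history, season=4)
forecast = predict_next(history, season=4, coef=coef)
```

On a perfectly periodic series such as this one, the fit puts all its weight on the seasonal lag, so the forecast repeats the value from one season earlier.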
Abstract:
A 3-year longitudinal study, Transforming Children’s Mathematical and Scientific Development, integrates, through data modelling, a pedagogical approach focused on mathematical patterns and structural relationships with learning in science. As part of this study, a purposive sample of 21 highly able Grade 1 students was engaged in an innovative data modelling program. In the majority of students, representational development was observed. Their complex graphs depicting categorical and continuous data revealed a high level of structure and enabled identification of structural features critical to this development.
Abstract:
Background The expansion of cell colonies is driven by a delicate balance of several mechanisms including cell motility, cell-to-cell adhesion and cell proliferation. New approaches that can be used to independently identify and quantify the role of each mechanism will help us understand how each mechanism contributes to the expansion process. Standard mathematical modelling approaches to describe such cell colony expansion typically neglect cell-to-cell adhesion, despite the fact that cell-to-cell adhesion is thought to play an important role. Results We use a combined experimental and mathematical modelling approach to determine the cell diffusivity, D, cell-to-cell adhesion strength, q, and cell proliferation rate, λ, in an expanding colony of MM127 melanoma cells. Using a circular barrier assay, we extract several types of experimental data and use a mathematical model to independently estimate D, q and λ. In our first set of experiments, we suppress cell proliferation and analyse three different types of data to estimate D and q. We find that standard types of data, such as the area enclosed by the leading edge of the expanding colony and more detailed cell density profiles throughout the expanding colony, do not provide sufficient information to uniquely identify D and q. We find that additional data relating to the degree of cell-to-cell clustering are required to provide independent estimates of q, and in turn D. In our second set of experiments, where proliferation is not suppressed, we use data describing temporal changes in cell density to determine the cell proliferation rate. In summary, we find that our experiments are best described using the range D = 161–243 µm² hour⁻¹, q = 0.3–0.5 (low to moderate strength) and λ = 0.0305–0.0398 hour⁻¹, and with these parameters we can accurately predict the temporal variations in the spatial extent and cell density profile throughout the expanding melanoma cell colony.
Conclusions Our systematic approach to identify the cell diffusivity, cell-to-cell adhesion strength and cell proliferation rate highlights the importance of integrating multiple types of data to accurately quantify the factors influencing the spatial expansion of melanoma cell colonies.
Abstract:
Digital human modeling (DHM) systems have undergone significant development in recent years. They have become steadily more important in ergonomic workplace design, product development, product usability, ergonomic research, ergonomic education, audiovisual marketing and the entertainment industry. They help to design ergonomic products as well as healthy and safe socio-technical work systems. In the domain of scientific DHM systems, no industry-specific standard interfaces are defined that could facilitate the exchange of 3D solid body data, anthropometric data or motion data. The focus of this article is to provide an overview of requirements for reliable data exchange between different DHM systems in order to identify suitable file formats. Examples from the literature are discussed in detail. Methods: As a first step, a literature review is conducted on existing studies and file formats for exchanging data between different DHM systems. The file formats identified can be grouped into four categories: static 3D solid body data exchange, anthropometric data exchange, motion data exchange and comprehensive data exchange. Each file format is discussed, and its advantages and disadvantages in the DHM context are pointed out. Case studies are also presented that show first approaches to exchanging data between DHM systems. Lessons learnt are briefly summarized. Results: A selection of suitable file formats for data exchange between DHM systems is determined from the literature review.
Abstract:
This study considered the problem of predicting survival, based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single “best” model, where “best” is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerges as having the highest probability given the data, as indicated by the goodness of fit measure, the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as “best”, suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival. Keywords: Bayesian modelling; Bayesian model averaging; Cure model; Markov Chain Monte Carlo; Mixture model; Survival analysis; Weibull distribution
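One common way to turn BIC values into model weights for averaging is the approximation w_k ∝ exp(−BIC_k / 2) under equal prior model probabilities. The abstract does not say how the study computed its model probabilities, so the sketch below is an assumption-laden illustration, and the BIC values and survival predictions in it are invented.

```python
import math


def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for every model and uses the standard
    approximation w_k proportional to exp(-BIC_k / 2). Subtracting the
    smallest BIC first keeps the exponentials numerically stable.
    """
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def bma_average(predictions, weights):
    """Weighted average of the per-model predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical BICs for a single Weibull, a Weibull mixture and a cure model.
weights = bma_weights([1002.0, 1000.0, 1001.0])

# Hypothetical 5-year survival probabilities predicted by each model.
averaged = bma_average([0.60, 0.70, 0.65], weights)
```

When one BIC is much lower than the others, its weight approaches 1 and the average collapses to that model's prediction, which mirrors the large-sample behaviour described in the abstract.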
Abstract:
The study of the relationship between macroscopic traffic parameters, such as flow, speed and travel time, is essential to the understanding of the behaviour of freeway and arterial roads. However, the temporal dynamics of these parameters are difficult to model, especially for arterial roads, where the process of traffic change is driven by a variety of variables. The introduction of Bluetooth technology into the transportation area has proven exceptionally useful for monitoring vehicular traffic, as it allows reliable estimation of travel times and traffic demands. In this work, we propose an approach based on Bayesian networks for analyzing and predicting the complex dynamics of flow or volume, based on travel time observations from Bluetooth sensors. The spatio-temporal relationship between volume and travel time is captured through a first-order transition model and a univariate Gaussian sensor model. The two models are trained and tested on travel time and volume data from an arterial link, collected over a period of six days. To reduce the computational costs of the inference tasks, volume is converted into a discrete variable. The discretization process is carried out through a Self-Organizing Map. Preliminary results show that a simple Bayesian network can effectively estimate and predict the complex temporal dynamics of arterial volumes from the travel time data. Not only is the model well suited to producing posterior distributions over single past, current and future states, but it also allows joint distributions over sequences of states to be estimated. Furthermore, the Bayesian network can achieve excellent prediction even when the stream of travel time observations is partially incomplete.
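A first-order transition model over discrete volume states combined with a Gaussian sensor model for travel time has the structure of a hidden Markov model, so the current-state posterior can be computed with forward filtering. The sketch below is a generic illustration rather than the authors' implementation: the two states, the transition matrix and the Gaussian parameters are invented, and skipping the update step for a missing observation is an assumption about how incomplete data might be handled.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


def forward_filter(observations, prior, transition, means, stds):
    """Posterior over the discrete state after each observation.

    transition[i, j] is P(next state = j | current state = i).
    means[k] and stds[k] describe the travel time observed in state k.
    An observation of None is treated as missing: the belief is propagated
    through the transition model without a sensor update.
    """
    belief = np.asarray(prior, dtype=float)
    transition = np.asarray(transition, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    history = []
    for obs in observations:
        belief = belief @ transition                 # predict step
        if obs is not None:
            belief = belief * gaussian_pdf(obs, means, stds)  # update step
        belief = belief / belief.sum()               # normalise
        history.append(belief)
    return np.array(history)


# Two hypothetical volume states: 0 = low volume, 1 = high volume.
posteriors = forward_filter(
    observations=[None, 60.0, 118.0],    # travel times in seconds, one missing
    prior=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.2, 0.8]],
    means=[60.0, 120.0],
    stds=[10.0, 10.0],
)
```

Each row of `posteriors` is a distribution over the volume states at one time step. Predicting future states amounts to applying further transition steps with no observations.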
Abstract:
This chapter addresses data modelling as a means of promoting statistical literacy in the early grades. Consideration is first given to the importance of increasing young children’s exposure to statistical reasoning experiences and how data modelling can be a rich means of doing so. Selected components of data modelling are then reviewed, followed by a report on some findings from the third year of a three-year longitudinal study across grades one through three.
Abstract:
Presentation by Dr Caroline Grant, Science & Engineering Faculty, IHBI, at the Managing Your Research Data seminar, 2012
Abstract:
Cancer is the leading contributor to the disease burden in Australia. This thesis develops and applies Bayesian hierarchical models to facilitate an investigation of the spatial and temporal associations for cancer diagnosis and survival among Queenslanders. The key objectives are to document and quantify the importance of spatial inequalities, explore factors influencing these inequalities, and investigate how spatial inequalities change over time. Existing Bayesian hierarchical models are refined, new models and methods developed, and tangible benefits obtained for cancer patients in Queensland. The versatility of Bayesian models in cancer control is clearly demonstrated through these detailed and comprehensive analyses.
Abstract:
As for other complex diseases, linkage analyses of schizophrenia (SZ) have produced evidence for numerous chromosomal regions, with inconsistent results reported across studies. The presence of locus heterogeneity appears likely and may reduce the power of linkage analyses if homogeneity is assumed. In addition, when multiple heterogeneous datasets are pooled, inter-sample variation in the proportion of linked families (alpha) may diminish the power of the pooled sample to detect susceptibility loci, in spite of the larger sample size obtained. We compare the significance of linkage findings obtained using allele-sharing LOD scores (LOD(exp)), which assume homogeneity, and heterogeneity LOD scores (HLOD) in European American and African American NIMH SZ families. We also pool these two samples and evaluate the relative power of the LOD(exp) and two different heterogeneity statistics. One of these (HLOD-P) estimates the heterogeneity parameter alpha only in aggregate data, while the second (HLOD-S) determines alpha separately for each sample. In separate and combined data, we show consistently improved performance of HLOD scores over LOD(exp). Notably, genome-wide significant evidence for linkage is obtained at chromosome 10p in the European American sample using a recessive HLOD score. When the two samples are combined, linkage at the 10p locus also achieves genome-wide significance under HLOD-S, but not HLOD-P. Using HLOD-S, improved evidence for linkage was also obtained for a previously reported region on chromosome 15q. In linkage analyses of complex disease, power may be maximised by routinely modelling locus heterogeneity within individual datasets, even when multiple datasets are combined to form larger samples.
Abstract:
A central tenet in the theory of reliability modelling is the quantification of the probability of asset failure. In general, reliability depends on asset age and the maintenance policy applied. Usually, failure and maintenance times are the primary inputs to reliability models. However, for many organisations, different aspects of these data are often recorded in different databases (e.g. work order notifications, event logs, condition monitoring data, and process control data). These recorded data cannot be interpreted individually, since they typically do not have all the information necessary to ascertain failure and preventive maintenance times. This paper presents a methodology for the extraction of failure and preventive maintenance times using commonly-available, real-world data sources. A text-mining approach is employed to extract keywords indicative of the source of the maintenance event. Using these keywords, a Naïve Bayes classifier is then applied to attribute each machine stoppage to one of two classes: failure or preventive. The accuracy of the algorithm is assessed and the classified failure time data are then presented. The applicability of the methodology is demonstrated on a maintenance data set from an Australian electricity company.
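The classification step can be illustrated with a small multinomial Naïve Bayes classifier over keywords. This is a minimal sketch, not the paper's algorithm: the keywords, the training records and the use of Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count keyword occurrences per class from labelled keyword lists."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def classify(words, model):
    """Return the class with the highest log posterior for the keywords.

    Uses Laplace (add-one) smoothing and ignores keywords that never
    appeared in the training data.
    """
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocabulary:
                continue
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical keywords extracted from work order text.
training_docs = [
    ["tripped", "fault", "repair"],
    ["broken", "leak", "repair"],
    ["scheduled", "inspection", "service"],
    ["routine", "service", "lubrication"],
]
training_labels = ["failure", "failure", "preventive", "preventive"]

model = train_naive_bayes(training_docs, training_labels)
stoppage_class = classify(["fault", "leak"], model)
```

Here `stoppage_class` is "failure", because both keywords appear only in the failure records.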