151 results for spatiotemporal epidemic prediction model
Abstract:
Real-world data mining applications generally do not end with the creation of models. Using the model is the final purpose, especially in prediction tasks. A problem arises when the model is built on much more information than the user can provide when using it. As a result, the model's performance drops drastically because of the many missing attribute values. This paper develops a new learning system framework, called the User Query Based Learning System (UQBLS), for building data mining models best suited to users' needs. We demonstrate its deployment in a real-world application: lifetime prediction of metallic components in buildings.
Abstract:
This paper deals with the problem of using data mining models in real-world situations where the user cannot provide all the inputs with which the predictive model was built. A learning system framework, the Query Based Learning System (QBLS), is developed to improve the performance of predictive models in practice, where not all inputs are available when querying the system. An automatic feature selection algorithm, Query Based Feature Selection (QBFS), is developed to select features that balance a relatively minimal feature subset against relatively maximal classification accuracy. The performance of the QBLS system and the QBFS algorithm is successfully demonstrated on a real-world application.
Abstract:
Prevailing methods for predicting highway truck conveyance configuration in over-limit freight research transfer the goods in the over-limit portion to another fully loaded truck of the same configuration and increase the truck traffic volume accordingly. To address this shortcoming, a new way is presented to obtain the cumulative probability function of truck power tonnage in the base year from a load mass spectrum survey of highway trucks classified by wheel and axle type. Logit models were used to forecast overall highway freight diversion and single-cargo tonnage diversion when the weight rules and the strictness of overload enforcement change in the scheme year. Under the assumption that the probability distribution of single-truck loadage is consistent with the probability distribution of single goods freighted, the model describes the future truck conveyance configuration under strict over-limit prohibition. The model was used and tested in the World Bank Highway Over-limit Research Project in Anhui.
Abstract:
Modern Engineering Asset Management (EAM) requires the accurate assessment of current asset health condition and the prediction of its future condition. Appropriate mathematical models capable of estimating times to failure and future failure probabilities are essential in EAM. In most real-life situations, the lifetime of an engineering asset is influenced and/or indicated by different factors, termed covariates. Hazard prediction with covariates is an elemental notion in reliability theory: it estimates the tendency of an engineering asset to fail instantaneously beyond the current time, given that it has survived up to the current time. A number of statistical covariate-based hazard models have been developed. However, none of them explicitly incorporates both external and internal covariates into one model. This paper introduces a novel covariate-based hazard model to address this concern, named the Explicit Hazard Model (EHM). Both the semi-parametric and non-parametric forms of this model are presented in the paper. The major purpose of this paper is to illustrate the theoretical development of EHM. Due to page limitations, a case study with reliability field data is presented in the applications part of this study.
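For readers unfamiliar with the hazard notion the abstract describes, the standard textbook definitions can be written as follows. This is a sketch of general reliability theory, not the EHM itself: the symbols T (failure time), R(t) (reliability), h_0 (baseline hazard), z (a time-fixed covariate vector) and beta (coefficients) are assumptions introduced here, and the proportional hazards form is shown only as one well-known example of a covariate-based hazard model.

```latex
% Hazard: instantaneous failure tendency at time t, given survival up to t.
h(t) = \lim_{\Delta t \to 0^{+}} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
R(t) = P(T > t) = \exp\!\left(-\int_{0}^{t} h(u)\,du\right)

% Illustrative covariate-based form (proportional hazards), not the EHM:
h(t \mid z) = h_{0}(t)\,\exp\!\left(\beta^{\top} z\right)
```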
Abstract:
Hazard and reliability prediction of engineering assets is one of the significant fields of research in Engineering Asset Health Management (EAHM). In real-life situations where an engineering asset operates under dynamic operational and environmental conditions, its lifetime can be influenced and/or indicated by different factors, termed covariates. The Explicit Hazard Model (EHM), a covariate-based hazard model, is a new approach to hazard prediction that explicitly incorporates both internal and external covariates into one model. EHM is appropriate for analysing lifetime data in the presence of both internal and external covariates in the reliability field. This paper presents applications of the methodology introduced and illustrated in the theory part of this study. The semi-parametric EHM is applied to a case study to predict the hazard and reliability of resistance elements on a Resistance Corrosion Sensor Board (RCSB).
Abstract:
In condition-based maintenance (CBM), effective diagnostics and prognostics are essential tools for maintenance engineers to identify imminent faults and to predict the remaining useful life before components finally fail. This enables remedial actions to be taken in advance and production to be rescheduled if necessary. This paper presents a technique for accurate assessment of the remnant life of machines based on historical failure knowledge embedded in a closed-loop diagnostic and prognostic system. The technique uses the Support Vector Machine (SVM) classifier for both fault diagnosis and evaluation of the health stages of machine degradation. To validate the feasibility of the proposed model, data at five different levels for four typical faults from High Pressure Liquefied Natural Gas (HP-LNG) pumps were used for multi-class fault diagnosis. In addition, two sets of impeller-rub data were analysed and employed to predict the remnant life of the pump based on estimation of its health state. The results obtained were very encouraging and showed that the proposed prognosis system has the potential to be used as an estimation tool for machine remnant life prediction in real-life industrial applications.
Abstract:
Event-specific scales commonly have greater power than generalized scales in predicting specific disorders and in testing mediator models for predicting such disorders. Therefore, in a preliminary study, a 6-item Alcohol Helplessness Scale was constructed and found to be reliable for a sample of 98 problem drinkers. Hierarchical multiple regression and its derivative, path analysis, were used to test whether helplessness and self-efficacy moderate or mediate the link between alcohol dependence and depression. A moderation model was not supported, whereas a mediation model was supported. Helplessness and self-efficacy both significantly and independently mediated between alcohol dependence and depression. Nevertheless, a significant direct effect of alcohol dependence on depression also remained.
Abstract:
Tested a social–cognitive model of depressive episodes and their treatment within a predictive study of treatment response. 42 clinically depressed volunteers (aged 22–60 yrs) were given self-efficacy (SE) questionnaires and other measures before and after treatment with cognitive therapy. Results support the idea that SE and skills regarding control of negative cognition mediate a sustained response to cognitive treatment for depression. Not only did mood-control variables correlate highly with concurrent changes in depression scores during treatment, but the posttreatment SE measure discriminated Ss who relapsed over the next 12 mo.
Abstract:
Background: The seasonality of suicide has long been recognised. However, little is known about the relative importance of socio-environmental factors in the occurrence of suicide in different geographical areas. This study examined the association of climate, socioeconomic and demographic factors with suicide in Queensland, Australia, using a spatiotemporal approach. Methods: Seasonal data on suicide, demographic variables and socioeconomic indexes for areas in each Local Government Area (LGA) between 1999 and 2003 were acquired from the Australian Bureau of Statistics. Climate data were supplied by the Australian Bureau of Meteorology. A multivariable generalized estimating equation model was used to examine the impact of socio-environmental factors on suicide. Results: The preliminary data analyses show that far north Queensland had the highest suicide incidence (e.g., Cook and Mornington Shires), while the south-western areas had the lowest incidence (e.g., Barcoo and Bauhinia Shires) in all seasons. Maximum temperature, unemployment rate, the proportion of Indigenous population and the proportion of population with low individual income were statistically significantly and positively associated with suicide. There were weaker, non-significant associations for other variables. Conclusions: Maximum temperature, the proportion of Indigenous population and unemployment rate appeared to be major determinants of suicide at an LGA level in Queensland.
Abstract:
Pipelines play an important role in modern society. Pipeline failures can have great impacts on the economy, the environment and the community. Preventive maintenance (PM) is often conducted to improve the reliability of pipelines. Modern asset management practice requires accurate prediction of the reliability of pipelines subject to multiple PM actions, especially when these PM actions involve imperfect repairs. To address this issue, a split system approach (SSA) based model is developed in this paper through an industrial case study. This new model enables maintenance personnel to predict the reliability of pipelines under different PM strategies and hence effectively assists them in making optimal PM decisions.
Abstract:
An adaptive agent improves its performance by learning from experience. This paper describes an approach to adaptation based on modelling dynamic elements of the environment in order to make predictions of likely future state. This approach is akin to an elite sports player being able to “read the play”, allowing decisions to be made based on predictions of likely future outcomes. Modelling of the agent's likely future state is performed using Markov Chains and a technique called “Motion and Occupancy Grids”. The experiments in this paper compare the performance of the planning system with and without the use of this predictive model. The results of the study demonstrate a surprising decrease in performance when using the predictions of agent occupancy. The results are derived from statistical analysis of the agent's performance in a high-fidelity simulation of a world-leading real robot soccer team.
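As a rough illustration of predicting a likely future state with a Markov chain, the sketch below propagates a probability distribution over grid cells forward in time. It is not the paper's "Motion and Occupancy Grids" technique: the three-cell corridor, the transition probabilities and the function names `step` and `predict` are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from typing import List

Matrix = List[List[float]]


def step(dist: List[float], transition: Matrix) -> List[float]:
    """Advance a probability distribution over states by one time step."""
    n = len(dist)
    # New probability of state j = sum over i of P(in i now) * P(i -> j).
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def predict(dist: List[float], transition: Matrix, steps: int) -> List[float]:
    """Predict the state distribution after a number of time steps."""
    for _ in range(steps):
        dist = step(dist, transition)
    return dist


# Hypothetical 3-cell corridor: the agent either stays put or moves one cell
# to the right with equal probability; the last cell is absorbing.
TRANSITION = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]

start = [1.0, 0.0, 0.0]  # the agent is known to be in cell 0 now
occupancy_after_2 = predict(start, TRANSITION, 2)
print(occupancy_after_2)  # predicted occupancy of each cell two steps ahead
```

A planner could use such a predicted occupancy to avoid or target cells, although the abstract reports that doing so decreased performance in the authors' experiments.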
Abstract:
The study described in this paper developed a model of animal movement that explicitly recognised each individual as the central unit of measure. The model was developed by learning from a real dataset that measured and calculated, for individual cows in a herd, their linear and angular positions and their directional and angular speeds. Two learning algorithms were implemented: a Hidden Markov model (HMM) and a long-term prediction algorithm. It is shown that an HMM can be used to describe the animal's movement and state transition behaviour within several “stay” areas where cows remained for long periods. Model parameters were estimated for hidden behaviour states such as relocating, foraging and bedding. For cows' movement between the “stay” areas, a long-term prediction algorithm was implemented. By combining these two algorithms it was possible to develop a successful model, which achieved results similar to the animal behaviour data collected. This modelling methodology could easily be applied to interactions of other animal species.
Abstract:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). 
Because of the unbalanced class distribution in the data, majority class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used to evaluate models generated using different prediction algorithms. The performance of models derived from feature-reduced datasets reveals the filter method, Cfs subset evaluation, to be most consistently effective, although Consistency-derived subsets tended to slightly increase accuracy but markedly increase complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This influence could be eliminated by considering the AUC or Kappa statistic, as well as by evaluating subsets with an under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection reduces MR to between 9.8 and 10.16, with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all datasets based on time-series only, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets, but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-cross validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36, with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09.
The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E), and on derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_G), perform the least well, with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature reduced, anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input.
In the absence of risk factor input, the use of time-series variables after outlier removal and time series variables based on physiological variable values’ being outside the accepted normal range is associated with some improvement in model performance.
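The abstract's point that misclassification rate (MR) is influenced by class distribution, while the Kappa statistic corrects for it, can be illustrated with a small sketch. The confusion-matrix counts below are hypothetical and are not taken from the study; the function names are likewise illustrative.

```python
def misclassification_rate(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of cases the classifier labels incorrectly."""
    return (fp + fn) / (tp + fp + fn + tn)


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa: agreement with the true labels beyond chance."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Chance agreement computed from the row and column marginals.
    chance_pos = ((tp + fn) / total) * ((tp + fp) / total)
    chance_neg = ((tn + fp) / total) * ((tn + fn) / total)
    expected = chance_pos + chance_neg
    if expected == 1.0:  # degenerate case: only one class present anywhere
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical unbalanced data: 1000 cases, 100 of them positive.
# A "one leaf" model that always predicts the majority (negative) class:
always_negative = dict(tp=0, fp=0, fn=100, tn=900)
# A model that actually separates the classes to some degree:
informative = dict(tp=60, fp=40, fn=40, tn=860)

for name, cm in [("always negative", always_negative), ("informative", informative)]:
    print(name, misclassification_rate(**cm), round(cohen_kappa(**cm), 3))
```

In this made-up example the trivial model has an MR equal to the minority class share yet a Kappa of zero, which is why MR alone can look acceptable on unbalanced data.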
Abstract:
There has been a worldwide trend to increase axle loads and train speeds. This means that railway track degradation will be accelerated, and track maintenance costs will be increased significantly. There is a need to investigate the consequences of increasing traffic load. The aim of the research is to develop a model for the analysis of physical degradation of railway tracks in response to changes in traffic parameters, especially increased axle loads and train speeds. This research has developed an integrated track degradation model (ITDM) by integrating several models into a comprehensive framework. Mechanistic relationships for track degradation have been used wherever possible in each of the models contained in ITDM. This overcomes the deficiency of the traditional statistical track models which rely heavily on historical degradation data, which is generally not available in many railway systems. In addition, statistical models lack the flexibility of incorporating future changes in traffic patterns or maintenance practices. The research starts with reviewing railway track related studies both in Australia and overseas to develop a comprehensive understanding of track performance under various traffic conditions. Existing railway related models are then examined for their suitability for track degradation analysis for Australian situations. The ITDM model is subsequently developed by modifying suitable existing models, and developing new models where necessary. The ITDM model contains four interrelated submodels for rails, sleepers, ballast and subgrade, and track modulus. The rail submodel is for rail wear analysis and is developed from a theoretical concept. The sleeper submodel is for timber sleeper damage prediction. The submodel is developed by modifying and extending an existing model developed elsewhere. The submodel has also incorporated an analysis for the likelihood of concrete sleeper cracking.
The ballast and subgrade submodel is evolved from a concept developed in the USA. Substantial modifications and improvements have been made. The track modulus submodel is developed from a conceptual method. Corrections for more global track conditions have been made. The integration of these submodels into one comprehensive package has enabled the interaction between individual track components to be taken into account. This is done by calculating wheel load distribution with time and updating track conditions periodically in the process of track degradation simulation. A Windows-based computer program associated with ITDM has also been developed. The program enables the user to carry out analysis of degradation of individual track components and to investigate the interrelationships between these track components and their deterioration. The successful implementation of this research has provided essential information for prediction of increased maintenance as a consequence of railway track degradation. The model, having been presented at various conferences and seminars, has attracted wide interest. It is anticipated that the model will be put into practical use among Australian railways, enabling track maintenance planning to be optimized and potentially saving Australian railway systems millions of dollars in operating costs.
Abstract:
Statistical modeling of traffic crashes has been of interest to researchers for decades. Over the most recent decade many crash models have accounted for extra-variation in crash counts—variation over and above that accounted for by the Poisson density. The extra-variation – or dispersion – is theorized to capture unaccounted for variation in crashes across sites. The majority of studies have assumed fixed dispersion parameters in over-dispersed crash models—tantamount to assuming that unaccounted for variation is proportional to the expected crash count. Miaou and Lord [Miaou, S.P., Lord, D., 2003. Modeling traffic crash-flow relationships for intersections: dispersion parameter, functional form, and Bayes versus empirical Bayes methods. Transport. Res. Rec. 1840, 31–40] challenged the fixed dispersion parameter assumption, and examined various dispersion parameter relationships when modeling urban signalized intersection accidents in Toronto. They suggested that further work is needed to determine the appropriateness of the findings for rural as well as other intersection types, to corroborate their findings, and to explore alternative dispersion functions. This study builds upon the work of Miaou and Lord, with exploration of additional dispersion functions, the use of an independent data set, and presents an opportunity to corroborate their findings. Data from Georgia are used in this study. A Bayesian modeling approach with non-informative priors is adopted, using sampling-based estimation via Markov Chain Monte Carlo (MCMC) and the Gibbs sampler. A total of eight model specifications were developed; four of them employed traffic flows as explanatory factors in mean structure while the remainder of them included geometric factors in addition to major and minor road traffic flows. The models were compared and contrasted using the significance of coefficients, standard deviance, chi-square goodness-of-fit, and deviance information criteria (DIC) statistics. 
The findings indicate that the modeling of the dispersion parameter, which essentially explains the extra-variance structure, depends greatly on how the mean structure is modeled. In the presence of a well-defined mean function, the extra-variance structure generally becomes insignificant, i.e. the variance structure is a simple function of the mean. It appears that extra-variation is a function of covariates when the mean structure (expected crash count) is poorly specified and suffers from omitted variables. In contrast, when sufficient explanatory variables are used to model the mean (expected crash count), extra-Poisson variation is not significantly related to these variables. If these results are generalizable, they suggest that model specification may be improved by testing extra-variation functions for significance. They also suggest that known influences on expected crash counts are likely to differ from factors that might help to explain unaccounted-for variation in crashes across sites.
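For orientation, the distinction between a fixed and a covariate-dependent dispersion parameter can be sketched with the common negative binomial (Poisson-gamma) parameterization. This is an assumption made for illustration: the abstract does not state the exact parameterization or dispersion functions used in the study, and the symbols \mu_i (expected crash count at site i), \alpha (dispersion parameter), z_i (site covariates) and \gamma (coefficients) are introduced here.

```latex
% Poisson: the variance equals the mean (no extra-variation).
\operatorname{Var}(Y_i) = \mu_i

% Negative binomial with a fixed dispersion parameter \alpha:
\operatorname{Var}(Y_i) = \mu_i + \alpha\,\mu_i^{2}

% Varying dispersion, here a hypothetical log-linear function of covariates:
\alpha_i = \exp\!\left(\gamma^{\top} z_i\right),
\qquad
\operatorname{Var}(Y_i) = \mu_i + \alpha_i\,\mu_i^{2}
```

Under this sketch, testing whether the components of \gamma other than the intercept differ significantly from zero corresponds to testing whether extra-variation depends on site covariates.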