673 results for Predictive modelling
Abstract:
In a seminal data mining article, Leo Breiman [1] argued that to develop effective predictive classification and regression models, we need to move away from sole dependency on statistical algorithms and embrace a wider toolkit of modelling algorithms that includes data mining procedures. Nevertheless, many researchers still rely solely on statistical procedures when undertaking data modelling tasks; this sole reliance has led to the development of irrelevant theory and questionable research conclusions ([1], p. 199). We outline initiatives that the HPC & Research Support group is undertaking to engage researchers with data mining tools and techniques, including a new range of seminars, workshops, and one-on-one consultations covering data mining algorithms, the relationship between data mining and the research cycle, and the limitations and problems of these new algorithms. Organisational limitations and restrictions on these initiatives are also discussed.
Abstract:
Background The quantum increases in home Internet access and available online health information, with limited control over information quality, highlight the necessity of exploring the decision-making processes involved in accessing and using online information, specifically in relation to children, who do not make their own health decisions. Objectives To understand the processes explaining parents’ decisions to use online health information for child health care. Methods Parents (N = 391) completed an initial questionnaire assessing the theory of planned behaviour constructs of attitude, subjective norm, and perceived behavioural control, as well as perceived risk, group norm, and additional demographic factors. Two months later, 187 parents completed a follow-up questionnaire assessing their decisions to use online information for their child’s health care, specifically to (1) diagnose and/or treat their child’s suspected medical condition/illness and (2) increase understanding about a diagnosis or treatment recommended by a health professional. Results Hierarchical multiple regression showed that, for both behaviours, attitude, subjective norm, perceived behavioural control, (lower) perceived risk, group norm, and (non-)medical background were the significant predictors of intention. For parents’ use of online child health information, for both behaviours, intention was the sole significant predictor of behaviour. The findings explain 77% of the variance in parents’ intention to treat/diagnose a child health problem and 74% of the variance in their intention to increase their understanding about child health concerns. Conclusions Understanding the socio-cognitive processes that guide parents’ use of online information for child health care is important given the increase in Internet usage and the sometimes-questionable quality of health information provided online.
Findings highlight parents’ thirst for information; there is an urgent need for health professionals to provide parents with evidence-based child health websites in addition to general population education on how to evaluate the quality of online health information.
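The hierarchical regression described above can be sketched in miniature: predictors are entered in blocks, and the increase in R² from one block to the next quantifies the added constructs' unique contribution. This is a minimal illustration on simulated data, not the study's analysis; all variable names and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical stand-ins for the TPB constructs (simulated, standardised)
attitude = rng.normal(size=n)
subjective_norm = rng.normal(size=n)
pbc = rng.normal(size=n)  # perceived behavioural control
perceived_risk = rng.normal(size=n)

# Simulated intention driven by all four constructs plus noise
intention = (0.5 * attitude + 0.3 * subjective_norm
             + 0.4 * pbc - 0.2 * perceived_risk
             + rng.normal(scale=0.5, size=n))

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# Block 1: core TPB constructs only
r2_step1 = r_squared(np.column_stack([attitude, subjective_norm, pbc]),
                     intention)
# Block 2: add perceived risk; the rise in R^2 is its unique contribution
r2_step2 = r_squared(np.column_stack([attitude, subjective_norm, pbc,
                                      perceived_risk]), intention)

print(f"step 1 R^2 = {r2_step1:.3f}, step 2 R^2 = {r2_step2:.3f}")
```

Because the second block nests the first, R² cannot decrease; whether the increase is meaningful is what the hierarchical test assesses.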
Abstract:
The detection and correction of defects remains among the most time-consuming and expensive aspects of software development. Extensive automated testing and code inspections may mitigate their effect, but some code fragments are necessarily more likely to be faulty than others, and automated identification of fault-prone modules helps to focus testing and inspections, thus limiting wasted effort and potentially improving detection rates. However, software metrics data are often extremely noisy, with enormous imbalances between the sizes of the positive and negative classes. In this work, we present a new approach to predictive modelling of fault proneness in software modules, introducing a new feature representation to overcome some of these issues. This rank sum representation offers improved, or at worst comparable, performance relative to earlier approaches on standard data sets, and readily allows the user to choose an appropriate trade-off between precision and recall to optimise inspection effort for different testing environments. The method is evaluated using the NASA Metrics Data Program (MDP) data sets, and performance is compared with existing studies based on the Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers, and with our own comprehensive evaluation of these methods.
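The abstract does not detail the rank sum representation, but the general idea of rank-based features with a tunable threshold can be sketched as follows: replace each raw metric by its within-column rank, sum ranks per module, and sweep a decision threshold to trade precision against recall. The data, the 10% fault rate, and the thresholds are all illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical software metrics (e.g. LOC, complexity, fan-out);
# faulty modules are simulated to score higher on all of them.
faulty = rng.random(n) < 0.1
metrics = rng.normal(size=(n, 3)) + 1.5 * faulty[:, None]

# Rank-based representation: each metric becomes its within-column rank,
# and ranks are summed across metrics to give one score per module.
ranks = metrics.argsort(axis=0).argsort(axis=0)
score = ranks.sum(axis=1)

def precision_recall(threshold):
    """Flag modules whose rank-sum score meets the threshold."""
    predicted = score >= threshold
    tp = np.sum(predicted & faulty)
    precision = tp / max(predicted.sum(), 1)
    recall = tp / faulty.sum()
    return precision, recall

# A lower threshold flags more modules: recall rises, precision usually falls.
for t in (int(np.quantile(score, 0.5)), int(np.quantile(score, 0.9))):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

Sweeping the threshold in this way is what lets inspection effort be matched to a testing environment's tolerance for false positives versus missed faults.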
Abstract:
This research analyses the extent of damage to buildings in Brisbane, Ipswich and Grantham during the recent Eastern Australia flooding and explores the role that planning and design/construction regulations played in these failures. It highlights weaknesses in the current systems and proposes effective solutions to mitigate future damage and financial loss under current or future climates. 2010 and early 2011 saw major flooding throughout much of Eastern Australia. Queensland and Victoria were particularly hard hit, with insured losses in these states reaching $2.5 billion and many thousands of homes inundated. The Queensland cities of Brisbane and Ipswich were the worst affected; around two-thirds of all inundated properties/buildings were in these two areas. Other local government areas to record high levels of inundation were Central Highlands and Rockhampton Regional Councils in Queensland, and Buloke, Campaspe, Central Goldfields and Loddon in Victoria. Flash flooding was a problem in a number of Victorian councils, but the Lockyer Valley west of Ipswich suffered the most extensive damage, with 19 lives lost and more than 100 homes completely destroyed. In all, more than 28,000 properties were inundated in Queensland and around 2,500 buildings affected in Victoria. Of the residential properties affected in Brisbane, around 90% were in areas developed prior to the introduction of floodplain development controls, with many also suffering inundation during the 1974 floods. The project developed a predictive model for estimating flood loss and occupant displacement. This model can now be used for flood risk assessments or rapid assessment of impacts following a flood event.
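Flood loss models of the kind mentioned above are commonly built around stage-damage (depth-damage) curves, which map inundation depth to a fraction of building value lost. The following is a generic sketch of that standard technique; the curve points and dollar figures are purely illustrative and are not the project's actual model.

```python
import numpy as np

# Hypothetical stage-damage curve for residential buildings:
# flood depth above floor level (m) vs. fraction of building value lost.
depths = np.array([0.0, 0.5, 1.0, 2.0, 3.0])
damage_fraction = np.array([0.0, 0.2, 0.4, 0.7, 0.9])

def flood_loss(depth_m, building_value):
    """Interpolate the damage fraction at a given depth and scale by value."""
    frac = np.interp(depth_m, depths, damage_fraction)
    return frac * building_value

# Example: 1.5 m of over-floor flooding in a $400k dwelling
print(f"estimated loss: ${flood_loss(1.5, 400_000):,.0f}")
```

Summing such per-building estimates over a modelled inundation footprint is what turns a hydraulic flood scenario into a portfolio-level loss and displacement estimate.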
Abstract:
Predicting temporal responses of ecosystems to disturbances associated with industrial activities is critical for their management and conservation. However, prediction of ecosystem responses is challenging due to the complexity and potential non-linearities stemming from interactions between system components and multiple environmental drivers. Prediction is particularly difficult for marine ecosystems due to their often highly variable and complex natures and the large uncertainties surrounding their dynamic responses. Consequently, current management of such systems often relies on expert judgement and/or complex quantitative models that consider only a subset of the relevant ecological processes. Hence there exists an urgent need for the development of whole-of-systems predictive models to support decision and policy makers in managing complex marine systems in the context of industry-based disturbances. This paper presents Dynamic Bayesian Networks (DBNs) for predicting the temporal response of a marine ecosystem to anthropogenic disturbances. The DBN provides a visual representation of the problem domain in terms of factors (parts of the ecosystem) and their relationships. These relationships are quantified via Conditional Probability Tables (CPTs), which estimate the variability and uncertainty in the distribution of each factor. The combination of qualitative visual and quantitative elements in a DBN facilitates the integration of a wide array of data, published and expert knowledge, and other models. Such multiple sources are often essential, as a single source of information is rarely sufficient to cover the diverse range of factors relevant to a management task. Here, a DBN model is developed for tropical, annual Halophila and temperate, persistent Amphibolis seagrass meadows to inform dredging management and help meet environmental guidelines. Specifically, the impacts of capital (e.g.
maintaining channel depths in established ports) dredging are evaluated with respect to the risk of permanent loss, defined as no recovery within 5 years (Environmental Protection Agency guidelines). The model is developed using expert knowledge, existing literature, statistical models of environmental light, and experimental data. The model is then demonstrated in a case study through the analysis of a variety of dredging, environmental and seagrass ecosystem recovery scenarios. In spatial zones significantly affected by dredging, such as the zone of moderate impact, shoot density has a very high probability of being driven to zero by capital dredging due to the duration of such dredging. Here, fast-growing Halophila species can recover; however, the probability of recovery depends on the presence of seed banks. On the other hand, slow-growing Amphibolis meadows have a high probability of suffering permanent loss. However, in the maintenance dredging scenario, due to the shorter duration of dredging, Amphibolis is better able to resist the impacts of dredging. For both types of seagrass meadows, the probability of loss was strongly dependent on the biological and ecological status of the meadow, as well as environmental conditions post-dredging. The ability to predict the ecosystem response under cumulative, non-linear interactions across a complex ecosystem highlights the utility of DBNs for decision support and environmental management.
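The mechanics of rolling a DBN forward through time can be shown with a deliberately tiny example: a single two-state factor ("healthy"/"depleted" shoot density) with one transition CPT per condition, propagated month by month. All CPT numbers, state definitions, and scenario durations below are illustrative assumptions only; the actual model has many interacting factors.

```python
import numpy as np

# Two-state toy DBN slice for seagrass shoot density.
# Rows: previous state (healthy, depleted); columns: next state.
# Transition CPTs P(state_t | state_{t-1}) under each condition:
T_dredging = np.array([[0.60, 0.40],    # healthy -> healthy / depleted
                       [0.05, 0.95]])   # depleted -> healthy / depleted
T_recovery = np.array([[0.95, 0.05],
                       [0.30, 0.70]])

belief = np.array([1.0, 0.0])  # start fully healthy

# Roll the DBN forward: 6 months of dredging, then 24 months of recovery
for _ in range(6):
    belief = belief @ T_dredging
for _ in range(24):
    belief = belief @ T_recovery

print(f"P(healthy) after scenario = {belief[0]:.3f}")
```

Comparing the end-of-scenario marginals under different dredging durations is, in miniature, how capital versus maintenance dredging scenarios can be contrasted in such a network.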
Abstract:
The purpose of this research was to develop and test a multicausal model of the individual characteristics associated with academic success in first-year Australian university students. This model comprised the constructs of previous academic performance, achievement motivation, self-regulatory learning strategies, and personality traits, with end-of-semester grades as the dependent variable of interest. The study involved the distribution of a questionnaire, which assessed motivation, self-regulatory learning strategies and personality traits, to 1193 students at the start of their first year at university. Students' academic records were accessed at the end of their first year of study to ascertain their first and second semester grades. This study established that previous high academic performance, use of self-regulatory learning strategies, and being introverted and agreeable were indicators of academic success in the first semester of university study. Achievement motivation and the personality trait of conscientiousness were indirectly related to first semester grades, through the influence they had on the students' use of self-regulatory learning strategies. First semester grades were predictive of second semester grades. This research provides valuable information for both educators and students about the factors intrinsic to the individual that are associated with successful performance in the first year at university.
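The indirect relationship described above (e.g. conscientiousness acting on grades via strategy use) is the classic mediation structure, where the indirect effect is estimated as the product of two regression coefficients. The sketch below simulates such a path; the coefficients and sample are invented for illustration and are not the study's estimates.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500

# Simulated mediation chain: conscientiousness -> strategy use -> grades
conscientiousness = rng.normal(size=n)
strategies = 0.6 * conscientiousness + rng.normal(scale=0.8, size=n)
grades = 0.5 * strategies + rng.normal(scale=0.8, size=n)

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    X = np.column_stack([np.ones(n), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

a = slope(conscientiousness, strategies)  # path a: predictor -> mediator
# Path b: mediator -> outcome, controlling for the predictor
X = np.column_stack([np.ones(n), conscientiousness, strategies])
b = np.linalg.lstsq(X, grades, rcond=None)[0][2]

indirect = a * b  # product-of-coefficients indirect effect
print(f"indirect effect a*b = {indirect:.3f}")
```

With the simulated true paths of 0.6 and 0.5, the estimated indirect effect should land near 0.3, which is the sense in which a trait can predict grades without a direct path.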
Abstract:
Hypertrophic scars arise when there is an overproduction of collagen during wound healing. They are often associated with poor regulation of the rate of programmed cell death (apoptosis) of the cells synthesizing the collagen, or with an exuberant inflammatory response that prolongs collagen production and increases wound contraction. Severe contractures that occur, for example, after a deep burn can cause loss of function, especially if the wound is over a joint such as the elbow or knee. Recently, we have developed a morphoelastic mathematical model for dermal repair that incorporates the chemical, cellular and mechanical aspects of dermal wound healing. Using this model, we examine pathological scarring in dermal repair by first assuming a smaller than usual apoptotic rate for myofibroblasts, and then considering a prolonged inflammatory response, in an attempt to determine a possible optimal intervention strategy to promote normal repair or terminate the fibrotic scarring response. Our model predicts that in both cases it is best to apply the intervention strategy early in the wound healing response. Further, the earlier an intervention is made, the less aggressive the intervention required. Finally, if intervention is conducted late in healing, a significant intervention is required; however, there is a threshold concentration of the drug or therapy applied, above which minimal further improvement in wound repair is obtained.
Abstract:
The growth of solid tumours beyond a critical size is dependent upon angiogenesis, the formation of new blood vessels from an existing vasculature. Tumours may remain dormant at microscopic sizes for some years before switching to a mode in which growth of a supportive vasculature is initiated. The new blood vessels supply nutrients, oxygen, and access to routes by which tumour cells may travel to other sites within the host (metastasize). In recent decades an abundance of biological research has focused on tumour-induced angiogenesis in the hope that treatments targeted at the vasculature may result in a stabilisation or regression of the disease: a tantalizing prospect. The complex and fascinating process of angiogenesis has also attracted the interest of researchers in the field of mathematical biology, a discipline that is, for mathematics, relatively new. The challenge in mathematical biology is to produce a model that captures the essential elements and critical dependencies of a biological system. Such a model may ultimately be used as a predictive tool. In this thesis we examine a number of aspects of tumour-induced angiogenesis, focusing on growth of the neovasculature external to the tumour. Firstly we present a one-dimensional continuum model of tumour-induced angiogenesis in which elements of the immune system or other tumour-cytotoxins are delivered via the newly formed vessels. This model, based on observations from experiments by Judah Folkman et al., is able to show regression of the tumour for some parameter regimes. The modelling highlights a number of interesting aspects of the process that may be characterised further in the laboratory. The next model we present examines the initiation positions of blood vessel sprouts on an existing vessel, in a two-dimensional domain. 
This model hypothesises that a simple feedback inhibition mechanism may describe the spacing of these sprouts, with the inhibitor being produced by breakdown of the existing vessel's basement membrane. Finally, we have developed a stochastic model of blood vessel growth and anastomosis in three dimensions. The model has been implemented in C++, includes an OpenGL interface, and uses a novel algorithm for calculating the proximity of the line segments representing a growing vessel. This choice of programming language and graphics interface allows for near-simultaneous calculation and visualisation of blood vessel networks using a contemporary personal computer. In addition, the visualised results may be transformed interactively, and drop-down menus facilitate changes in the parameter values. Visualisation of results is of vital importance in the communication of mathematical information to a wide audience, and we aim to incorporate this philosophy in the thesis. As biological research further uncovers the intriguing processes involved in tumour-induced angiogenesis, we conclude with a comment from mathematical biologist Jim Murray: "Mathematical biology is … the most exciting modern application of mathematics."
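The abstract does not specify the thesis's proximity algorithm, but the underlying geometric primitive it must rely on, the minimum distance between two 3D line segments, is standard (see e.g. Ericson's closest-point formulation) and can be sketched as follows. Detecting anastomosis then amounts to checking whether this distance falls below a vessel-radius threshold.

```python
import numpy as np

def segment_distance(p1, p2, q1, q2, eps=1e-12):
    """Minimum distance between 3D segments p1-p2 and q1-q2.

    Closest points are parameterised as p1 + s*d1 and q1 + t*d2 with
    s, t clamped to [0, 1] so they stay on the segments.
    """
    d1, d2 = p2 - p1, q2 - q1
    r = p1 - q1
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a <= eps and e <= eps:                    # both segments are points
        return float(np.linalg.norm(r))
    if a <= eps:                                 # first segment is a point
        s, t = 0.0, np.clip(f / e, 0.0, 1.0)
    else:
        c = d1 @ r
        if e <= eps:                             # second segment is a point
            s, t = np.clip(-c / a, 0.0, 1.0), 0.0
        else:                                    # general case
            b = d1 @ d2
            denom = a * e - b * b                # zero iff segments parallel
            s = np.clip((b * f - c * e) / denom, 0.0, 1.0) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
            elif t > 1.0:
                t, s = 1.0, np.clip((b - c) / a, 0.0, 1.0)
    return float(np.linalg.norm((p1 + s * d1) - (q1 + t * d2)))

# Parallel unit segments one unit apart:
d = segment_distance(np.array([0., 0., 0.]), np.array([1., 0., 0.]),
                     np.array([0., 1., 0.]), np.array([1., 1., 0.]))
print(d)  # → 1.0
```

For a growing network, a spatial hash or grid over segment bounding boxes would normally be layered on top so that only nearby segment pairs are tested.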
Abstract:
Aims: This paper describes the development of a risk adjustment (RA) model predictive of individual lesion treatment failure in percutaneous coronary interventions (PCI), for use in a quality monitoring and improvement program. Methods and results: Prospectively collected data for 3972 consecutive revascularisation procedures (5601 lesions) performed between January 2003 and September 2011 were studied. Data on procedures to September 2009 (n = 3100) were used to identify factors predictive of lesion treatment failure. Factors identified included lesion risk class (p < 0.001), occlusion type (p < 0.001), patient age (p = 0.001), vessel system (p < 0.04), vessel diameter (p < 0.001), unstable angina (p = 0.003) and presence of major cardiac risk factors (p = 0.01). A Bayesian RA model was built using these factors, with the predictive performance of the model tested on the remaining procedures (area under the receiver operating characteristic curve: 0.765, Hosmer–Lemeshow p value: 0.11). Cumulative sum, exponentially weighted moving average and funnel plots were constructed using the RA model and subjectively evaluated. Conclusion: An RA model was developed and applied to statistical process control (SPC) monitoring of lesion failure in a PCI database. If linked to appropriate quality improvement governance response protocols, SPC using this RA tool might improve quality control and risk management by identifying variation in performance based on a comparison of observed and expected outcomes.
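One common way to combine a risk adjustment model with cumulative sum monitoring is an observed-minus-expected CUSUM: the chart accumulates the gap between actual failures and the model's predicted failure probabilities, resetting at zero. The sketch below uses simulated risks and outcomes and an uncalibrated control limit; it illustrates the chart mechanics only, not the paper's specific monitoring scheme.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Hypothetical per-lesion failure risks standing in for the Bayesian RA
# model's predictions, with simulated observed outcomes drawn from them.
expected = rng.uniform(0.02, 0.15, size=n)
observed = (rng.random(n) < expected).astype(float)

# Observed-minus-expected CUSUM: accumulates (observed - expected) and
# resets at zero, so it drifts upward only when failures occur more
# often than the risk model predicts.
c, chart = 0.0, []
for o, e in zip(observed, expected):
    c = max(0.0, c + (o - e))
    chart.append(c)

signal = max(chart) > 2.0  # hypothetical control limit, not a calibrated one
print(f"peak CUSUM = {max(chart):.2f}, signal = {signal}")
```

Because each step subtracts the case's expected risk rather than a fixed target, a run of failures in high-risk lesions moves the chart far less than the same run in low-risk lesions, which is the point of risk adjustment.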
Abstract:
Cell migration is a behaviour critical to many key biological processes, including wound healing, cancerous cell invasion and morphogenesis, the development of an organism from an embryo. However, given that each of these situations is distinctly different and cells are extremely complicated biological objects, interest lies in more basic experiments which seek to remove conflating factors and present a less complex environment within which cell migration can be experimentally examined. These include in vitro studies like the scratch assay or circle migration assay, and ex vivo studies like the colonisation of the hindgut by neural crest cells. The reduced complexity of these experiments also makes them much more enticing as problems to model mathematically, as is done here. The primary goal of the mathematical models used in this thesis is to shed light on which cellular behaviours work to generate the travelling waves of invasion observed in these experiments, and to explore how variations in these behaviours can potentially predict differences in this invasive pattern which are experimentally observed when the cell type or chemical environment is changed. The relevant literature has already identified the difficulty of distinguishing between these behaviours when using traditional mathematical biology techniques operating on a macroscopic scale, and so here a sophisticated individual-cell-level model, an extension of the Cellular Potts Model (CPM), has been constructed and used to model a scratch assay experiment. This model includes a novel mechanism for dealing with cell proliferation that allows the differing properties of quiescent and proliferative cells to be incorporated into their behaviour. This model is considered both for its predictive power and is used to make comparisons with the travelling waves which result from more traditional macroscopic simulations.
These comparisons demonstrate a surprising amount of agreement between the two modelling frameworks, and suggest further novel modifications to the CPM that would allow it to better model cell migration. Considerations of the model’s behaviour are used to argue that the dominant effect governing cell migration (random motility or signal-driven taxis) likely depends on the sort of invasion demonstrated by cells, as easily seen by microscopic photography. Additionally, a scratch assay simulated on a non-homogeneous domain consisting of a ‘fast’ and a ‘slow’ region is also used to further differentiate between these different potential cell motility behaviours. A heterogeneous domain is a novel situation which has not been considered mathematically in this context, nor has it been constructed experimentally to the best of the candidate’s knowledge. Thus this problem serves as a thought experiment used to test the conclusions arising from the simulations on homogeneous domains, and to suggest what might be observed should this non-homogeneous assay situation be experimentally realised. Non-intuitive cell invasion patterns are predicted for diffusely-invading cells which respond to a cell-consumed signal or nutrient, contrasted with rather expected behaviour in the case of random-motility-driven invasion. The potential experimental observation of these behaviours is demonstrated by the individual-cell-level model used in this thesis, which agrees with the PDE model in predicting these unexpected invasion patterns. In the interest of examining such a case of a non-homogeneous domain experimentally, some brief suggestion is made as to how this could be achieved.
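The core loop of a Cellular Potts Model is a Metropolis scheme on a lattice of cell IDs: an energy combining adhesion and an area constraint is evaluated, and a random attempt to copy one site's ID into a neighbour is accepted or rejected by the energy change. The toy version below (one cell, tiny grid, invented parameters) shows only that basic machinery, not the thesis's extended model or its proliferation mechanism.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy CPM: a 6x6 lattice of cell IDs (0 = medium) with one 2x2 cell.
J, LAM, AREA_TARGET, TEMP = 2.0, 1.0, 4, 1.0
grid = np.zeros((6, 6), dtype=int)
grid[2:4, 2:4] = 1

def energy(g):
    """Adhesion energy at unlike-neighbour interfaces + area constraint."""
    e = J * np.sum(g[:, :-1] != g[:, 1:])   # vertical interfaces
    e += J * np.sum(g[:-1, :] != g[1:, :])  # horizontal interfaces
    e += LAM * (np.sum(g == 1) - AREA_TARGET) ** 2
    return e

def metropolis_step(g):
    """Attempt to copy a random interior site's ID into a random neighbour."""
    x, y = rng.integers(1, g.shape[0] - 1, size=2)
    dx, dy = [(0, 1), (0, -1), (1, 0), (-1, 0)][rng.integers(4)]
    trial = g.copy()
    trial[x + dx, y + dy] = g[x, y]
    dE = energy(trial) - energy(g)
    if dE <= 0 or rng.random() < np.exp(-dE / TEMP):
        return trial  # accept the copy
    return g          # reject it

for _ in range(200):
    grid = metropolis_step(grid)
print("cell area after 200 steps:", np.sum(grid == 1))
```

Real CPM simulations add per-cell-type adhesion matrices, many cells, and extra Hamiltonian terms (e.g. chemotaxis), but they all reduce to this accept-or-reject copy attempt.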
Abstract:
Keeping exotic plant pests out of our country relies on good border control or quarantine. However, with increasing globalization and mobilization, some things slip through. Then the back-up systems become important. These can include an expensive form of surveillance that purposively targets particular pests. A much wider net is provided by general surveillance, which is assimilated into everyday activities, like farmers checking the health of their crops. In fact, farmers and even home gardeners have provided a front-line warning system for some pests (e.g. the European wasp) that could otherwise have wreaked havoc. Mathematics is used to model how surveillance works in various situations. Within this virtual world we can play with various surveillance and management strategies to "see" how they would work, or how to make them work better. One of our greatest challenges is estimating some of the input parameters: because the pest hasn't been here before, it's hard to predict how it might behave: establishing, spreading, and what types of symptoms it might express. So we rely on experts to help us with this. This talk will look at the mathematical, psychological and logical challenges of helping experts to quantify what they think. We show how the subjective Bayesian approach is useful for capturing expert uncertainty, ultimately providing a more complete picture of what they think... and what they don't!
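A common subjective Bayesian device for turning an expert's opinion into a usable input is to encode it as a Beta prior: the expert supplies a most-likely value (the mode) and a confidence expressed as an "effective sample size", which fix the Beta parameters. The numbers below are purely illustrative, not an elicitation from this talk.

```python
# Sketch: eliciting a Beta prior for a pest's establishment probability.
def elicit_beta(mode, effective_n):
    """Beta(alpha, beta) with the given mode and prior weight effective_n."""
    alpha = mode * effective_n + 1
    beta = (1 - mode) * effective_n + 1
    return alpha, beta

# Expert: "most likely ~20% chance of establishment, and I'm about as sure
# as if I'd watched 10 comparable incursions" (illustrative numbers).
a, b = elicit_beta(0.2, 10)
prior_mean = a / (a + b)

# Conjugate update with hypothetical surveillance data:
# 5 establishments observed in 12 incursion events.
post_a, post_b = a + 5, b + 7
post_mean = post_a / (post_a + post_b)
print(f"prior mean {prior_mean:.3f}, posterior mean {post_mean:.3f}")
```

Because the Beta is conjugate to binomial data, later field observations update the expert's prior by simple addition, making the expert's influence (and its erosion by data) explicit.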
Abstract:
Sustainability is a key driver for decisions in the management and future development of industries. The World Commission on Environment and Development (WCED, 1987) outlined imperatives which need to be met for environmental, economic and social sustainability. Development of strategies for measuring and improving sustainability in and across these domains, however, has been hindered by intense debate, with advocates for one approach fearing that efforts by advocates of another could have unintended adverse impacts. Studies attempting to compare the sustainability performance of countries and industries have also found ratings of performance quite variable depending on the sustainability indices used. Quantifying and comparing the sustainability of industries across the triple bottom line of economy, environment and social impact continues to be problematic. Using the Australian dairy industry as a case study, a Sustainability Scorecard, developed as a Bayesian network model, is proposed as an adaptable tool to enable informed assessment, dialogue and negotiation of strategies at a global level, as well as being suitable for developing local solutions.
Abstract:
An important aspect of decision support systems involves applying sophisticated and flexible statistical models to real datasets and communicating the results to decision makers in interpretable ways. An important class of problem is the modelling of incidence, such as fire or disease. Models of incidence known as point processes or Cox processes are particularly challenging, as they are ‘doubly stochastic’: obtaining the probability mass function of incidents requires two integrals to be evaluated. Existing approaches to the problem either use simple models that obtain predictions from plug-in point estimates and do not distinguish between Cox processes and density estimation, but do use sophisticated 3D visualization for interpretation; alternatively, other work employs sophisticated non-parametric Bayesian Cox process models but does not use visualization to render complex spatio-temporal forecasts interpretable. The contribution here is to fill this gap by inferring predictive distributions of log-Gaussian Cox processes and rendering them using state-of-the-art 3D visualization techniques. This requires performing inference on an approximation of the model on a discretized grid of large scale, and adapting an existing spatial-diurnal kernel to the log-Gaussian Cox process context.
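The "doubly stochastic" structure of a log-Gaussian Cox process on a discretized grid can be shown generatively in a few lines: first draw a log-intensity surface from a Gaussian process, then draw Poisson counts per cell from the exponentiated surface. This 1-D forward-simulation sketch (with an assumed squared-exponential kernel and invented hyperparameters) illustrates the model class only, not the paper's inference or visualization.

```python
import numpy as np

rng = np.random.default_rng(4)

m = 50                                  # grid cells on [0, 1]
x = np.linspace(0, 1, m)

# Layer 1: Gaussian process draw for the log-intensity
# (squared-exponential covariance, length scale 0.1; jitter for stability)
cov = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)
log_lam = rng.multivariate_normal(np.full(m, 1.0), cov + 1e-8 * np.eye(m))

# Layer 2: Poisson counts per cell, intensity integrated over cell width
counts = rng.poisson(np.exp(log_lam) * (1.0 / m))
print("total incidents:", counts.sum())
```

Inference runs this generative story in reverse, recovering a posterior over `log_lam` from observed counts; on large grids that is exactly where approximations over the discretization become necessary.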
Abstract:
Multivariate predictive models are widely used tools for assessment of aquatic ecosystem health and models have been successfully developed for the prediction and assessment of aquatic macroinvertebrates, diatoms, local stream habitat features and fish. We evaluated the ability of a modelling method based on the River InVertebrate Prediction and Classification System (RIVPACS) to accurately predict freshwater fish assemblage composition and assess aquatic ecosystem health in rivers and streams of south-eastern Queensland, Australia. The predictive model was developed, validated and tested in a region of comparatively high environmental variability due to the unpredictable nature of rainfall and river discharge. The model was concluded to provide sufficiently accurate and precise predictions of species composition and was sensitive enough to distinguish test sites impacted by several common types of human disturbance (particularly impacts associated with catchment land use and associated local riparian, in-stream habitat and water quality degradation). The total number of fish species available for prediction was low in comparison to similar applications of multivariate predictive models based on other indicator groups, yet the accuracy and precision of our model was comparable to outcomes from such studies. In addition, our model developed for sites sampled on one occasion and in one season only (winter), was able to accurately predict fish assemblage composition at sites sampled during other seasons and years, provided that they were not subject to unusually extreme environmental conditions (e.g. extended periods of low flow that restricted fish movement or resulted in habitat desiccation and local fish extinctions).
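RIVPACS-type assessments typically summarise model output as an observed/expected (O/E) index: expected richness is the sum of the model's predicted occurrence probabilities at a site, and O/E near 1 indicates reference condition. The probabilities below are hypothetical model outputs, not values from this study.

```python
import numpy as np

# Hypothetical per-species predictions for one test site
predicted_prob = np.array([0.9, 0.8, 0.7, 0.6, 0.3])   # P(species present)
observed_present = np.array([1, 1, 0, 1, 0])           # survey outcome (0/1)

# Restrict to species with P >= 0.5, a common convention in
# RIVPACS-type assessments, then form the O/E ratio.
taxa = predicted_prob >= 0.5
expected = predicted_prob[taxa].sum()
observed = observed_present[taxa].sum()
oe = observed / expected
print(f"O/E = {oe:.2f}")
```

With few available species, as for the fish assemblages here, each missing taxon moves O/E by a large step, which is why comparable precision to invertebrate-based models is a notable outcome.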