933 results for Data Models
Abstract:
Habitat models are widely used in ecology; however, there are relatively few studies of rare species, primarily because of a paucity of survey records and the lack of robust means of assessing the accuracy of modelled spatial predictions. We investigated the potential of compiled ecological data for developing habitat models for Macadamia integrifolia, a vulnerable mid-stratum tree endemic to lowland subtropical rainforests of southeast Queensland, Australia. We compared the performance of two binomial models, Classification and Regression Trees (CART) and Generalised Additive Models (GAM), with Maximum Entropy (MAXENT) models developed (i) from presence records and available absence data and (ii) from presence records and background data. The GAM model was the best performer across the range of evaluation measures employed; however, all models were assessed as potentially useful for informing in situ conservation of M. integrifolia. A significant loss of M. integrifolia habitat has occurred (p < 0.05), with only 37% of former (pre-clearing) habitat remaining in 2003. Remnant patches are significantly smaller, have larger edge-to-area ratios and are more isolated from each other than in pre-clearing configurations (p < 0.05). Whilst the network of suitable habitat patches is still largely intact, numerous smaller patches are more isolated in the contemporary landscape than they were before clearing. These results suggest that in situ conservation of M. integrifolia may be best achieved through a landscape approach that considers the relative contribution of small remnant habitat fragments to the species as a whole, as well as their role in facilitating connectivity across the entire network of habitat patches.
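As a rough, hypothetical illustration of comparing presence/absence classifiers with a threshold-independent evaluation measure (not the authors' actual pipeline or data), the following Python sketch fits a CART and a spline-based, GAM-like logistic model to synthetic predictors and compares them by AUC:

```python
# Minimal sketch (not the authors' pipeline): comparing a CART and a GAM-like
# presence/absence model by AUC on hypothetical, synthetic predictors.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import SplineTransformer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # e.g. rainfall, temperature, elevation (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=500) > 0.5).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "CART": DecisionTreeClassifier(max_depth=4, random_state=0),
    # Spline features + logistic regression as a rough GAM-like additive smooth
    "GAM-like": make_pipeline(SplineTransformer(n_knots=5), LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```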
Abstract:
Longitudinal data, in which data are repeatedly observed or measured over time or age, provide the foundation for the analysis of processes that evolve over time; such analyses can be referred to as growth or trajectory models. One traditional way of specifying growth models is to employ either linear or polynomial functional forms to model trajectory shape, and to account for variation around an overall mean trend by including random effects, or individual variation, on the functional shape parameters. The identification of distinct subgroups or sub-classes (latent classes) within these trajectory models, which are not based on some pre-existing individual classification, provides an important methodology with substantive implications. The identification of subgroups or classes has wide application in the medical arena, where responder/non-responder identification based on distinctly differing trajectories delivers further information for clinical processes. This thesis develops Bayesian statistical models and techniques for the identification of subgroups in the analysis of longitudinal data where the number of time intervals is limited. These models are then applied to a single case study, which investigates neuropsychological cognition for early stage breast cancer patients undergoing adjuvant chemotherapy treatment, from the Cognition in Breast Cancer Study undertaken by the Wesley Research Institute of Brisbane, Queensland. Alternative formulations to the linear or polynomial approach are taken, using piecewise linear models with a single turning point, change-point or knot at a known time point, and latent basis models for the non-linear trajectories found for the verbal memory domain of cognitive function before and after chemotherapy treatment. Hierarchical Bayesian random effects models are used as a starting point for the latent class modelling process and are extended by incorporating covariates in the trajectory profiles and as predictors of class membership. The Bayesian latent basis models enable the degree of recovery post-chemotherapy to be estimated for short- and long-term follow-up occasions, and the distinct class trajectories assist in identifying breast cancer patients who may be at risk of long-term verbal memory impairment.
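A minimal frequentist analogue of the piecewise linear trajectory idea (a single known knot with subject-level random intercepts) can be sketched as follows; the subjects, time points, knot location and variable names are hypothetical, and the thesis's models are Bayesian and considerably richer:

```python
# Illustrative sketch only: a piecewise-linear (single known knot) random-intercept
# model fitted to synthetic longitudinal data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_subj, times, knot = 40, np.array([0.0, 1.0, 2.0, 6.0]), 1.0   # knot at a known time point
rows = []
for s in range(n_subj):
    u = rng.normal(scale=1.0)                      # subject-level random intercept
    for t in times:
        pre, post = t, max(t - knot, 0.0)          # piecewise-linear basis with one knot
        score = 50 + u - 2.0 * pre + 1.5 * post + rng.normal(scale=0.8)
        rows.append({"subject": s, "time": t, "pre": pre, "post": post, "score": score})
df = pd.DataFrame(rows)

X = sm.add_constant(df[["pre", "post"]])           # slope before knot, change in slope after knot
fit = sm.MixedLM(df["score"], X, groups=df["subject"]).fit()
print(fit.summary())
```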
Abstract:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of the zeroes recorded. These may represent a zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts, and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for the parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy. This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty in estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea that is present, though not fully developed, in the literature, and propose the integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer.
A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time.
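For illustration only, the sketch below applies generic path sampling in the spirit of Gelman & Meng (1998) to a toy two-parameter autologistic (Ising-type) binary MRF, estimating a log normalization-constant ratio by integrating Monte Carlo means of the pairwise canonical statistic along a path in the interaction parameter; the lattice size, parameter values and sampler settings are arbitrary, and this is not the thesis's IMCS implementation:

```python
# Toy path-sampling sketch for a binary MRF (assumptions noted in the lead-in):
# d/d(beta) log Z(beta) = E_beta[pairwise statistic], so the log NC ratio is the
# integral of that expectation along a path in beta, estimated here by Gibbs
# sampling at grid points and trapezoidal integration.
import numpy as np

def gibbs_sample(alpha, beta, shape=(16, 16), sweeps=60, rng=None):
    """Return one lattice sample after a fixed number of Gibbs sweeps (toroidal lattice)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = rng.integers(0, 2, size=shape)
    n, m = shape
    for _ in range(sweeps):
        for i in range(n):
            for j in range(m):
                nb = (x[(i - 1) % n, j] + x[(i + 1) % n, j]
                      + x[i, (j - 1) % m] + x[i, (j + 1) % m])
                p1 = 1.0 / (1.0 + np.exp(-(alpha + beta * nb)))
                x[i, j] = 1 if rng.random() < p1 else 0
    return x

def pair_statistic(x):
    """Canonical statistic for beta: each neighbouring pair product counted once."""
    return np.sum(x * np.roll(x, 1, axis=0)) + np.sum(x * np.roll(x, 1, axis=1))

def log_nc_ratio(alpha, beta0, beta1, n_grid=6, n_samples=8, rng=None):
    """Estimate log Z(beta1) - log Z(beta0) by integrating the mean canonical statistic."""
    rng = rng if rng is not None else np.random.default_rng(0)
    betas = np.linspace(beta0, beta1, n_grid)
    means = np.array([
        np.mean([pair_statistic(gibbs_sample(alpha, b, rng=rng)) for _ in range(n_samples)])
        for b in betas
    ])
    # Trapezoidal rule over the beta grid
    return float(np.sum((means[:-1] + means[1:]) / 2.0 * np.diff(betas)))

print(log_nc_ratio(alpha=-0.5, beta0=0.0, beta1=0.3))
```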
Abstract:
Acoustic sensors play an important role in augmenting the traditional biodiversity monitoring activities carried out by ecologists and conservation biologists. With this capability, however, comes the burden of analysing large volumes of complex acoustic data. Given this complexity, fully automated analysis for a wide range of species is still a significant challenge. This research investigates the use of citizen scientists to analyse large volumes of environmental acoustic data in order to identify bird species. Specifically, it investigates ways in which the efficiency of a user can be improved through species identification tools and reputation models that predict the accuracy of users with unidentified skill levels. Initial experimental results are reported.
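One simple, hypothetical way to realise such a reputation model is a Beta-Bernoulli update of each user's labelling accuracy from expert-verified outcomes, as in the sketch below (not necessarily the model used in this research):

```python
# Minimal, hypothetical reputation model: track each citizen scientist's tagging
# accuracy with Beta-Bernoulli pseudo-counts updated from verified outcomes.
from dataclasses import dataclass

@dataclass
class Reputation:
    correct: int = 1        # Beta prior pseudo-counts (uniform prior: 1, 1)
    incorrect: int = 1

    def update(self, was_correct: bool) -> None:
        if was_correct:
            self.correct += 1
        else:
            self.incorrect += 1

    @property
    def expected_accuracy(self) -> float:
        # Posterior mean of the user's probability of a correct species tag
        return self.correct / (self.correct + self.incorrect)

user = Reputation()
for outcome in [True, True, False, True]:      # expert-verified outcomes (hypothetical)
    user.update(outcome)
print(f"predicted accuracy: {user.expected_accuracy:.2f}")   # 0.67
```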
Abstract:
With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information in these documents. Data mining techniques are used to derive this information. Mining of XML documents is affected by the document model adopted, owing to the semi-structured nature of these documents. Hence, in this chapter we present an overview of the various models of XML documents, how these models are used for mining, and some of the issues and challenges they raise. In addition, the chapter provides some insights into future models of XML documents for effectively capturing their two important features, namely structure and content, for mining.
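As a small illustration of capturing both structure and content for mining, the sketch below walks a hypothetical XML document and extracts element-path counts (structure) and text values (content):

```python
# Illustrative sketch: extracting structure (element paths) and content (text)
# features from a hypothetical XML document for downstream mining.
import xml.etree.ElementTree as ET
from collections import Counter

xml_doc = """
<library>
  <book genre="data-mining"><title>Mining XML</title><year>2010</year></book>
  <book genre="databases"><title>Semi-structured Data</title><year>2008</year></book>
</library>
"""

def paths_and_text(elem, prefix=""):
    """Yield (path, text) pairs for every element in the tree."""
    path = f"{prefix}/{elem.tag}"
    yield path, (elem.text or "").strip()
    for child in elem:
        yield from paths_and_text(child, path)

root = ET.fromstring(xml_doc)
pairs = list(paths_and_text(root))
structure_features = Counter(path for path, _ in pairs)     # for structure mining
content_features = [text for _, text in pairs if text]      # for content mining
print(structure_features)
print(content_features)
```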
Abstract:
Finite element (FE) analysis is an effective method to study the strength and predict the fracture risk of endodontically-treated teeth. This paper presents a rapid method developed to generate a comprehensive tooth FE model using data retrieved from micro-computed tomography (μCT). With this method, the inhomogeneity of the material properties of teeth was included in the model without dividing the tooth model into different regions. The material properties of the tooth were assumed to be related to the mineral density. The fracture risk at different tooth portions was assessed for root canal treatments. The micro-CT images of a tooth were processed by a MATLAB program and the CT numbers were retrieved. The tooth contours were obtained by thresholding segmentation using Amira. The inner and outer surfaces of the tooth were imported into SolidWorks and a three-dimensional (3D) tooth model was constructed. An assembly of the tooth model with the periodontal ligament (PDL) layer and surrounding bone was imported into ABAQUS. The material properties of the tooth were calculated from the retrieved CT numbers via ABAQUS user subroutines. Three root canal geometries (original and two enlargements) were investigated. The proposed method can generate detailed 3D finite element models of a tooth with different root canal enlargements and filling materials, and would be very useful for assessing the fracture risk at different tooth portions after root canal treatment.
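A hedged sketch of the density-based property assignment is given below; the linear CT-number-to-density calibration and the power-law density-to-modulus relation are illustrative assumptions only (in practice such a mapping would sit inside an ABAQUS user subroutine, as the paper describes):

```python
# Hedged sketch: assigning element-wise material properties from CT numbers.
# The calibration constants below are illustrative assumptions, not the paper's.
import numpy as np

ct_numbers = np.array([800.0, 1500.0, 2200.0])   # hypothetical CT numbers for three elements

def ct_to_density(ct, a=0.0, b=0.001):
    """Assumed linear calibration: mineral density (g/cm^3) from CT number."""
    return a + b * ct

def density_to_modulus(rho, c=8.0, p=2.0):
    """Assumed power law E = c * rho**p, in GPa."""
    return c * rho ** p

rho = ct_to_density(ct_numbers)
E = density_to_modulus(rho)
for ct, r, e in zip(ct_numbers, rho, E):
    print(f"CT {ct:7.1f} -> density {r:.2f} g/cm^3 -> E {e:5.1f} GPa")
```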
Abstract:
Confusion exists as to the age of the Abor Volcanics of NE India. Some consider the unit to have been emplaced in the Early Permian, others in the Early Eocene, a difference of ∼230 million years. The divergence in opinion is significant because fundamentally different models explaining the geotectonic evolution of India depend on the age designation of the unit. Paleomagnetic data reported here from several exposures in the type locality of the formation in the lower Siang Valley indicate that steeply dipping primary magnetizations (mean inclination = 72.7 ± 6.2°, equating to a paleolatitude of 58.1°) are recorded in the formation. These are only consistent with the unit being of Permian age, possibly Artinskian based on a magnetostratigraphic argument. Plate tectonic models for this time consistently show the NE corner of the sub-continent at >50°S; in the Early Eocene it was just north of the equator, which would have resulted in the unit recording shallow directions. The mean declination is rotated counter-clockwise by ∼94°, around half of which can be related to the motion of the Indian block; the remainder is likely due to local Himalayan-age thrusting in the Eastern Syntaxis. Several workers have correlated the Abor Volcanics with broadly coeval mafic volcanic suites in Oman, NE Pakistan–NW India and southern Tibet–Nepal, which developed in response to the Cimmerian block peeling off eastern Gondwana in the Early–Middle Permian, but we believe there are problems with this model. Instead, we suggest that the Abor basalts relate to India–Antarctica/India–Australia extension that was occurring at about the same time. Such an explanation best accommodates the relevant stratigraphical and structural data (the present-day position within the Himalayan thrust stack), as well as the plate tectonic model for Permian eastern Gondwana.
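The quoted paleolatitude follows from the standard geocentric axial dipole relation tan I = 2 tan λ, which can be checked directly:

```python
# Quick check of the geocentric axial dipole relation tan(I) = 2 tan(lambda)
# linking the reported mean inclination to the quoted paleolatitude.
import math

inclination = 72.7                                   # mean inclination, degrees
paleolat = math.degrees(math.atan(math.tan(math.radians(inclination)) / 2.0))
print(f"paleolatitude ≈ {paleolat:.1f}°")            # ≈ 58.1°, matching the abstract
```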
Abstract:
Management of groundwater systems requires realistic conceptual hydrogeological models as a framework for numerical simulation modelling, but also for system understanding and for communicating this understanding to stakeholders and the broader community. To help address these needs we developed GVS (Groundwater Visualisation System), a stand-alone desktop software package that uses interactive 3D visualisation and animation techniques. The goal was a user-friendly groundwater management tool that could support a range of existing real-world and pre-processed data, both surface and subsurface, including geology and various types of temporal hydrological information. GVS allows these data to be integrated into a single conceptual hydrogeological model. In addition, 3D geological models produced externally using other software packages can readily be imported into GVS models, as can outputs of simulations (e.g. piezometric surfaces) produced by software such as MODFLOW or FEFLOW. Boreholes can be integrated, showing any down-hole data and properties, including screen information, intersected geology, water level data and water chemistry. Animation is used to display spatial and temporal changes, with time-series data such as rainfall, standing water levels and electrical conductivity used to display dynamic processes. Time and space variations can be presented using a range of contouring and colour mapping techniques, in addition to interactive plots of time-series parameters. Other types of data, for example demographics and cultural information, can also be readily incorporated. The GVS software can execute on a standard Windows or Linux-based PC with a minimum of 2 GB RAM, and the model output is easy and inexpensive to distribute, by download or via USB/DVD/CD. Example models are described here for three groundwater systems in Queensland, northeastern Australia: two unconfined alluvial groundwater systems with intensive irrigation, the Lockyer Valley and the upper Condamine Valley, and the Surat Basin, a large sedimentary basin of confined artesian aquifers. This latter example required more detail in the hydrostratigraphy, correlation of formations with drillholes and visualisation of simulated piezometric surfaces. Both alluvial system GVS models were developed during drought conditions to support government strategies to implement groundwater management. The Surat Basin model was industry-sponsored research for coal seam gas groundwater management and community information and consultation. The "virtual" groundwater systems in these 3D GVS models can be interactively interrogated using standard functions, plus production of 2D cross-sections, data selection from the 3D scene, a back-end database and plot displays. A unique feature is that GVS allows investigation of time-series data across different display modes, both 2D and 3D. GVS has been used successfully as a tool to enhance community/stakeholder understanding and knowledge of groundwater systems and is of value for training and educational purposes. Completed projects confirm that GVS provides powerful support for management and decision making, and serves as a tool for interpreting groundwater system hydrological processes. A highly effective visualisation output is the production of short videos (e.g. 2–5 min) based on sequences of camera 'fly-throughs' and screen images. Further work involves developing support for multi-screen displays and touch-screen technologies, distributed rendering, and gestural interaction systems.
To highlight the visualisation and animation capability of the GVS software, links to related multimedia hosted online sites are included in the references.
Abstract:
This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. Literature-based discovery has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986. The successful application of distributional models in automating the identification of the indirect associations underpinning literature-based discovery has been amply demonstrated in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks including predicting future disruptive innovations. In this paper we perform a computational complexity analysis on four successful corpus-based distributional models to evaluate their fit for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.
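The efficiency argument for fixed-dimension representations can be illustrated with a toy random-indexing-style model, in which every term's context vector has constant size regardless of vocabulary growth; the corpus and parameters below are purely illustrative and not the four models analysed in the paper:

```python
# Toy sketch of a fixed-dimension distributional model: context vectors stay a
# constant size, unlike a full co-occurrence matrix that grows with the vocabulary.
import numpy as np

DIM, SEEDS = 300, 10          # fixed dimensionality; nonzero entries per index vector
corpus = [["fish", "oil", "raynaud", "disease"], ["fish", "oil", "blood", "viscosity"]]

def index_vector(word):
    """Sparse ternary random index vector, derived deterministically from the word."""
    r = np.random.default_rng(abs(hash(word)) % (2 ** 32))
    v = np.zeros(DIM)
    pos = r.choice(DIM, size=SEEDS, replace=False)
    v[pos] = r.choice([-1.0, 1.0], size=SEEDS)
    return v

context = {}                  # word -> fixed-size context vector
for sentence in corpus:
    for i, w in enumerate(sentence):
        vec = context.setdefault(w, np.zeros(DIM))
        for j, other in enumerate(sentence):
            if i != j:
                vec += index_vector(other)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Indirect association via shared contexts (fish, oil), in the spirit of Swanson's example
print(cosine(context["raynaud"], context["viscosity"]))
```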
Abstract:
The use of hedonic models to estimate the effects of various factors on house prices is well established. This paper examines a number of international hedonic house price models that seek to quantify the effect of infrastructure charges on new house prices. This work is an important contribution to the housing affordability debate, with many governments in high-growth areas operating user-pays infrastructure charging policies in tandem with housing affordability objectives, yet with no empirical evidence on the impact of one on the other. This research finds there is little consistency between existing models and the data sets utilised. Specification appears dependent upon data availability rather than sound theoretical grounding, which may lead to a lack of external validity.
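A typical hedonic specification of the kind surveyed, with an infrastructure-charge regressor added to structural attributes, might look like the following synthetic-data sketch (variable names and coefficients are illustrative only):

```python
# Illustrative hedonic regression: log new-house price on structural attributes
# plus an infrastructure-charge variable, fitted by ordinary least squares.
import numpy as np

rng = np.random.default_rng(2)
n = 300
bedrooms = rng.integers(2, 6, n)
land_m2 = rng.normal(600, 120, n)
infra_charge = rng.normal(25_000, 5_000, n)       # per-dwelling infrastructure charge ($)
log_price = (12.5 + 0.08 * bedrooms + 0.0004 * land_m2
             + 1.2e-5 * infra_charge + rng.normal(0, 0.05, n))

X = np.column_stack([np.ones(n), bedrooms, land_m2, infra_charge])
beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)
# beta[3] is the estimated semi-elasticity of price with respect to the charge
print(dict(zip(["const", "bedrooms", "land_m2", "infra_charge"], beta.round(6))))
```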
Abstract:
The motion response of marine structures in waves can be studied using finite-dimensional linear time-invariant approximating models. These models, obtained using system identification with data computed by hydrodynamic codes, find application in offshore training simulators, hardware-in-the-loop simulators for positioning control testing, and initial designs of wave-energy conversion devices. Different proposals have appeared in the literature to address the identification problem in both the time and frequency domains, and recent work has highlighted the superiority of the frequency-domain methods. This paper summarises practical frequency-domain estimation algorithms that use constraints on model structure and parameters to refine the search for approximating parametric models. Practical issues associated with the identification are discussed, including the influence of radiation model accuracy on force-to-motion models, which are usually the ultimate modelling objective. The illustrative examples in the paper are obtained using a freely available MATLAB toolbox developed by the authors, which implements the estimation algorithms described.
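As a simple frequency-domain estimation example in the same spirit (though not the authors' MATLAB toolbox), the sketch below fits a rational transfer function to frequency-response data using Levy's linearized least-squares method:

```python
# Basic frequency-domain fit of a rational transfer function H(jw) ~ B(jw)/A(jw)
# by Levy's linearized least squares; a generic sketch, not the authors' algorithms.
import numpy as np

def fit_rational(w, H, nb, na):
    """Fit H(jw) ~ B(jw)/A(jw), with A monic of degree na and B of degree nb (nb <= na)."""
    s = 1j * w
    # Unknowns: a_1..a_na (denominator below the leading s**na term) and b_0..b_nb
    cols = [H * s ** (na - k) for k in range(1, na + 1)]          # denominator terms
    cols += [-(s ** (nb - m)) for m in range(0, nb + 1)]          # numerator terms
    A_mat = np.column_stack(cols)
    rhs = -H * s ** na
    # Stack real and imaginary parts to solve the complex LS problem with real unknowns
    M = np.vstack([A_mat.real, A_mat.imag])
    y = np.concatenate([rhs.real, rhs.imag])
    theta, *_ = np.linalg.lstsq(M, y, rcond=None)
    a = np.concatenate([[1.0], theta[:na]])     # denominator coefficients
    b = theta[na:]                              # numerator coefficients
    return b, a

# Toy check against a known second-order system H(s) = 1 / (s^2 + 0.4 s + 4)
w = np.linspace(0.1, 10, 200)
H_true = 1.0 / ((1j * w) ** 2 + 0.4 * (1j * w) + 4.0)
b, a = fit_rational(w, H_true, nb=0, na=2)
print("b:", b.round(3), "a:", a.round(3))        # expect b ~ [1], a ~ [1, 0.4, 4]
```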
Abstract:
Process models specify behavioral aspects by describing ordering constraints between the tasks that must be accomplished to achieve envisioned goals. Tasks usually exchange information by means of data objects, i.e., by writing information to and reading information from data objects. A data object can be characterized by its states and allowed state transitions. In this paper, we propose a notion for checking the conformance of a process model with respect to the data objects that its tasks access. This notion can be used to tell whether, in every execution of a process model, each time a task needs to access a data object in a particular state, the data object is guaranteed to be in the expected state or to be able to reach the expected state, and hence whether the process model can achieve its goals.
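The core idea can be sketched as a reachability check over a data object's life cycle: for each task access expecting a particular object state, verify that the current state is, or can still reach, the expected state. The life cycle and task sequence below are hypothetical, and the paper's notion is defined over all executions of the process model rather than a single trace:

```python
# Simplified single-trace sketch of data-object state conformance checking.
from collections import deque

lifecycle = {                        # allowed state transitions of a data object
    "created": {"filled"},
    "filled": {"approved", "rejected"},
    "approved": set(),
    "rejected": {"filled"},
}

def can_reach(src, dst):
    """Breadth-first reachability in the object life cycle."""
    seen, queue = {src}, deque([src])
    while queue:
        s = queue.popleft()
        if s == dst:
            return True
        for nxt in lifecycle.get(s, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Each task expects to read the object in a state, then writes a new state
process = [("Create form", None, "created"),
           ("Fill form", "created", "filled"),
           ("Approve form", "filled", "approved")]

state = None
for task, expected_read, written in process:
    conforms = expected_read is None or state == expected_read or can_reach(state, expected_read)
    print(f"{task}: {'ok' if conforms else 'violation'}")
    state = written
```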
Abstract:
Most existing motorway traffic safety studies using disaggregate traffic flow data aim at developing models for identifying real-time traffic risks by comparing pre-crash and non-crash conditions. One serious shortcoming of those studies is that non-crash conditions are arbitrarily selected and hence not representative: the selected non-crash data might not be comparable with the pre-crash data, and the non-crash/pre-crash ratio is arbitrarily decided, neglecting the abundance of non-crash over pre-crash conditions. Here, we present a methodology for developing a real-time MotorwaY Traffic Risk Identification Model (MyTRIM) using individual vehicle data, meteorological data, and crash data. Non-crash data are clustered into groups called traffic regimes. Thereafter, pre-crash data are classified into regimes to match with the relevant non-crash data. Of the eight traffic regimes obtained, four were identified as highly risky; regime-based Risk Identification Models (RIM) were developed for the three regimes with sufficient pre-crash data. MyTRIM memorizes the latest risk evolution identified by RIM to predict near-future risks. Traffic practitioners can decide MyTRIM's memory size based on the trade-off between detection and false alarm rates. Decreasing the memory size from 5 to 1 increases the detection rate from 65.0% to 100.0% and the false alarm rate from 0.21% to 3.68%. Moreover, critical factors in differentiating pre-crash and non-crash conditions are identified and usable for developing preventive measures. MyTRIM can be used by practitioners in real time as an independent tool to make online decisions, or it can be integrated with existing traffic management systems.
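At a very high level, the regime-then-RIM workflow can be sketched as clustering non-crash observations, assigning pre-crash observations to the resulting regimes, and fitting a per-regime classifier; the data and features below are synthetic and the sketch omits MyTRIM's memory mechanism:

```python
# Conceptual sketch only: traffic regimes by clustering, then per-regime logistic
# risk identification models, loosely mirroring the described workflow.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Hypothetical features: mean speed (km/h), flow (veh/5 min), speed variance
non_crash = np.column_stack([rng.normal(95, 10, 2000), rng.normal(80, 20, 2000), rng.normal(15, 5, 2000)])
pre_crash = np.column_stack([rng.normal(70, 12, 60), rng.normal(110, 25, 60), rng.normal(30, 8, 60)])

regimes = KMeans(n_clusters=4, n_init=10, random_state=0).fit(non_crash)
pre_crash_regime = regimes.predict(pre_crash)        # match pre-crash data to regimes

rim = {}
for r in range(4):
    nc, pc = non_crash[regimes.labels_ == r], pre_crash[pre_crash_regime == r]
    if len(pc) < 10:                                 # only regimes with enough pre-crash data
        continue
    X = np.vstack([nc, pc])
    y = np.concatenate([np.zeros(len(nc)), np.ones(len(pc))])
    rim[r] = LogisticRegression(max_iter=1000).fit(X, y)
print("regime-based RIMs fitted for regimes:", sorted(rim))
```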