26 results for Data-driven Methods
at University of Queensland eSpace - Australia
Abstract:
The integration of geo-information from multiple sources and of diverse nature in developing mineral favourability indexes (MFIs) is a well-known problem in mineral exploration and mineral resource assessment. Fuzzy set theory provides a convenient framework to combine and analyse qualitative and quantitative data independently of their source or characteristics. A novel, data-driven formulation for calculating MFIs based on fuzzy analysis is developed in this paper. Different geo-variables are considered fuzzy sets and their appropriate membership functions are defined and modelled. A new weighted average-type aggregation operator is then introduced to generate a new fuzzy set representing mineral favourability. The membership grades of the new fuzzy set are considered as the MFI. The weights for the aggregation operation combine the individual membership functions of the geo-variables, and are derived using information from training areas and L, regression. The technique is demonstrated in a case study of skarn tin deposits and is used to integrate geological, geochemical and magnetic data. The study area covers a total of 22.5 km² and is divided into 349 cells, which include nine control cells. Nine geo-variables are considered in this study. Depending on the nature of the various geo-variables, four different types of membership functions are used to model the fuzzy membership of the geo-variables involved. (C) 2002 Elsevier Science Ltd. All rights reserved.
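The aggregation step described above can be sketched in a few lines. The membership functions, geo-variable names and weights below are invented for illustration; they are not the values derived from training areas in the paper.

```python
# Sketch of a weighted-average fuzzy aggregation for a mineral
# favourability index (MFI). All membership functions, variable
# names and weights here are illustrative assumptions.

def linear_membership(x, lo, hi):
    """Piecewise-linear membership function mapping x into [0, 1]."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def mfi(cell, weights):
    """Weighted-average aggregation of per-variable membership grades."""
    total_w = sum(weights.values())
    return sum(weights[k] * grade for k, grade in cell.items()) / total_w

# One cell described by three geo-variables already converted to
# membership grades (e.g. via linear_membership above).
cell = {
    "geochem": linear_membership(120.0, 50.0, 200.0),   # ppm anomaly
    "magnetics": linear_membership(0.8, 0.0, 1.0),      # normalised signal
    "lithology": 1.0,                                   # favourable unit
}
weights = {"geochem": 0.5, "magnetics": 0.3, "lithology": 0.2}
print(round(mfi(cell, weights), 3))   # 0.673
```

The aggregated grade is itself a membership value in [0, 1], which is what allows it to be read directly as a favourability index.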
Abstract:
There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet appeared in the literature. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims at providing a comprehensive tool for price spike forecasting. Feature selection techniques are first described to identify the attributes relevant to the occurrence of spikes. A brief introduction to the classification techniques is given for completeness. Two algorithms, the support vector machine and a probability classifier, are chosen as the spike occurrence predictors and are discussed in detail. Realistic market data are used to test the proposed model with promising results.
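As a toy illustration of the probability-classifier idea, the sketch below estimates P(spike | feature bucket) from historical frequencies. The single "demand" feature and the data are invented; the paper's feature selection and SVM predictor are not reproduced here.

```python
# Toy probability classifier for price-spike occurrence: estimate
# P(spike | feature bucket) from historical observations. The
# feature and data are illustrative assumptions.

from collections import defaultdict

def fit_spike_probabilities(history):
    """history: iterable of (demand_bucket, spiked) pairs."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [spikes, total]
    for bucket, spiked in history:
        counts[bucket][0] += int(spiked)
        counts[bucket][1] += 1
    return {b: s / n for b, (s, n) in counts.items()}

history = [
    ("low", False), ("low", False), ("low", False), ("low", True),
    ("high", True), ("high", True), ("high", False), ("high", True),
]
p_spike = fit_spike_probabilities(history)
print(p_spike["low"], p_spike["high"])   # 0.25 0.75
```

A spike alarm would then be raised whenever the estimated probability for the current conditions exceeds a chosen threshold.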
Abstract:
We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with excellent properties. The approach is inspired by the principles of the generalized cross entropy method. The proposed density estimation procedure has numerous advantages over the traditional kernel density estimator methods. Firstly, for the first time in the nonparametric literature, the proposed estimator allows for a genuine incorporation of prior information in the density estimation procedure. Secondly, the approach provides the first data-driven bandwidth selection method that is guaranteed to provide a unique bandwidth for any data. Lastly, simulation examples suggest the proposed approach outperforms the current state of the art in nonparametric density estimation in terms of accuracy and reliability.
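For contrast with the proposal above, a minimal kernel density estimator with a conventional data-driven bandwidth can be sketched as follows. Silverman's rule of thumb stands in for the paper's generalized cross entropy selector, which is not reproduced here.

```python
# Gaussian kernel density estimate with a conventional data-driven
# bandwidth. Silverman's rule is used only as a stand-in for the
# paper's generalized cross entropy selector.

import math
import statistics

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 1.06 * sd * n^(-1/5)."""
    n = len(data)
    return 1.06 * statistics.stdev(data) * n ** (-1 / 5)

def kde(x, data, h):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    n = len(data)
    return sum(
        math.exp(-0.5 * ((x - xi) / h) ** 2) / (h * math.sqrt(2 * math.pi))
        for xi in data
    ) / n

data = [0.1, 0.3, 0.2, 0.9, 1.1, 1.0, 0.95]
h = silverman_bandwidth(data)
density = kde(0.5, data, h)
```

Rules of thumb like this can over-smooth multimodal data, which is one motivation for more principled data-driven selectors of the kind the paper proposes.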
Abstract:
The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. We show in this paper that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
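The filter-and-refine strategy mentioned above can be sketched with bounding boxes. The exact-geometry refine step below is a placeholder, and the grid partitioning and de-duplication of multi-assigned objects are simplified to a result set.

```python
# Minimal sketch of filter-and-refine for a spatial join: a cheap
# bounding-box overlap test filters candidate pairs, then a
# (placeholder) exact predicate refines them.

def boxes_overlap(a, b):
    """a, b: (xmin, ymin, xmax, ymax) bounding boxes."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def spatial_join(r, s, refine=boxes_overlap):
    """Filter step: bounding boxes; refine step: exact geometry
    (here the box test stands in for an exact predicate)."""
    results = set()          # a set removes duplicate pairs produced
    for i, ra in enumerate(r):   # by multi-assignment partitioning
        for j, sb in enumerate(s):
            if boxes_overlap(ra, sb) and refine(ra, sb):
                results.add((i, j))
    return sorted(results)

r = [(0, 0, 2, 2), (5, 5, 6, 6)]
s = [(1, 1, 3, 3), (10, 10, 11, 11)]
print(spatial_join(r, s))   # [(0, 0)]
```

In a parallel setting, each worker would run this join over one spatial partition, which is where locality-preserving task decomposition pays off.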
Abstract:
In this and a preceding paper, we provide an introduction to the Fujitsu VPP range of vector-parallel supercomputers and to some of the computational chemistry software available for the VPP. Here, we consider the implementation and performance of seven popular chemistry application packages. The codes discussed range from classical molecular dynamics to semiempirical and ab initio quantum chemistry. All have evolved from sequential codes, and have typically been parallelised using a replicated data approach. As such they are well suited to the large-memory/fast-processor architecture of the VPP. For one code, CASTEP, a distributed-memory data-driven parallelisation scheme is presented. (C) 2000 Published by Elsevier Science B.V. All rights reserved.
Abstract:
Functional magnetic resonance imaging (FMRI) analysis methods can be quite generally divided into hypothesis-driven and data-driven approaches. The former are utilised in the majority of FMRI studies, where a specific haemodynamic response is modelled utilising knowledge of event timing during the scan, and is tested against the data using a t test or a correlation analysis. These approaches often lack the flexibility to account for variability in haemodynamic response across subjects and brain regions, which is of specific interest in high-temporal resolution event-related studies. Current data-driven approaches attempt to identify components of interest in the data, but currently do not utilise any physiological information for the discrimination of these components. Here we present a hypothesis-driven approach that is an extension of Friman's maximum correlation modelling method (NeuroImage 16, 454-464, 2002) specifically focused on discriminating the temporal characteristics of event-related haemodynamic activity. Test analyses, on both simulated and real event-related FMRI data, will be presented.
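A hypothesis-driven test in miniature: correlate a modelled response time course with a voxel time series. The regressor and "voxel" data below are synthetic; a real analysis would convolve event timings with a haemodynamic response model.

```python
# Correlation analysis of a modelled response against voxel time
# series. All time courses here are synthetic illustrations.

import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

model = [0, 0, 1, 2, 1, 0, 0, 1, 2, 1]          # idealised response
active = [0.1, 0.0, 1.1, 1.9, 0.9, 0.2, 0.1, 1.0, 2.1, 1.1]
inactive = [0.5, 0.4, 0.6, 0.5, 0.4, 0.5, 0.6, 0.5, 0.4, 0.6]
print(pearson(model, active))     # strongly positive
print(pearson(model, inactive))   # near zero
```

Voxels whose correlation with the model exceeds a significance threshold would be declared active; the flexibility problem noted above arises because a single fixed model is tested everywhere.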
Abstract:
Objective: To illustrate methodological issues involved in estimating dietary trends in populations using data obtained from various sources in Australia in the 1980s and 1990s. Methods: Estimates of absolute and relative change in consumption of selected food items were calculated using national data published annually on the national food supply for 1982-83 to 1992-93 and responses to food frequency questions in two population based risk factor surveys in 1983 and 1994 in the Hunter Region of New South Wales, Australia. The validity of estimated food quantities obtained from these inexpensive sources at the beginning of the period was assessed by comparison with data from a national dietary survey conducted in 1983 using 24 h recall. Results: Trend estimates from the food supply data and risk factor survey data were in good agreement for increases in consumption of fresh fruit, vegetables and breakfast food and decreases in butter, margarine, sugar and alcohol. Estimates for trends in milk, eggs and bread consumption, however, were inconsistent. Conclusions: Both data sources can be used for monitoring progress towards national nutrition goals based on selected food items provided that some limitations are recognized. While data collection methods should be consistent over time they also need to allow for changes in the food supply (for example the introduction of new varieties such as low-fat dairy products). From time to time the trends derived from these inexpensive data sources should be compared with data derived from more detailed and quantitative estimates of dietary intake.
Abstract:
We compare Bayesian methodology utilizing the freeware BUGS (Bayesian Inference Using Gibbs Sampling) with the traditional structural equation modelling approach based on another freeware package, Mx. Dichotomous and ordinal (three category) twin data were simulated according to different additive genetic and common environment models for phenotypic variation. Practical issues are discussed in using Gibbs sampling as implemented by BUGS to fit subject-specific Bayesian generalized linear models, where the components of variation may be estimated directly. The simulation study (based on 2000 twin pairs) indicated that there is a consistent advantage in using the Bayesian method to detect a correct model under certain specifications of additive genetic and common environmental effects. For binary data, both methods had difficulty in detecting the correct model when the additive genetic effect was low (between 10 and 20%) or of moderate range (between 20 and 40%). Furthermore, neither method could adequately detect a correct model that included a modest common environmental effect (20%) even when the additive genetic effect was large (50%). Power was significantly improved with ordinal data for most scenarios, except for the case of low heritability under a true ACE model. We illustrate and compare both methods using data from 1239 twin pairs over the age of 50 years, who were registered with the Australian National Health and Medical Research Council Twin Registry (ATR) and presented symptoms associated with osteoarthritis occurring in joints of the hand.
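The ACE decomposition underlying both methods can be illustrated by simulating twin pairs directly. The variance shares and sample size below are arbitrary choices for the sketch, and no BUGS or Mx model fitting is attempted.

```python
# Toy simulation of twin data under an ACE model (A = additive
# genetic, C = common environment, E = unique environment). MZ pairs
# share all additive genetic variance, DZ pairs half of it, so the
# expected pair correlations are a2 + c2 (MZ) and a2/2 + c2 (DZ).

import math
import random

random.seed(1)

def simulate_pairs(n, a2, c2, mz):
    """Simulate n twin-pair phenotypes with unit total variance."""
    e2 = 1.0 - a2 - c2
    shared_a = a2 if mz else a2 / 2   # genetic variance shared in pair
    own_a = 0.0 if mz else a2 / 2     # genetic variance not shared
    pairs = []
    for _ in range(n):
        a = random.gauss(0, math.sqrt(shared_a))
        c = random.gauss(0, math.sqrt(c2))
        def draw_twin():
            return (a + c + random.gauss(0, math.sqrt(own_a))
                    + random.gauss(0, math.sqrt(e2)))
        pairs.append((draw_twin(), draw_twin()))
    return pairs

def correlation(pairs):
    xs, ys = zip(*pairs)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Under A = 50%, C = 20%: expected r(MZ) = 0.70, r(DZ) = 0.45.
r_mz = correlation(simulate_pairs(4000, 0.5, 0.2, mz=True))
r_dz = correlation(simulate_pairs(4000, 0.5, 0.2, mz=False))
```

The gap between the MZ and DZ correlations is what identifies A, which is why detection gets hard when the additive genetic effect is small.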
Abstract:
Background: The OARSI Standing Committee for Clinical Trials Response Criteria Initiative had developed two sets of responder criteria to present the results of changes after treatment in three symptomatic domains (pain, function, and patient's global assessment) as a single variable for clinical trials (1). For each domain, a response was defined by both a relative and an absolute change, with different cut-offs with regard to the drug, the route of administration and the OA localization. Objective: To propose a simplified set of responder criteria with a similar cut-off, whatever the drug, the route or the OA localization. Methods: Data driven approach: (1) Two databases were considered: the 'elaboration' database, with which the formal OARSI sets of responder criteria were elaborated, and the 'revisit' database. (2) Six different scenarios were evaluated: the two formal OARSI sets of criteria and four proposed simplified sets of criteria. Data from clinical randomized blinded placebo controlled trials were used to evaluate the performances of the two formal scenarios with two different databases ('elaboration' versus 'revisit') and those of the four proposed simplified scenarios within the 'revisit' database. The placebo effect, active effect, treatment effect, and the required sample arm size to obtain the placebo effect and the active treatment effect observed were the performances evaluated for each of the six scenarios. Experts' opinion approach: Results were discussed among the participants of the OMERACT VI meeting, who voted to select the definite OMERACT-OARSI set of criteria (one of the six evaluated scenarios). Results: Data driven approach: Fourteen trials totaling 1886 OA patients and fifteen studies involving 8164 OA patients were evaluated in the 'elaboration' and the 'revisit' databases respectively.
The variability of the performances observed in the 'revisit' database when using the different simplified scenarios was similar to that observed between the two databases ('elaboration' versus 'revisit') when using the formal scenarios. The treatment effect and the required sample arm size were similar for each set of criteria. Experts' opinion approach: According to the experts, these two performances were the most important for an optimal set of responder criteria. They chose the set of criteria considering both pain and function as evaluation domains and requiring both an absolute change and a relative change from baseline to define a response, with similar cut-offs whatever the drug, the route of administration or the OA localization. Conclusion: This data driven and experts' opinion approach is the basis for proposing an optimal simplified set of responder criteria for OA clinical trials. Other studies, using other sets of OA patients, are required in order to further validate this proposed OMERACT-OARSI set of criteria. (C) 2004 OsteoArthritis Research Society International. Published by Elsevier Ltd. All rights reserved.
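The structure of such a responder rule, a relative and an absolute improvement from baseline both passing uniform cut-offs in the pain or function domain, can be written down directly. The cut-off values below (50% relative, 20 points on a 0-100 scale) are assumptions for this sketch, not the adopted OMERACT-OARSI values.

```python
# Illustrative simplified responder rule: a domain "responds" when
# both its relative and its absolute improvement pass uniform
# cut-offs. Cut-off values here are assumed, not the adopted ones.

def domain_response(baseline, follow_up, rel_cut=0.50, abs_cut=20.0):
    """Scores on 0-100 scales where lower is better (e.g. pain)."""
    absolute = baseline - follow_up
    relative = absolute / baseline if baseline else 0.0
    return relative >= rel_cut and absolute >= abs_cut

def responder(pain, function):
    """Patient responds if either the pain or the function domain
    does. Each argument is a (baseline, follow_up) pair."""
    return domain_response(*pain) or domain_response(*function)

print(responder(pain=(60, 25), function=(40, 35)))   # True
```

Requiring both a relative and an absolute change guards against trivially small improvements counting as responses at either end of the scale.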
Abstract:
This paper provides information on the experimental set-up, data collection methods and results to date for the project Large scale modelling of coarse grained beaches, undertaken at the Large Wave Channel (GWK) of FZK in Hannover by an international group of researchers in Spring 2002. The main objective of the experiments was to provide full scale measurements of cross-shore processes on gravel and mixed beaches for the verification and further development of cross-shore numerical models of gravel and mixed sediment beaches. Identical random and regular wave tests were undertaken for a gravel beach and a mixed sand/gravel beach set up in the flume. Measurements included profile development, water surface elevation along the flume, internal pressures in the swash zone, piezometric head levels within the beach, run-up, flow velocities in the surf-zone and sediment size distributions. The purpose of the paper is to present to the scientific community the experimental procedure, a summary of the data collected, some initial results, as well as a brief outline of the on-going research being carried out with the data by different research groups. The experimental data are available to the whole scientific community following submission of a statement of objectives, specification of data requirements and an agreement to abide by the GWK and EU protocols. (C) 2005 Elsevier B.V. All rights reserved.
Abstract:
There is a widely held paradigm that mangroves are critical for sustaining production in coastal fisheries through their role as important nursery areas for fisheries species. This paradigm frequently forms the basis for important management decisions on habitat conservation and restoration of mangroves and other coastal wetlands. This paper reviews the current status of the paradigm and synthesises the information on the processes underlying these potential links. In the past, the paradigm has been supported by studies identifying correlations between the areal and linear extent of mangroves and fisheries catch. This paper goes beyond the correlative approach to develop a new framework on which future evaluations can be based. First, the review identifies what type of marine animals are using mangroves and at what life stages. These species can be categorised as estuarine residents, marine-estuarine species and marine stragglers. The marine-estuarine category includes many commercial species that use mangrove habitats as nurseries. The second stage is to determine why these species are using mangroves as nurseries. The three main proposals are that mangroves provide a refuge from predators, high levels of nutrients and shelter from physical disturbances. The recognition of the important attributes of mangrove nurseries then allows an evaluation of how changes in mangroves will affect the associated fauna. Surprisingly few studies have addressed this question. Consequently, it is difficult to predict how changes in any of these mangrove attributes would affect the faunal communities within them and, ultimately, influence the fisheries associated with them. From the information available, it seems likely that reductions in mangrove habitat complexity would reduce the biodiversity and abundance of the associated fauna, and these changes have the potential to cause cascading effects at higher trophic levels with possible consequences for fisheries. 
Finally, there is a discussion of the data that are currently available on mangrove distribution and fisheries catch, the limitations of these data and how best to use the data to understand mangrove-fisheries links and, ultimately, to optimise habitat and fisheries management. Examples are drawn from two relatively data-rich regions, Moreton Bay (Australia) and Western Peninsular Malaysia, to illustrate the data needs and research requirements for investigating the mangrove-fisheries paradigm. Having reliable and accurate data at appropriate spatial and temporal scales is crucial for mangrove-fisheries investigations. Recommendations are made for improvements to data collection methods that would meet these important criteria. This review provides a framework on which to base future investigations of mangrove-fisheries links, based on an understanding of the underlying processes and the need for rigorous data collection. Without this information, the understanding of the relationship between mangroves and fisheries will remain limited. Future investigations of mangrove-fisheries links must take this into account in order to have a good ecological basis and to provide better information and understanding to both fisheries and conservation managers.
Abstract:
Sun exposure in childhood is one of the risk factors for developing skin cancer, yet little is known about levels of exposure at this age. This is particularly important in countries with high levels of ultraviolet radiation (UVR) such as Australia. Among 49 children 3 to 5 years of age attending child care centers, UVR exposure was studied under 4 conditions in a repeated measures design: sunny days, cloudy days, a teacher's instruction to stay in the shade, and a health professional's instruction to apply sunscreen. Three different data collection methods were employed: (a) completion of a questionnaire or diary by parents and researcher, (b) polysulphone dosimeter readings, and (c) observational audits (video recording). Results of this study indicated that more than half the children had been sunburnt (pink or red) and more than a third had experienced painful sunburn (sore or tender) in the last summer. Most wore short sleeve shirts, short skirts or shorts, and caps, which do not provide optimal levels of skin protection. However, sunscreen was applied to all exposed parts before the children went out to the playground. Over the period of 1 hr (9-10 a.m.) the average amount of time children spent in full sun was 22 min. On sunny days there was more variation across children in the amount of sun exposure received. While the potential amount of UVR exposure for young children during the hour they were outside on a sunny day was 1.45 MED (Minimum Erythemal Dose), they received on average 0.35 MED, which is an insufficient amount to result in an erythemal response on fair skin even without the use of sunscreen.
Abstract:
Biological wastewater treatment is a complex, multivariate process, in which a number of physical and biological processes occur simultaneously. In this study, principal component analysis (PCA) and parallel factor analysis (PARAFAC) were used to profile and characterise Lagoon 115E, a multistage biological lagoon treatment system at Melbourne Water's Western Treatment Plant (WTP) in Melbourne, Australia. The objective was to increase our understanding of the multivariate processes taking place in the lagoon. The data used in the study span a 7-year period during which samples were collected as often as weekly from the ponds of Lagoon 115E and subjected to analysis. The resulting database, involving 19 chemical and physical variables, was studied using the multivariate data analysis methods PCA and PARAFAC. With these methods, alterations in the state of the wastewater due to intrinsic and extrinsic factors could be discerned. The methods were effective in illustrating and visually representing the complex purification stages and cyclic changes occurring along the lagoon system. The two methods proved complementary, with each having its own beneficial features. (C) 2003 Elsevier B.V. All rights reserved.
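A two-variable PCA illustrates the decomposition used in the study: centre the data, form the covariance matrix, and take its leading eigenvector as the first principal component. The measurements below are invented, and the study's 19-variable dataset would call for a numerical eigensolver or SVD rather than this 2x2 closed form.

```python
# Minimal two-variable PCA using the closed-form eigendecomposition
# of a 2x2 symmetric covariance matrix. Data are illustrative.

import math

def pca_2d(xs, ys):
    """Return (leading eigenvalue, PC1 unit direction) for 2D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    a = sum(v * v for v in dx) / (n - 1)              # var(x)
    c = sum(v * v for v in dy) / (n - 1)              # var(y)
    b = sum(u * v for u, v in zip(dx, dy)) / (n - 1)  # cov(x, y)
    lead = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    theta = 0.5 * math.atan2(2 * b, a - c)            # PC1 angle
    return lead, (math.cos(theta), math.sin(theta))

# Two strongly correlated hypothetical measurements
# (e.g. BOD and suspended solids sampled along the lagoon).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.0, 2.9, 4.2, 5.0]
var1, direction = pca_2d(xs, ys)
```

Because the two series move together, nearly all the variance loads onto PC1, which points along the (1, 1) diagonal; in the lagoon study the analogous loadings are what let correlated water-quality variables be read as a single underlying process.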
Abstract:
Registration of births, recording deaths by age, sex and cause, and calculating mortality levels and differentials are fundamental to evidence-based health policy, monitoring and evaluation. Yet few of the countries with the greatest need for these data have functioning systems to produce them, despite legislation providing for the establishment and maintenance of vital registration. Sample vital registration (SVR), when applied in conjunction with validated verbal autopsy procedures and implemented in a nationally representative sample of population clusters, represents an affordable, cost-effective, and sustainable short- and medium-term solution to this problem. SVR complements other information sources by producing age-, sex-, and cause-specific mortality data that are more complete and continuous than those currently available. The tools and methods employed in an SVR system, however, are imperfect and require rigorous validation and continuous quality assurance; sampling strategies for SVR are also still evolving. Nonetheless, interest in establishing SVR is rapidly growing in Africa and Asia. Better systems for reporting and recording data on vital events will be sustainable only if developed hand-in-hand with existing health information strategies at the national and district levels; governance structures; and agendas for social research and development monitoring. If the global community wishes to have mortality measurements 5 or 10 years hence, the foundation stones of SVR must be laid today.