934 results for Strip mining -- Queensland
Abstract:
This report explains the objectives, datasets and evaluation criteria of both the clustering and classification tasks set in the INEX 2009 XML Mining track. The report also describes the approaches and results obtained by the different participants.
Abstract:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). 
Because of the unbalanced class distribution in the data, majority class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used for evaluation of models generated using different prediction algorithms. The performance of models derived from feature reduced datasets reveals the filter method, Cfs subset evaluation, to be most consistently effective, although Consistency derived subsets tended to give slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This could be eliminated by consideration of the AUC or Kappa statistic as well as by evaluation of subsets with under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in reduction in MR to between 9.8 and 10.16, with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets, but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-fold cross-validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36, with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09.
The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E) and derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_G) perform the least well, with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature reduced, anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input.
In the absence of risk factor input, the use of time-series variables after outlier removal, and of time-series variables based on physiological variable values being outside the accepted normal range, is associated with some improvement in model performance.
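The remark that MR is influenced by class distribution can be illustrated with a small sketch. The confusion-matrix counts below are hypothetical (they are not taken from the study), and the function names are assumptions chosen for illustration. A model that always predicts the majority class, much like a one-leaf tree, attains a low MR yet a Kappa of zero.

```python
def misclassification_rate(tp, fp, fn, tn):
    """Fraction of all cases that were classified incorrectly."""
    total = tp + fp + fn + tn
    return (fp + fn) / total


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa: agreement between prediction and truth, corrected for chance."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Chance agreement computed from the row and column marginals.
    expected_pos = ((tp + fn) / total) * ((tp + fp) / total)
    expected_neg = ((tn + fp) / total) * ((tn + fn) / total)
    expected = expected_pos + expected_neg
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present and predicted
    return (observed - expected) / (1.0 - expected)


# Hypothetical unbalanced data: 100 positives, 900 negatives.
# A one-leaf model predicts "negative" for every case.
always_negative = dict(tp=0, fp=0, fn=100, tn=900)
mr = misclassification_rate(**always_negative)
kappa = cohen_kappa(**always_negative)
print(f"MR = {mr:.2f}, Kappa = {kappa:.2f}")
```

Here MR simply mirrors the proportion of the minority class, while Kappa shows that the model does no better than chance, which is why the abstract pairs MR with Kappa, AUC and under-sampling.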
Abstract:
Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
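The abstract does not specify how the dynamic sequence matching is carried out. As a rough illustration only, one generic form of such matching is an edit distance between a target phone sequence and a phone sequence read from a lattice path. The unit costs, the phone symbols and the function name below are assumptions, not details of Dynamic Match Lattice Spotting.

```python
def edit_distance(target, observed):
    """Levenshtein distance between two sequences of phone symbols.

    Unit costs for substitution, insertion and deletion are an assumption;
    a real system would likely use phone-dependent costs.
    """
    previous = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        current = [i]
        for j, o in enumerate(observed, start=1):
            substitution = previous[j - 1] + (0 if t == o else 1)
            deletion = previous[j] + 1      # target phone missing in observed
            insertion = current[j - 1] + 1  # extra phone in observed
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Hypothetical phone sequences: the recogniser confused one vowel.
target = ["k", "ae", "t"]
lattice_path = ["k", "eh", "t"]
print(edit_distance(target, lattice_path))
```

Accepting lattice paths whose distance to the target falls under a threshold is one way a search could tolerate recognition errors of this kind.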
Abstract:
In a seminal data mining article, Leo Breiman [1] argued that to develop effective predictive classification and regression models, we need to move away from the sole dependency on statistical algorithms and embrace a wider toolkit of modeling algorithms that include data mining procedures. Nevertheless, many researchers still rely solely on statistical procedures when undertaking data modeling tasks; the sole reliance on these procedures has led to the development of irrelevant theory and questionable research conclusions ([1], p.199). We will outline initiatives that the HPC & Research Support group is undertaking to engage researchers with data mining tools and techniques, including a new range of seminars, workshops, and one-on-one consultations covering data mining algorithms, the relationship between data mining and the research cycle, and limitations and problems with these new algorithms. Organisational limitations and restrictions to these initiatives are also discussed.
Abstract:
Security of tenure is the cornerstone of the land management system in Australia. Freehold title is protected through indefeasibility of title entrenched in legislation, and protection of registrable interests in land is offered through the Statutory Assurance Fund. For those with interests pertaining to Crown Land no such protection is offered, although this position is not uniform across Australia. Notably, those with Crown leasehold interests or a profit à prendre on Crown Land in Queensland are not protected through registration on the freehold land register and do not have the benefit of indefeasibility of title. The issue of management of interests pertaining to Crown Land has become increasingly relevant due to the complexities associated with balancing public interests, including native title, with more commercial interests in land generated through carbon sequestration, forestry and mining. This paper considers the framework for the management of Crown Land in Queensland and the adequacy of this framework for commercial interests that pertain to Crown Land.
Abstract:
A method of selecting land in any region of Queensland for offsetting purposes is devised, employing uniform standards. The procedure first requires that any core natural asset lands, Crown environmental lands, prime urban and agricultural lands, and highly contentious sites in the region be eliminated from consideration. Other land is then sought that is located between existing large reservations and the centre of greatest potential regional development/disturbance. Using the criteria of rehabilitation (rather than preservation) plus proximity to those officially defined Regional Ecosystems that are most threatened, adjacent sites that are described as ‘Cleared’ are identified in terms of agricultural land capability. Class IV lands – defined as those ‘which may be safely used for occasional cultivation with careful management’,2 ‘where it is favourably located for special usage’,3 and where it is ‘helpful to those who are interested in industry or regional planning or in reconstruction’4 – are examined for their appropriate area, for current tenure and for any conditions such as Mining Leases that may exist. The positive impacts from offsets on adjoining lands can then be designed to be significant; examples are also offered in respect of riparian areas and of Marine Parks. Criteria against which to measure performance for trading purposes include functional lift, with other case studies about this matter reported separately in this issue. The procedure takes no account of demand side economics (financial additionality), which requires commercial rather than environmental analysis.
Abstract:
Global warming is already threatening many animal and plant communities worldwide; however, the effect of climate change on bat populations is poorly known. Understanding the factors influencing the survival of bats is crucial to their conservation, and this cannot be achieved solely by modern ecological studies. Palaeoecological investigations provide a perspective over a much longer temporal scale, allowing the understanding of the dynamic patterns that shaped the distribution of modern taxa. In this study twelve microchiropteran fossil assemblages from Mount Etna, central-eastern Queensland, ranging in age from more than 500,000 years to the present day, were investigated. The aim was to assess the responses of insectivorous bats to Quaternary environmental changes, including climatic fluctuations and recent anthropogenic impacts. In particular, this investigation focussed on the effects of increasing late Pleistocene aridity, the subsequent retraction of rainforest habitat, and the impact of cave mining following European settlement at Mount Etna. A thorough examination of the dental morphology of all available extant Australian bat taxa was conducted in order to identify the fossil taxa prior to their analysis in terms of species richness and composition. This detailed odontological work provided new diagnostic dental characters for eighteen species and one genus. It also provided additional useful dental characters for three species and seven genera.
This odontological analysis allowed the identification of fifteen fossil bat taxa from the Mount Etna deposits, all being representatives of extant bats, and included ten taxa identified to the species level (i.e., Macroderma gigas, Hipposideros semoni, Rhinolophus megaphyllus, Miniopterus schreibersii, Miniopterus australis, Scoteanax rueppellii, Chalinolobus gouldii, Chalinolobus dwyeri, Chalinolobus nigrogriseus and Vespadelus troughtoni) and five taxa identified to the generic level (i.e., Mormopterus, Taphozous, Nyctophilus, Scotorepens and Vespadelus). Palaeoecological analysis of the fossil taxa revealed that, unlike the non-volant mammal taxa, bats have remained essentially stable in terms of species diversity and community membership between the mid-Pleistocene rainforest habitat and the mesic habitat that occurs today in the region. The single major exception is Hipposideros semoni, which went locally extinct at Mount Etna. Additionally, while intensive mining operations resulted in the abandonment of at least one cave that served as a maternity roost in the recent past, the diversity of the Mount Etna bat fauna has not declined since European colonisation. The overall resilience through time of the bat species discussed herein is perhaps due to their unique ecological, behavioural, and physiological characteristics as well as their ability to fly, which have allowed them to successfully adapt to their changing environment. This study highlights the importance of palaeoecological analyses as a tool to gain an understanding of how bats have responded to environmental change in the past and provides valuable information for the conservation of threatened modern species, such as H. semoni.
Abstract:
Road safety is a major concern worldwide. Road safety will improve as road conditions and their effects on crashes are continually investigated. This paper proposes to use the capability of data mining to include the greater set of road variables for all available crashes with skid resistance values across the Queensland state main road network in order to understand the relationships among crash, traffic and road variables. This paper presents a data mining based methodology for road asset management data to identify the road properties that contribute unduly to crashes. The models demonstrate high levels of accuracy in predicting crashes on roads when various road properties are included. This paper presents the findings of these models to show the relationships among skid resistance, crashes, crash characteristics and other road characteristics such as seal type, seal age, road type, texture depth, lane count, pavement width, rutting, speed limit, traffic rates, intersections, traffic signage and road design.
Abstract:
Road crashes cost world and Australian society a significant proportion of GDP, affecting productivity and causing significant suffering for communities and individuals. This paper presents a case study that generates data mining models that contribute to understanding of road crashes by allowing examination of the role of skid resistance (F60) and other road attributes in road crashes. Predictive data mining algorithms, primarily regression trees, were used to produce road segment crash count models from the road and traffic attributes of crash scenarios. The rules derived from the regression trees provide evidence of the significance of road attributes in contributing to crashes, with a focus on the evaluation of skid resistance.
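To indicate how a regression tree turns a road attribute into a rule, the sketch below finds the single threshold on one attribute that best separates crash counts by minimising the summed squared error of the two resulting groups. The data values, the attribute name and the function names are hypothetical, and the paper's actual algorithm and settings are not stated in this abstract.

```python
def sum_squared_error(values):
    """Squared deviation of the values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(attribute, target):
    """Return (threshold, cost) of the best binary split on one attribute.

    The threshold is None if no split improves on leaving the data unsplit.
    """
    pairs = sorted(zip(attribute, target))
    best_threshold = None
    best_cost = sum_squared_error(list(target))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sum_squared_error(left) + sum_squared_error(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


# Hypothetical road segments: skid resistance value and crash count.
skid_resistance = [0.2, 0.3, 0.5, 0.6]
crash_count = [8, 8, 2, 2]
threshold, cost = best_split(skid_resistance, crash_count)
print(f"rule: skid resistance < {threshold:.2f} -> higher crash count")
```

A full tree repeats this search recursively over all attributes; each path from root to leaf then reads as a rule of the kind the abstract describes.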
Abstract:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldom studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it has become habitual and involves minimal intention or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); the research focuses on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM, which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits.
Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.
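Why temporal usage needs circular treatment can be seen in a minimal sketch: times of day wrap around at midnight, so an ordinary average of 23:00 and 01:00 gives noon rather than midnight. The code below maps hours onto the unit circle and averages there. It does not implement VB-GMM, and the function names are assumptions for illustration.

```python
import math


def hour_to_angle(hour):
    """Map a time of day in hours (0 to 24) to an angle in radians."""
    return 2.0 * math.pi * (hour % 24.0) / 24.0


def circular_mean_hour(hours):
    """Mean time of day, computed on the circle instead of the number line."""
    sin_sum = sum(math.sin(hour_to_angle(h)) for h in hours)
    cos_sum = sum(math.cos(hour_to_angle(h)) for h in hours)
    mean_angle = math.atan2(sin_sum, cos_sum)
    return (mean_angle * 24.0 / (2.0 * math.pi)) % 24.0


# Hypothetical call start times just before and just after midnight.
late_calls = [23.0, 1.0]
linear_mean = sum(late_calls) / len(late_calls)  # 12.0, i.e. noon
circular_mean = circular_mean_hour(late_calls)   # close to midnight (0 or 24)
print(linear_mean, circular_mean)
```

A mixture model for such data has to respect the same wrap-around, which is presumably why the thesis treats fitting to circular data as a separate contribution.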