947 results for association rule mining
Abstract:
Discusses the role of negotiated frameworks as a regulatory mechanism in the development of Australia's premier industry of the 20th century.
Abstract:
This report explains the objectives, datasets and evaluation criteria of both the clustering and classification tasks set in the INEX 2009 XML Mining track. The report also describes the approaches and results obtained by the different participants.
Abstract:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). 
Because of the unbalanced class distribution in the data, majority class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used for evaluation of models generated using different prediction algorithms. The performance of models derived from feature reduced datasets reveals the filter method, Cfs subset evaluation, to be most consistently effective, although Consistency derived subsets tended to give slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This influence could be eliminated by consideration of the AUC or Kappa statistic as well as by evaluation of subsets with under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62 with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in reduction in MR to 9.8 to 10.16 with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-fold cross validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36 with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09. 
The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E) and derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_G) perform the least well with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28 while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature reduced, anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input. 
In the absence of risk factor input, the use of time-series variables after outlier removal and time series variables based on physiological variable values’ being outside the accepted normal range is associated with some improvement in model performance.
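The abstract above notes that misclassification rate (MR) is influenced by class distribution and that the Kappa statistic can correct for this. The following is a minimal sketch of that point, not the study's own code: the function names and the 90/10 class split are illustrative assumptions, and the data are made up. It compares a one-leaf model that always predicts the majority class with a model that finds half of the minority cases; both have the same MR, but only the second has a Kappa above zero.

```python
def misclassification_rate(actual, predicted):
    """Fraction of cases where the predicted label differs from the actual one."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement beyond what the label frequencies alone would give."""
    n = len(actual)
    labels = set(actual) | set(predicted)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement from the marginal label frequencies of both sequences.
    expected = sum((actual.count(l) / n) * (predicted.count(l) / n) for l in labels)
    if expected == 1.0:
        # Kappa is undefined here; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (observed - expected) / (1 - expected)


# Hypothetical, illustrative data: 10 positive cases and 90 negative cases.
actual = [1] * 10 + [0] * 90

# A "one leaf" model that always predicts the majority class.
majority_only = [0] * 100

# A model that finds 5 of the 10 positives and raises 5 false alarms.
informative = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85

for name, predicted in [("majority only", majority_only), ("informative", informative)]:
    mr = misclassification_rate(actual, predicted)
    kappa = cohen_kappa(actual, predicted)
    print(f"{name}: MR = {mr:.2f}, kappa = {kappa:.3f}")
```

Both models misclassify 10 of 100 cases, so MR alone cannot separate them; Kappa does, because it discounts the agreement that the class distribution produces by chance.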
Abstract:
Anecdotal evidence from the infrastructure and building sectors highlights issues of drugs and alcohol and their association with safety risk on construction sites. Operating machinery and mobile equipment, proximity to live traffic together with congested sites, electrical equipment and operating at heights conspire to accentuate the potential adverse impact of drugs and alcohol in the workplace. While most Australian jurisdictions have identified this as a critical safety issue, information is limited regarding the prevalence of alcohol and other drugs in the workplace and there is limited evidential guidance regarding how to effectively and efficiently address such an issue. No known study has scientifically evaluated the relationship between the use of drugs and alcohol and safety impacts in construction, and there has been only limited adoption of nationally coordinated strategies, supported by employers and employees, to render it socially unacceptable to arrive at a construction workplace with impaired judgement from drugs and alcohol. A nationally consistent collaborative approach across the construction workforce, involving employers and employees, clients, unions, contractors and sub-contractors, is required to engender a cultural change in the construction workforce, in a similar manner to the ongoing initiative in securing a cultural change to drink-driving in our society where peer intervention and support is encouraged. This study has four key objectives. Firstly, using the standard World Health Organisation AUDIT, a national qualitative and quantitative assessment of the use of drugs and alcohol will be carried out. This will build upon similar studies carried out in the Australian energy and mining sectors. 
Secondly, the development of an appropriate industry policy will adopt a non-punitive and rehabilitative approach developed in consultation with employers and employees across the infrastructure and building sectors, with the aim that it be adopted nationally at construction workplaces. Thirdly, an industry-specific cultural change management program will be developed through a nationally collaborative approach to reducing the risk of impaired performance on construction sites and increasing workers’ commitment to drugs and alcohol safety. Finally, an implementation plan will be developed from data gathered from both managers and construction employees. Such an approach stands to benefit not only occupational health and safety, through a greater understanding of the safety impacts of alcohol and other drugs at work, but also alcohol and drug use as a wider community health issue. This paper will provide an overview of the background and significance of the study as well as outlining the proposed methodology that will be used to evaluate the safety impacts of alcohol and other drugs in the construction industry.
Abstract:
This paper presents an overview of technical solutions for regional area precise GNSS positioning services, such as in Queensland. The research focuses on the technical and business issues that currently constrain GPS-based local area Real Time Kinematic (RTK) precise positioning services, so that they can operate in future across larger regional areas and therefore support services in agriculture, mining, utilities, surveying, construction, and others. The paper first outlines an overall technical framework that has been proposed to transition the current RTK services to future larger scale coverage. The framework enables mixed use of different reference GNSS receiver types, dual- or triple-frequency, single or multiple systems, to provide RTK correction services to users equipped with any type of GNSS receiver. Next, data processing algorithms appropriate for triple-frequency GNSS signals are reviewed and some key performance benefits of using triple carrier signals for reliable RTK positioning over long distances are demonstrated. A server-based RTK software platform is being developed to allow for user positioning computations at server nodes instead of on the user's device. An optimal deployment scheme for reference stations across a larger-scale network has been suggested, given restrictions such as inter-station distances, candidates for reference locations, and operational modes. For instance, inter-station distances between triple-frequency receivers can be extended to 150 km, which doubles the distance between dual-frequency receivers in the existing RTK network designs.
Abstract:
The Queensland Coal Industry Employees Health Scheme was implemented in 1993 to provide health surveillance for all Queensland coal industry workers. The government, mining employers and mining unions agreed that the scheme should operate for seven years. At the expiry of the scheme, an assessment of the contribution of health surveillance to meet coal industry needs would be an essential part of determining a future health surveillance program. This research project has analysed the data made available between 1993 and 1998. All current coal industry employees have had at least one health assessment. The project examined how the centralised nature of the Health Scheme benefits industry by identifying key health issues and exploring their dimensions on a scale not possible by corporate based health surveillance programs. There is a body of evidence that indicates that health awareness, on the scale of the individual, the work group and the industry, is not a part of the mining industry culture. There is also growing evidence that there is a need for this culture to change and that some change is in progress. One element of this changing culture is a growth in the interest by the individual and the community in information on health status and benchmarks that are reasonably attainable. This interest opens the way for health education which contains personal, community and occupational elements. An important element of such education is the data on mine site health status. This project examined the role of health surveillance in the coal mining industry as a tool for generating the necessary information to promote an interest in health awareness. The Health Scheme Database provides the material for the bulk of the analysis of this project. After a preliminary scan of the data set, more detailed analysis was undertaken on key health and related safety issues that include respiratory disorders, hearing loss and high blood pressure. 
The data set facilitates control for confounding factors such as age and smoking status. Mines can be benchmarked to identify those mines with effective health management and those with particular challenges. While the study has confirmed the very low prevalence of restrictive airway disease such as pneumoconiosis, it has demonstrated a need to examine in detail the emergence of obstructive airway disease such as bronchitis and emphysema which may be a consequence of the increasing use of high dust longwall technology. The power of the Health Database's electronic data management is demonstrated by linking the health data to other data sets such as injury data that is collected by the Department of Mines and Energy. The analysis examines serious strain-sprain injuries and has identified a marked difference between the underground and open cut sectors of the industry. The analysis also considers productivity and OHS data to examine the extent to which there is correlation between any pairs of these and previously analysed health parameters. This project has demonstrated that the current structure of the Coal Industry Employees Health Scheme has largely delivered to mines an effective health screening process. At the same time, the centralised nature of data collection and analysis has provided to the mines, the unions and the government substantial statistical cross-sectional data upon which strategies to more effectively manage health and related safety issues can be based.
Abstract:
Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers. 
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
Abstract:
Background: Length of hospital stay (LOS) is a surrogate marker for patients' well-being during hospital treatment and is associated with health care costs. Identifying pretreatment factors associated with LOS in surgical patients may enable early intervention in order to reduce postoperative LOS. Methods: This cohort study enrolled 157 patients with suspected or proven gynecological cancer at a tertiary cancer centre (2004-2006). Before commencing treatment, the scored Patient-Generated Subjective Global Assessment (PG-SGA) measuring nutritional status and the Functional Assessment of Cancer Therapy-General (FACT-G) scale measuring quality of life (QOL) were completed. Clinical and demographic patient characteristics were prospectively obtained. Patients were grouped into those with prolonged LOS if their hospital stay was greater than the median LOS and those with average or below average LOS. Results: Patients' mean age was 58 years (SD 14 years). Preoperatively, 81 (52%) patients presented with suspected benign disease/pelvic mass, 23 (15%) with suspected advanced ovarian cancer, 36 (23%) with suspected endometrial cancer and 17 (11%) with cervical cancer. In univariate models prolonged LOS was associated with low serum albumin or hemoglobin, malnutrition (PG-SGA score and PG-SGA group B or C), low pretreatment FACT-G score, and suspected diagnosis of cancer. In multivariable models, PG-SGA group B or C, FACT-G score and suspected diagnosis of advanced ovarian cancer independently predicted LOS. Conclusions: Malnutrition, low quality of life scores and being diagnosed with advanced ovarian cancer are the major determinants of prolonged LOS amongst gynecological cancer patients. Interventions addressing malnutrition and poor QOL may decrease LOS in gynecological cancer patients.