36 results for on-disk data layout
at University of Queensland eSpace - Australia
Abstract:
This paper reports a comparative study of Australian and New Zealand leadership attributes, based on the GLOBE (Global Leadership and Organizational Behavior Effectiveness) program. Responses from 344 Australian managers and 184 New Zealand managers in three industries were analyzed using exploratory and confirmatory factor analysis. Results supported some of the etic leadership dimensions identified in the GLOBE study, but also found some emic dimensions of leadership for each country. An interesting finding of the study was that the New Zealand data fitted the Australian model, but not vice versa, suggesting asymmetric perceptions of leadership in the two countries.
Abstract:
Background: Hospital performance reports based on administrative data should distinguish differences in quality of care between hospitals from case mix related variation and random error effects. A study was undertaken to determine which of 12 diagnosis-outcome indicators measured across all hospitals in one state had significant risk adjusted systematic (or special cause) variation (SV) suggesting differences in quality of care. For those that did, we determined whether SV persists within hospital peer groups, whether indicator results correlate at the individual hospital level, and how many adverse outcomes would be avoided if all hospitals achieved indicator values equal to the best performing 20% of hospitals. Methods: All patients admitted during a 12 month period to 180 acute care hospitals in Queensland, Australia with heart failure (n = 5745), acute myocardial infarction (AMI) (n = 3427), or stroke (n = 2955) were entered into the study. Outcomes comprised in-hospital deaths, long hospital stays, and 30 day readmissions. Regression models produced standardised, risk adjusted, diagnosis specific outcome event ratios for each hospital. Systematic and random variation in ratio distributions for each indicator were then apportioned using hierarchical statistical models. Results: Only five of 12 (42%) diagnosis-outcome indicators showed significant SV across all hospitals (long stays and same diagnosis readmissions for heart failure; in-hospital deaths and same diagnosis readmissions for AMI; and in-hospital deaths for stroke). Significant SV was only seen for two indicators within hospital peer groups (same diagnosis readmissions for heart failure in tertiary hospitals and in-hospital mortality for AMI in community hospitals). Only two pairs of indicators showed significant correlation. If all hospitals emulated the best performers, at least 20% of AMI and stroke deaths, heart failure long stays, and heart failure and AMI readmissions could be avoided. Conclusions: Diagnosis-outcome indicators based on administrative data require validation as markers of significant risk adjusted SV. Validated indicators allow quantification of realisable outcome benefits if all hospitals achieved best performer levels. The overall level of quality of care within single institutions cannot be inferred from the results of one or a few indicators.
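The paper's hierarchical apportioning of systematic versus random variance is beyond a short sketch, but the underlying idea can be illustrated with a classical overdispersion check on standardised ratios. In the Python sketch below, the observed/expected counts are invented and a chi-square heterogeneity test stands in for the hierarchical models:

```python
import numpy as np
from scipy import stats

# Synthetic observed (O) and risk-adjusted expected (E) event counts per
# hospital; in the study these come from the regression models.
O = np.array([12, 30, 7, 22, 15, 41, 9, 18])
E = np.array([10.2, 24.5, 8.1, 20.0, 16.3, 30.7, 9.5, 14.9])

ratio = O / E  # standardised outcome event ratio per hospital

# Under the null of no systematic variation, O_h ~ Poisson(E_h), and the
# heterogeneity statistic is approximately chi-square on H-1 degrees of
# freedom; a significant result flags special-cause variation beyond
# case mix and chance.
chi2 = np.sum((O - E) ** 2 / E)
dof = len(O) - 1
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {stats.chi2.sf(chi2, dof):.3f}")
```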
Abstract:
In the wake of findings from the Bundaberg Hospital and Forster inquiries in Queensland, periodic public release of hospital performance reports has been recommended. A process for developing and releasing such reports is being established by Queensland Health, overseen by an independent expert panel. This recommendation presupposes that public reports based on routinely collected administrative data are accurate; that the public can access, correctly interpret and act upon report contents; that reports motivate hospital clinicians and managers to improve quality of care; and that there are no unintended adverse effects of public reporting. Available research suggests that primary data sources are often inaccurate and incomplete, that reports have low predictive value in detecting outlier hospitals, and that users experience difficulty in accessing and interpreting reports and tend to distrust their findings.
Abstract:
In many online applications, we need to maintain quantile statistics for a sliding window on a data stream. Sliding windows in their natural form are defined over the most recent N data items. In this paper, we study the problem of estimating quantiles over other types of sliding windows. We present a uniform framework to process quantile queries for time-constrained and filter-based sliding windows. Our algorithm makes one pass over the data stream and maintains an ε-approximate summary. It uses O((1/ε²)·log²(εN)) space, where N is the number of data items in the window. We extend this framework to further process generalized constrained sliding window queries and prove that our technique is applicable to flexible window settings. Our performance study indicates that the space required in practice is much less than the given theoretical bound and that the algorithm supports high-speed data streams.
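To make the notion of an ε-approximate summary concrete, here is a toy Python sketch for a count-based window; it keeps the whole window and compresses to O(1/ε) order statistics only at query time, so it does not reproduce the paper's one-pass space bound:

```python
from collections import deque

class WindowQuantiles:
    """Toy epsilon-approximate quantile summary over the most recent
    N items; answering from the compressed summary gives rank error of
    at most about eps * N."""

    def __init__(self, n, eps):
        self.eps = eps
        self.window = deque(maxlen=n)  # count-based sliding window

    def insert(self, x):
        self.window.append(x)          # oldest item falls off automatically

    def quantile(self, phi):
        data = sorted(self.window)
        step = max(1, int(self.eps * len(data)))
        summary = data[::step] + [data[-1]]   # every (eps*N)-th order statistic
        i = min(round(phi * (len(data) - 1) / step), len(summary) - 1)
        return summary[i]

wq = WindowQuantiles(n=1000, eps=0.05)
for x in range(5000):
    wq.insert(x)
print(wq.quantile(0.5))   # approximate median of the last 1000 items
```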
Abstract:
One of the challenges in scientific visualization is to generate software libraries suitable for the large-scale data emerging from tera-scale simulations and instruments. We describe the efforts currently under way at SDSC and NPACI to address these challenges. The scope of the SDSC project spans data handling, graphics, visualization, and scientific application domains. Components of the research focus on the following areas: intelligent data storage, layout and handling, using associated “Floor-Plan” metadata; performance optimization on parallel architectures; extension of SDSC’s scalable, parallel, direct volume renderer to allow perspective viewing; and interactive rendering of fractional images (“imagelets”), which facilitates the examination of large datasets. These concepts are coordinated within a data-visualization pipeline, which operates on component data blocks sized to fit within the available computing resources. A key feature of the scheme is that the metadata which tag the data blocks can be propagated and applied consistently: at the disk level; in distributing the computations across parallel processors; in “imagelet” composition; and in feature tagging. The work reflects the emerging challenges and opportunities presented by the ongoing progress in high-performance computing (HPC) and the deployment of the data, computational, and visualization Grids.
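A minimal sketch of the block-plus-metadata idea described above (class and field names are illustrative, not SDSC's API):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DataBlock:
    """A component data block sized to fit available resources, tagged
    with floor-plan-style metadata (names here are invented)."""
    payload: bytes
    meta: dict = field(default_factory=dict)  # e.g. extent, layout, features

def stage(block: DataBlock, transform: Callable[[bytes], bytes]) -> DataBlock:
    # Each pipeline stage transforms the payload but propagates the tags
    # unchanged, so disk layout, parallel distribution, imagelet
    # composition and feature tagging can all consult the same metadata.
    return DataBlock(transform(block.payload), dict(block.meta))

blk = DataBlock(b"\x00" * 64, {"extent": (0, 0, 0, 4, 4, 4), "dtype": "f32"})
blk = stage(blk, lambda p: p)  # identity transform; metadata survives
print(blk.meta)
```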
Abstract:
A new isotherm is proposed here for adsorption of condensable vapors and gases on nonporous materials having type II isotherms according to the Brunauer-Deming-Deming-Teller (BDDT) classification. The isotherm combines the recent molecular-continuum model in the multilayer region with other widely used models for sub-monolayer coverage, some of which satisfy the requirement of a Henry's law asymptote. The model is successfully tested using isotherm data for nitrogen adsorption on nonporous silica, carbon and alumina, as well as benzene and hexane adsorption on nonporous carbon. Based on the data fits, out of several different alternative choices of model for the monolayer region, the Freundlich and the Unilan models are found to be the most successful when combined with the multilayer model to predict the whole isotherm. The hybrid model is consequently applicable over a wide pressure range. (C) 2000 Elsevier Science B.V. All rights reserved.
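For illustration, the Freundlich and Unilan monolayer models named above can be written down and fitted directly; the data below are synthetic, and the paper's molecular-continuum multilayer term is not included:

```python
import numpy as np
from scipy.optimize import curve_fit

def freundlich(P, K, n):
    # Freundlich sub-monolayer model: q = K * P**(1/n)
    return K * P ** (1.0 / n)

def unilan(P, qs, b, s):
    # Unilan model (uniform adsorption-energy distribution); note the
    # Henry's-law asymptote q ~ qs*b*sinh(s)/s * P as P -> 0.
    return qs / (2 * s) * np.log((1 + b * np.exp(s) * P)
                                 / (1 + b * np.exp(-s) * P))

# Synthetic sub-monolayer data for illustration only.
rng = np.random.default_rng(0)
P = np.linspace(0.01, 1.0, 30)
q = unilan(P, 1.0, 5.0, 2.0) + rng.normal(0, 0.005, P.size)

# Fit both monolayer candidates and compare residual sums of squares.
for model, p0 in [(freundlich, [1.0, 2.0]), (unilan, [1.0, 1.0, 1.0])]:
    popt, _ = curve_fit(model, P, q, p0=p0)
    sse = np.sum((model(P, *popt) - q) ** 2)
    print(f"{model.__name__}: params = {popt}, SSE = {sse:.5f}")
```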
Abstract:
Performance indicators in the public sector have often been criticised for being inadequate and not conducive to analysing efficiency. The main objective of this study is to use data envelopment analysis (DEA) to examine the relative efficiency of Australian universities. Three performance models are developed, namely, overall performance, performance on delivery of educational services, and performance on fee-paying enrolments. The findings based on 1995 data show that the university sector was performing well on technical and scale efficiency but there was room for improving performance on fee-paying enrolments. There were also small slacks in input utilisation. More universities were operating at decreasing returns to scale, indicating a potential to downsize. DEA helps in identifying the reference sets for inefficient institutions and objectively determines productivity improvements. As such, it can be a valuable benchmarking tool for educational administrators and assist in more efficient allocation of scarce resources. In the absence of market mechanisms to price educational outputs, which renders traditional production or cost functions inappropriate, universities are particularly obliged to seek alternative efficiency analysis methods such as DEA.
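A minimal sketch of an input-oriented, constant-returns-to-scale (CCR) DEA model solved as a linear program; the university data below are made up, and the paper's three specific performance models are not reproduced:

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_efficiency(X, Y, j0):
    """Input-oriented CCR efficiency of unit j0.
    X: (m_inputs, n_units), Y: (s_outputs, n_units)."""
    m, n = X.shape
    s, _ = Y.shape
    # Decision variables: [theta, lambda_1, ..., lambda_n]; minimise theta.
    c = np.r_[1.0, np.zeros(n)]
    # Input constraints:  X @ lam - theta * x_j0 <= 0
    A_in = np.hstack([-X[:, [j0]], X])
    b_in = np.zeros(m)
    # Output constraints: Y @ lam >= y_j0, written as -Y @ lam <= -y_j0
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    b_out = -Y[:, j0]
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[b_in, b_out],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun  # theta in (0, 1]; 1 means technically efficient

# Toy example: 4 universities, 2 inputs (staff, funds), 2 outputs
# (graduates, fee-paying enrolments).  Numbers are invented.
X = np.array([[10, 12, 8, 15], [5, 7, 4, 9]], float)
Y = np.array([[90, 100, 70, 120], [20, 35, 15, 30]], float)
for j in range(4):
    print(f"unit {j}: efficiency = {dea_ccr_efficiency(X, Y, j):.3f}")
```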
Abstract:
The prognostic significance of positive peritoneal cytology in endometrial carcinoma has led to the incorporation of peritoneal cytology into the current FIGO staging system. While cytology was shown to be prognostically relevant in patients with stage II and III disease, conflicting data exist about its significance in patients who would have been stage I but were classified as stage III solely on the basis of positive peritoneal cytology (clinical stage I). Analysis was based on the data of 369 consecutive patients with clinical stage I endometrioid adenocarcinoma of the endometrium. Standard treatment consisted of a total abdominal hysterectomy and bilateral salpingo-oophorectomy, with or without pelvic lymph node dissection. Peritoneal cytology was obtained at laparotomy by peritoneal washing of the pouch of Douglas and was considered positive if malignant cells could be detected, regardless of the number of malignant cells present. Disease-free survival (DFS) was considered the primary statistical endpoint. In 13/369 (3.5%) patients, positive peritoneal cytology was found. The median follow-up was 29 months and 15 recurrences occurred. Peritoneal cytology was independent of the depth of myometrial invasion and the grade of tumour differentiation. Patients with negative washings had a DFS of 96% at 36 months compared with 67% for patients with positive washings (log-rank P < 0.001). The presence of positive peritoneal cytology in patients with clinical stage I endometrioid adenocarcinoma of the endometrium is considered an adverse prognostic factor. (C) 2001 Elsevier Science Ireland Ltd. All rights reserved.
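The reported DFS comparison is a standard Kaplan-Meier/log-rank analysis; here is a sketch using the lifelines library, with synthetic survival times loosely matched to the reported 36-month figures (the real patient data are not available here):

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

# Synthetic disease-free survival times (months), roughly tuned to the
# abstract's 36-month DFS of 96% (negative) vs 67% (positive washings).
rng = np.random.default_rng(42)
t_neg = rng.exponential(900, 356)   # negative washings
t_pos = rng.exponential(90, 13)     # positive washings
d_neg, e_neg = np.minimum(t_neg, 36), t_neg < 36   # censor at 36 months
d_pos, e_pos = np.minimum(t_pos, 36), t_pos < 36

kmf = KaplanMeierFitter()
kmf.fit(d_neg, e_neg, label="cytology negative")
print(kmf.survival_function_at_times(36))

# Log-rank comparison of the two DFS curves, as in the abstract.
res = logrank_test(d_neg, d_pos, event_observed_A=e_neg, event_observed_B=e_pos)
print(f"log-rank p = {res.p_value:.4f}")
```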
Abstract:
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44(2), 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can be computationally inefficient. To reduce the computational cost, a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
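A univariate sketch of binned-data EM conveys the construction (the multivariate generalization replaces the CDF differences below with integrals over multidimensional bins); data and starting values are synthetic:

```python
import numpy as np
from scipy.stats import norm, truncnorm

def em_binned_gmm(edges, counts, mu, sigma, pi, n_iter=200):
    """EM for a two-component univariate Gaussian mixture fitted to
    binned counts, in the spirit of the McLachlan-Jones construction."""
    for _ in range(n_iter):
        # E-step: probability mass of each bin under each component.
        p = np.array([norm.cdf(edges[1:], m, s) - norm.cdf(edges[:-1], m, s)
                      for m, s in zip(mu, sigma)])           # (2, n_bins)
        tau = pi[:, None] * p
        tau /= tau.sum(axis=0, keepdims=True)                # responsibilities
        w = tau * counts                                     # expected counts
        # M-step: within-bin first and second moments come from the
        # truncated normal distribution over each bin.
        for k in range(2):
            a = (edges[:-1] - mu[k]) / sigma[k]
            b = (edges[1:] - mu[k]) / sigma[k]
            m1 = truncnorm.mean(a, b, loc=mu[k], scale=sigma[k])
            v = truncnorm.var(a, b, loc=mu[k], scale=sigma[k])
            wk = w[k] / w[k].sum()
            mu[k] = np.sum(wk * m1)
            sigma[k] = np.sqrt(np.sum(wk * (v + (m1 - mu[k]) ** 2)))
        pi = w.sum(axis=1) / w.sum()
    return mu, sigma, pi

# Bin a synthetic two-component sample and recover the parameters.
rng = np.random.default_rng(1)
data = np.r_[rng.normal(0, 1, 500), rng.normal(4, 1, 500)]
edges = np.linspace(-4, 8, 25)
counts, _ = np.histogram(data, edges)
print(em_binned_gmm(edges, counts,
                    mu=np.array([-1.0, 5.0]),
                    sigma=np.array([1.0, 1.0]),
                    pi=np.array([0.5, 0.5])))
```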
Abstract:
This paper proposes a novel application of fuzzy logic to web data mining for two basic problems of a website: popularity and satisfaction. Popularity means that people will visit the website, while satisfaction refers to the usefulness of the site. We will illustrate that the popularity of a website is a fuzzy logic problem; it is an important characteristic for a website's survival in Internet commerce. The satisfaction of a website is also a fuzzy logic problem, representing the degree of success in the application of information technology to the business. We propose a fuzzy logic framework for representing these two problems, using web data mining techniques to fuzzify the attributes of a website.
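A minimal sketch of fuzzifying one website attribute into the linguistic variable "popularity"; the membership breakpoints are invented for illustration, not taken from the paper:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function: 0 outside [a, d], 1 on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Hypothetical fuzzification of daily visits into 'popularity' grades.
def popularity(visits_per_day):
    return {
        "unpopular":    trapezoid(visits_per_day, -1, 0, 50, 200),
        "popular":      trapezoid(visits_per_day, 50, 200, 1000, 5000),
        "very popular": trapezoid(visits_per_day, 1000, 5000, 1e12, 2e12),
    }

print(popularity(120))  # partial membership in 'unpopular' and 'popular'
```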
Abstract:
We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expressions for possibly several thousands of genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question of the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature. (C) 2004 Elsevier Inc. All rights reserved.
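The resampling test reads, in outline, like the following sketch, using scikit-learn's GaussianMixture as a stand-in for the authors' normal mixture fitting:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_lrt(X, g, n_boot=99, seed=0):
    """Test g vs g+1 mixture components: the null distribution of
    -2 log(lambda) is assessed by parametric bootstrap resampling."""
    fit_g  = GaussianMixture(n_components=g, random_state=seed).fit(X)
    fit_g1 = GaussianMixture(n_components=g + 1, random_state=seed).fit(X)
    # Observed statistic (score() is the mean log-likelihood per sample).
    lrt_obs = 2 * len(X) * (fit_g1.score(X) - fit_g.score(X))
    null = []
    for b in range(n_boot):
        Xb, _ = fit_g.sample(len(X))          # simulate under the null
        f0 = GaussianMixture(n_components=g, random_state=b).fit(Xb)
        f1 = GaussianMixture(n_components=g + 1, random_state=b).fit(Xb)
        null.append(2 * len(Xb) * (f1.score(Xb) - f0.score(Xb)))
    p = (1 + sum(s >= lrt_obs for s in null)) / (n_boot + 1)
    return lrt_obs, p

# Toy two-cluster data: the test should reject g=1 in favour of g=2.
rng = np.random.RandomState(1)
X = np.vstack([rng.normal(0, 1, (60, 3)), rng.normal(3, 1, (60, 3))])
stat, p = bootstrap_lrt(X, g=1, n_boot=19)
print(f"-2 log lambda = {stat:.1f}, bootstrap p = {p:.2f}")
```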
Abstract:
Although smoking is widely recognized as a major cause of cancer, there is little information on how it contributes to the global and regional burden of cancers in combination with other risk factors that affect background cancer mortality patterns. We used data from the American Cancer Society's Cancer Prevention Study II (CPS-II) and the WHO and IARC cancer mortality databases to estimate deaths from 8 clusters of site-specific cancers caused by smoking, for 14 epidemiologic subregions of the world, by age and sex. We used lung cancer mortality as an indirect marker for accumulated smoking hazard. CPS-II hazards were adjusted for important covariates. In the year 2000, an estimated 1.42 (95% CI 1.27-1.57) million cancer deaths in the world, 21% of total global cancer deaths, were caused by smoking. Of these, 1.18 million deaths were among men and 0.24 million among women; 625,000 (95% CI 485,000-749,000) smoking-caused cancer deaths occurred in the developing world and 794,000 (95% CI 749,000-840,000) in industrialized regions. Lung cancer accounted for 60% of smoking-attributable cancer mortality, followed by cancers of the upper aerodigestive tract (20%). Based on available data, more than one in every five cancer deaths in the world in the year 2000 was caused by smoking, making it possibly the single largest preventable cause of cancer mortality. There was significant variability across regions in the role of smoking as a cause of the different site-specific cancers. This variability illustrates the importance of coupling research and surveillance of smoking with that for other risk factors for more effective cancer prevention. (C) 2005 Wiley-Liss, Inc.
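For orientation, the textbook population-attributable-fraction arithmetic looks like the snippet below; the study's indirect lung-cancer-marker method is more elaborate, and the numbers here are hypothetical:

```python
# Illustrative population attributable fraction (PAF) calculation; this
# is the standard formula, not the study's indirect method, and the
# prevalence, relative risk and death count below are made up.
p  = 0.30   # smoking prevalence in the population
rr = 10.0   # relative risk of the site-specific cancer for smokers

paf = p * (rr - 1) / (p * (rr - 1) + 1)        # ~0.73 for these inputs
deaths_total = 100_000
deaths_attributable = paf * deaths_total
print(f"PAF = {paf:.2f}, attributable deaths = {deaths_attributable:,.0f}")
```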
Abstract:
Electricity market price forecasting is a challenging yet very important task for electricity market managers and participants. Due to the complexity and uncertainties in the power grid, electricity prices are highly volatile and normally carry spikes, which may be tens or even hundreds of times higher than the normal price. Such electricity spikes are very difficult to predict. So far, most of the research on electricity price forecasting is based on normal-range electricity prices. This paper proposes a data mining based electricity price forecast framework, which can predict the normal price as well as the price spikes. The normal price can be predicted by a previously proposed wavelet and neural network based forecast model, while the spikes are forecasted based on a data mining approach. This paper focuses on the spike prediction and explores the reasons for price spikes based on the measurement of a proposed composite supply-demand balance index (SDI) and relative demand index (RDI). These indices are able to reflect the relationship among electricity demand, electricity supply and electricity reserve capacity. The proposed model is based on a mining database including market clearing price, trading hour, electricity demand, electricity supply and reserve. Bayesian classification and similarity searching techniques are used to mine the database to find the internal relationships between electricity price spikes and these proposed indices. The mining results are used to form the price spike forecast model. The proposed model is able to generate the forecasted price spikes, their levels, and associated forecast confidence levels. The model is tested with Queensland electricity market data with promising results. Crown Copyright (C) 2004 Published by Elsevier B.V. All rights reserved.
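In outline, the Bayesian-classification step could look like the following sketch; the SDI/RDI values and spike labels are simulated placeholders, not the paper's definitions, and Gaussian naive Bayes stands in for the authors' classifier:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Simulated mining database: two indices plus the trading hour.
rng = np.random.default_rng(0)
n = 2000
sdi = rng.uniform(0, 1, n)            # composite supply-demand balance index
rdi = rng.uniform(0, 1, n)            # relative demand index
hour = rng.integers(0, 24, n)
# Synthetic ground truth: spikes cluster at tight supply and high demand.
spike = ((sdi > 0.85) & (rdi > 0.7) & (rng.uniform(size=n) > 0.3)).astype(int)

X = np.column_stack([sdi, rdi, hour])
clf = GaussianNB().fit(X[:1500], spike[:1500])
proba = clf.predict_proba(X[1500:])[:, 1]   # forecast confidence level
print("predicted spikes:", (proba > 0.5).sum(), "of", 500)
```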