140 results for methods: statistical
in University of Queensland eSpace - Australia
Abstract:
Background & Aims: An elevated transferrin saturation is the earliest phenotypic abnormality in hereditary hemochromatosis. Determination of transferrin saturation remains the most useful noninvasive screening test for affected individuals, but there is debate as to the appropriate screening level. The aims of this study were to estimate the mean transferrin saturation in hemochromatosis heterozygotes and normal individuals and to evaluate potential transferrin saturation screening levels. Methods: Statistical mixture modeling was applied to data from a survey of asymptomatic Australians to estimate the mean transferrin saturation in hemochromatosis heterozygotes and normal individuals. To evaluate potential transferrin saturation screening levels, modeling results were compared with data from identified hemochromatosis heterozygotes and homozygotes. Results: After removal of hemochromatosis homozygotes, two populations of transferrin saturation were identified in asymptomatic Australians (P < 0.01). In men, 88.2% of the truncated sample had a lower mean transferrin saturation of 24.1%, whereas 11.8% had an increased mean transferrin saturation of 37.3%. Similar results were found in women. A transferrin saturation threshold of 45% identified 98% of homozygotes without misidentifying any normal individuals. Conclusions: The results confirm that hemochromatosis heterozygotes form a distinct transferrin saturation subpopulation and support the use of transferrin saturation as an inexpensive screening test for hemochromatosis. In practice, a fasting transferrin saturation of greater than or equal to 45% identifies virtually all affected homozygous subjects without necessitating further investigation of unaffected normal individuals.
Abstract:
Background: Estimates of the performance of carbohydrate deficient transferrin (CDT) and gamma glutamyltransferase (GGT) as markers of alcohol consumption have varied widely. Studies have differed in design and subject characteristics. The WHO/ISBRA Collaborative Study allows assessment and comparison of CDT, GGT, and aspartate aminotransferase (AST) as markers of drinking in a large, well-characterized, multicenter sample. Methods: A total of 1863 subjects were recruited from five countries (Australia, Brazil, Canada, Finland, and Japan). Recruitment was stratified by alcohol use, age, and sex. Demographic characteristics, alcohol consumption, and presence of ICD-10 dependence were recorded using an interview schedule based on the AUDADIS. CDT was assayed using CDTect(TM), and GGT and AST by standard methods. Statistical techniques included receiver operating characteristic (ROC) analysis. Multiple regression was used to measure the impact of factors other than alcohol on test performance. Results: CDT and GGT had comparable performance on ROC analysis, with AST performing slightly less well. CDT was a slightly but significantly better marker of high-risk consumption in men. All were more effective for detection of high-risk rather than intermediate-risk drinking. CDT and GGT levels were influenced by body mass index, sex, age, and smoking status. Conclusions: CDT was little better than GGT in detecting high- or intermediate-risk alcohol consumption in this large, multicenter, predominantly community-based sample. As the two tests are relatively independent of each other, their combination is likely to provide better performance than either test alone. Test interpretation should take account of sex, age, and body mass index.
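As a rough illustration of the ROC analysis mentioned in this abstract, the sketch below computes the area under the ROC curve as the probability that a randomly chosen high-risk drinker has a higher marker value than a randomly chosen low-risk drinker, plus sensitivity and specificity at one cutoff. The marker values, the cutoff, and the function names are hypothetical and are not taken from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(scores_pos, scores_neg, cutoff):
    """Classify a score as positive when it is >= cutoff."""
    sens = sum(s >= cutoff for s in scores_pos) / len(scores_pos)
    spec = sum(s < cutoff for s in scores_neg) / len(scores_neg)
    return sens, spec


# Hypothetical marker values (arbitrary units), for illustration only.
high_risk = [30, 45, 22, 60, 18]
low_risk = [12, 20, 15, 25, 10]

auc = roc_auc(high_risk, low_risk)
sens, spec = sensitivity_specificity(high_risk, low_risk, cutoff=21)
print(auc, sens, spec)
```

Comparing two markers on the same subjects would mean computing this area once per marker. Testing whether the difference is significant needs further machinery that is not shown here.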
Abstract:
The H I Parkes All Sky Survey (HIPASS) is a blind extragalactic H I 21-cm emission-line survey covering the whole southern sky from declination -90° to +25°. The HIPASS catalogue (HICAT), containing 4315 H I-selected galaxies from the region south of declination +2°, is presented in Meyer et al. (Paper I). This paper describes in detail the completeness and reliability of HICAT, which are calculated from the recovery rate of synthetic sources and follow-up observations, respectively. HICAT is found to be 99 per cent complete at a peak flux of 84 mJy and an integrated flux of 9.4 Jy km s^-1. The overall reliability is 95 per cent, but rises to 99 per cent for sources with peak fluxes >58 mJy or integrated flux >8.2 Jy km s^-1. Expressions are derived for the uncertainties on the most important HICAT parameters: peak flux, integrated flux, velocity width and recessional velocity. The errors on HICAT parameters are dominated by the noise in the HIPASS data, rather than by the parametrization procedure.
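The idea of measuring completeness from the recovery rate of synthetic sources can be sketched as a simple binned fraction. The code below is a minimal illustration under assumed inputs: the injected fluxes, the recovery flags, and the bin edges are invented, and the actual HICAT procedure is not described in enough detail here to reproduce.

```python
def completeness_by_bin(injected_flux, recovered, bin_edges):
    """Fraction of injected synthetic sources recovered in each flux bin.

    Bins are half-open: bin_edges[i] <= flux < bin_edges[i + 1].
    Returns None for a bin that received no injected sources.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    hits = [0] * n_bins
    for flux, found in zip(injected_flux, recovered):
        for i in range(n_bins):
            if bin_edges[i] <= flux < bin_edges[i + 1]:
                counts[i] += 1
                hits[i] += int(found)
                break
    return [h / c if c else None for h, c in zip(hits, counts)]


# Hypothetical peak fluxes (mJy) of injected sources and whether each was found.
fluxes = [20, 25, 40, 45, 50, 90, 95, 100]
found = [False, True, True, False, True, True, True, True]
print(completeness_by_bin(fluxes, found, bin_edges=[0, 30, 60, 120]))
```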
Abstract:
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local false discovery rate is provided for each gene, and it can be implemented so that the implied global false discovery rate is bounded as with the Benjamini-Hochberg methodology based on tail areas. The latter procedure is too conservative, unless it is modified according to the prior probability that a gene is not differentially expressed. An attractive feature of the mixture model approach is that it provides a framework for the estimation of this probability and its subsequent use in forming a decision rule. The rule can also be formed to take the false negative rate into account.
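For reference, the tail-area Benjamini-Hochberg step-up rule that this abstract compares against can be sketched in a few lines. The optional `pi0` argument rescales the thresholds by an assumed prior probability that a gene is not differentially expressed. That rescaling is one common way to make the procedure less conservative and is an assumption here, not necessarily the authors' modification. The p-values are invented, and the mixture-model local false discovery rate itself is not implemented.

```python
def benjamini_hochberg(p_values, q=0.05, pi0=1.0):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= k * q / (m * pi0)
    and reject the k smallest p-values. pi0 = 1.0 gives the standard rule.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * pi0):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p))           # standard rule
print(benjamini_hochberg(p, pi0=0.5))  # thresholds relaxed by an assumed pi0
```

With these made-up values the standard rule rejects two hypotheses, while the relaxed thresholds reject seven. That is the sense in which the unmodified procedure is conservative when many genes are truly differentially expressed.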
Abstract:
We present a new algorithm for detecting intercluster galaxy filaments based upon the assumption that the orientations of constituent galaxies along such filaments are non-isotropic. We apply the algorithm to the 2dF Galaxy Redshift Survey catalogue and find that it readily detects many straight filaments between close cluster pairs. At large intercluster separations (> 15 h^-1 Mpc), we find that the detection efficiency falls quickly, as it also does with more complex filament morphologies. We explore the underlying assumptions and suggest that it is only in the case of close cluster pairs that we can expect galaxy orientations to be significantly correlated with filament direction.
Abstract:
We consider the statistical problem of catalogue matching from a machine learning perspective with the goal of producing probabilistic outputs, and using all available information. A framework is provided that unifies two existing approaches to producing probabilistic outputs in the literature, one based on combining distribution estimates and the other based on combining probabilistic classifiers. We apply both of these to the problem of matching the HI Parkes All Sky Survey radio catalogue with large positional uncertainties to the much denser SuperCOSMOS catalogue with much smaller positional uncertainties. We demonstrate the utility of probabilistic outputs by a controllable completeness and efficiency trade-off and by identifying objects that have high probability of being rare. Finally, possible biasing effects in the output of these classifiers are also highlighted and discussed.
Abstract:
OBJECTIVE: To describe variation in all cause and selected cause-specific mortality rates across Australia. METHODS: Mortality and population data for 1997 were obtained from the Australian Bureau of Statistics. All cause and selected cause-specific mortality rates were calculated and directly standardised to the 1997 Australian population in 5-year age groups. Selected major causes of death included cancer, coronary artery disease, cerebrovascular disease, diabetes, accidents and suicide. Rates are reported by statistical division, and State and Territory. RESULTS: All cause age-standardised mortality was 6.98 per 1000 in 1997 and this varied 2-fold from a low in the statistical division of Pilbara, Western Australia (5.78, 95% confidence interval 5.06-6.56), to a high in Northern Territory-excluding Darwin (11.30, 10.67-11.98). Similar mortality variation (all p<0.0001) exists for cancer (1.01-2.23 per 1000) and coronary artery disease (0.99-2.23 per 1000), the two biggest killers. Larger variation (all p<0.0001) exists for cerebrovascular disease (0.7-11.8 per 10,000), diabetes (0.7-6.9 per 10,000), accidents (1.7-7.2 per 10,000) and suicide (0.6-3.8 per 10,000). Less marked variation was observed when analysed by State and Territory, but Northern Territory consistently has the highest age-standardised mortality rates. CONCLUSIONS: Analysed by statistical division, substantial mortality gradients exist across Australia, suggesting an inequitable distribution of the determinants of health. Further research is required to better understand this heterogeneity.
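Direct age standardisation, as used in this abstract, weights each age-specific rate by the share of that age group in a standard population. The sketch below shows the arithmetic with three made-up age groups for brevity (the study used 5-year groups). All counts are hypothetical.

```python
def directly_standardised_rate(deaths, population, standard_population, per=1000):
    """Directly age-standardised rate.

    deaths[i] / population[i] is the age-specific rate in the study region;
    standard_population[i] supplies the weight for age group i.
    """
    total_std = sum(standard_population)
    rate = 0.0
    for d, n, w in zip(deaths, population, standard_population):
        rate += (d / n) * (w / total_std)
    return rate * per


# Hypothetical region with three age groups (young, middle, old).
deaths = [10, 50, 200]
population = [10000, 8000, 4000]
standard = [50000, 30000, 20000]  # hypothetical standard population

crude = sum(deaths) / sum(population) * 1000
standardised = directly_standardised_rate(deaths, population, standard)
print(round(crude, 3), round(standardised, 3))
```

The crude and standardised rates differ whenever the region's age structure differs from the standard, which is why regional comparisons are made on the standardised figure.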
Abstract:
The monitoring of infection control indicators including hospital-acquired infections is an established part of quality maintenance programmes in many health-care facilities. However, surveillance data use can be frustrated by the infrequent nature of many infections. Traditional methods of analysis often provide delayed identification of increasing infection occurrence, placing patients at preventable risk. The application of Shewhart, Cumulative Sum (CUSUM) and Exponentially Weighted Moving Average (EWMA) statistical process control charts to the monitoring of indicator infections allows continuous real-time assessment. The Shewhart chart will detect large changes, while CUSUM and EWMA methods are more suited to recognition of small to moderate sustained change. When used together, Shewhart and EWMA methods are ideal for monitoring bacteraemia and multiresistant organism rates. Shewhart and CUSUM charts are suitable for surgical infection surveillance.
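The two charts described here as suited to small or moderate sustained change can be sketched as simple recursions. This is a generic textbook form, not the specific charts from the paper: the smoothing weight `lam`, the target rate, the allowance `k`, and the monthly counts are all assumed values, and control limits are omitted.

```python
def ewma(values, lam=0.2, start=None):
    """Exponentially weighted moving average: z_t = lam * x_t + (1 - lam) * z_(t-1)."""
    z = values[0] if start is None else start
    out = []
    for x in values:
        z = lam * x + (1 - lam) * z
        out.append(z)
    return out


def cusum_upper(values, target, k):
    """One-sided upper CUSUM: s_t = max(0, s_(t-1) + x_t - target - k)."""
    s = 0.0
    out = []
    for x in values:
        s = max(0.0, s + (x - target - k))
        out.append(s)
    return out


# Hypothetical monthly infection counts with a sustained rise from month 4.
counts = [2, 3, 2, 4, 5, 6, 5]
print(ewma(counts, lam=0.2, start=3))
print(cusum_upper(counts, target=3, k=0.5))
```

In this made-up series the CUSUM statistic stays at zero while counts sit near the target and then climbs steadily once the rise persists. A signal would be raised when the statistic crosses a decision limit chosen for the application.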
Abstract:
This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long-standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two-step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models is promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.
Abstract:
The effect of number of samples and selection of data for analysis on the calculation of surface motor unit potential (SMUP) size in the statistical method of motor unit number estimates (MUNE) was determined in 10 normal subjects and 10 with amyotrophic lateral sclerosis (ALS). We recorded 500 sequential compound muscle action potentials (CMAPs) at three different stable stimulus intensities (10–50% of maximal CMAP). Estimated mean SMUP sizes were calculated using Poisson statistical assumptions from the variance of 500 sequential CMAP obtained at each stimulus intensity. The results with the 500 data points were compared with smaller subsets from the same data set. The results using a range of 50–80% of the 500 data points were compared with the full 500. The effect of restricting analysis to data between 5–20% of the CMAP and to standard deviation limits was also assessed. No differences in mean SMUP size were found with stimulus intensity or use of different ranges of data. Consistency was improved with a greater sample number. Data within 5% of CMAP size gave both increased consistency and reduced mean SMUP size in many subjects, but excluded valid responses present at that stimulus intensity. These changes were more prominent in ALS patients in whom the presence of isolated SMUP responses was a striking difference from normal subjects. Noise, spurious data, and large SMUP limited the Poisson assumptions. When these factors are considered, consistent statistical MUNE can be calculated from a continuous sequence of data points. A 2 to 2.5 SD or 10% window are reasonable methods of limiting data for analysis. Muscle Nerve 27: 320–331, 2003
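The Poisson assumption mentioned in this abstract rests on a simple identity: if each response is a unit size s times a Poisson-distributed count of active units, then the mean is s·λ and the variance is s²·λ, so s can be estimated as variance divided by mean. The sketch below shows only that identity. The `baseline` argument, the choice of population variance, and the response values are assumptions, and the windowing and data-selection rules studied in the paper are not implemented.

```python
from statistics import mean, pvariance


def smup_size_poisson(cmap_sizes, baseline=0.0):
    """Estimate the mean SMUP size as variance / mean of the responses.

    `baseline` (hypothetical) is subtracted first, e.g. the smallest
    response in the analysis window; the default leaves values unchanged.
    """
    shifted = [size - baseline for size in cmap_sizes]
    return pvariance(shifted) / mean(shifted)


def mune(max_cmap, smup_size):
    """Motor unit number estimate: maximal CMAP divided by mean SMUP size."""
    return max_cmap / smup_size


# Hypothetical submaximal responses (microvolts) at one stimulus intensity.
responses_uv = [200, 400, 300, 500, 100, 300]
smup = smup_size_poisson(responses_uv)
print(round(smup, 2), round(mune(5000, smup), 1))
```

A real recording would use several hundred responses, as in the study. Six values are used here only to keep the arithmetic easy to follow.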
Abstract:
There has been a resurgence of interest in the mean trace length estimator of Pahl for window sampling of traces. The estimator has been dealt with by Mauldon and Zhang and Einstein in recent publications. The estimator is a very useful one in that it is non-parametric. However, despite some discussion regarding the statistical distribution of the estimator, none of the recent works or the original work by Pahl provide a rigorous basis for the determination of a confidence interval for the estimator or a confidence region for the estimator and the corresponding estimator of trace spatial intensity in the sampling window. This paper shows, by consideration of a simplified version of the problem but without loss of generality, that the estimator is in fact the maximum likelihood estimator (MLE) and that it can be considered essentially unbiased. As the MLE, it possesses the least variance of all estimators and confidence intervals or regions should therefore be available through application of classical ML theory. It is shown that valid confidence intervals can in fact be determined. The results of the work and the calculations of the confidence intervals are illustrated by example. (C) 2003 Elsevier Science Ltd. All rights reserved.
Abstract:
An important aspect in manufacturing design is the distribution of geometrical tolerances so that an assembly functions with given probability, while minimising the manufacturing cost. This requires a complex search over a multidimensional domain, much of which leads to infeasible solutions and which can have many local minima. As well, Monte-Carlo methods are often required to determine the probability that the assembly functions as designed. This paper describes a genetic algorithm for carrying out this search and successfully applies it to two specific mechanical designs, enabling comparisons of a new statistical tolerancing design method with existing methods. (C) 2003 Elsevier Ltd. All rights reserved.
Abstract:
This study has three main objectives. First, it develops a generalization of the commonly used EKS method to multilateral price comparisons. It is shown that the EKS system can be generalized so that weights can be attached to each of the link comparisons used in the EKS computations. These weights can account for differing levels of reliability of the underlying binary comparisons. Second, various reliability measures and corresponding weighting schemes are presented and their merits discussed. Finally, these new methods are applied to an international data set of manufacturing prices from the ICOP project. Although theoretically superior, it appears that the empirical impact of the weighted EKS method is generally small compared to the unweighted EKS. It is also found that this impact is larger when it is applied at lower levels of aggregation. Finally, the importance of using sector specific PPPs in assessing relative levels of manufacturing productivity is indicated.
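For orientation, the unweighted EKS method that this study generalizes can be sketched as a geometric mean of all indirect comparisons through third countries. The code below shows only that standard unweighted form. The weighted generalization is not specified in the abstract and is not attempted. The matrix of binary price indexes is invented and deliberately intransitive.

```python
import math


def eks(binary):
    """Unweighted EKS indexes from a matrix of binary price indexes.

    binary[j][k] is the binary index of country k relative to country j,
    with binary[j][j] == 1 and binary[k][j] == 1 / binary[j][k].
    EKS[j][k] is the geometric mean over all link countries l of
    binary[j][l] * binary[l][k].
    """
    m = len(binary)
    result = [[1.0] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            log_sum = sum(math.log(binary[j][l] * binary[l][k]) for l in range(m))
            result[j][k] = math.exp(log_sum / m)
    return result


# Hypothetical binary indexes for three countries; note 2 * 2 != 5,
# so the direct comparisons are not transitive.
binary = [
    [1.0, 2.0, 5.0],
    [0.5, 1.0, 2.0],
    [0.2, 0.5, 1.0],
]
e = eks(binary)
print(e[0][1] * e[1][2], e[0][2])  # equal up to rounding after the EKS adjustment
```

Attaching a weight to each link comparison, as the study proposes, would replace the equal 1/m exponents with reliability-based ones.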
Abstract:
Genetic assignment methods use genotype likelihoods to draw inference about where individuals were or were not born, potentially allowing direct, real-time estimates of dispersal. We used simulated data sets to test the power and accuracy of Monte Carlo resampling methods in generating statistical thresholds for identifying F0 immigrants in populations with ongoing gene flow, and hence for providing direct, real-time estimates of migration rates. The identification of accurate critical values required that resampling methods preserved the linkage disequilibrium deriving from recent generations of immigrants and reflected the sampling variance present in the data set being analysed. A novel Monte Carlo resampling method taking into account these aspects was proposed and its efficiency was evaluated. Power and error were relatively insensitive to the frequency assumed for missing alleles. Power to identify F0 immigrants was improved by using large sample sizes (up to about 50 individuals) and by sampling all populations from which migrants may have originated. A combination of plotting genotype likelihoods and calculating mean genotype likelihood ratios (D_LR) appeared to be an effective way to predict whether F0 immigrants could be identified for a particular pair of populations using a given set of markers.