112 resultados para statistical methods
Resumo:
Cancer poses an undeniable burden to the health and wellbeing of the Australian community. In a recent report commissioned by the Australian Institute for Health and Welfare(AIHW, 2010), one in every two Australians on average will be diagnosed with cancer by the age of 85, making cancer the second leading cause of death in 2007, preceded only by cardiovascular disease. Despite modest decreases in standardised combined cancer mortality over the past few decades, in part due to increased funding and access to screening programs, cancer remains a significant economic burden. In 2010, all cancers accounted for an estimated 19% of the country's total burden of disease, equating to approximately $3:8 billion in direct health system costs (Cancer Council Australia, 2011). Furthermore, there remains established socio-economic and other demographic inequalities in cancer incidence and survival, for example, by indigenous status and rurality. Therefore, in the interests of the nation's health and economic management, there is an immediate need to devise data-driven strategies to not only understand the socio-economic drivers of cancer but also facilitate the implementation of cost-effective resource allocation for cancer management...
Resumo:
This thesis proposes three novel models which extend the statistical methodology for motor unit number estimation, a clinical neurology technique. Motor unit number estimation is important in the treatment of degenerative muscular diseases and, potentially, spinal injury. Additionally, a recent and untested statistic to enable statistical model choice is found to be a practical alternative for larger datasets. The existing methods for dose finding in dual-agent clinical trials are found to be suitable only for designs of modest dimensions. The model choice case-study is the first of its kind containing interesting results using so-called unit information prior distributions.
Resumo:
"We thank MrGilder for his considered comments and suggestions for alternative analyses of our data. We also appreciate Mr Gilder’s support of our call for larger studies to contribute to the evidence base for preoperative loading with high-carbohydrate fluids..."
Resumo:
The research objectives of this thesis were to contribute to Bayesian statistical methodology by contributing to risk assessment statistical methodology, and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four dimensional dataset and to consider the timevarying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations. This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed, acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals, for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-invariables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead CAR models are used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model, the applied contribution the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals. These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term by term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first-stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.
Resumo:
Migraine is a painful disorder for which the etiology remains obscure. Diagnosis is largely based on International Headache Society criteria. However, no feature occurs in all patients who meet these criteria, and no single symptom is required for diagnosis. Consequently, this definition may not accurately reflect the phenotypic heterogeneity or genetic basis of the disorder. Such phenotypic uncertainty is typical for complex genetic disorders and has encouraged interest in multivariate statistical methods for classifying disease phenotypes. We applied three popular statistical phenotyping methods—latent class analysis, grade of membership and grade of membership “fuzzy” clustering (Fanny)—to migraine symptom data, and compared heritability and genome-wide linkage results obtained using each approach. Our results demonstrate that different methodologies produce different clustering structures and non-negligible differences in subsequent analyses. We therefore urge caution in the use of any single approach and suggest that multiple phenotyping methods be used.
Resumo:
Advances in symptom management strategies through a better understanding of cancer symptom clusters depend on the identification of symptom clusters that are valid and reliable. The purpose of this exploratory research was to investigate alternative analytical approaches to identify symptom clusters for patients with cancer, using readily accessible statistical methods, and to justify which methods of identification may be appropriate for this context. Three studies were undertaken: (1) a systematic review of the literature, to identify analytical methods commonly used for symptom cluster identification for cancer patients; (2) a secondary data analysis to identify symptom clusters and compare alternative methods, as a guide to best practice approaches in cross-sectional studies; and (3) a secondary data analysis to investigate the stability of symptom clusters over time. The systematic literature review identified, in 10 years prior to March 2007, 13 cross-sectional studies implementing multivariate methods to identify cancer related symptom clusters. The methods commonly used to group symptoms were exploratory factor analysis, hierarchical cluster analysis and principal components analysis. Common factor analysis methods were recommended as the best practice cross-sectional methods for cancer symptom cluster identification. A comparison of alternative common factor analysis methods was conducted, in a secondary analysis of a sample of 219 ambulatory cancer patients with mixed diagnoses, assessed within one month of commencing chemotherapy treatment. Principal axis factoring, unweighted least squares and image factor analysis identified five consistent symptom clusters, based on patient self-reported distress ratings of 42 physical symptoms. Extraction of an additional cluster was necessary when using alpha factor analysis to determine clinically relevant symptom clusters. The recommended approaches for symptom cluster identification using nonmultivariate normal data were: principal axis factoring or unweighted least squares for factor extraction, followed by oblique rotation; and use of the scree plot and Minimum Average Partial procedure to determine the number of factors. In contrast to other studies which typically interpret pattern coefficients alone, in these studies symptom clusters were determined on the basis of structure coefficients. This approach was adopted for the stability of the results as structure coefficients are correlations between factors and symptoms unaffected by the correlations between factors. Symptoms could be associated with multiple clusters as a foundation for investigating potential interventions. The stability of these five symptom clusters was investigated in separate common factor analyses, 6 and 12 months after chemotherapy commenced. Five qualitatively consistent symptom clusters were identified over time (Musculoskeletal-discomforts/lethargy, Oral-discomforts, Gastrointestinaldiscomforts, Vasomotor-symptoms, Gastrointestinal-toxicities), but at 12 months two additional clusters were determined (Lethargy and Gastrointestinal/digestive symptoms). Future studies should include physical, psychological, and cognitive symptoms. Further investigation of the identified symptom clusters is required for validation, to examine causality, and potentially to suggest interventions for symptom management. Future studies should use longitudinal analyses to investigate change in symptom clusters, the influence of patient related factors, and the impact on outcomes (e.g., daily functioning) over time.
Resumo:
Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Despite the fact that hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests commonly are incorrectly applied, misinterpreted, and ignored—by novices and expert researchers alike. On initial glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic to compare to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing is first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.
Resumo:
There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states—perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriate model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows a Bernoulli trial with unequal probability of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to “excess” zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed—and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not an underlying dual state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros