905 results for METHODS: STATISTICAL
Abstract:
Objectives. The purpose of this study was to elucidate behavioral determinants (prevailing attitudes and beliefs) of hand hygiene practices among undergraduate dental students in a dental school. Methods. Statistical modeling based on Integrative Behavioral Model (IBM) prediction was used to develop a questionnaire for evaluating behavioral perceptions of hand hygiene practices by dental school students. Self-report questionnaires were given to second, third and fourth year undergraduate dental students. Models representing two distinct hand hygiene practices, termed "elective in-dental school hand hygiene practice" and "inherent in-dental school hand hygiene practice", were tested using linear regression analysis. Results. Fifty-eight responses were received (24.5%); the sample mean age was 26.6 years and females comprised 51%. Our models of elective and inherent in-dental school hand hygiene practice explained 40% and 28%, respectively, of the variance in behavioral intention. Translation of community hand hygiene practice to the dental school setting is the predominant driver of elective hand hygiene practice. Intended elective in-school hand hygiene practice is further significantly predicted by students' self-efficacy. Students' attitudes, peer pressure from other dental students and clinic administrators, and role modeling had minimal effects. Inherent hand hygiene intent was strongly predicted by students' beliefs in the benefits of the activity and, to a lesser extent, role modeling. Inherent and elective community behaviors were not significant predictors. Conclusions. This study provided significant insights into dental students' hand hygiene behavior and can form the basis for an effective behavioral intervention program designed to improve hand hygiene compliance.
Abstract:
The H I Parkes All Sky Survey (HIPASS) is a blind extragalactic H I 21-cm emission-line survey covering the whole southern sky from declination -90° to +25°. The HIPASS catalogue (HICAT), containing 4315 H I-selected galaxies from the region south of declination +2°, is presented in Meyer et al. (Paper I). This paper describes in detail the completeness and reliability of HICAT, which are calculated from the recovery rate of synthetic sources and follow-up observations, respectively. HICAT is found to be 99 per cent complete at a peak flux of 84 mJy and an integrated flux of 9.4 Jy km s⁻¹. The overall reliability is 95 per cent, but rises to 99 per cent for sources with peak fluxes >58 mJy or integrated flux >8.2 Jy km s⁻¹. Expressions are derived for the uncertainties on the most important HICAT parameters: peak flux, integrated flux, velocity width and recessional velocity. The errors on HICAT parameters are dominated by the noise in the HIPASS data, rather than by the parametrization procedure.
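The idea of estimating completeness from the recovery rate of synthetic sources can be illustrated with a short Python sketch. It is not the HICAT pipeline: the function name `completeness_by_bin`, the bin edges and the toy data are assumptions made purely for illustration.

```python
from bisect import bisect_right


def completeness_by_bin(fluxes, recovered, edges):
    """Fraction of injected synthetic sources recovered in each flux bin.

    fluxes    -- fluxes of the injected synthetic sources (e.g. peak flux in mJy)
    recovered -- booleans, True if the source finder recovered that source
    edges     -- ascending bin edges; bin i covers [edges[i], edges[i + 1])
    Returns one value per bin, or None for a bin with no injected sources.
    """
    n_bins = len(edges) - 1
    injected = [0] * n_bins
    found = [0] * n_bins
    for flux, ok in zip(fluxes, recovered):
        i = bisect_right(edges, flux) - 1  # index of the bin containing flux
        if 0 <= i < n_bins:                # ignore sources outside the edges
            injected[i] += 1
            found[i] += bool(ok)
    return [f / n if n else None for f, n in zip(found, injected)]


# Toy example (hypothetical numbers, not HIPASS data).
toy_fluxes = [10, 20, 30, 60, 70, 90]
toy_recovered = [False, True, False, True, True, True]
toy_completeness = completeness_by_bin(toy_fluxes, toy_recovered, [0, 50, 100])
```

In a real analysis the bins would be much finer, and the flux at which the recovered fraction reaches a target such as 99 per cent would be read off the resulting curve.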
Abstract:
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local false discovery rate is provided for each gene, and it can be implemented so that the implied global false discovery rate is bounded as with the Benjamini-Hochberg methodology based on tail areas. The latter procedure is too conservative, unless it is modified according to the prior probability that a gene is not differentially expressed. An attractive feature of the mixture model approach is that it provides a framework for the estimation of this probability and its subsequent use in forming a decision rule. The rule can also be formed to take the false negative rate into account.
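For reference, the tail-area Benjamini-Hochberg procedure mentioned above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's mixture-model method: the function name and the `pi0` argument are assumptions, and `pi0` (the prior probability that a gene is not differentially expressed) is simply supplied by the caller instead of being estimated from a mixture model.

```python
def benjamini_hochberg(p_values, alpha=0.05, pi0=1.0):
    """Return a list of booleans, True where the null hypothesis is rejected.

    pi0 is the assumed prior probability that a gene is NOT differentially
    expressed. pi0 = 1.0 gives the standard (conservative) procedure;
    a smaller pi0 relaxes the thresholds accordingly.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / (m * pi0).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * pi0):
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten genes.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
standard = benjamini_hochberg(toy_p)            # pi0 = 1: conservative
adjusted = benjamini_hochberg(toy_p, pi0=0.5)   # assumes half the genes are null
```

On these toy values the adjusted rule rejects more hypotheses than the standard one, which is the sense in which the unmodified procedure is conservative.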
Abstract:
We present a new algorithm for detecting intercluster galaxy filaments based upon the assumption that the orientations of constituent galaxies along such filaments are non-isotropic. We apply the algorithm to the 2dF Galaxy Redshift Survey catalogue and find that it readily detects many straight filaments between close cluster pairs. At large intercluster separations (> 15 h⁻¹ Mpc), we find that the detection efficiency falls quickly, as it also does with more complex filament morphologies. We explore the underlying assumptions and suggest that it is only in the case of close cluster pairs that we can expect galaxy orientations to be significantly correlated with filament direction.
Abstract:
We consider the statistical problem of catalogue matching from a machine learning perspective with the goal of producing probabilistic outputs, and using all available information. A framework is provided that unifies two existing approaches to producing probabilistic outputs in the literature, one based on combining distribution estimates and the other based on combining probabilistic classifiers. We apply both of these to the problem of matching the HI Parkes All Sky Survey radio catalogue with large positional uncertainties to the much denser SuperCOSMOS catalogue with much smaller positional uncertainties. We demonstrate the utility of probabilistic outputs by a controllable completeness and efficiency trade-off and by identifying objects that have high probability of being rare. Finally, possible biasing effects in the output of these classifiers are also highlighted and discussed.
Abstract:
Fine-fraction (<63 µm) grain-size analyses of 530 samples from Holes 1095A, 1095B, and 1095D allow assessment of the downhole grain-size distribution at Drift 7. A variety of data processing methods, statistical treatments, and display techniques were used to describe this data set. The downhole fine-fraction grain-size distribution documents significant variations in the average grain-size composition and its cyclic pattern, revealed in five prominent intervals: (1) between 0 and 40 meters composite depth (mcd) (0 and 1.3 Ma), (2) between 40 and 80 mcd (1.3 and 2.4 Ma), (3) between 80 and 220 mcd (2.4 and 6 Ma), (4) between 220 and 360 mcd, and (5) below 360 mcd (prior to 8.1 Ma). In an approach designed to characterize depositional processes at Drift 7, we used statistical parameters determined by the method of moments for the sortable silt fraction to distinguish groups in the grain-size data set. We found three distinct grain-size populations and used these for a tentative environmental interpretation. Population 1 is related to a process in which glacially eroded shelf material was redeposited by turbidites with an ice-rafted debris influence. Population 2 is composed of interglacial turbidites. Population 3 is connected to depositional sequence tops linked to bioturbated sections that, in turn, are influenced by contourite currents and pelagic background sedimentation.
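The method-of-moments parameters referred to above are usually the weighted mean, standard deviation (sorting), skewness and kurtosis of a binned grain-size distribution. The following Python sketch shows that calculation under stated assumptions: the function name, the use of class midpoints and the toy weights are illustrative and are not taken from the study.

```python
import math


def moment_statistics(midpoints, weights):
    """Method-of-moments statistics for a binned grain-size distribution.

    midpoints -- class midpoints (e.g. in phi units or micrometres)
    weights   -- abundance in each class (e.g. weight per cent); need not sum to 100
    """
    total = sum(weights)
    mean = sum(w * x for x, w in zip(midpoints, weights)) / total

    def central_moment(k):
        # k-th weighted central moment about the mean
        return sum(w * (x - mean) ** k for x, w in zip(midpoints, weights)) / total

    sd = math.sqrt(central_moment(2))
    return {
        "mean": mean,
        "sorting": sd,                           # standard deviation
        "skewness": central_moment(3) / sd ** 3,
        "kurtosis": central_moment(4) / sd ** 4,
    }


# Toy distribution with three size classes (hypothetical values).
toy_stats = moment_statistics([1.0, 2.0, 3.0], [1.0, 2.0, 1.0])
```

Samples can then be grouped by plotting these parameters against one another, for example mean against sorting, which is one common way to look for distinct grain-size populations.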
Abstract:
Introduction: Syncope is defined as a transient loss of consciousness and postural tone followed by rapid recovery. Attention has been given to a syncope of central origin with a drop in systemic pressure, known as vasovagal syncope (VVS). Objectives: Analysis of heart rate variability (HRV) is one of the main strategies for studying VVS through standard protocols (for example, the tilt test). The main objective of this work is to understand the relative importance of several variables, such as diastolic and systolic blood pressure (dBP and sBP), stroke volume (SV) and total peripheral resistance (TPR), in HRV. Methods: Mixed statistical models were used to model the behavior of the variables described above in HRV. More than one thousand five hundred observations were analyzed from four patients with VVS, previously tested with classical spectral analysis for the baseline phase (LF/HF = 3.01) and the tilt phases (LF/HF = 0.64), indicating vagal predominance during the tilt period. Results: Model 1 revealed an important role of dBP and a low influence of SV on HRV in the tilt phase. In model 2, TPR showed a low influence on HRV in the tilt phase among patients. Conclusions: HRV was found to be influenced by a set of physiological variables whose individual contributions can be used to understand cardiac fluctuations. The use of statistical models highlighted the importance of studying the role of dBP and SV in VVS.
Abstract:
Now in its second edition, this book describes tools that are commonly used in transportation data analysis. The first part of the text provides statistical fundamentals while the second part presents continuous dependent variable models. With a focus on count and discrete dependent variable models, the third part features new chapters on mixed logit models, logistic regression, and ordered probability models. The last section provides additional coverage of Bayesian statistical modeling, including Bayesian inference and Markov chain Monte Carlo methods. Data sets are available online to use with the modeling techniques discussed.
Abstract:
The research objectives of this thesis were to contribute to Bayesian statistical methodology, specifically to risk assessment methodology and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four-dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four-dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants, thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day's data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days' data, in order to consider how the full data might be modelled; and finally to build a model for the full four-dimensional dataset and to consider the time-varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations.
This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an 'errors-in-variables' problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead, CAR models were used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model; the applied contribution was the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals.
These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term-by-term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days' data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.
Abstract:
The discovery of protein variation is an important strategy in disease diagnosis within the biological sciences. The current benchmark for elucidating information from multiple biological variables is the so-called "omics" disciplines of the biological sciences. Such variability is uncovered by implementation of multivariable data mining techniques which come under two primary categories: machine learning strategies and statistical approaches. Typically, proteomic studies can produce hundreds or thousands of variables, p, per observation, n, depending on the analytical platform or method employed to generate the data. Many classification methods are limited by an n≪p constraint and, as such, require pre-treatment to reduce the dimensionality prior to classification. Recently, machine learning techniques have gained popularity in the field for their ability to successfully classify unknown samples. One limitation of such methods is the lack of a functional model allowing meaningful interpretation of results in terms of the features used for classification. This is a problem that might be solved using a statistical model-based approach in which the importance of each individual protein is explicit and the proteins are combined into a readily interpretable classification rule without relying on a black box approach. Here we incorporate the statistical dimension reduction techniques Partial Least Squares (PLS) and Principal Components Analysis (PCA), followed by both statistical and machine learning classification methods, and compare them to a popular machine learning technique, Support Vector Machines (SVM). Both PLS and SVM demonstrate strong utility for proteomic classification problems.
Abstract:
Quality-oriented management systems and methods have become the dominant business and governance paradigm. From this perspective, satisfying customers' expectations by supplying reliable, good quality products and services is the key factor for an organization and even government. During recent decades, Statistical Quality Control (SQC) methods have been developed as the technical core of quality management and continuous improvement philosophy and are now being applied widely to improve the quality of products and services in industrial and business sectors. Recently, SQC tools, in particular quality control charts, have been used in healthcare surveillance. In some cases, these tools have been modified and developed to better suit the health sector's characteristics and needs. It seems that some of the work in the healthcare area has evolved independently of the development of industrial statistical process control methods. Therefore, analysing and comparing paradigms and the characteristics of quality control charts and techniques across the different sectors presents some opportunities for transferring knowledge and for future development in each sector. Meanwhile, the capabilities of the Bayesian approach, particularly Bayesian hierarchical models and computational techniques in which all uncertainty is expressed as a structure of probability, facilitate decision making and cost-effectiveness analyses. Therefore, this research investigates the use of the quality improvement cycle in a health setting using clinical data from a hospital. The need for clinical data for monitoring purposes is investigated in two aspects. A framework and appropriate tools from the industrial context are proposed and applied to evaluate and improve data quality in available datasets and data flow; then a data capturing algorithm using Bayesian decision making methods is developed to determine an economical sample size for statistical analyses within the quality improvement cycle.
After ensuring clinical data quality, some characteristics of control charts in the health context, including the necessity of monitoring attribute data and correlated quality characteristics, are considered. To this end, multivariate control charts from an industrial context are adapted to monitor radiation delivered to patients undergoing diagnostic coronary angiogram, and various risk-adjusted control charts are constructed and investigated in monitoring binary outcomes of clinical interventions as well as post-intervention survival time. Meanwhile, adoption of a Bayesian approach is proposed as a new framework for estimating the change point following a control chart's signal. This estimate aims to facilitate root cause efforts in the quality improvement cycle, since it narrows the search for the potential causes of detected changes to a tighter time frame prior to the signal. This approach enables us to obtain highly informative estimates for change point parameters, since results based on probability distributions are obtained. Using Bayesian hierarchical models and Markov chain Monte Carlo computational methods, Bayesian estimators of the time and the magnitude of various change scenarios, including step change, linear trend and multiple change in a Poisson process, are developed and investigated. The benefits of change point investigation are revisited and promoted in monitoring hospital outcomes, where the developed Bayesian estimator reports the true time of the shifts, compared with a priori known causes, detected by control charts in monitoring the rate of excess usage of blood products and major adverse events during and after cardiac surgery in a local hospital. The development of the Bayesian change point estimators is then continued in healthcare surveillance for processes in which pre-intervention characteristics of patients affect the outcomes.
In this setting, the Bayesian estimator is first extended to capture the patient mix (covariates) through the risk models underlying risk-adjusted control charts. Variations of the estimator are developed to estimate the true time of step changes and linear trends in the odds ratio of intensive care unit outcomes in a local hospital. Secondly, the Bayesian estimator is extended to identify the time of a shift in mean survival time after a clinical intervention that is being monitored by risk-adjusted survival time control charts. In this context, the survival time after a clinical intervention is also affected by patient mix, and the survival function is constructed using a survival prediction model. The empirical results and the simulation study undertaken in each research component recommend the developed Bayesian estimators as a strong alternative in change point estimation within the quality improvement cycle in healthcare surveillance as well as in industrial and business contexts. The superiority of the proposed Bayesian framework and estimators is enhanced when probability quantification, flexibility and generalizability of the developed model are also considered. The advantages of the Bayesian approach seen in the general context of quality control may also be extended to the industrial and business domains where quality monitoring was initially developed.
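A much simplified version of Bayesian step-change estimation in a Poisson process can be sketched in Python. The sketch below is not the thesis's hierarchical MCMC estimator: it assumes a single step change, independent conjugate Gamma(a, b) priors on the rates before and after the change, and a uniform prior on the change time, which together give the posterior in closed form. All names and the toy counts are hypothetical.

```python
import math


def log_marginal(total, n, a, b):
    """Log marginal likelihood (up to a constant) of n Poisson counts summing
    to `total`, with the rate integrated out under a Gamma(shape=a, rate=b)
    prior. The factorial terms are omitted because they do not depend on the
    change time."""
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + total) - (a + total) * math.log(b + n))


def change_point_posterior(counts, a=1.0, b=1.0):
    """Posterior probability of each change time tau, where tau is the number
    of observations generated before the step change (1 <= tau <= T - 1)."""
    T = len(counts)
    prefix = [0]
    for y in counts:
        prefix.append(prefix[-1] + y)
    log_post = []
    for tau in range(1, T):
        before = prefix[tau]
        after = prefix[T] - before
        log_post.append(log_marginal(before, tau, a, b)
                        + log_marginal(after, T - tau, a, b))
    peak = max(log_post)                       # subtract the max for stability
    weights = [math.exp(v - peak) for v in log_post]
    norm = sum(weights)
    return {tau: w / norm for tau, w in zip(range(1, T), weights)}


# Toy counts whose rate rises after the sixth observation (hypothetical data).
toy_counts = [2, 1, 3, 2, 2, 1, 9, 11, 8, 10, 12, 9]
toy_posterior = change_point_posterior(toy_counts)
most_likely_tau = max(toy_posterior, key=toy_posterior.get)
```

Because the output is a full probability distribution over the change time, a credible set of candidate times can be read off directly, which is the kind of probability-based result the abstract emphasises. Risk adjustment, linear trends and multiple changes would need the richer models described above.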
Abstract:
Cancer poses an undeniable burden to the health and wellbeing of the Australian community. According to a recent report commissioned by the Australian Institute for Health and Welfare (AIHW, 2010), one in every two Australians on average will be diagnosed with cancer by the age of 85, and cancer was the second leading cause of death in 2007, preceded only by cardiovascular disease. Despite modest decreases in standardised combined cancer mortality over the past few decades, in part due to increased funding and access to screening programs, cancer remains a significant economic burden. In 2010, all cancers accounted for an estimated 19% of the country's total burden of disease, equating to approximately $3.8 billion in direct health system costs (Cancer Council Australia, 2011). Furthermore, there remain established socio-economic and other demographic inequalities in cancer incidence and survival, for example, by indigenous status and rurality. Therefore, in the interests of the nation's health and economic management, there is an immediate need to devise data-driven strategies to not only understand the socio-economic drivers of cancer but also facilitate the implementation of cost-effective resource allocation for cancer management...
Abstract:
This thesis proposes three novel models which extend the statistical methodology for motor unit number estimation, a clinical neurology technique. Motor unit number estimation is important in the treatment of degenerative muscular diseases and, potentially, spinal injury. Additionally, a recent and untested statistic to enable statistical model choice is found to be a practical alternative for larger datasets. The existing methods for dose finding in dual-agent clinical trials are found to be suitable only for designs of modest dimensions. The model choice case-study is the first of its kind containing interesting results using so-called unit information prior distributions.