948 resultados para STATISTICAL-METHODS
Resumo:
This article reviews the statistical methods that have been used to study the planar distribution, and especially clustering, of objects in histological sections of brain tissue. The objective of these studies is usually quantitative description, comparison between patients or correlation between histological features. Objects of interest such as neurones, glial cells, blood vessels or pathological features such as protein deposits appear as sectional profiles in a two-dimensional section. These objects may not be randomly distributed within the section but exhibit a spatial pattern, a departure from randomness either towards regularity or clustering. The methods described include simple tests of whether the planar distribution of a histological feature departs significantly from randomness using randomized points, lines or sample fields and more complex methods that employ grids or transects of contiguous fields, and which can detect the intensity of aggregation and the sizes, distribution and spacing of clusters. The usefulness of these methods in understanding the pathogenesis of neurodegenerative diseases such as Alzheimer's disease and Creutzfeldt-Jakob disease is discussed. © 2006 The Royal Microscopical Society.
Resumo:
Two contrasting multivariate statistical methods, viz., principal components analysis (PCA) and cluster analysis were applied to the study of neuropathological variations between cases of Alzheimer's disease (AD). To compare the two methods, 78 cases of AD were analyzed, each characterised by measurements of 47 neuropathological variables. Both methods of analysis revealed significant variations between AD cases. These variations were related primarily to differences in the distribution and abundance of senile plaques (SP) and neurofibrillary tangles (NFT) in the brain. Cluster analysis classified the majority of AD cases into five groups which could represent subtypes of AD. However, PCA suggested that variation between cases was more continuous with no distinct subtypes. Hence, PCA may be a more appropriate method than cluster analysis in the study of neuropathological variations between AD cases.
Resumo:
The topic of this thesis is the development of knowledge based statistical software. The shortcomings of conventional statistical packages are discussed to illustrate the need to develop software which is able to exhibit a greater degree of statistical expertise, thereby reducing the misuse of statistical methods by those not well versed in the art of statistical analysis. Some of the issues involved in the development of knowledge based software are presented and a review is given of some of the systems that have been developed so far. The majority of these have moved away from conventional architectures by adopting what can be termed an expert systems approach. The thesis then proposes an approach which is based upon the concept of semantic modelling. By representing some of the semantic meaning of data, it is conceived that a system could examine a request to apply a statistical technique and check if the use of the chosen technique was semantically sound, i.e. will the results obtained be meaningful. Current systems, in contrast, can only perform what can be considered as syntactic checks. The prototype system that has been implemented to explore the feasibility of such an approach is presented, the system has been designed as an enhanced variant of a conventional style statistical package. This involved developing a semantic data model to represent some of the statistically relevant knowledge about data and identifying sets of requirements that should be met for the application of the statistical techniques to be valid. Those areas of statistics covered in the prototype are measures of association and tests of location.
Resumo:
The last decade has seen a considerable increase in the application of quantitative methods in the study of histological sections of brain tissue and especially in the study of neurodegenerative disease. These disorders are characterised by the deposition and aggregation of abnormal or misfolded proteins in the form of extracellular protein deposits such as senile plaques (SP) and intracellular inclusions such as neurofibrillary tangles (NFT). Quantification of brain lesions and studying the relationships between lesions and normal anatomical features of the brain, including neurons, glial cells, and blood vessels, has become an important method of elucidating disease pathogenesis. This review describes methods for quantifying the abundance of a histological feature such as density, frequency, and 'load' and the sampling methods by which quantitative measures can be obtained including plot/quadrat sampling, transect sampling, and the point-quarter method. In addition, methods for determining the spatial pattern of a histological feature, i.e., whether the feature is distributed at random, regularly, or is aggregated into clusters, are described. These methods include the use of the Poisson and binomial distributions, pattern analysis by regression, Fourier analysis, and methods based on mapped point patterns. Finally, the statistical methods available for studying the degree of spatial correlation between pathological lesions and neurons, glial cells, and blood vessels are described.
Resumo:
Biological experiments often produce enormous amount of data, which are usually analyzed by data clustering. Cluster analysis refers to statistical methods that are used to assign data with similar properties into several smaller, more meaningful groups. Two commonly used clustering techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA calculates the variance between variables and groups them into a few uncorrelated groups or principal components (PCs) that are orthogonal to each other. Hierarchical clustering is carried out by separating data into many clusters and merging similar clusters together. Here, we use an example of human leukocyte antigen (HLA) supertype classification to demonstrate the usage of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that the methods have been incorporated into other software as well, such as SIMCA, statistiXL, and R.
Resumo:
The development of new, health supporting food of high quality and the optimization of food technological processes today require the application of statistical methods of experimental design. The principles and steps of statistical planning and evaluation of experiments will be explained. By example of the development of a gluten-free rusk (zwieback), which is enriched by roughage compounds the application of a simplex-centroid mixture design will be shown. The results will be illustrated by different graphics.
Resumo:
Current state of the art techniques for landmine detection in ground penetrating radar (GPR) utilize statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, producing superior landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR specific augmentation is used to mitigate issues caused by insufficient training sets. This work shows that augmentation improves detection performance under training conditions that are normally very difficult. Finally, this work introduces the use of convolutional neural networks as a method to learn feature extraction parameters. These learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial work in visual image processing. The methods developed and presented in this work show an improvement in overall detection performance and introduce a method to improve the robustness of statistical classification.
Resumo:
Hypertrophic cardiomyopathy (HCM) is a cardiovascular disease where the heart muscle is partially thickened and blood flow is - potentially fatally - obstructed. It is one of the leading causes of sudden cardiac death in young people. Electrocardiography (ECG) and Echocardiography (Echo) are the standard tests for identifying HCM and other cardiac abnormalities. The American Heart Association has recommended using a pre-participation questionnaire for young athletes instead of ECG or Echo tests due to considerations of cost and time involved in interpreting the results of these tests by an expert cardiologist. Initially we set out to develop a classifier for automated prediction of young athletes’ heart conditions based on the answers to the questionnaire. Classification results and further in-depth analysis using computational and statistical methods indicated significant shortcomings of the questionnaire in predicting cardiac abnormalities. Automated methods for analyzing ECG signals can help reduce cost and save time in the pre-participation screening process by detecting HCM and other cardiac abnormalities. Therefore, the main goal of this dissertation work is to identify HCM through computational analysis of 12-lead ECG. ECG signals recorded on one or two leads have been analyzed in the past for classifying individual heartbeats into different types of arrhythmia as annotated primarily in the MIT-BIH database. In contrast, we classify complete sequences of 12-lead ECGs to assign patients into two groups: HCM vs. non-HCM. The challenges and issues we address include missing ECG waves in one or more leads and the dimensionality of a large feature-set. We address these by proposing imputation and feature-selection methods. We develop heartbeat-classifiers by employing Random Forests and Support Vector Machines, and propose a method to classify full 12-lead ECGs based on the proportion of heartbeats classified as HCM. The results from our experiments show that the classifiers developed using our methods perform well in identifying HCM. Thus the two contributions of this thesis are the utilization of computational and statistical methods for discovering shortcomings in a current screening procedure and the development of methods to identify HCM through computational analysis of 12-lead ECG signals.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
La stratégie actuelle de contrôle de la qualité de l’anode est inadéquate pour détecter les anodes défectueuses avant qu’elles ne soient installées dans les cuves d’électrolyse. Des travaux antérieurs ont porté sur la modélisation du procédé de fabrication des anodes afin de prédire leurs propriétés directement après la cuisson en utilisant des méthodes statistiques multivariées. La stratégie de carottage des anodes utilisée à l’usine partenaire fait en sorte que ce modèle ne peut être utilisé que pour prédire les propriétés des anodes cuites aux positions les plus chaudes et les plus froides du four à cuire. Le travail actuel propose une stratégie pour considérer l’histoire thermique des anodes cuites à n’importe quelle position et permettre de prédire leurs propriétés. Il est montré qu’en combinant des variables binaires pour définir l’alvéole et la position de cuisson avec les données routinières mesurées sur le four à cuire, les profils de température des anodes cuites à différentes positions peuvent être prédits. Également, ces données ont été incluses dans le modèle pour la prédiction des propriétés des anodes. Les résultats de prédiction ont été validés en effectuant du carottage supplémentaire et les performances du modèle sont concluantes pour la densité apparente et réelle, la force de compression, la réactivité à l’air et le Lc et ce peu importe la position de cuisson.
Resumo:
Abstract: Quantitative Methods (QM) is a compulsory course in the Social Science program in CEGEP. Many QM instructors assign a number of homework exercises to give students the opportunity to practice the statistical methods, which enhances their learning. However, traditional written exercises have two significant disadvantages. The first is that the feedback process is often very slow. The second disadvantage is that written exercises can generate a large amount of correcting for the instructor. WeBWorK is an open-source system that allows instructors to write exercises which students answer online. Although originally designed to write exercises for math and science students, WeBWorK programming allows for the creation of a variety of questions which can be used in the Quantitative Methods course. Because many statistical exercises generate objective and quantitative answers, the system is able to instantly assess students’ responses and tell them whether they are right or wrong. This immediate feedback has been shown to be theoretically conducive to positive learning outcomes. In addition, the system can be set up to allow students to re-try the problem if they got it wrong. This has benefits both in terms of student motivation and reinforcing learning. Through the use of a quasi-experiment, this research project measured and analysed the effects of using WeBWorK exercises in the Quantitative Methods course at Vanier College. Three specific research questions were addressed. First, we looked at whether students who did the WeBWorK exercises got better grades than students who did written exercises. Second, we looked at whether students who completed more of the WeBWorK exercises got better grades than students who completed fewer of the WeBWorK exercises. Finally, we used a self-report survey to find out what students’ perceptions and opinions were of the WeBWorK and the written exercises. For the first research question, a crossover design was used in order to compare whether the group that did WeBWorK problems during one unit would score significantly higher on that unit test than the other group that did the written problems. We found no significant difference in grades between students who did the WeBWorK exercises and students who did the written exercises. The second research question looked at whether students who completed more of the WeBWorK exercises would get significantly higher grades than students who completed fewer of the WeBWorK exercises. The straight-line relationship between number of WeBWorK exercises completed and grades was positive in both groups. However, the correlation coefficients for these two variables showed no real pattern. Our third research question was investigated by using a survey to elicit students’ perceptions and opinions regarding the WeBWorK and written exercises. Students reported no difference in the amount of effort put into completing each type of exercise. Students were also asked to rate each type of exercise along six dimensions and a composite score was calculated. Overall, students gave a significantly higher score to the written exercises, and reported that they found the written exercises were better for understanding the basic statistical concepts and for learning the basic statistical methods. However, when presented with the choice of having only written or only WeBWorK exercises, slightly more students preferred or strongly preferred having only WeBWorK exercises. The results of this research suggest that the advantages of using WeBWorK to teach Quantitative Methods are variable. The WeBWorK system offers immediate feedback, which often seems to motivate students to try again if they do not have the correct answer. However, this does not necessarily translate into better performance on the written tests and on the final exam. What has been learned is that the WeBWorK system can be used by interested instructors to enhance student learning in the Quantitative Methods course. Further research may examine more specifically how this system can be used more effectively.
Resumo:
This dissertation applies statistical methods to the evaluation of automatic summarization using data from the Text Analysis Conferences in 2008-2011. Several aspects of the evaluation framework itself are studied, including the statistical testing used to determine significant differences, the assessors, and the design of the experiment. In addition, a family of evaluation metrics is developed to predict the score an automatically generated summary would receive from a human judge and its results are demonstrated at the Text Analysis Conference. Finally, variations on the evaluation framework are studied and their relative merits considered. An over-arching theme of this dissertation is the application of standard statistical methods to data that does not conform to the usual testing assumptions.
Resumo:
The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given avsample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need of split-merge reversible jump moves typically associated with poor performance, we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes, and study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, extracting the characteristic traits common to all the data-groups, and propose a novel vector autoregressive model to study of growth curves of Singaporean kids. Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit-circle.
Resumo:
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subjected to some upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to an Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
Resumo:
Often in biomedical research, we deal with continuous (clustered) proportion responses ranging between zero and one quantifying the disease status of the cluster units. Interestingly, the study population might also consist of relatively disease-free as well as highly diseased subjects, contributing to proportion values in the interval [0, 1]. Regression on a variety of parametric densities with support lying in (0, 1), such as beta regression, can assess important covariate effects. However, they are deemed inappropriate due to the presence of zeros and/or ones. To evade this, we introduce a class of general proportion density, and further augment the probabilities of zero and one to this general proportion density, controlling for the clustering. Our approach is Bayesian and presents a computationally convenient framework amenable to available freeware. Bayesian case-deletion influence diagnostics based on q-divergence measures are automatic from the Markov chain Monte Carlo output. The methodology is illustrated using both simulation studies and application to a real dataset from a clinical periodontology study.