905 resultados para Methods : Statistical
Resumo:
Purpose: To develop an effective method for evaluating the quality of Cortex berberidis from different geographical origins. Methods: A simple, precise and accurate high performance liquid chromatography (HPLC) method was first developed for simultaneous quantification of four active alkaloids (magnoflorine, jatrorrhizine, palmatine, and berberine) in Cortex berberidis obtained from Qinghai, Tibet and Sichuan Provinces of China. Method validation was performed in terms of precision, repeatability, stability, accuracy, and linearity. Besides, partial least squares discriminant analysis (PLS-DA) and one-way analysis of variance (ANOVA) were applied to study the quality variations of Cortex berberidis from various geographical origins. Results: The proposed HPLC method showed good linearity, precision, repeatability, and accuracy. The four alkaloids were detected in all samples of Cortex berberidis. Among them, magnoflorine (36.46 - 87.30 mg/g) consistently showed the highest amounts in all the samples, followed by berberine (16.00 - 37.50 mg/g). The content varied in the range of 0.66 - 4.57 mg/g for palmatine and 1.53 - 16.26 mg/g for jatrorrhizine, respectively. The total content of the four alkaloids ranged from 67.62 to 114.79 mg/g. Moreover, the results obtained by the PLS-DA and ANOVA showed that magnoflorine level and the total content of these four alkaloids in Qinghai and Tibet samples were significantly higher (p < 0.01) than those in Sichuan samples. Conclusion: Quantification of multi-ingredients by HPLC combined with statistical methods provide an effective approach for achieving origin discrimination and quality evaluation of Cortex berberidis. The quality of Cortex berberidis closely correlates to the geographical origin of the samples, with Cortex berberidis samples from Qinghai and Tibet exhibiting superior qualities to those from Sichuan.
Resumo:
Background The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. Methods This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. Results Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. Conclusion In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.
Resumo:
Light Detection and Ranging (LIDAR) has great potential to assist vegetation management in power line corridors by providing more accurate geometric information of the power line assets and vegetation along the corridors. However, the development of algorithms for the automatic processing of LIDAR point cloud data, in particular for feature extraction and classification of raw point cloud data, is in still in its infancy. In this paper, we take advantage of LIDAR intensity and try to classify ground and non-ground points by statistically analyzing the skewness and kurtosis of the intensity data. Moreover, the Hough transform is employed to detected power lines from the filtered object points. The experimental results show the effectiveness of our methods and indicate that better results were obtained by using LIDAR intensity data than elevation data.
Resumo:
Migraine is a painful disorder for which the etiology remains obscure. Diagnosis is largely based on International Headache Society criteria. However, no feature occurs in all patients who meet these criteria, and no single symptom is required for diagnosis. Consequently, this definition may not accurately reflect the phenotypic heterogeneity or genetic basis of the disorder. Such phenotypic uncertainty is typical for complex genetic disorders and has encouraged interest in multivariate statistical methods for classifying disease phenotypes. We applied three popular statistical phenotyping methods—latent class analysis, grade of membership and grade of membership “fuzzy” clustering (Fanny)—to migraine symptom data, and compared heritability and genome-wide linkage results obtained using each approach. Our results demonstrate that different methodologies produce different clustering structures and non-negligible differences in subsequent analyses. We therefore urge caution in the use of any single approach and suggest that multiple phenotyping methods be used.
Resumo:
Numerous expert elicitation methods have been suggested for generalised linear models (GLMs). This paper compares three relatively new approaches to eliciting expert knowledge in a form suitable for Bayesian logistic regression. These methods were trialled on two experts in order to model the habitat suitability of the threatened Australian brush-tailed rock-wallaby (Petrogale penicillata). The first elicitation approach is a geographically assisted indirect predictive method with a geographic information system (GIS) interface. The second approach is a predictive indirect method which uses an interactive graphical tool. The third method uses a questionnaire to elicit expert knowledge directly about the impact of a habitat variable on the response. Two variables (slope and aspect) are used to examine prior and posterior distributions of the three methods. The results indicate that there are some similarities and dissimilarities between the expert informed priors of the two experts formulated from the different approaches. The choice of elicitation method depends on the statistical knowledge of the expert, their mapping skills, time constraints, accessibility to experts and funding available. This trial reveals that expert knowledge can be important when modelling rare event data, such as threatened species, because experts can provide additional information that may not be represented in the dataset. However care must be taken with the way in which this information is elicited and formulated.
Resumo:
We study the suggestion that Markov switching (MS) models should be used to determine cyclical turning points. A Kalman filter approximation is used to derive the dating rules implicit in such models. We compare these with dating rules in an algorithm that provides a good approximation to the chronology determined by the NBER. We find that there is very little that is attractive in the MS approach when compared with this algorithm. The most important difference relates to robustness. The MS approach depends on the validity of that statistical model. Our approach is valid in a wider range of circumstances.
Resumo:
Advances in symptom management strategies through a better understanding of cancer symptom clusters depend on the identification of symptom clusters that are valid and reliable. The purpose of this exploratory research was to investigate alternative analytical approaches to identify symptom clusters for patients with cancer, using readily accessible statistical methods, and to justify which methods of identification may be appropriate for this context. Three studies were undertaken: (1) a systematic review of the literature, to identify analytical methods commonly used for symptom cluster identification for cancer patients; (2) a secondary data analysis to identify symptom clusters and compare alternative methods, as a guide to best practice approaches in cross-sectional studies; and (3) a secondary data analysis to investigate the stability of symptom clusters over time. The systematic literature review identified, in 10 years prior to March 2007, 13 cross-sectional studies implementing multivariate methods to identify cancer related symptom clusters. The methods commonly used to group symptoms were exploratory factor analysis, hierarchical cluster analysis and principal components analysis. Common factor analysis methods were recommended as the best practice cross-sectional methods for cancer symptom cluster identification. A comparison of alternative common factor analysis methods was conducted, in a secondary analysis of a sample of 219 ambulatory cancer patients with mixed diagnoses, assessed within one month of commencing chemotherapy treatment. Principal axis factoring, unweighted least squares and image factor analysis identified five consistent symptom clusters, based on patient self-reported distress ratings of 42 physical symptoms. Extraction of an additional cluster was necessary when using alpha factor analysis to determine clinically relevant symptom clusters. The recommended approaches for symptom cluster identification using nonmultivariate normal data were: principal axis factoring or unweighted least squares for factor extraction, followed by oblique rotation; and use of the scree plot and Minimum Average Partial procedure to determine the number of factors. In contrast to other studies which typically interpret pattern coefficients alone, in these studies symptom clusters were determined on the basis of structure coefficients. This approach was adopted for the stability of the results as structure coefficients are correlations between factors and symptoms unaffected by the correlations between factors. Symptoms could be associated with multiple clusters as a foundation for investigating potential interventions. The stability of these five symptom clusters was investigated in separate common factor analyses, 6 and 12 months after chemotherapy commenced. Five qualitatively consistent symptom clusters were identified over time (Musculoskeletal-discomforts/lethargy, Oral-discomforts, Gastrointestinaldiscomforts, Vasomotor-symptoms, Gastrointestinal-toxicities), but at 12 months two additional clusters were determined (Lethargy and Gastrointestinal/digestive symptoms). Future studies should include physical, psychological, and cognitive symptoms. Further investigation of the identified symptom clusters is required for validation, to examine causality, and potentially to suggest interventions for symptom management. Future studies should use longitudinal analyses to investigate change in symptom clusters, the influence of patient related factors, and the impact on outcomes (e.g., daily functioning) over time.
Resumo:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these productcode vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of lowrate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
Resumo:
Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Despite the fact that hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests commonly are incorrectly applied, misinterpreted, and ignored—by novices and expert researchers alike. On initial glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic to compare to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing is first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.