6 resultados para Multivariate data analysis

em Duke University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: The inherent complexity of statistical methods and clinical phenomena compel researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them have a complete knowledge in their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Though communication has a central role in interdisciplinary collaboration and since miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. METHODS/PRINCIPAL FINDINGS: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave, if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. CONCLUSION/SIGNIFICANCE: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: Isometric muscle contraction, where force is generated without muscle shortening, is a molecular traffic jam in which the number of actin-attached motors is maximized and all states of motor action are trapped with consequently high heterogeneity. This heterogeneity is a major limitation to deciphering myosin conformational changes in situ. METHODOLOGY: We used multivariate data analysis to group repeat segments in electron tomograms of isometrically contracting insect flight muscle, mechanically monitored, rapidly frozen, freeze substituted, and thin sectioned. Improved resolution reveals the helical arrangement of F-actin subunits in the thin filament enabling an atomic model to be built into the thin filament density independent of the myosin. Actin-myosin attachments can now be assigned as weak or strong by their motor domain orientation relative to actin. Myosin attachments were quantified everywhere along the thin filament including troponin. Strong binding myosin attachments are found on only four F-actin subunits, the "target zone", situated exactly midway between successive troponin complexes. They show an axial lever arm range of 77°/12.9 nm. The lever arm azimuthal range of strong binding attachments has a highly skewed, 127° range compared with X-ray crystallographic structures. Two types of weak actin attachments are described. One type, found exclusively in the target zone, appears to represent pre-working-stroke intermediates. The other, which contacts tropomyosin rather than actin, is positioned M-ward of the target zone, i.e. the position toward which thin filaments slide during shortening. CONCLUSION: We present a model for the weak to strong transition in the myosin ATPase cycle that incorporates azimuthal movements of the motor domain on actin. Stress/strain in the S2 domain may explain azimuthal lever arm changes in the strong binding attachments. The results support previous conclusions that the weak attachments preceding force generation are very different from strong binding attachments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models acommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Nolan and Temple Lang argue that “the ability to express statistical computations is an es- sential skill.” A key related capacity is the ability to conduct and present data analysis in a way that another person can understand and replicate. The copy-and-paste workflow that is an artifact of antiquated user-interface design makes reproducibility of statistical analysis more difficult, especially as data become increasingly complex and statistical methods become increasingly sophisticated. R Markdown is a new technology that makes creating fully-reproducible statistical analysis simple and painless. It provides a solution suitable not only for cutting edge research, but also for use in an introductory statistics course. We present experiential and statistical evidence that R Markdown can be used effectively in introductory statistics courses, and discuss its role in the rapidly-changing world of statistical computation.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of the gene selection approaches for classification are single-gene based. Second, many of the gene selection procedures are not embedded within the algorithm itself. The technique of random forests has been found to perform well in high-dimensional data settings with survival outcomes. It also has an embedded feature to identify variables of importance. Therefore, it is an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on the random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis.Additionally, we have shown the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes. The described method allows us to better utilize the information available from microarray data with survival outcomes.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

New representations of tree-structured data objects, using ideas from topological data analysis, enable improved statistical analyses of a population of brain artery trees. A number of representations of each data tree arise from persistence diagrams that quantify branching and looping of vessels at multiple scales. Novel approaches to the statistical analysis, through various summaries of the persistence diagrams, lead to heightened correlations with covariates such as age and sex, relative to earlier analyses of this data set. The correlation with age continues to be significant even after controlling for correlations from earlier significant summaries.