10 resultados para Factor Analysis, Statistical

em Duke University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tumor microenvironmental stresses, such as hypoxia and lactic acidosis, play important roles in tumor progression. Although gene signatures reflecting the influence of these stresses are powerful approaches to link expression with phenotypes, they do not fully reflect the complexity of human cancers. Here, we describe the use of latent factor models to further dissect the stress gene signatures in a breast cancer expression dataset. The genes in these latent factors are coordinately expressed in tumors and depict distinct, interacting components of the biological processes. The genes in several latent factors are highly enriched in chromosomal locations. When these factors are analyzed in independent datasets with gene expression and array CGH data, the expression values of these factors are highly correlated with copy number alterations (CNAs) of the corresponding BAC clones in both the cell lines and tumors. Therefore, variation in the expression of these pathway-associated factors is at least partially caused by variation in gene dosage and CNAs among breast cancers. We have also found the expression of two latent factors without any chromosomal enrichment is highly associated with 12q CNA, likely an instance of "trans"-variations in which CNA leads to the variations in gene expression outside of the CNA region. In addition, we have found that factor 26 (1q CNA) is negatively correlated with HIF-1alpha protein and hypoxia pathways in breast tumors and cell lines. This agrees with, and for the first time links, known good prognosis associated with both a low hypoxia signature and the presence of CNA in this region. Taken together, these results suggest the possibility that tumor segmental aneuploidy makes significant contributions to variation in the lactic acidosis/hypoxia gene signatures in human cancers and demonstrate that latent factor analysis is a powerful means to uncover such a linkage.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Individual differences in affect intensity are typically assessed with the Affect Intensity Measure (AIM). Previous factor analyses suggest that the AIM is comprised of four weakly correlated factors: Positive Affectivity, Negative Reactivity, Negative Intensity and Positive Intensity or Serenity. However, little data exist to show whether its four factors relate to other measures differently enough to preclude use of the total scale score. The present study replicated the four-factor solution and found that subscales derived from the four factors correlated differently with criterion variables that assess personality domains, affective dispositions, and cognitive patterns that are associated with emotional reactions. The results show that use of the total AIM score can obscure relationships between specific features of affect intensity and other variables and suggest that researchers should examine the individual AIM subscales.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.

Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models acommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. RESULTS: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of per-forming nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. CONCLUSIONS: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

BACKGROUND: Anticoagulation can reduce quality of life, and different models of anticoagulation management might have different impacts on satisfaction with this component of medical care. Yet, to our knowledge, there are no scales measuring quality of life and satisfaction with anticoagulation that can be generalized across different models of anticoagulation management. We describe the development and preliminary validation of such an instrument - the Duke Anticoagulation Satisfaction Scale (DASS). METHODS: The DASS is a 25-item scale addressing the (a) negative impacts of anticoagulation (limitations, hassles and burdens); and (b) positive impacts of anticoagulation (confidence, reassurance, satisfaction). Each item has 7 possible responses. The DASS was administered to 262 patients currently receiving oral anticoagulation. Scales measuring generic quality of life, satisfaction with medical care, and tendency to provide socially desirable responses were also administered. Statistical analysis included assessment of item variability, internal consistency (Cronbach's alpha), scale structure (factor analysis), and correlations between the DASS and demographic variables, clinical characteristics, and scores on the above scales. A follow-up study of 105 additional patients assessed test-retest reliability. RESULTS: 220 subjects answered all items. Ceiling and floor effects were modest, and 25 of the 27 proposed items grouped into 2 factors (positive impacts, negative impacts, this latter factor being potentially subdivided into limitations versus hassles and burdens). Each factor had a high degree of internal consistency (Cronbach's alpha 0.78-0.91). The limitations and hassles factors consistently correlated with the SF-36 scales measuring generic quality of life, while the positive psychological impact scale correlated with age and time on anticoagulation. The intra-class correlation coefficient for test-retest reliability was 0.80. CONCLUSIONS: The DASS has demonstrated reasonable psychometric properties to date. Further validation is ongoing. To the degree that dissatisfaction with anticoagulation leads to decreased adherence, poorer INR control, and poor clinical outcomes, the DASS has the potential to help identify reasons for dissatisfaction (and positive satisfaction), and thus help to develop interventions to break this cycle. As an instrument designed to be applicable across multiple models of anticoagulation management, the DASS could be crucial in the scientific comparison between those models of care.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Multi-output Gaussian processes provide a convenient framework for multi-task problems. An illustrative and motivating example of a multi-task problem is multi-region electrophysiological time-series data, where experimentalists are interested in both power and phase coherence between channels. Recently, the spectral mixture (SM) kernel was proposed to model the spectral density of a single task in a Gaussian process framework. This work develops a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel. This new, flexible kernel represents both the power and phase relationship between multiple observation channels. The expressive capabilities of the CSM kernel are demonstrated through implementation of 1) a Bayesian hidden Markov model, where the emission distribution is a multi-output Gaussian process with a CSM covariance kernel, and 2) a Gaussian process factor analysis model, where factor scores represent the utilization of cross-spectral neural circuits. Results are presented for measured multi-region electrophysiological data.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Shame has been shown to predict sexual HIV transmission risk behavior, medication non-adherence, symptomatic HIV or AIDS, and symptoms of depression and PTSD. However, there remains a dearth of tools to measure the specific constructs of HIV-related and sexual abuse-related shame. To ameliorate this gap, we present a 31-item measure that assesses HIV and sexual abuse-related shame, and the impact of shame on HIV-related health behaviors. A diverse sample of 271 HIV-positive men and women who were sexually abused as children completed the HIV and Abuse Related Shame Inventory (HARSI) among other measures. An exploratory factor analysis supported the retention of three-factors, explaining 56.7% of the sample variance. These internally consistent factors showed good test-retest reliability, and sound convergent and divergent validity using eight well-established HIV specific and general psychosocial criterion measures. Unlike stigma or discrimination, shame is potentially alterable through individually-focused interventions, making the measurement of shame clinically meaningful.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Family dogs and dog owners offer a potentially powerful way to conduct citizen science to answer questions about animal behavior that are difficult to answer with more conventional approaches. Here we evaluate the quality of the first data on dog cognition collected by citizen scientists using the Dognition.com website. We conducted analyses to understand if data generated by over 500 citizen scientists replicates internally and in comparison to previously published findings. Half of participants participated for free while the other half paid for access. The website provided each participant a temperament questionnaire and instructions on how to conduct a series of ten cognitive tests. Participation required internet access, a dog and some common household items. Participants could record their responses on any PC, tablet or smartphone from anywhere in the world and data were retained on servers. Results from citizen scientists and their dogs replicated a number of previously described phenomena from conventional lab-based research. There was little evidence that citizen scientists manipulated their results. To illustrate the potential uses of relatively large samples of citizen science data, we then used factor analysis to examine individual differences across the cognitive tasks. The data were best explained by multiple factors in support of the hypothesis that nonhumans, including dogs, can evolve multiple cognitive domains that vary independently. This analysis suggests that in the future, citizen scientists will generate useful datasets that test hypotheses and answer questions as a complement to conventional laboratory techniques used to study dog psychology.