4 resultados para data structures

em Duke University


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.

Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

INTRODUCTION: The characterization of urinary calculi using noninvasive methods has the potential to affect clinical management. CT remains the gold standard for diagnosis of urinary calculi, but has not reliably differentiated varying stone compositions. Dual-energy CT (DECT) has emerged as a technology to improve CT characterization of anatomic structures. This study aims to assess the ability of DECT to accurately discriminate between different types of urinary calculi in an in vitro model using novel postimage acquisition data processing techniques. METHODS: Fifty urinary calculi were assessed, of which 44 had >or=60% composition of one component. DECT was performed utilizing 64-slice multidetector CT. The attenuation profiles of the lower-energy (DECT-Low) and higher-energy (DECT-High) datasets were used to investigate whether differences could be seen between different stone compositions. RESULTS: Postimage acquisition processing allowed for identification of the main different chemical compositions of urinary calculi: brushite, calcium oxalate-calcium phosphate, struvite, cystine, and uric acid. Statistical analysis demonstrated that this processing identified all stone compositions without obvious graphical overlap. CONCLUSION: Dual-energy multidetector CT with postprocessing techniques allows for accurate discrimination among the main different subtypes of urinary calculi in an in vitro model. The ability to better detect stone composition may have implications in determining the optimum clinical treatment modality for urinary calculi from noninvasive, preprocedure radiological assessment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Successfully predicting the frequency dispersion of electronic hyperpolarizabilities is an unresolved challenge in materials science and electronic structure theory. We show that the generalized Thomas-Kuhn sum rules, combined with linear absorption data and measured hyperpolarizability at one or two frequencies, may be used to predict the entire frequency-dependent electronic hyperpolarizability spectrum. This treatment includes two- and three-level contributions that arise from the lowest two or three excited electronic state manifolds, enabling us to describe the unusual observed frequency dispersion of the dynamic hyperpolarizability in high oscillator strength M-PZn chromophores, where (porphinato)zinc(II) (PZn) and metal(II)polypyridyl (M) units are connected via an ethyne unit that aligns the high oscillator strength transition dipoles of these components in a head-to-tail arrangement. We show that some of these structures can possess very similar linear absorption spectra yet manifest dramatically different frequency dependent hyperpolarizabilities, because of three-level contributions that result from excited state-to excited state transition dipoles among charge polarized states. Importantly, this approach provides a quantitative scheme to use linear optical absorption spectra and very limited individual hyperpolarizability measurements to predict the entire frequency-dependent nonlinear optical response. Copyright © 2010 American Chemical Society.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.