23 results for High dimensional


Relevance: 60.00%

Abstract:

Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.
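As a minimal numpy illustration of what the widely used factorized (mean-field) approximation gives up, consider approximating a correlated bivariate Gaussian posterior with a product of independent Gaussians; both the optimal factor variances and the resulting KL gap are closed-form. The example is generic and not taken from the dissertation's models:

```python
import numpy as np

# Optimal mean-field (factorized Gaussian) approximation to a correlated
# bivariate Gaussian posterior. Standard result: each factor's precision
# equals the corresponding diagonal entry of the posterior precision.
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])   # true posterior covariance
Lam = np.linalg.inv(Sigma)                   # posterior precision

q_var = 1.0 / np.diag(Lam)                   # factor variances = 1 - rho^2
Sigma_q = np.diag(q_var)                     # factorized approximation

def kl_gauss(S_q, S_p):
    """KL(q || p) between zero-mean Gaussians with covariances S_q, S_p."""
    d = S_q.shape[0]
    return 0.5 * (np.trace(np.linalg.inv(S_p) @ S_q) - d
                  + np.log(np.linalg.det(S_p) / np.linalg.det(S_q)))

gap = kl_gauss(Sigma_q, Sigma)
print(q_var, gap)  # factor variances shrink to 1 - rho^2; gap grows with |rho|
```

Dependent proposals such as the mixture- and copula-based families above can close this gap by reintroducing posterior correlation that the product form cannot represent.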

Relevance: 60.00%

Abstract:

While molecular and cellular processes are often modeled as stochastic processes, such as Brownian motion, chemical reaction networks and gene regulatory networks, there have been few attempts to program a molecular-scale process to physically implement stochastic processes. DNA has been used as a substrate for programming molecular interactions, but its applications have been restricted to deterministic functions, and unfavorable properties such as slow processing, thermal annealing, aqueous solvents and difficult readout limit them to proof-of-concept purposes. To date, whether there exists a molecular process that can be programmed to implement stochastic processes for practical applications remains unknown.

In this dissertation, a fully specified Resonance Energy Transfer (RET) network between chromophores is accurately fabricated via DNA self-assembly, and the exciton dynamics in the RET network physically implement a stochastic process, specifically a continuous-time Markov chain (CTMC), which has a direct mapping to the physical geometry of the chromophore network. Excited by a light source, a RET network generates random samples in the temporal domain in the form of fluorescence photons which can be detected by a photon detector. The intrinsic sampling distribution of a RET network is derived as a phase-type distribution configured by its CTMC model. The conclusion is that the exciton dynamics in a RET network implement a general and important class of stochastic processes that can be directly and accurately programmed and used for practical applications of photonics and optoelectronics. Different approaches to using RET networks exist with vast potential applications. As an entropy source that can directly generate samples from virtually arbitrary distributions, RET networks can benefit applications that rely on generating random samples such as 1) fluorescent taggants and 2) stochastic computing.
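The CTMC-to-sampling mapping described above can be sketched in a few lines of numpy. The chain below is a toy two-state example, not the dissertation's RET geometry: transient states play the role of chromophores, absorption plays the role of photon emission, and the absorption time follows a phase-type distribution whose mean is alpha (-S)^{-1} 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy CTMC with two transient states and one absorbing state; S is the
# subgenerator over the transient states (off-diagonal entries are
# transition rates; each row sums to minus that state's exit rate).
S = np.array([[-3.0, 1.0],
              [ 2.0, -4.0]])
exit_rates = -S.sum(axis=1)          # rates into the absorbing state
alpha = np.array([1.0, 0.0])         # always start in state 0

def sample_absorption_time(rng):
    """Gillespie-style simulation of one absorption time."""
    state, t, n = 0, 0.0, S.shape[0]
    while True:
        total = -S[state, state]                     # total rate out
        t += rng.exponential(1.0 / total)
        probs = np.append(S[state], exit_rates[state]) / total
        probs[state] = 0.0                           # drop the diagonal
        nxt = rng.choice(n + 1, p=probs)
        if nxt == n:                                 # absorbed: a "photon"
            return t
        state = nxt

samples = np.array([sample_absorption_time(rng) for _ in range(20000)])
mean_T = float(alpha @ (-np.linalg.inv(S)) @ np.ones(2))  # phase-type mean
print(samples.mean(), mean_T)        # empirical mean matches the analytic one
```

The empirical distribution of `samples` is exactly the phase-type distribution configured by the chain, which is the sense in which the physical process "generates random samples in the temporal domain."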

By using RET networks between chromophores to implement fluorescent taggants with temporally coded signatures, the taggant design is not constrained by resolvable dyes and has a significantly larger coding capacity than spectrally or lifetime coded fluorescent taggants. Meanwhile, the taggant detection process becomes highly efficient, and the Maximum Likelihood Estimation (MLE) based taggant identification guarantees high accuracy even with only a few hundred detected photons.
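The MLE-based identification step can be illustrated with a deliberately simplified model: here each candidate taggant has a single-exponential photon-delay signature (a stand-in for the phase-type signatures above, with illustrative lifetimes), and the detected photons vote through the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(3)

# Candidate taggants, each modeled by a single-exponential photon-delay
# signature (lifetimes in ns, illustrative); true_id is the taggant present.
lifetimes = np.array([1.0, 2.0, 4.0])
true_id = 1
photons = rng.exponential(lifetimes[true_id], size=300)  # a few hundred photons

# Exponential log-likelihood for each candidate: -n*log(tau) - sum(x)/tau
loglik = [-len(photons) * np.log(t) - photons.sum() / t for t in lifetimes]
print(int(np.argmax(loglik)))        # identifies the true taggant
```

Even with only a few hundred photons, the log-likelihoods of the wrong candidates fall far below that of the true signature, which is the intuition behind the high identification accuracy claimed above.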

Meanwhile, RET-based sampling units (RSUs) can be constructed to accelerate probabilistic algorithms for wide applications in machine learning and data analytics. Because probabilistic algorithms often rely on iteratively sampling from parameterized distributions, they can be inefficient in practice on the deterministic hardware traditional computers use, especially for high-dimensional and complex problems. As an efficient universal sampling unit, the proposed RSU can be integrated into a processor or GPU as specialized functional units, or organized as a discrete accelerator, to bring substantial speedups and power savings.

Relevance: 60.00%

Abstract:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arises in defining an algorithm with low communication cost, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator ({\em message}) algorithm to address these issues. The algorithm applies feature selection in parallel for each subset using a regularized regression or Bayesian variable selection method, calculates the `median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments showing excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.
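A numpy-only sketch of the {\em message} aggregation pattern follows. The per-subset selector here is a simple marginal-correlation screen standing in for the regularized regression or Bayesian variable selection used in the dissertation, and all thresholds are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: sparse linear model with 3 true features out of 20.
n, p = 3000, 20
beta = np.zeros(p); beta[:3] = [3.0, -2.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

m = 6                                       # number of row subsets
idx = np.array_split(rng.permutation(n), m)

# Step 1: feature selection in parallel on each subset (marginal screen
# as a stand-in selector; the 0.5 threshold is illustrative).
inclusion = []
for rows in idx:
    corr = np.abs(X[rows].T @ y[rows]) / len(rows)
    inclusion.append(corr > 0.5)

# Step 2: 'median' feature inclusion index across the subsets.
sel = np.median(np.array(inclusion, dtype=float), axis=0) >= 0.5

# Step 3: estimate coefficients on each subset, then average.
coefs = [np.linalg.lstsq(X[rows][:, sel], y[rows], rcond=None)[0]
         for rows in idx]
beta_hat = np.mean(coefs, axis=0)
print(np.flatnonzero(sel), beta_hat)        # features 0,1,2; ~[3, -2, 1.5]
```

Only the Boolean inclusion vectors and the per-subset coefficient estimates cross machine boundaries, which is why the communication cost stays minimal.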

While sample space partitioning is useful in handling datasets with a large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In this thesis, I propose a new embarrassingly parallel framework named {\em DECO} for distributed variable selection and parameter estimation. In {\em DECO}, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
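The decorrelation idea at the heart of {\em DECO} can be sketched as follows: premultiplying the design matrix by the inverse square root of its row Gram matrix makes the transformed rows exactly uncorrelated, so feature blocks handed to different workers behave almost independently. The exact scaling and ridge terms of the published algorithm may differ; this is a minimal illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# High-dimensional design: p > n, with heterogeneous column scales.
n, p = 50, 200
X = rng.standard_normal((n, p)) * np.linspace(0.5, 2.0, p)

# Decorrelation: premultiply X by (X X^T / p)^{-1/2}.
G = X @ X.T / p                              # n x n row Gram matrix
w, V = np.linalg.eigh(G)                     # G is positive definite a.s.
X_dec = (V @ np.diag(w ** -0.5) @ V.T) @ X

# After the transform, X_dec X_dec^T / p is exactly the identity, so any
# column block of X_dec is uncorrelated (row-wise) with every other block.
err = np.abs(X_dec @ X_dec.T / p - np.eye(n)).max()
print(err)                                   # numerically ~0
```

This is why each worker can run an off-the-shelf high-dimensional fitting algorithm on its block as if the other blocks did not exist.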

For datasets with both a large sample size and high dimensionality, I propose a new "divide-and-conquer" framework {\em DEME} (DECO-message) that leverages both the {\em DECO} and the {\em message} algorithms. The new framework first partitions the dataset in the sample space into row cubes using {\em message} and then partitions the feature space of the cubes using {\em DECO}. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthesized via the {\em DECO} and {\em message} algorithms in reverse order to produce the final output. The whole framework is extremely scalable.
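The two-way blocking that {\em DEME} performs can be pictured with a small matrix: split the rows first (the {\em message} step), then split the columns of each row cube (the {\em DECO} step). The statistical aggregation itself is omitted here:

```python
import numpy as np

# A 6 x 4 data matrix stands in for the full dataset.
X = np.arange(24.0).reshape(6, 4)

row_cubes = np.array_split(X, 3, axis=0)     # message: split the samples
blocks = [np.array_split(cube, 2, axis=1)    # DECO: split the features
          for cube in row_cubes]

# 3 row cubes x 2 feature blocks, each a small 2 x 2 block that fits on
# one worker; results are then synthesized in the reverse order.
print(len(blocks), len(blocks[0]), blocks[0][0].shape)
```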

Relevance: 40.00%

Abstract:

The goal of this work is to analyze three-dimensional dispersive metallic photonic crystals (PCs) and to find a structure that can provide a bandgap and a high cutoff frequency. The determination of the band structure of a PC with dispersive materials is an expensive nonlinear eigenvalue problem; in this work we propose a rational-polynomial method to convert such a nonlinear eigenvalue problem into a linear eigenvalue problem. The spectral element method is extended to rapidly calculate the band structure of three-dimensional PCs consisting of realistic dispersive materials modeled by Drude and Drude-Lorentz models. Exponential convergence is observed in the numerical experiments. Numerical results show that, at the low frequency limit, metallic materials are similar to a perfect electric conductor, where the simulation results tend to be the same as perfect electric conductor PCs. Band structures of the scaffold structure and semi-woodpile structure metallic PCs are investigated. It is found that band structures of semi-woodpile PCs have a very high cutoff frequency as well as a bandgap between the lowest two bands and the higher bands.
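For concreteness, a Drude permittivity can be evaluated directly; the parameters below are illustrative (roughly metal-like) and are not taken from this work. The divergence of |eps| at low frequency is what makes a metallic PC behave like a perfect electric conductor in that limit:

```python
import numpy as np

# Drude model: eps(w) = eps_inf - wp^2 / (w^2 + i*gamma*w).
# Illustrative parameters in rad/s (plasma frequency wp, damping gamma).
eps_inf, wp, gamma = 1.0, 1.37e16, 1.0e14

def eps_drude(w):
    return eps_inf - wp**2 / (w**2 + 1j * gamma * w)

# |eps| diverges as w -> 0, so the metal approaches a perfect electric
# conductor, consistent with the low-frequency behavior described above.
for w in (1e13, 1e14, 1e15):
    print(f"{w:.0e}  |eps| = {abs(eps_drude(w)):.3g}")
```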

Relevance: 30.00%

Abstract:

The radiation loss into the escaping light cone of a two-dimensional (2D) photonic crystal slab microcavity can be suppressed by cladding the low-Q slab microcavity with three-dimensional woodpile photonic crystals possessing a complete bandgap, provided the resonance frequency is located inside the complete bandgap. It is confirmed that the hybrid microcavity based on a low-Q, single-defect photonic crystal slab microcavity shows an improved Q factor without affecting the mode volume and modal frequency. Whereas 2D slab microcavities exhibit Q saturation as the number of layers increases, for the analyzed hybrid microcavities with a small gap between the slab and the woodpiles, the Q factor does not saturate.

Relevance: 30.00%

Abstract:

© 2014, Springer-Verlag Berlin Heidelberg. This study assesses the skill of advanced regional climate models (RCMs) in simulating southeastern United States (SE US) summer precipitation and explores the physical mechanisms responsible for the simulation skill at a process level. Analysis of the RCM output for the North American Regional Climate Change Assessment Program indicates that the RCM simulations of summer precipitation show the largest biases and a remarkable spread over the SE US compared to other regions in the contiguous US. The causes of such a spread are investigated by performing simulations using the Weather Research and Forecasting (WRF) model, a next-generation RCM developed by the US National Center for Atmospheric Research. The results show that the simulated biases in SE US summer precipitation are due mainly to the misrepresentation of the modeled North Atlantic subtropical high (NASH) western ridge. In the WRF simulations, the NASH western ridge shifts 7° northwestward when compared to that in the reanalysis ensemble, leading to a dry bias in the simulated summer precipitation according to the relationship between the NASH western ridge and summer precipitation over the southeast. Experiments utilizing the four dimensional data assimilation technique further suggest that the improved representation of the circulation patterns (i.e., wind fields) associated with the NASH western ridge substantially reduces the bias in the simulated SE US summer precipitation. Our analysis of circulation dynamics indicates that the NASH western ridge in the WRF simulations is significantly influenced by the simulated planetary boundary layer (PBL) processes over the Gulf of Mexico. Specifically, a decrease (increase) in the simulated PBL height tends to stabilize (destabilize) the lower troposphere over the Gulf of Mexico, and thus inhibits (favors) the onset and/or development of convection.
Such changes in tropical convection induce a tropical–extratropical teleconnection pattern, which modulates the circulation along the NASH western ridge in the WRF simulations and contributes to the modeled precipitation biases over the SE US. In conclusion, our study demonstrates that the NASH western ridge is an important factor responsible for the RCM skill in simulating SE US summer precipitation. Furthermore, the improvements in the PBL parameterizations for the Gulf of Mexico might help advance RCM skill in representing the NASH western ridge circulation and summer precipitation over the SE US.

Relevance: 30.00%

Abstract:

The recent emergence of human connectome imaging has led to a high demand for angular and spatial resolution in diffusion magnetic resonance imaging (MRI). While there has been significant growth in high angular resolution diffusion imaging, the improvement in spatial resolution is still limited by a number of technical challenges, such as a low signal-to-noise ratio and high motion artifacts. As a result, the benefit of high spatial resolution in whole-brain connectome imaging has not been fully evaluated in vivo. In this brief report, the impact of spatial resolution was assessed in a newly acquired whole-brain three-dimensional diffusion tensor imaging data set with an isotropic spatial resolution of 0.85 mm. It was found that the delineation of short cortical association fibers is drastically improved, as is the definition of fiber pathway endings at the gray/white matter boundary, both of which will help construct a more accurate structural map of the human brain connectome.

Relevance: 30.00%

Abstract:

The intensity and valence of 30 emotion terms, 30 events typical of those emotions, and 30 autobiographical memories cued by those emotions were each rated by different groups of 40 undergraduates. A vector model gave a consistently better account of the data than a circumplex model, both overall and in the absence of high-intensity, neutral valence stimuli. The Positive Activation - Negative Activation (PANA) model could be tested at high levels of activation, where it is identical to the vector model. The results replicated when ratings of arousal were used instead of ratings of intensity for the events and autobiographical memories. A reanalysis of word norms gave further support for the vector and PANA models by demonstrating that neutral valence, high-arousal ratings resulted from the averaging of individual positive and negative valence ratings. Thus, compared to a circumplex model, vector and PANA models provided overall better fits.