897 results for high dimensional geometry


Relevance:

80.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance:

80.00%

Publisher:

Abstract:

Internet users consume online targeted advertising based on information collected about them and voluntarily share personal information in social networks. Sensor information and data from smartphones are collected and used by applications, sometimes in unclear ways. As happens today with smartphones, in the near future sensors will be shipped in all types of connected devices, enabling ubiquitous information gathering from the physical environment and realizing the vision of Ambient Intelligence. The value of gathered data, if not obvious, can be harnessed through data mining techniques and put to use by enabling personalized and tailored services as well as business intelligence practices, fueling the digital economy. However, the ever-expanding gathering and use of information undermines the privacy conceptions of the past. Natural social practices of managing privacy in daily relations are overridden by socially awkward communication tools, service providers struggle with security issues resulting in harmful data leaks, governments use mass surveillance techniques, the incentives of the digital economy threaten consumer privacy, and the advancement of consumer-grade data-gathering technology enables new interpersonal abuses. A wide range of fields attempt to address technology-related privacy problems; however, they vary immensely in terms of assumptions, scope and approach. Privacy of future use cases is typically handled vertically, instead of building upon previous work that can be re-contextualized, while current privacy problems are typically addressed per type in a more focused way. Because significant effort was required to make sense of the relations and structure of privacy-related work, this thesis attempts to transmit a structured view of it. It is multidisciplinary, ranging from cryptography to economics and including distributed systems and information theory, and addresses privacy issues of different natures.
As existing work is framed and discussed, the contributions to the state of the art made in the scope of this thesis are presented. The contributions add to five distinct areas: 1) identity in distributed systems; 2) future context-aware services; 3) event-based context management; 4) low-latency information flow control; 5) high-dimensional dataset anonymity. Finally, having laid out this landscape of privacy-preserving work, the current and future privacy challenges are discussed, considering not only technical but also socio-economic perspectives.

Relevance:

80.00%

Publisher:

Abstract:

The ultimate problem considered in this thesis is modeling a high-dimensional joint distribution over a set of discrete variables. For this purpose, we consider classes of context-specific graphical models, and the main emphasis is on learning the structure of such models from data. Traditional graphical models compactly represent a joint distribution through a factorization justified by statements of conditional independence, which are encoded by a graph structure. Context-specific independence is a natural generalization of conditional independence that only holds in a certain context, specified by the conditioning variables. We introduce context-specific generalizations of both Bayesian networks and Markov networks by including statements of context-specific independence, which can be encoded as a part of the model structures. For the purpose of learning context-specific model structures from data, we derive score functions, based on results from Bayesian statistics, by which the plausibility of a structure is assessed. To identify high-scoring structures, we construct stochastic and deterministic search algorithms designed to exploit the structural decomposition of our score functions. Numerical experiments on synthetic and real-world data show that the increased flexibility of context-specific structures can more accurately emulate the dependence structure among the variables and thereby improve the predictive accuracy of the models.

Relevance:

80.00%

Publisher:

Abstract:

Current practice for analysing functional neuroimaging data is to average the brain signals recorded at multiple sensors or channels on the scalp over time across hundreds of trials or replicates, in order to eliminate noise and enhance the underlying signal of interest. Studies that record brain signals non-invasively using functional neuroimaging techniques such as electroencephalography (EEG) and magnetoencephalography (MEG) generate complex, high-dimensional and noisy data for many subjects at a number of replicates. Single-replicate (or single-trial) analysis of neuroimaging data has gained attention because it allows the features of the signals to be studied at each replicate, without averaging out important features in the data as current methods do. The research here is conducted to systematically develop flexible mixed regression models for single-trial analysis of specific brain activities, using examples from EEG and MEG to illustrate the models. This thesis follows three specific themes: i) artefact correction to estimate the `brain' signal of interest, ii) characterisation of the signals to reduce their dimensions, and iii) model fitting for single trials after accounting for variations between subjects and within subjects (between replicates). The models are developed to establish evidence of two specific neurological phenomena: entrainment of brain signals to an $\alpha$ band of frequencies (8-12 Hz) in an EEG experiment, and dipolar brain activation in the same $\alpha$ frequency band in a MEG study.

Relevance:

80.00%

Publisher:

Abstract:

Recent progress in the technology for single unit recordings has given the neuroscientific community the opportunity to record the spiking activity of large neuronal populations. At the same pace, statistical and mathematical tools were developed to deal with high-dimensional datasets typical of such recordings. A major line of research investigates the functional role of subsets of neurons with significant co-firing behavior: the Hebbian cell assemblies. Here we review three linear methods for the detection of cell assemblies in large neuronal populations that rely on principal and independent component analysis. Based on their performance in spike train simulations, we propose a modified framework that incorporates multiple features of these previous methods. We apply the new framework to actual single unit recordings and show the existence of cell assemblies in the rat hippocampus, which typically oscillate at theta frequencies and couple to different phases of the underlying field rhythm.

Relevance:

80.00%

Publisher:

Abstract:

Personal information is increasingly gathered and used for providing services tailored to user preferences, but the datasets used to provide such functionality can represent serious privacy threats if not appropriately protected. Work in privacy-preserving data publishing has targeted privacy guarantees that protect against record re-identification, by making records indistinguishable, or against sensitive attribute value disclosure, by introducing diversity or noise in the sensitive values. However, most approaches fail in the high-dimensional case, and those that do not fail introduce a utility cost incompatible with tailored recommendation scenarios. This paper aims at a sensible trade-off between privacy and the benefits of tailored recommendations, in the context of privacy-preserving data publishing. We empirically demonstrate that significant privacy improvements can be achieved at a utility cost compatible with tailored recommendation scenarios, using a simple partition-based sanitization method.

Relevance:

80.00%

Publisher:

Abstract:

The size of online image datasets is constantly increasing. For an image dataset with millions of images, image retrieval becomes a seemingly intractable problem for exhaustive similarity search algorithms. Hashing methods, which encode high-dimensional descriptors into compact binary strings, have become very popular because of their high efficiency in search and storage capacity. In the first part, we propose a multimodal retrieval method based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning underlying semantically meaningful abstract features in a multimodal dataset, a probabilistic retrieval model that allows cross-modal queries, and an extension model for relevance feedback. In the second part, we focus on supervised hashing with kernels. We describe a flexible hashing procedure that treats binary codes and pairwise semantic similarity as latent and observed variables, respectively, in a probabilistic model based on Gaussian processes for binary classification. We present a scalable inference algorithm with the sparse pseudo-input Gaussian process (SPGP) model and distributed computing. In the last part, we define an incremental hashing strategy for dynamic databases where new images are added to the databases frequently. The method is based on a two-stage classification framework using binary and multi-class SVMs. The proposed method also enforces balance in binary codes by an imbalance penalty to obtain higher-quality binary codes. We learn hash functions by an efficient algorithm where the NP-hard problem of finding optimal binary codes is solved via cyclic coordinate descent and SVMs are trained in a parallelized incremental manner. For modifications like adding images from an unseen class, we propose an incremental procedure for effective and efficient updates to the previous hash functions.
Experiments on three large-scale image datasets demonstrate that the incremental strategy is capable of efficiently updating hash functions to the same retrieval performance as hashing from scratch.
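
As a rough illustration of what "encoding high-dimensional descriptors into compact binary strings" means, the following minimal sketch uses random-hyperplane (sign) hashing and compares codes by Hamming distance. This is a generic, unsupervised technique chosen for illustration only; it is not the latent-feature, Gaussian-process, or incremental SVM method of the thesis, and the names `make_hyperplanes`, `encode`, and `hamming` are hypothetical.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vector, hyperplanes):
    """Pack sign bits into an int: bit k is 1 if dot(h_k, vector) >= 0."""
    code = 0
    for k, h in enumerate(hyperplanes):
        if sum(a * b for a, b in zip(h, vector)) >= 0.0:
            code |= 1 << k
    return code


def hamming(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


# Illustrative usage with a made-up 8-dimensional descriptor.
planes = make_hyperplanes(n_bits=16, dim=8)
descriptor = [0.3, -1.2, 0.7, 0.0, 2.1, -0.4, 0.9, -0.8]
code = encode(descriptor, planes)
```

Comparing two 16-bit codes costs one XOR and a bit count, which is why binary codes make search and storage cheap; learned hash functions such as those in the abstract aim to make the codes semantically meaningful as well.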

Relevance:

80.00%

Publisher:

Abstract:

Image (video) retrieval is the problem of retrieving images (videos) similar to a query. Images (videos) are represented in an input (feature) space, and similar images (videos) are obtained by finding nearest neighbors in the input representation space. Numerous input representations, in both real-valued and binary spaces, have been proposed for conducting faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is the well-known problem of retrieving images of the same class as the query. In the first part, we address the practical aspects of achieving faster retrieval with binary codes as input representations for the supervised setting, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class, ideally, to a unique binary code. We refer to the binary codes of the images as `Semantic Binary Codes' and the unique code for all same-class images as the `Class Binary Code'. We also propose a new class-based Hamming metric that dramatically reduces the retrieval times for larger databases, where only the Hamming distance to the class binary codes is computed. We also propose a deep semantic binary code model, obtained by replacing the output layer of a popular convolutional neural network (AlexNet) with the class binary codes, and show that the hashing functions learned in this way outperform the state of the art while providing fast retrieval times. In the second part, we also address the problem of supervised retrieval by taking into account the relationship between classes.
For a given query image, we want to retrieve images that preserve the relative order, i.e. we want to retrieve all same-class images first and then the related-class images before different-class images. We learn such relationship-aware binary codes by minimizing the discrepancy between the inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from other supervised binary encoding schemes, as it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take into account the related-class retrieval results and show significant gains over the state of the art. High-dimensional descriptors like Fisher Vectors or Vectors of Locally Aggregated Descriptors have been shown to improve the performance of many computer vision applications, including retrieval. In the third part, we discuss an unsupervised technique for compressing high-dimensional vectors into high-dimensional binary codes to reduce storage complexity. In this approach, we deviate from traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm, which is intractable for compressing high-dimensional vectors. A practical hierarchical model that utilizes divide-and-conquer techniques using the Random Select and Adjust (RSA) procedure to compress such high-dimensional vectors is presented. We show that our proposed high-dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods for higher compression ratios. In the last part of the thesis, we propose a retrieval-based solution to the zero-shot event classification problem, a setting where no training videos are available for the event.
To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute similarity between the query event and the video in the concept space and videos similar to the query event are classified as the videos belonging to the event. We show that we significantly boost the performance using concept features from other modalities.

Relevance:

80.00%

Publisher:

Abstract:

Compressed covariance sensing using quadratic samplers has attracted increasing interest in recent literature. The covariance matrix often plays the role of a sufficient statistic in many signal and information processing tasks. However, owing to the large dimension of the data, it may become necessary to obtain a compressed sketch of the high-dimensional covariance matrix to reduce the associated storage and communication costs. Nested sampling has been proposed in the past as an efficient sub-Nyquist sampling strategy that enables perfect reconstruction of the autocorrelation sequence of Wide-Sense Stationary (WSS) signals, as though it were sampled at the Nyquist rate. The key idea behind nested sampling is to exploit properties of the difference set that naturally arises in the quadratic measurement model associated with covariance compression. In this thesis, we focus on developing novel versions of nested sampling for low-rank Toeplitz covariance estimation and for phase retrieval, where the latter problem finds many applications in high-resolution optical imaging, X-ray crystallography and molecular imaging. The problem of low-rank compressive Toeplitz covariance estimation is first shown to be fundamentally related to that of line spectrum recovery. In the absence of noise, this connection can be exploited to develop a particular kind of sampler, called the Generalized Nested Sampler (GNS), that can achieve optimal compression rates. In the presence of bounded noise, we develop a regularization-free algorithm that provably leads to stable recovery of the high-dimensional Toeplitz matrix from its order-wise minimal sketch acquired using a GNS. In contrast to existing TV-norm and nuclear-norm based reconstruction algorithms, our technique does not use any tuning parameters, which can be of great practical value.
The idea of nested sampling also finds a surprising use in the problem of phase retrieval, which has been of great interest in recent times for its convex formulation via PhaseLift. By using another modified version of nested sampling, namely the Partial Nested Fourier Sampler (PNFS), we show that, with probability one, it is possible to achieve a certain conjectured lower bound on the necessary measurement size. Moreover, for sparse data, an l1-minimization-based algorithm is proposed that can lead to stable phase retrieval using an order-wise minimal number of measurements.
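
The role of the difference set can be made concrete with a short sketch. It assumes the classical two-level nested sampler (an inner level of `n1` unit-spaced samples followed by an outer level of `n2` samples spaced `n1 + 1` apart); this is not the GNS or PNFS of the thesis, and the function names are hypothetical.

```python
def nested_positions(n1, n2):
    """Sample positions of a classical two-level nested sampler (assumed layout)."""
    inner = list(range(1, n1 + 1))                      # 1, 2, ..., n1
    outer = [(n1 + 1) * m for m in range(1, n2 + 1)]    # (n1+1), 2(n1+1), ...
    return inner + outer


def difference_set(positions):
    """All pairwise differences a - b, i.e. the lags at which the
    autocorrelation can be estimated from products of sample pairs."""
    return sorted({a - b for a in positions for b in positions})


positions = nested_positions(3, 3)   # 6 physical samples
lags = difference_set(positions)     # every integer lag from -11 to 11
```

With only `n1 + n2 = 6` samples, the differences cover all `2 * n2 * (n1 + 1) - 1 = 23` consecutive lags, which is what allows the autocorrelation sequence to be recovered as if the signal had been sampled uniformly at the full rate.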

Relevance:

80.00%

Publisher:

Abstract:

In recent years, great effort has been put into the development of new techniques for automatic object classification, driven in part by its impact on applications such as medical imaging and driverless cars. To this end, several mathematical models have been developed, from logistic regression to neural networks. A crucial aspect of these so-called classification algorithms is the use of algebraic tools to represent and approximate the input data. In this thesis, we examine two different models for image classification based on a particular tensor decomposition named the Tensor-Train (TT) decomposition. The use of tensor approaches preserves the multidimensional structure of the data and the neighboring relations among pixels. Furthermore, the Tensor-Train, unlike other tensor decompositions, does not suffer from the curse of dimensionality, making it an extremely powerful strategy when dealing with high-dimensional data. It also allows data compression when combined with truncation strategies that reduce memory requirements without spoiling classification performance. The first model we propose is based on a direct decomposition of the database by means of the TT decomposition to find basis vectors used to classify a new object. The second model is a tensor dictionary learning model based on the TT decomposition, where the terms of the decomposition are estimated using a proximal alternating linearized minimization algorithm with a spectral stepsize.
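
The claim about the curse of dimensionality can be illustrated by counting stored entries. The sketch below assumes the standard TT format, in which an order-d tensor with mode sizes n_1, ..., n_d is stored as d cores of shape r_{k-1} x n_k x r_k with r_0 = r_d = 1; the mode sizes and ranks used are made-up values for illustration, and the function names are hypothetical.

```python
from math import prod


def full_storage(mode_sizes):
    """Entries needed to store the full tensor: n_1 * n_2 * ... * n_d."""
    return prod(mode_sizes)


def tt_storage(mode_sizes, ranks):
    """Entries needed for the TT cores: sum over k of r_{k-1} * n_k * r_k.

    ranks must have len(mode_sizes) + 1 entries with ranks[0] == ranks[-1] == 1.
    """
    if len(ranks) != len(mode_sizes) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("ranks must be [1, r_1, ..., r_{d-1}, 1]")
    return sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(mode_sizes))


# Hypothetical example: an order-10 tensor with mode size 4 and TT-ranks 3.
modes = [4] * 10
ranks = [1] + [3] * 9 + [1]
print(full_storage(modes))       # 1048576 entries (grows exponentially in d)
print(tt_storage(modes, ranks))  # 312 entries (grows linearly in d)
```

The saving holds only when the TT-ranks stay small, which is why the truncation strategies mentioned in the abstract matter in practice.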

Relevance:

80.00%

Publisher:

Abstract:

The thesis deals with the problem of Model Selection (MS) motivated by information and prediction theory, focusing on parametric time series (TS) models. The main contribution of the thesis is the extension to the multivariate case of the Misspecification-Resistant Information Criterion (MRIC), a recently introduced criterion that solves Akaike's original research problem, posed 50 years ago, which led to the definition of the AIC. The importance of MS is witnessed by the huge amount of literature devoted to it and published in scientific journals of many different disciplines. Despite such widespread treatment, the contributions that adopt a mathematically rigorous approach are not so numerous, and one of the aims of this project is to review and assess them. Chapter 2 discusses methodological aspects of MS from information theory. Information criteria (IC) for the i.i.d. setting are surveyed along with their asymptotic properties, together with the cases of small samples, misspecification, and further estimators. Chapter 3 surveys criteria for TS. IC and prediction criteria are considered for: univariate models (AR, ARMA) in the time and frequency domain; parametric multivariate models (VARMA, VAR); nonparametric nonlinear models (NAR); and high-dimensional models. The MRIC answers Akaike's original question on efficient criteria for possibly-misspecified (PM) univariate TS models in multi-step prediction with high-dimensional data and nonlinear models. Chapter 4 extends the MRIC to PM multivariate TS models for multi-step prediction, introducing the Vectorial MRIC (VMRIC). We show that the VMRIC is asymptotically efficient by proving the decomposition of the MSPE matrix and the consistency of its Method-of-Moments Estimator (MoME) for Least Squares multi-step prediction with a univariate regressor. Chapter 5 extends the VMRIC to the general multiple-regressor case by showing that the MSPE matrix decomposition holds, obtaining consistency for its MoME, and proving its efficiency.
The chapter concludes with a digression on the conditions for PM VARX models.

Relevance:

80.00%

Publisher:

Abstract:

The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as a prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given a sample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above, we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need for split-merge reversible jump moves typically associated with poor performance; we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes; and we study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, and extracting the characteristic traits common to all the data groups, and we propose a novel vector autoregressive model to study the growth curves of Singaporean children.
Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit-circle.

Relevance:

80.00%

Publisher:

Abstract:

In this thesis, the viability of the Dynamic Mode Decomposition (DMD) as a technique to analyze and model complex dynamic real-world systems is presented. This method derives, directly from data, computationally efficient reduced-order models (ROMs) which can replace too onerous or unavailable high-fidelity physics-based models. Optimizations and extensions to the standard implementation of the methodology are proposed, investigating diverse case studies related to the decoding of complex flow phenomena. The flexibility of this data-driven technique allows its application to high-fidelity fluid dynamics simulations, as well as time series of real systems observations. The resulting ROMs are tested against two tasks: (i) reduction of the storage requirements of high-fidelity simulations or observations; (ii) interpolation and extrapolation of missing data. The capabilities of DMD can also be exploited to alleviate the cost of onerous studies that require many simulations, such as uncertainty quantification analysis, especially when dealing with complex high-dimensional systems. In this context, a novel approach to address parameter variability issues when modeling systems with space and time-variant response is proposed. Specifically, DMD is merged with another model-reduction technique, namely the Polynomial Chaos Expansion, for uncertainty quantification purposes. Useful guidelines for DMD deployment result from the study, together with the demonstration of its potential to ease diagnosis and scenario analysis when complex flow processes are involved.

Relevance:

60.00%

Publisher:

Abstract:

In a high mobility two-dimensional electron gas (2DEG) realized in a GaAs/Al0.3Ga0.7As quantum well we observe changes in the Shubnikov-de Haas oscillations (SdHO) and in the Hall resistance for different sample geometries. We observe for each sample geometry a strong negative magnetoresistance around zero magnetic field which consists of a peak around zero magnetic field and of a huge magnetoresistance at larger fields. The peak around zero magnetic field is left unchanged for different geometries.

Relevance:

50.00%

Publisher:

Abstract:

The ligands PhL and MeL are obtained by condensing 2-formylpyridine with benzil dihydrazone and diacetyl dihydrazone, respectively, in 2:1 molar proportion. With silver(I), PhL yields a double-stranded dinuclear cationic helicate 1 in which the metal is tetrahedral, but MeL gives a cationic one-dimensional polymeric complex 2 where silver(I) is distorted square planar and the ligand backbone is nearly planar. In both complexes, the metal:ligand ratio is 1:1. Ab initio calculations on the ligands at the HF/6-31+G* level reveal that while PhL strongly prefers a helical conformation, MeL has a natural inclination to remain in a planar conformation. Density functional theory calculations on model silver(I) complexes show that formation of the linear polymer in the case of MeL is also an important factor in imposing the planar geometry of Ag(I) in 2.