970 results for Datasets


Abstract:

Inference for latent feature models is inherently difficult as the inference space grows exponentially with the size of the input data and number of latent features. In this work, we use Kurihara & Welling (2008)'s maximization-expectation framework to perform approximate MAP inference for linear-Gaussian latent feature models with an Indian Buffet Process (IBP) prior. This formulation yields a submodular function of the features that corresponds to a lower bound on the model evidence. By adding a constant to this function, we obtain a nonnegative submodular function that can be maximized via a greedy algorithm that obtains at least a one-third approximation to the optimal solution. Our inference method scales linearly with the size of the input data, and we show the efficacy of our method on the largest datasets currently analyzed using an IBP model.
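The abstract does not name the greedy algorithm it uses. One known procedure with a one-third guarantee for unconstrained maximization of a nonnegative submodular function is deterministic double greedy, and the sketch below assumes that choice. The graph-cut objective and the function names are illustrative stand-ins, not the paper's evidence lower bound.

```python
def double_greedy(ground_set, f):
    """Deterministic double greedy for a nonnegative submodular set function f.

    f takes a frozenset and returns a number. Using this algorithm here is an
    assumption; the abstract only says "a greedy algorithm".
    """
    lower = set()              # grows from the empty set
    upper = set(ground_set)    # shrinks from the full ground set
    for item in ground_set:
        gain_add = f(frozenset(lower | {item})) - f(frozenset(lower))
        gain_remove = f(frozenset(upper - {item})) - f(frozenset(upper))
        if gain_add >= gain_remove:
            lower.add(item)        # keep the item
        else:
            upper.discard(item)    # drop the item
    return lower                   # lower and upper coincide here


def cut_value(edges):
    """Graph cut: a standard nonnegative submodular function (toy objective)."""
    def f(subset):
        return sum(1 for u, v in edges if (u in subset) != (v in subset))
    return f


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a 4-cycle
chosen = double_greedy([0, 1, 2, 3], cut_value(edges))
```

Each item is visited once and triggers a constant number of function evaluations, so the number of evaluations is linear in the size of the ground set.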

Abstract:

Many visual datasets are traditionally used to analyze the performance of different learning techniques. The evaluation is usually done within each dataset, so it is questionable whether such results are a reliable indicator of true generalization ability. We propose here an algorithm to exploit the existing data resources when learning on a new multiclass problem. Our main idea is to identify an image representation that decomposes orthogonally into two subspaces: a part specific to each dataset, and a part generic to, and therefore shared between, all the considered source sets. This allows us to use the generic representation as unbiased reference knowledge for a novel classification task. By casting the method in the multi-view setting, we also make it possible to use different features for different databases. We call the algorithm MUST, Multitask Unaligned Shared knowledge Transfer. Through extensive experiments on five public datasets, we show that MUST consistently improves the cross-dataset generalization performance. © 2013 Springer-Verlag.

Abstract:

This paper presents new methods for computing the step sizes of the subband-adaptive iterative shrinkage-thresholding algorithms proposed by Bayram & Selesnick and Vonesch & Unser. The methods yield tighter wavelet-domain bounds of the system matrix, thus leading to improved convergence speeds. They are directly applicable to non-redundant wavelet bases, and we also adapt them for cases of redundant frames. It turns out that the simplest and most intuitive setting for the step sizes, which ignores subband aliasing, is often satisfactory in practice. We show that our methods can be used to advantage with reweighted least squares penalty functions as well as L1 penalties. We emphasize that the algorithms presented here are suitable for performing inverse filtering on very large datasets, including 3D data, since inversions are applied only to diagonal matrices and fast transforms are used to achieve all matrix-vector products.
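For orientation, here is a minimal sketch of iterative shrinkage-thresholding with a separate step size per coefficient, for an L1 penalty. It does not reproduce the paper's step-size rules: the step sizes are passed in by hand, and the example uses a diagonal system matrix, for which `1 / a_j**2` is the exact choice. All names are illustrative.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal map of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(A, y, lam, steps, iterations=100):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1.

    steps[j] is the step size for coefficient j. In a subband-adaptive scheme
    every coefficient in a subband would share one value; how to pick those
    values safely is the subject of the paper and is not modeled here.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(iterations):
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n_cols))
                    for i in range(n_rows)]
        # Back-projected residual A^T (y - A x), i.e. the negative gradient
        # of the quadratic data term.
        back_projection = [sum(A[i][j] * residual[i] for i in range(n_rows))
                           for j in range(n_cols)]
        x = [soft_threshold(x[j] + steps[j] * back_projection[j], steps[j] * lam)
             for j in range(n_cols)]
    return x


A = [[2.0, 0.0], [0.0, 0.5]]          # toy diagonal system matrix
y = [4.0, 1.0]
x_hat = ista(A, y, lam=0.1, steps=[1 / 4.0, 1 / 0.25])
```

With a single global step size, the weakly scaled second coefficient would need many iterations; with per-coefficient steps both coefficients settle after the first pass, which is the intuition behind adapting the step to each subband.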

Abstract:

Some amount of differential settlement occurs even in the most uniform soil deposit, but it is extremely difficult to estimate because of the natural heterogeneity of the soil. The compression response of the soil and its variability must be characterised in order to estimate the probability of the differential settlement exceeding a certain threshold value. The work presented in this paper introduces a probabilistic framework to address this issue in a rigorous manner, while preserving the format of a typical geotechnical settlement analysis. In order to avoid dealing with different approaches for each category of soil, a simplified unified compression model is used to characterise the nonlinear compression behavior of soils of varying gradation through a single constitutive law. The Bayesian updating rule is used to incorporate information from three different laboratory datasets in the computation of the statistics (estimates of the means and covariance matrix) of the compression model parameters, as well as of the uncertainty inherent in the model.

Abstract:

An easy-to-interpret kinematic quantity measuring the average corotation of material line segments near a point is introduced and applied to vortex identification. At a given point, the vector of average corotation of line segments is defined as the average of the instantaneous local rigid-body rotation over "all planar cross sections" passing through the examined point. The vortex-identification method based on average corotation is a one-parameter, region-type local method sensitive to the axial stretching rate as well as to the inner configuration of the velocity gradient tensor. The method is derived from a well-defined interpretation of the local flow kinematics to determine the "plane of swirling" and is also applicable to compressible and variable-density flows. Practical application to direct numerical simulation datasets includes a hairpin vortex of boundary-layer transition, the reconnection process of two Burgers vortices, a flow around an inclined flat plate, and a flow around a revolving insect wing. The results agree well with some popular local methods and perform better in regions of strong shearing. Copyright © 2013 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.

Abstract:

This work applies a variety of multilinear function factorisation techniques to extract appropriate features or attributes from high dimensional multivariate time series for classification. Recently, a great deal of work has centred around designing time series classifiers using more and more complex feature extraction and machine learning schemes. This paper argues that complex learners and domain specific feature extraction schemes of this type are not necessarily needed for time series classification, as excellent classification results can be obtained by simply applying a number of existing matrix factorisation or linear projection techniques, which are simple and computationally inexpensive. We highlight this using a geometric separability measure and classification accuracies obtained through experiments on four different high dimensional multivariate time series datasets. © 2013 IEEE.

Abstract:

We present Random Partition Kernels, a new class of kernels derived by demonstrating a natural connection between random partitions of objects and kernels between those objects. We show how the construction can be used to create kernels from methods that would not normally be viewed as random partitions, such as Random Forest. To demonstrate the potential of this method, we propose two new kernels, the Random Forest Kernel and the Fast Cluster Kernel, and show that these kernels consistently outperform standard kernels on problems involving real-world datasets. Finally, we show how the form of these kernels lends itself to a natural approximation that is appropriate for certain big data problems, allowing $O(N)$ inference in methods such as Gaussian Processes, Support Vector Machines and Kernel PCA.
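One way to read the connection between random partitions and kernels is to take the kernel value of two objects to be the fraction of sampled partitions that place them in the same cluster. The sketch below assumes that reading. The partition sampler (nearest of a few randomly chosen centers, on one-dimensional points) is a hypothetical stand-in and is not the paper's Random Forest Kernel or Fast Cluster Kernel.

```python
import random


def random_partition(points, n_clusters, rng):
    """Sample a partition: assign each point to the nearest of n_clusters
    centers drawn at random from the points (hypothetical sampler)."""
    centers = rng.sample(points, n_clusters)
    return [min(range(n_clusters), key=lambda c: abs(p - centers[c]))
            for p in points]


def random_partition_kernel(points, n_partitions=200, n_clusters=2, seed=0):
    """Estimate K[i][j] as the fraction of sampled partitions in which
    points i and j share a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_partitions):
        labels = random_partition(points, n_clusters, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[count / n_partitions for count in row] for row in counts]


points = [0.0, 0.1, 0.2, 5.0, 5.1]     # two well-separated toy groups
K = random_partition_kernel(points)
```

Each sampled partition contributes a block-diagonal 0/1 matrix, which is positive semidefinite, so their average is a valid kernel matrix.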

Abstract:

Copyright 2014 by the author(s). We present a nonparametric prior over reversible Markov chains. We use completely random measures, specifically gamma processes, to construct a countably infinite graph with weighted edges. By enforcing symmetry to make the edges undirected we define a prior over random walks on graphs that results in a reversible Markov chain. The resulting prior over infinite transition matrices is closely related to the hierarchical Dirichlet process but enforces reversibility. A reinforcement scheme has recently been proposed with similar properties, but the de Finetti measure is not well characterised. We take the alternative approach of explicitly constructing the mixing measure, which allows more straightforward and efficient inference at the cost of no longer having a closed form predictive distribution. We use our process to construct a reversible infinite HMM which we apply to two real datasets, one from epigenomics and one ion channel recording.

Abstract:

Relative (comparative) attributes are promising for thematic ranking of visual entities, which also aids in recognition tasks. However, attribute rank learning often requires a substantial amount of relational supervision, which is highly tedious and apparently impractical for real-world applications. In this paper, we introduce the Semantic Transform, which, under minimal supervision, adaptively finds a semantic feature space along with a class ordering that is related in the best possible way. Such a semantic space is found for every attribute category. To relate the classes under weak supervision, the class ordering needs to be refined according to a cost function in an iterative procedure. This problem is ideally NP-hard, and we thus propose a constrained search tree formulation for it. Driven by the adaptive semantic feature space representation, our model achieves the best results to date for all of the tasks of relative, absolute and zero-shot classification on two popular datasets. © 2013 IEEE.

Abstract:

McCullagh and Yang (2006) suggest a family of classification algorithms based on Cox processes. We further investigate the log Gaussian variant which has a number of appealing properties. Conditioned on the covariates, the distribution over labels is given by a type of conditional Markov random field. In the supervised case, computation of the predictive probability of a single test point scales linearly with the number of training points and the multiclass generalization is straightforward. We show new links between the supervised method and classical nonparametric methods. We give a detailed analysis of the pairwise graph representable Markov random field, which we use to extend the model to semi-supervised learning problems, and propose an inference method based on graph min-cuts. We give the first experimental analysis on supervised and semi-supervised datasets and show good empirical performance.

Abstract:

According to the research results reported in the past decades, it is well acknowledged that face recognition is not a trivial task. With the development of electronic devices, we are gradually revealing the secret of object recognition in the primate's visual cortex. Therefore, it is time to reconsider face recognition by using biologically inspired features. In this paper, we represent face images by utilizing the C1 units, which correspond to complex cells in the visual cortex, and pool over S1 units by using a maximum operation to retain only the maximum response of each local area of S1 units. The new representation is termed C1 Face. Because C1 Face is naturally a third-order tensor (or a three-dimensional array), we propose three-way discriminative locality alignment (TWDLA), an extension of discriminative locality alignment, which is a top-level discriminative manifold learning-based subspace learning algorithm. TWDLA has the following advantages: (1) it takes third-order tensors as input directly, so the structure information can be well preserved; (2) it models the local geometry over every modality of the input tensors, so the spatial relations of input tensors within a class can be preserved; (3) it maximizes the margin between a tensor and tensors from other classes over each modality, so it performs well for recognition tasks; and (4) it has no undersampling problem. Extensive experiments on the YALE and FERET datasets show that (1) the proposed C1 Face representation can better represent face images than raw pixels and (2) TWDLA can duly preserve both the local geometry and the discriminative information over every modality for recognition.

Abstract:

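Tabular data is the most common form of data in ecological research. Based on an analysis of the characteristics of tabular data and their relationship to metadata, together with issues such as data security and sharing strategies, we propose a design and development scheme for a tabular data management system for ecological research. The study concludes that the metadata of a dataset not only describes the dataset entities but also, to a certain extent, determines their content and number as well as the intrinsic relationships among them; these relationships are precisely the basis for managing tabular data.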

Abstract:

We consider the Randall-Sundrum brane-world model with bulk-brane energy transfer where the Einstein-Hilbert action is modified by curvature correction terms: a four-dimensional scalar curvature from induced gravity on the brane, and a five-dimensional Gauss-Bonnet curvature term. It is remarkable that these curvature terms will not change the dynamics of the brane universe at low energy. Parameterizing the energy transfer and taking the dark radiation term into account, we find that the phantom divide of the equation of state of effective dark energy could be crossed, without the need for any new dark energy components. Fitting the two most reliable and robust SNIa datasets, the 182 Gold dataset and the Supernova Legacy Survey (SNLS), our model indeed shows a slight tendency toward phantom divide crossing for the Gold dataset, but not for the SNLS dataset. Furthermore, combining the recent detection of the SDSS baryon acoustic oscillations (BAO) peak with a lower matter density parameter prior, we find that the SNLS dataset also mildly favors phantom divide crossing.

Abstract:

Steroid derivatives show a complex interaction with P-glycoprotein (Pgp). To determine the essential structural requirements of a series of structurally related and functionally diverse steroids for Pgp-mediated transport or inhibition, a three-dimensional quantitative structure-activity relationship study was performed by comparative similarity index analysis modeling. Twelve models were explored to correlate the physicochemical features of the steroids with their biological functions with Pgp on the basis of substrate and inhibitor datasets. The best predictive model for substrates gave a cross-validated $q^2 = 0.720$, a non-cross-validated $r^2 = 0.998$, a standard error of estimate SEE = 0.012 and F = 257.955; the best predictive model for inhibitors gave $q^2 = 0.536$, $r^2 = 0.950$, SEE = 1.761 and F = 45.800. The predictive ability of all models was validated by a set of compounds that were not included in the training set. The physicochemical similarities and differences of steroids as Pgp substrates and inhibitors, respectively, were analyzed to aid the development of new steroid-like compounds. © 2004 Elsevier B.V. All rights reserved.

Abstract:

A probabilistic neural network, a recently developed and powerful classification tool, was used to distinguish cancer patients from healthy persons according to the levels of nucleosides in human urine. Two datasets (containing 32 and 50 patterns, respectively) were investigated, and the total consistency rate obtained was 100% for dataset 1 and 94% for dataset 2. To evaluate the performance of the probabilistic neural network, linear discriminant analysis and a learning vector quantization network were also applied to the classification problem. The results showed that the predictive ability of the probabilistic neural network is stronger than that of the other methods in this study. Moreover, the recognition rate for dataset 2 can reach 100% when the three methods are combined, which indicates the promising potential of combining different methods for clinical diagnosis. © 2002 Elsevier Science B.V. All rights reserved.
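As a reminder of how a probabilistic neural network classifies, the sketch below implements its usual textbook form: one Gaussian kernel per training pattern, averaged per class, with the query assigned to the class of highest average activation. The smoothing width `sigma`, the two-feature toy patterns and the class names are assumptions for illustration and have no relation to the nucleoside data in the abstract.

```python
import math


def pnn_classify(train, labels, query, sigma=1.0):
    """Probabilistic neural network (Parzen-window) classifier.

    Pattern layer: one Gaussian kernel per training pattern.
    Summation layer: average kernel activation per class.
    Decision layer: class with the largest average activation.
    """
    activation_sum = {}
    pattern_count = {}
    for pattern, label in zip(train, labels):
        sq_dist = sum((a - b) ** 2 for a, b in zip(pattern, query))
        activation = math.exp(-sq_dist / (2 * sigma ** 2))
        activation_sum[label] = activation_sum.get(label, 0.0) + activation
        pattern_count[label] = pattern_count.get(label, 0) + 1
    return max(activation_sum,
               key=lambda label: activation_sum[label] / pattern_count[label])


# Made-up toy patterns with two features each.
train = [[1.0, 1.0], [1.2, 0.9], [3.0, 3.1], [2.9, 3.2]]
labels = ["class_a", "class_a", "class_b", "class_b"]
prediction_near_a = pnn_classify(train, labels, [1.1, 1.0], sigma=0.5)
prediction_near_b = pnn_classify(train, labels, [3.0, 3.0], sigma=0.5)
```

There is no iterative training step: the network stores the training patterns, and `sigma` is the only parameter to tune.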