914 resultados para machine learning algorithms


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Consider the following problem: given sets of unlabeled observations, each set with known label proportions, predict the labels of another set of observations, also with known label proportions. This problem appears in areas like e-commerce, spam filtering and improper content detection. We present consistent estimators which can reconstruct the correct labels with high probability in a uniform convergence sense. Experiments show that our method works well in practice. Copyright 2008 by the author(s)/owner(s).

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structure. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which there are gene expression, copy number variation, methylation and microRNA data. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 \times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 \times 10^{-3}$) but the effect is strengthened by the other 3 data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that showly a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This work applies a variety of multilinear function factorisation techniques to extract appropriate features or attributes from high dimensional multivariate time series for classification. Recently, a great deal of work has centred around designing time series classifiers using more and more complex feature extraction and machine learning schemes. This paper argues that complex learners and domain specific feature extraction schemes of this type are not necessarily needed for time series classification, as excellent classification results can be obtained by simply applying a number of existing matrix factorisation or linear projection techniques, which are simple and computationally inexpensive. We highlight this using a geometric separability measure and classification accuracies obtained though experiments on four different high dimensional multivariate time series datasets. © 2013 IEEE.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Standard forms of density-functional theory (DFT) have good predictive power for many materials, but are not yet fully satisfactory for cluster, solid, and liquid forms of water. Recent work has stressed the importance of DFT errors in describing dispersion, but we note that errors in other parts of the energy may also contribute. We obtain information about the nature of DFT errors by using a many-body separation of the total energy into its 1-body, 2-body, and beyond-2-body components to analyze the deficiencies of the popular PBE and BLYP approximations for the energetics of water clusters and ice structures. The errors of these approximations are computed by using accurate benchmark energies from the coupled-cluster technique of molecular quantum chemistry and from quantum Monte Carlo calculations. The systems studied are isomers of the water hexamer cluster, the crystal structures Ih, II, XV, and VIII of ice, and two clusters extracted from ice VIII. For the binding energies of these systems, we use the machine-learning technique of Gaussian Approximation Potentials to correct successively for 1-body and 2-body errors of the DFT approximations. We find that even after correction for these errors, substantial beyond-2-body errors remain. The characteristics of the 2-body and beyond-2-body errors of PBE are completely different from those of BLYP, but the errors of both approximations disfavor the close approach of non-hydrogen-bonded monomers. We note the possible relevance of our findings to the understanding of liquid water.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We investigate the Student-t process as an alternative to the Gaussian process as a non-parametric prior over functions. We derive closed form expressions for the marginal likelihood and predictive distribution of a Student-t process, by integrating away an inverse Wishart process prior over the co-variance kernel of a Gaussian process model. We show surprising equivalences between different hierarchical Gaussian process models leading to Student-t processes, and derive a new sampling scheme for the inverse Wishart process, which helps elucidate these equivalences. Overall, we show that a Student-t process can retain the attractive properties of a Gaussian process - a nonparamet-ric representation, analytic marginal and predictive distributions, and easy model selection through covariance kernels - but has enhanced flexibility, and predictive covariances that, unlike a Gaussian process, explicitly depend on the values of training observations. We verify empirically that a Student-t process is especially useful in situations where there are changes in covariance structure, or in applications such as Bayesian optimization, where accurate predictive covariances are critical for good performance. These advantages come at no additional computational cost over Gaussian processes.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper demonstrates a novel digital radio distribution system able to transmit not only over optical fibres and coaxial cables but also over twisted pair cables. The digitised RF signal is compressed for maximum transmission efficiency in a way that allows for integral self-learning algorithms to be introduced for multi-service applications. © 2013 IEEE.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Choosing appropriate architectures and regularization strategies of deep networks is crucial to good predictive performance. To shed light on this problem, we analyze the analogous problem of constructing useful priors on compositions of functions. Specifically, we study the deep Gaussian process, a type of infinitely-wide, deep neural network. We show that in standard architectures, the representational capacity of the network tends to capture fewer degrees of freedom as the number of layers increases, retaining only a single degree of freedom in the limit. We propose an alternate network architecture which does not suffer from this pathology. We also examine deep covariance functions, obtained by composing infinitely many feature transforms. Lastly, we characterize the class of models obtained by performing dropout on Gaussian processes.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Copyright 2014 by the author(s). We present a nonparametric prior over reversible Markov chains. We use completely random measures, specifically gamma processes, to construct a countably infinite graph with weighted edges. By enforcing symmetry to make the edges undirected we define a prior over random walks on graphs that results in a reversible Markov chain. The resulting prior over infinite transition matrices is closely related to the hierarchical Dirichlet process but enforces reversibility. A reinforcement scheme has recently been proposed with similar properties, but the de Finetti measure is not well characterised. We take the alternative approach of explicitly constructing the mixing measure, which allows more straightforward and efficient inference at the cost of no longer having a closed form predictive distribution. We use our process to construct a reversible infinite HMM which we apply to two real datasets, one from epigenomics and one ion channel recording.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present and test an extension of slow feature analysis as a novel approach to nonlinear blind source separation. The algorithm relies on temporal correlations and iteratively reconstructs a set of statistically independent sources from arbitrary nonlinear instantaneous mixtures. Simulations show that it is able to invert a complicated nonlinear mixture of two audio signals with a high reliability. The algorithm is based on a mathematical analysis of slow feature analysis for the case of input data that are generated from statistically independent sources. © 2014 Henning Sprekeler, Tiziano Zito and Laurenz Wiskott.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

© 2015 John P. Cunningham and Zoubin Ghahramani. Linear dimensionality reduction methods are a cornerstone of analyzing high dimensional data, due to their simple geometric interpretations and typically attractive computational properties. These methods capture many data features of interest, such as covariance, dynamical structure, correlation between data sets, input-output relationships, and margin between data classes. Methods have been developed with a variety of names and motivations in many fields, and perhaps as a result the connections between all these methods have not been highlighted. Here we survey methods from this disparate literature as optimization programs over matrix manifolds. We discuss principal component analysis, factor analysis, linear multidimensional scaling, Fisher's linear discriminant analysis, canonical correlations analysis, maximum autocorrelation factors, slow feature analysis, sufficient dimensionality reduction, undercomplete independent component analysis, linear regression, distance metric learning, and more. This optimization framework gives insight to some rarely discussed shortcomings of well-known methods, such as the suboptimality of certain eigenvector solutions. Modern techniques for optimization over matrix manifolds enable a generic linear dimensionality reduction solver, which accepts as input data and an objective to be optimized, and returns, as output, an optimal low-dimensional projection of the data. This simple optimization framework further allows straightforward generalizations and novel variants of classical methods, which we demonstrate here by creating an orthogonal-projection canonical correlations analysis. More broadly, this survey and generic solver suggest that linear dimensionality reduction can move toward becoming a blackbox, objective-agnostic numerical technology.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, we constructed a Iris recognition algorithm based on point covering of high-dimensional space and Multi-weighted neuron of point covering of high-dimensional space, and proposed a new method for iris recognition based on point covering theory of high-dimensional space. In this method, irises are trained as "cognition" one class by one class, and it doesn't influence the original recognition knowledge for samples of the new added class. The results of experiments show the rejection rate is 98.9%, the correct cognition rate and the error rate are 95.71% and 3.5% respectively. The experimental results demonstrate that the rejection rate of test samples excluded in the training samples class is very high. It proves the proposed method for iris recognition is effective.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Seismic sensors are widely used to detect moving target in ground sensor networks. Footstep detection is very important for security surveillance and other applications. Because of non-stationary characteristic of seismic signal and complex environment conditions, footstep detection is a very challenging problem. A novel wavelet denoising method based on singular value decomposition is used to solve these problems. The signal-to-noise ratio (SNR) of raw footstep signal is greatly improved using this strategy. The feature extraction method is also discussed after denosing procedure. Comparing, with kurtosis statistic feature, the wavelet energy feature is more promising for seismic footstep detection, especially in a long distance surveillance.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

为有效地刻画和求解军事装备系统的维修规划问题,建立了一个以维修费用和任务能力为目标的约束优化模型,提出了一种求解装备维修规划问题的多目标禁忌搜索算法。模型考虑了维修器材和工时两种费用指标,并在数质量评估的基础上通过二次回归方程来分层评估装备系统的任务能力指标。算法采用两阶段搜索策略,第一阶段从维修数量下限出发,以任务能力为演化目标进行搜索,直至找到一个可行解;第二阶段以任务能力/维修费用比为演化目标进行搜索,不断改善整个非支配解集。实验表明,算法能够求解型号≥500种,数量≥45000的大规模问题,模型和算法求解的质量也在实际应用中得到了验证。