846 results for Kernel Functions
Abstract:
Recent advances in machine learning increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious for human experts to program. The tasks for which such tools are needed arise in many areas, here especially in bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis develops kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take into account the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions.
We also design a fast cross-validation algorithm for regularized least-squares type learning algorithms. Further, we propose an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used efficiently in algorithms.
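As a rough illustration of why cross-validation can be fast for regularized least-squares learners, the sketch below uses the standard closed-form leave-one-out identity for kernel ridge regression. It is not the thesis's algorithm; the function names, the Gaussian kernel, and the parameters `gamma` and `lam` are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    # Pairwise squared Euclidean distances between rows of X and rows of Z.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def rls_fit(K, y, lam):
    # Dual coefficients a that solve (K + lam * I) a = y.
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def rls_loo_predictions(K, y, lam):
    # Hat matrix H = K (K + lam * I)^{-1}; the fitted values are H y.
    H = K @ np.linalg.inv(K + lam * np.eye(K.shape[0]))
    fitted = H @ y
    h = np.diag(H)
    # Leave-one-out prediction for every point, obtained without retraining.
    return (fitted - h * y) / (1.0 - h)
```

A single matrix inversion gives all held-out predictions, whereas naive leave-one-out would retrain the model once per training point.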
Abstract:
The seismic hazard of the Iberian Peninsula is analysed using a nonparametric methodology based on statistical kernel functions; the activity rate is derived from the catalogue data, both its spatial dependence (without a seismogenetic zonation) and its magnitude dependence (without using Gutenberg–Richter's law). The catalogue is that of the Instituto Geográfico Nacional, supplemented with other catalogues around the periphery; the quantification of events has been homogenised and spatially or temporally interrelated events have been suppressed to assume a Poisson process. The activity rate is determined by the kernel function, the bandwidth and the effective periods. The resulting rate is compared with that produced using Gutenberg–Richter statistics and a zoned approach. Three attenuation laws have been employed, one for deep sources and two for shallower events, depending on whether their magnitude was above or below 5. The results are presented as seismic hazard maps for different spectral frequencies and for return periods of 475 and 2475 yr, which allows constructing uniform hazard spectra.
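The idea of deriving a spatially varying activity rate from the catalogue with a kernel, a bandwidth, and effective periods can be sketched as follows. This is a minimal illustration only: the Gaussian kernel, the single fixed bandwidth, the planar coordinates in kilometres, and the function name `activity_rate` are assumptions, not the paper's actual choices.

```python
import numpy as np

def activity_rate(grid_xy, event_xy, effective_years, bandwidth_km):
    # Squared distances from every grid point to every epicentre (planar km).
    diff = grid_xy[:, None, :] - event_xy[None, :, :]
    d2 = np.sum(diff**2, axis=2)
    # Normalised 2-D Gaussian kernel: integrates to 1 over the plane.
    k = np.exp(-d2 / (2.0 * bandwidth_km**2)) / (2.0 * np.pi * bandwidth_km**2)
    # Each event contributes 1 / T_i events per year, spread out by the kernel,
    # so the result is a rate density in events per year per square km.
    return k @ (1.0 / effective_years)
```

Because the kernel is normalised, integrating the returned density over the whole region recovers the sum of the per-event rates `1 / T_i`, so smoothing redistributes activity in space without changing its total.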
Abstract:
The seismic hazard of the Iberian Peninsula is analysed using a nonparametric methodology based on statistical kernel functions; the activity rate is derived from the catalogue data, both its spatial dependence (without a seismogenic zonation) and its magnitude dependence (without using Gutenberg–Richter's relationship). The catalogue is that of the Instituto Geográfico Nacional, supplemented with other catalogues around the periphery; the quantification of events has been homogenised and spatially or temporally interrelated events have been suppressed to assume a Poisson process. The activity rate is determined by the kernel function, the bandwidth and the effective periods. The resulting rate is compared with that produced using Gutenberg–Richter statistics and a zoned approach. Three attenuation relationships have been employed, one for deep sources and two for shallower events, depending on whether their magnitude was above or below 5. The results are presented as seismic hazard maps for different spectral frequencies and for return periods of 475 and 2475 yr, which allows constructing uniform hazard spectra
Abstract:
Objective: We carry out a systematic assessment of a suite of kernel-based learning machines applied to the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely, Gaussian and exponential radial basis functions) were considered, as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy (i.e., cross-validation error) values delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached a 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations, whereby one can visually inspect their levels of sensitivity to the type of feature and to the kernel function/parameter value.
Conclusions: Overall, the results show that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value as well as the choice of the feature extractor are critical decisions, although the choice of the wavelet family seems to be less relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile has emerged among all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.
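For readers unfamiliar with the two kernel families named above, the following sketch shows one common parametrisation of the Gaussian and exponential radial basis functions with a radius parameter `sigma`. The abstract does not state the exact formulas or the 26 parameter values used, so the definitions and function names here are assumptions.

```python
import numpy as np

def pairwise_dist(X, Z):
    # Euclidean distance between every row of X and every row of Z.
    diff = X[:, None, :] - Z[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=2))

def gaussian_rbf(X, Z, sigma):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-pairwise_dist(X, Z) ** 2 / (2.0 * sigma**2))

def exponential_rbf(X, Z, sigma):
    # k(x, z) = exp(-||x - z|| / (2 sigma^2)); the distance is not squared.
    return np.exp(-pairwise_dist(X, Z) / (2.0 * sigma**2))
```

The two kernels differ only in whether the distance is squared, which changes how quickly similarity decays. That is one reason a classifier's accuracy can be sensitive to both the kernel choice and the radius.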
Abstract:
Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates better theoretical understanding of the problem but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems for predicting individual customer choices from vast amounts of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data, which take advantage of various non-vectorial data representations, and preference learning algorithms suited to different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics.
Training of kernel-based ranking algorithms can be infeasible when the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be efficiently trained with large amounts of data. For situations in which a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
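The difference between predicting an ordering and predicting a value can be made concrete with a pairwise least-squares cost. The sketch below is a generic illustration, not the thesis's algorithm, and all function names are hypothetical. It shows that a cost summed over all pairs of points can still be evaluated in linear time, and it includes a simple pairwise disagreement measure.

```python
import numpy as np

def pairwise_sq_loss_naive(y, f):
    # Sum over all pairs i < j of ((y_i - y_j) - (f_i - f_j))^2, in O(n^2).
    d = y - f
    n = len(d)
    return sum((d[i] - d[j]) ** 2 for i in range(n) for j in range(i + 1, n))

def pairwise_sq_loss_fast(y, f):
    # Same quantity in O(n), using sum_{i<j} (d_i - d_j)^2 = n * sum(d^2) - (sum d)^2.
    d = y - f
    return len(d) * np.sum(d**2) - np.sum(d) ** 2

def pairwise_disagreement(y, f):
    # Fraction of pairs with different true scores that f orders wrongly;
    # a tie in the predictions counts as half an error.
    wrong, total = 0.0, 0
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue
            total += 1
            s = (y[i] - y[j]) * (f[i] - f[j])
            if s < 0:
                wrong += 1.0
            elif s == 0:
                wrong += 0.5
    return wrong / total if total else 0.0
```

Both losses depend only on differences between predictions, so shifting every prediction by the same constant leaves them unchanged. An ordinary regression loss does not have this property.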
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional machine learning methods deal with vectorial information, they require an a priori preprocessing step. Among the learning techniques for dealing with structured data, kernel methods are recognized as having a strong theoretical background and being effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data, two issues become relevant: a kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function that behaves like the nearest-neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of the kernel in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. The first contribution aims at creating kernel functions that adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions.
Specifically, we propose to encode the input trees with an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matchings between different inputs in the original space. The second contribution is a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. The third contribution is devoted to reducing the computational burden of calculating a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, directed acyclic graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
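A toy version of the subtree kernel mentioned above helps show both the counting idea behind convolution kernels and the sparsity problem. This is a minimal sketch under stated assumptions, not the thesis's method: trees are ordered and encoded as nested tuples `(label, child, child, ...)`, and the function names are hypothetical.

```python
from collections import Counter

def subtrees(tree):
    # Yield the complete subtree rooted at every node of the tree.
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def subtree_kernel(t1, t2):
    # Count pairs of nodes (one per tree) whose complete subtrees are identical.
    c1 = Counter(subtrees(t1))
    c2 = Counter(subtrees(t2))
    return sum(c1[s] * c2[s] for s in c1.keys() & c2.keys())

# Leaves are 1-tuples holding only a label.
t1 = ("a", ("b",), ("c",))
t2 = ("a", ("b",), ("d",))
t3 = ("x", ("y",), ("z",))
```

Here `subtree_kernel(t1, t2)` counts only the shared leaf `("b",)`, and `subtree_kernel(t1, t3)` is zero because no labels coincide. With a large label vocabulary most pairs of trees behave like the second case, which is the sparsity problem the abstract describes.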
Abstract:
This package includes various Mata functions. kern(): various kernel functions; kint(): kernel integral functions; kdel0(): canonical bandwidth of kernel; quantile(): quantile function; median(): median; iqrange(): inter-quartile range; ecdf(): cumulative distribution function; relrank(): grade transformation; ranks(): ranks/cumulative frequencies; freq(): compute frequency counts; histogram(): produce histogram data; mgof(): multinomial goodness-of-fit tests; collapse(): summary statistics by subgroups; _collapse(): summary statistics by subgroups; gini(): Gini coefficient; sample(): draw random sample; srswr(): SRS with replacement; srswor(): SRS without replacement; upswr(): UPS with replacement; upswor(): UPS without replacement; bs(): bootstrap estimation; bs2(): bootstrap estimation; bs_report(): report bootstrap results; jk(): jackknife estimation; jk_report(): report jackknife results; subset(): obtain subsets, one at a time; composition(): obtain compositions, one by one; ncompositions(): determine number of compositions; partition(): obtain partitions, one at a time; npartitionss(): determine number of partitions; rsubset(): draw random subset; rcomposition(): draw random composition; colvar(): variance, by column; meancolvar(): mean and variance, by column; variance0(): population variance; meanvariance0(): mean and population variance; mse(): mean squared error; colmse(): mean squared error, by column; sse(): sum of squared errors; colsse(): sum of squared errors, by column; benford(): Benford distribution; cauchy(): cumulative Cauchy-Lorentz dist.; cauchyden(): Cauchy-Lorentz density; cauchytail(): reverse cumulative Cauchy-Lorentz; invcauchy(): inverse cumulative Cauchy-Lorentz; rbinomial(): generate binomial random numbers; cebinomial(): cond. expect. 
of binomial r.v.; root(): Brent's univariate zero finder; nrroot(): Newton-Raphson zero finder; finvert(): univariate function inverter; integrate_sr(): univariate function integration (Simpson's rule); integrate_38(): univariate function integration (Simpson's 3/8 rule); ipolate(): linear interpolation; polint(): polynomial inter-/extrapolation; plot(): Draw twoway plot; _plot(): Draw twoway plot; panels(): identify nested panel structure; _panels(): identify panel sizes; npanels(): identify number of panels; nunique(): count number of distinct values; nuniqrows(): count number of unique rows; isconstant(): whether matrix is constant; nobs(): number of observations; colrunsum(): running sum of each column; linbin(): linear binning; fastlinbin(): fast linear binning; exactbin(): exact binning; makegrid(): equally spaced grid points; cut(): categorize data vector; posof(): find element in vector; which(): positions of nonzero elements; locate(): search an ordered vector; hunt(): consecutive search; cond(): matrix conditional operator; expand(): duplicate single rows/columns; _expand(): duplicate rows/columns in place; repeat(): duplicate contents as a whole; _repeat(): duplicate contents in place; unorder2(): stable version of unorder(); jumble2(): stable version of jumble(); _jumble2(): stable version of _jumble(); pieces(): break string into pieces; npieces(): count number of pieces; _npieces(): count number of pieces; invtokens(): reverse of tokens(); realofstr(): convert string into real; strexpand(): expand string argument; matlist(): display a (real) matrix; insheet(): read spreadsheet file; infile(): read free-format file; outsheet(): write spreadsheet file; callf(): pass optional args to function; callf_setup(): setup for mm_callf().
Abstract:
Kernel-Functions, Machine Learning, Least Squares, Speech Recognition, Classification, Regression
Abstract:
The front speed of the Neolithic (farmer) spread in Europe decreased as it reached northern latitudes, where the Mesolithic (hunter-gatherer) population density was higher. Here, we describe a reaction-diffusion model with (i) an anisotropic dispersion kernel depending on the Mesolithic population density gradient and (ii) a modified population growth equation. Both effects are related to the space available for the Neolithic population. The model is able to explain the slowdown of the Neolithic front as observed from archaeological data.
Abstract:
Most integrodifference models of biological invasions are based on the nonoverlapping-generations approximation. However, the effect of multiple reproduction events (overlapping generations) on the front speed can be very important, especially for species with a long life span. Only in one-dimensional space has this approximation been relaxed previously, although almost all biological invasions take place in two dimensions. Here we present a model that takes into account the overlapping-generations effect or, more generally, the stage structure of the population, and we analyze the main differences with the corresponding nonoverlapping-generations results.
Abstract:
Bimodal dispersal probability distributions with characteristic distances differing by several orders of magnitude have been derived and favorably compared to observations by Nathan [Nature (London) 418, 409 (2002)]. For such bimodal kernels, we show that two-dimensional molecular dynamics computer simulations are unable to yield accurate front speeds. Analytically, the usual continuous-space random walks (CSRWs) are applied to two dimensions. We also introduce discrete-space random walks and use them to check the CSRW results (because of the inefficiency of the numerical simulations). The physical results reported are shown to predict front speeds high enough to possibly explain Reid's paradox of rapid tree migration. We also show that, for a time-ordered evolution equation, fronts are always slower in two dimensions than in one dimension and that this difference is important both for unimodal and for bimodal kernels
Abstract:
This paper presents a novel image classification scheme for benthic coral reef images that can be applied to both single image and composite mosaic datasets. The proposed method can be configured to the characteristics (e.g., the size of the dataset, number of classes, resolution of the samples, color information availability, class types, etc.) of individual datasets. The proposed method uses completed local binary pattern (CLBP), grey level co-occurrence matrix (GLCM), Gabor filter response, and opponent angle and hue channel color histograms as feature descriptors. For classification, either k-nearest neighbor (KNN), neural network (NN), support vector machine (SVM) or probability density weighted mean distance (PDWMD) is used. The combination of features and classifiers that attains the best results is presented together with the guidelines for selection. The accuracy and efficiency of our proposed method are compared with other state-of-the-art techniques using three benthic and three texture datasets. The proposed method achieves the highest overall classification accuracy of any of the tested methods and has moderate execution time. Finally, the proposed classification scheme is applied to a large-scale image mosaic of the Red Sea to create a completely classified thematic map of the reef benthos
Abstract:
The speed of traveling fronts for a two-dimensional model of a delayed reaction-dispersal process is derived analytically and from molecular dynamics simulations. We show that the one-dimensional (1D) and two-dimensional (2D) versions of a given kernel do not always yield the same speed. It is also shown that the speeds of time-delayed fronts may be higher than those predicted by the corresponding non-delayed models. This result is shown for systems with peaked dispersal kernels, which lead to ballistic transport.