904 resultados para Supervector kernel


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Due to the advances in sensor networks and remote sensing technologies, the acquisition and storage rates of meteorological and climatological data increases every day and ask for novel and efficient processing algorithms. A fundamental problem of data analysis and modeling is the spatial prediction of meteorological variables in complex orography, which serves among others to extended climatological analyses, for the assimilation of data into numerical weather prediction models, for preparing inputs to hydrological models and for real time monitoring and short-term forecasting of weather.In this thesis, a new framework for spatial estimation is proposed by taking advantage of a class of algorithms emerging from the statistical learning theory. Nonparametric kernel-based methods for nonlinear data classification, regression and target detection, known as support vector machines (SVM), are adapted for mapping of meteorological variables in complex orography.With the advent of high resolution digital elevation models, the field of spatial prediction met new horizons. In fact, by exploiting image processing tools along with physical heuristics, an incredible number of terrain features which account for the topographic conditions at multiple spatial scales can be extracted. Such features are highly relevant for the mapping of meteorological variables because they control a considerable part of the spatial variability of meteorological fields in the complex Alpine orography. For instance, patterns of orographic rainfall, wind speed and cold air pools are known to be correlated with particular terrain forms, e.g. convex/concave surfaces and upwind sides of mountain slopes.Kernel-based methods are employed to learn the nonlinear statistical dependence which links the multidimensional space of geographical and topographic explanatory variables to the variable of interest, that is the wind speed as measured at the weather stations or the occurrence of orographic rainfall patterns as extracted from sequences of radar images. Compared to low dimensional models integrating only the geographical coordinates, the proposed framework opens a way to regionalize meteorological variables which are multidimensional in nature and rarely show spatial auto-correlation in the original space making the use of classical geostatistics tangled.The challenges which are explored during the thesis are manifolds. First, the complexity of models is optimized to impose appropriate smoothness properties and reduce the impact of noisy measurements. Secondly, a multiple kernel extension of SVM is considered to select the multiscale features which explain most of the spatial variability of wind speed. Then, SVM target detection methods are implemented to describe the orographic conditions which cause persistent and stationary rainfall patterns. Finally, the optimal splitting of the data is studied to estimate realistic performances and confidence intervals characterizing the uncertainty of predictions.The resulting maps of average wind speeds find applications within renewable resources assessment and opens a route to decrease the temporal scale of analysis to meet hydrological requirements. Furthermore, the maps depicting the susceptibility to orographic rainfall enhancement can be used to improve current radar-based quantitative precipitation estimation and forecasting systems and to generate stochastic ensembles of precipitation fields conditioned upon the orography.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

PURPOSE: The aim of this study was to develop models based on kernel regression and probability estimation in order to predict and map IRC in Switzerland by taking into account all of the following: architectural factors, spatial relationships between the measurements, as well as geological information. METHODS: We looked at about 240,000 IRC measurements carried out in about 150,000 houses. As predictor variables we included: building type, foundation type, year of construction, detector type, geographical coordinates, altitude, temperature and lithology into the kernel estimation models. We developed predictive maps as well as a map of the local probability to exceed 300 Bq/m(3). Additionally, we developed a map of a confidence index in order to estimate the reliability of the probability map. RESULTS: Our models were able to explain 28% of the variations of IRC data. All variables added information to the model. The model estimation revealed a bandwidth for each variable, making it possible to characterize the influence of each variable on the IRC estimation. Furthermore, we assessed the mapping characteristics of kernel estimation overall as well as by municipality. Overall, our model reproduces spatial IRC patterns which were already obtained earlier. On the municipal level, we could show that our model accounts well for IRC trends within municipal boundaries. Finally, we found that different building characteristics result in different IRC maps. Maps corresponding to detached houses with concrete foundations indicate systematically smaller IRC than maps corresponding to farms with earth foundation. CONCLUSIONS: IRC mapping based on kernel estimation is a powerful tool to predict and analyze IRC on a large-scale as well as on a local level. This approach enables to develop tailor-made maps for different architectural elements and measurement conditions and to account at the same time for geological information and spatial relations between IRC measurements.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background Nowadays, combining the different sources of information to improve the biological knowledge available is a challenge in bioinformatics. One of the most powerful methods for integrating heterogeneous data types are kernel-based methods. Kernel-based data integration approaches consist of two basic steps: firstly the right kernel is chosen for each data set; secondly the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables that belong to any dataset. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify those samples with higher/lower values of the variables analyzed. Conclusions The integration of different datasets and the simultaneous representation of samples and variables together give us a better understanding of biological knowledge.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dose kernel convolution (DK) methods have been proposed to speed up absorbed dose calculations in molecular radionuclide therapy. Our aim was to evaluate the impact of tissue density heterogeneities (TDH) on dosimetry when using a DK method and to propose a simple density-correction method. METHODS: This study has been conducted on 3 clinical cases: case 1, non-Hodgkin lymphoma treated with (131)I-tositumomab; case 2, a neuroendocrine tumor treatment simulated with (177)Lu-peptides; and case 3, hepatocellular carcinoma treated with (90)Y-microspheres. Absorbed dose calculations were performed using a direct Monte Carlo approach accounting for TDH (3D-RD), and a DK approach (VoxelDose, or VD). For each individual voxel, the VD absorbed dose, D(VD), calculated assuming uniform density, was corrected for density, giving D(VDd). The average 3D-RD absorbed dose values, D(3DRD), were compared with D(VD) and D(VDd), using the relative difference Δ(VD/3DRD). At the voxel level, density-binned Δ(VD/3DRD) and Δ(VDd/3DRD) were plotted against ρ and fitted with a linear regression. RESULTS: The D(VD) calculations showed a good agreement with D(3DRD). Δ(VD/3DRD) was less than 3.5%, except for the tumor of case 1 (5.9%) and the renal cortex of case 2 (5.6%). At the voxel level, the Δ(VD/3DRD) range was 0%-14% for cases 1 and 2, and -3% to 7% for case 3. All 3 cases showed a linear relationship between voxel bin-averaged Δ(VD/3DRD) and density, ρ: case 1 (Δ = -0.56ρ + 0.62, R(2) = 0.93), case 2 (Δ = -0.91ρ + 0.96, R(2) = 0.99), and case 3 (Δ = -0.69ρ + 0.72, R(2) = 0.91). The density correction improved the agreement of the DK method with the Monte Carlo approach (Δ(VDd/3DRD) < 1.1%), but with a lesser extent for the tumor of case 1 (3.1%). At the voxel level, the Δ(VDd/3DRD) range decreased for the 3 clinical cases (case 1, -1% to 4%; case 2, -0.5% to 1.5%, and -1.5% to 2%). No more linear regression existed for cases 2 and 3, contrary to case 1 (Δ = 0.41ρ - 0.38, R(2) = 0.88) although the slope in case 1 was less pronounced. CONCLUSION: This study shows a small influence of TDH in the abdominal region for 3 representative clinical cases. A simple density-correction method was proposed and improved the comparison in the absorbed dose calculations when using our voxel S value implementation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We prove upper pointwise estimates for the Bergman kernel of the weighted Fock space of entire functions in $L^{2}(e^{-2\phi}) $ where $\phi$ is a subharmonic function with $\Delta\phi$ a doubling measure. We derive estimates for the canonical solution operator to the inhomogeneous Cauchy-Riemann equation and we characterize the compactness of this operator in terms of $\Delta\phi$.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background Nowadays, combining the different sources of information to improve the biological knowledge available is a challenge in bioinformatics. One of the most powerful methods for integrating heterogeneous data types are kernel-based methods. Kernel-based data integration approaches consist of two basic steps: firstly the right kernel is chosen for each data set; secondly the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables that belong to any dataset. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify those samples with higher/lower values of the variables analyzed. Conclusions The integration of different datasets and the simultaneous representation of samples and variables together give us a better understanding of biological knowledge.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Let $Q$ be a suitable real function on $C$. An $n$-Fekete set corresponding to $Q$ is a subset ${Z_{n1}},\dotsb, Z_{nn}}$ of $C$ which maximizes the expression $\Pi^n_i_{

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a new kernel estimation of the cumulative distribution function based on transformation and on bias reducing techniques. We derive the optimal bandwidth that minimises the asymptotic integrated mean squared error. The simulation results show that our proposed kernel estimation improves alternative approaches when the variable has an extreme value distribution with heavy tail and the sample size is small.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Learning of preference relations has recently received significant attention in machine learning community. It is closely related to the classification and regression analysis and can be reduced to these tasks. However, preference learning involves prediction of ordering of the data points rather than prediction of a single numerical value as in case of regression or a class label as in case of classification. Therefore, studying preference relations within a separate framework facilitates not only better theoretical understanding of the problem, but also motivates development of the efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, natural language processing, etc. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by the query. Preference learning methods have been also applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from well founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations and the preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amount of training data, and semi-supervised preference learning. Proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics domain. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computation complexity scales linearly with the number of training data points. We also introduce sparse approximation of the algorithm that can be efficiently trained with large amount of data. For situations when small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, proposed algorithms lead to notably better performance in many preference learning tasks considered.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Machine learning provides tools for automated construction of predictive models in data intensive areas of engineering and science. The family of regularized kernel methods have in the recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is from a set of past observations to learn a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings, examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods, based on this approach, has in the past proven to be challenging. Moreover, it is not clear what techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank, that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results, Part II consists of the five original research articles that are the main contribution of this thesis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Defatted Brazil nut kernel flour, a rich source of high quality proteins, is presently being utilized in the formulation of animal feeds. One of the possible ways to improve its utilization for human consumption is through improvement in its functional properties. In the present study, changes in some of the functional properties of Brazil nut kernel globulin were evaluated after acetylation at 58.6, 66.2 and 75.3% levels. The solubility of acetylated globulin was improved above pH 6.0 but was reduced in the pH range of 3.0-4.0. Water and oil absorption capacity, as well as the viscosity increased with increase in the level of acetylation. Level of modification also influenced the emulsifying capacity: decreased at pH 3.0, but increased at pH 7.0 and 9.0. Highest emulsion activity (approximately 62.2%) was observed at pH 3.0 followed by pH 9.0 and pH 7.0 and least (about 11.8%) at pH 5.0. Emulsion stability also followed similar behavior as that of emulsion activity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The purpose of this study was to investigate and model the water absorption process by corn kernels with different levels of mechanical damage Corn kernels of AG 1510 variety with moisture content of 14.2 (% d.b.) were used. Different mechanical damage levels were indirectly evaluated by electrical conductivity measurements. The absorption process was based on the industrial corn wet milling process, in which the product was soaked with a 0.2% sulfur dioxide (SO2) solution and 0.55% lactic acid (C3H6O3) in distilled water, under controlled temperatures of 40, 50, 60, and 70 ºC and different mechanical damage levels. The Peleg model was used for the analysis and modeling of water absorption process. The conclusion is that the structural changes caused by the mechanical damage to the corn kernels influenced the initial rates of water absorption, which were higher for the most damaged kernels, and they also changed the equilibrium moisture contents of the kernels. The Peleg model was well adjusted to the experimental data presenting satisfactory values for the analyzed statistic parameters for all temperatures regardless of the damage level of the corn kernels.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Solid lipid particles have been investigated by food researchers due to their ability to enhance the incorporation and bioavailability of lipophilic bioactives in aqueous formulations. The objectives of this study were to evaluate the physicochemical stability and digestibility of lipid microparticles produced with tristearin and palm kernel oil. The motivation for conducting this study was the fact that mixing lipids can prevent the expulsion of the bioactive from the lipid core and enhance the digestibility of lipid structures. The lipid microparticles containing different palm kernel oil contents were stable after 60 days of storage according to the particle size and zeta potential data. Their calorimetric behavior indicated that they were composed of a very heterogeneous lipid matrix. Lipid microparticles were stable under various conditions of ionic strength, sugar concentration, temperature, and pH. Digestibility assays indicated no differences in the release of free fatty acids, which was approximately 30% in all analises. The in vitro digestibility tests showed that the amount of palm kernel in the particles did not affect the percentage of lipolysis, probably due to the high amount of surfactants used and/or the solid state of the microparticles.