888 results for Reproducing kernel
Abstract:
In this paper, we develop a data-driven methodology to characterize the likelihood of orographic precipitation enhancement using sequences of weather radar images and a digital elevation model (DEM). Geographical locations with topographic characteristics favorable to enforce repeatable and persistent orographic precipitation such as stationary cells, upslope rainfall enhancement, and repeated convective initiation are detected by analyzing the spatial distribution of a set of precipitation cells extracted from radar imagery. Topographic features such as terrain convexity and gradients computed from the DEM at multiple spatial scales as well as velocity fields estimated from sequences of weather radar images are used as explanatory factors to describe the occurrence of localized precipitation enhancement. The latter is represented as a binary process by defining a threshold on the number of cell occurrences at particular locations. Both two-class and one-class support vector machine classifiers are tested to separate the presumed orographic cells from the nonorographic ones in the space of contributing topographic and flow features. Site-based validation is carried out to estimate realistic generalization skills of the obtained spatial prediction models. Due to the high class separability, the decision function of the classifiers can be interpreted as a likelihood or susceptibility of orographic precipitation enhancement. The developed approach can serve as a basis for refining radar-based quantitative precipitation estimates and short-term forecasts or for generating stochastic precipitation ensembles conditioned on the local topography.
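The binary target and the site-based validation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `>=` comparison, the threshold value and the toy data are all assumptions.

```python
from collections import defaultdict

def label_by_threshold(cell_counts, threshold):
    """Return 1 where the number of cell occurrences reaches the threshold, else 0.
    Using >= rather than > is an assumption."""
    return [1 if c >= threshold else 0 for c in cell_counts]

def site_based_folds(site_ids, n_folds):
    """Assign whole sites to folds so that no site is split between
    training and validation data (hypothetical helper)."""
    sites = sorted(set(site_ids))
    fold_of_site = {s: i % n_folds for i, s in enumerate(sites)}
    folds = defaultdict(list)
    for idx, s in enumerate(site_ids):
        folds[fold_of_site[s]].append(idx)
    return [folds[k] for k in range(n_folds)]

# Toy data: cell occurrences per location and the site each location belongs to.
counts = [0, 7, 2, 12, 5, 1]
sites = ["A", "A", "B", "B", "C", "C"]

labels = label_by_threshold(counts, threshold=5)
folds = site_based_folds(sites, n_folds=3)
```

Holding out whole sites, rather than random samples, keeps spatially correlated neighbours out of the training set, which is why it tends to give more realistic skill estimates.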
Abstract:
Summary: Following recent technological advances, digital image archives have grown in quality and quantity at an unprecedented rate. Despite the enormous possibilities they offer, these advances raise new questions about how to process the masses of data acquired. This question is the basis of this Thesis: problems in processing digital information at very high spatial and/or spectral resolution are addressed using statistical learning approaches, namely kernel methods. This Thesis studies image classification problems, that is, the categorization of pixels into a reduced number of classes reflecting the spectral and contextual properties of the objects they represent. The emphasis is on the efficiency of the algorithms and on their simplicity, so as to increase their potential for adoption by users. Moreover, the challenge of this Thesis is to stay close to the concrete problems of satellite image users without losing sight of the interest of the proposed methods for the machine learning community from which they originate. In this sense, this work is deliberately transdisciplinary, maintaining a strong link between the two sciences in all the developments proposed. Four models are proposed: the first addresses the problem of high dimensionality and data redundancy with a model that optimizes classification performance by adapting to the particularities of the image. This is made possible by a ranking of the variables (the bands) that is optimized together with the base model: in this way, only the variables that are important for solving the problem are used by the classifier.
The lack of labeled information and the uncertainty about its relevance to the problem motivate the next two models, based respectively on active learning and semi-supervised methods: the former improves the quality of a training set through direct interaction between the user and the machine, while the latter uses unlabeled pixels to improve the description of the available data and the robustness of the model. Finally, the last model proposed considers the more theoretical question of structure among the outputs: integrating this source of information, so far never considered in remote sensing, opens new research challenges.
Advanced kernel methods for remote sensing image classification
Devis Tuia
Institut de Géomatique et d'Analyse du Risque
September 2009
Abstract: The technical developments in recent years have brought the quantity and quality of digital information to an unprecedented level, as enormous archives of satellite images are available to users. However, even if these advances open more and more possibilities in the use of digital imagery, they also raise several problems of storage and treatment. The latter is considered in this Thesis: the processing of very high spatial and spectral resolution images is treated with approaches based on data-driven algorithms relying on kernel methods. In particular, the problem of image classification, i.e. the categorization of the image's pixels into a reduced number of classes reflecting spectral and contextual properties, is studied through the different models presented. The emphasis is on algorithmic efficiency and on the simplicity of the proposed approaches, to avoid overly complex models that users would not adopt.
The major challenge of the Thesis is to remain close to concrete remote sensing problems without losing the methodological interest from the machine learning viewpoint: in this sense, this work aims at building a bridge between the machine learning and remote sensing communities, and all the models proposed have been developed keeping in mind the need for such a synergy. Four models are proposed: first, an adaptive model that learns the relevant image features addresses the problem of high dimensionality and collinearity of the image features. This model automatically provides an accurate classifier and a ranking of the relevance of the individual features. The scarcity and unreliability of labeled information were the common root of the second and third models proposed: when confronted with such problems, the user can either construct the labeled set iteratively by direct interaction with the machine or use the unlabeled data to increase the robustness and quality of the description of the data. Both solutions have been explored, resulting in two methodological contributions based respectively on active learning and semisupervised learning. Finally, the more theoretical issue of structured outputs has been considered in the last model, which, by integrating output similarity into a model, opens new challenges and opportunities for remote sensing image processing.
Abstract:
In this paper we show that, in the class of assignment games with dominant diagonal (Solymosi and Raghavan, 2001), the Thompson allocation (which coincides with the tau value) is the unique core point that is maximal with respect to the Lorenz dominance relation, and that it moreover coincides with the solution of Dutta and Ray (1989), also known as the egalitarian solution. Second, by means of a condition stronger than dominant diagonal, we introduce a new class of assignment games in which each agent obtains with his or her optimal partner at least twice what he or she obtains with any other partner. For these assignment games with 2-dominant diagonal, the Thompson allocation is the unique point of the kernel, and therefore the nucleolus.
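The two diagonal conditions can be checked mechanically. The sketch below assumes a square, non-negative assignment matrix whose optimal matching lies on the main diagonal, and reads "k-dominant diagonal" as: every off-diagonal entry `a[i][j]` satisfies `a[i][i] >= k * a[i][j]` and `a[j][j] >= k * a[i][j]`. The function names are hypothetical. The equal split of each diagonal entry is shown only as what the midpoint of the two extreme core allocations would give if, as is understood to hold under a dominant diagonal, each side can obtain its full diagonal entry in the core; the abstract does not state this formula.

```python
def has_k_dominant_diagonal(a, k=1):
    """True if each agent's diagonal (optimal) payoff is at least k times
    what the agent obtains with any other partner. k=1: dominant diagonal,
    k=2: 2-dominant diagonal (under the assumptions in the lead-in)."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            if i != j and (a[i][i] < k * a[i][j] or a[j][j] < k * a[i][j]):
                return False
    return True

def equal_split_of_diagonal(a):
    """Each optimally matched pair (i, i) shares a[i][i] equally.
    Shown as an assumed form of the Thompson allocation in this class."""
    return [(a[i][i] / 2, a[i][i] / 2) for i in range(len(a))]

# Toy matrix: rows are agents on one side, columns agents on the other.
a = [[8, 3, 2],
     [4, 10, 5],
     [1, 6, 12]]

dominant = has_k_dominant_diagonal(a, k=1)
two_dominant = has_k_dominant_diagonal(a, k=2)
split = equal_split_of_diagonal(a)
```

In this toy matrix the entry `a[2][1] = 6` exceeds half of `a[1][1] = 10`, so the diagonal is dominant but not 2-dominant.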
Abstract:
Meiosis in triploids faces the seemingly insuperable difficulty of dividing an odd number of chromosome sets by two. Triploid vertebrates usually circumvent this problem through either asexuality or some forms of hybridogenesis, including meiotic hybridogenesis that involves a reproductive community of different ploidy levels and genome composition. Batura toads (Bufo baturae; 3n = 33 chromosomes), however, present an all-triploid sexual reproduction. This hybrid species has two genome copies carrying a nucleolus-organizing region (NOR+) on chromosome 6, and a third copy without it (NOR-). Males only produce haploid NOR+ sperm, while ova are diploid, containing one NOR+ and one NOR- set. Here, we conduct sibship analyses with co-dominant microsatellite markers so as (i) to confirm the purely clonal and maternal transmission of the NOR- set, and (ii) to demonstrate Mendelian segregation and recombination of the NOR+ sets in both sexes. This new reproductive mode in vertebrates ('pre-equalizing hybrid meiosis') offers an ideal opportunity to study the evolution of non-recombining genomes. Elucidating the mechanisms that allow simultaneous transmission of two genomes, one of Mendelian, the other of clonal inheritance, might shed light on the general processes that regulate meiosis in vertebrates.
Abstract:
The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameter determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.
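The idea of one kernel per feature can be illustrated with a weighted sum of one-dimensional Gaussian kernels. This is only a sketch of the combined kernel that an MKL procedure would work with: the weights and bandwidths below are fixed by hand and hypothetical, whereas MKL would learn them, and the Gaussian form is an assumption.

```python
import math

def rbf_1d(x, y, sigma):
    """Gaussian kernel on a single feature."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def combined_kernel(xa, xb, weights, sigmas):
    """Weighted sum of one RBF kernel per feature.
    A zero weight removes the feature from the model entirely."""
    return sum(w * rbf_1d(a, b, s)
               for a, b, w, s in zip(xa, xb, weights, sigmas))

# Hypothetical weights for three terrain features (e.g. slope, curvature,
# a directional derivative); the third feature is switched off.
weights = [0.7, 0.3, 0.0]
sigmas = [1.0, 2.0, 0.5]

x1 = [0.2, 1.5, 3.0]
x2 = [0.4, 1.0, -2.0]

k12 = combined_kernel(x1, x2, weights, sigmas)
k11 = combined_kernel(x1, x1, weights, sigmas)
# Changing only the zero-weight feature leaves the kernel value unchanged.
k12_changed = combined_kernel(x1, [0.4, 1.0, 99.0], weights, sigmas)
```

Reading the learned weights is what makes such a model interpretable: features or scales with weights near zero contribute nothing to the prediction.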
Abstract:
Due to advances in sensor networks and remote sensing technologies, the acquisition and storage rates of meteorological and climatological data increase every day and call for novel and efficient processing algorithms. A fundamental problem of data analysis and modeling is the spatial prediction of meteorological variables in complex orography, which serves, among other purposes, for extended climatological analyses, for the assimilation of data into numerical weather prediction models, for preparing inputs to hydrological models, and for real-time monitoring and short-term forecasting of weather.
In this thesis, a new framework for spatial estimation is proposed by taking advantage of a class of algorithms emerging from statistical learning theory. Nonparametric kernel-based methods for nonlinear data classification, regression and target detection, known as support vector machines (SVM), are adapted for mapping of meteorological variables in complex orography.
With the advent of high resolution digital elevation models, the field of spatial prediction met new horizons. By exploiting image processing tools along with physical heuristics, a very large number of terrain features that account for the topographic conditions at multiple spatial scales can be extracted. Such features are highly relevant for the mapping of meteorological variables because they control a considerable part of the spatial variability of meteorological fields in the complex Alpine orography. For instance, patterns of orographic rainfall, wind speed and cold air pools are known to be correlated with particular terrain forms, e.g. convex/concave surfaces and upwind sides of mountain slopes.
Kernel-based methods are employed to learn the nonlinear statistical dependence which links the multidimensional space of geographical and topographic explanatory variables to the variable of interest, that is, the wind speed as measured at the weather stations or the occurrence of orographic rainfall patterns as extracted from sequences of radar images. Compared to low dimensional models integrating only the geographical coordinates, the proposed framework opens a way to regionalize meteorological variables which are multidimensional in nature and rarely show spatial auto-correlation in the original space, which makes classical geostatistics difficult to apply.
The challenges explored in the thesis are manifold. First, the complexity of models is optimized to impose appropriate smoothness properties and reduce the impact of noisy measurements. Secondly, a multiple kernel extension of SVM is considered to select the multiscale features which explain most of the spatial variability of wind speed. Then, SVM target detection methods are implemented to describe the orographic conditions which cause persistent and stationary rainfall patterns. Finally, the optimal splitting of the data is studied to estimate realistic performances and confidence intervals characterizing the uncertainty of predictions.
The resulting maps of average wind speeds find applications within renewable resources assessment and open a route to decreasing the temporal scale of analysis to meet hydrological requirements. Furthermore, the maps depicting the susceptibility to orographic rainfall enhancement can be used to improve current radar-based quantitative precipitation estimation and forecasting systems and to generate stochastic ensembles of precipitation fields conditioned upon the orography.
Abstract:
PURPOSE: The aim of this study was to develop models based on kernel regression and probability estimation in order to predict and map indoor radon concentration (IRC) in Switzerland by taking into account all of the following: architectural factors, spatial relationships between the measurements, and geological information. METHODS: We looked at about 240,000 IRC measurements carried out in about 150,000 houses. As predictor variables we included building type, foundation type, year of construction, detector type, geographical coordinates, altitude, temperature and lithology in the kernel estimation models. We developed predictive maps as well as a map of the local probability to exceed 300 Bq/m³. Additionally, we developed a map of a confidence index in order to estimate the reliability of the probability map. RESULTS: Our models were able to explain 28% of the variations of IRC data. All variables added information to the model. The model estimation revealed a bandwidth for each variable, making it possible to characterize the influence of each variable on the IRC estimation. Furthermore, we assessed the mapping characteristics of kernel estimation overall as well as by municipality. Overall, our model reproduces spatial IRC patterns which were already obtained earlier. On the municipal level, we could show that our model accounts well for IRC trends within municipal boundaries. Finally, we found that different building characteristics result in different IRC maps. Maps corresponding to detached houses with concrete foundations indicate systematically smaller IRC than maps corresponding to farms with earth foundations. CONCLUSIONS: IRC mapping based on kernel estimation is a powerful tool to predict and analyze IRC on a large scale as well as on a local level. This approach makes it possible to develop tailor-made maps for different architectural elements and measurement conditions and to account at the same time for geological information and spatial relations between IRC measurements.
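A kernel estimate with one bandwidth per variable, together with a local exceedance probability, might look like the sketch below. It assumes a Nadaraya-Watson estimator with a Gaussian product kernel over continuous predictors only; the abstract does not specify the estimator, categorical predictors such as building type would need a different kernel, and all names and numbers here are illustrative.

```python
import math

def product_kernel(x, xi, bandwidths):
    """Gaussian product kernel with one bandwidth per variable.
    A large bandwidth means the variable has little local influence."""
    return math.exp(-0.5 * sum(((a - b) / h) ** 2
                               for a, b, h in zip(x, xi, bandwidths)))

def kernel_estimates(x, X, y, bandwidths, threshold=300.0):
    """Return the kernel-weighted mean of y at x and the kernel-weighted
    fraction of measurements above the threshold (in Bq/m^3)."""
    w = [product_kernel(x, xi, bandwidths) for xi in X]
    total = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    p_exceed = sum(wi for wi, yi in zip(w, y) if yi > threshold) / total
    return mean, p_exceed

# Synthetic measurements: (easting in km, altitude in m) and concentrations.
X = [[0.0, 400.0], [1.0, 450.0], [5.0, 900.0], [6.0, 950.0]]
y = [80.0, 120.0, 350.0, 420.0]
bandwidths = [2.0, 200.0]  # hypothetical values

mean, p_exceed = kernel_estimates([5.5, 920.0], X, y, bandwidths)
```

The query point lies next to the two high measurements, so nearly all of the kernel weight falls on values above the threshold.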
Abstract:
Background: Combining different sources of information to improve the available biological knowledge is a current challenge in bioinformatics. Kernel-based methods are among the most powerful methods for integrating heterogeneous data types. Kernel-based data integration approaches consist of two basic steps: first, the right kernel is chosen for each data set; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results: We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables that belong to any dataset. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify those samples with higher/lower values of the variables analyzed. Conclusions: The integration of different datasets and the simultaneous representation of samples and variables together give us a better understanding of biological knowledge.
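The two integration steps, followed by the centering that kernel PCA requires before its eigendecomposition, can be sketched as follows. The linear kernels, the equal weights and the toy data are assumptions made for illustration, and the eigendecomposition itself is left out.

```python
def linear_kernel_matrix(X):
    """Gram matrix of inner products between samples of one data source."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def combine(kernels, weights):
    """Step 2: weighted sum of the per-source kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)] for i in range(n)]

def center(K):
    """Double centering, i.e. centering the samples in feature space."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)]
            for i in range(n)]

# Two hypothetical data sources describing the same three samples.
source_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
source_b = [[2.0], [0.0], [1.0]]

# Step 1: one kernel per source. Step 2: combine them.
K = combine([linear_kernel_matrix(source_a), linear_kernel_matrix(source_b)],
            weights=[0.5, 0.5])
Kc = center(K)
```

The eigenvectors of `Kc` would then give the sample coordinates on the kernel principal components.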
Abstract:
Dose kernel convolution (DK) methods have been proposed to speed up absorbed dose calculations in molecular radionuclide therapy. Our aim was to evaluate the impact of tissue density heterogeneities (TDH) on dosimetry when using a DK method and to propose a simple density-correction method. METHODS: This study was conducted on 3 clinical cases: case 1, non-Hodgkin lymphoma treated with ¹³¹I-tositumomab; case 2, a neuroendocrine tumor treatment simulated with ¹⁷⁷Lu-peptides; and case 3, hepatocellular carcinoma treated with ⁹⁰Y-microspheres. Absorbed dose calculations were performed using a direct Monte Carlo approach accounting for TDH (3D-RD), and a DK approach (VoxelDose, or VD). For each individual voxel, the VD absorbed dose, D(VD), calculated assuming uniform density, was corrected for density, giving D(VDd). The average 3D-RD absorbed dose values, D(3DRD), were compared with D(VD) and D(VDd), using the relative difference Δ(VD/3DRD). At the voxel level, density-binned Δ(VD/3DRD) and Δ(VDd/3DRD) were plotted against ρ and fitted with a linear regression. RESULTS: The D(VD) calculations showed a good agreement with D(3DRD). Δ(VD/3DRD) was less than 3.5%, except for the tumor of case 1 (5.9%) and the renal cortex of case 2 (5.6%). At the voxel level, the Δ(VD/3DRD) range was 0%-14% for cases 1 and 2, and -3% to 7% for case 3. All 3 cases showed a linear relationship between voxel bin-averaged Δ(VD/3DRD) and density, ρ: case 1 (Δ = -0.56ρ + 0.62, R² = 0.93), case 2 (Δ = -0.91ρ + 0.96, R² = 0.99), and case 3 (Δ = -0.69ρ + 0.72, R² = 0.91). The density correction improved the agreement of the DK method with the Monte Carlo approach (Δ(VDd/3DRD) < 1.1%), but to a lesser extent for the tumor of case 1 (3.1%). At the voxel level, the Δ(VDd/3DRD) range decreased for the 3 clinical cases (case 1, -1% to 4%; case 2, -0.5% to 1.5%; and case 3, -1.5% to 2%).
A linear relationship no longer existed for cases 2 and 3, in contrast to case 1 (Δ = 0.41ρ - 0.38, R² = 0.88), although the slope in case 1 was less pronounced. CONCLUSION: This study shows a small influence of TDH in the abdominal region for 3 representative clinical cases. A simple density-correction method was proposed and improved the comparison in the absorbed dose calculations when using our voxel S value implementation.
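The voxel-level analysis, a relative difference regressed on density, can be sketched with ordinary least squares. The definition Δ = (D_test - D_ref) / D_ref is an assumption, since the abstract does not spell it out, and the data points below are synthetic: they are placed exactly on the line reported for case 1 only to show that the fit recovers a slope and intercept.

```python
def relative_difference(d_test, d_ref):
    """Assumed definition of the relative difference between two doses."""
    return (d_test - d_ref) / d_ref

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, with R^2."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

# Synthetic density bins and differences lying on the reported case 1 line.
rho = [0.3, 0.6, 0.9, 1.0, 1.2]
delta = [-0.56 * r + 0.62 for r in rho]

slope, intercept, r2 = linear_fit(rho, delta)
```

With measured data the points would scatter around the line and R² would fall below 1, as in the reported fits.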
Abstract:
The objective of this work was to obtain organic compounds similar to the ones found in the organic matter of anthropogenic dark earth of Amazonia (ADE) using a chemical functionalization procedure on activated charcoal, as well as to determine their ecotoxicity. Based on the study of the organic matter from ADE, an organic model was proposed and an attempt to reproduce it was described. Activated charcoal was oxidized with the use of sodium hypochlorite at different concentrations. Nuclear magnetic resonance was performed to verify if the spectra of the obtained products were similar to the ones of humic acids from ADE. The similarity between spectra indicated that the obtained products were polycondensed aromatic structures with carboxyl groups: a soil amendment that can contribute to soil fertility and to its sustainable use. An ecotoxicological test with Daphnia similis was performed on the more soluble fraction (fulvic acids) of the produced soil amendment. Aryl chloride was formed during the synthesis of the organic compounds from activated charcoal functionalization and partially removed through a purification process. However, it is probable that some aryl chloride remained in the final product, since the ecotoxicological test indicated that the chemically functionalized soil amendment is moderately toxic.
Abstract:
We prove upper pointwise estimates for the Bergman kernel of the weighted Fock space of entire functions in $L^{2}(e^{-2\phi}) $ where $\phi$ is a subharmonic function with $\Delta\phi$ a doubling measure. We derive estimates for the canonical solution operator to the inhomogeneous Cauchy-Riemann equation and we characterize the compactness of this operator in terms of $\Delta\phi$.
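For orientation, the Bergman kernel is the reproducing kernel of the weighted space. Writing $\mathcal{F}^2_\phi$ for the space of entire functions in $L^{2}(e^{-2\phi})$, $K_\phi$ for its kernel and $dm$ for area measure (this notation is an assumption, not taken from the abstract), the reproducing property reads:

```latex
\[
  f(z) \;=\; \int_{\mathbb{C}} f(w)\, K_\phi(z, w)\, e^{-2\phi(w)}\, dm(w),
  \qquad f \in \mathcal{F}^2_\phi,\ z \in \mathbb{C},
\]
% K_\phi(z, w) is holomorphic in z and anti-holomorphic in w.
% Classical example: \phi(z) = |z|^2 / 2, for which \Delta\phi is constant
% (hence doubling) and
\[
  K_\phi(z, w) \;=\; \frac{1}{\pi}\, e^{z \bar{w}} .
\]
```

In the classical case the kernel is known in closed form, and pointwise upper estimates of the kind announced above take its place for a general subharmonic $\phi$.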
Abstract:
Let $Q$ be a suitable real function on $C$. An $n$-Fekete set corresponding to $Q$ is a subset $\{Z_{n1},\dotsc,Z_{nn}\}$ of $C$ which maximizes the expression $\Pi^n_i_{
Abstract:
Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious for human experts to program. The tasks for which such tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions.
We also design a fast cross-validation algorithm for regularized least-squares type learning algorithms. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used efficiently in algorithms.
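Why cross-validation can be fast for regularized least squares is easiest to see in the leave-one-out case. The sketch below uses the well-known closed-form shortcut for kernel regularized least squares, which may differ from the algorithm developed in the thesis. With G = (K + λI)⁻¹ and a = G y, the leave-one-out prediction for sample i is y_i - a_i / G_ii, so a single training run suffices. The Gaussian kernel, the value of λ and the data are assumptions.

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]

def rls_dual(K, y, lam):
    """Dual coefficients a = (K + lam I)^-1 y, plus the inverse itself."""
    n = len(K)
    G = invert([[K[i][j] + (lam if i == j else 0.0) for j in range(n)]
                for i in range(n)])
    a = [sum(G[i][j] * y[j] for j in range(n)) for i in range(n)]
    return a, G

def loo_fast(K, y, lam):
    """All leave-one-out predictions from one training run."""
    a, G = rls_dual(K, y, lam)
    return [y[i] - a[i] / G[i][i] for i in range(len(y))]

def loo_naive(X, y, lam):
    """Reference: retrain n times, each time without sample i."""
    preds = []
    for i in range(len(X)):
        Xt, yt = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        Kt = [[rbf(u, v) for v in Xt] for u in Xt]
        a, _ = rls_dual(Kt, yt, lam)
        preds.append(sum(aj * rbf(xj, X[i]) for aj, xj in zip(a, Xt)))
    return preds

X = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y = [0.0, 0.4, 0.9, 1.0, 0.8]
lam = 0.1
K = [[rbf(u, v) for v in X] for u in X]

fast = loo_fast(K, y, lam)
naive = loo_naive(X, y, lam)
max_gap = max(abs(f - g) for f, g in zip(fast, naive))
```

The naive loop inverts n matrices while the shortcut inverts one, and the two sets of predictions agree up to rounding error.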