867 resultados para least square-support vector machine


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we propose two active learning algorithms for semiautomatic definition of training samples in remote sensing image classification. Based on predefined heuristics, the classifier ranks the unlabeled pixels and automatically chooses those that are considered the most valuable for its improvement. Once the pixels have been selected, the analyst labels them manually and the process is iterated. Starting with a small and nonoptimal training set, the model itself builds the optimal set of samples which minimizes the classification error. We have applied the proposed algorithms to a variety of remote sensing data, including very high resolution and hyperspectral images, using support vector machines. Experimental results confirm the consistency of the methods. The required number of training samples can be reduced to 10% using the methods proposed, reaching the same level of accuracy as larger data sets. A comparison with a state-of-the-art active learning method, margin sampling, is provided, highlighting advantages of the methods proposed. The effect of spatial resolution and separability of the classes on the quality of the selection of pixels is also discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The development of statistical models for forensic fingerprint identification purposes has been the subject of increasing research attention in recent years. This can be partly seen as a response to a number of commentators who claim that the scientific basis for fingerprint identification has not been adequately demonstrated. In addition, key forensic identification bodies such as ENFSI [1] and IAI [2] have recently endorsed and acknowledged the potential benefits of using statistical models as an important tool in support of the fingerprint identification process within the ACE-V framework. In this paper, we introduce a new Likelihood Ratio (LR) model based on Support Vector Machines (SVMs) trained with features discovered via morphometric and spatial analyses of corresponding minutiae configurations for both match and close non-match populations often found in AFIS candidate lists. Computed LR values are derived from a probabilistic framework based on SVMs that discover the intrinsic spatial differences of match and close non-match populations. Lastly, experimentation performed on a set of over 120,000 publicly available fingerprint images (mostly sourced from the National Institute of Standards and Technology (NIST) datasets) and a distortion set of approximately 40,000 images, is presented, illustrating that the proposed LR model is reliably guiding towards the right proposition in the identification assessment of match and close non-match populations. Results further indicate that the proposed model is a promising tool for fingerprint practitioners to use for analysing the spatial consistency of corresponding minutiae configurations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The present research deals with an important public health threat, which is the pollution created by radon gas accumulation inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of many influencing factors that should be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. As a multivariate process, it was important at first to define the influence of each factor. In particular, it was important to define the influence of geology as being closely associated to indoor radon. This association was indeed observed for the Swiss data but not probed to be the sole determinant for the spatial modeling. The statistical analysis of data, both at univariate and multivariate level, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering and moving windows methods. The use of Quan-tité Morisita Index (QMI) as a procedure to evaluate data clustering in function of the radon level was proposed. The existing methods of declustering were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase comes along with the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis was done with a top-to-down resolution approach, from regional to local lev¬els in order to find the appropriate scales for modeling. In this sense, data partition was optimized in order to cope with stationary conditions of geostatistical models. Common methods of spatial modeling such as Κ Nearest Neighbors (KNN), variography and General Regression Neural Networks (GRNN) were proposed as exploratory tools. In the following section, different spatial interpolation methods were applied for a par-ticular dataset. A bottom to top method complexity approach was adopted and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation was tested with the purpose of reducing noise at local scale (the CVMF). At the end of the chapter, a series of test for data consistency and methods robustness were performed. This lead to conclude about the importance of data splitting and the limitation of generalization methods for reproducing statistical distributions. The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulations thus allowed the use of multigaussian models and helped take the indoor radon pollution data uncertainty into consideration. The catego-rization transform was presented as a solution for extreme values modeling through clas-sification. Simulation scenarios were proposed, including an alternative proposal for the reproduction of the global histogram based on the sampling domain. The sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed in a more robust way. An error measure was defined in relation to the decision function for data classification hardening. Within the classification methods, probabilistic neural networks (PNN) show to be better adapted for modeling of high threshold categorization and for automation. Support vector machines (SVM) on the contrary performed well under balanced category conditions. In general, it was concluded that a particular prediction or estimation method is not better under all conditions of scale and neighborhood definitions. Simulations should be the basis, while other methods can provide complementary information to accomplish an efficient indoor radon decision making.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the advances in sensor networks and remote sensing technologies, the acquisition and storage rates of meteorological and climatological data increases every day and ask for novel and efficient processing algorithms. A fundamental problem of data analysis and modeling is the spatial prediction of meteorological variables in complex orography, which serves among others to extended climatological analyses, for the assimilation of data into numerical weather prediction models, for preparing inputs to hydrological models and for real time monitoring and short-term forecasting of weather.In this thesis, a new framework for spatial estimation is proposed by taking advantage of a class of algorithms emerging from the statistical learning theory. Nonparametric kernel-based methods for nonlinear data classification, regression and target detection, known as support vector machines (SVM), are adapted for mapping of meteorological variables in complex orography.With the advent of high resolution digital elevation models, the field of spatial prediction met new horizons. In fact, by exploiting image processing tools along with physical heuristics, an incredible number of terrain features which account for the topographic conditions at multiple spatial scales can be extracted. Such features are highly relevant for the mapping of meteorological variables because they control a considerable part of the spatial variability of meteorological fields in the complex Alpine orography. For instance, patterns of orographic rainfall, wind speed and cold air pools are known to be correlated with particular terrain forms, e.g. convex/concave surfaces and upwind sides of mountain slopes.Kernel-based methods are employed to learn the nonlinear statistical dependence which links the multidimensional space of geographical and topographic explanatory variables to the variable of interest, that is the wind speed as measured at the weather stations or the occurrence of orographic rainfall patterns as extracted from sequences of radar images. Compared to low dimensional models integrating only the geographical coordinates, the proposed framework opens a way to regionalize meteorological variables which are multidimensional in nature and rarely show spatial auto-correlation in the original space making the use of classical geostatistics tangled.The challenges which are explored during the thesis are manifolds. First, the complexity of models is optimized to impose appropriate smoothness properties and reduce the impact of noisy measurements. Secondly, a multiple kernel extension of SVM is considered to select the multiscale features which explain most of the spatial variability of wind speed. Then, SVM target detection methods are implemented to describe the orographic conditions which cause persistent and stationary rainfall patterns. Finally, the optimal splitting of the data is studied to estimate realistic performances and confidence intervals characterizing the uncertainty of predictions.The resulting maps of average wind speeds find applications within renewable resources assessment and opens a route to decrease the temporal scale of analysis to meet hydrological requirements. Furthermore, the maps depicting the susceptibility to orographic rainfall enhancement can be used to improve current radar-based quantitative precipitation estimation and forecasting systems and to generate stochastic ensembles of precipitation fields conditioned upon the orography.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this work was to investigate heterosis and its components in 16 white grain maize populations presenting high quality protein. These populations were divided according to grain type in order to establish different heterosis groups. The crosses were carried out according to a partial diallel cross design among flint and dent populations. Seven agronomic traits were evaluated in three environments while four leaf diseases and incidence of corn stunt were evaluated in one. Least square procedure was applied to the normal equation X'Xbeta = X'Y, to estimate the model effects and their respective sum of squares. Among the heterosis components, in diallel analysis, significance for average heterosis in grain yield, number of days to female flowering and to all evaluated diseases was detected. Specific heterosis was significant for days to female flowering and resistance to Puccinia polysora. Results concerned to grain yield trait indicate that populations with superior performance in dent group, no matter what flint population group is used in crosses, tend to generate superior intervarietal hybrids. In decreasing order of preference, the dent type populations CMS 476, ZQP/B 103 and ZQP/B 101 and the flint type CMS 461, CMS 460, ZQP/B 104 and ZQP/B 102 are recommended to form composites.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This letter presents advanced classification methods for very high resolution images. Efficient multisource information, both spectral and spatial, is exploited through the use of composite kernels in support vector machines. Weighted summations of kernels accounting for separate sources of spectral and spatial information are analyzed and compared to classical approaches such as pure spectral classification or stacked approaches using all the features in a single vector. Model selection problems are addressed, as well as the importance of the different kernels in the weighted summation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Raman spectroscopy combined with chemometrics has recently become a widespread technique for the analysis of pharmaceutical solid forms. The application presented in this paper is the investigation of counterfeit medicines. This increasingly serious issue involves networks that are an integral part of industrialized organized crime. Efficient analytical tools are consequently required to fight against it. Quick and reliable authentication means are needed to allow the deployment of measures from the company and the authorities. For this purpose a method in two steps has been implemented here. The first step enables the identification of pharmaceutical tablets and capsules and the detection of their counterfeits. A nonlinear classification method, the Support Vector Machines (SVM), is computed together with a correlation with the database and the detection of Active Pharmaceutical Ingredient (API) peaks in the suspect product. If a counterfeit is detected, the second step allows its chemical profiling among former counterfeits in a forensic intelligence perspective. For this second step a classification based on Principal Component Analysis (PCA) and correlation distance measurements is applied to the Raman spectra of the counterfeits.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, mixed spectral-structural kernel machines are proposed for the classification of very-high resolution images. The simultaneous use of multispectral and structural features (computed using morphological filters) allows a significant increase in classification accuracy of remote sensing images. Subsequently, weighted summation kernel support vector machines are proposed and applied in order to take into account the multiscale nature of the scene considered. Such classifiers use the Mercer property of kernel matrices to compute a new kernel matrix accounting simultaneously for two scale parameters. Tests on a Zurich QuickBird image show the relevance of the proposed method : using the mixed spectral-structural features, the classification accuracy increases of about 5%, achieving a Kappa index of 0.97. The multikernel approach proposed provide an overall accuracy of 98.90% with related Kappa index of 0.985.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis different parameters influencing critical flux in protein ultrafiltration and membrane foul-ing were studied. Short reviews of proteins, cross-flow ultrafiltration, flux decline and criticalflux and the basic theory of Partial Least Square analysis (PLS) are given at the beginning. The experiments were mainly performed using dilute solutions of globular proteins, commercial polymeric membranes and laboratory scale apparatuses. Fouling was studied by flux, streaming potential and FTIR-ATR measurements. Critical flux was evaluated by different kinds of stepwise procedures and by both con-stant pressure and constant flux methods. The critical flux was affected by transmembrane pressure, flow velocity, protein concentration, mem-brane hydrophobicity and protein and membrane charges. Generally, the lowest critical fluxes were obtained at the isoelectric points of the protein and the highest in the presence of electrostatic repulsion between the membrane surface and the protein molecules. In the laminar flow regime the critical flux increased with flow velocity, but not any more above this region. An increase in concentration de-creased the critical flux. Hydrophobic membranes showed fouling in all charge conditionsand, furthermore, especially at the beginning of the experiment even at very low transmembrane pressures. Fouling of these membranes was thought to be due to protein adsorption by hydrophobic interactions. The hydrophilic membranes used suffered more from reversible fouling and concentration polarisation than from irreversible foul-ing. They became fouled at higher transmembrane pressures becauseof pore blocking. In this thesis some new aspects on critical flux are presented that are important for ultrafiltration and fractionation of proteins.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Metastatic melanomas are frequently refractory to most adjuvant therapies such as chemotherapies and radiotherapies. Recently, immunotherapies have shown good results in the treatment of some metastatic melanomas. Immune cell infiltration in the tumor has been associated with successful immunotherapy. More generally, tumor infiltrating lymphocytes (TILs) in the primary tumor and in metastases of melanoma patients have been demonstrated to correlate positively with favorable clinical outcomes. Altogether, these findings suggest the importance of being able to identify, quantify and characterize immune infiltration at the tumor site for a better diagnostic and treatment choice. In this paper, we used Fourier Transform Infrared (FTIR) imaging to identify and quantify different subpopulations of T cells: the cytotoxic T cells (CD8+), the helper T cells (CD4+) and the regulatory T cells (T reg). As a proof of concept, we investigated pure populations isolated from human peripheral blood from 6 healthy donors. These subpopulations were isolated from blood samples by magnetic labeling and purities were assessed by Fluorescence Activated Cell Sorting (FACS). The results presented here show that Fourier Transform Infrared (FTIR) imaging followed by supervised Partial Least Square Discriminant Analysis (PLS-DA) allows an accurate identification of CD4+ T cells and CD8+ T cells (>86%). We then developed a PLS regression allowing the quantification of T reg in a different mix of immune cells (e.g. Peripheral Blood Mononuclear Cells (PBMCs)). Altogether, these results demonstrate the sensitivity of infrared imaging to detect the low biological variability observed in T cell subpopulations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sähkönkulutuksen lyhyen aikavälin ennustamista on tutkittu jo pitkään. Pohjoismaisien sähkömarkkinoiden vapautuminen on vaikuttanut sähkönkulutuksen ennustamiseen. Aluksi työssä perehdyttiin aiheeseen liittyvään kirjallisuuteen. Sähkönkulutuksen käyttäytymistä tutkittiin eri aikoina. Lämpötila tilastojen käyttökelpoisuutta arvioitiin sähkönkulutusennustetta ajatellen. Kulutus ennusteet tehtiin tunneittain ja ennustejaksona käytettiin yhtä viikkoa. Työssä tutkittiin sähkönkulutuksen- ja lämpötiladatan saatavuutta ja laatua Nord Poolin markkina-alueelta. Syötettävien tietojen ominaisuudet vaikuttavat tunnittaiseen sähkönkulutuksen ennustamiseen. Sähkönkulutuksen ennustamista varten mallinnettiin kaksi lähestymistapaa. Testattavina malleina käytettiin regressiomallia ja autoregressiivistä mallia (autoregressive model, ARX). Mallien parametrit estimoitiin pienimmän neliösumman menetelmällä. Tulokset osoittavat että kulutus- ja lämpötiladata on tarkastettava jälkikäteen koska reaaliaikaisen syötetietojen laatu on huonoa. Lämpötila vaikuttaa kulutukseen talvella, mutta se voidaan jättää huomiotta kesäkaudella. Regressiomalli on vakaampi kuin ARX malli. Regressiomallin virhetermi voidaan mallintaa aikasarjamallia hyväksikäyttäen.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this study is to confirm the factorial structure of the Identification-Commitment Inventory (ICI) developed within the frame of the Human System Audit (HSA) (Quijano et al. in Revist Psicol Soc Apl 10(2):27-61, 2000; Pap Psicól Revist Col Of Psicó 29:92-106, 2008). Commitment and identification are understood by the HSA at an individual level as part of the quality of human processes and resources in an organization; and therefore as antecedents of important organizational outcomes, such as personnel turnover intentions, organizational citizenship behavior, etc. (Meyer et al. in J Org Behav 27:665-683, 2006). The theoretical integrative model which underlies ICI Quijano et al. (2000) was tested in a sample (N = 625) of workers in a Spanish public hospital. Confirmatory factor analysis through structural equation modeling was performed. Elliptical least square solution was chosen as estimator procedure on account of non-normal distribution of the variables. The results confirm the goodness of fit of an integrative model, which underlies the relation between Commitment and Identification, although each one is operatively different.