161 resultados para correlation-based feature selection

em Université de Lausanne, Switzerland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A combined strategy based on the computation of absorption energies, using the ZINDO/S semiempirical method, for a statistically relevant number of thermally sampled configurations extracted from QM/MM trajectories is used to establish a one-to-one correspondence between the structures of the different early intermediates (dark, batho, BSI, lumi) involved in the initial steps of the rhodopsin photoactivation mechanism and their optical spectra. A systematic analysis of the results based on a correlation-based feature selection algorithm shows that the origin of the color shifts among these intermediates can be mainly ascribed to alterations in intrinsic properties of the chromophore structure, which are tuned by several residues located in the protein binding pocket. In addition to the expected electrostatic and dipolar effects caused by the charged residues (Glu113, Glu181) and to strong hydrogen bonding with Glu113, other interactions such as π-stacking with Ala117 and Thr118 backbone atoms, van der Waals contacts with Gly114 and Ala292, and CH/π weak interactions with Tyr268, Ala117, Thr118, and Ser186 side chains are found to make non-negligible contributions to the modulation of the color tuning among the different rhodopsin photointermediates.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The analysis of multi-modal and multi-sensor images is nowadays of paramount importance for Earth Observation (EO) applications. There exist a variety of methods that aim at fusing the different sources of information to obtain a compact representation of such datasets. However, for change detection existing methods are often unable to deal with heterogeneous image sources and very few consider possible nonlinearities in the data. Additionally, the availability of labeled information is very limited in change detection applications. For these reasons, we present the use of a semi-supervised kernel-based feature extraction technique. It incorporates a manifold regularization accounting for the geometric distribution and jointly addressing the small sample problem. An exhaustive example using Landsat 5 data illustrates the potential of the method for multi-sensor change detection.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Difficult tracheal intubation assessment is an important research topic in anesthesia as failed intubations are important causes of mortality in anesthetic practice. The modified Mallampati score is widely used, alone or in conjunction with other criteria, to predict the difficulty of intubation. This work presents an automatic method to assess the modified Mallampati score from an image of a patient with the mouth wide open. For this purpose we propose an active appearance models (AAM) based method and use linear support vector machines (SVM) to select a subset of relevant features obtained using the AAM. This feature selection step proves to be essential as it improves drastically the performance of classification, which is obtained using SVM with RBF kernel and majority voting. We test our method on images of 100 patients undergoing elective surgery and achieve 97.9% accuracy in the leave-one-out crossvalidation test and provide a key element to an automatic difficult intubation assessment system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we study the relevance of multiple kernel learning (MKL) for the automatic selection of time series inputs. Recently, MKL has gained great attention in the machine learning community due to its flexibility in modelling complex patterns and performing feature selection. In general, MKL constructs the kernel as a weighted linear combination of basis kernels, exploiting different sources of information. An efficient algorithm wrapping a Support Vector Regression model for optimizing the MKL weights, named SimpleMKL, is used for the analysis. In this sense, MKL performs feature selection by discarding inputs/kernels with low or null weights. The approach proposed is tested with simulated linear and nonlinear time series (AutoRegressive, Henon and Lorenz series).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis is composed of three main parts. The first consists of a state of the art of the different notions that are significant to understand the elements surrounding art authentication in general, and of signatures in particular, and that the author deemed them necessary to fully grasp the microcosm that makes up this particular market. Individuals with a solid knowledge of the art and expertise area, and that are particularly interested in the present study are advised to advance directly to the fourth Chapter. The expertise of the signature, it's reliability, and the factors impacting the expert's conclusions are brought forward. The final aim of the state of the art is to offer a general list of recommendations based on an exhaustive review of the current literature and given in light of all of the exposed issues. These guidelines are specifically formulated for the expertise of signatures on paintings, but can also be applied to wider themes in the area of signature examination. The second part of this thesis covers the experimental stages of the research. It consists of the method developed to authenticate painted signatures on works of art. This method is articulated around several main objectives: defining measurable features on painted signatures and defining their relevance in order to establish the separation capacities between groups of authentic and simulated signatures. For the first time, numerical analyses of painted signatures have been obtained and are used to attribute their authorship to given artists. An in-depth discussion of the developed method constitutes the third and final part of this study. It evaluates the opportunities and constraints when applied by signature and handwriting experts in forensic science. A brief summary covering each chapter allows a rapid overview of the study and summarizes the aims and main themes of each chapter. These outlines presented below summarize the aims and main themes addressed in each chapter. Part I - Theory Chapter 1 exposes legal aspects surrounding the authentication of works of art by art experts. The definition of what is legally authentic, the quality and types of the experts that can express an opinion concerning the authorship of a specific painting, and standard deontological rules are addressed. The practices applied in Switzerland will be specifically dealt with. Chapter 2 presents an overview of the different scientific analyses that can be carried out on paintings (from the canvas to the top coat). Scientific examinations of works of art have become more common, as more and more museums equip themselves with laboratories, thus an understanding of their role in the art authentication process is vital. The added value that a signature expertise can have in comparison to other scientific techniques is also addressed. Chapter 3 provides a historical overview of the signature on paintings throughout the ages, in order to offer the reader an understanding of the origin of the signature on works of art and its evolution through time. An explanation is given on the transitions that the signature went through from the 15th century on and how it progressively took on its widely known modern form. Both this chapter and chapter 2 are presented to show the reader the rich sources of information that can be provided to describe a painting, and how the signature is one of these sources. Chapter 4 focuses on the different hypotheses the FHE must keep in mind when examining a painted signature, since a number of scenarios can be encountered when dealing with signatures on works of art. The different forms of signatures, as well as the variables that may have an influence on the painted signatures, are also presented. Finally, the current state of knowledge of the examination procedure of signatures in forensic science in general, and in particular for painted signatures, is exposed. The state of the art of the assessment of the authorship of signatures on paintings is established and discussed in light of the theoretical facets mentioned previously. Chapter 5 considers key elements that can have an impact on the FHE during his or her2 examinations. This includes a discussion on elements such as the skill, confidence and competence of an expert, as well as the potential bias effects he might encounter. A better understanding of elements surrounding handwriting examinations, to, in turn, better communicate results and conclusions to an audience, is also undertaken. Chapter 6 reviews the judicial acceptance of signature analysis in Courts and closes the state of the art section of this thesis. This chapter brings forward the current issues pertaining to the appreciation of this expertise by the non- forensic community, and will discuss the increasing number of claims of the unscientific nature of signature authentication. The necessity to aim for more scientific, comprehensive and transparent authentication methods will be discussed. The theoretical part of this thesis is concluded by a series of general recommendations for forensic handwriting examiners in forensic science, specifically for the expertise of signatures on paintings. These recommendations stem from the exhaustive review of the literature and the issues exposed from this review and can also be applied to the traditional examination of signatures (on paper). Part II - Experimental part Chapter 7 describes and defines the sampling, extraction and analysis phases of the research. The sampling stage of artists' signatures and their respective simulations are presented, followed by the steps that were undertaken to extract and determine sets of characteristics, specific to each artist, that describe their signatures. The method is based on a study of five artists and a group of individuals acting as forgers for the sake of this study. Finally, the analysis procedure of these characteristics to assess of the strength of evidence, and based on a Bayesian reasoning process, is presented. Chapter 8 outlines the results concerning both the artist and simulation corpuses after their optical observation, followed by the results of the analysis phase of the research. The feature selection process and the likelihood ratio evaluation are the main themes that are addressed. The discrimination power between both corpuses is illustrated through multivariate analysis. Part III - Discussion Chapter 9 discusses the materials, the methods, and the obtained results of the research. The opportunities, but also constraints and limits, of the developed method are exposed. Future works that can be carried out subsequent to the results of the study are also presented. Chapter 10, the last chapter of this thesis, proposes a strategy to incorporate the model developed in the last chapters into the traditional signature expertise procedures. Thus, the strength of this expertise is discussed in conjunction with the traditional conclusions reached by forensic handwriting examiners in forensic science. Finally, this chapter summarizes and advocates a list of formal recommendations for good practices for handwriting examiners. In conclusion, the research highlights the interdisciplinary aspect of signature examination of signatures on paintings. The current state of knowledge of the judicial quality of art experts, along with the scientific and historical analysis of paintings and signatures, are overviewed to give the reader a feel of the different factors that have an impact on this particular subject. The temperamental acceptance of forensic signature analysis in court, also presented in the state of the art, explicitly demonstrates the necessity of a better recognition of signature expertise by courts of law. This general acceptance, however, can only be achieved by producing high quality results through a well-defined examination process. This research offers an original approach to attribute a painted signature to a certain artist: for the first time, a probabilistic model used to measure the discriminative potential between authentic and simulated painted signatures is studied. The opportunities and limits that lie within this method of scientifically establishing the authorship of signatures on works of art are thus presented. In addition, the second key contribution of this work proposes a procedure to combine the developed method into that used traditionally signature experts in forensic science. Such an implementation into the holistic traditional signature examination casework is a large step providing the forensic, judicial and art communities with a solid-based reasoning framework for the examination of signatures on paintings. The framework and preliminary results associated with this research have been published (Montani, 2009a) and presented at international forensic science conferences (Montani, 2009b; Montani, 2012).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sensing the chemical warnings present in the environment is essential for species survival. In mammals, this form of danger communication occurs via the release of natural predator scents that can involuntarily warn the prey or by the production of alarm pheromones by the stressed prey alerting its conspecifics. Although we previously identified the olfactory Grueneberg ganglion as the sensory organ through which mammalian alarm pheromones signal a threatening situation, the chemical nature of these cues remains elusive. We here identify, through chemical analysis in combination with a series of physiological and behavioral tests, the chemical structure of a mouse alarm pheromone. To successfully recognize the volatile cues that signal danger, we based our selection on their activation of the mouse olfactory Grueneberg ganglion and the concomitant display of innate fear reactions. Interestingly, we found that the chemical structure of the identified mouse alarm pheromone has similar features as the sulfur-containing volatiles that are released by predating carnivores. Our findings thus not only reveal a chemical Leitmotiv that underlies signaling of fear, but also point to a double role for the olfactory Grueneberg ganglion in intraspecies as well as interspecies communication of danger.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this research was to evaluate how fingerprint analysts would incorporate information from newly developed tools into their decision making processes. Specifically, we assessed effects using the following: (1) a quality tool to aid in the assessment of the clarity of the friction ridge details, (2) a statistical tool to provide likelihood ratios representing the strength of the corresponding features between compared fingerprints, and (3) consensus information from a group of trained fingerprint experts. The measured variables for the effect on examiner performance were the accuracy and reproducibility of the conclusions against the ground truth (including the impact on error rates) and the analyst accuracy and variation for feature selection and comparison.¦The results showed that participants using the consensus information from other fingerprint experts demonstrated more consistency and accuracy in minutiae selection. They also demonstrated higher accuracy, sensitivity, and specificity in the decisions reported. The quality tool also affected minutiae selection (which, in turn, had limited influence on the reported decisions); the statistical tool did not appear to influence the reported decisions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Among the types of remote sensing acquisitions, optical images are certainly one of the most widely relied upon data sources for Earth observation. They provide detailed measurements of the electromagnetic radiation reflected or emitted by each pixel in the scene. Through a process termed supervised land-cover classification, this allows to automatically yet accurately distinguish objects at the surface of our planet. In this respect, when producing a land-cover map of the surveyed area, the availability of training examples representative of each thematic class is crucial for the success of the classification procedure. However, in real applications, due to several constraints on the sample collection process, labeled pixels are usually scarce. When analyzing an image for which those key samples are unavailable, a viable solution consists in resorting to the ground truth data of other previously acquired images. This option is attractive but several factors such as atmospheric, ground and acquisition conditions can cause radiometric differences between the images, hindering therefore the transfer of knowledge from one image to another. The goal of this Thesis is to supply remote sensing image analysts with suitable processing techniques to ensure a robust portability of the classification models across different images. The ultimate purpose is to map the land-cover classes over large spatial and temporal extents with minimal ground information. To overcome, or simply quantify, the observed shifts in the statistical distribution of the spectra of the materials, we study four approaches issued from the field of machine learning. First, we propose a strategy to intelligently sample the image of interest to collect the labels only in correspondence of the most useful pixels. This iterative routine is based on a constant evaluation of the pertinence to the new image of the initial training data actually belonging to a different image. Second, an approach to reduce the radiometric differences among the images by projecting the respective pixels in a common new data space is presented. We analyze a kernel-based feature extraction framework suited for such problems, showing that, after this relative normalization, the cross-image generalization abilities of a classifier are highly increased. Third, we test a new data-driven measure of distance between probability distributions to assess the distortions caused by differences in the acquisition geometry affecting series of multi-angle images. Also, we gauge the portability of classification models through the sequences. In both exercises, the efficacy of classic physically- and statistically-based normalization methods is discussed. Finally, we explore a new family of approaches based on sparse representations of the samples to reciprocally convert the data space of two images. The projection function bridging the images allows a synthesis of new pixels with more similar characteristics ultimately facilitating the land-cover mapping across images.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Remote sensing image processing is nowadays a mature research area. The techniques developed in the field allow many real-life applications with great societal value. For instance, urban monitoring, fire detection or flood prediction can have a great impact on economical and environmental issues. To attain such objectives, the remote sensing community has turned into a multidisciplinary field of science that embraces physics, signal theory, computer science, electronics, and communications. From a machine learning and signal/image processing point of view, all the applications are tackled under specific formalisms, such as classification and clustering, regression and function approximation, image coding, restoration and enhancement, source unmixing, data fusion or feature selection and extraction. This paper serves as a survey of methods and applications, and reviews the last methodological advances in remote sensing image processing.