44 resultados para Feature selection algorithm
em Université de Lausanne, Switzerland
Resumo:
A combined strategy based on the computation of absorption energies, using the ZINDO/S semiempirical method, for a statistically relevant number of thermally sampled configurations extracted from QM/MM trajectories is used to establish a one-to-one correspondence between the structures of the different early intermediates (dark, batho, BSI, lumi) involved in the initial steps of the rhodopsin photoactivation mechanism and their optical spectra. A systematic analysis of the results based on a correlation-based feature selection algorithm shows that the origin of the color shifts among these intermediates can be mainly ascribed to alterations in intrinsic properties of the chromophore structure, which are tuned by several residues located in the protein binding pocket. In addition to the expected electrostatic and dipolar effects caused by the charged residues (Glu113, Glu181) and to strong hydrogen bonding with Glu113, other interactions such as π-stacking with Ala117 and Thr118 backbone atoms, van der Waals contacts with Gly114 and Ala292, and CH/π weak interactions with Tyr268, Ala117, Thr118, and Ser186 side chains are found to make non-negligible contributions to the modulation of the color tuning among the different rhodopsin photointermediates.
Resumo:
We investigate the relevance of morphological operators for the classification of land use in urban scenes using submetric panchromatic imagery. A support vector machine is used for the classification. Six types of filters have been employed: opening and closing, opening and closing by reconstruction, and opening and closing top hat. The type and scale of the filters are discussed, and a feature selection algorithm called recursive feature elimination is applied to decrease the dimensionality of the input data. The analysis performed on two QuickBird panchromatic images showed that simple opening and closing operators are the most relevant for classification at such a high spatial resolution. Moreover, mixed sets combining simple and reconstruction filters provided the best performance. Tests performed on both images, having areas characterized by different architectural styles, yielded similar results for both feature selection and classification accuracy, suggesting the generalization of the feature sets highlighted.
Resumo:
In this paper we study the relevance of multiple kernel learning (MKL) for the automatic selection of time series inputs. Recently, MKL has gained great attention in the machine learning community due to its flexibility in modelling complex patterns and performing feature selection. In general, MKL constructs the kernel as a weighted linear combination of basis kernels, exploiting different sources of information. An efficient algorithm wrapping a Support Vector Regression model for optimizing the MKL weights, named SimpleMKL, is used for the analysis. In this sense, MKL performs feature selection by discarding inputs/kernels with low or null weights. The approach proposed is tested with simulated linear and nonlinear time series (AutoRegressive, Henon and Lorenz series).
Resumo:
Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
Resumo:
This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.
Resumo:
Difficult tracheal intubation assessment is an important research topic in anesthesia as failed intubations are important causes of mortality in anesthetic practice. The modified Mallampati score is widely used, alone or in conjunction with other criteria, to predict the difficulty of intubation. This work presents an automatic method to assess the modified Mallampati score from an image of a patient with the mouth wide open. For this purpose we propose an active appearance models (AAM) based method and use linear support vector machines (SVM) to select a subset of relevant features obtained using the AAM. This feature selection step proves to be essential as it improves drastically the performance of classification, which is obtained using SVM with RBF kernel and majority voting. We test our method on images of 100 patients undergoing elective surgery and achieve 97.9% accuracy in the leave-one-out crossvalidation test and provide a key element to an automatic difficult intubation assessment system.
Resumo:
This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
Resumo:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Resumo:
The aim of this research was to evaluate how fingerprint analysts would incorporate information from newly developed tools into their decision making processes. Specifically, we assessed effects using the following: (1) a quality tool to aid in the assessment of the clarity of the friction ridge details, (2) a statistical tool to provide likelihood ratios representing the strength of the corresponding features between compared fingerprints, and (3) consensus information from a group of trained fingerprint experts. The measured variables for the effect on examiner performance were the accuracy and reproducibility of the conclusions against the ground truth (including the impact on error rates) and the analyst accuracy and variation for feature selection and comparison.¦The results showed that participants using the consensus information from other fingerprint experts demonstrated more consistency and accuracy in minutiae selection. They also demonstrated higher accuracy, sensitivity, and specificity in the decisions reported. The quality tool also affected minutiae selection (which, in turn, had limited influence on the reported decisions); the statistical tool did not appear to influence the reported decisions.
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Resumo:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
Resumo:
The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.
Resumo:
Remote sensing image processing is nowadays a mature research area. The techniques developed in the field allow many real-life applications with great societal value. For instance, urban monitoring, fire detection or flood prediction can have a great impact on economical and environmental issues. To attain such objectives, the remote sensing community has turned into a multidisciplinary field of science that embraces physics, signal theory, computer science, electronics, and communications. From a machine learning and signal/image processing point of view, all the applications are tackled under specific formalisms, such as classification and clustering, regression and function approximation, image coding, restoration and enhancement, source unmixing, data fusion or feature selection and extraction. This paper serves as a survey of methods and applications, and reviews the last methodological advances in remote sensing image processing.
Resumo:
Automatic environmental monitoring networks enforced by wireless communication technologies provide large and ever increasing volumes of data nowadays. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows producing the prediction maps in real time which makes them superior to physical models for the operational use in risk assessment and mitigation. Particularly, this situation encounters in spatial prediction of climatic variables (topo-climatic mapping). In complex topographies of the mountainous regions, the meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
Resumo:
This thesis is composed of three main parts. The first consists of a state of the art of the different notions that are significant to understand the elements surrounding art authentication in general, and of signatures in particular, and that the author deemed them necessary to fully grasp the microcosm that makes up this particular market. Individuals with a solid knowledge of the art and expertise area, and that are particularly interested in the present study are advised to advance directly to the fourth Chapter. The expertise of the signature, it's reliability, and the factors impacting the expert's conclusions are brought forward. The final aim of the state of the art is to offer a general list of recommendations based on an exhaustive review of the current literature and given in light of all of the exposed issues. These guidelines are specifically formulated for the expertise of signatures on paintings, but can also be applied to wider themes in the area of signature examination. The second part of this thesis covers the experimental stages of the research. It consists of the method developed to authenticate painted signatures on works of art. This method is articulated around several main objectives: defining measurable features on painted signatures and defining their relevance in order to establish the separation capacities between groups of authentic and simulated signatures. For the first time, numerical analyses of painted signatures have been obtained and are used to attribute their authorship to given artists. An in-depth discussion of the developed method constitutes the third and final part of this study. It evaluates the opportunities and constraints when applied by signature and handwriting experts in forensic science. A brief summary covering each chapter allows a rapid overview of the study and summarizes the aims and main themes of each chapter. These outlines presented below summarize the aims and main themes addressed in each chapter. Part I - Theory Chapter 1 exposes legal aspects surrounding the authentication of works of art by art experts. The definition of what is legally authentic, the quality and types of the experts that can express an opinion concerning the authorship of a specific painting, and standard deontological rules are addressed. The practices applied in Switzerland will be specifically dealt with. Chapter 2 presents an overview of the different scientific analyses that can be carried out on paintings (from the canvas to the top coat). Scientific examinations of works of art have become more common, as more and more museums equip themselves with laboratories, thus an understanding of their role in the art authentication process is vital. The added value that a signature expertise can have in comparison to other scientific techniques is also addressed. Chapter 3 provides a historical overview of the signature on paintings throughout the ages, in order to offer the reader an understanding of the origin of the signature on works of art and its evolution through time. An explanation is given on the transitions that the signature went through from the 15th century on and how it progressively took on its widely known modern form. Both this chapter and chapter 2 are presented to show the reader the rich sources of information that can be provided to describe a painting, and how the signature is one of these sources. Chapter 4 focuses on the different hypotheses the FHE must keep in mind when examining a painted signature, since a number of scenarios can be encountered when dealing with signatures on works of art. The different forms of signatures, as well as the variables that may have an influence on the painted signatures, are also presented. Finally, the current state of knowledge of the examination procedure of signatures in forensic science in general, and in particular for painted signatures, is exposed. The state of the art of the assessment of the authorship of signatures on paintings is established and discussed in light of the theoretical facets mentioned previously. Chapter 5 considers key elements that can have an impact on the FHE during his or her2 examinations. This includes a discussion on elements such as the skill, confidence and competence of an expert, as well as the potential bias effects he might encounter. A better understanding of elements surrounding handwriting examinations, to, in turn, better communicate results and conclusions to an audience, is also undertaken. Chapter 6 reviews the judicial acceptance of signature analysis in Courts and closes the state of the art section of this thesis. This chapter brings forward the current issues pertaining to the appreciation of this expertise by the non- forensic community, and will discuss the increasing number of claims of the unscientific nature of signature authentication. The necessity to aim for more scientific, comprehensive and transparent authentication methods will be discussed. The theoretical part of this thesis is concluded by a series of general recommendations for forensic handwriting examiners in forensic science, specifically for the expertise of signatures on paintings. These recommendations stem from the exhaustive review of the literature and the issues exposed from this review and can also be applied to the traditional examination of signatures (on paper). Part II - Experimental part Chapter 7 describes and defines the sampling, extraction and analysis phases of the research. The sampling stage of artists' signatures and their respective simulations are presented, followed by the steps that were undertaken to extract and determine sets of characteristics, specific to each artist, that describe their signatures. The method is based on a study of five artists and a group of individuals acting as forgers for the sake of this study. Finally, the analysis procedure of these characteristics to assess of the strength of evidence, and based on a Bayesian reasoning process, is presented. Chapter 8 outlines the results concerning both the artist and simulation corpuses after their optical observation, followed by the results of the analysis phase of the research. The feature selection process and the likelihood ratio evaluation are the main themes that are addressed. The discrimination power between both corpuses is illustrated through multivariate analysis. Part III - Discussion Chapter 9 discusses the materials, the methods, and the obtained results of the research. The opportunities, but also constraints and limits, of the developed method are exposed. Future works that can be carried out subsequent to the results of the study are also presented. Chapter 10, the last chapter of this thesis, proposes a strategy to incorporate the model developed in the last chapters into the traditional signature expertise procedures. Thus, the strength of this expertise is discussed in conjunction with the traditional conclusions reached by forensic handwriting examiners in forensic science. Finally, this chapter summarizes and advocates a list of formal recommendations for good practices for handwriting examiners. In conclusion, the research highlights the interdisciplinary aspect of signature examination of signatures on paintings. The current state of knowledge of the judicial quality of art experts, along with the scientific and historical analysis of paintings and signatures, are overviewed to give the reader a feel of the different factors that have an impact on this particular subject. The temperamental acceptance of forensic signature analysis in court, also presented in the state of the art, explicitly demonstrates the necessity of a better recognition of signature expertise by courts of law. This general acceptance, however, can only be achieved by producing high quality results through a well-defined examination process. This research offers an original approach to attribute a painted signature to a certain artist: for the first time, a probabilistic model used to measure the discriminative potential between authentic and simulated painted signatures is studied. The opportunities and limits that lie within this method of scientifically establishing the authorship of signatures on works of art are thus presented. In addition, the second key contribution of this work proposes a procedure to combine the developed method into that used traditionally signature experts in forensic science. Such an implementation into the holistic traditional signature examination casework is a large step providing the forensic, judicial and art communities with a solid-based reasoning framework for the examination of signatures on paintings. The framework and preliminary results associated with this research have been published (Montani, 2009a) and presented at international forensic science conferences (Montani, 2009b; Montani, 2012).
Resumo:
BACKGROUND: Transcranial magnetic stimulation combined with electroencephalogram (TMS-EEG) can be used to explore the dynamical state of neuronal networks. In patients with epilepsy, TMS can induce epileptiform discharges (EDs) with a stochastic occurrence despite constant stimulation parameters. This observation raises the possibility that the pre-stimulation period contains multiple covert states of brain excitability some of which are associated with the generation of EDs. OBJECTIVE: To investigate whether the interictal period contains "high excitability" states that upon brain stimulation produce EDs and can be differentiated from "low excitability" states producing normal appearing TMS-EEG responses. METHODS: In a cohort of 25 patients with Genetic Generalized Epilepsies (GGE) we identified two subjects characterized by the intermittent development of TMS-induced EDs. The high-excitability in the pre-stimulation period was assessed using multiple measures of univariate time series analysis. Measures providing optimal discrimination were identified by feature selection techniques. The "high excitability" states emerged in multiple loci (indicating diffuse cortical hyperexcitability) and were clearly differentiated on the basis of 14 measures from "low excitability" states (accuracy = 0.7). CONCLUSION: In GGE, the interictal period contains multiple, quasi-stable covert states of excitability a class of which is associated with the generation of TMS-induced EDs. The relevance of these findings to theoretical models of ictogenesis is discussed.