929 resultados para FEATURE SELECTION
Resumo:
This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
Resumo:
Photo-mosaicing techniques have become popular for seafloor mapping in various marine science applications. However, the common methods cannot accurately map regions with high relief and topographical variations. Ortho-mosaicing borrowed from photogrammetry is an alternative technique that enables taking into account the 3-D shape of the terrain. A serious bottleneck is the volume of elevation information that needs to be estimated from the video data, fused, and processed for the generation of a composite ortho-photo that covers a relatively large seafloor area. We present a framework that combines the advantages of dense depth-map and 3-D feature estimation techniques based on visual motion cues. The main goal is to identify and reconstruct certain key terrain feature points that adequately represent the surface with minimal complexity in the form of piecewise planar patches. The proposed implementation utilizes local depth maps for feature selection, while tracking over several views enables 3-D reconstruction by bundle adjustment. Experimental results with synthetic and real data validate the effectiveness of the proposed approach
Resumo:
Mosaics have been commonly used as visual maps for undersea exploration and navigation. The position and orientation of an underwater vehicle can be calculated by integrating the apparent motion of the images which form the mosaic. A feature-based mosaicking method is proposed in this paper. The creation of the mosaic is accomplished in four stages: feature selection and matching, detection of points describing the dominant motion, homography computation and mosaic construction. In this work we demonstrate that the use of color and textures as discriminative properties of the image can improve, to a large extent, the accuracy of the constructed mosaic. The system is able to provide 3D metric information concerning the vehicle motion using the knowledge of the intrinsic parameters of the camera while integrating the measurements of an ultrasonic sensor. The experimental results of real images have been tested on the GARBI underwater vehicle
Resumo:
In this work we present the results of experimental work on the development of lexical class-based lexica by automatic means. Our purpose is to assess the use of linguistic lexical-class based information as a feature selection methodology for the use of classifiers in quick lexical development. The results show that the approach can help reduce the human effort required in the development of language resources significantly.
Resumo:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Resumo:
The aim of this research was to evaluate how fingerprint analysts would incorporate information from newly developed tools into their decision making processes. Specifically, we assessed effects using the following: (1) a quality tool to aid in the assessment of the clarity of the friction ridge details, (2) a statistical tool to provide likelihood ratios representing the strength of the corresponding features between compared fingerprints, and (3) consensus information from a group of trained fingerprint experts. The measured variables for the effect on examiner performance were the accuracy and reproducibility of the conclusions against the ground truth (including the impact on error rates) and the analyst accuracy and variation for feature selection and comparison.¦The results showed that participants using the consensus information from other fingerprint experts demonstrated more consistency and accuracy in minutiae selection. They also demonstrated higher accuracy, sensitivity, and specificity in the decisions reported. The quality tool also affected minutiae selection (which, in turn, had limited influence on the reported decisions); the statistical tool did not appear to influence the reported decisions.
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Resumo:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
Resumo:
This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.
Resumo:
The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.
Resumo:
A combined strategy based on the computation of absorption energies, using the ZINDO/S semiempirical method, for a statistically relevant number of thermally sampled configurations extracted from QM/MM trajectories is used to establish a one-to-one correspondence between the structures of the different early intermediates (dark, batho, BSI, lumi) involved in the initial steps of the rhodopsin photoactivation mechanism and their optical spectra. A systematic analysis of the results based on a correlation-based feature selection algorithm shows that the origin of the color shifts among these intermediates can be mainly ascribed to alterations in intrinsic properties of the chromophore structure, which are tuned by several residues located in the protein binding pocket. In addition to the expected electrostatic and dipolar effects caused by the charged residues (Glu113, Glu181) and to strong hydrogen bonding with Glu113, other interactions such as π-stacking with Ala117 and Thr118 backbone atoms, van der Waals contacts with Gly114 and Ala292, and CH/π weak interactions with Tyr268, Ala117, Thr118, and Ser186 side chains are found to make non-negligible contributions to the modulation of the color tuning among the different rhodopsin photointermediates.
Resumo:
Remote sensing image processing is nowadays a mature research area. The techniques developed in the field allow many real-life applications with great societal value. For instance, urban monitoring, fire detection or flood prediction can have a great impact on economical and environmental issues. To attain such objectives, the remote sensing community has turned into a multidisciplinary field of science that embraces physics, signal theory, computer science, electronics, and communications. From a machine learning and signal/image processing point of view, all the applications are tackled under specific formalisms, such as classification and clustering, regression and function approximation, image coding, restoration and enhancement, source unmixing, data fusion or feature selection and extraction. This paper serves as a survey of methods and applications, and reviews the last methodological advances in remote sensing image processing.
Resumo:
Automatic environmental monitoring networks enforced by wireless communication technologies provide large and ever increasing volumes of data nowadays. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows producing the prediction maps in real time which makes them superior to physical models for the operational use in risk assessment and mitigation. Particularly, this situation encounters in spatial prediction of climatic variables (topo-climatic mapping). In complex topographies of the mountainous regions, the meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
Resumo:
Alzheimer's disease is the most prevalent form of progressive degenerative dementia; it has a high socio-economic impact in Western countries. Therefore it is one of the most active research areas today. Alzheimer's is sometimes diagnosed by excluding other dementias, and definitive confirmation is only obtained through a post-mortem study of the brain tissue of the patient. The work presented here is part of a larger study that aims to identify novel technologies and biomarkers for early Alzheimer's disease detection, and it focuses on evaluating the suitability of a new approach for early diagnosis of Alzheimer’s disease by non-invasive methods. The purpose is to examine, in a pilot study, the potential of applying Machine Learning algorithms to speech features obtained from suspected Alzheimer sufferers in order help diagnose this disease and determine its degree of severity. Two human capabilities relevant in communication have been analyzed for feature selection: Spontaneous Speech and Emotional Response. The experimental results obtained were very satisfactory and promising for the early diagnosis and classification of Alzheimer’s disease patients.
Resumo:
We investigate the relevance of morphological operators for the classification of land use in urban scenes using submetric panchromatic imagery. A support vector machine is used for the classification. Six types of filters have been employed: opening and closing, opening and closing by reconstruction, and opening and closing top hat. The type and scale of the filters are discussed, and a feature selection algorithm called recursive feature elimination is applied to decrease the dimensionality of the input data. The analysis performed on two QuickBird panchromatic images showed that simple opening and closing operators are the most relevant for classification at such a high spatial resolution. Moreover, mixed sets combining simple and reconstruction filters provided the best performance. Tests performed on both images, having areas characterized by different architectural styles, yielded similar results for both feature selection and classification accuracy, suggesting the generalization of the feature sets highlighted.
Resumo:
This thesis is composed of three main parts. The first consists of a state of the art of the different notions that are significant to understand the elements surrounding art authentication in general, and of signatures in particular, and that the author deemed them necessary to fully grasp the microcosm that makes up this particular market. Individuals with a solid knowledge of the art and expertise area, and that are particularly interested in the present study are advised to advance directly to the fourth Chapter. The expertise of the signature, it's reliability, and the factors impacting the expert's conclusions are brought forward. The final aim of the state of the art is to offer a general list of recommendations based on an exhaustive review of the current literature and given in light of all of the exposed issues. These guidelines are specifically formulated for the expertise of signatures on paintings, but can also be applied to wider themes in the area of signature examination. The second part of this thesis covers the experimental stages of the research. It consists of the method developed to authenticate painted signatures on works of art. This method is articulated around several main objectives: defining measurable features on painted signatures and defining their relevance in order to establish the separation capacities between groups of authentic and simulated signatures. For the first time, numerical analyses of painted signatures have been obtained and are used to attribute their authorship to given artists. An in-depth discussion of the developed method constitutes the third and final part of this study. It evaluates the opportunities and constraints when applied by signature and handwriting experts in forensic science. A brief summary covering each chapter allows a rapid overview of the study and summarizes the aims and main themes of each chapter. These outlines presented below summarize the aims and main themes addressed in each chapter. Part I - Theory Chapter 1 exposes legal aspects surrounding the authentication of works of art by art experts. The definition of what is legally authentic, the quality and types of the experts that can express an opinion concerning the authorship of a specific painting, and standard deontological rules are addressed. The practices applied in Switzerland will be specifically dealt with. Chapter 2 presents an overview of the different scientific analyses that can be carried out on paintings (from the canvas to the top coat). Scientific examinations of works of art have become more common, as more and more museums equip themselves with laboratories, thus an understanding of their role in the art authentication process is vital. The added value that a signature expertise can have in comparison to other scientific techniques is also addressed. Chapter 3 provides a historical overview of the signature on paintings throughout the ages, in order to offer the reader an understanding of the origin of the signature on works of art and its evolution through time. An explanation is given on the transitions that the signature went through from the 15th century on and how it progressively took on its widely known modern form. Both this chapter and chapter 2 are presented to show the reader the rich sources of information that can be provided to describe a painting, and how the signature is one of these sources. Chapter 4 focuses on the different hypotheses the FHE must keep in mind when examining a painted signature, since a number of scenarios can be encountered when dealing with signatures on works of art. The different forms of signatures, as well as the variables that may have an influence on the painted signatures, are also presented. Finally, the current state of knowledge of the examination procedure of signatures in forensic science in general, and in particular for painted signatures, is exposed. The state of the art of the assessment of the authorship of signatures on paintings is established and discussed in light of the theoretical facets mentioned previously. Chapter 5 considers key elements that can have an impact on the FHE during his or her2 examinations. This includes a discussion on elements such as the skill, confidence and competence of an expert, as well as the potential bias effects he might encounter. A better understanding of elements surrounding handwriting examinations, to, in turn, better communicate results and conclusions to an audience, is also undertaken. Chapter 6 reviews the judicial acceptance of signature analysis in Courts and closes the state of the art section of this thesis. This chapter brings forward the current issues pertaining to the appreciation of this expertise by the non- forensic community, and will discuss the increasing number of claims of the unscientific nature of signature authentication. The necessity to aim for more scientific, comprehensive and transparent authentication methods will be discussed. The theoretical part of this thesis is concluded by a series of general recommendations for forensic handwriting examiners in forensic science, specifically for the expertise of signatures on paintings. These recommendations stem from the exhaustive review of the literature and the issues exposed from this review and can also be applied to the traditional examination of signatures (on paper). Part II - Experimental part Chapter 7 describes and defines the sampling, extraction and analysis phases of the research. The sampling stage of artists' signatures and their respective simulations are presented, followed by the steps that were undertaken to extract and determine sets of characteristics, specific to each artist, that describe their signatures. The method is based on a study of five artists and a group of individuals acting as forgers for the sake of this study. Finally, the analysis procedure of these characteristics to assess of the strength of evidence, and based on a Bayesian reasoning process, is presented. Chapter 8 outlines the results concerning both the artist and simulation corpuses after their optical observation, followed by the results of the analysis phase of the research. The feature selection process and the likelihood ratio evaluation are the main themes that are addressed. The discrimination power between both corpuses is illustrated through multivariate analysis. Part III - Discussion Chapter 9 discusses the materials, the methods, and the obtained results of the research. The opportunities, but also constraints and limits, of the developed method are exposed. Future works that can be carried out subsequent to the results of the study are also presented. Chapter 10, the last chapter of this thesis, proposes a strategy to incorporate the model developed in the last chapters into the traditional signature expertise procedures. Thus, the strength of this expertise is discussed in conjunction with the traditional conclusions reached by forensic handwriting examiners in forensic science. Finally, this chapter summarizes and advocates a list of formal recommendations for good practices for handwriting examiners. In conclusion, the research highlights the interdisciplinary aspect of signature examination of signatures on paintings. The current state of knowledge of the judicial quality of art experts, along with the scientific and historical analysis of paintings and signatures, are overviewed to give the reader a feel of the different factors that have an impact on this particular subject. The temperamental acceptance of forensic signature analysis in court, also presented in the state of the art, explicitly demonstrates the necessity of a better recognition of signature expertise by courts of law. This general acceptance, however, can only be achieved by producing high quality results through a well-defined examination process. This research offers an original approach to attribute a painted signature to a certain artist: for the first time, a probabilistic model used to measure the discriminative potential between authentic and simulated painted signatures is studied. The opportunities and limits that lie within this method of scientifically establishing the authorship of signatures on works of art are thus presented. In addition, the second key contribution of this work proposes a procedure to combine the developed method into that used traditionally signature experts in forensic science. Such an implementation into the holistic traditional signature examination casework is a large step providing the forensic, judicial and art communities with a solid-based reasoning framework for the examination of signatures on paintings. The framework and preliminary results associated with this research have been published (Montani, 2009a) and presented at international forensic science conferences (Montani, 2009b; Montani, 2012).