864 resultados para Feature selection algorithm
Resumo:
Dissertação de mestrado integrado em Engenharia Biomédica (área de especialização em Eletrónica Médica)
Resumo:
The chemical composition of propolis is affected by environmental factors and harvest season, making it difficult to standardize its extracts for medicinal usage. By detecting a typical chemical profile associated with propolis from a specific production region or season, certain types of propolis may be used to obtain a specific pharmacological activity. In this study, propolis from three agroecological regions (plain, plateau, and highlands) from southern Brazil, collected over the four seasons of 2010, were investigated through a novel NMR-based metabolomics data analysis workflow. Chemometrics and machine learning algorithms (PLS-DA and RF), including methods to estimate variable importance in classification, were used in this study. The machine learning and feature selection methods permitted construction of models for propolis sample classification with high accuracy (>75%, reaching 90% in the best case), better discriminating samples regarding their collection seasons comparatively to the harvest regions. PLS-DA and RF allowed the identification of biomarkers for sample discrimination, expanding the set of discriminating features and adding relevant information for the identification of the class-determining metabolites. The NMR-based metabolomics analytical platform, coupled to bioinformatic tools, allowed characterization and classification of Brazilian propolis samples regarding the metabolite signature of important compounds, i.e., chemical fingerprint, harvest seasons, and production regions.
Resumo:
Olive oil quality grading is traditionally assessed by human sensory evaluation of positive and negative attributes (olfactory, gustatory, and final olfactorygustatory sensations). However, it is not guaranteed that trained panelist can correctly classify monovarietal extra-virgin olive oils according to olive cultivar. In this work, the potential application of human (sensory panelists) and artificial (electronic tongue) sensory evaluation of olive oils was studied aiming to discriminate eight single-cultivar extra-virgin olive oils. Linear discriminant, partial least square discriminant, and sparse partial least square discriminant analyses were evaluated. The best predictive classification was obtained using linear discriminant analysis with simulated annealing selection algorithm. A low-level data fusion approach (18 electronic tongue signals and nine sensory attributes) enabled 100 % leave-one-out cross-validation correct classification, improving the discrimination capability of the individual use of sensor profiles or sensory attributes (70 and 57 % leave-one-out correct classifications, respectively). So, human sensory evaluation and electronic tongue analysis may be used as complementary tools allowing successful monovarietal olive oil discrimination.
Resumo:
Recently, there has been a growing interest in the field of metabolomics, materialized by a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, Infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks as biological and biomedical discovery, biotechnology and drug development. However, as it happens with other omics data, the analysis of metabolomics datasets provides multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, from the available software tools, none addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple to use environment. The package, already available in CRAN, is accompanied by a web site where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.
Resumo:
El present treball fa un anàlisi i desenvolupament sobre les millores en la velocitat i en l’escalabilitat d'un simulador distribuït de grups de peixos. Aquests resultats s’han obtingut fent servir una nova estratègia de comunicació per als processos lògics (LPs) i canvis en l'algoritme de selecció de veïns que s'aplica a cadascun dels peixos en cada pas de simulació. L’idea proposada permet que cada procés lògic anticipi futures necessitats de dades pels seus veïns reduint el temps de comunicació al limitar la quantitat de missatges intercanviats entre els LPs. El nou algoritme de selecció dels veïns es va desenvolupar amb l'objectiu d'evitar treball innecessari permetent la disminució de les instruccions executades en cada pas de simulació i per cadascun del peixos simulats reduint de forma significativa el temps de simulació.
Resumo:
Difficult tracheal intubation assessment is an important research topic in anesthesia as failed intubations are important causes of mortality in anesthetic practice. The modified Mallampati score is widely used, alone or in conjunction with other criteria, to predict the difficulty of intubation. This work presents an automatic method to assess the modified Mallampati score from an image of a patient with the mouth wide open. For this purpose we propose an active appearance models (AAM) based method and use linear support vector machines (SVM) to select a subset of relevant features obtained using the AAM. This feature selection step proves to be essential as it improves drastically the performance of classification, which is obtained using SVM with RBF kernel and majority voting. We test our method on images of 100 patients undergoing elective surgery and achieve 97.9% accuracy in the leave-one-out crossvalidation test and provide a key element to an automatic difficult intubation assessment system.
Resumo:
A parts based model is a parametrization of an object class using a collection of landmarks following the object structure. The matching of parts based models is one of the problems where pairwise Conditional Random Fields have been successfully applied. The main reason of their effectiveness is tractable inference and learning due to the simplicity of involved graphs, usually trees. However, these models do not consider possible patterns of statistics among sets of landmarks, and thus they sufffer from using too myopic information. To overcome this limitation, we propoese a novel structure based on a hierarchical Conditional Random Fields, which we explain in the first part of this memory. We build a hierarchy of combinations of landmarks, where matching is performed taking into account the whole hierarchy. To preserve tractable inference we effectively sample the label set. We test our method on facial feature selection and human pose estimation on two challenging datasets: Buffy and MultiPIE. In the second part of this memory, we present a novel approach to multiple kernel combination that relies on stacked classification. This method can be used to evaluate the landmarks of the parts-based model approach. Our method is based on combining responses of a set of independent classifiers for each individual kernel. Unlike earlier approaches that linearly combine kernel responses, our approach uses them as inputs to another set of classifiers. We will show that we outperform state-of-the-art methods on most of the standard benchmark datasets.
Resumo:
This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
Resumo:
Photo-mosaicing techniques have become popular for seafloor mapping in various marine science applications. However, the common methods cannot accurately map regions with high relief and topographical variations. Ortho-mosaicing borrowed from photogrammetry is an alternative technique that enables taking into account the 3-D shape of the terrain. A serious bottleneck is the volume of elevation information that needs to be estimated from the video data, fused, and processed for the generation of a composite ortho-photo that covers a relatively large seafloor area. We present a framework that combines the advantages of dense depth-map and 3-D feature estimation techniques based on visual motion cues. The main goal is to identify and reconstruct certain key terrain feature points that adequately represent the surface with minimal complexity in the form of piecewise planar patches. The proposed implementation utilizes local depth maps for feature selection, while tracking over several views enables 3-D reconstruction by bundle adjustment. Experimental results with synthetic and real data validate the effectiveness of the proposed approach
Resumo:
This paper describes a navigation system for autonomous underwater vehicles (AUVs) in partially structured environments, such as dams, harbors, marinas or marine platforms. A mechanical scanning imaging sonar is used to obtain information about the location of planar structures present in such environments. A modified version of the Hough transform has been developed to extract line features, together with their uncertainty, from the continuous sonar dataflow. The information obtained is incorporated into a feature-based SLAM algorithm running an Extended Kalman Filter (EKF). Simultaneously, the AUV's position estimate is provided to the feature extraction algorithm to correct the distortions that the vehicle motion produces in the acoustic images. Experiments carried out in a marina located in the Costa Brava (Spain) with the Ictineu AUV show the viability of the proposed approach
Resumo:
Mosaics have been commonly used as visual maps for undersea exploration and navigation. The position and orientation of an underwater vehicle can be calculated by integrating the apparent motion of the images which form the mosaic. A feature-based mosaicking method is proposed in this paper. The creation of the mosaic is accomplished in four stages: feature selection and matching, detection of points describing the dominant motion, homography computation and mosaic construction. In this work we demonstrate that the use of color and textures as discriminative properties of the image can improve, to a large extent, the accuracy of the constructed mosaic. The system is able to provide 3D metric information concerning the vehicle motion using the knowledge of the intrinsic parameters of the camera while integrating the measurements of an ultrasonic sensor. The experimental results of real images have been tested on the GARBI underwater vehicle
Resumo:
In this work we present the results of experimental work on the development of lexical class-based lexica by automatic means. Our purpose is to assess the use of linguistic lexical-class based information as a feature selection methodology for the use of classifiers in quick lexical development. The results show that the approach can help reduce the human effort required in the development of language resources significantly.
Resumo:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Resumo:
The aim of this research was to evaluate how fingerprint analysts would incorporate information from newly developed tools into their decision making processes. Specifically, we assessed effects using the following: (1) a quality tool to aid in the assessment of the clarity of the friction ridge details, (2) a statistical tool to provide likelihood ratios representing the strength of the corresponding features between compared fingerprints, and (3) consensus information from a group of trained fingerprint experts. The measured variables for the effect on examiner performance were the accuracy and reproducibility of the conclusions against the ground truth (including the impact on error rates) and the analyst accuracy and variation for feature selection and comparison.¦The results showed that participants using the consensus information from other fingerprint experts demonstrated more consistency and accuracy in minutiae selection. They also demonstrated higher accuracy, sensitivity, and specificity in the decisions reported. The quality tool also affected minutiae selection (which, in turn, had limited influence on the reported decisions); the statistical tool did not appear to influence the reported decisions.
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Resumo:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.