24 results for Feature selection algorithm
at Universidade do Minho
Abstract:
Integrated master's dissertation in Biomedical Engineering (specialization area: Medical Electronics)
Abstract:
Integrated master's dissertation in Biomedical Engineering (specialization area: Medical Electronics)
Abstract:
The chemical composition of propolis is affected by environmental factors and harvest season, making it difficult to standardize its extracts for medicinal usage. By detecting a typical chemical profile associated with propolis from a specific production region or season, certain types of propolis may be used to obtain a specific pharmacological activity. In this study, propolis samples from three agroecological regions (plain, plateau, and highlands) of southern Brazil, collected over the four seasons of 2010, were investigated through a novel NMR-based metabolomics data analysis workflow. Chemometrics and machine learning algorithms (PLS-DA and RF), including methods to estimate variable importance in classification, were used in this study. The machine learning and feature selection methods permitted construction of models for propolis sample classification with high accuracy (>75%, reaching 90% in the best case), discriminating samples better by collection season than by harvest region. PLS-DA and RF allowed the identification of biomarkers for sample discrimination, expanding the set of discriminating features and adding relevant information for the identification of the class-determining metabolites. The NMR-based metabolomics analytical platform, coupled to bioinformatic tools, allowed Brazilian propolis samples to be characterized and classified according to the metabolite signature of important compounds (their chemical fingerprint), harvest season, and production region.
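As a rough sketch of this kind of workflow (not the authors' actual pipeline), the snippet below classifies NMR spectral bins with a random forest and a PLS-DA-style model and ranks variables by importance using scikit-learn; the data, class labels and parameter values are placeholders.

```python
# Sketch: season classification of NMR spectral bins with RF and PLS-DA,
# plus variable-importance ranking. All data and settings are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelBinarizer

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 200))                       # 48 samples x 200 NMR bins (placeholder)
y = np.repeat(["spring", "summer", "autumn", "winter"], 12)

# Random forest: cross-validated accuracy and impurity-based importances
rf = RandomForestClassifier(n_estimators=500, random_state=0)
print("RF CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
rf_importance = rf.fit(X, y).feature_importances_

# PLS-DA: PLS regression on one-hot class labels; a simple importance proxy
# can be taken from the absolute X-weights of the fitted components
Y = LabelBinarizer().fit_transform(y)
pls = PLSRegression(n_components=3).fit(X, Y)
pls_importance = np.abs(pls.x_weights_).sum(axis=1)

candidate_biomarkers = np.argsort(rf_importance)[::-1][:10]   # top-ranked bins
```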
Abstract:
Olive oil quality grading is traditionally assessed by human sensory evaluation of positive and negative attributes (olfactory, gustatory, and final olfactory-gustatory sensations). However, it is not guaranteed that trained panelists can correctly classify monovarietal extra-virgin olive oils according to olive cultivar. In this work, the potential application of human (sensory panelists) and artificial (electronic tongue) sensory evaluation of olive oils was studied with the aim of discriminating eight single-cultivar extra-virgin olive oils. Linear discriminant, partial least square discriminant, and sparse partial least square discriminant analyses were evaluated. The best predictive classification was obtained using linear discriminant analysis with a simulated annealing selection algorithm. A low-level data fusion approach (18 electronic tongue signals and nine sensory attributes) enabled 100% correct leave-one-out cross-validation classification, improving the discrimination capability of the individual use of sensor profiles or sensory attributes (70% and 57% leave-one-out correct classifications, respectively). Thus, human sensory evaluation and electronic tongue analysis may be used as complementary tools allowing successful monovarietal olive oil discrimination.
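A minimal sketch of the low-level fusion and discriminant analysis steps, assuming scikit-learn and synthetic, balanced data; the simulated annealing variable selection used in the study is not reproduced here.

```python
# Sketch: concatenate the two data blocks (low-level fusion) and estimate
# leave-one-out classification accuracy with LDA. Illustrative data only.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_samples = 64
etongue = rng.normal(size=(n_samples, 18))    # 18 electronic tongue signals (placeholder)
sensory = rng.normal(size=(n_samples, 9))     # 9 sensory panel attributes (placeholder)
cultivar = np.repeat(np.arange(8), 8)         # 8 single-cultivar classes

X_fused = np.hstack([etongue, sensory])       # low-level fusion: simple block concatenation
model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
accuracy = cross_val_score(model, X_fused, cultivar, cv=LeaveOneOut()).mean()
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```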
Abstract:
Recently, there has been a growing interest in the field of metabolomics, materialized by a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques such as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, and Infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks such as biological and biomedical discovery, biotechnology and drug development. However, as with other omics data, the analysis of metabolomics datasets poses multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, none of the available software tools addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful yet simple-to-use environment. The package, already available in CRAN, is accompanied by a website where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.
Abstract:
Natural selection favors the survival and reproduction of organisms that are best adapted to their environment. The selection mechanism in evolutionary algorithms mimics this process, aiming to create environmental conditions in which artificial organisms can evolve to solve the problem at hand. This paper proposes a new selection scheme for evolutionary multiobjective optimization. The similarity measure that defines the concept of neighborhood is a key feature of the proposed selection. Contrary to commonly used approaches, which usually define similarity on the basis of distances between either individuals or weight vectors, it is suggested to base similarity and neighborhood on the angle between individuals in the objective space: the smaller the angle, the more similar the individuals. This notion is exploited during the mating and environmental selections. Convergence is ensured by minimizing distances from individuals to a reference point, whereas diversity is preserved by maximizing angles between neighboring individuals. Experimental results reveal a highly competitive performance and useful characteristics of the proposed selection. Its strong diversity-preserving ability allows it to produce significantly better performance on some problems when compared with state-of-the-art algorithms.
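To make the geometric notion concrete, the short sketch below computes the two quantities the selection relies on (distance to a reference point for convergence, angle to the nearest neighbour for diversity) for a few placeholder objective vectors; it is not the paper's full algorithm.

```python
# Angle-based similarity in objective space: smaller angle = more similar individuals.
import numpy as np

def angle(u, v):
    """Angle (radians) between two objective vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

objs = np.array([[1.0, 4.0], [1.2, 3.8], [4.0, 1.0]])   # objective values (minimization, toy data)
ref = objs.min(axis=0)                                   # reference (ideal) point
shifted = objs - ref + 1e-12                             # measure angles from the reference point

pairwise = np.array([[angle(a, b) for b in shifted] for a in shifted])
np.fill_diagonal(pairwise, np.pi)                        # ignore self when looking for neighbours

convergence = np.linalg.norm(shifted, axis=1)            # smaller = closer to the reference point
nearest_neighbour_angle = pairwise.min(axis=1)           # larger = more isolated, better diversity
# a selection scheme would favour small `convergence` and large `nearest_neighbour_angle`
```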
Abstract:
Software product lines (SPL) are diverse systems that are developed using a dual engineering process: (a) family engineering defines the commonality and variability among all members of the SPL, and (b) application engineering derives specific products based on the common foundation combined with a variable selection of features. The number of derivable products in an SPL can thus be exponential in the number of features. This inherent complexity poses two main challenges when it comes to modelling: Firstly, the formalism used for modelling SPLs needs to be modular and scalable. Secondly, it should ensure that all products behave correctly by providing the ability to analyse and verify complex models efficiently. In this paper we propose to integrate an established modelling formalism (Petri nets) with the domain of software product line engineering. To this end we extend Petri nets to Feature Nets. While Petri nets provide a framework for formally modelling and verifying single software systems, Feature Nets offer the same sort of benefits for software product lines. We show how SPLs can be modelled in an incremental, modular fashion using Feature Nets, provide a Feature Nets variant that supports modelling dynamic SPLs, and propose an analysis method for SPLs modelled as Feature Nets. By facilitating the construction of a single model that includes the various behaviours exhibited by the products in an SPL, we make a significant step towards efficient and practical quality assurance methods for software product lines.
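One plausible reading of the Feature Net idea, sketched below in Python: an ordinary place/transition net whose transitions carry a feature guard, so that a concrete product (a feature selection) enables only part of the behaviour. The class names and guard semantics are illustrative assumptions, not the paper's formal definition.

```python
# A toy guarded Petri net: a transition fires only if its input places are marked
# AND its feature guard is satisfied by the selected product.
from dataclasses import dataclass, field

@dataclass
class Transition:
    name: str
    inputs: dict                      # place -> tokens consumed
    outputs: dict                     # place -> tokens produced
    guard: frozenset = frozenset()    # features that must be selected in the product

@dataclass
class FeatureNet:
    marking: dict                               # place -> current token count
    transitions: list = field(default_factory=list)

    def enabled(self, t, product):
        return t.guard <= product and all(self.marking.get(p, 0) >= n for p, n in t.inputs.items())

    def fire(self, t, product):
        assert self.enabled(t, product)
        for p, n in t.inputs.items():
            self.marking[p] -= n
        for p, n in t.outputs.items():
            self.marking[p] = self.marking.get(p, 0) + n

net = FeatureNet(marking={"idle": 1})
play = Transition("start_video", {"idle": 1}, {"playing": 1}, frozenset({"Video"}))
net.transitions.append(play)
net.fire(play, product=frozenset({"Video"}))   # only products containing 'Video' exhibit this behaviour
```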
Abstract:
The aim of this paper is to predict time series of SO2 concentrations emitted by coal-fired power stations in order to estimate emission episodes in advance and to analyze the influence of some meteorological variables on the prediction. An emission episode is said to occur when the series of bi-hourly means of SO2 is greater than a specific level. For coal-fired power stations it is essential to predict emission episodes sufficiently in advance so that appropriate preventive measures can be taken. We propose a methodology to predict SO2 emission episodes based on an additive model and an algorithm for variable selection. The methodology was applied to the estimation of SO2 emissions registered in sampling locations near a coal-fired power station located in Northern Spain. The results obtained indicate good model performance using only two terms of the time series, and that the inclusion of the meteorological variables in the model is not significant.
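A hedged sketch of an additive autoregressive model for the bi-hourly SO2 series, built from per-variable spline terms with scikit-learn; the lags, knot counts and episode threshold below are placeholders rather than the paper's settings.

```python
# Additive model: each predictor (here two lags of the SO2 series) enters through
# its own spline basis, combined linearly. Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(2)
so2 = rng.gamma(2.0, 20.0, size=1000)            # synthetic bi-hourly SO2 means

X = np.column_stack([so2[1:-1], so2[:-2]])       # lags 1 and 2 ("two terms of the time series")
y = so2[2:]

additive_model = make_pipeline(
    SplineTransformer(n_knots=8, degree=3),      # column-wise spline expansion (one smooth per lag)
    LinearRegression(),
)
additive_model.fit(X, y)

episode_threshold = 150.0                        # illustrative episode level
episode_predicted = additive_model.predict(X[-1:]) > episode_threshold
```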
Abstract:
The present paper reports the precipitation process of Al3Sc structures in an aluminum-scandium alloy, which has been simulated with a synchronous parallel kinetic Monte Carlo (spkMC) algorithm. The spkMC implementation is based on the vacancy diffusion mechanism. To filter the raw data generated by the spkMC simulations, the density-based spatial clustering of applications with noise (DBSCAN) method has been employed. The spkMC and DBSCAN algorithms were implemented in the C language using the MPI library. The simulations were conducted in the SeARCH cluster located at the University of Minho. The Al3Sc precipitation was successfully simulated at the atomistic scale with the spkMC. DBSCAN proved to be a valuable aid in identifying the precipitates by performing a cluster analysis of the simulation results. The simulation results achieved are in good agreement with those reported in the literature for sequential kinetic Monte Carlo (kMC) simulations. The parallel implementation of kMC provided a 4x speedup over the sequential version.
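As an illustration of the post-processing step only (the paper's implementation is in C with MPI), the sketch below clusters synthetic atom positions with DBSCAN to count precipitates; the coordinates and the eps/min_samples values are placeholders.

```python
# Cluster atom positions with DBSCAN: dense groups become precipitates,
# isolated solute atoms are labelled as noise (-1).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
precipitates = np.vstack([rng.normal(loc=c, scale=0.5, size=(80, 3))
                          for c in ([0, 0, 0], [10, 10, 10])])   # two dense blobs
solutes = rng.uniform(0, 20, size=(40, 3))                       # scattered atoms
positions = np.vstack([precipitates, solutes])

labels = DBSCAN(eps=1.2, min_samples=5).fit_predict(positions)
n_precipitates = len(set(labels)) - (1 if -1 in labels else 0)
print("identified precipitates:", n_precipitates)
```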
Abstract:
This paper presents a simulation model, which was incorporated into a Geographic Information System (GIS), in order to calculate the maximum intensity of urban heat islands (UHI) based on urban geometry data. The methodology of this study stands on a theoretical-numerical basis (Oke's model), followed by the study and selection of existing GIS tools, the design of the calculation model, the incorporation of the resulting algorithm into the GIS platform and an example application of the developed tool. The developed tool will help researchers to simulate UHI in different urban scenarios.
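A small sketch of the quantity the tool computes, assuming the commonly cited form of Oke's relation between maximum UHI intensity and the street canyon aspect ratio H/W; the coefficients below are quoted from memory and should be checked against the exact formulation incorporated into the GIS platform.

```python
# Maximum UHI intensity as a function of canyon geometry (Oke-type relation).
# The coefficients are the commonly cited ones and are an assumption here.
import math

def max_uhi_intensity(height_m: float, width_m: float) -> float:
    """Approximate maximum urban heat island intensity (degrees C)."""
    return 7.45 + 3.97 * math.log(height_m / width_m)

print(max_uhi_intensity(height_m=15.0, width_m=10.0))   # canyon with aspect ratio H/W = 1.5
```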
Abstract:
Novel input modalities such as touch, tangibles or gestures try to exploit humans' innate skills rather than imposing new learning processes. However, despite the recent boom of different natural interaction paradigms, it has not been systematically evaluated how these interfaces influence a user's performance, or whether each interface could be more or less appropriate when it comes to: 1) different age groups; and 2) different basic operations, such as data selection, insertion or manipulation. This work presents the first step of an exploratory evaluation of whether or not users' performance is indeed influenced by the different interfaces. The key point is to understand how different interaction paradigms affect specific target audiences (children, adults and older adults) when dealing with a selection task. Sixty participants took part in this study to assess how different interfaces may influence the interaction of specific groups of users with regard to their age. Four input modalities were used to perform a selection task, and the methodology was based on usability testing (speed, accuracy and user preference). The study suggests a statistically significant difference between mean selection times for each group of users, and also raises new issues regarding the “old” mouse input versus the “new” input modalities.
Abstract:
DNA microarrays are one of the most used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the power of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprises a set of tasks that range from the re-annotation of the features used for gene expression measurement to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, which comprises a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases such as The Cancer Genome Atlas and Gene Expression Omnibus.
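For illustration only, the sketch below applies one very simple form of batch-effect attenuation to a merged expression matrix (per-batch centering and scaling of each transcript); the study evaluates several dedicated methods, which are not reproduced here, and the data are placeholders.

```python
# Per-batch standardization of a merged (re-annotated) expression matrix.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
expr = pd.DataFrame(rng.normal(size=(60, 500)))            # 60 samples x 500 transcripts (merged, placeholder)
batch = pd.Series(["agilent"] * 30 + ["affymetrix"] * 30)  # platform of origin for each sample

# centre and scale each transcript within each batch so the two platforms share
# a common location and spread before downstream feature selection / learning
corrected = expr.groupby(batch).transform(lambda g: (g - g.mean()) / g.std(ddof=0))
```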
Abstract:
Nowadays, the sustainability of buildings is extremely important. This concern aligns with the European aims of the Horizon 2020 Programme, which addresses the reduction of environmental impacts through aspects such as energy efficiency and renewable technologies, among others. Sustainability is an extremely broad concept, but this work focuses on sustainability in buildings. Within this concept, which aims at integrating the environmental, social and economic levels towards the preservation of the planet and the integrity of users, there are currently several environmental certification tools applicable to the construction industry (LEED, BREEAM, DGNB, SBTool, among others). In this context, the SBTool (Sustainable Building Tool) stands out: it is employed in several countries and can be adapted for institutions of basic education, which are the base for the formation of critical mass and for the development of a country. The main aim of this research is to select indicators that can be used in a methodology for the sustainability assessment (SBTool) of school buildings in Portugal and Brazil. To achieve this, other methodologies that already incorporate parameters directly related to the school environment, such as BREEAM or LEED, will also be analyzed.
Abstract:
The artificial fish swarm algorithm has recently emerged in continuous global optimization. It uses points of a population in space to identify the position of fish in the school. Many real-world optimization problems are described by 0-1 multidimensional knapsack problems, which are NP-hard. In the last decades several exact as well as heuristic methods have been proposed for solving these problems. In this paper, a new simplified binary version of the artificial fish swarm algorithm is presented, where a point/fish is represented by a binary string of 0/1 bits. Trial points are created by using crossover and mutation in the different fish behaviors, which are randomly selected using two user-defined probability values. In order to make the points feasible, the presented algorithm uses a random heuristic drop-item procedure followed by an add-item procedure aiming to increase the profit through the addition of more items to the knapsack. A cyclic reinitialization of 50% of the population and a simple local search, which allows a small percentage of points to progress towards optimality and afterwards refines the best point in the population, greatly improve the quality of the solutions. The presented method is tested on a set of benchmark instances and a comparison with other methods available in the literature is shown. The comparison shows that the proposed method can be an alternative method for solving these problems.
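A sketch of the feasibility repair step described above, assuming the 0-1 multidimensional (m-constraint) knapsack formulation; the profit-ordered add phase is one simple choice for illustration, not necessarily the authors' exact procedure.

```python
# Repair an infeasible 0/1 point: randomly drop selected items until every
# constraint holds, then add items (highest profit first) that still fit.
import numpy as np

def repair(x, weights, capacities, profits, rng):
    x = x.copy()
    while np.any(weights @ x > capacities):          # drop phase (random heuristic)
        selected = np.flatnonzero(x)
        x[rng.choice(selected)] = 0
    for j in np.argsort(-profits):                   # add phase (greedy by profit)
        if x[j] == 0 and np.all(weights @ x + weights[:, j] <= capacities):
            x[j] = 1
    return x

rng = np.random.default_rng(5)
n_items, m_constraints = 20, 3
weights = rng.integers(1, 10, size=(m_constraints, n_items))
capacities = weights.sum(axis=1) // 3                # illustrative capacities
profits = rng.integers(1, 100, size=n_items)
point = rng.integers(0, 2, size=n_items)             # a trial point/fish as a 0/1 string
feasible_point = repair(point, weights, capacities, profits, rng)
```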
Abstract:
The Electromagnetism-like (EM) algorithm is a population-based stochastic global optimization algorithm that uses an attraction-repulsion mechanism to move sample points towards optimality. In this paper, an implementation of the EM algorithm in the Matlab environment is proposed, as a useful function for practitioners and for those who want to experiment with a new global optimization solver. A set of benchmark problems is solved in order to evaluate the performance of the implemented method when compared with other stochastic methods available in the Matlab environment. The results confirm that our implementation is a competitive alternative both in terms of numerical results and performance. Finally, a case study based on a parameter estimation problem of a biological system shows that the EM implementation could be applied with promising results in the area of control optimization.
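A minimal sketch of the attraction-repulsion step of an electromagnetism-like heuristic, written in Python rather than Matlab and assuming minimization; the charge and force expressions follow the commonly described EM scheme, with toy data.

```python
# Better points attract, worse points repel; charges are derived from objective values.
import numpy as np

def em_forces(points, f_values):
    n_points, dim = points.shape
    best = f_values.min()
    charges = np.exp(-dim * (f_values - best) / (np.sum(f_values - best) + 1e-12))
    forces = np.zeros_like(points)
    for i in range(n_points):
        for j in range(n_points):
            if i == j:
                continue
            diff = points[j] - points[i]
            dist2 = np.dot(diff, diff) + 1e-12
            if f_values[j] < f_values[i]:
                forces[i] += diff * charges[i] * charges[j] / dist2   # attraction towards a better point
            else:
                forces[i] -= diff * charges[i] * charges[j] / dist2   # repulsion from a worse point
    return forces

rng = np.random.default_rng(6)
pts = rng.uniform(-5, 5, size=(10, 2))
fvals = np.sum(pts**2, axis=1)                        # sphere function as a toy objective
step = em_forces(pts, fvals)
pts_next = pts + 0.1 * step / (np.linalg.norm(step, axis=1, keepdims=True) + 1e-12)
```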