930 resultados para classification algorithm
Resumo:
Search is now going beyond looking for factual information, and people wish to search for the opinions of others to help them in their own decision-making. Sentiment expressions or opinion expressions are used by users to express their opinion and embody important pieces of information, particularly in online commerce. The main problem that the present dissertation addresses is how to model text to find meaningful words that express a sentiment. In this context, I investigate the viability of automatically generating a sentiment lexicon for opinion retrieval and sentiment classification applications. For this research objective we propose to capture sentiment words that are derived from online users’ reviews. In this approach, we tackle a major challenge in sentiment analysis which is the detection of words that express subjective preference and domain-specific sentiment words such as jargon. To this aim we present a fully generative method that automatically learns a domain-specific lexicon and is fully independent of external sources. Sentiment lexicons can be applied in a broad set of applications, however popular recommendation algorithms have somehow been disconnected from sentiment analysis. Therefore, we present a study that explores the viability of applying sentiment analysis techniques to infer ratings in a recommendation algorithm. Furthermore, entities’ reputation is intrinsically associated with sentiment words that have a positive or negative relation with those entities. Hence, is provided a study that observes the viability of using a domain-specific lexicon to compute entities reputation. Finally, a recommendation system algorithm is improved with the use of sentiment-based ratings and entities reputation.
Resumo:
Ship tracking systems allow Maritime Organizations that are concerned with the Safety at Sea to obtain information on the current location and route of merchant vessels. Thanks to Space technology in recent years the geographical coverage of the ship tracking platforms has increased significantly, from radar based near-shore traffic monitoring towards a worldwide picture of the maritime traffic situation. The long-range tracking systems currently in operations allow the storage of ship position data over many years: a valuable source of knowledge about the shipping routes between different ocean regions. The outcome of this Master project is a software prototype for the estimation of the most operated shipping route between any two geographical locations. The analysis is based on the historical ship positions acquired with long-range tracking systems. The proposed approach makes use of a Genetic Algorithm applied on a training set of relevant ship positions extracted from the long-term storage tracking database of the European Maritime Safety Agency (EMSA). The analysis of some representative shipping routes is presented and the quality of the results and their operational applications are assessed by a Maritime Safety expert.
Resumo:
The present paper reports the precipitation process of Al3Sc structures in an aluminum scandium alloy, which has been simulated with a synchronous parallel kinetic Monte Carlo (spkMC) algorithm. The spkMC implementation is based on the vacancy diffusion mechanism. To filter the raw data generated by the spkMC simulations, the density-based clustering with noise (DBSCAN) method has been employed. spkMC and DBSCAN algorithms were implemented in the C language and using MPI library. The simulations were conducted in the SeARCH cluster located at the University of Minho. The Al3Sc precipitation was successfully simulated at the atomistic scale with the spkMC. DBSCAN proved to be a valuable aid to identify the precipitates by performing a cluster analysis of the simulation results. The achieved simulations results are in good agreement with those reported in the literature under sequential kinetic Monte Carlo simulations (kMC). The parallel implementation of kMC has provided a 4x speedup over the sequential version.
Resumo:
Given the current economic situation of the Portuguese municipalities, it is necessary to identify the priority investments in order to achieve a more efficient financial management. The classification of the road network of the municipality according to the occurrence of traffic accidents is fundamental to set priorities for road interventions. This paper presents a model for road network classification based on traffic accidents integrated in a geographic information system. Its practical application was developed through a case study in the municipality of Barcelos. An equation was defined to obtain a road safety index through the combination of the following indicators: severity, property damage only and accident costs. In addition to the road network classification, the application of the model allows to analyze the spatial coverage of accidents in order to determine the centrality and dispersion of the locations with the highest incidence of road accidents. This analysis can be further refined according to the nature of the accidents namely in collision, runoff and pedestrian crashes.
Resumo:
Nowadays the main honey producing countries require accurate labeling of honey before commercialization, including floral classification. Traditionally, this classification is made by melissopalynology analysis, an accurate but time-consuming task requiring laborious sample pre-treatment and high-skilled technicians. In this work the potential use of a potentiometric electronic tongue for pollinic assessment is evaluated, using monofloral and polyfloral honeys. The results showed that after splitting honeys according to color (white, amber and dark), the novel methodology enabled quantifying the relative percentage of the main pollens (Castanea sp., Echium sp., Erica sp., Eucaliptus sp., Lavandula sp., Prunus sp., Rubus sp. and Trifolium sp.). Multiple linear regression models were established for each type of pollen, based on the best sensors sub-sets selected using the simulated annealing algorithm. To minimize the overfitting risk, a repeated K-fold cross-validation procedure was implemented, ensuring that at least 10-20% of the honeys were used for internal validation. With this approach, a minimum average determination coefficient of 0.91 ± 0.15 was obtained. Also, the proposed technique enabled the correct classification of 92% and 100% of monofloral and polyfloral honeys, respectively. The quite satisfactory performance of the novel procedure for quantifying the relative pollen frequency may envisage its applicability for honey labeling and geographical origin identification. Nevertheless, this approach is not a full alternative to the traditional melissopalynologic analysis; it may be seen as a practical complementary tool for preliminary honey floral classification, leaving only problematic cases for pollinic evaluation.
Resumo:
DNA microarrays are one of the most used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the assertiveness of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprehends a set of tasks that range from the re-annotation of the features used on gene expression, to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, which comprehends a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases, such as The Cancer Genome Atlas and Gene Expression Omnibus.
Resumo:
The artificial fish swarm algorithm has recently been emerged in continuous global optimization. It uses points of a population in space to identify the position of fish in the school. Many real-world optimization problems are described by 0-1 multidimensional knapsack problems that are NP-hard. In the last decades several exact as well as heuristic methods have been proposed for solving these problems. In this paper, a new simpli ed binary version of the artificial fish swarm algorithm is presented, where a point/ fish is represented by a binary string of 0/1 bits. Trial points are created by using crossover and mutation in the different fi sh behavior that are randomly selected by using two user de ned probability values. In order to make the points feasible the presented algorithm uses a random heuristic drop item procedure followed by an add item procedure aiming to increase the profit throughout the adding of more items in the knapsack. A cyclic reinitialization of 50% of the population, and a simple local search that allows the progress of a small percentage of points towards optimality and after that refines the best point in the population greatly improve the quality of the solutions. The presented method is tested on a set of benchmark instances and a comparison with other methods available in literature is shown. The comparison shows that the proposed method can be an alternative method for solving these problems.
Resumo:
The Electromagnetism-like (EM) algorithm is a population- based stochastic global optimization algorithm that uses an attraction- repulsion mechanism to move sample points towards the optimal. In this paper, an implementation of the EM algorithm in the Matlab en- vironment as a useful function for practitioners and for those who want to experiment a new global optimization solver is proposed. A set of benchmark problems are solved in order to evaluate the performance of the implemented method when compared with other stochastic methods available in the Matlab environment. The results con rm that our imple- mentation is a competitive alternative both in term of numerical results and performance. Finally, a case study based on a parameter estimation problem of a biology system shows that the EM implementation could be applied with promising results in the control optimization area.
Resumo:
In this paper, we propose an extension of the firefly algorithm (FA) to multi-objective optimization. FA is a swarm intelligence optimization algorithm inspired by the flashing behavior of fireflies at night that is capable of computing global solutions to continuous optimization problems. Our proposal relies on a fitness assignment scheme that gives lower fitness values to the positions of fireflies that correspond to non-dominated points with smaller aggregation of objective function distances to the minimum values. Furthermore, FA randomness is based on the spread metric to reduce the gaps between consecutive non-dominated solutions. The obtained results from the preliminary computational experiments show that our proposal gives a dense and well distributed approximated Pareto front with a large number of points.
Resumo:
This paper presents a single-phase Series Active Power Filter (Series APF) for mitigation of the load voltage harmonic content, while maintaining the voltage on the DC side regulated without the support of a voltage source. The proposed series active power filter control algorithm eliminates the additional voltage source to regulate the DC voltage, and with the adopted topology it is not used a coupling transformer to interface the series active power filter with the electrical power grid. The paper describes the control strategy which encapsulates the grid synchronization scheme, the compensation voltage calculation, the damping algorithm and the dead-time compensation. The topology and control strategy of the series active power filter have been evaluated in simulation software and simulations results are presented. Experimental results, obtained with a developed laboratorial prototype, validate the theoretical assumptions, and are within the harmonic spectrum limits imposed by the international recommendations of the IEEE-519 Standard.
Resumo:
Olive oils may be commercialized as intense, medium or light, according to the intensity perception of fruitiness, bitterness and pungency attributes, assessed by a sensory panel. In this work, the capability of an electronic tongue to correctly classify olive oils according to the sensory intensity perception levels was evaluated. Cross-sensitivity and non-specific lipid polymeric membranes were used as sensors. The sensor device was firstly tested using quinine monohydrochloride standard solutions. Mean sensitivities of 14±2 to 25±6 mV/decade, depending on the type of plasticizer used in the lipid membranes, were obtained showing the device capability for evaluating bitterness. Then, linear discriminant models based on sub-sets of sensors, selected by a meta-heuristic simulated annealing algorithm, were established enabling to correctly classify 91% of olive oils according to their intensity sensory grade (leave-one-out cross-validation procedure). This capability was further evaluated using a repeated K-fold cross-validation procedure, showing that the electronic tongue allowed an average correct classification of 80% of the olive oils used for internal-validation. So, the electronic tongue can be seen as a taste sensor, allowing differentiating olive oils with different sensory intensities, and could be used as a preliminary, complementary and practical tool for panelists during olive oil sensory analysis.
Resumo:
Natural selection favors the survival and reproduction of organisms that are best adapted to their environment. Selection mechanism in evolutionary algorithms mimics this process, aiming to create environmental conditions in which artificial organisms could evolve solving the problem at hand. This paper proposes a new selection scheme for evolutionary multiobjective optimization. The similarity measure that defines the concept of the neighborhood is a key feature of the proposed selection. Contrary to commonly used approaches, usually defined on the basis of distances between either individuals or weight vectors, it is suggested to consider the similarity and neighborhood based on the angle between individuals in the objective space. The smaller the angle, the more similar individuals. This notion is exploited during the mating and environmental selections. The convergence is ensured by minimizing distances from individuals to a reference point, whereas the diversity is preserved by maximizing angles between neighboring individuals. Experimental results reveal a highly competitive performance and useful characteristics of the proposed selection. Its strong diversity preserving ability allows to produce a significantly better performance on some problems when compared with stat-of-the-art algorithms.
Resumo:
Dissertação de mestrado integrado em Engenharia Eletrónica Industrial e Computadores
Resumo:
Dissertação de mestrado integrado em Engenharia Biomédica (área de especialização em Eletrónica Médica)
Resumo:
The chemical composition of propolis is affected by environmental factors and harvest season, making it difficult to standardize its extracts for medicinal usage. By detecting a typical chemical profile associated with propolis from a specific production region or season, certain types of propolis may be used to obtain a specific pharmacological activity. In this study, propolis from three agroecological regions (plain, plateau, and highlands) from southern Brazil, collected over the four seasons of 2010, were investigated through a novel NMR-based metabolomics data analysis workflow. Chemometrics and machine learning algorithms (PLS-DA and RF), including methods to estimate variable importance in classification, were used in this study. The machine learning and feature selection methods permitted construction of models for propolis sample classification with high accuracy (>75%, reaching 90% in the best case), better discriminating samples regarding their collection seasons comparatively to the harvest regions. PLS-DA and RF allowed the identification of biomarkers for sample discrimination, expanding the set of discriminating features and adding relevant information for the identification of the class-determining metabolites. The NMR-based metabolomics analytical platform, coupled to bioinformatic tools, allowed characterization and classification of Brazilian propolis samples regarding the metabolite signature of important compounds, i.e., chemical fingerprint, harvest seasons, and production regions.