864 results for Feature selection algorithm


Relevance:

50.00%

Publisher:

Abstract:

This paper studies feature subset selection in classification using a multiobjective estimation of distribution algorithm. We consider six functions, namely area under the ROC curve, sensitivity, specificity, precision, F1 measure and Brier score, for the evaluation of feature subsets and as the objectives of the problem. One characteristic of these objective functions is the noise in their values, which must be handled appropriately during optimization. Our proposed algorithm consists of two major techniques that are specially designed for the feature subset selection problem. The first is a solution ranking method based on interval values, which handles the noise in the objectives of this problem. The second is a model estimation method for learning a joint probabilistic model of objectives and variables, which is used to generate new solutions and advance through the search space. To simplify model estimation, l1-regularized regression is used to select a subset of problem variables before model learning. The proposed algorithm is compared with a well-known ranking method for interval-valued objectives and a standard multiobjective genetic algorithm. In particular, the effects of the two new techniques are experimentally investigated. The experimental results show that the proposed algorithm obtains comparable or better performance on the tested datasets.
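As an illustration of the six objectives named in this abstract, the following minimal sketch computes them from true labels and predicted probabilities using their standard textbook definitions. The function names, the 0.5 decision threshold and the dictionary keys are assumptions made for this example; the paper's own evaluation procedure (including how it handles noise with intervals) is not reproduced here.

```python
def auc(y_true, y_prob):
    """Area under the ROC curve as the fraction of correctly ordered (pos, neg) pairs."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        return 0.5  # undefined without both classes; 0.5 is a neutral placeholder
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def subset_objectives(y_true, y_prob, threshold=0.5):
    """Return the six objective values for one classifier trained on one feature subset.

    y_true: list of 0/1 labels; y_prob: predicted probabilities of class 1.
    The threshold is an assumption of this sketch.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    # Brier score: mean squared error of the predicted probabilities (lower is better).
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

    return {
        "auc": auc(y_true, y_prob),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "brier": brier,
    }
```

Note that five of these values are maximized while the Brier score is minimized, so a multiobjective optimizer would need to treat it with the opposite sign.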

Relevance:

40.00%

Publisher:

Abstract:

The selection of resource systems plays an important part in the integration of Distributed/Agile/Virtual Enterprises (D/A/V Es). However, as this paper points out, resource systems selection remains a difficult problem to solve in a D/A/V E. Broadly, the selection problem has been formulated from different perspectives, giving rise to different kinds of models and algorithms to solve it. Our final goal is to develop an intelligent and flexible web prototype tool (broker tool) that integrates all the selection model activities and tools and can adapt to each D/A/V E project or instance. To support that development, this paper presents a formulation of one kind of resource selection problem and the limitations of the algorithms proposed to solve it. We formulate a particular case of the problem as an integer program, which is solved using simplex and branch-and-bound algorithms, and identify their performance limitations (in terms of processing time) based on simulation results. These limitations depend on the number of processing tasks and on the number of pre-selected resources per processing task, and they define the domain of applicability of the algorithms for the problem studied. The limitations detected call for other kinds of algorithms (approximate solution algorithms) outside the domain of applicability found for the simulated algorithms. For a broker tool, knowledge of the algorithms' limitations is very important in order to develop and select, based on problem features, the most suitable algorithm that guarantees good performance.
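To make the role of the two problem sizes (number of tasks and number of pre-selected resources per task) concrete, here is a minimal sketch of a depth-first branch and bound for a hypothetical selection problem. The formulation is an assumption for illustration only: each task picks exactly one of its pre-selected resources, each resource serves at most one task, and total cost is minimized. It uses a simple optimistic bound rather than the simplex-based relaxation mentioned in the abstract, and the name `select_resources` is invented here.

```python
def select_resources(candidates):
    """Pick one resource per task at minimum total cost.

    candidates[t] maps resource name -> cost for task t (each dict non-empty).
    Assumption of this sketch: a resource can be assigned to at most one task.
    Returns (best_cost, assignment); assignment is None if no feasible choice exists.
    """
    n = len(candidates)
    # Optimistic cost of completing tasks t..n-1, ignoring resource conflicts.
    suffix_min = [0.0] * (n + 1)
    for t in range(n - 1, -1, -1):
        suffix_min[t] = suffix_min[t + 1] + min(candidates[t].values())

    best = [float("inf"), None]

    def branch(t, cost, used, chosen):
        # Bound: even the optimistic completion cannot beat the incumbent.
        if cost + suffix_min[t] >= best[0]:
            return
        if t == n:
            best[0], best[1] = cost, list(chosen)
            return
        # Try cheaper resources first so good incumbents are found early.
        for res, c in sorted(candidates[t].items(), key=lambda kv: kv[1]):
            if res not in used:
                used.add(res)
                chosen.append(res)
                branch(t + 1, cost + c, used, chosen)
                chosen.pop()
                used.remove(res)

    branch(0, 0.0, set(), [])
    return best[0], best[1]
```

In the worst case the search tree grows with the product of the candidate counts per task, which is consistent with the abstract's observation that processing time depends on both the number of tasks and the number of pre-selected resources per task.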

Relevance:

40.00%

Publisher:

Abstract:

Natural selection favors the survival and reproduction of organisms that are best adapted to their environment. The selection mechanism in evolutionary algorithms mimics this process, aiming to create environmental conditions in which artificial organisms can evolve to solve the problem at hand. This paper proposes a new selection scheme for evolutionary multiobjective optimization. The similarity measure that defines the concept of the neighborhood is a key feature of the proposed selection. In contrast to commonly used approaches, which are usually defined on the basis of distances between either individuals or weight vectors, it is suggested to define similarity and neighborhood by the angle between individuals in the objective space. The smaller the angle, the more similar the individuals. This notion is exploited during the mating and environmental selections. Convergence is ensured by minimizing distances from individuals to a reference point, whereas diversity is preserved by maximizing angles between neighboring individuals. Experimental results reveal highly competitive performance and useful characteristics of the proposed selection. Its strong diversity-preserving ability allows it to achieve significantly better performance on some problems when compared with state-of-the-art algorithms.
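The angle-based notion of similarity can be sketched as follows. This is an illustrative example only: the function names are invented here, and measuring the angle from a reference point that is subtracted from each objective vector is an assumption about how the two vectors are anchored.

```python
import math


def angle_between(f_a, f_b, ref):
    """Angle in radians between two objective vectors, both measured from ref."""
    a = [x - r for x, r in zip(f_a, ref)]
    b = [x - r for x, r in zip(f_b, ref)]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a vector sitting on the reference point has no direction
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def distance_to_reference(f, ref):
    """Euclidean distance to the reference point (smaller means better converged)."""
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(f, ref)))


def most_similar(index, population, ref):
    """Index of the individual with the smallest angle to population[index]."""
    others = [j for j in range(len(population)) if j != index]
    return min(others, key=lambda j: angle_between(population[index], population[j], ref))
```

Under this sketch, two individuals on the same ray from the reference point have angle zero regardless of how far apart they are, which is what separates the angle criterion from distance-based neighborhoods.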

Relevance:

40.00%

Publisher:

Abstract:

Executive Summary: The unifying theme of this thesis is the pursuit of satisfactory ways to quantify the risk-reward trade-off in financial economics: first in the context of a general asset pricing model, then across models, and finally across country borders. The guiding principle in that pursuit was to seek innovative solutions by combining ideas from different fields in economics and broad scientific research. For example, in the first part of this thesis we sought a fruitful application of strong existence results in utility theory to topics in asset pricing. In the second part we apply an idea from the field of fuzzy set theory to the optimal portfolio selection problem, while the third part of this thesis is, to the best of our knowledge, the first empirical application of some general results in asset pricing in incomplete markets to the important topic of measurement of financial integration. While the first two parts of this thesis effectively combine well-known ways to quantify the risk-reward trade-offs, the third one can be viewed as an empirical verification of the usefulness of the so-called "good deal bounds" theory in designing risk-sensitive pricing bounds.
Chapter 1 develops a discrete-time asset pricing model, based on a novel ordinally equivalent representation of recursive utility. To the best of our knowledge, we are the first to use a member of a novel class of recursive utility generators to construct a representative agent model to address some long-standing issues in asset pricing. Applying strong representation results allows us to show that the model features countercyclical risk premia, for both consumption and financial risk, together with a low and procyclical risk-free rate. As the recursive utility used nests as a special case the well-known time-state separable utility, all results nest the corresponding ones from the standard model and thus shed light on its well-known shortcomings. The empirical investigation to support these theoretical results, however, showed that as long as one resorts to econometric methods based on approximating conditional moments with unconditional ones, it is not possible to distinguish the model we propose from the standard one.
Chapter 2 is joint work with Sergei Sontchik. There we provide theoretical and empirical motivation for the aggregation of performance measures. The main idea is that, as it makes sense to apply several performance measures ex-post, it also makes sense to base optimal portfolio selection on ex-ante maximization of as many performance measures as desired. We thus offer a concrete algorithm for optimal portfolio selection via ex-ante optimization over different horizons of several risk-return trade-offs simultaneously. An empirical application of that algorithm, using seven popular performance measures, suggests that realized returns feature better distributional characteristics relative to those of realized returns from portfolio strategies optimal with respect to single performance measures. When comparing the distributions of realized returns we used two partial risk-reward orderings: first- and second-order stochastic dominance. We first used the Kolmogorov-Smirnov test to determine whether the two distributions are indeed different, which, combined with a visual inspection, allowed us to demonstrate that the way we propose to aggregate performance measures leads to portfolio realized returns that first-order stochastically dominate the ones that result from optimization only with respect to, for example, the Treynor ratio and Jensen's alpha. We checked for second-order stochastic dominance via pointwise comparison of the so-called absolute Lorenz curve, or the sequence of expected shortfalls for a range of quantiles. Since the plot of the absolute Lorenz curve for the aggregated performance measures was above the one corresponding to each individual measure, we were tempted to conclude that the algorithm we propose leads to a portfolio return distribution that second-order stochastically dominates those obtained from virtually all performance measures considered.
Chapter 3 proposes a measure of financial integration, based on recent advances in asset pricing in incomplete markets. Given a base market (a set of traded assets) and an index of another market, we propose to measure financial integration through time by the size of the spread between the pricing bounds of the market index, relative to the base market. The bigger the spread around country index A, viewed from market B, the less integrated markets A and B are. We investigate the presence of structural breaks in the size of the spread for EMU member country indices before and after the introduction of the Euro. We find evidence that both the level and the volatility of our financial integration measure increased after the introduction of the Euro. That counterintuitive result suggests the presence of an inherent weakness in the attempt to measure financial integration independently of economic fundamentals. Nevertheless, the results about the bounds on the risk-free rate appear plausible from the viewpoint of existing economic theory about the impact of integration on interest rates.
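The pointwise comparison of absolute Lorenz curves described for Chapter 2 can be sketched as below. This is a simplified empirical version under stated assumptions: both return samples have the same length, every observation has equal weight, and the function names are invented for the example. It does not include the Kolmogorov-Smirnov test or any statistical significance assessment.

```python
def absolute_lorenz(returns):
    """Empirical absolute Lorenz curve at quantiles 1/n, 2/n, ..., 1.

    The k-th value is the sum of the k smallest returns divided by n.
    """
    n = len(returns)
    curve, running = [], 0.0
    for r in sorted(returns):
        running += r
        curve.append(running / n)
    return curve


def dominates_first_order(a, b):
    """True if sample a is at least as large as sample b at every empirical quantile."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equally sized samples")
    return all(x >= y for x, y in zip(sorted(a), sorted(b)))


def dominates_second_order(a, b):
    """True if the absolute Lorenz curve of a lies on or above that of b everywhere."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equally sized samples")
    return all(x >= y for x, y in zip(absolute_lorenz(a), absolute_lorenz(b)))
```

For example, with `a = [0.01, 0.02, 0.03]` and `b = [-0.01, 0.02, 0.04]`, sample `a` does not dominate `b` in the first-order sense (its best return is lower), but it does in the second-order sense because its cumulative lower-tail returns are never worse.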

Relevance:

40.00%

Publisher:

Abstract:

Liquid chromatography-mass spectrometry (LC-MS) datasets can be compared or combined following chromatographic alignment. Here we describe a simple solution to the specific problem of aligning one LC-MS dataset and one LC-MS/MS dataset, acquired on separate instruments from an enzymatic digest of a protein mixture, using feature extraction and a genetic algorithm. First, the LC-MS dataset is searched within a few ppm of the calculated theoretical masses of peptides confidently identified by LC-MS/MS. A piecewise linear function is then fitted to these matched peptides using a genetic algorithm with a fitness function that is insensitive to incorrect matches but sufficiently flexible to adapt to the discrete shifts common when comparing LC datasets. We demonstrate the utility of this method by aligning ion trap LC-MS/MS data with accurate LC-MS data from an FTICR mass spectrometer and show how hybrid datasets can improve peptide and protein identification by combining the speed of the ion trap with the mass accuracy of the FTICR, similar to using a hybrid ion trap-FTICR instrument. We also show that the high resolving power of FTICR can improve precision and linear dynamic range in quantitative proteomics. The alignment software, msalign, is freely available as open source.
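The two ingredients named in this abstract, mass matching within a ppm tolerance and a piecewise linear mapping scored by a match-tolerant fitness, can be sketched as follows. This is not the msalign implementation: the function names, the 5 ppm default, the clamping at the end knots and the inlier-count fitness are all assumptions chosen for illustration, and the genetic algorithm that would search over the knots is omitted.

```python
def match_within_ppm(observed, theoretical, tol_ppm=5.0):
    """Pair each theoretical mass with the closest observed mass within tol_ppm."""
    matches = []
    for t in theoretical:
        best = min(observed, key=lambda m: abs(m - t), default=None)
        if best is not None and abs(best - t) / t * 1e6 <= tol_ppm:
            matches.append((t, best))
    return matches


def piecewise_linear(x, knots):
    """Interpolate linearly between (x, y) knots sorted by strictly increasing x.

    Values outside the knot range are clamped to the end knots (an assumption).
    """
    if x <= knots[0][0]:
        return knots[0][1]
    if x >= knots[-1][0]:
        return knots[-1][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def inlier_fitness(knots, pairs, tol=0.5):
    """Count matched pairs (x, y) that the mapping predicts within tol.

    Counting inliers, rather than summing squared errors, keeps a few wrong
    matches from dominating the score.
    """
    return sum(1 for x, y in pairs if abs(piecewise_linear(x, knots) - y) <= tol)
```

A search procedure such as a genetic algorithm could then propose candidate `knots` and keep those with the highest `inlier_fitness`.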

Relevance:

40.00%

Publisher:

Abstract:

We propose a simple and computationally efficient construction algorithm for two-class linear-in-the-parameters classifiers. In order to optimize model generalization, an orthogonal forward selection (OFS) procedure is used to minimize the leave-one-out (LOO) misclassification rate directly. An analytic formula and a set of forward recursive updating formulae for the LOO misclassification rate are developed and applied in the proposed algorithm. Numerical examples demonstrate that the proposed algorithm is an excellent alternative approach for constructing sparse two-class classifiers in terms of performance and computational efficiency.

Relevance:

40.00%

Publisher:

Abstract:

Radial basis functions can be combined into a network structure that has several advantages over conventional neural network solutions. However, to operate effectively, the number and positions of the basis function centres must be carefully selected. Although no rigorous algorithm exists for this purpose, several heuristic methods have been suggested. In this paper a new method is proposed in which radial basis function centres are selected by the mean-tracking clustering algorithm. The mean-tracking algorithm is compared with k-means clustering, and it is shown to achieve significantly better results in terms of radial basis function performance. As well as being computationally simpler, the mean-tracking algorithm in general selects better centre positions, thus providing the radial basis functions with better modelling accuracy.
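For context, the k-means baseline mentioned in this abstract can be sketched in one dimension as follows. The sketch shows only how cluster centres become radial basis function centres; it does not implement the mean-tracking algorithm, whose details are not given here. The Gaussian basis function, the `width` parameter and the function names are assumptions for illustration.

```python
import math


def kmeans_1d(points, centres, iterations=20):
    """Plain k-means on scalar data, starting from the given initial centres."""
    centres = list(centres)
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda j: abs(p - centres[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centre.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres


def rbf_features(x, centres, width=1.0):
    """Gaussian radial basis activations of input x, one per centre."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centres]
```

The output layer of the network would then be a linear combination of these activations, so the quality of the centres directly limits modelling accuracy.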

Relevance:

40.00%

Publisher:

Abstract:

This paper is concerned with the use of a genetic algorithm to select financial ratios for corporate distress classification models. For this purpose, the fitness value associated with a set of ratios is made to reflect the requirements of maximizing the amount of information available to the model and minimizing the collinearity between the model inputs. A case study involving 60 failed and continuing British firms in the period 1997-2000 is used for illustration. The classification model based on ratios selected by the genetic algorithm compares favorably with a model employing ratios usually found in the financial distress literature.
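One way to picture a fitness value that rewards information and penalizes collinearity is sketched below. This is a hypothetical stand-in, not the paper's fitness function: it measures information by the mean absolute Pearson correlation between each selected ratio and the class label, and collinearity by the mean absolute pairwise correlation among the selected ratios. The names `pearson` and `subset_fitness` and the `penalty` weight are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def subset_fitness(columns, labels, selected, penalty=1.0):
    """Hypothetical fitness: relevance to the label minus a collinearity penalty.

    columns: list of ratio columns; selected: indices of the ratios in the subset.
    """
    if not selected:
        return 0.0
    relevance = sum(abs(pearson(columns[i], labels)) for i in selected) / len(selected)
    pairs = [(i, j) for k, i in enumerate(selected) for j in selected[k + 1:]]
    if pairs:
        redundancy = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    else:
        redundancy = 0.0
    return relevance - penalty * redundancy
```

A genetic algorithm would encode `selected` as a bit string over the available ratios and evolve it toward higher fitness.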

Relevance:

40.00%

Publisher:

Abstract:

Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of several metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). All of these algorithms, except for GA, are applied here for the first time to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.

Relevance:

40.00%

Publisher:

Abstract:

A formalism recently introduced by Prugel-Bennett and Shapiro uses the methods of statistical mechanics to model the dynamics of genetic algorithms. To show that the formalism is of more general interest than the test cases they consider, the technique is applied in this paper to the subset sum problem, which is a combinatorial optimization problem with a strongly non-linear energy (fitness) function and many local minima under single spin flip dynamics. It is a problem that exhibits interesting dynamics, reminiscent of stabilizing selection in population biology. The dynamics are solved under certain simplifying assumptions and are reduced to a set of difference equations for a small number of relevant quantities. The quantities used are the population's cumulants, which describe its shape, and the mean correlation within the population, which measures the microscopic similarity of population members. Including the mean correlation allows a better description of the population than the cumulants alone would provide and represents a new and important extension of the technique. The formalism includes finite population effects and describes problems of realistic size. The theory is shown to agree closely with simulations of a real genetic algorithm, and the mean best energy is accurately predicted.
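The macroscopic quantities named in this abstract can be computed from a population as in the sketch below. Several details are assumptions made for illustration: the energy is taken as the absolute difference between the selected sum and the target, bits are mapped to spins of +1 and -1, and the mean correlation is taken as the average pairwise spin overlap. The paper's exact definitions may differ.

```python
def energy(bits, weights, target):
    """Assumed subset-sum energy: distance of the selected sum from the target."""
    return abs(sum(w for b, w in zip(bits, weights) if b) - target)


def first_cumulants(values):
    """First three cumulants of a sample: mean, variance, third central moment."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return mean, variance, third


def mean_correlation(population):
    """Average overlap between all pairs of members, with bits mapped to spins +1/-1."""
    spins = [[1 if b else -1 for b in member] for member in population]
    length = len(spins[0])
    total, pairs = 0.0, 0
    for i in range(len(spins)):
        for j in range(i + 1, len(spins)):
            total += sum(a * b for a, b in zip(spins[i], spins[j])) / length
            pairs += 1
    return total / pairs if pairs else 1.0
```

Tracking these few numbers from one generation to the next, instead of the full population, is the kind of reduction the abstract describes as a set of difference equations.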

Relevance:

40.00%

Publisher:

Abstract:

Aircraft manufacturing industries are looking for solutions to increase their productivity. One solution is to apply metrology systems during the production and assembly processes. The Metrology Process Model (MPM) (Maropoulos et al., 2007) has been introduced, which emphasises metrology applications in conjunction with assembly planning, manufacturing processes and product design. Measurability analysis is part of the MPM, and its aim is to check the feasibility of measuring the designed large-scale components. Measurability analysis has been integrated in order to provide an efficient matching system. The metrology database is structured by developing the Metrology Classification Model. Furthermore, the feature-based selection model is also explained. By combining the two classification models, a novel approach and selection processes for an integrated measurability analysis system (MAS) are introduced; such an integrated MAS could provide much more meaningful matching results for operators. © Springer-Verlag Berlin Heidelberg 2010.

Relevance:

40.00%

Publisher:

Abstract:

During our earlier research, it was recognised that, in order to be successful with an indirect genetic algorithm approach using a decoder, the decoder has to strike a balance between being an optimiser in its own right and finding feasible solutions. Previously this balance was achieved manually. Here we extend this by presenting an automated approach in which the genetic algorithm itself, while simultaneously solving the problem, sets weights to balance the components. As a result, we were able to solve a complex and non-linear scheduling problem better than with a standard direct genetic algorithm implementation.
