882 resultados para GA (Genetic Algorithm)


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the most optimal subset of relevant features from a high dimensional dataset is NP hard. Sub–optimal population based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and determinis- tic search algorithms. On the other hand, population based stochastic algorithms often suffer from premature convergence on mediocre sub–optimal solutions. The Age Layered Population Structure (ALPS) is a novel meta–heuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age–measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal–symbol selection for the construction of GP trees/sub-trees. The FSALPS meta–heuristic continuously refines the feature subset selection process whiles simultaneously evolving efficient classifiers through a non–converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high–dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence interval, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat expressions. FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genetic programming is known to provide good solutions for many problems like the evolution of network protocols and distributed algorithms. In such cases it is most likely a hardwired module of a design framework that assists the engineer to optimize specific aspects of the system to be developed. It provides its results in a fixed format through an internal interface. In this paper we show how the utility of genetic programming can be increased remarkably by isolating it as a component and integrating it into the model-driven software development process. Our genetic programming framework produces XMI-encoded UML models that can easily be loaded into widely available modeling tools which in turn posses code generation as well as additional analysis and test capabilities. We use the evolution of a distributed election algorithm as an example to illustrate how genetic programming can be combined with model-driven development. This example clearly illustrates the advantages of our approach – the generation of source code in different programming languages.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Distributed systems are one of the most vital components of the economy. The most prominent example is probably the internet, a constituent element of our knowledge society. During the recent years, the number of novel network types has steadily increased. Amongst others, sensor networks, distributed systems composed of tiny computational devices with scarce resources, have emerged. The further development and heterogeneous connection of such systems imposes new requirements on the software development process. Mobile and wireless networks, for instance, have to organize themselves autonomously and must be able to react to changes in the environment and to failing nodes alike. Researching new approaches for the design of distributed algorithms may lead to methods with which these requirements can be met efficiently. In this thesis, one such method is developed, tested, and discussed in respect of its practical utility. Our new design approach for distributed algorithms is based on Genetic Programming, a member of the family of evolutionary algorithms. Evolutionary algorithms are metaheuristic optimization methods which copy principles from natural evolution. They use a population of solution candidates which they try to refine step by step in order to attain optimal values for predefined objective functions. The synthesis of an algorithm with our approach starts with an analysis step in which the wanted global behavior of the distributed system is specified. From this specification, objective functions are derived which steer a Genetic Programming process where the solution candidates are distributed programs. The objective functions rate how close these programs approximate the goal behavior in multiple randomized network simulations. The evolutionary process step by step selects the most promising solution candidates and modifies and combines them with mutation and crossover operators. This way, a description of the global behavior of a distributed system is translated automatically to programs which, if executed locally on the nodes of the system, exhibit this behavior. In our work, we test six different ways for representing distributed programs, comprising adaptations and extensions of well-known Genetic Programming methods (SGP, eSGP, and LGP), one bio-inspired approach (Fraglets), and two new program representations called Rule-based Genetic Programming (RBGP, eRBGP) designed by us. We breed programs in these representations for three well-known example problems in distributed systems: election algorithms, the distributed mutual exclusion at a critical section, and the distributed computation of the greatest common divisor of a set of numbers. Synthesizing distributed programs the evolutionary way does not necessarily lead to the envisaged results. In a detailed analysis, we discuss the problematic features which make this form of Genetic Programming particularly hard. The two Rule-based Genetic Programming approaches have been developed especially in order to mitigate these difficulties. In our experiments, at least one of them (eRBGP) turned out to be a very efficient approach and in most cases, was superior to the other representations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Introducción: La enfermedad celiaca (EC) es una enfermedad autoinmune (EA) intestinal desencadenada por la ingesta de gluten. Por la falta de información de la presencia de EC en Latinoamérica (LA), nosotros investigamos la prevalencia de la enfermedad en esta región utilizando una revisión sistemática de la literatura y un meta-análisis. Métodos y resultados: Este trabajo fue realizado en dos fases: La primera, fue un estudio de corte transversal de 300 individuos Colombianos. La segunda, fue una revisión sistemática y una meta-regresión siguiendo las guías PRSIMA. Nuestros resultados ponen de manifiesto una falta de anti-transglutaminasa tisular (tTG) e IgA anti-endomisio (EMA) en la población Colombiana. En la revisión sistemática, 72 artículos cumplían con los criterios de selección, la prevalencia estimada de EC en LA fue de 0,46% a 0,64%, mientras que la prevalencia en familiares de primer grado fue de 5,5 a 5,6%, y en los pacientes con diabetes mellitus tipo 1 fue de 4,6% a 8,7% Conclusión: Nuestro estudio muestra que la prevalencia de EC en pacientes sanos de LA es similar a la notificada en la población europea.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The authors present a systolic design for a simple GA mechanism which provides high throughput and unidirectional pipelining by exploiting the inherent parallelism in the genetic operators. The design computes in O(N+G) time steps using O(N2) cells where N is the population size and G is the chromosome length. The area of the device is independent of the chromosome length and so can be easily scaled by replicating the arrays or by employing fine-grain migration. The array is generic in the sense that it does not rely on the fitness function and can be used as an accelerator for any GA application using uniform crossover between pairs of chromosomes. The design can also be used in hybrid systems as an add-on to complement existing designs and methods for fitness function acceleration and island-style population management

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Inferring the spatial expansion dynamics of invading species from molecular data is notoriously difficult due to the complexity of the processes involved. For these demographic scenarios, genetic data obtained from highly variable markers may be profitably combined with specific sampling schemes and information from other sources using a Bayesian approach. The geographic range of the introduced toad Bufo marinus is still expanding in eastern and northern Australia, in each case from isolates established around 1960. A large amount of demographic and historical information is available on both expansion areas. In each area, samples were collected along a transect representing populations of different ages and genotyped at 10 microsatellite loci. Five demographic models of expansion, differing in the dispersal pattern for migrants and founders and in the number of founders, were considered. Because the demographic history is complex, we used an approximate Bayesian method, based on a rejection-regression algorithm. to formally test the relative likelihoods of the five models of expansion and to infer demographic parameters. A stepwise migration-foundation model with founder events was statistically better supported than other four models in both expansion areas. Posterior distributions supported different dynamics of expansion in the studied areas. Populations in the eastern expansion area have a lower stable effective population size and have been founded by a smaller number of individuals than those in the northern expansion area. Once demographically stabilized, populations exchange a substantial number of effective migrants per generation in both expansion areas, and such exchanges are larger in northern than in eastern Australia. The effective number of migrants appears to be considerably lower than that of founders in both expansion areas. We found our inferences to be relatively robust to various assumptions on marker. demographic, and historical features. The method presented here is the only robust, model-based method available so far, which allows inferring complex population dynamics over a short time scale. It also provides the basis for investigating the interplay between population dynamics, drift, and selection in invasive species.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Synapsing Variable Length Crossover (SVLC) algorithm provides a biologically inspired method for performing meaningful crossover between variable length genomes. In addition to providing a rationale for variable length crossover it also provides a genotypic similarity metric for variable length genomes enabling standard niche formation techniques to be used with variable length genomes. Unlike other variable length crossover techniques which consider genomes to be rigid inflexible arrays and where some or all of the crossover points are randomly selected, the SVLC algorithm considers genomes to be flexible and chooses non-random crossover points based on the common parental sequence similarity. The SVLC Algorithm recurrently "glues" or synapses homogenous genetic sub-sequences together. This is done in such a way that common parental sequences are automatically preserved in the offspring with only the genetic differences being exchanged or removed, independent of the length of such differences. In a variable length test problem the SVLC algorithm is shown to outperform current variable length crossover techniques. The SVLC algorithm is also shown to work in a more realistic robot neural network controller evolution application.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The recursive least-squares algorithm with a forgetting factor has been extensively applied and studied for the on-line parameter estimation of linear dynamic systems. This paper explores the use of genetic algorithms to improve the performance of the recursive least-squares algorithm in the parameter estimation of time-varying systems. Simulation results show that the hybrid recursive algorithm (GARLS), combining recursive least-squares with genetic algorithms, can achieve better results than the standard recursive least-squares algorithm using only a forgetting factor.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genetic algorithms (GAs) have been introduced into site layout planning as reported in a number of studies. In these studies, the objective functions were defined so as to employ the GAs in searching for the optimal site layout. However, few studies have been carried out to investigate the actual closeness of relationships between site facilities; it is these relationships that ultimately govern the site layout. This study has determined that the underlying factors of site layout planning for medium-size projects include work flow, personnel flow, safety and environment, and personal preferences. By finding the weightings on these factors and the corresponding closeness indices between each facility, a closeness relationship has been deduced. Two contemporary mathematical approaches - fuzzy logic theory and an entropy measure - were adopted in finding these results in order to minimize the uncertainty and vagueness of the collected data and improve the quality of the information. GAs were then applied to searching for the optimal site layout in a medium-size government project using the GeneHunter software. The objective function involved minimizing the total travel distance. An optimal layout was obtained within a short time. This reveals that the application of GA to site layout planning is highly promising and efficient.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A dosing algorithm including genetic (VKORC1 and CYP2C9 genotypes) and nongenetic factors (age, weight, therapeutic indication, and cotreatment with amiodarone or simvastatin) explained 51% of the variance in stable weekly warfarin doses in 390 patients attending an anticoagulant clinic in a Brazilian public hospital. The VKORC1 3673G>A genotype was the most important predictor of warfarin dose, with a partial R(2) value of 23.9%. Replacing the VKORC1 3673G>A genotype with VKORC1 diplotype did not increase the algorithm`s predictive power. We suggest that three other single-nucleotide polymorphisms (SNPs) (5808T>G, 6853G>C, and 9041G>A) that are in strong linkage disequilibrium (LD) with 3673G>A would be equally good predictors of the warfarin dose requirement. The algorithm`s predictive power was similar across the self-identified ""race/color"" subsets. ""Race/color"" was not associated with stable warfarin dose in the multiple regression model, although the required warfarin dose was significantly lower (P = 0.006) in white (29 +/- 13 mg/week, n = 196) than in black patients (35 +/- 15 mg/week, n = 76).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike’s information criterion using h-likelihood to select the best fitting model. Methods: We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike’s information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Results: Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. Using Akaike’s information criterion the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. Conclusion: The algorithm and model selection criterion presented here can contribute to better understand genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: The sensitivity to microenvironmental changes varies among animals and may be under genetic control. It is essential to take this element into account when aiming at breeding robust farm animals. Here, linear mixed models with genetic effects in the residual variance part of the model can be used. Such models have previously been fitted using EM and MCMC algorithms. Results: We propose the use of double hierarchical generalized linear models (DHGLM), where the squared residuals are assumed to be gamma distributed and the residual variance is fitted using a generalized linear model. The algorithm iterates between two sets of mixed model equations, one on the level of observations and one on the level of variances. The method was validated using simulations and also by re-analyzing a data set on pig litter size that was previously analyzed using a Bayesian approach. The pig litter size data contained 10,060 records from 4,149 sows. The DHGLM was implemented using the ASReml software and the algorithm converged within three minutes on a Linux server. The estimates were similar to those previously obtained using Bayesian methodology, especially the variance components in the residual variance part of the model. Conclusions: We have shown that variance components in the residual variance part of a linear mixed model can be estimated using a DHGLM approach. The method enables analyses of animal models with large numbers of observations. An important future development of the DHGLM methodology is to include the genetic correlation between the random effects in the mean and residual variance parts of the model as a parameter of the DHGLM.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work, genetic algorithms concepts along with a rotamer library for proteins side chains and implicit solvation potential are used to optimize the tertiary structure of peptides. We starting from the known PDB structure of its backbone which is kept fixed while the side chains allowed adopting the conformations present in the rotamer library. It was used rotamer library independent of backbone and a implicit solvation potential. The structure of Mastoporan-X was predicted using several force fields with a growing complexity; we started it with a field where the only present interaction was Lennard-Jones. We added the Coulombian term and we considered the solvation effects through a term proportional to the solvent accessible area. This paper present good and interesting results obtained using the potential with solvation term and rotamer library. Hence, the algorithm (called YODA) presented here can be a good tool to the prediction problem. (c) 2007 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Milk, fat, and protein yields of Holstein cows from the States of New York and California in the United States were used to estimate (co)variances among yields in the first three lactations, using an animal model and a derivative-free restricted maximum likelihood (REML) algorithm, and to verify if yields in different lactations are the same trait. The data were split in 20 samples, 10 from each state, with means of 5463 and 5543 cows per sample from California and New York. Mean heritability estimates for milk, fat, and protein yields for California data were, respectively, 0.34, 0.35, and 0.40 for first; 0.31, 0.33, and 0.39 for second; and 0.28, 0.31, and 0.37 for third lactations. For New York data, estimates were 0.35, 0.40, and 0.34 for first; 0.34, 0.44, and 0.38 for second; and 0.32, 0.43, and 0.38 for third lactations. Means of estimates of genetic correlations between first and second, first and third, and second and third lactations for California data were 0.86, 0.77, and 0.96 for milk; 0.89, 0.84, and 0.97 for fat; and 0.90, 0.84, and 0.97 for protein yields. Mean estimates for New York data were 0.87, 0.81, and 0.97 for milk; 0.91, 0.86, and 0.98 for fat; and 0.88, 0.82, and 0.98 for protein yields. Environmental correlations varied from 0.30 to 0.50 and were larger between second and third lactations. Phenotypic correlations were similar for both states and varied from 0.52 to 0.66 for milk, fat and protein yields. These estimates are consistent with previous estimates obtained with animal models. Yields in different lactations are not statistically the same trait but for selection programs such yields can be modelled as the same trait because of the high genetic correlations.