954 resultados para genetic algorithms.


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Computational Biology is the research are that contributes to the analysis of biological data through the development of algorithms which will address significant research problems.The data from molecular biology includes DNA,RNA ,Protein and Gene expression data.Gene Expression Data provides the expression level of genes under different conditions.Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences which in turn are later translated into proteins.The number of copies of mRNA produced is called the expression level of a gene.Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions.Experimental conditions can be different tissue types or time points.Entries in the gene expression matrix are real values.Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes such as similarity of their behavior,nature of their interaction,their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process.These patterns have immense relevance and application in bioinformatics and clinical research.Theses patterns are used in the medical domain for aid in more accurate diagnosis,prognosis,treatment planning.drug discovery and protein network analysis.To identify various patterns from gene expression data,data mining techniques are essential.Clustering is an important data mining technique for the analysis of gene expression data.To overcome the problems associated with clustering,biclustering is introduced.Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global whereas biclustering is a local model.Discovering local expression patterns is essential for identfying many genetic pathways that are not apparent otherwise.It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data.A biclusters is a submatrix of the gene expression data matrix.The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix.Biclusters are not disjoint.Computation of biclusters is costly because one will have to consider all the combinations of columans and rows in order to find out all the biclusters.The search space for the biclustering problem is 2 m+n where m and n are the number of genes and conditions respectively.Usually m+n is more than 3000.The biclustering problem is NP-hard.Biclustering is a powerful analytical tool for the biologist.The research reported in this thesis addresses the problem of biclustering.Ten algorithms are developed for the identification of coherent biclusters from gene expression data.All these algorithms are making use of a measure called mean squared residue to search for biclusters.The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold. All these algorithms begin the search from tightly coregulated submatrices called the seeds.These seeds are generated by K-Means clustering algorithm.The algorithms developed can be classified as constraint based,greedy and metaheuristic.Constarint based algorithms uses one or more of the various constaints namely the MSR threshold and the MSR difference threshold.The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum.In metaheuristic approaches particle Swarm Optimization(PSO) and variants of Greedy Randomized Adaptive Search Procedure(GRASP) are used for the identification of biclusters.These algorithms are implemented on the Yeast and Lymphoma datasets.Biologically relevant and statistically significant biclusters are identified by all these algorithms which are validated by Gene Ontology database.All these algorithms are compared with some other biclustering algorithms.Algorithms developed in this work overcome some of the problems associated with the already existing algorithms.With the help of some of the algorithms which are developed in this work biclusters with very high row variance,which is higher than the row variance of any other algorithm using mean squared residue, are identified from both Yeast and Lymphoma data sets.Such biclusters which make significant change in the expression level are highly relevant biologically.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Assembly job shop scheduling problem (AJSP) is one of the most complicated combinatorial optimization problem that involves simultaneously scheduling the processing and assembly operations of complex structured products. The problem becomes even more complicated if a combination of two or more optimization criteria is considered. This thesis addresses an assembly job shop scheduling problem with multiple objectives. The objectives considered are to simultaneously minimizing makespan and total tardiness. In this thesis, two approaches viz., weighted approach and Pareto approach are used for solving the problem. However, it is quite difficult to achieve an optimal solution to this problem with traditional optimization approaches owing to the high computational complexity. Two metaheuristic techniques namely, genetic algorithm and tabu search are investigated in this thesis for solving the multiobjective assembly job shop scheduling problems. Three algorithms based on the two metaheuristic techniques for weighted approach and Pareto approach are proposed for the multi-objective assembly job shop scheduling problem (MOAJSP). A new pairing mechanism is developed for crossover operation in genetic algorithm which leads to improved solutions and faster convergence. The performances of the proposed algorithms are evaluated through a set of test problems and the results are reported. The results reveal that the proposed algorithms based on weighted approach are feasible and effective for solving MOAJSP instances according to the weight assigned to each objective criterion and the proposed algorithms based on Pareto approach are capable of producing a number of good Pareto optimal scheduling plans for MOAJSP instances.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genetic programming is known to provide good solutions for many problems like the evolution of network protocols and distributed algorithms. In such cases it is most likely a hardwired module of a design framework that assists the engineer to optimize specific aspects of the system to be developed. It provides its results in a fixed format through an internal interface. In this paper we show how the utility of genetic programming can be increased remarkably by isolating it as a component and integrating it into the model-driven software development process. Our genetic programming framework produces XMI-encoded UML models that can easily be loaded into widely available modeling tools which in turn posses code generation as well as additional analysis and test capabilities. We use the evolution of a distributed election algorithm as an example to illustrate how genetic programming can be combined with model-driven development. This example clearly illustrates the advantages of our approach – the generation of source code in different programming languages.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Data mining means to summarize information from large amounts of raw data. It is one of the key technologies in many areas of economy, science, administration and the internet. In this report we introduce an approach for utilizing evolutionary algorithms to breed fuzzy classifier systems. This approach was exercised as part of a structured procedure by the students Achler, Göb and Voigtmann as contribution to the 2006 Data-Mining-Cup contest, yielding encouragingly positive results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this report, we discuss the application of global optimization and Evolutionary Computation to distributed systems. We therefore selected and classified many publications, giving an insight into the wide variety of optimization problems which arise in distributed systems. Some interesting approaches from different areas will be discussed in greater detail with the use of illustrative examples.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a parallel genetic algorithm to the Steiner Problem in Networks. Several previous papers have proposed the adoption of GAs and others metaheuristics to solve the SPN demonstrating the validity of their approaches. This work differs from them for two main reasons: the dimension and the characteristics of the networks adopted in the experiments and the aim from which it has been originated. The reason that aimed this work was namely to build a comparison term for validating deterministic and computationally inexpensive algorithms which can be used in practical engineering applications, such as the multicast transmission in the Internet. On the other hand, the large dimensions of our sample networks require the adoption of a parallel implementation of the Steiner GA, which is able to deal with such large problem instances.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We describe a model-data fusion (MDF) inter-comparison project (REFLEX), which compared various algorithms for estimating carbon (C) model parameters consistent with both measured carbon fluxes and states and a simple C model. Participants were provided with the model and with both synthetic net ecosystem exchange (NEE) of CO2 and leaf area index (LAI) data, generated from the model with added noise, and observed NEE and LAI data from two eddy covariance sites. Participants endeavoured to estimate model parameters and states consistent with the model for all cases over the two years for which data were provided, and generate predictions for one additional year without observations. Nine participants contributed results using Metropolis algorithms, Kalman filters and a genetic algorithm. For the synthetic data case, parameter estimates compared well with the true values. The results of the analyses indicated that parameters linked directly to gross primary production (GPP) and ecosystem respiration, such as those related to foliage allocation and turnover, or temperature sensitivity of heterotrophic respiration, were best constrained and characterised. Poorly estimated parameters were those related to the allocation to and turnover of fine root/wood pools. Estimates of confidence intervals varied among algorithms, but several algorithms successfully located the true values of annual fluxes from synthetic experiments within relatively narrow 90% confidence intervals, achieving >80% success rate and mean NEE confidence intervals <110 gC m−2 year−1 for the synthetic case. Annual C flux estimates generated by participants generally agreed with gap-filling approaches using half-hourly data. The estimation of ecosystem respiration and GPP through MDF agreed well with outputs from partitioning studies using half-hourly data. Confidence limits on annual NEE increased by an average of 88% in the prediction year compared to the previous year, when data were available. Confidence intervals on annual NEE increased by 30% when observed data were used instead of synthetic data, reflecting and quantifying the addition of model error. Finally, our analyses indicated that incorporating additional constraints, using data on C pools (wood, soil and fine roots) would help to reduce uncertainties for model parameters poorly served by eddy covariance data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In 2006 the Route load balancing algorithm was proposed and compared to other techniques aiming at optimizing the process allocation in grid environments. This algorithm schedules tasks of parallel applications considering computer neighborhoods (where the distance is defined by the network latency). Route presents good results for large environments, although there are cases where neighbors do not have an enough computational capacity nor communication system capable of serving the application. In those situations the Route migrates tasks until they stabilize in a grid area with enough resources. This migration may take long time what reduces the overall performance. In order to improve such stabilization time, this paper proposes RouteGA (Route with Genetic Algorithm support) which considers historical information on parallel application behavior and also the computer capacities and load to optimize the scheduling. This information is extracted by using monitors and summarized in a knowledge base used to quantify the occupation of tasks. Afterwards, such information is used to parameterize a genetic algorithm responsible for optimizing the task allocation. Results confirm that RouteGA outperforms the load balancing carried out by the original Route, which had previously outperformed others scheduling algorithms from literature.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This Thesis Work will concentrate on a very interesting problem, the Vehicle Routing Problem (VRP). In this problem, customers or cities have to be visited and packages have to be transported to each of them, starting from a basis point on the map. The goal is to solve the transportation problem, to be able to deliver the packages-on time for the customers,-enough package for each Customer,-using the available resources- and – of course - to be so effective as it is possible.Although this problem seems to be very easy to solve with a small number of cities or customers, it is not. In this problem the algorithm have to face with several constraints, for example opening hours, package delivery times, truck capacities, etc. This makes this problem a so called Multi Constraint Optimization Problem (MCOP). What’s more, this problem is intractable with current amount of computational power which is available for most of us. As the number of customers grow, the calculations to be done grows exponential fast, because all constraints have to be solved for each customers and it should not be forgotten that the goal is to find a solution, what is best enough, before the time for the calculation is up. This problem is introduced in the first chapter: form its basics, the Traveling Salesman Problem, using some theoretical and mathematical background it is shown, why is it so hard to optimize this problem, and although it is so hard, and there is no best algorithm known for huge number of customers, why is it a worth to deal with it. Just think about a huge transportation company with ten thousands of trucks, millions of customers: how much money could be saved if we would know the optimal path for all our packages.Although there is no best algorithm is known for this kind of optimization problems, we are trying to give an acceptable solution for it in the second and third chapter, where two algorithms are described: the Genetic Algorithm and the Simulated Annealing. Both of them are based on obtaining the processes of nature and material science. These algorithms will hardly ever be able to find the best solution for the problem, but they are able to give a very good solution in special cases within acceptable calculation time.In these chapters (2nd and 3rd) the Genetic Algorithm and Simulated Annealing is described in details, from their basis in the “real world” through their terminology and finally the basic implementation of them. The work will put a stress on the limits of these algorithms, their advantages and disadvantages, and also the comparison of them to each other.Finally, after all of these theories are shown, a simulation will be executed on an artificial environment of the VRP, with both Simulated Annealing and Genetic Algorithm. They will both solve the same problem in the same environment and are going to be compared to each other. The environment and the implementation are also described here, so as the test results obtained.Finally the possible improvements of these algorithms are discussed, and the work will try to answer the “big” question, “Which algorithm is better?”, if this question even exists.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Quadratic assignment problems (QAPs) are commonly solved by heuristic methods, where the optimum is sought iteratively. Heuristics are known to provide good solutions but the quality of the solutions, i.e., the confidence interval of the solution is unknown. This paper uses statistical optimum estimation techniques (SOETs) to assess the quality of Genetic algorithm solutions for QAPs. We examine the functioning of different SOETs regarding biasness, coverage rate and length of interval, and then we compare the SOET lower bound with deterministic ones. The commonly used deterministic bounds are confined to only a few algorithms. We show that, the Jackknife estimators have better performance than Weibull estimators, and when the number of heuristic solutions is as large as 100, higher order JK-estimators perform better than lower order ones. Compared with the deterministic bounds, the SOET lower bound performs significantly better than most deterministic lower bounds and is comparable with the best deterministic ones. 

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: The sensitivity to microenvironmental changes varies among animals and may be under genetic control. It is essential to take this element into account when aiming at breeding robust farm animals. Here, linear mixed models with genetic effects in the residual variance part of the model can be used. Such models have previously been fitted using EM and MCMC algorithms. Results: We propose the use of double hierarchical generalized linear models (DHGLM), where the squared residuals are assumed to be gamma distributed and the residual variance is fitted using a generalized linear model. The algorithm iterates between two sets of mixed model equations, one on the level of observations and one on the level of variances. The method was validated using simulations and also by re-analyzing a data set on pig litter size that was previously analyzed using a Bayesian approach. The pig litter size data contained 10,060 records from 4,149 sows. The DHGLM was implemented using the ASReml software and the algorithm converged within three minutes on a Linux server. The estimates were similar to those previously obtained using Bayesian methodology, especially the variance components in the residual variance part of the model. Conclusions: We have shown that variance components in the residual variance part of a linear mixed model can be estimated using a DHGLM approach. The method enables analyses of animal models with large numbers of observations. An important future development of the DHGLM methodology is to include the genetic correlation between the random effects in the mean and residual variance parts of the model as a parameter of the DHGLM.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we propose a new method for solving large scale p-median problem instances based on real data. We compare different approaches in terms of runtime, memory footprint and quality of solutions obtained. In order to test the different methods on real data, we introduce a new benchmark for the p-median problem based on real Swedish data. Because of the size of the problem addressed, up to 1938 candidate nodes, a number of algorithms, both exact and heuristic, are considered. We also propose an improved hybrid version of a genetic algorithm called impGA. Experiments show that impGA behaves as well as other methods for the standard set of medium-size problems taken from Beasley’s benchmark, but produces comparatively good results in terms of quality, runtime and memory footprint on our specific benchmark based on real Swedish data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bin planning (arrangements) is a key factor in the timber industry. Improper planning of the storage bins may lead to inefficient transportation of resources, which threaten the overall efficiency and thereby limit the profit margins of sawmills. To address this challenge, a simulation model has been developed. However, as numerous alternatives are available for arranging bins, simulating all possibilities will take an enormous amount of time and it is computationally infeasible. A discrete-event simulation model incorporating meta-heuristic algorithms has therefore been investigated in this study. Preliminary investigations indicate that the results achieved by GA based simulation model are promising and better than the other meta-heuristic algorithm. Further, a sensitivity analysis has been done on the GA based optimal arrangement which contributes to gaining insights and knowledge about the real system that ultimately leads to improved and enhanced efficiency in sawmill yards. It is expected that the results achieved in the work will support timber industries in making optimal decisions with respect to arrangement of storage bins in a sawmill yard.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The paper presents an extended genetic algorithm for solving the optimal transmission network expansion planning problem. Two main improvements have been introduced in the genetic algorithm: (a) initial population obtained by conventional optimisation based methods; (b) mutation approach inspired in the simulated annealing technique, the proposed method is general in the sense that it does not assume any particular property of the problem being solved, such as linearity or convexity. Excellent performance is reported in the test results section of the paper for a difficult large-scale real-life problem: a substantial reduction in investment costs has been obtained with regard to previous solutions obtained via conventional optimisation methods and simulated annealing algorithms; statistical comparison procedures have been employed in benchmarking different versions of the genetic algorithm and simulated annealing methods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genetic variation within and among accessions of the genus Arachis representing sections Extranervosae, Caulorrhizae, Heteranthae, and Triseminatae was evaluated using RFLP and RAPD markers. RAPD markers revealed a higher level of genetic diversity than did RFLP markers, both within and among the species evaluated. Phenograms based on various band-matching algorithms revealed three major clusters of similarity among the sections evaluated. The first group included the species from section Extranervosae, the second group consisted of sections Triseminatae, Caulorrhizae, and Heteranthae, and the third group consisted of one accession of Arachis hypogaea, which had been included as a representative of section Arachis. The phenograms obtained from the RAPD and RFLP data were similar but not identical. Arachis pietrarellii, assayed only by RAPD, showed a high degree of genetic similarity with Arachis villosulicarpa. This observation supported the hypothesis that these two species are closely related. It was also shown that accession V 7786, previously considered to be Arachis sp. aff. pietrarellii, and assayed using both RFLPs and RAPDs, was possibly a new species from section Extranervosae, but very distinct from A. pietrarellii.