904 results for Biclustering algorithms
Abstract:
Computational biology is the research area that contributes to the analysis of biological data through the development of algorithms that address significant research problems. Data from molecular biology include DNA, RNA, protein and gene expression data. Gene expression data provide the expression levels of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which are later translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data are organized as a matrix in which rows represent genes and columns represent experimental conditions, such as different tissue types or time points. Entries in the gene expression matrix are real values. Through the analysis of gene expression data it is possible to determine behavioral patterns of genes, such as the similarity of their behavior, the nature of their interactions, their respective contributions to the same pathways and so on. Genes participating in the same biological process exhibit similar expression patterns. These patterns have immense relevance and application in bioinformatics and clinical research. In the medical domain they are used to aid more accurate diagnosis, prognosis, treatment planning, drug discovery and protein network analysis. Data mining techniques are essential for identifying such patterns in gene expression data. Clustering is an important data mining technique for the analysis of gene expression data. Biclustering, the simultaneous clustering of both rows and columns of a data matrix, was introduced to overcome the problems associated with clustering.
Clustering is a global model, whereas biclustering is a local model. Discovering local expression patterns is essential for identifying many genetic pathways that are not otherwise apparent. It is therefore necessary to move beyond the clustering paradigm towards approaches capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix. The rows and columns of the submatrix need not be contiguous in the gene expression data matrix, and biclusters need not be disjoint. Computing biclusters is costly because all combinations of rows and columns must be considered in order to find all the biclusters. The search space for the biclustering problem is of size 2^(m+n), where m and n are the numbers of genes and conditions respectively. Usually m+n is more than 3000. The biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data. All of them use a measure called the mean squared residue (MSR) to search for biclusters. The objective is to identify biclusters of maximum size whose mean squared residue is lower than a given threshold.
All these algorithms begin the search from tightly coregulated submatrices called seeds, which are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint based, greedy and metaheuristic. The constraint-based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In the metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used to identify biclusters. The algorithms are applied to the Yeast and Lymphoma datasets. All of them identify biologically relevant and statistically significant biclusters, which are validated using the Gene Ontology database. The algorithms are also compared with some other biclustering algorithms and overcome some of the problems associated with existing ones. Some of the algorithms developed in this work identify biclusters with very high row variance, higher than that obtained by any other algorithm using mean squared residue, in both the Yeast and Lymphoma datasets. Such biclusters, which reflect significant changes in expression level, are highly relevant biologically.
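For reference, the mean squared residue and the row variance mentioned in this abstract are usually defined as below. This is the formulation commonly attributed to Cheng and Church; the abstract itself does not state the formulas, so the notation (gene set I, condition set J, matrix entries a_ij) is an assumption.

```latex
\begin{align*}
a_{iJ} &= \frac{1}{|J|}\sum_{j \in J} a_{ij}, \qquad
a_{Ij} = \frac{1}{|I|}\sum_{i \in I} a_{ij}, \qquad
a_{IJ} = \frac{1}{|I|\,|J|}\sum_{i \in I}\sum_{j \in J} a_{ij},\\
H(I,J) &= \frac{1}{|I|\,|J|}\sum_{i \in I}\sum_{j \in J}
          \left(a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}\right)^{2},\\
\operatorname{Var}(I,J) &= \frac{1}{|I|\,|J|}\sum_{i \in I}\sum_{j \in J}
          \left(a_{ij} - a_{iJ}\right)^{2}.
\end{align*}
```

Under this reading, a bicluster is acceptable when H(I,J) does not exceed the chosen MSR threshold, and a high Var(I,J) indicates that the genes change markedly across the selected conditions rather than staying nearly constant.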
Abstract:
Biclustering is the simultaneous clustering of both rows and columns of a data matrix. A measure called Mean Squared Residue (MSR) is used to evaluate the coherence of rows and columns within a submatrix simultaneously. In this paper a novel algorithm is developed for biclustering gene expression data using the newly introduced concept of an MSR difference threshold. In the first step, high-quality bicluster seeds are generated using the K-Means clustering algorithm. Then more genes and conditions (nodes) are added to the bicluster. Before a node is added, the MSR of the bicluster, X, is calculated. After the node is added, the MSR is calculated again, giving Y. The added node is deleted if Y minus X is greater than the MSR difference threshold or if Y is greater than the MSR threshold, which depends on the dataset. The MSR difference threshold differs between the gene list and the condition list, and it also depends on the dataset. Proper values should be identified through experimentation in order to obtain biclusters of high quality. The results obtained on a benchmark dataset clearly indicate that this algorithm is better than many of the existing biclustering algorithms.
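The node-addition step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the order in which candidates are tried (genes first, then conditions), and the MSR formula (the usual Cheng and Church definition) are assumptions, and the seed is passed in directly instead of being produced by K-Means.

```python
def msr(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols."""
    n_r, n_c = len(rows), len(cols)
    row_mean = {i: sum(matrix[i][j] for j in cols) / n_c for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / n_r for j in cols}
    total_mean = sum(row_mean.values()) / n_r
    return sum(
        (matrix[i][j] - row_mean[i] - col_mean[j] + total_mean) ** 2
        for i in rows for j in cols
    ) / (n_r * n_c)


def grow_bicluster(matrix, seed_rows, seed_cols,
                   msr_max, diff_max_rows, diff_max_cols):
    """Try to add every remaining gene, then every remaining condition.

    A node is kept only if the MSR increase (Y - X) stays within the
    difference threshold for its list and Y stays within msr_max.
    """
    rows, cols = list(seed_rows), list(seed_cols)
    passes = (
        (rows, range(len(matrix)), diff_max_rows),     # gene list
        (cols, range(len(matrix[0])), diff_max_cols),  # condition list
    )
    for members, candidates, diff_max in passes:
        for node in candidates:
            if node in members:
                continue
            x = msr(matrix, rows, cols)   # MSR before adding the node
            members.append(node)
            y = msr(matrix, rows, cols)   # MSR after adding the node
            if y - x > diff_max or y > msr_max:
                members.pop()             # delete the node again
    return rows, cols


# Hypothetical 4x4 expression matrix; the top-left 3x3 block is perfectly additive.
EXAMPLE = [
    [1, 2, 3, 10],
    [2, 3, 4, 0],
    [3, 4, 5, 7],
    [9, 1, 8, 2],
]
```

Calling `grow_bicluster(EXAMPLE, [0, 1], [0, 1], 0.5, 0.5, 0.5)` grows the 2x2 seed to the additive 3x3 block and rejects the fourth gene and the fourth condition, because adding either one pushes the MSR above the thresholds.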
Abstract:
The simulations were implemented in Java.
Abstract:
Increasing antibiotic resistance among uropathogenic Escherichia coli (UPEC) is driving interest in therapeutic targeting of nonconserved virulence factor (VF) genes. The ability to formulate efficacious combinations of antivirulence agents requires an improved understanding of how UPEC deploy these genes. To identify clinically relevant VF combinations, we applied contemporary network analysis and biclustering algorithms to VF profiles from a large, previously characterized inpatient clinical cohort. These mathematical approaches identified four stereotypical VF combinations with distinctive relationships to antibiotic resistance and patient sex that are independent of traditional phylogenetic grouping. Targeting resistance- or sex-associated VFs based upon these contemporary mathematical approaches may facilitate individualized anti-infective therapies and identify synergistic VF combinations in bacterial pathogens.
Biased Random-key Genetic Algorithms For The Winner Determination Problem In Combinatorial Auctions.
Abstract:
In this paper, we address the problem of picking a subset of bids in a general combinatorial auction so as to maximize the overall profit using the first-price model. This winner determination problem assumes that a single bidding round is held to determine both the winners and the prices to be paid. We introduce six variants of biased random-key genetic algorithms for this problem. Three of them use a novel initialization technique that takes solutions of intermediate linear programming relaxations of an exact mixed integer linear programming model as initial chromosomes of the population. An experimental evaluation compares the effectiveness of the proposed algorithms with the standard mixed integer linear programming formulation, a specialized exact algorithm, and the best-performing heuristics proposed for this problem. The proposed algorithms are competitive and offer strong results, mainly for large-scale auctions.
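To make the random-key idea concrete, here is a minimal sketch of how a chromosome of random keys could be decoded into a feasible set of winning bids. The abstract does not describe the decoder used in the paper, so this greedy rule, the `decode` name, and the `(items, price)` bid representation are all assumptions for illustration.

```python
def decode(keys, bids):
    """Greedy decoder for a random-key chromosome (illustrative assumption).

    keys: one float in [0, 1) per bid; bids: list of (items, price) pairs.
    Bids are visited by decreasing key and accepted while their items are free.
    """
    order = sorted(range(len(bids)), key=lambda b: keys[b], reverse=True)
    taken, winners, profit = set(), [], 0.0
    for b in order:
        items, price = bids[b]
        if taken.isdisjoint(items):   # no item already sold to another bid
            taken.update(items)
            winners.append(b)
            profit += price
    return winners, profit


# Hypothetical auction with three bids over items a, b, c.
BIDS = [({"a", "b"}, 10), ({"b", "c"}, 8), ({"c"}, 5)]
```

With keys `[0.9, 0.5, 0.7]` the decoder accepts bids 0 and 2 for a profit of 15; with keys `[0.1, 0.9, 0.5]` it accepts only bid 1 for a profit of 8. A genetic algorithm would then evolve the keys, using the decoded profit as fitness.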
Abstract:
We propose and analyze two different Bayesian online algorithms for learning in discrete Hidden Markov Models and compare their performance with the already known Baldi-Chauvin Algorithm. Using the Kullback-Leibler divergence as a measure of generalization we draw learning curves in simplified situations for these algorithms and compare their performances.
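As a reminder of the measure involved, the Kullback-Leibler divergence between two discrete distributions can be computed as in the sketch below. How the paper applies it to Hidden Markov Models is not stated in the abstract; the function name and the example distributions are assumptions.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions on the same outcomes."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue              # terms with p_i = 0 contribute nothing
        if q_i == 0:
            return math.inf       # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total
```

The divergence is zero when the two distributions coincide and grows as the learned distribution `q` drifts away from the target `p`, which is why it can serve as a generalization measure in learning curves.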
Abstract:
Voltage and current waveforms of a distribution or transmission power system are not pure sinusoids. There are distortions in these waveforms that can be represented as a combination of the fundamental frequency, harmonics and high-frequency transients. This paper presents a novel approach to identifying harmonics in distorted power system waveforms. The proposed method is based on Genetic Algorithms, an optimization technique inspired by genetics and natural evolution. GOOAL, a specially designed intelligent algorithm for optimization problems, was successfully implemented and tested. Two kinds of chromosome representation are used: binary and real. The results show that the proposed method is more precise than the traditional Fourier Transform, especially with the real representation of the chromosomes.
Abstract:
This paper presents a strategy for planning WDM optical networks, specifically for the Routing and Wavelength Allocation (RWA) problem with the objective of minimizing the number of wavelengths used, a variant known as Min-RWA. Two metaheuristics (Tabu Search and Simulated Annealing) are applied to obtain good-quality solutions with high performance. The key point is to accept a degradation of the maximum load on the virtual links in favor of minimizing the number of wavelengths used; the objective is to find a good compromise between the metrics of the virtual topology (load in Gb/s) and of the physical topology (number of wavelengths). The simulations suggest good results when compared with some existing results in the literature.
Abstract:
This technical note develops information filter and array algorithms for a linear minimum mean square error estimator of discrete-time Markovian jump linear systems. A numerical example for a two-mode Markovian jump linear system is provided to show the advantage of using array algorithms to filter this class of systems.
Abstract:
The continuous growth of peer-to-peer networks has made them responsible for a considerable portion of current Internet traffic. For this reason, improvements in the use of P2P network resources are of central importance. One effective approach for addressing this issue is the deployment of locality algorithms, which allow the system to optimize the peer selection policy for different network situations and thus maximize performance. To date, several locality algorithms have been proposed for use in P2P networks. However, they usually adopt heterogeneous criteria for measuring the proximity between peers, which hinders a coherent comparison between the different solutions. In this paper, we present a thorough review of popular locality algorithms, based on three main characteristics: the adopted network architecture, the distance metric, and the resulting peer selection algorithm. As a result of this study, we propose a novel and generic taxonomy for locality algorithms in peer-to-peer networks, aiming to enable a better and more coherent evaluation of any individual locality algorithm.
Abstract:
This paper presents a computational implementation of an evolutionary algorithm (EA) for the problem of reconfiguring radial distribution systems. The developed module considers power quality indices, such as long-duration interruptions and customer process disruptions due to voltage sags, by using the Monte Carlo simulation method. Power quality costs are modeled in the mathematical problem formulation and added to the cost of network losses. The proposed EA codification uses a decimal representation. The EA operators considered for the reconfiguration algorithm, namely selection, recombination and mutation, are analyzed herein. Several selection procedures are examined: tournament, elitism and a mixed technique using both elitism and tournament. The recombination operator was developed by considering a chromosome structure representation that maps the network branches and system radiality, and another structure that takes into account the network topology and the feasibility of network operation when exchanging genetic material. The topologies of the initial population are produced randomly, with radial configurations obtained through the Prim and Kruskal algorithms, which rapidly build minimum spanning trees. (C) 2009 Elsevier B.V. All rights reserved.
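One way to read the initial-population step is that Kruskal's algorithm is run with random branch weights, so that each run yields a different spanning tree, that is, a different radial configuration. The sketch below follows that reading; it is an assumption rather than the paper's code, and the function name, the edge-list network representation and the example network are hypothetical.

```python
import random


def random_radial_configuration(n_nodes, branches, rng=None):
    """Return a random spanning tree (set of closed branches) of the network.

    Shuffling the branches is equivalent to running Kruskal's algorithm
    with random weights; a union-find structure rejects branches that
    would close a loop, so the result is always radial.
    """
    rng = rng or random.Random()
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    shuffled = list(branches)
    rng.shuffle(shuffled)
    closed = []
    for u, v in shuffled:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # branch joins two separate islands
            parent[root_u] = root_v
            closed.append((u, v))
    return closed


# Hypothetical 5-bus network with 7 candidate branches.
BRANCHES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3), (0, 2)]
```

For a connected network with n buses, each call returns exactly n - 1 closed branches with no loops; the remaining branches correspond to open switches.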
Abstract:
This paper presents a family of algorithms for approximate inference in credal networks (that is, models based on directed acyclic graphs and set-valued probabilities) that contain only binary variables. Such networks can represent incomplete or vague beliefs, lack of data, and disagreements among experts; they can also encode models based on belief functions and possibilistic measures. All algorithms for approximate inference in this paper rely on exact inferences in credal networks based on polytrees with binary variables, as these inferences have polynomial complexity. We are inspired by approximate algorithms for Bayesian networks; thus the Loopy 2U algorithm resembles Loopy Belief Propagation, while the Iterated Partial Evaluation and Structured Variational 2U algorithms are, respectively, based on Localized Partial Evaluation and variational techniques. (C) 2007 Elsevier Inc. All rights reserved.
Abstract:
The flowshop scheduling problem with blocking in-process is addressed in this paper. In this environment, there are no buffers between successive machines; therefore, intermediate queues of jobs waiting in the system for their next operations are not allowed. Heuristic approaches are proposed to minimize the total tardiness criterion. A constructive heuristic that explores specific characteristics of the problem is presented. Moreover, a GRASP-based heuristic is proposed and coupled with a path relinking strategy to search for better outcomes. Computational tests are presented, and the comparisons made with an adaptation of the NEH algorithm and with a branch-and-bound algorithm indicate that the new approaches are promising. (c) 2007 Elsevier Ltd. All rights reserved.
Abstract:
When building genetic maps, it is necessary to choose from several marker ordering algorithms and criteria, and the choice is not always simple. In this study, we evaluate the efficiency of the algorithms try (TRY), seriation (SER), rapid chain delineation (RCD), recombination counting and ordering (RECORD) and unidirectional growth (UG), as well as the criteria PARF (product of adjacent recombination fractions), SARF (sum of adjacent recombination fractions), SALOD (sum of adjacent LOD scores) and LHMC (likelihood through hidden Markov chains), used with the RIPPLE algorithm for error verification, in the construction of genetic linkage maps. A linkage map of a hypothetical diploid and monoecious plant species was simulated, containing one linkage group and 21 markers with a fixed distance of 3 cM between them. In all, 700 F2 populations were randomly simulated with and 400 individuals with different combinations of dominant and co-dominant markers, as well as 10 and 20% of missing data. The simulations showed that, in the presence of co-dominant markers only, any combination of algorithm and criteria may be used, even for a reduced population size. In the case of a smaller proportion of dominant markers, any of the algorithms and criteria investigated (except SALOD) may be used. In the presence of high proportions of dominant markers and smaller samples (around 100), the probability of repulsion linkage between them increases and, in this case, use of the algorithms TRY and SER together with RIPPLE and the criterion LHMC would provide better results. Heredity (2009) 103, 494-502; doi:10.1038/hdy.2009.96; published online 29 July 2009
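Two of the ordering criteria named above, SARF and PARF, follow directly from their expanded names and can be sketched in a few lines. The recombination-fraction matrix below is made up for illustration (it is not the simulated data from the study), and the function names are assumptions.

```python
import math


def sarf(order, rf):
    """Sum of adjacent recombination fractions for a marker order."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))


def parf(order, rf):
    """Product of adjacent recombination fractions for a marker order."""
    return math.prod(rf[a][b] for a, b in zip(order, order[1:]))


# Made-up pairwise recombination fractions for four equally spaced markers.
RF = [[0.03 * abs(i - j) for j in range(4)] for i in range(4)]
TRUE_ORDER = [0, 1, 2, 3]
SWAPPED_ORDER = [0, 2, 1, 3]
```

Both criteria are typically minimized: here the true order gives a SARF of about 0.09, while swapping the two middle markers raises it to about 0.15, and PARF ranks the two orders the same way.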