882 results for Large-Scale Optimization
Abstract:
MOTIVATION: The detection of positive selection is widely used to study gene and genome evolution, but its application remains limited by the high computational cost of existing implementations. We present a series of computational optimizations for more efficient estimation of the likelihood function on large-scale phylogenetic problems. We illustrate our approach using the branch-site model of codon evolution. RESULTS: We introduce novel optimization techniques that substantially outperform both CodeML from the PAML package and our previously optimized sequential version SlimCodeML. These techniques can also be applied to other likelihood-based phylogeny software. Our implementation scales well for large numbers of codons and/or species. It can therefore analyse substantially larger datasets than CodeML. We evaluated FastCodeML on different platforms and measured average sequential speedups of FastCodeML (single-threaded) versus CodeML of up to 5.8, average speedups of FastCodeML (multi-threaded) versus CodeML on a single node (shared memory) of up to 36.9 for 12 CPU cores, and average speedups of the distributed FastCodeML versus CodeML of up to 170.9 on eight nodes (96 CPU cores in total). AVAILABILITY AND IMPLEMENTATION: ftp://ftp.vital-it.ch/tools/FastCodeML/. CONTACT: selectome@unil.ch or nicolas.salamin@unil.ch.
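As a rough reading aid for the speedups quoted above, the sketch below converts them into parallel efficiency relative to the single-threaded FastCodeML figure. It assumes the three "up to" values are comparable with one another, which the abstract does not guarantee (they are averages measured on different platforms), so the output is indicative only. The function name `parallel_efficiency` is hypothetical.

```python
# Speedups over CodeML as quoted in the abstract, keyed by CPU core count.
reported_speedups = {1: 5.8, 12: 36.9, 96: 170.9}


def parallel_efficiency(speedup: float, cores: int, baseline: float) -> float:
    """Fraction of ideal linear scaling achieved relative to the 1-core figure."""
    return speedup / (baseline * cores)


baseline = reported_speedups[1]
efficiencies = {
    cores: parallel_efficiency(speedup, cores, baseline)
    for cores, speedup in reported_speedups.items()
}

for cores, eff in efficiencies.items():
    print(f"{cores:>3} cores: speedup {reported_speedups[cores]:6.1f}, efficiency {eff:.2f}")
```

Under that assumption, roughly half of the ideal scaling is retained on 12 cores and about a third on 96 cores.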
Abstract:
Network survivability is a technically rich field of study as well as a critical concern in network design. As more and more data are carried over communication networks, a single failure can disrupt millions of users and cause millions of dollars in lost revenue. Network protection techniques consist of providing spare capacity in a network and automatically rerouting flows around a failure using that spare capacity. This thesis addresses the design of optical networks that incorporate survivability techniques using protection schemes based on p-cycles. More precisely, path-protecting p-cycles are exploited in the context of link failures. Our study focuses on the placement of p-cycle protection structures, assuming that the working paths for all requests are defined a priori. Most existing work relies on heuristics or on solution methods that have difficulty solving large instances. The objective of this thesis is twofold. On the one hand, we propose models and solution methods capable of tackling larger problems than those previously presented in the literature. On the other hand, thanks to the new algorithms, we are able to produce optimal or near-optimal solutions. To do so, we rely on column generation, a technique well suited to solving large-scale linear programming problems. In this project, column generation is used as an intelligent way of implicitly enumerating promising cycles.
We first propose formulations for the master problem and the auxiliary (pricing) problem, together with a first column generation algorithm for the design of networks protected by path-protecting p-cycles. The algorithm obtains better solutions, in reasonable time, than those obtained by existing methods. A more compact formulation is then proposed for the auxiliary problem. In addition, we present a new hierarchical decomposition method that greatly improves the overall efficiency of the algorithm. Regarding integer solutions, we propose two heuristic methods that succeed in finding good solutions. We also carry out a systematic comparison between p-cycles and classical shared protection schemes, using unified formulations based on column generation in order to obtain good-quality results. Next, we empirically evaluate the directed and undirected versions of p-cycles for link protection as well as path protection, in asymmetric traffic scenarios. We show the additional protection cost incurred when bidirectional systems are used in such scenarios. Finally, we study a column generation formulation for the design of p-cycle networks under availability requirements, and we obtain the first lower bounds for this problem.
Abstract:
Supervised learning of large-scale hierarchical networks is currently enjoying tremendous success. Despite this excitement, unsupervised learning still represents, according to many researchers, a key element of Artificial Intelligence, where agents must learn from a potentially limited amount of data. This thesis follows this line of thought and addresses several research topics related to the density estimation problem through Boltzmann machines (BMs), probabilistic graphical models at the heart of deep learning. Our contributions touch on sampling, partition function estimation, optimization, and the learning of invariant representations. The thesis begins by presenting a new adaptive sampling algorithm that automatically adjusts the temperature of the simulated Markov chains in order to maintain a high convergence speed throughout learning. When used in the context of stochastic maximum likelihood (SML) learning, our algorithm yields increased robustness to the choice of learning rate, as well as faster convergence. Our results are presented for BMs, but the method is general and applicable to learning any probabilistic model that relies on Markov chain sampling. While the maximum likelihood gradient can be approximated by sampling, evaluating the log-likelihood requires an estimate of the partition function. In contrast to traditional approaches that treat a given model as a black box, we propose instead to exploit the learning dynamics by estimating the successive changes in log-partition incurred at each parameter update.
The estimation problem is reformulated as an inference problem similar to the Kalman filter, but on a two-dimensional graph whose dimensions correspond to the time axis and the temperature parameter. On the theme of optimization, we also present an algorithm for efficiently applying the natural gradient to Boltzmann machines with thousands of units. Until now, its adoption had been limited by its high computational cost and memory requirements. Our algorithm, Metric-Free Natural Gradient (MFNG), avoids explicitly computing the Fisher information matrix (and its inverse) by exploiting a linear solver combined with an efficient matrix-vector product. The algorithm is promising: in terms of the number of function evaluations, MFNG converges faster than SML. Its implementation unfortunately remains inefficient in computation time. This work also explores the mechanisms underlying the learning of invariant representations. To this end, we use the family of "spike & slab" restricted Boltzmann machines (ssRBM), which we modify so that they can model binary and sparse distributions. The binary latent variables of the ssRBM can be made invariant to a vector subspace by associating with each of them a vector of continuous latent variables (called "slabs"). This translates into increased invariance at the representation level and a better classification rate when few labeled data are available. We end this thesis on an ambitious topic: learning representations that can separate the factors of variation present in the input signal. We propose a solution based on a bilinear ssRBM (with two groups of latent factors) and formulate the problem as one of "pooling" in complementary vector subspaces.
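The matrix-free idea behind the natural gradient step described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: it uses plain conjugate gradient as a stand-in for whatever linear solver the author used, it assumes the Fisher matrix is estimated as the mean outer product of already-centered per-sample score vectors, and it adds a small damping term to keep the system well conditioned. All names (`fisher_vector_product`, `conjugate_gradient`, `scores`) are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fisher_vector_product(scores: List[Vector], v: Vector, damping: float = 1e-3) -> Vector:
    """Return (F + damping*I) v with F = mean(s s^T), never forming F itself."""
    n = len(scores)
    out = [damping * vi for vi in v]
    for s in scores:
        coeff = dot(s, v) / n  # scalar (s . v) / N for this sample
        for i, si in enumerate(s):
            out[i] += coeff * si
    return out


def conjugate_gradient(matvec: Callable[[Vector], Vector], b: Vector,
                       iters: int = 50, tol: float = 1e-20) -> Vector:
    """Solve A x = b for symmetric positive definite A given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        ap = matvec(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Toy data: four centered score vectors and an ordinary gradient (made up).
scores = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, 1.0]]
grad = [0.5, -1.0]

matvec = lambda v: fisher_vector_product(scores, v)
natural_grad = conjugate_gradient(matvec, grad)
residual = [a - b for a, b in zip(matvec(natural_grad), grad)]
print(natural_grad, residual)
```

The memory cost is linear in the number of parameters because only vectors are stored; the trade-off is that each solver iteration needs one pass over the samples.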
Abstract:
Sensible heat fluxes (QH) are determined using scintillometry and eddy covariance over a suburban area. Two large aperture scintillometers provide spatially integrated fluxes across path lengths of 2.8 km and 5.5 km over Swindon, UK. The shorter scintillometer path spans newly built residential areas and has an approximate source area of 2-4 km², whilst the long path extends from the rural outskirts to the town centre and has a source area of around 5-10 km². These large-scale heat fluxes are compared with local-scale eddy covariance measurements. Clear seasonal trends are revealed by the long duration of this dataset, and variability in monthly QH is related to the meteorological conditions. At shorter time scales the response of QH to solar radiation often gives rise to close agreement between the measurements, but during times of rapidly changing cloud cover spatial differences in the net radiation (Q*) coincide with greater differences between heat fluxes. On clear days QH lags Q*, so the ratio of QH to Q* increases throughout the day. In summer the observed energy partitioning is related to the vegetation fraction through use of a footprint model. The results demonstrate the value of scintillometry for integrating surface heterogeneity and offer improved understanding of the influence of anthropogenic materials on surface-atmosphere interactions.
Abstract:
Variational data assimilation is commonly used in environmental forecasting to estimate the current state of the system from a model forecast and observational data. The assimilation problem can be written simply in the form of a nonlinear least squares optimization problem. However, the practical solution of the problem in large systems requires many careful implementation choices. In this article we present the theory of variational data assimilation and then discuss in detail how it is implemented in practice. Current solutions and open questions are discussed.
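For orientation, the nonlinear least squares problem mentioned above is conventionally written as the minimization of a cost function of the following form. The notation is the standard textbook one and is an assumption here, since the abstract fixes no symbols: x is the state to estimate, x_b the background (model forecast), B the background error covariance, y the observations, H the (possibly nonlinear) observation operator, and R the observation error covariance.

```latex
J(\mathbf{x}) =
  \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathsf T}\,\mathbf{B}^{-1}\,(\mathbf{x}-\mathbf{x}_b)
  + \tfrac{1}{2}\,\bigl(\mathbf{y}-\mathcal{H}(\mathbf{x})\bigr)^{\mathsf T}\,\mathbf{R}^{-1}\,\bigl(\mathbf{y}-\mathcal{H}(\mathbf{x})\bigr)
```

The first term penalizes departure from the forecast and the second penalizes misfit to the observations, each weighted by the inverse of its error covariance.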
Abstract:
Two fundamental processes usually arise in the production planning of many industries. The first one consists of deciding how many final products of each type have to be produced in each period of a planning horizon, the well-known lot sizing problem. The other process consists of cutting raw materials in stock in order to produce smaller parts used in the assembly of final products, the well-studied cutting stock problem. In this paper the decision variables of these two problems are made dependent on each other in order to obtain a global optimum solution. Setups that are typically present in lot sizing problems are relaxed together with integer frequencies of cutting patterns in the cutting problem. Therefore, a large-scale linear optimization problem arises, which is solved exactly by a column generation technique. It is worth noting that this new combined problem still captures the trade-off between storage costs (for final products and the parts) and trim losses (in the cutting process). We present some sets of computational tests, analyzed over three different scenarios. These results show that, by combining the problems and using an exact method, it is possible to obtain significant gains when compared to the usual industrial practice, which solves them in sequence. © 2010 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.
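To illustrate how column generation produces new cutting patterns, the sketch below solves the classic pricing subproblem for the one-dimensional cutting stock problem as an unbounded knapsack. It is a simplified stand-in, not the paper's combined lot sizing model: it assumes each pattern costs one stock roll, integer widths, and dual prices supplied by a restricted master problem that is not shown. The function name `price_pattern` and all numbers are hypothetical.

```python
from typing import List, Tuple


def price_pattern(widths: List[int], duals: List[float],
                  roll_width: int) -> Tuple[float, List[int]]:
    """Unbounded knapsack by dynamic programming.

    best[c] is the largest total dual value of items fitting in capacity c.
    Returns that value for the full roll and the pattern (copies per item).
    """
    best = [0.0] * (roll_width + 1)
    choice = [-1] * (roll_width + 1)  # -1: leave one unit of width unused
    for c in range(1, roll_width + 1):
        best[c] = best[c - 1]
        for i, w in enumerate(widths):
            if w <= c and best[c - w] + duals[i] > best[c]:
                best[c] = best[c - w] + duals[i]
                choice[c] = i
    # Walk back through the recorded choices to recover the pattern.
    pattern = [0] * len(widths)
    c = roll_width
    while c > 0:
        i = choice[c]
        if i < 0:
            c -= 1
        else:
            pattern[i] += 1
            c -= widths[i]
    return best[roll_width], pattern


widths = [3, 5, 7]        # item widths (made-up data)
duals = [0.3, 0.5, 0.8]   # dual prices from the master LP (made-up data)
value, pattern = price_pattern(widths, duals, roll_width=10)
reduced_cost = 1.0 - value  # each pattern consumes one roll of unit cost
print(pattern, value, reduced_cost)
```

A negative reduced cost means the pattern improves the master problem and is added as a new column; when no pattern has negative reduced cost, the current linear relaxation is optimal.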
Abstract:
FERNANDES, Fabiano A. N. et al. Optimization of Osmotic Dehydration of Papaya followed by air-drying. Food Research International, v. 39, p. 492-498, 2006.
Abstract:
We have investigated and extensively tested three families of non-convex optimization approaches for solving the transmission network expansion planning problem: simulated annealing (SA), genetic algorithms (GA), and tabu search algorithms (TS). The paper compares the main features of the three approaches and presents an integrated view of these methodologies. A hybrid approach is then proposed that performs far better than any of these approaches individually. Results obtained in tests performed with large-scale real-life networks are summarized.
Abstract:
A wireless sensor network (WSN) is a technology that can be used to monitor and actuate on environments in a non-intrusive way. The main difference between WSNs and traditional sensor networks is the low dependability of WSN nodes. WSN solutions are therefore based on a huge number of cheap, tiny nodes that can present faults in hardware, software and wireless communication. The deployment of hundreds of nodes can overcome the low dependability of individual nodes; however, this strategy introduces many challenges regarding network management, real-time requirements and self-optimization. In this paper we present a simulated annealing approach that self-optimizes large-scale WSNs. Simulation results indicate that our approach can achieve self-optimization characteristics in a dynamic WSN. © 2012 IEEE.
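The abstract above does not describe the annealing scheme itself, so the following is only a generic simulated annealing loop applied to a toy problem invented for illustration: each node picks a transmit power level, and the cost adds total energy to a penalty for nodes set below a required level. The cost function, the `REQUIRED` levels, the cooling schedule and all names are assumptions, not the authors' model.

```python
import math
import random
from typing import Callable, Tuple

State = Tuple[int, ...]

REQUIRED = [2, 1, 3, 0, 2, 1]  # hypothetical minimum power level per node


def cost(levels: State) -> int:
    """Total energy plus a penalty of 10 per node below its required level."""
    energy = sum(levels)
    uncovered = sum(1 for lv, req in zip(levels, REQUIRED) if lv < req)
    return energy + 10 * uncovered


def neighbor(levels: State, rng: random.Random) -> State:
    """Re-draw the power level (0..3) of one randomly chosen node."""
    i = rng.randrange(len(levels))
    new = list(levels)
    new[i] = rng.randint(0, 3)
    return tuple(new)


def anneal(cost_fn: Callable[[State], int],
           neighbor_fn: Callable[[State, random.Random], State],
           state: State, t0: float = 5.0, cooling: float = 0.995,
           steps: int = 5000, seed: int = 0) -> Tuple[State, int]:
    rng = random.Random(seed)
    current, current_cost = state, cost_fn(state)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        cand = neighbor_fn(current, rng)
        delta = cost_fn(cand) - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a floor
    return best, best_cost


start: State = (3,) * len(REQUIRED)
best, best_cost = anneal(cost, neighbor, start)
print(best, best_cost)
```

Accepting occasional uphill moves at high temperature is what lets the search escape local minima; as the temperature falls, the loop behaves more and more like greedy descent.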
Abstract:
Current scientific applications produce large amounts of data. The processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. To achieve this goal, distributed storage systems have considered techniques of data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take application behavior into account when optimizing data access. This limitation motivated this paper, which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. To accomplish this, the approach organizes application behaviors as time series and then analyzes and classifies those series according to their properties. Based on these properties, the approach selects modeling techniques to represent the series and perform predictions, which are later used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm that this new approach reduces application execution time by about 50 percent, especially when handling large amounts of data.
Abstract:
Background: ArtinM is a D-mannose-specific lectin from Artocarpus integrifolia seeds that induces neutrophil migration and activation, degranulation of mast cells, acceleration of wound healing, induction of interleukin-12 production by macrophages and dendritic cells, and protective T helper 1 immune response against Leishmania major, Leishmania amazonensis and Paracoccidioides brasiliensis infections. Considering the important biological properties of ArtinM and its therapeutic applicability, this study was designed to produce high-level expression of active recombinant ArtinM (rArtinM) in an Escherichia coli system. Results: The ArtinM coding region was inserted in the pET29a(+) vector and expressed in E. coli BL21-CodonPlus(DE3)-RP. The conditions for overexpression of soluble ArtinM were optimized by testing different parameters: temperatures (20, 25, 30 or 37°C) and shaking speeds (130, 200 or 220 rpm) during induction, concentrations of the induction agent IPTG (0.01-4 mM) and periods of induction (1-19 h). BL21-CodonPlus(DE3)-RP cells induced under the optimized conditions (incubation at 20°C, at a shaking speed of 130 rpm, induction with 0.4 mM IPTG for 19 h) accumulated large amounts of soluble rArtinM. The culture provided 22.4 mg/L of rArtinM, whose activity was determined by its one-step purification through affinity chromatography on immobilized D-mannose and by glycoarray analysis. Gel filtration showed that rArtinM is monomeric, contrasting with the tetrameric form of the plant native protein (jArtinM). The analysis of intact rArtinM by mass spectrometry revealed a molecular mass of 16,099.5 Da, and the peptide mass fingerprint and ESI-CID-MS/MS of amino acid sequences of peptides from a tryptic digest covered 41% of the total ArtinM amino acid sequence. In addition, circular dichroism and fluorescence spectroscopy of rArtinM indicated that its global fold comprises β-sheet structure.
Conclusions: Overall, the optimized process to express rArtinM in E. coli provided high amounts of soluble, correctly folded and active recombinant protein, compatible with large-scale production of the lectin.
Abstract:
García et al. present a class of column generation (CG) algorithms for nonlinear programs. Its main motivation from a theoretical viewpoint is that under some circumstances, finite convergence can be achieved, in much the same way as for the classic simplicial decomposition method; the main practical motivation is that within the class there are certain nonlinear column generation problems that can accelerate the convergence of a solution approach which generates a sequence of feasible points. This algorithm can, for example, accelerate simplicial decomposition schemes by making the subproblems nonlinear. This paper complements the theoretical study on the asymptotic and finite convergence of these methods given in [1] with an experimental study focused on their computational efficiency. Three types of numerical experiments are conducted. The first group of test problems has been designed to study the parameters involved in these methods. The second group has been designed to investigate the role and the computation of the prolongation of the generated columns to the relative boundary. The last one has been designed to carry out a more complete investigation of the difference in computational efficiency between linear and nonlinear column generation approaches. In order to carry out this investigation, we consider two types of test problems: the first one is the nonlinear, capacitated single-commodity network flow problem of which several large-scale instances with varied degrees of nonlinearity and total capacity are constructed and investigated, and the second one is a combined traffic assignment model
Abstract:
Operating theatres are the engine of a hospital; proper management of the operating rooms and their staff represents a great challenge for managers, and its results directly affect the hospital's budget. This work presents a MILP model for the efficient scheduling of multiple surgeries in Operating Rooms (ORs) during a working day. The model considers multiple surgeons and ORs and different types of surgeries. Stochastic strategies are also implemented to account for the uncertainty in surgery durations (pre-incision, incision, post-incision times). In addition, a heuristic-based method and a MILP decomposition approach are proposed for solving large-scale OR scheduling problems in a computationally efficient way. All these computer-aided strategies have been implemented in AIMMS, an advanced modeling and optimization software package, yielding a user-friendly solution tool for operating room management under uncertainty.