981 resultados para SPANNING TREE PROBLEM
Resumo:
Une réconciliation entre un arbre de gènes et un arbre d’espèces décrit une histoire d’évolution des gènes homologues en termes de duplications et pertes de gènes. Pour inférer une réconciliation pour un arbre de gènes et un arbre d’espèces, la parcimonie est généralement utilisée selon le nombre de duplications et/ou de pertes. Les modèles de réconciliation sont basés sur des critères probabilistes ou combinatoires. Le premier article définit un modèle combinatoire simple et général où les duplications et les pertes sont clairement identifiées et la réconciliation parcimonieuse n’est pas la seule considérée. Une architecture de toutes les réconciliations est définie et des algorithmes efficaces (soit de dénombrement, de génération aléatoire et d’exploration) sont développés pour étudier les propriétés combinatoires de l’espace de toutes les réconciliations ou seulement les plus parcimonieuses. Basée sur le processus classique nommé naissance-et-mort, un algorithme qui calcule la vraisemblance d’une réconciliation a récemment été proposé. Le deuxième article utilise cet algorithme avec les outils combinatoires décrits ci-haut pour calculer efficacement (soit approximativement ou exactement) les probabilités postérieures des réconciliations localisées dans le sous-espace considéré. Basé sur des taux réalistes (selon un modèle probabiliste) de duplication et de perte et sur des données réelles/simulées de familles de champignons, nos résultats suggèrent que la masse probabiliste de toute l’espace des réconciliations est principalement localisée autour des réconciliations parcimonieuses. Dans un contexte d’approximation de la probabilité d’une réconciliation, notre approche est une alternative intéressante face aux méthodes MCMC et peut être meilleure qu’une approche sophistiquée, efficace et exacte pour calculer la probabilité d’une réconciliation donnée. Le problème nommé Gene Tree Parsimony (GTP) est d’inférer un arbre d’espèces qui minimise le nombre de duplications et/ou de pertes pour un ensemble d’arbres de gènes. Basé sur une approche qui explore tout l’espace des arbres d’espèces pour les génomes considérés et un calcul efficace des coûts de réconciliation, le troisième article décrit un algorithme de Branch-and-Bound pour résoudre de façon exacte le problème GTP. Lorsque le nombre de taxa est trop grand, notre algorithme peut facilement considérer des relations prédéfinies entre ensembles de taxa. Nous avons testé notre algorithme sur des familles de gènes de 29 eucaryotes.
Resumo:
The long-term variability of the Siberian High, the dominant Northern Hemisphere anticyclone during winter, is largely unknown. To investigate how this feature varied prior to the instrumental record, we present a reconstruction of a Dec-Feb Siberian High (SH) index based on Eurasian and North American tree rings. Spanning 1599-1980, it provides information on SH variability over the past four centuries. A decline in the instrumental SH index since the late 1970s, related to Eurasian warming, is the most striking feature over the past four hundred years. It is associated with a highly significant (p < 0.0001) step change in 1989. Significant similar to 3-4 yr spectral peaks in the reconstruction fall within the range of variability of the East Asian winter monsoon (which has also declined recently) and lend further support to proposed relationships between these largescale features of the climate system.
Resumo:
Various popular machine learning techniques, like support vector machines, are originally conceived for the solution of two-class (binary) classification problems. However, a large number of real problems present more than two classes. A common approach to generalize binary learning techniques to solve problems with more than two classes, also known as multiclass classification problems, consists of hierarchically decomposing the multiclass problem into multiple binary sub-problems, whose outputs are combined to define the predicted class. This strategy results in a tree of binary classifiers, where each internal node corresponds to a binary classifier distinguishing two groups of classes and the leaf nodes correspond to the problem classes. This paper investigates how measures of the separability between classes can be employed in the construction of binary-tree-based multiclass classifiers, adapting the decompositions performed to each particular multiclass problem. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Let M = (V, E, A) be a mixed graph with vertex set V, edge set E and arc set A. A cycle cover of M is a family C = {C(1), ... , C(k)} of cycles of M such that each edge/arc of M belongs to at least one cycle in C. The weight of C is Sigma(k)(i=1) vertical bar C(i)vertical bar. The minimum cycle cover problem is the following: given a strongly connected mixed graph M without bridges, find a cycle cover of M with weight as small as possible. The Chinese postman problem is: given a strongly connected mixed graph M, find a minimum length closed walk using all edges and arcs of M. These problems are NP-hard. We show that they can be solved in polynomial time if M has bounded tree-width. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Some dynamic properties for a light ray suffering specular reflections inside a periodically corrugated waveguide are studied. The dynamics of the model is described in terms of a two dimensional nonlinear area preserving map. We show that the phase space is mixed in the sense that there are KAM islands surrounded by a large chaotic sea that is confined by two invariant spanning curves. We have used a connection with the Standard Mapping near a transition from local to global chaos and found the position of these two invariant spanning curves limiting the size of the chaotic sea as function of the control parameter.
Resumo:
The authors report a massive attack by Pseudomyrmex ants on a human who touched a Triplaria - novice tree (Triplaris spp). The ants naturally live in these trees and their stings cause intense pain and discrete to moderate local inflammation. The problem is common in sonic Brazilian regions and can be prevented by identifying the trees.
Resumo:
In this paper a novel Branch and Bound (B&B) algorithm to solve the transmission expansion planning which is a non-convex mixed integer nonlinear programming problem (MINLP) is presented. Based on defining the options of the separating variables and makes a search in breadth, we call this algorithm a B&BML algorithm. The proposed algorithm is implemented in AMPL and an open source Ipopt solver is used to solve the nonlinear programming (NLP) problems of all candidates in the B&B tree. Strategies have been developed to address the problem of non-linearity and non-convexity of the search region. The proposed algorithm is applied to the problem of long-term transmission expansion planning modeled as an MINLP problem. The proposed algorithm has carried out on five commonly used test systems such as Garver 6-Bus, IEEE 24-Bus, 46-Bus South Brazilian test systems, Bolivian 57-Bus, and Colombian 93-Bus. Results show that the proposed methodology not only can find the best known solution but it also yields a large reduction between 24% to 77.6% in the number of NLP problems regarding to the size of the systems.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry on this approach, we compare texts from European and Brazilian Portuguese. These texts are previously encoded according to some basic rhythmic features of the sentences which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion which is a constant free procedure for model selection. As a by-product, this provides a solution for the problem of optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also make a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.
Resumo:
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.
Resumo:
Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree structured data two issues become relevant: kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too few information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving as the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernel, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required both in learning and classification phases. Such a complexity can sometimes prevents the kernel application in scenarios involving large amount of data. This thesis proposes three contributions for resolving the above issues of kernel for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. Convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted at reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Direct Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
Resumo:
Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), and core to the implementation of RC, in different spatial patterns and diameter distributions, revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands, but those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed. I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.
Resumo:
We solve two inverse spectral problems for star graphs of Stieltjes strings with Dirichlet and Neumann boundary conditions, respectively, at a selected vertex called root. The root is either the central vertex or, in the more challenging problem, a pendant vertex of the star graph. At all other pendant vertices Dirichlet conditions are imposed; at the central vertex, at which a mass may be placed, continuity and Kirchhoff conditions are assumed. We derive conditions on two sets of real numbers to be the spectra of the above Dirichlet and Neumann problems. Our solution for the inverse problems is constructive: we establish algorithms to recover the mass distribution on the star graph (i.e. the point masses and lengths of subintervals between them) from these two spectra and from the lengths of the separate strings. If the root is a pendant vertex, the two spectra uniquely determine the parameters on the main string (i.e. the string incident to the root) if the length of the main string is known. The mass distribution on the other edges need not be unique; the reason for this is the non-uniqueness caused by the non-strict interlacing of the given data in the case when the root is the central vertex. Finally, we relate of our results to tree-patterned matrix inverse problems.
Resumo:
To improve our understanding of the Asian monsoon system, we developed a hydroclimate reconstruction in a marginal monsoon shoulder region for the period prior to the industrial era. Here, we present the first moisture sensitive tree-ring chronology, spanning 501 years for the Dieshan Mountain area, a boundary region of the Asian summer monsoon in the northeastern Tibetan Plateau. This reconstruction was derived from 101 cores of 68 old-growth Chinese pine (Pinus tabulaeformis) trees. We introduce a Hilbert–Huang Transform (HHT) based standardization method to develop the tree-ring chronology, which has the advantages of excluding non-climatic disturbances in individual tree-ring series. Based on the reliable portion of the chronology, we reconstructed the annual (prior July to current June) precipitation history since 1637 for the Dieshan Mountain area and were able to explain 41.3% of the variance. The extremely dry years in this reconstruction were also found in historical documents and are also associated with El Niño episodes. Dry periods were reconstructed for 1718–1725, 1766–1770 and 1920–1933, whereas 1782–1788 and 1979–1985 were wet periods. The spatial signatures of these events were supported by data from other marginal regions of the Asian summer monsoon. Over the past four centuries, out-of-phase relationships between hydroclimate variations in the Dieshan Mountain area and far western Mongolia were observed during the 1718–1725 and 1766–1770 dry periods and the 1979–1985 wet period.