77 results for MAP-GERMS
Abstract:
This work examines the conformational ensemble involved in β-hairpin folding by means of advanced molecular dynamics simulations and dimensionality reduction. A fully atomistic description of the protein and the surrounding solvent molecules is used, and this complex energy landscape is sampled by means of parallel tempering metadynamics simulations. The ensemble of configurations explored is analyzed using the recently proposed sketch-map algorithm. Further simulations allow us to probe how mutations affect the structures adopted by this protein. We find that many of the configurations adopted by a mutant are the same as those adopted by the wild-type protein. Furthermore, certain mutations destabilize secondary-structure-containing configurations by preventing the formation of hydrogen bonds or by promoting the formation of new intramolecular contacts. Our analysis demonstrates that machine-learning techniques can be used to study the energy landscapes of complex molecules and that the visualizations that are generated in this way provide a natural basis for examining how the stabilities of particular configurations of the molecule are affected by factors such as temperature or structural mutations.
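Sketch-map itself is not available in standard Python libraries, so the following is a minimal, illustrative sketch of its core idea under stated assumptions: pairwise distances between conformations are passed through the published sigmoid switching function and matched, in a least-squares sense, against similarly transformed distances in a two-dimensional embedding. The random data stand in for real dihedral-angle features, and the sigma, a, b values are illustrative, not tuned.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.spatial.distance import pdist

    def switching(r, sigma, a, b):
        # Sketch-map sigmoid: ~0 for r << sigma, ~1 for r >> sigma.
        return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)

    def stress(x_flat, D_hi, n, sigma, a_hi, b_hi, a_lo, b_lo):
        # Mismatch between transformed high- and low-dimensional distances.
        d_lo = pdist(x_flat.reshape(n, 2))
        return np.sum((switching(D_hi, sigma, a_hi, b_hi)
                       - switching(d_lo, sigma, a_lo, b_lo)) ** 2)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 30))    # stand-in: 60 conformations x 30 dihedral features
    D_hi = pdist(X)
    res = minimize(stress, rng.normal(size=60 * 2),
                   args=(D_hi, 60, 6.0, 8, 8, 2, 8), method="L-BFGS-B")
    embedding = res.x.reshape(60, 2)  # low-dimensional map of the ensemble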
Abstract:
We study the sensitivity of a MAP configuration of a discrete probabilistic graphical model with respect to perturbations of its parameters. These perturbations are global, in the sense that simultaneous perturbations of all the parameters (or any chosen subset of them) are allowed. Our main contribution is an exact algorithm that can check whether the MAP configuration is robust with respect to given perturbations. Its complexity is essentially the same as that of obtaining the MAP configuration itself, so it can be promptly used with minimal effort. We use our algorithm to identify the largest global perturbation that does not induce a change in the MAP configuration, and we successfully apply this robustness measure in two practical scenarios: the prediction of facial action units with posed images and the classification of multiple real public data sets. A strong correlation between the proposed robustness measure and accuracy is verified in both scenarios.
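The exact algorithm of the abstract is not reproduced here; the brute-force sketch below only conveys the robustness question on a deliberately tiny two-variable model, perturbing the root prior alone and bisecting for the largest perturbation that leaves the MAP configuration unchanged. All numbers are invented.

    import itertools
    import numpy as np

    # Tiny chain A -> B, binary states; joint p(a, b) = p(a) * p(b | a).
    p_a = np.array([0.6, 0.4])
    p_b_given_a = np.array([[0.7, 0.3],
                            [0.2, 0.8]])

    def map_config(q_a):
        joint = q_a[:, None] * p_b_given_a
        return np.unravel_index(np.argmax(joint), joint.shape)

    base_map = map_config(p_a)

    def robust(eps):
        # Does the MAP survive every +-eps vertex perturbation of p(A)?
        # (Illustrative stand-in for the exact algorithm in the abstract,
        # which handles all parameters jointly at MAP-like cost.)
        for signs in itertools.product([-eps, eps], repeat=2):
            q_a = np.clip(p_a + np.array(signs), 1e-9, None)
            q_a /= q_a.sum()
            if map_config(q_a) != base_map:
                return False
        return True

    # Largest perturbation keeping the MAP unchanged, found by bisection.
    lo, hi = 0.0, 0.5
    for _ in range(30):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if robust(mid) else (lo, mid)
    print(base_map, lo)  # MAP configuration and its robustness radius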
Abstract:
This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. It is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure), which extends previous complexity results. Furthermore, a Fully Polynomial Time Approximation Scheme is developed for MAP in networks with bounded treewidth and a bounded number of states per variable. Approximation schemes had been thought to be impossible, but this work shows otherwise under the assumptions just mentioned, which are adopted in most applications.
Abstract:
This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. First, it is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure). Such proofs extend previous complexity results for the problem. Inapproximability results are also derived in the case of trees if the number of states per variable is not bounded. Although the problem is shown to be hard and inapproximable even in very simple scenarios, a new exact algorithm is described that is empirically fast in networks of bounded treewidth and bounded number of states per variable. The same algorithm is used as the basis of a Fully Polynomial Time Approximation Scheme for MAP under such assumptions. Approximation schemes were generally thought to be impossible for this problem, but we show otherwise for classes of networks that are important in practice. The algorithms are extensively tested using some well-known networks as well as randomly generated cases to show their effectiveness.
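To make the queried quantity concrete, the sketch below brute-forces a partial MAP query on a deliberately tiny binary network with invented numbers: the query variable is maximised while the remaining non-evidence variable is summed out, which is precisely the max-sum interleaving that makes MAP harder than finding the single most probable full configuration.

    import numpy as np

    # Tiny polytree A -> C <- B; all variables binary; numbers invented.
    p_a = np.array([0.3, 0.7])
    p_b = np.array([0.6, 0.4])
    p_c = np.array([[[0.9, 0.1], [0.5, 0.5]],
                    [[0.4, 0.6], [0.2, 0.8]]])   # p(c | a, b)

    def joint(a, b, c):
        return p_a[a] * p_b[b] * p_c[a, b, c]

    def partial_map(evidence_c):
        # argmax_a sum_b p(a, b, c = evidence_c): maximise over the
        # query variable A while summing out the non-query variable B.
        best, best_score = None, -1.0
        for a in (0, 1):
            score = sum(joint(a, b, evidence_c) for b in (0, 1))
            if score > best_score:
                best, best_score = a, score
        return best, best_score

    print(partial_map(evidence_c=1))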
Abstract:
This paper strengthens the NP-hardness result for the (partial) maximum a posteriori (MAP) problem in Bayesian networks with tree topology (every variable has at most one parent) and variable cardinality at most three. MAP is the problem of querying the most probable state configuration of some (not necessarily all) of the network variables given evidence. It is demonstrated that the problem remains hard even in such simple networks.
Abstract:
This paper presents a new anytime algorithm for the marginal MAP problem in graphical models of bounded treewidth. We show asymptotic convergence and theoretical error bounds for any fixed step. Experiments show that it compares well to a state-of-the-art systematic search algorithm.
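The paper's algorithm is not reproduced here; the toy sketch below only illustrates what "anytime with error bounds" means for marginal MAP: after each step an incumbent configuration is reported together with a deliberately crude upper bound on the optimum, so the search can be stopped at any point with a known gap. The random table is a stand-in for a structured bounded-treewidth model that a real solver would never materialise.

    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    p = rng.random((2,) * 6)   # 3 query bits Q, 3 marginalised bits S
    p /= p.sum()

    def marginal_map_anytime():
        # Anytime enumeration: yield the incumbent after each query
        # configuration, together with a trivial optimality bound.
        scores = {q: p[q].sum() for q in itertools.product((0, 1), repeat=3)}
        remaining = sum(scores.values())
        best_q, best = None, -1.0
        for q, s in scores.items():
            remaining -= s
            if s > best:
                best_q, best = q, s
            # Any unexplored configuration scores at most `remaining`.
            yield best_q, best, max(best, remaining)

    for incumbent, value, upper in marginal_map_anytime():
        print(incumbent, round(value, 4), round(upper, 4))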
Abstract:
The X-linked lymphoproliferative syndrome (XLP) is an inherited immunodeficiency to Epstein-Barr virus infection that has been mapped to chromosome Xq25. Molecular analysis of XLP patients from ten different families identified a small interstitial constitutional deletion in one patient (XLP-D). This deletion, initially defined by a single marker, DF83, known to map to interval Xq24-q26.1, is nested within a previously reported and much larger deletion in another XLP patient (XLP-739). A cosmid minilibrary was constructed from a single mega-YAC and used to establish a contig encompassing the whole XLP-D deletion and a portion of the XLP-739 deletion. Based on this contig, the size of the XLP-D deletion can be estimated at 130 kb. The identification of this minimal deletion, within which at least a portion of the XLP gene is likely to reside, should greatly facilitate efforts to isolate the gene.
Abstract:
This paper describes an investigation of various shroud bleed slot configurations of a centrifugal compressor using CFD with a manual multi-block structured grid generation method. The compressor under investigation is used in a turbocharger application for a heavy duty diesel engine of approximately 400 hp. The baseline numerical model has been developed and validated against experimental performance measurements. The influence of the bleed slot flow field on a range of operating conditions between surge and choke has been analysed in detail. The impact of the returning bleed flow on the incidence at the impeller blade leading edge, due to its mixing with the main through-flow, has also been studied. From the baseline geometry, a number of modifications to the bleed slot width have been proposed, and a detailed comparison of the flow characteristics has been performed. The impact of slot variations on the inlet incidence angle has been investigated, highlighting the improvement in surge and choked flow capability. In addition, the influence of the bleed slot on stabilizing the blade passage flow near surge, through suction of the tip and over-tip vortex flow, has been considered.
Abstract:
As data analytics grows in importance, it is quickly becoming one of the dominant application domains that require parallel processing. This paper investigates the applicability of OpenMP, the dominant shared-memory parallel programming model in high-performance computing, to the domain of data analytics. We contrast the performance and programmability of key data analytics benchmarks against Phoenix++, a state-of-the-art shared-memory map/reduce programming system. Our study shows that OpenMP outperforms the Phoenix++ system by a large margin for several benchmarks. In other cases, however, the programming model lacks support for this application domain.
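OpenMP is a C/C++/Fortran API and cannot be shown directly here, but the map/reduce formulation that Phoenix++ represents can be sketched in a few lines of Python. The contrast the abstract draws is programmability: in OpenMP the same word count is one loop over shared state plus a pragma, whereas map/reduce requires an explicit map phase and a merge phase. The corpus and chunking below are invented for illustration.

    from collections import Counter
    from functools import reduce
    from multiprocessing import Pool

    words = ("map reduce shared memory openmp phoenix " * 1000).split()

    def mapper(chunk):
        # "map" phase: emit partial counts per chunk.
        return Counter(chunk)

    def reducer(acc, part):
        # "reduce" phase: merge partial counts.
        acc.update(part)
        return acc

    if __name__ == "__main__":
        chunks = [words[i::4] for i in range(4)]
        with Pool(4) as pool:
            partials = pool.map(mapper, chunks)
        counts = reduce(reducer, partials, Counter())

        # The "OpenMP-style" equivalent: a single pass over shared state;
        # in OpenMP that loop body would simply carry a parallel pragma.
        assert counts == Counter(words)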
Abstract:
Single component geochemical maps are the most basic representation of spatial elemental distributions and are commonly used in environmental and exploration geochemistry. However, the compositional nature of geochemical data imposes several limitations on how the data should be presented. The problems relate to the constant sum problem (closure) and the inherently multivariate relative information conveyed by compositional data. A well-known example is the tendency of all heavy metals to show lower values in soils with significant contributions of diluting elements (e.g., the quartz dilution effect), or the contrary effect, apparent enrichment in many elements due to removal of potassium during weathering. The validity of classical single component maps is thus investigated, and reasonable alternatives that honour the compositional character of geochemical concentrations are presented. The first recommended method relies on knowledge-driven log-ratios, chosen to highlight certain geochemical relations or to filter known artefacts (e.g. dilution with SiO2 or volatiles); this is similar to the classical approach of normalising to a single element. The second approach uses so-called log-contrasts, which employ suitable statistical methods (such as classification techniques, regression analysis, principal component analysis, clustering of variables, etc.) to extract potentially interesting geochemical summaries. The caution from this work is that if a compositional approach is not used, it becomes difficult to guarantee that any identified pattern, trend or anomaly is not an artefact of the constant sum constraint. In summary, the authors recommend a chain of enquiry that involves searching for the appropriate statistical method that can answer the required geological or geochemical question whilst maintaining the integrity of the compositional nature of the data. The required log-ratio transformations should be applied, followed by the chosen statistical method. Interpreting the results may require a closer working relationship between statisticians, data analysts and geochemists.
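As a concrete illustration of the first recommended route, the sketch below computes a knowledge-driven log-ratio (an element against its SiO2 diluent) and the centred log-ratio transform on a toy three-part composition; the parts and values are invented, not taken from any survey.

    import numpy as np

    # Toy composition: three samples x (SiO2, Fe2O3, As) in weight fractions.
    X = np.array([[0.80, 0.15, 0.05],
                  [0.60, 0.30, 0.10],
                  [0.40, 0.45, 0.15]])
    X = X / X.sum(axis=1, keepdims=True)   # closure: rows sum to 1

    # Knowledge-driven log-ratio: As relative to the SiO2 diluent, which
    # removes the quartz-dilution artefact described in the abstract.
    as_vs_sio2 = np.log(X[:, 2] / X[:, 0])

    # Centred log-ratio (clr): each part relative to the geometric mean,
    # a standard opening transform for compositional data.
    g = np.exp(np.log(X).mean(axis=1, keepdims=True))
    clr = np.log(X / g)
    print(as_vs_sio2, clr, sep="\n")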
Abstract:
The environmental quality of land is often assessed by the calculation of threshold values which aim to differentiate between concentrations of elements based on whether the soils are in residential or industrial sites. In Europe, for example, soil guideline values exist for agricultural and grazing land. A threshold is often set to differentiate between concentrations of the element that naturally occur in the soil and concentrations that result from diffuse anthropogenic sources. Regional geochemistry and, in particular, single component geochemical maps are increasingly being used to determine these baseline environmental assessments. The key question raised in this paper is whether the geochemical map can provide an accurate interpretation on its own. Implicit is the thought that single component geochemical maps represent absolute abundances. However, because of the compositional (closed) nature of the data, univariate geochemical maps cannot be compared directly with one another. As a result, any interpretation based on them is vulnerable to spurious correlation problems. What does this mean for soil geochemistry mapping, baseline quality documentation, soil resource assessment or risk evaluation? Despite the limitation of relative abundances, individual raw geochemical maps are deemed fundamental to several applications of geochemical maps, including environmental assessments. However, an element's toxicity is related to its bioavailable concentration, which is lowered if its source is mixed with another source. Elements also interact: under reducing conditions, for example, iron oxides lose their solid state and the arsenic they host becomes soluble and mobile. Both of these matters may be more adequately dealt with if a single component map is not interpreted in isolation to determine baseline and threshold assessments. A range of alternative compositionally compliant representations based on log-ratio and log-contrast approaches are explored to supplement the classical single component maps for environmental assessment. Case study examples are shown based on the Tellus soil geochemical dataset covering Northern Ireland and the results of in vitro oral bioaccessibility testing carried out on a sub-set of archived Tellus Survey shallow soils following the Unified BARGE (Bioaccessibility Research Group of Europe) Method.
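One compositionally compliant supplement of the kind this paper explores, a data-driven log-contrast, can be sketched as the leading principal component of clr-transformed data; the Dirichlet draws below are synthetic stand-ins for real Tellus samples.

    import numpy as np

    rng = np.random.default_rng(2)
    # 50 synthetic soil samples x 4 parts, closed to a constant sum.
    X = rng.dirichlet(alpha=(8.0, 4.0, 2.0, 1.0), size=50)

    # clr-transform, then PCA: the leading principal component is a
    # data-driven log-contrast (its coefficients sum to zero), a candidate
    # quantity to map instead of a raw single-element concentration.
    clr = np.log(X) - np.log(X).mean(axis=1, keepdims=True)
    clr_c = clr - clr.mean(axis=0)
    _, _, vt = np.linalg.svd(clr_c, full_matrices=False)
    log_contrast = clr_c @ vt[0]      # per-sample scores for mapping
    print(vt[0].round(3))             # contrast coefficients (sum ~ 0)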
Abstract:
We study the computational complexity of finding maximum a posteriori configurations in Bayesian networks whose probabilities are specified by logical formulas. This approach leads to a fine-grained study in which local information such as context-sensitive independence and determinism can be considered. It also allows us to characterize more precisely the jump from tractability to NP-hardness and beyond, and to consider the complexity introduced by evidence alone.
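A minimal sketch, with invented numbers, of what "probabilities specified by logical formulas" means in practice: one CPT is given as a deterministic formula rather than a table, which exposes both determinism and context-sensitive independence (given A = 0, C no longer depends on B) and lets a MAP query discard every assignment that falsifies the evidence before any scoring.

    # CPT specified by a logical formula rather than a table:
    # C holds iff (A and not B), so the CPT is deterministic and
    # context-specific: given A = 0, C is independent of B.
    def p_c(c, a, b):
        return float(c == int(a == 1 and b == 0))

    p_a = {0: 0.4, 1: 0.6}
    p_b = {0: 0.5, 1: 0.5}

    # MAP over (A, B) given evidence C = 1: determinism prunes every
    # assignment that falsifies the formula.
    candidates = [(a, b) for a in (0, 1) for b in (0, 1) if p_c(1, a, b) > 0]
    best = max(candidates, key=lambda ab: p_a[ab[0]] * p_b[ab[1]])
    print(best)   # (1, 0), the only model of the formula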