959 results for probabilistic graphical model


Relevance:

80.00% 80.00%

Publisher:

Abstract:

Landslides are widely distributed along the banks of the main stream in the Three Gorges Reservoir area. With the acceleration of human economic activity over the last 30 years, landslide hazards in the area have tended to become more serious. Because of the special geological, topographic and climatic conditions of the Three Gorges area, many paleo-landslides are found on the gently sloping terrain of the population relocation sites. Under natural conditions these paleo-landslides usually remain stable, but they may be reactivated by heavy rainfall, reservoir water storage and disturbance from relocation engineering. The prediction and prevention of landslide hazards has therefore become an important problem for the safety of relocation engineering in the Three Gorges Reservoir area.

Past research on landslides in the Three Gorges area concentrated mainly on the stability analysis of individual landslides and paid little attention to the geological environment in which regional landslides form. As a result, the relationship between the distribution and evolution of landslides and global dynamic processes was rarely addressed. As study has progressed, it has become difficult to explain the magnitude and frequency of major geological hazards in terms of single endogenic or exogenic processes. The causes of major landslides in the Three Gorges area may be resolved through systematic research on regional tectonics and the history of river evolution.

In the present paper, taking the view that the earth's endogenic and exogenic processes are coupled, the author studies the temporal and spatial distribution and the formation and evolution of major landslides (volume ≥ 100 × 10^4 m^3) in the Three Gorges Reservoir area by integrating first-hand statistics, geological evolution history, isotope dating and numerical simulation. Considering the main formation factors of landslides (topography, geology and rainfall), the author also discusses the occurrence probability of rainfall-induced landslides and a model for predicting them.

The distribution and magnitude of paleo-landslides in the Three Gorges area are controlled mainly by lithology, geological structure, bank slope shape and the geostress field. The major paleo-landslides are concentrated in the period 2.7-15.0 × 10^4 a B.P., which corresponds to the warmest and wettest paleoclimate stages. Over the same time, the Three Gorges area has experienced its most rapid phase of crustal uplift, beginning 15.0 × 10^4 a B.P. This indicates that the polyphase major paleo-landslides were driven by the coupling of neotectonic movement and Quaternary climate change. According to numerical simulation of the formation and evolution of the Baota landslide, rapid crustal uplift causes deep river incision, and the resulting geostress relief makes the rock mass of the banks flexible. Under heavy rainfall, the pore-water pressure produced by rain infiltration and high flood levels can greatly reduce the shear strength of weak structural planes. The bank slope therefore slides readily at the slope toe, where shear stress concentrates, eventually forming a composite draught-traction landslide in dipping stratified rock.

The paper puts forward the idea of susceptibility to rainfall-induced landslides and grades the degree of susceptibility according to the topographic and geological conditions of the landslides. By integrating geological environment factors with rainfall conditions, the author gives a new probabilistic prediction model for rainfall-induced landslides. Taking Chongqing City in the Three Gorges area as an example, five factors (topography, lithology combination, slope shape, rock structure and hydrogeology) with 21 states are selected as prediction variables, and susceptibility zonation is carried out by the information method. The prediction criterion for landslides is established from two factors: the maximum 24-hour rainfall and the antecedent effective precipitation over 15 days. The new prediction model makes real-time regional landslide prediction possible and can improve the accuracy of landslide forecasts.
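The two-factor rainfall criterion can be illustrated with a short sketch. The abstract does not state how antecedent effective precipitation is computed or which thresholds are used; the geometric-decay weighting, the decay coefficient `k`, the threshold values and the way the two factors are combined are all assumptions made here for illustration.

```python
def antecedent_effective_precipitation(daily_rain_mm, k=0.84):
    """Decay-weighted sum of previous days' rainfall, most recent day first.

    Assumed form: P_a = sum_{i=1..n} k**i * R_i, where R_i is the rainfall
    i days before the forecast day. The coefficient k is hypothetical.
    """
    return sum((k ** i) * r for i, r in enumerate(daily_rain_mm, start=1))


def exceeds_criterion(max_24h_mm, antecedent_mm,
                      thr_24h_mm=100.0, thr_antecedent_mm=50.0):
    # Hypothetical thresholds and a hypothetical AND rule; the source does
    # not give the values or the combination it actually uses.
    return max_24h_mm >= thr_24h_mm and antecedent_mm >= thr_antecedent_mm


# 15 days of antecedent rainfall (mm), most recent day first; made-up values.
previous_15_days = [12.0, 0.0, 30.0, 5.0, 0.0, 0.0, 8.0, 0.0,
                    0.0, 20.0, 0.0, 0.0, 0.0, 3.0, 0.0]
p_a = antecedent_effective_precipitation(previous_15_days)
warning = exceeds_criterion(max_24h_mm=120.0, antecedent_mm=p_a)
```

In a regional forecast, a check of this kind would presumably be evaluated per susceptibility zone, with stricter thresholds where susceptibility is higher.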

Relevance:

80.00% 80.00%

Publisher:

Abstract:

An increasing number of parameter estimation tasks involve the use of at least two information sources, one complete but limited, the other abundant but incomplete. Standard algorithms such as EM (or em) used in this context are unfortunately not stable in the sense that they can lead to a dramatic loss of accuracy with the inclusion of incomplete observations. We provide a more controlled solution to this problem through differential equations that govern the evolution of locally optimal solutions (fixed points) as a function of the source weighting. This approach permits us to explicitly identify any critical (bifurcation) points leading to choices unsupported by the available complete data. The approach readily applies to any graphical model in O(n^3) time where n is the number of parameters. We use the naive Bayes model to illustrate these ideas and demonstrate the effectiveness of our approach in the context of text classification problems.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The goal of this work is to learn a parsimonious and informative representation for high-dimensional time series. Conceptually, this comprises two distinct yet tightly coupled tasks: learning a low-dimensional manifold and modeling the dynamical process. These two tasks have a complementary relationship as the temporal constraints provide valuable neighborhood information for dimensionality reduction and conversely, the low-dimensional space allows dynamics to be learnt efficiently. Solving these two tasks simultaneously allows important information to be exchanged mutually. If nonlinear models are required to capture the rich complexity of time series, then the learning problem becomes harder as the nonlinearities in both tasks are coupled. The proposed solution approximates the nonlinear manifold and dynamics using piecewise linear models. The interactions among the linear models are captured in a graphical model. By exploiting the model structure, efficient inference and learning algorithms are obtained without oversimplifying the model of the underlying dynamical process. Evaluation of the proposed framework with competing approaches is conducted in three sets of experiments: dimensionality reduction and reconstruction using synthetic time series, video synthesis using a dynamic texture database, and human motion synthesis, classification and tracking on a benchmark data set. In all experiments, the proposed approach provides superior performance.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The goal of this work is to learn a parsimonious and informative representation for high-dimensional time series. Conceptually, this comprises two distinct yet tightly coupled tasks: learning a low-dimensional manifold and modeling the dynamical process. These two tasks have a complementary relationship as the temporal constraints provide valuable neighborhood information for dimensionality reduction and conversely, the low-dimensional space allows dynamics to be learnt efficiently. Solving these two tasks simultaneously allows important information to be exchanged mutually. If nonlinear models are required to capture the rich complexity of time series, then the learning problem becomes harder as the nonlinearities in both tasks are coupled. The proposed solution approximates the nonlinear manifold and dynamics using piecewise linear models. The interactions among the linear models are captured in a graphical model. The model structure setup and parameter learning are done using a variational Bayesian approach, which enables automatic Bayesian model structure selection, hence solving the problem of over-fitting. By exploiting the model structure, efficient inference and learning algorithms are obtained without oversimplifying the model of the underlying dynamical process. Evaluation of the proposed framework with competing approaches is conducted in three sets of experiments: dimensionality reduction and reconstruction using synthetic time series, video synthesis using a dynamic texture database, and human motion synthesis, classification and tracking on a benchmark data set. In all experiments, the proposed approach provides superior performance.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

BACKGROUND: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. RESULTS: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. CONCLUSION: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds’ algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). A range of experiments show that we obtain models with better accuracy than TAN and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Credal nets are probabilistic graphical models which extend Bayesian nets to cope with sets of distributions. An algorithm for approximate credal network updating is presented. The problem in its general formulation is a multilinear optimization task, which can be linearized by an appropriate rule for fixing all the local models apart from those of a single variable. This simple idea can be iterated and quickly leads to accurate inferences. A transformation is also derived to reduce decision making in credal networks based on the maximality criterion to updating. The decision task is proved to have the same complexity as standard inference, being NP^PP-complete for general credal nets and NP-complete for polytrees. Similar results are derived for the E-admissibility criterion. Numerical experiments confirm a good performance of the method.
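To make the updating problem concrete, here is a minimal sketch for a hypothetical two-node credal net X -> Y with binary variables, where each local probability is only known to lie in an interval. It uses brute-force enumeration of the interval endpoints, not the iterated linearization described in the abstract; the interval values and function name are assumptions for illustration.

```python
from itertools import product


def posterior_bounds(p_x, p_y_given_x1, p_y_given_x0):
    """Lower and upper P(X=1 | Y=1) for a two-node credal net X -> Y.

    Each argument is an interval (low, high) for one local probability.
    The extremes are attained at endpoint combinations, so enumerating
    the 2**3 combinations is enough for this tiny model.
    """
    values = []
    for p, a, b in product(p_x, p_y_given_x1, p_y_given_x0):
        joint_x1 = p * a            # P(X=1, Y=1)
        joint_x0 = (1.0 - p) * b    # P(X=0, Y=1)
        values.append(joint_x1 / (joint_x1 + joint_x0))
    return min(values), max(values)


# Hypothetical intervals: P(X=1), P(Y=1 | X=1), P(Y=1 | X=0).
lower, upper = posterior_bounds((0.2, 0.4), (0.7, 0.9), (0.1, 0.3))
```

The number of endpoint combinations grows exponentially with the number of local models, which is why approximate schemes such as the one in the abstract are of interest for larger networks.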

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Credal networks relax the precise probability requirement of Bayesian networks, enabling a richer representation of uncertainty in the form of closed convex sets of probability measures. The increase in expressiveness comes at the expense of higher computational costs. In this paper, we present a new variable elimination algorithm for exactly computing posterior inferences in extensively specified credal networks, which is empirically shown to outperform a state-of-the-art algorithm. The algorithm is then turned into a provably good approximation scheme, that is, a procedure that for any input is guaranteed to return a solution not worse than the optimum by a given factor. Remarkably, we show that when the networks have bounded treewidth and bounded number of states per variable the approximation algorithm runs in time polynomial in the input size and in the inverse of the error factor, thus being the first known fully polynomial-time approximation scheme for inference in credal networks.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Hidden Markov models (HMMs) are widely used models for sequential data. As with other probabilistic graphical models, they require the specification of precise probability values, which can be too restrictive for some domains, especially when data are scarce or costly to acquire. We present a generalized version of HMMs, whose quantification can be done by sets of, instead of single, probability distributions. Our models have the ability to suspend judgment when there is not enough statistical evidence, and can serve as a sensitivity analysis tool for standard non-stationary HMMs. Efficient inference algorithms are developed to address standard HMM usage such as the computation of likelihoods and most probable explanations. Experiments with real data show that the use of imprecise probabilities leads to more reliable inferences without compromising efficiency.
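For reference, the likelihood computation mentioned above can be sketched for the standard (precise) HMM that the generalized model extends. This is the textbook forward recursion, not the authors' imprecise-probability algorithm, and the parameter values are made up for illustration.

```python
def forward_likelihood(initial, transition, emission, observations):
    """P(observations) under a discrete HMM, via the forward recursion.

    initial[s]        : P(first state = s)
    transition[s][t]  : P(next state = t | current state = s)
    emission[s][o]    : P(observation = o | state = s)
    """
    n_states = len(initial)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emission[t][obs]
            * sum(alpha[s] * transition[s][t] for s in range(n_states))
            for t in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, two-symbol model.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood(initial, transition, emission, [0, 1])
```

In the imprecise setting each of these numbers is replaced by a set of admissible values, so the single likelihood becomes a lower and an upper bound.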

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Credal nets generalize Bayesian nets by relaxing the requirement of precision of probabilities. Credal nets are considerably more expressive than Bayesian nets, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal nets. The algorithm is based on an important representation result we prove for general credal nets: that any credal net can be equivalently reformulated as a credal net with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal net is updated by L2U, a loopy approximate algorithm for binary credal nets. Thus, we generalize L2U to non-binary credal nets, obtaining an accurate and scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences is evaluated by empirical tests.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

While current speech recognisers give acceptable performance in carefully controlled environments, their performance degrades rapidly when they are applied in more realistic situations. Environmental noise may generally be classified into two classes: wide-band noise and narrow-band noise. While the multi-band model has been shown to be capable of dealing with speech corrupted by narrow-band noise, it is ineffective for wide-band noise. In this paper, we suggest a combination of the frequency-filtering technique with the probabilistic union model in the multi-band approach. The new system has been tested on the TIDIGITS database, corrupted by white noise, noise collected from a railway station, and narrow-band noise, respectively. The results have shown that this approach is capable of dealing with noise of narrow-band or wide-band characteristics, assuming no knowledge about the noisy environment.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

An uncertainty factor of 10 is used by default when deriving toxicological reference values in environmental health, to account for interindividual variability in the population. The toxicokinetic component of this variability corresponds to the square root of 10, that is, 3.16. Its validity has previously been studied on the basis of pharmaceutical data collected from various populations (adults, children, the elderly). It is thus possible to compare the value of 3.16 with the human kinetic adjustment factor (HKAF; FACH in the original French), which is the ratio between a high percentile (e.g., the 95th) of the internal dose distribution in presumed sensitive subgroups and its median in adults, or alternatively within a general population. However, experimental human data on environmental pollutants are scarce. Moreover, these substances generally have properties substantially different from those of drugs. It is therefore difficult to validate, for pollutants, estimates made from drug data. To address this problem, physiologically based toxicokinetic (PBTK) modelling has been used to simulate the interindividual variability of internal doses during exposure to pollutants. However, the studies carried out to date have done little to assess the impact of exposure conditions (i.e., route, duration, intensity), of the physico/biochemical properties of pollutants, and of the characteristics of the exposed population on the HKAF value and hence on the validity of the default value of 3.16. The work in this thesis aims to fill these gaps. Using Monte Carlo simulations, a PBTK model was first used to simulate the interindividual variability of internal doses (i.e., in adults, the elderly, children and pregnant women) of water contaminants during oral, inhalation or dermal exposure.
Second, such a model was used to simulate this variability during inhalation of contaminants at varying intensity and duration. Next, a probabilistic steady-state toxicokinetic algorithm was used to estimate the interindividual variability of internal doses during chronic exposure to hypothetical contaminants with varying physico/biochemical properties. The properties of volatility, fraction metabolized, metabolic pathway and oral bioavailability were thus each analysed specifically. Finally, the impact of the referent considered and of demographic characteristics on the HKAF value during chronic inhalation was evaluated, again using a steady-state toxicokinetic algorithm. The internal dose distributions generated in the various scenarios made it possible to calculate the HKAF in each case according to the approach described above. This study highlighted the various determinants of toxicokinetic sensitivity according to the subgroup and the internal dose measure considered. It made it possible to characterize the determinants of the HKAF and hence the cases where it exceeds the default value of 3.16 (up to 28.3), observed almost exclusively in newborns and as a function of the parent compound. This thesis contributes to improving knowledge in the field of toxicological risk analysis by characterizing the HKAF under various considerations.
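The adjustment factor described in this abstract (FACH, the human kinetic adjustment factor) is a ratio of a high percentile of internal dose in a subgroup to the adult median, so it can be sketched in a few lines. The lognormal dose distributions and their parameters below are hypothetical stand-ins for the output of a toxicokinetic model, and the function names are illustrative only.

```python
import math
import random
import statistics

DEFAULT_TK_FACTOR = math.sqrt(10)  # about 3.16, the default toxicokinetic component


def percentile(values, q):
    """Nearest-rank percentile of values, for 0 < q <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def adjustment_factor(subgroup_doses, adult_doses, q=95):
    """Ratio of a high percentile in the subgroup to the adult median."""
    return percentile(subgroup_doses, q) / statistics.median(adult_doses)


# Hypothetical Monte Carlo draws of internal dose; parameters are made up.
rng = random.Random(0)
adults = [rng.lognormvariate(0.0, 0.4) for _ in range(10_000)]
newborns = [rng.lognormvariate(0.5, 0.6) for _ in range(10_000)]

factor = adjustment_factor(newborns, adults)
exceeds_default = factor > DEFAULT_TK_FACTOR
```

Comparing `factor` with `DEFAULT_TK_FACTOR` mirrors the check the abstract describes: whether the default of 3.16 covers the simulated variability for a given subgroup and dose measure.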

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Undirected graphical models are widely used in statistics, physics and machine vision. However, Bayesian parameter estimation for undirected models is extremely challenging, since evaluation of the posterior typically involves the calculation of an intractable normalising constant. This problem has received much attention, but very little of this has focussed on the important practical case where the data consists of noisy or incomplete observations of the underlying hidden structure. This paper specifically addresses this problem, comparing two alternative methodologies. In the first of these approaches particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently explore the parameter space, combined with the exchange algorithm (Murray et al., 2006) for avoiding the calculation of the intractable normalising constant (a proof showing that this combination targets the correct distribution is found in a supplementary appendix online). This approach is compared with approximate Bayesian computation (Pritchard et al., 1999). Applications to estimating the parameters of Ising models and exponential random graphs from noisy data are presented. Each algorithm used in the paper targets an approximation to the true posterior due to the use of MCMC to simulate from the latent graphical model, in lieu of being able to do this exactly in general. The supplementary appendix also describes the nature of the resulting approximation.
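The intractability of the normalising constant is easy to see in a small sketch. The brute-force function below computes the constant for an Ising model with a single interaction parameter `theta` and no external field (a simplifying assumption) by summing over every spin configuration, which costs 2**n terms for n spins. It illustrates the problem only; it is not one of the methods compared in the paper.

```python
import math
from itertools import product


def ising_partition_function(edges, n_spins, theta):
    """Sum of exp(theta * sum_{(i,j) in edges} s_i * s_j) over all
    2**n_spins configurations with each s_i in {-1, +1}."""
    total = 0.0
    for spins in product((-1, 1), repeat=n_spins):
        interaction = sum(spins[i] * spins[j] for i, j in edges)
        total += math.exp(theta * interaction)
    return total


# Two spins joined by one edge: the sum has only four terms.
z = ising_partition_function([(0, 1)], 2, 0.5)

# A 4 x 4 grid already needs 2**16 = 65536 terms; a 10 x 10 grid needs 2**100.
terms_for_10x10 = 2 ** 100
```

Because the constant depends on `theta`, it does not cancel in a standard Metropolis-Hastings ratio over parameters, which is the difficulty the exchange algorithm is designed to avoid.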

Relevance:

80.00% 80.00%

Publisher:

Abstract:

What is the relationship between magnitude judgments relying on directly available characteristics versus probabilistic cues? Question frame was manipulated in a comparative judgment task previously assumed to involve inference across a probabilistic mental model (e.g., “which city is largest” – the “larger” question – versus “which city is smallest” – the “smaller” question). Participants identified either the largest or smallest city (Experiments 1a, 2) or the richest or poorest person (Experiment 1b) in a three-alternative forced choice (3-AFC) task (Experiment 1) or 2-AFC task (Experiment 2). Response times revealed an interaction between question frame and the number of options recognized. When asked the smaller question, response times were shorter when none of the options were recognized. The opposite pattern was found when asked the larger question: response time was shorter when all options were recognized. These task-stimuli congruity results in judgment under uncertainty are consistent with, and predicted by, theories of magnitude comparison which make use of deductive inferences from declarative knowledge.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

A critical question in data mining is whether we can always trust, unconditionally, what a data mining system discovers. The answer is obviously no. If not, when can we trust a discovery? What factors affect the reliability of a discovery, and how do they affect it? These are some of the interesting questions to be investigated.

In this paper we first provide a definition and measurements of reliability and analyse the factors that affect it. We then examine the impact of model complexity, weak links, varying sample sizes and the ability of different learners on the reliability of graphical model discovery. The experimental results reveal that (1) the larger the sample size used for discovery, the higher the reliability; (2) the stronger a graph link is, the easier it is to discover and thus the higher the reliability that can be achieved; (3) the complexity of a graph also plays an important role in discovery: the more complex a graph is, the more difficult it is to induce and the lower the reliability.