959 resultados para Probabilistic graphical model
Resumo:
Automatic identification and extraction of bone contours from X-ray images is an essential first step task for further medical image analysis. In this paper we propose a 3D statistical model based framework for the proximal femur contour extraction from calibrated X-ray images. The automatic initialization is solved by an estimation of Bayesian network algorithm to fit a multiple component geometrical model to the X-ray data. The contour extraction is accomplished by a non-rigid 2D/3D registration between a 3D statistical model and the X-ray images, in which bone contours are extracted by a graphical model based Bayesian inference. Preliminary experiments on clinical data sets verified its validity
Resumo:
Accurate seasonal to interannual streamflow forecasts based on climate information are critical for optimal management and operation of water resources systems. Considering most water supply systems are multipurpose, operating these systems to meet increasing demand under the growing stresses of climate variability and climate change, population and economic growth, and environmental concerns could be very challenging. This study was to investigate improvement in water resources systems management through the use of seasonal climate forecasts. Hydrological persistence (streamflow and precipitation) and large-scale recurrent oceanic-atmospheric patterns such as the El Niño/Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), North Atlantic Oscillation (NAO), the Atlantic Multidecadal Oscillation (AMO), the Pacific North American (PNA), and customized sea surface temperature (SST) indices were investigated for their potential to improve streamflow forecast accuracy and increase forecast lead-time in a river basin in central Texas. First, an ordinal polytomous logistic regression approach is proposed as a means of incorporating multiple predictor variables into a probabilistic forecast model. Forecast performance is assessed through a cross-validation procedure, using distributions-oriented metrics, and implications for decision making are discussed. Results indicate that, of the predictors evaluated, only hydrologic persistence and Pacific Ocean sea surface temperature patterns associated with ENSO and PDO provide forecasts which are statistically better than climatology. Secondly, a class of data mining techniques, known as tree-structured models, is investigated to address the nonlinear dynamics of climate teleconnections and screen promising probabilistic streamflow forecast models for river-reservoir systems. Results show that the tree-structured models can effectively capture the nonlinear features hidden in the data. Skill scores of probabilistic forecasts generated by both classification trees and logistic regression trees indicate that seasonal inflows throughout the system can be predicted with sufficient accuracy to improve water management, especially in the winter and spring seasons in central Texas. Lastly, a simplified two-stage stochastic economic-optimization model was proposed to investigate improvement in water use efficiency and the potential value of using seasonal forecasts, under the assumption of optimal decision making under uncertainty. Model results demonstrate that incorporating the probabilistic inflow forecasts into the optimization model can provide a significant improvement in seasonal water contract benefits over climatology, with lower average deficits (increased reliability) for a given average contract amount, or improved mean contract benefits for a given level of reliability compared to climatology. The results also illustrate the trade-off between the expected contract amount and reliability, i.e., larger contracts can be signed at greater risk.
Resumo:
This paper proposed an automated 3D lumbar intervertebral disc (IVD) segmentation strategy from MRI data. Starting from two user supplied landmarks, the geometrical parameters of all lumbar vertebral bodies and intervertebral discs are automatically extracted from a mid-sagittal slice using a graphical model based approach. After that, a three-dimensional (3D) variable-radius soft tube model of the lumbar spine column is built to guide the 3D disc segmentation. The disc segmentation is achieved as a multi-kernel diffeomorphic registration between a 3D template of the disc and the observed MRI data. Experiments on 15 patient data sets showed the robustness and the accuracy of the proposed algorithm.
Resumo:
This paper proposed an automated three-dimensional (3D) lumbar intervertebral disc (IVD) segmentation strategy from Magnetic Resonance Imaging (MRI) data. Starting from two user supplied landmarks, the geometrical parameters of all lumbar vertebral bodies and intervertebral discs are automatically extracted from a mid-sagittal slice using a graphical model based template matching approach. Based on the estimated two-dimensional (2D) geometrical parameters, a 3D variable-radius soft tube model of the lumbar spine column is built by model fitting to the 3D data volume. Taking the geometrical information from the 3D lumbar spine column as constraints and segmentation initialization, the disc segmentation is achieved by a multi-kernel diffeomorphic registration between a 3D template of the disc and the observed MRI data. Experiments on 15 patient data sets showed the robustness and the accuracy of the proposed algorithm.
Resumo:
Learning the structure of a graphical model from data is a common task in a wide range of practical applications. In this paper, we focus on Gaussian Bayesian networks, i.e., on continuous data and directed acyclic graphs with a joint probability density of all variables given by a Gaussian. We propose to work in an equivalence class search space, specifically using the k-greedy equivalence search algorithm. This, combined with regularization techniques to guide the structure search, can learn sparse networks close to the one that generated the data. We provide results on some synthetic networks and on modeling the gene network of the two biological pathways regulating the biosynthesis of isoprenoids for the Arabidopsis thaliana plant
Resumo:
Belief propagation (BP) is a technique for distributed inference in wireless networks and is often used even when the underlying graphical model contains cycles. In this paper, we propose a uniformly reweighted BP scheme that reduces the impact of cycles by weighting messages by a constant ?edge appearance probability? rho ? 1. We apply this algorithm to distributed binary hypothesis testing problems (e.g., distributed detection) in wireless networks with Markov random field models. We demonstrate that in the considered setting the proposed method outperforms standard BP, while maintaining similar complexity. We then show that the optimal ? can be approximated as a simple function of the average node degree, and can hence be computed in a distributed fashion through a consensus algorithm.
Resumo:
Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models recently proposed to deal with multi-dimensional classification problems, where each instance in the data set has to be assigned to more than one class variable. In this paper, we propose a Markov blanket-based approach for learning MBCs from data. Basically, it consists of determining the Markov blanket around each class variable using the HITON algorithm, then specifying the directionality over the MBC subgraphs. Our approach is applied to the prediction problem of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39) in order to estimate the health-related quality of life of Parkinson’s patients. Fivefold cross-validation experiments were carried out on randomly generated synthetic data sets, Yeast data set, as well as on a real-world Parkinson’s disease data set containing 488 patients. The experimental study, including comparison with additional Bayesian network-based approaches, back propagation for multi-label learning, multi-label k-nearest neighbor, multinomial logistic regression, ordinary least squares, and censored least absolute deviations, shows encouraging results in terms of predictive accuracy as well as the identification of dependence relationships among class and feature variables.
Resumo:
El aprendizaje automático y la cienciometría son las disciplinas científicas que se tratan en esta tesis. El aprendizaje automático trata sobre la construcción y el estudio de algoritmos que puedan aprender a partir de datos, mientras que la cienciometría se ocupa principalmente del análisis de la ciencia desde una perspectiva cuantitativa. Hoy en día, los avances en el aprendizaje automático proporcionan las herramientas matemáticas y estadísticas para trabajar correctamente con la gran cantidad de datos cienciométricos almacenados en bases de datos bibliográficas. En este contexto, el uso de nuevos métodos de aprendizaje automático en aplicaciones de cienciometría es el foco de atención de esta tesis doctoral. Esta tesis propone nuevas contribuciones en el aprendizaje automático que podrían arrojar luz sobre el área de la cienciometría. Estas contribuciones están divididas en tres partes: Varios modelos supervisados (in)sensibles al coste son aprendidos para predecir el éxito científico de los artículos y los investigadores. Los modelos sensibles al coste no están interesados en maximizar la precisión de clasificación, sino en la minimización del coste total esperado derivado de los errores ocasionados. En este contexto, los editores de revistas científicas podrían disponer de una herramienta capaz de predecir el número de citas de un artículo en el fututo antes de ser publicado, mientras que los comités de promoción podrían predecir el incremento anual del índice h de los investigadores en los primeros años. Estos modelos predictivos podrían allanar el camino hacia nuevos sistemas de evaluación. Varios modelos gráficos probabilísticos son aprendidos para explotar y descubrir nuevas relaciones entre el gran número de índices bibliométricos existentes. En este contexto, la comunidad científica podría medir cómo algunos índices influyen en otros en términos probabilísticos y realizar propagación de la evidencia e inferencia abductiva para responder a preguntas bibliométricas. Además, la comunidad científica podría descubrir qué índices bibliométricos tienen mayor poder predictivo. Este es un problema de regresión multi-respuesta en el que el papel de cada variable, predictiva o respuesta, es desconocido de antemano. Los índices resultantes podrían ser muy útiles para la predicción, es decir, cuando se conocen sus valores, el conocimiento de cualquier valor no proporciona información sobre la predicción de otros índices bibliométricos. Un estudio bibliométrico sobre la investigación española en informática ha sido realizado bajo la cultura de publicar o morir. Este estudio se basa en una metodología de análisis de clusters que caracteriza la actividad en la investigación en términos de productividad, visibilidad, calidad, prestigio y colaboración internacional. Este estudio también analiza los efectos de la colaboración en la productividad y la visibilidad bajo diferentes circunstancias. ABSTRACT Machine learning and scientometrics are the scientific disciplines which are covered in this dissertation. Machine learning deals with the construction and study of algorithms that can learn from data, whereas scientometrics is mainly concerned with the analysis of science from a quantitative perspective. Nowadays, advances in machine learning provide the mathematical and statistical tools for properly working with the vast amount of scientometrics data stored in bibliographic databases. In this context, the use of novel machine learning methods in scientometrics applications is the focus of attention of this dissertation. This dissertation proposes new machine learning contributions which would shed light on the scientometrics area. These contributions are divided in three parts: Several supervised cost-(in)sensitive models are learned to predict the scientific success of articles and researchers. Cost-sensitive models are not interested in maximizing classification accuracy, but in minimizing the expected total cost of the error derived from mistakes in the classification process. In this context, publishers of scientific journals could have a tool capable of predicting the citation count of an article in the future before it is published, whereas promotion committees could predict the annual increase of the h-index of researchers within the first few years. These predictive models would pave the way for new assessment systems. Several probabilistic graphical models are learned to exploit and discover new relationships among the vast number of existing bibliometric indices. In this context, scientific community could measure how some indices influence others in probabilistic terms and perform evidence propagation and abduction inference for answering bibliometric questions. Also, scientific community could uncover which bibliometric indices have a higher predictive power. This is a multi-output regression problem where the role of each variable, predictive or response, is unknown beforehand. The resulting indices could be very useful for prediction purposes, that is, when their index values are known, knowledge of any index value provides no information on the prediction of other bibliometric indices. A scientometric study of the Spanish computer science research is performed under the publish-or-perish culture. This study is based on a cluster analysis methodology which characterizes the research activity in terms of productivity, visibility, quality, prestige and international collaboration. This study also analyzes the effects of collaboration on productivity and visibility under different circumstances.
Resumo:
The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the subclusters individually to the expression subclusters. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners.
Resumo:
A theoretical model was developed to investigate the relationships among subordinate-manager gender combinations, perceived leadership style, experienced frustration and optimism, organization-based self-esteem and organizational commitment. The model was tested within the context of a probabilistic structural model, a discrete Bayesian network, using cross-sectional data from a global pharmaceutical company. The Bayesian network allowed forward inference to assess the relative influence of gender combination and leadership style on the emotions, self-esteem and commitment consequence variables. Further, diagnostics from backward inference were used to assess the relative influence of variables antecedent to organizational commitment. The results showed that gender combination was independent of leadership style and had a direct impact on subordinates' levels of frustration and optimism. Female manager-female subordinate had the largest probability of optimism, while male manager teamed with a male subordinate had the largest probability of frustration. Furthermore, having a female manager teamed up with a male subordinate resulted in the lowest possibility of frustration. However, the findings show that the gender issue is not simply female managers versus male managers, but is concerned with the interaction of the subordinate-manager gender combination and leadership style in a nonlinear manner. (C) 2003 Elsevier Inc. All rights reserved.
Resumo:
Background: The Lescol Intervention Prevention Study (LIPS) was a multinational randomized controlled trial that showed a 47% reduction in the relative risk of cardiac death and a 22% reduction in major adverse cardiac events (MACEs) from the routine use of fluvastatin, compared with controls, in patients undergoing percutaneous coronary intervention (PCI, defined as angioplasty with or without stents). In this study, MACEs included cardiac death, nonfatal myocardial infarction, and subsequent PCI and coronary artery bypass graft. Diabetes was the greatest risk factor for MACEs. Objective: This study estimated the cost-effectiveness of fluvastatin when used for secondary prevention of MACEs after PCI in people with diabetes. Methods: A post hoc subgroup analysis of patients with diabetes from the LIPS was used to estimate the effectiveness of fluvastatin in reducing myocardial infarction, revascularization, and cardiac death. A probabilistic Markov model was developed using United Kingdom resource and cost data to estimate the additional costs and quality-adjusted life-years (QALYs) gained over 10 years from the perspective of the British National Health Service. The model contained 6 health states, and the transition probabilities were derived from the LIPS data. Crossover from fluvastatin to other lipid-lowering drugs, withdrawal from fluvastatin, and the use of lipid-lowering drugs in the control group were included. Results: In the subgroup of 202 patients with diabetes in the LIPS trial, 18 (15.0%) of 120 fluvastatin patients and 21 (25.6%) of 82 control participants were insulin dependent (P = NS). Compared with the control group, patients treated with fluvastatin can expect to gain an additional mean (SD) of 0.196 (0.139) QALY per patient over 10 years (P < 0.001) and will cost the health service an additional mean (SD) of 10 (E448) (P = NS) (mean [SD] US $16 [$689]). The additional cost per QALY gained was;(51 (US $78). The key determinants of cost-effectiveness included the probabilities of repeat interventions, cardiac death, the cost of fluvastatin, and the time horizon used for the evaluation. Conclusion: Fluvastatin was an economically efficient treatment to prevent MACEs in these patients with diabetes undergoing PCI.
Resumo:
Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.
Resumo:
New Approach’ Directives now govern the health and safety of most products whether destined for workplace or domestic use. These Directives have been enacted into UK law by various specific legislation principally relating to work equipment, machinery and consumer products. This research investigates whether the risk assessment approach used to ensure the safety of machinery may be applied to consumer products. Crucially, consumer products are subject to the Consumer Protection Act (CPA) 1987, where there is no direct reference to “assessing risk”. This contrasts with the law governing the safety of products used in the workplace, where risk assessment underpins the approach. New Approach Directives are supported by European harmonised standards, and in the case of machinery, further supported by the risk assessment standard, EN 1050. The system regulating consumer product safety is discussed, its key elements identified and a graphical model produced. This model incorporates such matters as conformity assessment, the system of regulation, near miss and accident reporting. A key finding of the research is that New Approach Directives have a common feature of specifying essential performance requirements that provide a hazard prompt-list that can form the basis for a risk assessment (the hazard identification stage). Drawing upon 272 prosecution cases, and with thirty examples examined in detail, this research provides evidence that despite the high degree of regulation, unsafe consumer products still find their way onto the market. The research presents a number of risk assessment tools to help Trading Standards Officers (TSOs) prioritise their work at the initial inspection stage when dealing with subsequent enforcement action.
Resumo:
This thesis proposes a novel graphical model for inference called the Affinity Network,which displays the closeness between pairs of variables and is an alternative to Bayesian Networks and Dependency Networks. The Affinity Network shares some similarities with Bayesian Networks and Dependency Networks but avoids their heuristic and stochastic graph construction algorithms by using a message passing scheme. A comparison with the above two instances of graphical models is given for sparse discrete and continuous medical data and data taken from the UCI machine learning repository. The experimental study reveals that the Affinity Network graphs tend to be more accurate on the basis of an exhaustive search with the small datasets. Moreover, the graph construction algorithm is faster than the other two methods with huge datasets. The Affinity Network is also applied to data produced by a synchronised system. A detailed analysis and numerical investigation into this dynamical system is provided and it is shown that the Affinity Network can be used to characterise its emergent behaviour even in the presence of noise.