16 results for Estimateur de Bayes
Abstract:
Aims/hypothesis: Diabetic nephropathy is a major diabetic complication, and diabetes is the leading cause of end-stage renal disease (ESRD). Family studies suggest a hereditary component for diabetic nephropathy. However, only a few genes have been associated with diabetic nephropathy or ESRD in diabetic patients. Our aim was to detect novel genetic variants associated with diabetic nephropathy and ESRD. Methods: We exploited a novel algorithm, ‘Bag of Naive Bayes’, whose marker selection strategy is complementary to that of conventional genome-wide association models based on univariate association tests. The analysis was performed on a genome-wide association study of 3,464 patients with type 1 diabetes from the Finnish Diabetic Nephropathy (FinnDiane) Study and subsequently replicated with 4,263 type 1 diabetes patients from the Steno Diabetes Centre, the All Ireland-Warren 3-Genetics of Kidneys in Diabetes UK collection (UK–Republic of Ireland) and the Genetics of Kidneys in Diabetes US Study (GoKinD US). Results: Five genetic loci (WNT4/ZBTB40-rs12137135, RGMA/MCTP2-rs17709344, MAPRE1P2-rs1670754, SEMA6D/SLC24A5-rs12917114 and SIK1-rs2838302) were associated with ESRD in the FinnDiane study. An association between ESRD and rs17709344, tagging the previously identified rs12437854 and located between the RGMA and MCTP2 genes, was replicated in independent case–control cohorts. rs12917114 near SEMA6D was associated with ESRD in the replication cohorts under the genotypic model (p < 0.05), and rs12137135 upstream of WNT4 was associated with ESRD in Steno. Conclusions/interpretation: This study supports the previously identified findings on the RGMA/MCTP2 region and suggests novel susceptibility loci for ESRD. This highlights the importance of applying complementary statistical methods to detect novel genetic variants in diabetic nephropathy and, in general, in complex diseases.
Abstract:
The number of elderly patients requiring hospitalisation in Europe is rising. With a greater proportion of elderly people in the population comes a greater demand for health services and, in particular, hospital care. A growing number of elderly patients requiring hospitalisation therefore compete with non-elderly patients for a fixed (and in some cases, decreasing) number of hospital beds, which results in much longer waiting times for patients, often with a less satisfactory hospital experience. However, if a better understanding of the recurring nature of elderly patient movements between the community and hospital can be developed, then it may be possible for alternative provisions of care in the community to be put in place and thus prevent readmission to hospital. The research in this paper aims to model the multiple patient transitions between hospital and community by utilising a mixture of conditional Coxian phase-type distributions that incorporates Bayes' theorem. For the purpose of demonstration, the results of a simulation study are presented and the model is applied to hospital readmission data from the Lombardy region of Italy.
Abstract:
PURPOSE
To investigate changes in gene expression during aging of the retina in the mouse.
METHODS
Total RNA was extracted from the neuroretina of young (3-month-old) and old (20-month-old) mice and processed for microarray analysis. Age-related, differentially expressed genes were assessed by the empirical Bayes shrinkage-moderated t-statistics method. Statistical significance was based on dual criteria of a ratio of change in gene expression >2 and P < 0.01. Differential expression in 11 selected genes was further verified by real-time PCR. Functional pathways involved in retinal ageing were analyzed by an online software package (DAVID-2008) in differentially expressed gene lists. Age-related changes in differential expression in the identified retinal molecular pathways were further confirmed by immunohistochemical staining of retinal flat mounts and retinal cryosections.
RESULTS
With ageing of the retina, 298 genes were upregulated and 137 genes were downregulated. Functional annotation showed that genes linked to immune responses (Ir genes) and to tissue stress/injury responses (TS/I genes) were most likely to be modified by ageing. The Ir genes affected included those regulating leukocyte activation, chemotaxis, endocytosis, complement activation, phagocytosis, and myeloid cell differentiation, most of which were upregulated, with only a few downregulated. Increased microglial and complement activation in the aging retina was further confirmed by confocal microscopy of retinal tissues. The most strongly upregulated gene was the calcitonin receptor (Calcr; >40-fold in old versus young mice).
CONCLUSIONS
The results suggest that retinal ageing is accompanied by activation of gene sets, which are involved in local inflammatory responses. A modified form of low-grade chronic inflammation (para-inflammation) characterizes these aging changes and involves mainly the innate immune system. The marked upregulation of Calcr in ageing mice most likely reflects this chronic inflammatory/stress response, since calcitonin is a known systemic biomarker of inflammation/sepsis. © Association for Research in Vision and Ophthalmology.
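As an illustration of the empirical Bayes moderated t-statistic named in the Methods, the block below gives the commonly used formulation. The abstract does not state which variant was applied, so the symbols are assumptions: $s_g^2$ is the sample variance of gene $g$ with $d_g$ residual degrees of freedom, $s_0^2$ and $d_0$ are the prior variance and prior degrees of freedom estimated from all genes, $\hat{\beta}_g$ is the estimated log fold change, and $v_g$ is the unscaled variance factor of that estimate.

```latex
\tilde{s}_g^{2} = \frac{d_0\, s_0^{2} + d_g\, s_g^{2}}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}
```

In this formulation each gene's variance is shrunk towards the pooled prior value, and under the null hypothesis $\tilde{t}_g$ follows a t-distribution with $d_0 + d_g$ degrees of freedom.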
Abstract:
Flutter prediction as currently practiced is usually deterministic, with a single structural model used to represent an aircraft. By using interval analysis to take into account structural variability, recent work has demonstrated that small changes in the structure can lead to very large changes in the altitude at which flutter occurs (Marques, Badcock, et al., J. Aircraft, 2010). In this follow-up work we examine the same phenomenon using probabilistic collocation (PC), an uncertainty quantification technique which can efficiently propagate multivariate stochastic input through a simulation code, in this case an eigenvalue-based fluid-structure stability code. The resulting analysis predicts the consequences of an uncertain structure on the incidence of flutter in probabilistic terms, information that could be useful in planning flight-tests and assessing the risk of structural failure. The uncertainty in flutter altitude is confirmed to be substantial. Assuming that the structural uncertainty represents an epistemic uncertainty regarding the structure, it may be reduced with the availability of additional information, for example aeroelastic response data from a flight-test. Such data is used to update the structural uncertainty using Bayes' theorem. The consequent flutter uncertainty is significantly reduced across the entire Mach number range.
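The Bayesian updating step described in this abstract can be sketched in a few lines. This is not the paper's aeroelastic model: the scalar parameter `theta`, the linear response model `predict`, the noise level and the observations are all hypothetical, chosen only to show how measurements narrow a prior through Bayes' theorem.

```python
import numpy as np

def grid_posterior(theta, prior, observations, predict, noise_std):
    """Bayes' theorem on a grid: posterior is proportional to prior x likelihood."""
    log_post = np.log(prior)
    for y in observations:
        # Gaussian log-likelihood of one measurement, evaluated at every grid value
        log_post += -0.5 * ((y - predict(theta)) / noise_std) ** 2
    log_post -= log_post.max()            # guard against underflow
    post = np.exp(log_post)
    return post / post.sum()              # normalise so the grid weights sum to 1

def grid_std(theta, weights):
    """Standard deviation of theta under the given grid weights."""
    mean = np.sum(theta * weights)
    return np.sqrt(np.sum(weights * (theta - mean) ** 2))

# Hypothetical normalised structural parameter with a Gaussian prior (std 0.1)
theta = np.linspace(0.5, 1.5, 2001)
prior = np.exp(-0.5 * ((theta - 1.0) / 0.1) ** 2)
prior /= prior.sum()

def predict(t):
    return 10.0 * t                       # hypothetical response model

observations = [10.4, 10.6, 10.5]         # hypothetical flight-test data
posterior = grid_posterior(theta, prior, observations, predict, noise_std=0.3)

prior_std = grid_std(theta, prior)
post_std = grid_std(theta, posterior)
```

In this toy setting `post_std` is much smaller than `prior_std`, mirroring the reduction in uncertainty that the abstract reports once test data become available.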
Abstract:
Discrete Conditional Phase-type (DC-Ph) models are a family of models which represent skewed survival data conditioned on specific inter-related discrete variables. The survival data is modeled using a Coxian phase-type distribution which is associated with the inter-related variables using a range of possible data mining approaches such as Bayesian networks (BNs), the Naïve Bayes Classification method and classification and regression trees. This paper utilizes the Discrete Conditional Phase-type model (DC-Ph) to explore the modeling of patient waiting times in an Accident and Emergency Department of a UK hospital. The resulting DC-Ph model takes on the form of the Coxian phase-type distribution conditioned on the outcome of a logistic regression model.
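A minimal sketch of the two ingredients named above, under stated assumptions: a Coxian phase-type distribution is built from hypothetical transition rates, and its mean is mixed according to a hypothetical one-covariate logistic regression. The function names, rates and coefficients are illustrative and are not taken from the paper.

```python
import numpy as np

def coxian_generator(lam, mu):
    """Sub-generator matrix T of a Coxian phase-type distribution.

    lam[i]: rate of moving from phase i to phase i+1 (length n-1)
    mu[i]:  rate of absorption from phase i (length n)
    """
    n = len(mu)
    T = np.zeros((n, n))
    for i in range(n):
        out = mu[i] + (lam[i] if i < n - 1 else 0.0)
        T[i, i] = -out                    # total rate of leaving phase i
        if i < n - 1:
            T[i, i + 1] = lam[i]          # sequential move to the next phase
    return T

def coxian_mean(lam, mu):
    """Mean time to absorption, alpha (-T)^{-1} 1, starting in the first phase."""
    T = coxian_generator(lam, mu)
    n = T.shape[0]
    alpha = np.zeros(n)
    alpha[0] = 1.0
    return float(alpha @ np.linalg.solve(-T, np.ones(n)))

def expected_wait(x, intercept, slope, fast, slow):
    """Mean wait when a logistic model picks between two Coxian components."""
    p_slow = 1.0 / (1.0 + np.exp(-(intercept + slope * x)))
    return (1.0 - p_slow) * coxian_mean(*fast) + p_slow * coxian_mean(*slow)

fast = ([], [2.0])                        # one phase, hypothetical rates
slow = ([1.0], [1.0, 0.5])                # two phases, hypothetical rates
wait = expected_wait(0.0, intercept=0.0, slope=1.0, fast=fast, slow=slow)
```

Here each logistic outcome selects its own set of Coxian parameters, which is one simple reading of "conditioned on the outcome of a logistic regression model".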
Abstract:
In a Bayesian learning setting, the posterior distribution of a predictive model arises from a trade-off between its prior distribution and the conditional likelihood of observed data. Such distribution functions usually rely on additional hyperparameters which need to be tuned in order to achieve optimum predictive performance; this operation can be efficiently performed in an Empirical Bayes fashion by maximizing the posterior marginal likelihood of the observed data. Since the score function of this optimization problem is in general characterized by the presence of local optima, it is necessary to resort to global optimization strategies, which require a large number of function evaluations. Given that each evaluation is usually computationally intensive and scales poorly with the dataset size, the maximum number of observations that can be treated simultaneously is quite limited. In this paper, we consider the case of hyperparameter tuning in Gaussian process regression. A straightforward implementation of the posterior log-likelihood for this model requires O(N^3) operations for every iteration of the optimization procedure, where N is the number of examples in the input dataset. We derive a novel set of identities that allow, after an initial overhead of O(N^3), the evaluation of the score function, as well as the Jacobian and Hessian matrices, in O(N) operations. We show that the proposed identities, which follow from the eigendecomposition of the kernel matrix, yield a reduction of several orders of magnitude in the computation time for the hyperparameter optimization problem. Notably, the proposed solution provides computational advantages even with respect to state-of-the-art approximations that rely on sparse kernel matrices.
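The idea of paying O(N^3) once and then evaluating the score in O(N) can be sketched as follows. The abstract does not give the paper's identities, so this sketch assumes that the tuned hyperparameters are a kernel scale factor and a noise variance, with the kernel shape held fixed; all function names are hypothetical. Under that assumption the covariance `scale * K + noise_var * I` shares the eigenvectors of `K`, and the log marginal likelihood reduces to sums over eigenvalues.

```python
import numpy as np

def rbf_kernel(X, length_scale=1.0):
    """Squared-exponential kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale ** 2)

def precompute(K, y):
    """One-off O(N^3) step: eigenvalues of K and the targets rotated by U^T."""
    s, U = np.linalg.eigh(K)
    return np.clip(s, 0.0, None), U.T @ y

def log_marginal_fast(scale, noise_var, s, y_rot):
    """O(N) log marginal likelihood for covariance scale*K + noise_var*I."""
    d = scale * s + noise_var             # eigenvalues of the full covariance
    n = len(s)
    return (-0.5 * np.sum(y_rot ** 2 / d)
            - 0.5 * np.sum(np.log(d))
            - 0.5 * n * np.log(2.0 * np.pi))

def log_marginal_direct(scale, noise_var, K, y):
    """Reference O(N^3) evaluation through a Cholesky factorisation."""
    n = len(y)
    L = np.linalg.cholesky(scale * K + noise_var * np.eye(n))
    a = np.linalg.solve(L, y)
    return (-0.5 * a @ a
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = rng.normal(size=40)
K = rbf_kernel(X)
s, y_rot = precompute(K, y)
fast_value = log_marginal_fast(1.0, 0.1, s, y_rot)
direct_value = log_marginal_direct(1.0, 0.1, K, y)
```

After `precompute`, a global optimiser can call `log_marginal_fast` as often as it needs without touching the N-by-N matrix again; `fast_value` and `direct_value` should agree up to rounding.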
Abstract:
Classification methods with embedded feature selection capability are very appealing for the analysis of complex processes since they allow the analysis of root causes even when the number of input variables is high. In this work, we investigate the performance of three techniques for classification within a Monte Carlo strategy with the aim of root cause analysis. We consider the naive Bayes classifier and the logistic regression model with two different implementations for controlling model complexity, namely, a LASSO-like implementation with an L1 norm regularization and a fully Bayesian implementation of the logistic model, the so-called relevance vector machine. Several challenges can arise when estimating such models, mainly linked to the characteristics of the data: a large number of input variables, high correlation among subsets of variables, the situation where the number of variables is higher than the number of available data points, and the case of unbalanced datasets. Using an ecological and a semiconductor manufacturing dataset, we show advantages and drawbacks of each method, highlighting the superior performance in terms of classification accuracy for the relevance vector machine with respect to the other classifiers. Moreover, we show how the combination of the proposed techniques and the Monte Carlo approach can be used to get more robust insights into the problem under analysis when faced with challenging modelling conditions.
Abstract:
This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds’ algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). A range of experiments show that we obtain models with better accuracy than TAN and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator.
Abstract:
Credal networks are graph-based statistical models whose parameters take values in a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The computational complexity of inferences on such models depends on the irrelevance/independence concept adopted. In this paper, we study inferential complexity under the concepts of epistemic irrelevance and strong independence. We show that inferences under strong independence are NP-hard even in trees with binary variables except for a single ternary one. We prove that under epistemic irrelevance the polynomial-time complexity of inferences in credal trees is not likely to extend to more general models (e.g., singly connected topologies). These results clearly distinguish networks that admit efficient inferences and those where inferences are most likely hard, and settle several open questions regarding their computational complexity. We show that these results remain valid even if we disallow the use of zero probabilities. We also show that the computation of bounds on the probability of the future state in a hidden Markov model is the same whether we assume epistemic irrelevance or strong independence, and we prove an analogous result for inference in Naive Bayes structures. These inferential equivalences are important for practitioners, as hidden Markov models and Naive Bayes networks are used in real applications of imprecise probability.
Abstract:
This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. It is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure), which extends previous complexity results. Furthermore, a Fully Polynomial Time Approximation Scheme for MAP in networks with bounded treewidth and bounded number of states per variable is developed. Approximation schemes were thought to be impossible, but here it is shown otherwise under the assumptions just mentioned, which are adopted in most applications.
Abstract:
This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. First, it is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure). Such proofs extend previous complexity results for the problem. Inapproximability results are also derived in the case of trees if the number of states per variable is not bounded. Although the problem is shown to be hard and inapproximable even in very simple scenarios, a new exact algorithm is described that is empirically fast in networks of bounded treewidth and bounded number of states per variable. The same algorithm is used as the basis of a Fully Polynomial Time Approximation Scheme for MAP under such assumptions. Approximation schemes were generally thought to be impossible for this problem, but we show otherwise for classes of networks that are important in practice. The algorithms are extensively tested using some well-known networks as well as randomly generated cases to show their effectiveness.
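To make the partial MAP query concrete, the sketch below solves it by brute force on a small Naive Bayes structure: it maximises over a chosen subset of feature variables while summing out the class and conditioning on evidence. This is only an illustration of the problem definition, not the exact algorithm or the approximation scheme from the paper; the function name and the probability tables are hypothetical.

```python
import itertools
import numpy as np

def partial_map_naive_bayes(prior, cpts, query, evidence):
    """Brute-force partial MAP in a Naive Bayes network.

    prior:    P(C), shape (n_classes,)
    cpts:     cpts[i][c, x] = P(X_i = x | C = c)
    query:    indices of the features to maximise over
    evidence: dict {feature index: observed state}
    Returns the best configuration and its joint probability with the evidence.
    """
    # Per-class weight after absorbing the evidence
    w = np.asarray(prior, dtype=float).copy()
    for i, x in evidence.items():
        w = w * cpts[i][:, x]

    best_config, best_prob = None, -1.0
    ranges = [range(cpts[i].shape[1]) for i in query]
    for config in itertools.product(*ranges):
        per_class = w.copy()
        for i, x in zip(query, config):
            per_class = per_class * cpts[i][:, x]
        prob = per_class.sum()            # sum out the class variable
        if prob > best_prob:
            best_config, best_prob = config, prob
    return best_config, best_prob

prior = np.array([0.5, 0.5])
cpts = [
    np.array([[0.9, 0.1], [0.2, 0.8]]),   # P(X0 | C)
    np.array([[0.7, 0.3], [0.4, 0.6]]),   # P(X1 | C)
    np.array([[0.6, 0.4], [0.1, 0.9]]),   # P(X2 | C)
]
config, prob = partial_map_naive_bayes(prior, cpts, query=[0, 1], evidence={2: 1})
```

The enumeration grows exponentially with the number of query variables, which is consistent with the hardness results the abstract describes even for this simple topology.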
Abstract:
This work presents a new general purpose classifier named Averaged Extended Tree Augmented Naive Bayes (AETAN), which is based on combining the advantageous characteristics of Extended Tree Augmented Naive Bayes (ETAN) and Averaged One-Dependence Estimator (AODE) classifiers. We describe the main properties of the approach and algorithms for learning it, along with an analysis of its computational time complexity. Empirical results with numerous data sets indicate that the new approach is superior to ETAN and AODE in terms of both zero-one classification accuracy and log loss. It also compares favourably against weighted AODE and hidden Naive Bayes. The learning phase of the new approach is slower than that of its competitors, while the time complexity for the testing phase is similar. Such characteristics suggest that the new classifier is ideal in scenarios where online learning is not required.