921 results for Bayesian classifier
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms regularly being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are performed simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Boulton, 1968). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) on synthetic data. The results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
Abstract:
Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. In order to estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, when compared to the criteria referred to above, is its speed of execution, which is especially relevant when dealing with large data sets.
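The two abstracts above describe the same integrated scheme: an EM algorithm for a finite mixture of multinomials whose M-step carries an MML-style penalty, so that weak components are annihilated and the number of clusters is selected during estimation rather than by comparing pre-estimated models. A minimal sketch of that idea (this is not the authors' implementation; the function name, the penalty form, and all parameters are illustrative):

```python
import numpy as np

def em_multinomial_mixture(X, k_max=6, n_iter=200, seed=0):
    """EM for a mixture of multinomials with component annihilation:
    components whose penalized weight drops to zero are pruned, so
    estimation and selection of the number of clusters happen in one run.

    X : (n, d) array of category counts per observation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    half_params = (d - 1) / 2.0              # free parameters per component / 2
    theta = rng.dirichlet(np.ones(d), size=k_max)  # (k, d) category probabilities
    w = np.full(k_max, 1.0 / k_max)                # mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities (the multinomial coefficient cancels out)
        logp = X @ np.log(theta.T + 1e-12) + np.log(w + 1e-12)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step with an MML-style penalty: effective counts are reduced by
        # half the per-component parameter count, driving weak components to zero
        nk = r.sum(axis=0)
        w = np.maximum(nk - half_params, 0.0)
        keep = w > 0
        w, r = w[keep] / w[keep].sum(), r[:, keep]   # annihilate dead components
        theta = r.T @ X + 1e-9
        theta /= theta.sum(axis=1, keepdims=True)
    return w, theta
```

Starting from `k_max` components, the surviving weights at the end determine the estimated number of clusters in a single EM run.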
Abstract:
We study the effects of product differentiation in a Stackelberg model with demand uncertainty for the first mover. We carry out an ex-ante and an ex-post analysis of the profits of the leader and follower firms in terms of product differentiation and of demand uncertainty. We show that, even with small uncertainty about the demand, the follower firm can achieve greater profits than the leader if their products are sufficiently differentiated. We also compute the probability of the second firm having a higher profit than the leading firm, thereby showing the advantages and disadvantages of being either the leader or the follower.
Abstract:
Discrete data representations are necessary, or at least convenient, in many machine learning problems. While feature selection (FS) techniques aim at finding relevant subsets of features, the goal of feature discretization (FD) is to find concise (quantized) data representations, adequate for the learning task at hand. In this paper, we propose two incremental methods for FD. The first method belongs to the filter family, in which the quality of the discretization is assessed by a (supervised or unsupervised) relevance criterion. The second method is a wrapper, where discretized features are assessed using a classifier. Both methods can be coupled with any static (unsupervised or supervised) discretization procedure and can be used to perform FS as pre-processing or post-processing stages. The proposed methods attain efficient representations suitable for binary and multi-class problems with different types of data, being competitive with existing methods. Moreover, using well-known FS methods with the features discretized by our techniques leads to better accuracy than with the features discretized by other methods or with the original features.
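As a rough illustration of the filter variant described above: start from a coarse quantization of a feature, refine it incrementally, and keep refining only while a supervised relevance criterion (here, empirical mutual information with the class labels) still improves. This is a generic sketch under our own simplifications, not the paper's actual procedure; all function names and thresholds are ours:

```python
import numpy as np

def equal_frequency_discretize(x, n_bins):
    """Quantize a continuous feature into n_bins equal-frequency bins."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(x, edges)

def mutual_info(a, b):
    """Empirical mutual information (in nats) between two discrete arrays."""
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for va in np.unique(a):
        pa = np.mean(a == va)
        for vb in np.unique(b):
            pab = np.mean((a == va) & (b == vb))
            if pab > 0:
                mi += pab * np.log(pab / (pa * np.mean(b == vb)))
    return mi

def incremental_fd(x, y, max_bins=8, min_gain=1e-3):
    """Incrementally refine the discretization of x while the
    relevance criterion (MI with labels y) keeps improving."""
    best_bins, best_mi = 1, 0.0
    for k in range(2, max_bins + 1):
        mi = mutual_info(equal_frequency_discretize(x, k), y)
        if mi - best_mi <= min_gain:
            break                      # no relevance gain: stop refining
        best_bins, best_mi = k, mi
    return best_bins
```

A wrapper-style variant would replace `mutual_info` with the cross-validated accuracy of a classifier trained on the discretized feature.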
Abstract:
OBJECTIVE To describe the spatial patterns of leprosy in the Brazilian state of Tocantins. METHODS This study was based on morbidity data obtained from the Sistema de Informações de Agravos de Notificação (SINAN – Brazilian Notifiable Diseases Information System) of the Ministry of Health. All new leprosy cases in individuals residing in the state of Tocantins between 2001 and 2012 were included. In addition to the description of general disease indicators, a descriptive spatial analysis, an empirical Bayesian analysis, and a spatial dependence analysis were performed by means of global and local Moran's indexes. RESULTS A total of 14,542 new cases were recorded during the period under study. Based on the annual case detection rate, 77.0% of the municipalities were classified as hyperendemic (> 40 cases/100,000 inhabitants). Regarding the annual case detection rate in individuals aged < 15 years, 65.4% of the municipalities were hyperendemic (10.0 to 19.9 cases/100,000 inhabitants); 26.6% had a detection rate of grade 2 disability cases between 5.0 and 9.9 cases/100,000 inhabitants. There was a geographical overlap of clusters of municipalities with high detection rates in hyperendemic areas. Clusters with high disease risk (global Moran's index: 0.51; p < 0.001), ongoing transmission (0.47; p < 0.001), and late diagnosis (0.44; p < 0.001) were identified mainly in the central-north and southwestern regions of Tocantins. CONCLUSIONS We identified high-risk clusters for transmission and late diagnosis of leprosy in the Brazilian state of Tocantins. Surveillance and control measures should be prioritized in these high-risk municipalities.
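The spatial dependence analysis reported above rests on the global Moran's index, which measures whether high (or low) area-level rates cluster among neighbouring areas. A compact illustration of the statistic under a binary contiguity weight matrix (toy data, not the study's SINAN-derived rates):

```python
import numpy as np

def global_morans_i(x, W):
    """Global Moran's I for area values x under spatial weight matrix W
    (W[i, j] > 0 when areas i and j are neighbours). Positive values
    indicate spatial clustering of similar rates; negative values
    indicate a checkerboard-like pattern."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                 # deviations from the mean rate
    n, s0 = len(x), W.sum()          # s0: total weight mass
    return (n / s0) * (z @ W @ z) / (z @ z)

# Toy example: eight areas arranged in a chain, each adjacent to the next
W = np.zeros((8, 8))
for i in range(7):
    W[i, i + 1] = W[i + 1, i] = 1
i_clustered = global_morans_i([1, 1, 1, 1, 5, 5, 5, 5], W)  # clustered: I > 0
i_alternating = global_morans_i([1, 5, 1, 5, 1, 5, 1, 5], W)  # alternating: I < 0
```

In practice the significance of I is assessed by random permutation of the values across areas, which is how the reported p-values (< 0.001) are typically obtained.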
Abstract:
OBJECTIVE To evaluate the individual and contextual determinants of the use of health care services in the metropolitan region of Sao Paulo. METHODS Data from the Sao Paulo Megacity study – the Brazilian version of the World Mental Health Survey multicenter study – were used. A total of 3,588 adults living in 69 neighborhoods in the metropolitan region of Sao Paulo, SP, Southeastern Brazil, comprising 38 municipalities and 31 neighboring districts, were selected using multistratified sampling of the non-institutionalized population. Multilevel Bayesian logistic models were fitted to identify the individual and contextual determinants of the use of health care services in the past 12 months and of the presence of a regular physician for routine care. RESULTS The contextual characteristics of the place of residence (income inequality, violence, and median income) showed no significant correlation (p > 0.05) with the use of health care services or with the presence of a regular physician for routine care. The only exception was the negative correlation between living in areas with high income inequality and the presence of a regular physician (OR: 0.77; 95%CI 0.60;0.99) after controlling for individual characteristics. The study revealed a strong and consistent correlation between individual characteristics (mainly education and possession of health insurance), use of health care services, and presence of a regular physician. Presence of chronic and mental illnesses was strongly correlated with the use of health care services in the past year (regardless of individual characteristics) but not with the presence of a regular physician. CONCLUSIONS Individual characteristics, including higher education and possession of health insurance, were important determinants of the use of health care services in the metropolitan area of Sao Paulo. A better understanding of these determinants is essential for the development of public policies that promote equitable use of health care services.
Abstract:
Dissertation presented as a partial requirement for the degree of Master in Geographic Information Science and Systems
Abstract:
This study focuses on the probabilistic modelling of the mechanical properties of prestressing strands, based on data collected from tensile tests carried out at the Laboratório Nacional de Engenharia Civil (LNEC), Portugal, for certification purposes, covering a period of about 9 years of production. The strands studied were produced by six manufacturers from four countries, namely Portugal, Spain, Italy and Thailand. The variability of the most important mechanical properties is examined and the results are compared with the recommendations of the Probabilistic Model Code, as well as with the Eurocodes and earlier studies. The obtained results show a very low variability, which benefits structural safety. Based on those results, probabilistic models for the most important mechanical properties of prestressing strands are proposed.
Abstract:
We consider a Bertrand duopoly model with unknown costs. Each firm aims to choose the price of its product according to the well-known concept of Bayesian Nash equilibrium, and the choices are made simultaneously by both firms. In this paper, we suppose that each firm has two different technologies and uses one of them according to a certain probability distribution. The use of one technology or the other affects the unitary production cost. We show that this game has exactly one Bayesian Nash equilibrium. We analyse the advantages, for firms and for consumers, of using the technology with the highest production cost versus the one with the lowest production cost. We prove that the expected profit of each firm increases with the variance of its production costs. We also show that the expected price of each good increases with both expected production costs, with the effect of the rival's expected production costs dominated by the effect of the firm's own expected production costs.
Abstract:
We study a Bertrand oligopoly model with incomplete information about rivals' costs, where the uncertainty is given by a uniform distribution. We compute the Bayesian-Nash equilibrium of this game, the ex-ante expected profit and the ex-post profit of each firm. We see that, even though only one firm produces in equilibrium, all firms have a positive ex-ante expected profit.
Abstract:
In this paper, we consider a Cournot competition between a nonprofit firm and a for-profit firm in a homogeneous goods market, with uncertain demand. Given an asymmetric tax schedule, we compute explicitly the Bayesian-Nash equilibrium. Furthermore, we analyze the effects of the tax rate and the degree of altruistic preference on market equilibrium outcomes.
Abstract:
We study Bertrand and Cournot oligopoly models with incomplete information about rivals' costs, where the uncertainty is given by a uniform distribution. We compute the Bayesian-Nash equilibrium of both games, the ex-ante expected profits and the ex-post profits of each firm. We see that, in the price competition, even though only one firm produces in equilibrium, all firms have a positive ex-ante expected profit.
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Abstract:
Dissertation presented at the Faculty of Sciences and Technology of the New University of Lisbon to obtain the degree of Doctor in Electrical Engineering, specialty of Robotics and Integrated Manufacturing
Abstract:
In this article, we present the first study on probabilistic tsunami hazard assessment for the Northeast (NE) Atlantic region related to earthquake sources. The methodology combines probabilistic seismic hazard assessment, tsunami numerical modeling, and statistical approaches. We consider three main tsunamigenic areas, namely the Southwest Iberian Margin, the Gloria, and the Caribbean. For each tsunamigenic zone, we derive the annual recurrence rate for each magnitude range, from Mw 8.0 up to Mw 9.0 at a regular interval, using the Bayesian method, which incorporates seismic information from historical and instrumental catalogs. A numerical code solving the shallow water equations is employed to simulate tsunami propagation and compute nearshore wave heights. The probability of exceeding a specific tsunami hazard level during a given time period is calculated using the Poisson distribution. The results are presented in terms of the probability of exceedance of a given tsunami amplitude for 100- and 500-year return periods. The hazard level varies along the NE Atlantic coast, being highest along the northern segment of the Morocco Atlantic coast, the southern Portuguese coast, and the Spanish coast of the Gulf of Cadiz. We find that the probability that the maximum wave height exceeds 1 m somewhere in the NE Atlantic region reaches 60% and 100% for the 100- and 500-year return periods, respectively. These probability values decrease to about 15% and 50%, respectively, when the exceedance threshold is raised to 5 m for the same return periods.
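The final step of the hazard chain described above, turning an annual exceedance rate into a probability over a return period, is the standard Poisson relation P = 1 - exp(-lambda * T). A minimal sketch (the rate used in the example is illustrative, not a value from the study):

```python
import math

def poisson_exceedance(annual_rate, period_years):
    """Probability of at least one exceedance of a hazard level within
    period_years, assuming exceedance events follow a Poisson process
    with the given annual recurrence rate."""
    return 1.0 - math.exp(-annual_rate * period_years)

# With an illustrative annual rate of 0.01 events/year, the 100-year
# exceedance probability is 1 - e**-1, roughly 0.63.
p100 = poisson_exceedance(0.01, 100)
p500 = poisson_exceedance(0.01, 500)
```

Longer return periods monotonically increase the exceedance probability toward 1, which is why the reported 500-year probabilities are systematically higher than the 100-year ones.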