909 results for Nonparametric Bayes
Abstract:
Coleodactylus amazonicus, a small diurnal leaf-litter gecko widely distributed in the Amazon Basin, has been considered a single species with no significant morphological differences between populations along its range. A recent molecular study, however, detected large genetic differences between populations of central Amazonia and those in the easternmost part of the Amazon Basin, suggesting the presence of taxonomically unrecognised diversity. In this study, DNA sequences of three mitochondrial (16S, cytb, and ND4) and two nuclear genes (RAG-1, c-mos) were used to investigate whether the species currently identified as C. amazonicus contains morphologically cryptic species lineages. The present phylogenetic analysis reveals further genetic subdivision including at least five potential species lineages, restricted to northeastern (lineage A), southeastern (lineage B), central-northern (lineage E) and central-southern (lineages C and D) parts of the Amazon Basin. All clades are characterized by exclusive groups of alleles for both nuclear genes and highly divergent mitochondrial haplotype clades, with corrected pairwise net sequence divergence between sister lineages ranging from 9.1% to 20.7% for the entire mtDNA dataset. Results of this study suggest that the real diversity of "C. amazonicus" has been underestimated due to its apparent cryptic diversification. (C) 2009 Elsevier Inc. All rights reserved.
Abstract:
This work proposes and discusses an approach for inducing Bayesian classifiers aimed at balancing the tradeoff between the precise probability estimates produced by time-consuming unrestricted Bayesian networks and the computational efficiency of Naive Bayes (NB) classifiers. The proposed approach is based on the fundamental principles of Heuristic Search Bayesian network learning. The Markov Blanket concept, as well as a proposed "approximate Markov Blanket", is used to reduce the number of nodes that form the Bayesian network to be induced from data. Consequently, the usually high computational cost of the heuristic search learning algorithms can be lessened, while Bayesian network structures better than NB can be achieved. The resulting algorithms, called DMBC (Dynamic Markov Blanket Classifier) and A-DMBC (Approximate DMBC), are empirically assessed in twelve domains that illustrate scenarios of particular interest. The obtained results are compared with NB and Tree Augmented Network (TAN) classifiers, and confirm that both proposed algorithms can provide good classification accuracies and better probability estimates than NB and TAN, while being more computationally efficient than the widely used K2 Algorithm.
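As an illustration of the Markov Blanket concept mentioned above, the following is a minimal sketch of the standard definition (a node's parents, its children, and the other parents of those children). It is not the DMBC or A-DMBC algorithm, whose details the abstract does not give; the function name `markov_blanket` and the toy network are assumptions made for this example.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a directed acyclic graph.

    `parents` maps every node to the set of its parents. The blanket is the
    union of the node's parents, its children, and the children's other
    parents (the node itself is excluded).
    """
    children = {n for n, ps in parents.items() if node in ps}
    co_parents = set()
    for child in children:
        co_parents |= parents[child]
    blanket = set(parents.get(node, set())) | children | co_parents
    blanket.discard(node)
    return blanket


# Hypothetical toy network: A -> C <- B, C -> D <- E, and F is isolated.
toy_parents = {
    "A": set(), "B": set(), "E": set(), "F": set(),
    "C": {"A", "B"},
    "D": {"C", "E"},
}

blanket_of_c = markov_blanket(toy_parents, "C")
```

In a classification setting the node of interest would typically be the class variable, so that only attributes in its blanket need to be kept in the induced network.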
Abstract:
The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has usually been assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.
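The conventional evaluation described above (hide known values, impute them, compare with the originals) can be sketched as follows. This is only an illustration of that prediction-capability measure using majority imputation; it is not the bias-estimation procedure the article proposes. The function names, the `missing_rate` parameter, and the toy column are assumptions.

```python
import random
from collections import Counter


def majority_impute(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def simulated_imputation_accuracy(values, missing_rate, seed=0):
    """Hide a fraction of known values, impute them, and score the matches.

    Returns the share of artificially hidden entries whose imputed value
    equals the original value.
    """
    rng = random.Random(seed)
    n_hidden = max(1, int(len(values) * missing_rate))
    hidden = rng.sample(range(len(values)), n_hidden)
    masked = list(values)
    for i in hidden:
        masked[i] = None  # simulate a missing entry
    imputed = majority_impute(masked)
    hits = sum(imputed[i] == values[i] for i in hidden)
    return hits / n_hidden


# Hypothetical nominal attribute with a clear majority value.
toy_column = ["a"] * 8 + ["b"] * 2
accuracy = simulated_imputation_accuracy(toy_column, missing_rate=0.3)
```

A high score here says only that hidden values were recovered; as the abstract argues, it says nothing about how the imputed values affect a downstream classifier.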
Abstract:
This article evaluates different parameter estimation strategies for a multiple linear regression model. The model parameters were estimated using data from a clinical trial that sought to verify whether the mechanical test of the maximum force property (EM-FM) is associated with femoral mass, femoral diameter, and the experimental group of ovariectomized rats (Rattus norvegicus albinus, Wistar strain). Three methodologies are compared for estimating the model parameters: the classical methodology, based on the least squares method; the Bayesian methodology, based on Bayes' theorem; and the Bootstrap method, based on resampling procedures.
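To make the contrast between least squares and resampling concrete, here is a minimal sketch that fits a line by least squares and then bootstraps the slope by resampling (x, y) pairs with replacement. It assumes a single predictor for brevity, whereas the article uses a multiple regression model; the function names and the toy data are hypothetical and are not the clinical data from the study.

```python
import random


def ols_slope_intercept(xs, ys):
    """Least squares estimates (slope, intercept) for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_slopes(xs, ys, n_boot=1000, seed=0):
    """Re-estimate the slope on n_boot resamples of the (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: slope is undefined, draw again
        by = [ys[i] for i in idx]
        slopes.append(ols_slope_intercept(bx, by)[0])
    return slopes


# Hypothetical noise-free data lying exactly on y = 1 + 2x.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [3.0, 5.0, 7.0, 9.0, 11.0]
point_estimate = ols_slope_intercept(toy_x, toy_y)
resampled = bootstrap_slopes(toy_x, toy_y, n_boot=200)
```

With real, noisy data the spread of the resampled slopes gives an empirical standard error or percentile interval that can be set beside the classical least squares interval.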
Abstract:
In this paper we extend partial linear models with normal errors to Student-t errors. Penalized likelihood equations are applied to derive the maximum likelihood estimates, which appear to be robust against outlying observations in the sense of the Mahalanobis distance. In order to study the sensitivity of the penalized estimates under some usual perturbation schemes in the model or data, the local influence curvatures are derived and some diagnostic graphics are proposed. A motivating example preliminarily analyzed under normal errors is reanalyzed under Student-t errors. The local influence approach is used to compare the sensitivity of the model estimates. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
The main object of this paper is to discuss the Bayes estimation of the regression coefficients in the elliptically distributed simple regression model with measurement errors. The posterior distribution for the line parameters is obtained in a closed form, considering the following: the ratio of the error variances is known, informative prior distribution for the error variance, and non-informative prior distributions for the regression coefficients and for the incidental parameters. We proved that the posterior distribution of the regression coefficients has at most two real modes. Situations with a single mode are more likely than those with two modes, especially in large samples. The precision of the modal estimators is studied by deriving the Hessian matrix, which although complicated can be computed numerically. The posterior mean is estimated by using the Gibbs sampling algorithm and approximations by normal distributions. The results are applied to a real data set and connections with results in the literature are reported. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
In this article, we introduce a semi-parametric Bayesian approach based on Dirichlet process priors for the discrete calibration problem in binomial regression models. An interesting topic is the dosimetry problem related to the dose-response model. A hierarchical formulation is provided so that a Markov chain Monte Carlo approach is developed. The methodology is applied to simulated and real data.
Abstract:
Purpose: To evaluate the microvessel density by comparing the performance of anti-factor VIII-related antigen, anti-CD31 and anti-CD34 monoclonal antibodies in breast cancer. Methods: Twenty-three postmenopausal women diagnosed with Stage II breast cancer submitted to definitive surgical treatment were evaluated. The monoclonal antibodies used were anti-factor VIII, anti-CD31 and anti-CD34. Microvessels were counted in the areas of highest microvessel density in ten random fields (200x). The data were analyzed using the Kruskal-Wallis nonparametric test (p < 0.05). Results: Mean microvessel densities with anti-factor VIII, anti-CD31 and anti-CD34 were 4.16 +/- 0.38, 4.09 +/- 0.23 and 6.59 +/- 0.42, respectively. Microvessel density as assessed by anti-CD34 was significantly greater than that detected by anti-CD31 or anti-factor VIII (p < 0.0001). There was no statistically significant difference between anti-CD31 and anti-factor VIII (p = 0.4889). Conclusion: The density of stained microvessels was greater and staining was more intense with anti-CD34 compared to anti-CD31 and anti-factor VIII-related antigen.
Abstract:
In this paper we show the results of a comparison simulation study for three classification techniques: Multinomial Logistic Regression (MLR), Nonmetric Discriminant Analysis (NDA) and Linear Discriminant Analysis (LDA). The measure used to compare the performance of the three techniques was the Error Classification Rate (ECR). We found that the MLR and LDA techniques have similar performance and that they are better than NDA when the population multivariate distribution is Normal or Logit-Normal. For the case of log-normal and Sinh(-1)-normal multivariate distributions we found that MLR had the better performance.
A robust Bayesian approach to null intercept measurement error model with application to dental data
Abstract:
Measurement error models often arise in epidemiological and clinical research. Usually, in this setup it is assumed that the latent variable has a normal distribution. However, the normality assumption may not always be correct. The skew-normal/independent distribution is a class of asymmetric thick-tailed distributions which includes the skew-normal distribution as a special case. In this paper, we explore the use of the skew-normal/independent distribution as a robust alternative for the null intercept measurement error model under a Bayesian paradigm. We assume that the random errors and the unobserved value of the covariate (latent variable) jointly follow a skew-normal/independent distribution, providing an appealing robust alternative to the routine use of the symmetric normal distribution in this type of model. Specific distributions examined include univariate and multivariate versions of the skew-normal distribution, the skew-t distributions, the skew-slash distributions and the skew contaminated normal distributions. The methods developed are illustrated using a real data set from a dental clinical trial. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
The objective of the study was to evaluate saliva flow rate, buffer capacity, pH levels, and dental caries experience (DCE) in autistic individuals, comparing the results with a control group (CG). The study was performed on 25 noninstitutionalized autistic boys, divided into two groups: G1, composed of ten children ages 3-8, and G2, composed of 15 adolescents ages 9-13. The CG was composed of 25 healthy boys, randomly selected and also divided into two groups: CG3, composed of 14 children ages 4-8, and CG4, composed of 11 adolescents ages 9-14. Whole saliva was collected under slight suction, and pH and buffer capacity were determined using a digital pH meter. Buffer capacity was measured by titration using 0.01 N HCl, the flow rate was expressed in ml/min, and the DCE was expressed by decayed, missing, and filled teeth (permanent dentition [DMFT] and primary dentition [dmft]). Data were plotted and submitted to nonparametric (Kruskal-Wallis) and parametric (Student's t test) statistical tests with a significance level less than 0.05. When comparing G1 and CG3, the groups did not differ in flow rate, pH levels, buffer capacity, or DMFT. Groups G2 and CG4 differed significantly in pH (p = 0.007) and pHi = 7.0 (p = 0.001), with lower scores for G2. In autistic individuals aged 3-8 and 9-13, medicated or not, there was no statistically significant difference in flow rate, pH, and buffer capacity. The comparison of DCE among autistic children and CG children with deciduous (dmft) and mixed/permanent decayed, missing, and filled teeth (DMFT) did not show a statistical difference (p = 0.743). Data suggest that autistic individuals have neither a higher flow rate nor a better buffer capacity. Similar DCE was observed in both groups studied.
Abstract:
The main purpose of this thesis project is to predict symptom severity and cause from the test battery data of Parkinson's disease patients, using data mining. The data were collected from a test battery on a hand-held computer. We use the Chi-Square method to check which variables are important and which are not. We then normalize the data, apply different data mining techniques to it, and check which technique gives good results. The thesis is implemented in WEKA. The methods used are Naïve Bayes, CART and KNN. We use Bland-Altman plots and Spearman's correlation to check the final results and the predictions. The Bland-Altman analysis shows what percentage of the predictions falls within our confidence limits, and Spearman's correlation shows how strong the relationship is. On the basis of the results and analysis, all three methods give nearly the same results. CART (J48 Decision Tree), however, gives good results, with under-predicted and over-predicted values lying between -2 and +2. The correlation between the actual and predicted values is 0.794 for CART. Cause gives a better percentage classification result than disability because it uses two classes.
Abstract:
The aim of this thesis is to investigate computerized voice assessment methods to distinguish between normal and dysarthric speech signals. In the proposed system, computerized assessment methods equipped with signal processing and artificial intelligence techniques have been introduced. The sentences used for the measurement of inter-stress intervals (ISI) were read by each subject. These sentences were computed for comparisons between normal and impaired voice. A band-pass filter has been used for the preprocessing of speech samples. Speech segmentation is performed using signal energy and spectral centroid to separate voiced and unvoiced areas in the speech signal. Acoustic features are extracted from the LPC model and speech segments from each audio signal to find the anomalies. The speech features which have been assessed for classification are Energy Entropy, Zero Crossing Rate (ZCR), Spectral Centroid, Mean Fundamental Frequency (Meanf0), Jitter (RAP), Jitter (PPQ), and Shimmer (APQ). Naïve Bayes (NB) has been used for speech classification. For speech test-1 and test-2, classification accuracies of 72% and 80%, respectively, between healthy and impaired speech samples have been achieved using NB. For speech test-3, 64% correct classification is achieved using NB. The results indicate the possibility of speech impairment classification in PD patients based on the clinical rating scale.
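Two of the frame-level quantities named above, signal energy and zero crossing rate, can be sketched as follows. These are the usual textbook definitions and are assumed rather than taken from the thesis, which does not state its exact formulas; the function names and the toy frames are hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative, so a pair (0, 1) does not count
    as a crossing while (0, -1) does.
    """
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the samples in the frame."""
    return sum(s * s for s in frame) / len(frame)


# Hypothetical frames: one alternating in sign, one constant.
alternating = [1.0, -1.0, 1.0, -1.0]
constant = [1.0, 1.0, 1.0, 1.0]

zcr_alternating = zero_crossing_rate(alternating)
zcr_constant = zero_crossing_rate(constant)
energy_alternating = short_time_energy(alternating)
```

Unvoiced segments tend to show a high zero crossing rate and low energy, while voiced segments tend to show the opposite, which is why such features are commonly used for segmentation.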
Abstract:
The FE ('fixed effects') estimator of technical inefficiency performs poorly when N ('number of firms') is large and T ('number of time observations') is small. We propose estimators of both the firm effects and the inefficiencies, which have small sample gains compared to the traditional FE estimator. The estimators are based on nonparametric kernel regression of unordered variables, which includes the FE estimator as a special case. In terms of global conditional MSE ('mean square error') criteria, it is proved that there are kernel estimators which are efficient relative to the FE estimators of firm effects and inefficiencies in finite samples. Monte Carlo simulations support our theoretical findings, and an empirical example shows how the traditional FE estimator and the proposed kernel FE estimator lead to very different conclusions about the inefficiency of Indonesian rice farmers.
Abstract:
This work describes the specification and implementation of the Feedback Assistant (Assistente de Feedback) prototype, which helps users adjust the parameters of the message filtering service for e-mail coming from systems such as Direto. The Feedback Assistant is installed on the Direto user's computer to monitor the user's preferences, represented by the actions applied to e-mail messages. The work also presents a literature review on the general concepts of probability, Bayesian networks and classifiers. It describes the general characteristics of classifiers, in particular Naive Bayes, its logic and its performance compared with other classifiers. Concepts related to the user profile model and the Direto environment are also covered. Naive Bayes is attractive for use in the Feedback Assistant because it performs well compared with other classifiers and is efficient at prediction when the attributes are independent of one another. The Feedback Assistant uses a Naive Bayes classifier to predict preferences from the user's actions. It also uses weights that represent the user's satisfaction with the terms extracted from the message body. These weights are associated with the user's actions to estimate the most interesting and least interesting terms from the value of their final averages. When the user wishes to change the Direto message filters, the user asks the Feedback Assistant for suggestions of least interesting terms to exclude and most interesting terms to include. The prototype is tested using two evaluation methods to measure the accuracy and the performance of the Feedback Assistant. The results of the accuracy evaluation are satisfactory, considering that the Feedback Assistant's classifier uses five classes. The performance test results show that, with more up-to-date machine configurations, users would receive suggestions with more tolerable response times.
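The general idea of predicting a user action from message terms with Naive Bayes can be sketched as below. This is a generic multinomial Naive Bayes with add-one smoothing, not the Feedback Assistant's implementation; the two action labels ("read", "delete"), the toy messages and the function names are assumptions, and the prototype itself is reported to use five classes.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count labels and per-label terms from (terms, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for terms, label in docs:
        term_counts[label].update(terms)
        vocab.update(terms)
    return label_counts, term_counts, vocab


def predict(model, terms):
    """Return the label with the highest log posterior for the given terms."""
    label_counts, term_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in terms:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (term_counts[label][t] + 1) / (total_terms + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages, already split into terms.
toy_docs = [
    (["meeting", "report"], "read"),
    (["report", "budget"], "read"),
    (["offer", "discount"], "delete"),
    (["discount", "sale"], "delete"),
]
toy_model = train_naive_bayes(toy_docs)
predicted_action = predict(toy_model, ["report"])
```

The independence assumption shows up in the inner loop, where each term contributes its own factor to the score regardless of the other terms in the message.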