995 results for Binomial Model
Abstract:
In this work, we correlate the daily number of human leptospirosis cases with several climatic factors. We used a negative binomial model with daily hospital admissions due to leptospirosis as the dependent variable, and daily precipitation and maximum and minimum temperature as independent variables. We calculated the monthly leptospirosis admission probabilities from the precipitation and maximum temperature variables. February showed the highest probability, although values were also high during the spring months. February also showed the highest number of hospital admissions. Another notable result is that every 20 mm of precipitation was associated with an average increase of 31.5% in hospital admissions. Additionally, the relative risk of leptospirosis varied from 1.1 to 2.0 when the precipitation varied from 20 to 140 mm.
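As a rough illustration of how a count model with a log link turns a coefficient into a percentage change, the sketch below back-calculates the per-millimetre coefficient implied by the reported 31.5% increase per 20 mm. The log link and the function names are assumptions; the abstract does not give the model specification, and the sketch illustrates only that single figure.

```python
import math

def per_unit_coefficient(percent_increase, step):
    """Log-link coefficient per unit of x implied by a percent increase per `step` units."""
    return math.log(1.0 + percent_increase / 100.0) / step

def rate_ratio(beta, delta):
    """Multiplicative change in the expected count when x increases by `delta`."""
    return math.exp(beta * delta)

# 31.5% more admissions per 20 mm of precipitation (figure taken from the abstract).
beta_per_mm = per_unit_coefficient(31.5, 20.0)
ratio_20mm = rate_ratio(beta_per_mm, 20.0)

print(f"assumed log-link coefficient per mm: {beta_per_mm:.5f}")
print(f"rate ratio for +20 mm: {ratio_20mm:.3f}")
```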
Abstract:
BACKGROUND A low level of education and a migration background of the parents are associated with the development of caries in children. The aim of this study was to evaluate whether a higher educational level of parents can overcome risks for the development of caries in immigrants in Vienna, Austria. METHODS The educational level of the parents, the school type, and the caries status of 736 randomly selected twelve-year-old children with and without a migration background were determined in this cross-sectional study. In children attending school in Vienna, the decayed, missing, and filled teeth (DMFT) index was determined. For statistical analysis, a mixed negative binomial model was used. RESULTS The caries status of the children with a migration background was significantly worse than that of the native Viennese population. A significant interaction was found between migration background and the educational level of the parents (p = 0.045). No interaction was found between the school type and either the migration background (p = 0.220) or the educational level of the parents (p = 0.08). Among children whose parents had a higher educational level, migration background (p < 0.01) and school type (p = 0.018) showed an association with DMFT values. Among children whose parents had a low educational level, however, migration background and school type had no significant association with DMFT values. CONCLUSION These data indicate that children with a migration background are at higher risk of developing caries than other Viennese children, even when the parents have received a higher education.
Abstract:
With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods to provide accurate estimates of test characteristics for diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches for meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variance of sensitivity, specificity and correlation. We also applied an inverse Wishart prior to check the sensitivity of the results. The third model was a multinomial model where the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out using the 'Bayesian inference using Gibbs sampling' implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent among Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from Bayesian bivariate models are not as good as those obtained from frequentist estimation regardless of which prior distribution was used for the covariance matrix. 
The Bayesian multinomial model consistently underestimated the sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of the following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously as well as the intercorrelation between the two; and (3) it can be directly applied to sparse data without ad hoc correction.
Abstract:
My dissertation focuses on two aspects of RNA sequencing technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis. This aspect is addressed in three sections. The second aspect is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating datasets of mRNA expression level and DNA methylation status. Section 1: The cost of DNA sequencing has fallen dramatically in the past decade. Consequently, genomic research increasingly depends on sequencing technology. However, it remains unclear how sequencing capacity influences the accuracy of mRNA expression measurement. We observe that accuracy improves as sequencing depth increases. To model the overdispersion, we use the beta-binomial distribution with a new parameter indicating the dependency between overdispersion and sequencing depth. Our modified beta-binomial model performs better than the binomial or the pure beta-binomial model, with a lower false discovery rate. Section 2: Although a number of methods have been proposed to accurately analyze differential RNA expression at the gene level, modeling at the base pair level is also required. Here, we find that the overdispersion rate decreases as the sequencing depth increases at the base pair level. We also propose four models and compare them with each other. As expected, our beta-binomial model with a dynamic overdispersion rate is shown to be superior. Section 3: We investigate biases in RNA-seq by exploring the measurement of the external control, spike-in RNA. This study is based on two datasets with spike-in controls obtained from a recent study. We observe a previously unreported bias in the measurement of the spike-in transcripts that arises from the influence of the sample transcripts in RNA-seq. We also find that this influence is related to the local sequence of the random hexamer that is used in priming. 
We suggest a model of the inequality between samples to correct this type of bias. Section 4: The expression of a gene can be turned off when its promoter is highly methylated. Several studies have reported that a clear threshold effect exists in gene silencing that is mediated by DNA methylation. It is reasonable to assume that the thresholds are specific to each gene. It is also of interest to investigate genes that are largely controlled by DNA methylation. These genes are called “L-shaped” genes. We develop a method to determine the DNA methylation threshold and identify a new CIMP of BRCA. In conclusion, we provide a detailed understanding of the relationship between the overdispersion rate and sequencing depth. We also reveal a new bias in RNA-seq and provide a detailed understanding of the relationship between this bias and the local sequence. Finally, we develop a powerful method to dichotomize methylation status and consequently identify a new CIMP of breast cancer with a distinct classification of molecular characteristics and clinical features.
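The overdispersion that Sections 1 and 2 of this abstract attribute to the beta-binomial model can be sketched numerically. The code below uses the standard beta-binomial distribution and checks that its variance exceeds the binomial variance by the factor 1 + (n - 1) * rho, with rho = 1 / (alpha + beta + 1). The function `depth_dependent_rho` and all parameter values are hypothetical: the abstract says only that a new parameter ties overdispersion to sequencing depth, not what form that dependency takes.

```python
import math

def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) when K | p ~ Binomial(n, p) and p ~ Beta(alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_num = (math.lgamma(k + alpha) + math.lgamma(n - k + beta)
               - math.lgamma(n + alpha + beta))
    log_den = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp(log_choose + log_num - log_den)

def variance_inflation(n, rho):
    """Ratio of beta-binomial to binomial variance for n trials."""
    return 1.0 + (n - 1) * rho

def depth_dependent_rho(depth, rho0, gamma):
    """HYPOTHETICAL link: overdispersion that shrinks as sequencing depth grows."""
    return rho0 / depth ** gamma

# Illustrative values only (not taken from the dissertation).
n, alpha, beta = 30, 2.0, 6.0
p = alpha / (alpha + beta)
rho = 1.0 / (alpha + beta + 1.0)

pmf = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
binomial_var = n * p * (1.0 - p)

# The empirical ratio and the closed-form inflation factor should agree.
print(var / binomial_var, variance_inflation(n, rho))
```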
Abstract:
The length of a holiday trip is a tourist decision with fundamental implications for tourism organizations, yet it has received little attention in the literature. Moreover, the few existing studies have focused on coastal destinations, even though inland tourism is emerging as an important alternative in some countries. This paper analyzes the determinants of the duration chosen for a tourist trip, distinguishing by type of destination (coastal or inland), and proposes several hypotheses about the influence of destination-related individual characteristics, personal constraints, and sociodemographic characteristics. As a novelty for this type of decision, the methodology estimates a Truncated Negative Binomial Model, which avoids the estimation biases of regression models and the restrictive mean-variance equality assumption of the Poisson Model. The empirical application, carried out in Spain on a sample of 1,600 individuals, leads to the conclusion, on the one hand, that the Negative Binomial Model is more appropriate than the Poisson Model for this type of analysis. On the other hand, the dimensions that determine the length of the holiday trip are, for both destinations, staying in a hotel or in one's own apartment, time constraints, the tourist's age, and the way the trip is organized; the size of the city of residence and the “cheap prices” attribute are distinctive of coastal destinations, and staying in rented apartments is distinctive of inland destinations.
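A minimal sketch of a truncated negative binomial distribution follows. It assumes truncation at zero (a trip lasts at least one day) and the common NB2 parameterization with mean mu and variance mu + alpha * mu^2; the abstract states neither choice, and the parameter values are hypothetical.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) pmf with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zero_truncated_nb_pmf(y, mu, alpha):
    """NB2 pmf conditioned on y >= 1, so zero-length stays are excluded."""
    if y < 1:
        return 0.0
    return nb_pmf(y, mu, alpha) / (1.0 - nb_pmf(0, mu, alpha))

# Hypothetical parameters: untruncated mean of 6 days, dispersion 0.5.
mu, alpha = 6.0, 0.5
support = range(1, 500)  # the tail beyond 500 is negligible for these values
probs = [zero_truncated_nb_pmf(y, mu, alpha) for y in support]
total = sum(probs)
truncated_mean = sum(y * q for y, q in zip(support, probs))

# Truncation shifts the mean upward: E[Y | Y >= 1] = mu / (1 - P(Y = 0)).
print(total, truncated_mean, mu / (1.0 - nb_pmf(0, mu, alpha)))
```

Setting alpha close to zero brings the variance back toward the mean, which is the Poisson restriction the abstract describes as too strict.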
Abstract:
The current tendency to take more trips of shorter duration throughout the year has led the tourist industry to show greater interest in attracting market segments that opt for longer stays, as they are especially profitable. One of these segments is seniors. Given the aging of the population worldwide, which is particularly noticeable in Spain, the object of this study is to identify the variables that determine the length of stay of Spanish seniors at their destination. The Negative Binomial model was adapted to the context of length of stay by Spanish seniors, and the determinant factors identified were age, travel purpose, climate, type of accommodation, group size, trip type and the activities carried out at the destination. This study contributes to the field from an empirical point of view, given the scarcity of studies of this type and their eminently descriptive character, as well as from a practical point of view, with interesting implications for the sector.
Abstract:
Forward error correction (FEC) plays a vital role in coherent optical systems employing multi-level modulation. However, much of coding theory assumes that additive white Gaussian noise (AWGN) is dominant, whereas coherent optical systems have significant phase noise (PN) in addition to AWGN. This changes the error statistics and impacts FEC performance. In this paper, we propose a novel semianalytical method for dimensioning binary Bose-Chaudhuri-Hocquenghem (BCH) codes for systems with PN. Our method involves extracting statistics from pre-FEC bit error rate (BER) simulations. We use these statistics to parameterize a bivariate binomial model that describes the distribution of bit errors. In this way, we relate pre-FEC statistics to post-FEC BER and BCH codes. Our method is applicable to pre-FEC BER around 10^-3 and any post-FEC BER. Using numerical simulations, we evaluate the accuracy of our approach for a target post-FEC BER of 10^-5. Codes dimensioned with our bivariate binomial model meet the target within 0.2-dB signal-to-noise ratio.
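For orientation, the sketch below shows the simpler baseline that the abstract moves away from: a univariate binomial model with independent bit errors, as under AWGN alone. It is not the bivariate binomial model of the paper, whose form the abstract does not give. The code assumes a bounded-distance decoder that corrects up to t errors per n-bit codeword and leaves the errors in place when decoding fails; the function names and the codeword length of 255 are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k bit errors in n bits) with independent error probability p."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def block_failure_probability(n, t, p):
    """P(more than t bit errors in an n-bit codeword)."""
    return sum(binomial_pmf(k, n, p) for k in range(t + 1, n + 1))

def post_fec_ber_estimate(n, t, p):
    """Approximate post-FEC BER, assuming a failed decode leaves its k errors untouched."""
    return sum(k * binomial_pmf(k, n, p) for k in range(t + 1, n + 1)) / n

def smallest_t(n, p, target_ber):
    """Smallest correction capability t whose estimated post-FEC BER meets the target."""
    for t in range(n + 1):
        if post_fec_ber_estimate(n, t, p) <= target_ber:
            return t
    return None

# Operating point from the abstract: pre-FEC BER near 1e-3, target post-FEC BER 1e-5.
t_required = smallest_t(255, 1e-3, 1e-5)
print("required t under the independent-error baseline:", t_required)
```

When phase noise makes errors bursty, this independent-error baseline would be expected to underestimate the needed t, which is the gap a richer error model is meant to close.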
Abstract:
Thesis (Master's)--University of Washington, 2016-08
Abstract:
In this article, for the first time, we propose the negative binomial-beta Weibull (BW) regression model to study the recurrence of prostate cancer and to predict the cure fraction for patients with clinically localized prostate cancer treated by open radical prostatectomy. The cure model considers that a fraction of the survivors are cured of the disease. The survival function for the population of patients can be modeled by a parametric cure model using the BW distribution. We derive an explicit expansion for the moments of the recurrence time distribution for the uncured individuals. The proposed distribution can be used to model survival data when the hazard rate function is increasing, decreasing, unimodal or bathtub-shaped. Another advantage is that the proposed model includes as special sub-models some of the well-known cure rate models discussed in the literature. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes. We analyze a real data set for localized prostate cancer patients after open radical prostatectomy.
Abstract:
Joint generalized linear models and double generalized linear models (DGLMs) were designed to model outcomes for which the variability can be explained using factors and/or covariates. When such factors operate, the usual normal regression models, which inherently exhibit constant variance, will under-represent variation in the data and hence may lead to erroneous inferences. For count and proportion data, such noise factors can generate a so-called overdispersion effect, and the use of binomial and Poisson models underestimates the variability and, consequently, incorrectly indicates significant effects. In this manuscript, we propose a DGLM from a Bayesian perspective, focusing on the case of proportion data, where the overdispersion can be modeled using a random effect that depends on some noise factors. The posterior joint density function was sampled using Markov chain Monte Carlo algorithms, allowing inferences over the model parameters. An application to a data set on apple tissue culture is presented, for which it is shown that the Bayesian approach is quite feasible, even when limited prior information is available, thereby generating valuable insight for the researcher about the experimental results.
Abstract:
A probabilistic model for the intra-familial distribution of infectious disease is proposed and applied to the prevalence of positive serology for Trypanosoma cruzi infection in a Northeastern Brazilian sample. This double with one tail excess model fits the data satisfactorily, and its interpretation is that around 51% of these 982 families are free of infection risk; among those that are at risk, 3% have a high risk (0.66), probably due to high domestic infestation of the vector bug, while 97% show a small risk (0.11), probably due to accidental, non-domestic transmission.
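One way to read the reported figures is as a mixture of two binomial components plus an excess of families with no infections. The sketch below encodes that reading; the mixture form and the function names are assumptions, since the abstract gives only the fitted proportions (51% risk-free, 3% high risk at 0.66, 97% low risk at 0.11) and not the likelihood.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k infected among n family members) with per-member risk p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def family_infection_pmf(k, n, free=0.51, high_share=0.03, p_high=0.66, p_low=0.11):
    """ASSUMED form: zero excess for risk-free families plus a two-component binomial mixture."""
    at_risk = (high_share * binomial_pmf(k, n, p_high)
               + (1.0 - high_share) * binomial_pmf(k, n, p_low))
    zero_excess = free if k == 0 else 0.0
    return zero_excess + (1.0 - free) * at_risk

# Distribution of the number of seropositive members in a family of five.
family_size = 5
distribution = [family_infection_pmf(k, family_size) for k in range(family_size + 1)]
for k, prob in enumerate(distribution):
    print(k, round(prob, 4))
```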
Abstract:
It has been argued that by truncating the sample space of the negative binomial and of the inverse Gaussian-Poisson mixture models at zero, one is allowed to extend the parameter space of the model. Here that is proved to be the case for the more general three-parameter Tweedie-Poisson mixture model. It is also proved that the distributions in the extended part of the parameter space are not the zero truncation of mixed Poisson distributions and that, other than for the negative binomial, they are not mixtures of zero-truncated Poisson distributions either. By extending the parameter space one can improve the fit when the frequency of one is larger and the right tail is heavier than is allowed by the unextended model. Considering the extended model also allows one to use the basic maximum likelihood based inference tools when parameter estimates fall in the extended part of the parameter space, and hence when the m.l.e. does not exist under the unextended model. This extended truncated Tweedie-Poisson model is proved to be useful in the analysis of word and species frequency count data.
Abstract:
Many dynamic revenue management models divide the sale period into a finite number of periods T and assume, invoking a fine-enough grid of time, that each period sees at most one booking request. These Poisson-type assumptions restrict the variability of the demand in the model, but researchers and practitioners were willing to overlook this for the benefit of tractability of the models. In this paper, we criticize this model from another angle. Estimating the discrete finite-period model poses problems of indeterminacy and non-robustness: arbitrarily fixing T leads to arbitrary control values, whereas estimating T from data adds a further layer of indeterminacy. To counter this, we first propose an alternative finite-population model that avoids this problem of fixing T and allows a wider range of demand distributions, while retaining the useful marginal-value properties of the finite-period model. The finite-population model still requires jointly estimating market size and the parameters of the customer purchase model without observing no-purchases. Estimation of market size when no-purchases are unobservable has rarely been attempted in the marketing or revenue management literature. Indeed, we point out that it is akin to the classical statistical problem of estimating the parameters of a binomial distribution with unknown population size and success probability, and hence likely to be challenging. However, when the purchase probabilities are given by a functional form such as a multinomial-logit model, we propose an estimation heuristic that exploits the specification of the functional form, the variety of the offer sets in a typical RM setting, and qualitative knowledge of arrival rates. Finally, we perform simulations to show that the estimator is very promising in obtaining unbiased estimates of population size and the model parameters.
Abstract:
None of the current surveillance streams monitoring the presence of scrapie in Great Britain provide a comprehensive and unbiased estimate of the prevalence of the disease at the holding level. Previous work to estimate the under-ascertainment adjusted prevalence of scrapie in Great Britain applied multiple-list capture–recapture methods. The enforcement of new control measures on scrapie-affected holdings in 2004 has stopped the overlap between surveillance sources and, hence, the application of multiple-list capture–recapture models. Alternative methods, still under the capture–recapture methodology, relying on repeated entries in one single list have been suggested in these situations. In this article, we apply one-list capture–recapture approaches to data held on the Scrapie Notifications Database to estimate the undetected population of scrapie-affected holdings with clinical disease in Great Britain for the years 2002, 2003, and 2004. To do so, we develop a new diagnostic tool for indicating heterogeneity as well as a new understanding of the Zelterman and Chao’s lower bound estimators to account for potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a special, locally truncated Poisson likelihood equivalent to a binomial likelihood. This understanding allows the extension of the Zelterman approach by means of logistic regression to include observed heterogeneity in the form of covariates (in the case studied here, the holding size and country of origin). Our results confirm the presence of substantial unobserved heterogeneity, supporting the application of our two estimators. The total scrapie-affected holding population in Great Britain is around 300 holdings per year. None of the covariates appear to inform the model significantly.
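The two estimators named in this abstract have simple closed forms, sketched below in their standard textbook versions. Here f1 and f2 are the numbers of holdings that appear exactly once and exactly twice in the list, and n_observed is the number of distinct holdings observed. The counts used are hypothetical and are not taken from the Scrapie Notifications Database. The last function illustrates the equivalence mentioned above: restricting a Poisson count to the values 1 and 2 gives a binomial likelihood with success probability lambda / (2 + lambda), whose maximum likelihood estimate reproduces the Zelterman rate.

```python
import math

def chao_lower_bound(n_observed, f1, f2):
    """Chao's lower bound for the total population size."""
    return n_observed + f1 ** 2 / (2.0 * f2)

def zelterman_rate(f1, f2):
    """Zelterman's estimate of the Poisson rate, using only counts of ones and twos."""
    return 2.0 * f2 / f1

def zelterman_estimate(n_observed, f1, f2):
    """Population size obtained by dividing by the estimated detection probability."""
    lam = zelterman_rate(f1, f2)
    return n_observed / (1.0 - math.exp(-lam))

def rate_from_binomial_mle(f1, f2):
    """Rate recovered from the binomial MLE p_hat = f2 / (f1 + f2), where p = lam / (2 + lam)."""
    p_hat = f2 / (f1 + f2)
    return 2.0 * p_hat / (1.0 - p_hat)

# Hypothetical frequencies: 130 distinct holdings, 100 seen once, 20 seen twice.
n_observed, f1, f2 = 130, 100, 20
chao = chao_lower_bound(n_observed, f1, f2)
zelterman = zelterman_estimate(n_observed, f1, f2)
print("Chao lower bound:", chao)
print("Zelterman estimate:", round(zelterman, 1))
print("rates agree:", math.isclose(zelterman_rate(f1, f2), rate_from_binomial_mle(f1, f2)))
```

Because the binomial form is a logistic-type likelihood in p, covariates can enter through a logistic regression on whether a holding was seen twice rather than once, which is the extension the abstract describes.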