989 results for binomial distribution
Abstract:
My dissertation focuses on two aspects of RNA sequencing technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis. This aspect is addressed in three sections. The second aspect is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating datasets of mRNA expression level and DNA methylation status. Section 1: The cost of DNA sequencing has reduced dramatically in the past decade. Consequently, genomic research increasingly depends on sequencing technology. However it remains elusive how the sequencing capacity influences the accuracy of mRNA expression measurement. We observe that accuracy improves along with the increasing sequencing depth. To model the overdispersion, we use the beta-binomial distribution with a new parameter indicating the dependency between overdispersion and sequencing depth. Our modified beta-binomial model performs better than the binomial or the pure beta-binomial model with a lower false discovery rate. Section 2: Although a number of methods have been proposed in order to accurately analyze differential RNA expression on the gene level, modeling on the base pair level is required. Here, we find that the overdispersion rate decreases as the sequencing depth increases on the base pair level. Also, we propose four models and compare them with each other. As expected, our beta binomial model with a dynamic overdispersion rate is shown to be superior. Section 3: We investigate biases in RNA-seq by exploring the measurement of the external control, spike-in RNA. This study is based on two datasets with spike-in controls obtained from a recent study. We observe an undiscovered bias in the measurement of the spike-in transcripts that arises from the influence of the sample transcripts in RNA-seq. Also, we find that this influence is related to the local sequence of the random hexamer that is used in priming. 
We suggest a model of the inequality between samples to correct this type of bias. Section 4: The expression of a gene can be turned off when its promoter is highly methylated. Several studies have reported that a clear threshold effect exists in gene silencing that is mediated by DNA methylation. It is reasonable to assume the thresholds are specific for each gene. It is also intriguing to investigate genes that are largely controlled by DNA methylation. These genes are called “L-shaped” genes. We develop a method to determine the DNA methylation threshold and identify a new CIMP of BRCA. In conclusion, we provide a detailed understanding of the relationship between the overdispersion rate and sequencing depth. We also reveal a new bias in RNA-seq and provide a detailed understanding of the relationship between this new bias and the local sequence. Finally, we develop a powerful method to dichotomize methylation status and consequently identify a new CIMP of breast cancer with a distinct classification of molecular characteristics and clinical features.
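To make the notion of overdispersion concrete, the following minimal sketch compares the variance of a binomial count with that of a standard beta-binomial count with the same mean. The function names and the symbol `rho` are illustrative assumptions. The depth-dependent parameter of the modified model is not specified in the abstract, so it is not reproduced here.

```python
def binomial_variance(n: int, p: float) -> float:
    """Variance of a Binomial(n, p) count."""
    return n * p * (1.0 - p)


def beta_binomial_variance(n: int, p: float, rho: float) -> float:
    """Variance of a beta-binomial count with mean n * p.

    rho = 1 / (alpha + beta + 1) is the overdispersion (intra-class
    correlation) implied by a Beta(alpha, beta) prior on the success
    probability; rho = 0 recovers the plain binomial variance.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return binomial_variance(n, p) * (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    # Hypothetical read depths: for a fixed rho, the variance inflation
    # factor 1 + (n - 1) * rho grows with the depth n.
    for depth in (10, 100, 1000):
        ratio = beta_binomial_variance(depth, 0.3, 0.01) / binomial_variance(depth, 0.3)
        print(depth, round(ratio, 2))
```

With a fixed `rho` the inflation factor grows with depth, which is one reason a model might let the overdispersion parameter itself vary with sequencing depth.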
Abstract:
Conservative procedures in low-dose risk assessment are used to set safety standards for known or suspected carcinogens. However, the assumptions upon which the methods are based and the effects of these methods are not well understood. To minimize the number of false negatives and to reduce the cost of bioassays, animals are given very high doses of potential carcinogens. Results must then be extrapolated to much smaller doses to set safety standards for risks such as one per million. There are a number of competing methods that add a conservative safety factor into these calculations. A method of quantifying the conservatism of these methods was described and tested on eight procedures used in setting low-dose safety standards. The results using these procedures were compared by computer simulation and by the use of data from a large-scale animal study. The method consisted of determining a "true safe dose" (tsd) according to an assumed underlying model. If one assumed that Y = the probability of cancer = P(d), a known mathematical function of the dose, then by setting Y to some predetermined acceptable risk, one can solve for d, the model's "true safe dose". Simulations were generated, assuming a binomial distribution, for an artificial bioassay. The eight procedures were then used to determine a "virtual safe dose" (vsd) that estimates the tsd, assuming a risk of one per million. A ratio R = ((tsd-vsd)/vsd) was calculated for each "experiment" (simulation). The mean R of 500 simulations and the probability that R < 0 were used to measure the over- and under-conservatism of each procedure. The eight procedures included Weil's method, Hoel's method, the Mantel-Bryan method, the improved Mantel-Bryan, Gross's method, fitting a one-hit model, Crump's procedure, and applying Rai and Van Ryzin's method to a Weibull model. None of the procedures performed uniformly well for all types of dose-response curves. 
When the data were linear, the one-hit model, Hoel's method, or the Gross-Mantel method worked reasonably well. However, when the data were non-linear, these same methods were overly conservative. Crump's procedure and the Weibull model performed better in these situations.
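The "true safe dose" calculation can be sketched for one simple choice of P(d). The example below assumes a one-hit model, P(d) = 1 - exp(-beta * d), with a hypothetical potency `beta`. It is an illustration of the idea, not the study's simulation code, and all function names are invented for the example.

```python
import math
import random


def one_hit_risk(dose: float, beta: float) -> float:
    """One-hit model: P(d) = 1 - exp(-beta * d)."""
    return -math.expm1(-beta * dose)


def true_safe_dose(risk: float, beta: float) -> float:
    """Solve P(d) = risk for d under the one-hit model."""
    return -math.log1p(-risk) / beta


def conservatism_ratio(tsd: float, vsd: float) -> float:
    """R = (tsd - vsd) / vsd; R > 0 means the vsd lies below the tsd."""
    return (tsd - vsd) / vsd


def simulate_bioassay(doses, n_animals, beta, rng):
    """Draw a binomial tumour count at each dose level."""
    counts = []
    for dose in doses:
        p = one_hit_risk(dose, beta)
        counts.append(sum(rng.random() < p for _ in range(n_animals)))
    return counts


if __name__ == "__main__":
    beta = 0.02  # hypothetical potency, not taken from the study
    print(true_safe_dose(1e-6, beta))
    print(simulate_bioassay([0.0, 10.0, 50.0], 50, beta, random.Random(1)))
```

Each of the eight procedures would turn such simulated counts into a vsd, and `conservatism_ratio` would then be averaged over the repeated simulations.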
Abstract:
Injection of min K mRNA into Xenopus oocytes results in expression of slowly activating voltage-dependent potassium channels, distinct from those induced by expression of other cloned potassium channels. The min K protein also differs in structure, containing only a single predicted transmembrane domain. While it has been demonstrated that all other cloned potassium channels form by association of four independent subunits, the number of min K monomers which constitute a functional channel is unknown. In rat min K, replacement of Ser-69 by Ala (S69A) causes a shift in the current-voltage (I-V) relationship to more depolarized potentials; currents are not observed at potentials negative to 0 mV. To determine the subunit stoichiometry of min K channels, wild-type and S69A subunits were coexpressed. Injections of a constant amount of wild-type mRNA with increasing amounts of S69A mRNA led to potassium currents of decreasing amplitude upon voltage commands to -20 mV. Applying a binomial distribution to the reduction of current amplitudes as a function of the different coinjection mixtures yielded a subunit stoichiometry of at least 14 monomers for each functional min K channel. A model is presented for how min K subunits may form a channel.
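The binomial reasoning can be illustrated with a short sketch. It assumes that subunits assemble at random in proportion to the injected mRNA and that only channels made entirely of wild-type subunits conduct at -20 mV. Both are simplifying assumptions made for this illustration, since the abstract does not give the fitting procedure, and the function names are hypothetical.

```python
from math import comb


def mutant_count_pmf(k: int, n: int, f_wt: float) -> float:
    """P(exactly k mutant subunits) in an n-subunit channel.

    f_wt is the fraction of subunits that are wild type.
    """
    return comb(n, k) * (1.0 - f_wt) ** k * f_wt ** (n - k)


def relative_current(n: int, f_wt: float) -> float:
    """Fraction of channels with no mutant subunit (assumed to carry the current)."""
    return mutant_count_pmf(0, n, f_wt)


if __name__ == "__main__":
    # The predicted current falls much faster with dilution when n is large.
    for n in (4, 14):
        print(n, [round(relative_current(n, f), 4) for f in (0.9, 0.75, 0.5)])
```

Under these assumptions the relative current is f_wt ** n, so the steepness of the decline with increasing mutant mRNA constrains the number of subunits n.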
Abstract:
Cystic echinococcosis, caused by Echinococcus granulosus, is highly endemic in North Africa and the Middle East. This paper examines the abundance and prevalence of infection of E. granulosus in camels in Tunisia. No cysts were found in 103 camels from Kebili, whilst 19 of 188 camels from Benguerden (10.1%) were infected. Of the cysts found, 95% were considered fertile with the presence of protoscolices, and 80% of protoscolices were considered viable by their ability to exclude aqueous eosin. Molecular techniques were used on cyst material from camels, and this demonstrated that the study animals were infected with the G1 sheep strain of E. granulosus. Observed data were fitted to a mathematical model by maximum likelihood techniques to define the parameters and their confidence limits, and the negative binomial distribution was used to define the error variance in the observed data. The infection pressure to camels was somewhat lower in comparison to sheep reported in an earlier study. However, because camels are much longer-lived animals, the results of the model fit suggested that older camels have a relatively high prevalence rate, reaching a most likely value of 32% at age 15 years. This could represent an important source of transmission to dogs and hence indirectly to man of this zoonotic strain. In common with similar studies on other species, there was no evidence of parasite-induced immunity in camels. (C) 2004 Elsevier B.V. All rights reserved.
Abstract:
Stochastic models based on Markov birth processes are constructed to describe the process of invasion of a fly larva by entomopathogenic nematodes. Various forms for the birth (invasion) rates are proposed. These models are then fitted to data sets describing the observed numbers of nematodes that have invaded a fly larva after a fixed period of time. Non-linear birth rates are required to achieve good fits to these data, with their precise form leading to different patterns of invasion being identified for the three populations of nematodes considered. One of these (Nemasys) showed the greatest propensity for invasion. This form of modelling may be useful more generally for analysing data that show variation which is different from that expected from a binomial distribution.
Abstract:
This retrospective cephalometric study aimed to evaluate the influence of the sagittal skeletal pattern on the determination of the vertical skeletal pattern of the face (facial type) in individuals with different malocclusions who sought orthodontic treatment at UMESP over the last 10 years. To this end, the initial lateral cephalometric radiographs of 59 patients were selected, with a mean age of 16 years and 7 months, ranging from 11 to 25 years. These patients were selected after subjective facial analysis of 1600 records, resulting in 3 groups: Group 1, facial Pattern I; Group 2, Pattern II; and Group 3, Pattern III. After this division, the determination of facial type was compared between the angular measurements SN.GoGn and SN.Gn. To test this hypothesis, a logistic regression with errors distributed according to a binomial distribution was used. To assess the probability of agreement between SN.Gn and SN.GoGn, a separate logistic regression was fitted for each facial pattern. The probability of agreement between SN.Gn and SN.GoGn was relatively high in Pattern I (70%) but relatively low in Patterns II and III: Pattern II (46%) and Pattern III (37%). The sagittal skeletal pattern of the face (Patterns I, II and III) influences the determination of facial type. Using the SN.Gn measurement does not appear to be appropriate for determining the vertical skeletal pattern of the face, because point Gn undergoes substantial displacement across the different facial patterns.
Abstract:
This work was supported by the Bulgarian National Science Fund under grant BY-TH-105/2005.
Abstract:
Crash reduction factors (CRFs) are used to estimate the potential number of traffic crashes expected to be prevented from investment in safety improvement projects. The method used to develop CRFs in Florida has been based on the commonly used before-and-after approach. This approach suffers from a widely recognized problem known as regression-to-the-mean (RTM). The Empirical Bayes (EB) method has been introduced as a means of addressing the RTM problem. This method requires the information from both the treatment and reference sites in order to predict the expected number of crashes had the safety improvement projects at the treatment sites not been implemented. The information from the reference sites is estimated from a safety performance function (SPF), which is a mathematical relationship that links crashes to traffic exposure. The objective of this dissertation was to develop the SPFs for different functional classes of the Florida State Highway System. Crash data from years 2001 through 2003 along with traffic and geometric data were used in the SPF model development. SPFs for both rural and urban roadway categories were developed. The modeling data used were based on one-mile segments that contain homogeneous traffic and geometric conditions within each segment. Segments involving intersections were excluded. The scatter plots of data show that the relationships between crashes and traffic exposure are nonlinear and that crashes increase with traffic exposure at an increasing rate. Four regression models, namely, Poisson (PRM), Negative Binomial (NBRM), zero-inflated Poisson (ZIP), and zero-inflated Negative Binomial (ZINB), were fitted to the one-mile segment records for individual roadway categories. The best model was selected for each category based on a combination of the Likelihood Ratio test, the Vuong statistical test, and Akaike's Information Criterion (AIC). 
The NBRM model was found to be appropriate for only one category and the ZINB model was found to be more appropriate for six other categories. The overall results show that the Negative Binomial distribution model generally provides a better fit for the data than the Poisson distribution model. In addition, the ZINB model was found to give the best fit when the count data exhibit excess zeros and over-dispersion for most of the roadway categories. While model validation shows that most data points fall within the 95% prediction intervals of the models developed, the Pearson goodness-of-fit measure does not show statistical significance. This is expected, as traffic volume is only one of the many factors contributing to the overall crash experience, and the SPFs are to be applied in conjunction with Accident Modification Factors (AMFs) to further account for the safety impacts of major geometric features before arriving at the final crash prediction. However, with improved traffic and crash data quality, the crash prediction power of SPF models may be further improved.
Abstract:
A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
Abstract:
This study presents the validation of the observations made by the fishery observer program known as Programa Bitácoras de Pesca (PBP) during 2005-2011 in the area where the industrial purse-seine vessels targeting the north-central stock of the Peruvian anchoveta (Engraulis ringens) operate. In addition, for the same period and area, the magnitude of discards due to excess catch, discards of juveniles and bycatch in this fishery was estimated. A total of 3,768 trips were observed out of 302,859, representing 1.2%. The data on excess-catch discards, juvenile discards and bycatch recorded on the observed trips were characterized by a high proportion of zeros. To validate the observations, a Monte Carlo simulation study was carried out using a negative binomial distribution model. This makes it possible to infer the optimal coverage level and to determine whether the information obtained by the observer program is reliable. From this analysis, it is concluded that current observation levels should be increased to a coverage of at least 10% of the total trips made each year by the industrial purse-seine vessels targeting the north-central stock of the Peruvian anchoveta. Excess-catch discards, juvenile discards and bycatch were estimated using three methodologies: Bootstrap, General Linear Model (GLM) and Delta Model. Each methodology estimated different magnitudes with similar trends. 
The estimated magnitudes were compared using a Bayesian ANOVA, which showed little evidence that the estimated magnitudes of excess-catch discards differed among methodologies. The same was true of bycatch, whereas for juvenile discards there was substantial evidence of differences. The methodology that met the assumptions and explained the most variability in the modelled variables was the Delta Model, which appears to be a better alternative for estimation because of the high proportion of zeros in the data. The average estimates of excess-catch discards, juvenile discards and bycatch from the Delta Model were 252,580, 41,772 and 44,823 tonnes respectively, which together represented 5.74% of landings. In addition, using the estimated magnitude of juvenile discards, a biomass projection exercise was carried out under the hypothetical scenario of no fishing mortality and of discarded juveniles measuring only 8 and 11 cm in length; the biomass that would not be available to the fishery was found to be between 52 thousand and 93 thousand tonnes.
Abstract:
Human radiosensitivity is a quantitative trait that is generally subject to binomial distribution. Individual radiosensitivity, however, may deviate significantly from the mean (by 2-3 standard deviations). Thus, the same dose of radiation may result in different levels of genotoxic damage (commonly measured as chromosome aberration rates) in different individuals. There is a significant genetic component in individual radiosensitivity. It is related to carriership of variant alleles of various single-nucleotide polymorphisms (most of these in genes coding for proteins functioning in DNA damage identification and repair); carriership of different numbers of alleles producing cumulative effects; amplification of gene copies coding for proteins responsible for radioresistance; mobile genetic elements; and others. Among the other factors influencing individual radioresistance are the radioadaptive response; the bystander effect; levels of endogenous substances with radioprotective and antimutagenic properties; and environmental factors such as lifestyle and diet, physical activity, psychoemotional state, hormonal state, certain drugs, infections and others. These factors may have radioprotective or sensitising effects. Many factors can therefore significantly modulate the biological effects of ionising radiation. Thus, conventional methodologies for biodosimetry (specifically, cytogenetic methods) may produce significant errors if personal traits that may affect radioresistance are not accounted for.
Abstract:
2016
Abstract:
S. japonicum infection is believed to be endemic in 28 of the 80 provinces of the Philippines and the most recent data on schistosomiasis prevalence have shown considerable variability between provinces. In order to increase the efficient allocation of parasitic disease control resources in the country, we aimed to describe the small scale spatial variation in S. japonicum prevalence across the Philippines, quantify the role of the physical environment in driving the spatial variation of S. japonicum, and develop a predictive risk map of S. japonicum infection. Data on S. japonicum infection from 35,754 individuals across the country were geo-located at the barangay level and included in the analysis. The analysis was then stratified geographically for Luzon, the Visayas and Mindanao. Zero-inflated binomial Bayesian geostatistical models of S. japonicum prevalence were developed and diagnostic uncertainty was incorporated. Results of the analysis show that in the three regions, males and individuals aged ≥ 20 years had significantly higher prevalence of S. japonicum compared with females and children <5 years. The role of the environmental variables differed between regions of the Philippines. S. japonicum infection was widespread in the Visayas whereas it was much more focal in Luzon and Mindanao. This analysis revealed significant spatial variation in prevalence of S. japonicum infection in the Philippines. This suggests that a spatially targeted approach to schistosomiasis interventions, including mass drug administration, is warranted. When financially possible, additional schistosomiasis surveys should be prioritized to areas identified to be at high risk, but which were underrepresented in our dataset.
Abstract:
Seasonal population dynamics of the digenean Phyllodistomum pawlovskii in the urinary bladder of the bullhead catfish, Pseudobagrus fulvidraco, were investigated in Liangzi Lake in the flood plain of the Yangtze River in China from February 2001 to July 2002. The overall prevalence of the parasite was high, 41.5% (n = 1,476), while the mean abundance was relatively low, 1.24 +/- 2.11. The parasite exhibited evident seasonality in changes of prevalence and abundance. In brief, prevalence and abundance were very low in midwinter (January), but increased and remained relatively high in other seasons and months. The distribution pattern of this parasite in the fish was overdispersed, with a variance to mean ratio > 1, but its frequency distribution could not be described by the negative binomial model. There were positive correlations between the number of the parasites per fish and the age and length of the fish; a peaked age-parasite abundance curve was not detected in the parasite-host association. It is suggested that the parasite P. pawlovskii has little effect on the population structure of the bullhead catfish.
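The overdispersion criterion mentioned here, a variance-to-mean ratio above 1, can be computed as in the following minimal sketch. The parasite counts are invented for illustration and are not the study's data. The moment estimate of the negative binomial aggregation parameter k is a standard textbook formula, included only as an assumption about how such a fit might begin.

```python
import statistics


def variance_to_mean_ratio(counts):
    """Sample variance divided by sample mean; values above 1 indicate overdispersion."""
    return statistics.variance(counts) / statistics.mean(counts)


def moment_estimate_k(counts):
    """Method-of-moments estimate of the negative binomial parameter k.

    Uses k = mean^2 / (variance - mean), which is only defined when the
    sample variance exceeds the sample mean.
    """
    mean = statistics.mean(counts)
    var = statistics.variance(counts)
    if var <= mean:
        raise ValueError("counts are not overdispersed; k is undefined")
    return mean * mean / (var - mean)


if __name__ == "__main__":
    parasites_per_fish = [0, 0, 0, 1, 5]  # hypothetical counts
    print(variance_to_mean_ratio(parasites_per_fish))
    print(moment_estimate_k(parasites_per_fish))
```

A ratio above 1 shows aggregation but does not by itself guarantee that a negative binomial fits, which matches the finding reported in the abstract.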
Abstract:
In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model for counts, and present efficient closed-form Bayesian inference; unlike conventional Poisson models, the proposed approach has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients. By placing a gamma distribution prior on the NB dispersion parameter r, and connecting a log-normal distribution prior with the logit of the NB probability parameter p, efficient Gibbs sampling and variational Bayes inference are both developed. The closed-form updates are obtained by exploiting conditional conjugacy via both a compound Poisson representation and a Polya-Gamma distribution based data augmentation approach. The proposed Bayesian inference can be implemented routinely, while being easily generalizable to more complex settings involving multivariate dependence structures. The algorithms are illustrated using real examples. Copyright 2012 by the author(s)/owner(s).
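For orientation, the roles of the two NB parameters can be sketched numerically. The abstract does not write out the probability mass function, so the parametrization below, P(m) = Γ(m + r) / (m! Γ(r)) (1 - p)^r p^m, is an assumption chosen because it is a common convention in which the mean is r p / (1 - p). The function names are illustrative.

```python
import math


def nb_pmf(m: int, r: float, p: float) -> float:
    """NB probability of count m with dispersion r > 0 and probability 0 < p < 1."""
    log_pmf = (
        math.lgamma(m + r) - math.lgamma(m + 1) - math.lgamma(r)
        + r * math.log(1.0 - p) + m * math.log(p)
    )
    return math.exp(log_pmf)


def nb_mean(r: float, p: float) -> float:
    """Mean r * p / (1 - p) under the assumed parametrization."""
    return r * p / (1.0 - p)


def nb_variance(r: float, p: float) -> float:
    """Variance r * p / (1 - p)^2, always larger than the mean."""
    return r * p / (1.0 - p) ** 2


if __name__ == "__main__":
    r, p = 2.0, 0.5  # hypothetical parameter values
    numeric_mean = sum(m * nb_pmf(m, r, p) for m in range(200))
    print(numeric_mean, nb_mean(r, p), nb_variance(r, p))
```

Under this convention the variance-to-mean ratio is 1 / (1 - p), so the logit of p, on which the abstract places a log-normal prior, directly controls the degree of overdispersion.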