980 results for Statistical distributions
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between the genome and regulatory proteins such as transcription factors (TFs) and nucleosomes. Several high-throughput techniques have been developed to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, any single experimental technique usually provides only partial and noisy information about protein-DNA interactions. The overarching goal of this dissertation is therefore to develop computational methods that jointly model different experimental datasets to achieve holistic inference of the protein-DNA interaction landscape.
We first present a computational framework that incorporates the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo (MCMC) method to maximize it. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. The framework is flexible and can readily incorporate other data sources; to demonstrate this, we use prior distributions to integrate experimentally measured protein concentrations.
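The idea of maximizing a correlation objective with MCMC can be illustrated with a minimal sketch under strong simplifying assumptions. The toy occupancy model (`predicted_occupancy`, a hypothetical name) just stacks 147-bp footprints at candidate start positions; it is not the dissertation's thermodynamic model. The fixed-temperature Metropolis search, the ±10 bp moves, and the best-so-far tracking are illustrative choices, not the author's settings.

```python
import math
import random

FOOTPRINT = 147  # assumed nucleosome footprint (bp)

def predicted_occupancy(starts, length):
    """Toy occupancy profile: one unit of coverage per footprint."""
    occ = [0.0] * length
    for s in starts:
        for i in range(s, min(s + FOOTPRINT, length)):
            occ[i] += 1.0
    return occ

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mcmc_maximize(coverage, n_nuc, n_iter=2000, temperature=0.01, seed=0):
    """Metropolis search over nucleosome start positions that keeps the
    configuration with the highest correlation seen so far."""
    rng = random.Random(seed)
    length = len(coverage)
    max_start = length - FOOTPRINT
    state = [rng.randint(0, max_start) for _ in range(n_nuc)]
    score = pearson(predicted_occupancy(state, length), coverage)
    best, best_score = list(state), score
    for _ in range(n_iter):
        proposal = list(state)
        k = rng.randrange(n_nuc)
        # Local move: shift one nucleosome by up to 10 bp, clipped to the sequence.
        proposal[k] = min(max_start, max(0, proposal[k] + rng.randint(-10, 10)))
        new_score = pearson(predicted_occupancy(proposal, length), coverage)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temperature):
            state, score = proposal, new_score
            if score > best_score:
                best, best_score = list(state), score
    return sorted(best), best_score
```

Lowering `temperature` makes the search greedier; a real application would also need a model for competing TF binding, which this toy omits.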
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has been used mainly to identify DNase I hypersensitive sites, which tend to be open-chromatin regulatory regions devoid of nucleosomes. We show for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach uses several effective strategies to extract nucleosome positioning signals from noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern helps reveal the nucleosome rotational context around TF binding sites.
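A Bayes-factor score of this kind can be sketched as follows, under assumptions that are not taken from the dissertation. Cut counts in a 147-bp window are modeled as Poisson with mean λ·s_i, where s_i is either a hypothetical nucleosome shape (a U-shaped quadratic envelope times a ~10.4-bp cosine) or a flat background, and the overall rate λ gets a Gamma(a, b) prior so that each marginal likelihood is analytic. The shape parameters `c0`, `c2`, `amp` and the prior values are illustrative only.

```python
import math

NUC_LEN = 147
PERIOD = 10.4  # approximate helical repeat of DNA (bp per turn)

def nucleosome_shape(c0=0.3, c2=1.0, amp=0.3):
    """Hypothetical expected DNase I cut profile over a nucleosome body:
    quadratic envelope (fewer cuts near the dyad) times a ~10.4-bp
    oscillation, rescaled to mean 1 over the window."""
    centre = (NUC_LEN - 1) / 2
    raw = []
    for i in range(NUC_LEN):
        d = (i - centre) / centre
        envelope = c0 + c2 * d * d
        raw.append(envelope * (1 + amp * math.cos(2 * math.pi * (i - centre) / PERIOD)))
    mean = sum(raw) / NUC_LEN
    return [r / mean for r in raw]

def log_marginal(counts, shape, a=1.0, b=0.1):
    """log p(counts | shape) with counts[i] ~ Poisson(lam * shape[i]) and
    lam ~ Gamma(shape=a, rate=b) integrated out analytically."""
    y_tot, s_tot = sum(counts), sum(shape)
    out = a * math.log(b) - math.lgamma(a)
    out += math.lgamma(a + y_tot) - (a + y_tot) * math.log(b + s_tot)
    for y, s in zip(counts, shape):
        out += y * math.log(s) - math.lgamma(y + 1)
    return out

def log_bayes_factor(counts, nuc_shape):
    """Evidence for a nucleosome-shaped profile over uniform cutting."""
    flat = [1.0] * len(counts)
    return log_marginal(counts, nuc_shape) - log_marginal(counts, flat)
```

Scanning `log_bayes_factor` along a genome with a sliding window would give a positioning score; because both shapes have mean 1, the score is driven by how the counts are distributed within the window rather than by their total.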
Finally, we present a state-space model (SSM) that jointly models different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape, together with an efficient expectation-maximization (EM) algorithm for learning model parameters from data. We first show in simulation studies that the SSM can effectively recover the underlying true protein binding configurations. We then apply the SSM to real genomic data (both DNase-seq and MNase-seq). By incrementally adding types of genomic data to the SSM, we show that different data types contribute complementary information to the inference of the protein binding landscape, and that the most accurate inference comes from modeling all available datasets.
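The simplest discrete state-space model is a hidden Markov model, and its EM algorithm is Baum-Welch. The sketch below assumes hypothetical hidden states (for example "nucleosome" vs "accessible") emitting independent Poisson counts on several tracks (for example DNase-seq and MNase-seq); it only illustrates how EM combines tracks and is much simpler than the dissertation's SSM. All function names are hypothetical, and the rate update assumes every state keeps a positive weighted mean count.

```python
import math

def poisson_logpmf(y, lam):
    """log P(Y = y) for Y ~ Poisson(lam)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def forward_backward(obs, pi, A, rates):
    """Scaled forward-backward pass; obs[t] is a tuple of counts, one per track."""
    K, T = len(pi), len(obs)
    log_e = [[sum(poisson_logpmf(y, rates[k][d]) for d, y in enumerate(o))
              for k in range(K)] for o in obs]
    offsets = [max(row) for row in log_e]  # subtract per-position max to avoid underflow
    emit = [[math.exp(v - m) for v in row] for row, m in zip(log_e, offsets)]
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[k] * emit[0][k] for k in range(K)]
        else:
            a = [sum(alpha[-1][j] * A[j][k] for j in range(K)) * emit[t][k]
                 for k in range(K)]
        c = sum(a)
        alpha.append([x / c for x in a])
        scale.append(c)
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[k][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(K))
                   / scale[t + 1] for k in range(K)]
    gamma = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]
    xi_sum = [[0.0] * K for _ in range(K)]
    for t in range(T - 1):
        for j in range(K):
            for k in range(K):
                xi_sum[j][k] += (alpha[t][j] * A[j][k] * emit[t + 1][k]
                                 * beta[t + 1][k] / scale[t + 1])
    loglik = sum(math.log(c) for c in scale) + sum(offsets)
    return gamma, xi_sum, loglik

def baum_welch(obs, pi, A, rates, n_iter=20):
    """EM updates for initial, transition and per-track Poisson rate parameters."""
    K, D = len(pi), len(obs[0])
    history = []
    for _ in range(n_iter):
        gamma, xi_sum, ll = forward_backward(obs, pi, A, rates)
        history.append(ll)  # log-likelihood before this update
        pi = list(gamma[0])
        A = [[xi_sum[j][k] / sum(xi_sum[j]) for k in range(K)] for j in range(K)]
        rates = [[sum(g[k] * o[d] for g, o in zip(gamma, obs)) / sum(g[k] for g in gamma)
                  for d in range(D)] for k in range(K)]
    return pi, A, rates, history
```

The log-likelihood recorded in `history` should never decrease, which is a useful sanity check; dropping a track from `obs` and refitting mimics, in miniature, the incremental comparison described above.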
This dissertation provides a foundation for future research by taking a step toward genome-wide inference of the protein-DNA interaction landscape through data integration.
Abstract:
To find optimal solutions in health care, decision makers inevitably must evaluate trade-offs, which call for multi-attribute valuation methods. Researchers have proposed best-worst scaling (BWS) methods, which extract information from respondents by asking them to identify the best and worst items in each choice set. While a companion paper describes the different types of BWS, their applications, and their advantages and downsides, this contribution expounds their relationships with microeconomic theory, which also have implications for statistical inference. The article is devoted to the microeconomic foundations of preference measurement and also addresses issues such as scale invariance and scale heterogeneity. Furthermore, the paper discusses the basics of preference measurement using rating, ranking and stated choice data in the light of the findings of the preceding section. Finally, it introduces the use of stated choice data and juxtaposes BWS with the microeconomic foundations.
Abstract:
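One widely used model for BWS responses is the maxdiff model, in which the probability of picking item i as best and item j as worst is proportional to exp(λ(u_i − u_j)). The sketch below assumes that model (it is not necessarily the specification the paper adopts) and illustrates scale invariance: multiplying all utilities by c and the scale λ by 1/c leaves every choice probability unchanged, so utilities are identified only up to scale.

```python
import math
from itertools import permutations

def maxdiff_probs(utilities, scale=1.0):
    """Maxdiff best-worst probabilities for one choice set:
    P(best=i, worst=j) is proportional to exp(scale * (u_i - u_j)), i != j."""
    weights = {(i, j): math.exp(scale * (utilities[i] - utilities[j]))
               for i, j in permutations(range(len(utilities)), 2)}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}
```

For example, with utilities `[0.0, 1.0, 3.0]` the most probable response is `(2, 0)` (item 2 best, item 0 worst), and doubling the utilities while halving `scale` gives identical probabilities. This is also why scale heterogeneity across respondents can masquerade as preference heterogeneity.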
Statistical learning can be used to extract words from continuous speech. Gómez, Bion, and Mehler (Language and Cognitive Processes, 26, 212–223, 2011) proposed an online measure of statistical learning: they superimposed auditory clicks on a continuous artificial speech stream made up of a random succession of trisyllabic nonwords. Participants were instructed to detect these clicks, which could be located either within or between words. The results showed that, over the course of exposure, reaction times (RTs) increased more for within-word than for between-word clicks. This result has been attributed to statistical learning of the boundaries between words. However, even though statistical learning occurs without an intention to learn, it nevertheless requires attentional resources, so it could be affected by a concurrent task such as click detection. In the present study, we evaluated the extent to which the click detection task actually reflects successful statistical learning. Our results suggest that the emergence of RT differences between within- and between-word click detection is neither systematic nor related to successful segmentation of the artificial language. Therefore, rather than being an online measure of learning, the click detection task seems to interfere with the extraction of statistical regularities.
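The statistical regularity that listeners are assumed to track in such streams is the transitional probability between adjacent syllables, which is high inside a word and low across a word boundary. The sketch below builds a stream from four hypothetical trisyllabic nonwords (random order, no immediate repeats, both assumptions) and estimates these probabilities; it does not model the click-detection task itself.

```python
import random
from collections import Counter

def make_stream(words, n_words, seed=0):
    """Concatenate randomly ordered trisyllabic nonwords (no immediate repeats)."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in words if w != prev])
        stream.extend(word)
        prev = word
    return stream

def transitional_probabilities(syllables):
    """P(next = b | current = a) estimated from adjacent syllable pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical nonwords; every syllable occurs in exactly one word.
words = [("pa", "bi", "ku"), ("ti", "bu", "do"), ("go", "la", "tu"), ("da", "ro", "pi")]
tp = transitional_probabilities(make_stream(words, 300, seed=3))
print(tp[("pa", "bi")], tp.get(("ku", "ti"), 0.0))  # within-word vs between-word
```

Within-word transitions come out at exactly 1.0 here, while between-word transitions hover around 1/3, the dip that segmentation by statistical learning is thought to exploit.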
Abstract:
Of key importance to oil and gas companies is the size distribution of fields in the areas where they are drilling. Recent arguments suggest that many more fields remain to be discovered in mature provinces than previously thought, because the underlying distribution is monotonic rather than peaked. According to this view, the peaked shape of the distribution of discovered fields reflects not the underlying distribution but the effect of economic truncation. This paper contributes to the discussion by analysing up-to-date exploration and discovery data for two mature provinces using the discovery-process model, which is based on sampling without replacement and implicitly includes economic truncation effects. The resulting maximum likelihood estimation is a high-dimensional mixed-integer nonlinear optimization problem. A highly efficient solution strategy is tested that exploits the separable structure of the problem and handles the integer constraints by treating it as a masked allocation problem in dynamic programming.
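The sampling idea behind the discovery-process model can be illustrated with a short simulation. The sketch assumes that, at each discovery, a field is drawn without replacement from the undiscovered ones with probability proportional to its size, a common choice in discovery-process models; the function name is hypothetical, and the paper's likelihood and dynamic-programming solver are not reproduced.

```python
import random

def successive_sampling(sizes, n_discoveries, seed=0):
    """Discovery order under sampling without replacement, each draw
    proportional to the size of the fields that remain undiscovered."""
    rng = random.Random(seed)
    remaining = list(sizes)
    order = []
    for _ in range(min(n_discoveries, len(remaining))):
        idx = rng.choices(range(len(remaining)), weights=remaining)[0]
        order.append(remaining.pop(idx))
    return order
```

Because large fields tend to be found first, the early discoveries are not a random sample of the underlying size distribution; for instance, with 90 fields of size 1 and 10 of size 100, the first ten discoveries are dominated by the large fields. The bias implied by this size-biased draw is what the estimation has to undo.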
Abstract:
Monte Carlo calculations of the nuclear magnetic relaxation rate in a disordered metal–hydrogen system having a distribution of jump rates are reported. The calculations deal specifically with the spin-locked rotating-frame relaxation time T1ρ. The results demonstrate that the temperature variation of the rate depends only weakly on the distribution, so it is unlikely that the jump-rate distribution can be extracted from relaxation measurements in which temperature is the main variable. It is shown that the alternative, measuring the relaxation rate over a wide range of spin-locking field strengths at constant temperature, can lead to an evaluation of the distribution.
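The effect of a jump-rate distribution on a relaxation rate can be explored with a far simpler calculation than the lattice Monte Carlo used in the paper. The sketch swaps in a BPP-style approximation: it assumes the rotating-frame rate is proportional to τ/(1 + 4ω1²τ²), that ln τ is normally distributed with width `sigma`, and that the median τ follows a hypothetical Arrhenius law. Names, parameter values and units are all illustrative.

```python
import math
import random

def tau_arrhenius(temperature, tau0=1e-3, ea_over_k=2000.0):
    """Hypothetical median correlation time tau = tau0 * exp(Ea / (k T))."""
    return tau0 * math.exp(ea_over_k / temperature)

def mean_rate(omega1, tau_median, sigma, n=20000, seed=0):
    """Monte Carlo average of a BPP-style rotating-frame rate
    r(tau) = tau / (1 + (2 * omega1 * tau)**2), arbitrary units,
    over a log-normal spread (width sigma) of correlation times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        tau = tau_median * math.exp(rng.gauss(0.0, sigma))
        total += tau / (1.0 + (2.0 * omega1 * tau) ** 2)
    return total / n

# Temperature sweep at fixed spin-locking field: narrow vs broad distribution.
for T in (250.0, 300.0, 350.0, 400.0):
    tau = tau_arrhenius(T)
    print(T, mean_rate(1.0, tau, 0.1), mean_rate(1.0, tau, 1.0))
```

Repeating the loop over `omega1` at a single temperature gives the field-sweep alternative described above. Within this simplified model the single-τ rate peaks at 2ω1τ = 1 with value 1/(4ω1), and any spread in τ can only lower that peak.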
Abstract:
As announced in the November 2000 issue of MathStats&OR [1], one of the projects supported by the Maths, Stats & OR Network funds is an international survey of research into pedagogic issues in statistics and OR. I am taking the lead on this and report here on the progress that has been made during the first year. A paper giving some background to the project and describing initial thinking on how it might be implemented was presented at the 53rd session of the International Statistical Institute in Seoul, Korea, in August 2001 in a session on The future of statistics education research [2]. It sounded easy. I considered that I was something of an expert on surveys, having lectured on the topic for many years and having helped students and others who were doing surveys, particularly with the design of their questionnaires. Surely all I had to do was to draft a few questions, send them electronically to colleagues in statistical education who would be only too happy to respond, and summarise their responses? I should have learnt from my experience of advising all those students who thought that doing a survey was easy and to whom I had to explain that their ideas were too ambitious. There are several inter-related stages in survey research, and it is important to think about these before rushing into the collection of data. In the case of the survey in question, this planning stage revealed several challenges. Surveys are usually done for a purpose, so even before planning how to do them, it is advisable to think about the final product and the dissemination of results. This is the route I followed.
Abstract:
Forest fires can cause extensive damage to natural resources and property. They can also destroy wildlife habitat, affect the forest ecosystem and threaten human lives. In this paper, extreme wildland fires are analysed using a point process model for extremes. The model, based on a generalised Pareto distribution, is used to model data on acres of wildland burnt by extreme fires in the US since 1825. A semi-parametric smoothing approach is combined with maximum likelihood estimation to estimate the model parameters.
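The generalised Pareto building block can be sketched on its own. The code below assumes a stationary GPD for exceedances over a fixed threshold `u`, with scale `sigma` and shape `xi`. It omits the point-process intensity and the semi-parametric smoothing that the paper adds, and it uses a crude grid search in place of a proper numerical optimizer; all function names are hypothetical.

```python
import math

def gpd_logpdf(x, u, sigma, xi):
    """Log-density of a generalised Pareto distribution for an exceedance x > u."""
    y = (x - u) / sigma
    if y < 0:
        return -math.inf
    if abs(xi) < 1e-12:  # exponential limit as xi -> 0
        return -math.log(sigma) - y
    t = 1.0 + xi * y
    if t <= 0:
        return -math.inf
    return -math.log(sigma) - (1.0 / xi + 1.0) * math.log(t)

def gpd_neg_loglik(data, u, sigma, xi):
    """Negative log-likelihood of the exceedances of threshold u."""
    return -sum(gpd_logpdf(x, u, sigma, xi) for x in data if x > u)

def gpd_survival(x, u, sigma, xi):
    """P(X > x | X > u) under the fitted GPD."""
    y = (x - u) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-y)
    t = 1.0 + xi * y
    return t ** (-1.0 / xi) if t > 0 else 0.0

def fit_gpd_grid(data, u, sigmas, xis):
    """Maximum likelihood by exhaustive search over a (sigma, xi) grid."""
    best = None
    for s in sigmas:
        for xi in xis:
            nll = gpd_neg_loglik(data, u, s, xi)
            if best is None or nll < best[0]:
                best = (nll, s, xi)
    return best[1], best[2]
```

A positive fitted shape `xi` indicates a heavy upper tail, and `gpd_survival` then gives the conditional probability that a fire exceeding the threshold also exceeds a larger burnt area.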
Abstract:
The author's approach to teaching an integrative unit to a small group of master’s-level Applied Statistics students in 2000-2001 is described. Details of the various activities, such as data analysis, reading and discussion of papers, and training in consultancy skills, are given, as are details of the assessment. The students’ and lecturer’s views of the unit are discussed.
Abstract:
This paper describes the role of the Royal Statistical Society in shaping statistical education within the UK and further afield. Until 2001 the Society had four agencies concerned with education at all levels. Their work is discussed and the recent new arrangements are outlined. The Society’s efforts to disseminate good practice by organising meetings and running a network of Associate Schools and Colleges are explored in some detail.
Abstract:
The SB distributional model of Johnson's 1949 paper was introduced by a transformation to normality, that is, to z ~ N(0, 1), consisting of a linear scaling to the range (0, 1), a logit transformation, and an affine transformation, z = γ + δu. The model, in its original parameterization, has often been used in forest diameter distribution modelling. In this paper, we define the SB distribution in terms of the inverse transformation from normality, including an initial linear scaling transformation, u = γ′ + δ′z (δ′ = 1/δ and γ′ = −γ/δ). The SB model in the new parameterization is derived, and maximum likelihood estimation schemes are presented for both parameterizations. The statistical properties of the two alternative parameterizations are compared empirically on 20 data sets of diameter distributions of Changbai larch (Larix olgensis Henry). The new parameterization is shown to be statistically better than Johnson's original parameterization for the data sets considered here.
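Because u = γ′ + δ′z is just z = γ + δu solved for u, both parameterizations describe the same family of densities. The sketch below checks this numerically. It assumes the usual SB notation, with location ξ and range λ so that the scaled variable is (x − ξ)/λ; these symbols are not defined in the abstract. The function names are hypothetical, and the paper's estimation schemes are not reproduced.

```python
import math
import random

def _logit(p):
    return math.log(p / (1.0 - p))

def sb_pdf_johnson(x, xi, lam, gamma, delta):
    """Johnson SB density, original parameterization z = gamma + delta*u."""
    if not xi < x < xi + lam:
        return 0.0
    u = _logit((x - xi) / lam)
    z = gamma + delta * u
    jac = lam / ((x - xi) * (xi + lam - x))  # du/dx
    return delta * jac * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def sb_pdf_inverse(x, xi, lam, gamma_p, delta_p):
    """Same family, inverse parameterization u = gamma_p + delta_p*z."""
    if not xi < x < xi + lam:
        return 0.0
    u = _logit((x - xi) / lam)
    z = (u - gamma_p) / delta_p
    jac = lam / ((x - xi) * (xi + lam - x))  # du/dx
    return jac * math.exp(-0.5 * z * z) / (delta_p * math.sqrt(2 * math.pi))

def sb_sample_inverse(n, xi, lam, gamma_p, delta_p, seed=0):
    """Draw x = xi + lam * logistic(gamma_p + delta_p * z), z ~ N(0, 1)."""
    rng = random.Random(seed)
    return [xi + lam / (1.0 + math.exp(-(gamma_p + delta_p * rng.gauss(0.0, 1.0))))
            for _ in range(n)]
```

With δ′ = 1/δ and γ′ = −γ/δ the two density functions agree at every x, so any difference between the parameterizations lies in how their parameters behave during estimation, not in the distributions they can represent. The inverse form also makes simulation a direct plug-in of normal draws.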
Abstract:
Johnson's SB and the logit-logistic are four-parameter distribution models that may be obtained from the standard normal and logistic distributions, respectively, by a four-parameter transformation. For relatively small data sets, such as diameter at breast height measurements from typical sample plots, distribution models with four or fewer parameters have been found to be empirically adequate. However, when the distributions are complex, for example in mixed stands, in stands that have been thinned, or when working with aggregated data, distribution models with more shape parameters may be necessary. By replacing the symmetric standard logistic distribution of the logit-logistic with a one-parameter “standard Richards” distribution and transforming by a five-parameter Richards function, we obtain a new six-parameter distribution model, the “Richit-Richards”. The Richit-Richards includes the “logit-Richards”, the “Richit-logistic”, and the logit-logistic as submodels. Maximum likelihood estimation is used to fit the model, and some problems in the maximum likelihood estimation of bounding parameters are discussed. An empirical case study of the Richit-Richards and its submodels is conducted on pooled diameter at breast height data from 107 sample plots of Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.). The new models are found to provide significantly better fits than the four-parameter logit-logistic for large data sets.
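The four-parameter logit-logistic baseline can be sketched by assuming the same transformation as Johnson's SB with the standard logistic in place of the standard normal: z = γ + δ·logit((x − ξ)/λ), where z is standard logistic. The symbols ξ, λ, γ, δ are borrowed from SB notation as an assumption. The six-parameter Richit-Richards itself is not sketched, because its exact form is not given in the abstract.

```python
import math

def _logistic(t):
    """Numerically stable standard logistic CDF."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def _logit(p):
    return math.log(p / (1.0 - p))

def ll_cdf(x, xi, lam, gamma, delta):
    """Logit-logistic CDF: F(x) = logistic(gamma + delta * logit((x - xi) / lam))."""
    if x <= xi:
        return 0.0
    if x >= xi + lam:
        return 1.0
    return _logistic(gamma + delta * _logit((x - xi) / lam))

def ll_quantile(p, xi, lam, gamma, delta):
    """Closed-form inverse of ll_cdf for 0 < p < 1."""
    return xi + lam * _logistic((_logit(p) - gamma) / delta)

def ll_pdf(x, xi, lam, gamma, delta):
    """Density: logistic density at z times dz/dx."""
    if not xi < x < xi + lam:
        return 0.0
    F = ll_cdf(x, xi, lam, gamma, delta)
    return delta * F * (1.0 - F) * lam / ((x - xi) * (xi + lam - x))
```

Unlike the SB, both the CDF and the quantile function are closed-form, which is convenient for fitting diameter distributions and for generating diameters from a fitted model.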