884 resultados para Inference module
Resumo:
Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is under a rapid development because of the recent advances in technologies by which molecular data can be obtained from living organisms. In order that most information from such data can be extracted, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.
Resumo:
The module of a quadrilateral is a positive real number which divides quadrilaterals into conformal equivalence classes. This is an introductory text to the module of a quadrilateral with some historical background and some numerical aspects. This work discusses the following topics: 1. Preliminaries 2. The module of a quadrilateral 3. The Schwarz-Christoffel Mapping 4. Symmetry properties of the module 5. Computational results 6. Other numerical methods Appendices include: Numerical evaluation of the elliptic integrals of the first kind. Matlab programs and scripts and possible topics for future research. Numerical results section covers additive quadrilaterals and the module of a quadrilateral under the movement of one of its vertex.
Resumo:
This thesis which consists of an introduction and four peer-reviewed original publications studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important as the direct measurement of haplotypes is difficult, whereas the genotypes are easier to quantify. Haplotypes are the key-players when studying for example the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point-mutations in a biologically plausible way. In this mosaic model, the current population is assumed to be evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point--mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point-mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the fact that the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score and from that, a p-value is computed. This p-value is the probability of picking two sequences from the null model that have as good or better best local alignment score. Local alignment significance is used routinely for example in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affeced by so-called edge-effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
Resumo:
Receptor guanylyl cyclases are multidomain proteins, and ligand binding to the extracellular domain increases the levels of intracellular cGMP. The intracellular domain of these receptors is composed of a kinase homology domain (KHD), a linker of similar to 70 amino acids, followed by the C-terminal guanylyl cyclase domain. Mechanisms by which these receptors are allosterically regulated by ligand binding to the extracellular domain and ATP binding to the KHD are not completely understood. Here we examine the role of the linker region in receptor guanylyl cyclases by a series of point mutations in receptor guanylyl cyclase C. The linker region is predicted to adopt a coiled coil structure and aid in dimerization, but we find that the effects of mutations neither follow a pattern predicted for a coiled coil peptide nor abrogate dimerization. Importantly, this region is critical for repressing the guanylyl cyclase activity of the receptor in the absence of ligand and permitting ligand-mediated activation of the cyclase domain. Mutant receptors with high basal guanylyl cyclase activity show no further activation in the presence of non-ionic detergents, suggesting that hydrophobic interactions in the basal and inactive conformation of the guanylyl cyclase domain are disrupted by mutation. Equivalent mutations in the linker region of guanylyl cyclase A also elevated the basal activity and abolished ligand-and detergent-mediated activation. We, therefore, have defined a key regulatory role for the linker region of receptor guanylyl cyclases which serves as a transducer of information from the extracellular domain via the KHD to the catalytic domain.
Resumo:
In the thesis we consider inference for cointegration in vector autoregressive (VAR) models. The thesis consists of an introduction and four papers. The first paper proposes a new test for cointegration in VAR models that is directly based on the eigenvalues of the least squares (LS) estimate of the autoregressive matrix. In the second paper we compare a small sample correction for the likelihood ratio (LR) test of cointegrating rank and the bootstrap. The simulation experiments show that the bootstrap works very well in practice and dominates the correction factor. The tests are applied to international stock prices data, and the .nite sample performance of the tests are investigated by simulating the data. The third paper studies the demand for money in Sweden 1970—2000 using the I(2) model. In the fourth paper we re-examine the evidence of cointegration between international stock prices. The paper shows that some of the previous empirical results can be explained by the small-sample bias and size distortion of Johansen’s LR tests for cointegration. In all papers we work with two data sets. The first data set is a Swedish money demand data set with observations on the money stock, the consumer price index, gross domestic product (GDP), the short-term interest rate and the long-term interest rate. The data are quarterly and the sample period is 1970(1)—2000(1). The second data set consists of month-end stock market index observations for Finland, France, Germany, Sweden, the United Kingdom and the United States from 1980(1) to 1997(2). Both data sets are typical of the sample sizes encountered in economic data, and the applications illustrate the usefulness of the models and tests discussed in the thesis.
Resumo:
Modern sample surveys started to spread after statistician at the U.S. Bureau of the Census in the 1940s had developed a sampling design for the Current Population Survey (CPS). A significant factor was also that digital computers became available for statisticians. In the beginning of 1950s, the theory was documented in textbooks on survey sampling. This thesis is about the development of the statistical inference for sample surveys. For the first time the idea of statistical inference was enunciated by a French scientist, P. S. Laplace. In 1781, he published a plan for a partial investigation in which he determined the sample size needed to reach the desired accuracy in estimation. The plan was based on Laplace s Principle of Inverse Probability and on his derivation of the Central Limit Theorem. They were published in a memoir in 1774 which is one of the origins of statistical inference. Laplace s inference model was based on Bernoulli trials and binominal probabilities. He assumed that populations were changing constantly. It was depicted by assuming a priori distributions for parameters. Laplace s inference model dominated statistical thinking for a century. Sample selection in Laplace s investigations was purposive. In 1894 in the International Statistical Institute meeting, Norwegian Anders Kiaer presented the idea of the Representative Method to draw samples. Its idea was that the sample would be a miniature of the population. It is still prevailing. The virtues of random sampling were known but practical problems of sample selection and data collection hindered its use. Arhtur Bowley realized the potentials of Kiaer s method and in the beginning of the 20th century carried out several surveys in the UK. He also developed the theory of statistical inference for finite populations. It was based on Laplace s inference model. R. A. Fisher contributions in the 1920 s constitute a watershed in the statistical science He revolutionized the theory of statistics. In addition, he introduced a new statistical inference model which is still the prevailing paradigm. The essential idea is to draw repeatedly samples from the same population and the assumption that population parameters are constants. Fisher s theory did not include a priori probabilities. Jerzy Neyman adopted Fisher s inference model and applied it to finite populations with the difference that Neyman s inference model does not include any assumptions of the distributions of the study variables. Applying Fisher s fiducial argument he developed the theory for confidence intervals. Neyman s last contribution to survey sampling presented a theory for double sampling. This gave the central idea for statisticians at the U.S. Census Bureau to develop the complex survey design for the CPS. Important criterion was to have a method in which the costs of data collection were acceptable, and which provided approximately equal interviewer workloads, besides sufficient accuracy in estimation.
Resumo:
Our ability to infer the protein quaternary structure automatically from atom and lattice information is inadequate, especially for weak complexes, and heteromeric quaternary structures. Several approaches exist, but they have limited performance. Here, we present a new scheme to infer protein quaternary structure from lattice and protein information, with all-around coverage for strong, weak and very weak affinity homomeric and heteromeric complexes. The scheme combines naive Bayes classifier and point group symmetry under Boolean framework to detect quaternary structures in crystal lattice. It consistently produces >= 90% coverage across diverse benchmarking data sets, including a notably superior 95% coverage for recognition heteromeric complexes, compared with 53% on the same data set by current state-of-the-art method. The detailed study of a limited number of prediction-failed cases offers interesting insights into the intriguing nature of protein contacts in lattice. The findings have implications for accurate inference of quaternary states of proteins, especially weak affinity complexes.
Resumo:
Let K be a field of characteristic zero and let m(0),..., m(e-1) be a sequence of positive integers. Let C be an algebroid monomial curve in the affine e-space A(K)(e) defined parametrically by X-0 = T-m0,..., Xe-1 = Tme-1 and let A be the coordinate ring of C. In this paper, we assume that some e - 1 terms of m(0),..., m(e-1) form an arithmetic sequence and construct a minimal set of generators for the derivation module Der(K)(A) of A and write an explicit formula for mu (Der(K)(A)).
Resumo:
A fuzzy logic intelligent system is developed for gas-turbine fault isolation. The gas path measurements used for fault isolation are exhaust gas temperature, low and high rotor speed, and fuel flow. These four measurements are also called the cockpit parameters and are typically found in almost all older and newer jet engines. The fuzzy logic system uses rules developed from a model of performance influence coefficients to isolate engine faults while accounting for uncertainty in gas path measurements. It automates the reasoning process of an experienced powerplant engineer. Tests with simulated data show that the fuzzy system isolates faults with an accuracy of 89% with only the four cockpit measurements. However, if additional pressure and temperature probes between the compressors and before the burner, which are often found in newer jet engines, are considered, the fault isolation accuracy rises to as high as 98%. In addition, the additional sensors are useful in keeping the fault isolation system robust as quality of the measured data deteriorates.
Resumo:
Satisfiability algorithms for propositional logic have improved enormously in recently years. This improvement increases the attractiveness of satisfiability methods for first-order logic that reduce the problem to a series of ground-level satisfiability problems. R. Jeroslow introduced a partial instantiation method of this kind that differs radically from the standard resolution-based methods. This paper lays the theoretical groundwork for an extension of his method that is general enough and efficient enough for general logic programming with indefinite clauses. In particular we improve Jeroslow's approach by (1) extending it to logic with functions, (2) accelerating it through the use of satisfiers, as introduced by Gallo and Rago, and (3) simplifying it to obtain further speedup. We provide a similar development for a "dual" partial instantiation approach defined by Hooker and suggest a primal-dual strategy. We prove correctness of the primal and dual algorithms for full first-order logic with functions, as well as termination on unsatisfiable formulas. We also report some preliminary computational results.
Resumo:
Prediction of variable bit rate compressed video traffic is critical to dynamic allocation of resources in a network. In this paper, we propose a technique for preprocessing the dataset used for training a video traffic predictor. The technique involves identifying the noisy instances in the data using a fuzzy inference system. We focus on three prediction techniques, namely, linear regression, neural network and support vector regression and analyze their performance on H.264 video traces. Our experimental results reveal that data preprocessing greatly improves the performance of linear regression and neural network, but is not effective on support vector regression.
Resumo:
Effective network overload alleviation is very much essential in order to maintain security and integrity from the operational viewpoint of deregulated power systems. This paper aims at developing a methodology to reschedule the active power generation from the sources in order to manage the network congestion under normal/contingency conditions. An effective method has been proposed using fuzzy rule based inference system. Using virtual flows concept, which provides partial contributions/counter flows in the network elements is used as a basis in the proposed method to manage network congestions to the possible extent. The proposed method is illustrated on a sample 6 bus test system and on modified IEEE 39 bus system.