44 results for likelihood-based inference
in CentAUR: Central Archive University of Reading - UK
Abstract:
Inferring population admixture from genetic data and quantifying it is a difficult but crucial task in evolutionary and conservation biology. Unfortunately, state-of-the-art probabilistic approaches are computationally demanding. Effectively exploiting the computational power of modern multiprocessor systems can thus have a positive impact on Monte Carlo-based simulation for admixture modeling. A novel parallel approach is briefly described, and promising results on its message passing interface (MPI)-based C implementation are reported.
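The abstract does not reproduce the model or the C/MPI code; as a rough illustration of the message-passing pattern it describes, the following Python sketch (using mpi4py, with a toy sampler standing in for the actual admixture likelihood) distributes independent Monte Carlo replicates across ranks and combines them at the root.

    # Minimal sketch of distributing Monte Carlo replicates over MPI ranks.
    # The beta-distributed "admixture proportion" below is a toy stand-in,
    # not the paper's admixture model.
    from mpi4py import MPI
    import random

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    random.seed(rank)                 # independent pseudo-random stream per rank
    draws_per_rank = 10_000

    # Each rank accumulates its own Monte Carlo estimate.
    local_mean = sum(random.betavariate(2.0, 5.0)
                     for _ in range(draws_per_rank)) / draws_per_rank

    # The root gathers the per-rank estimates and combines them.
    all_means = comm.gather(local_mean, root=0)
    if rank == 0:
        print("combined estimate:", sum(all_means) / len(all_means))

Run with, for example, mpiexec -n 4 python script.py; with a single process it degenerates to an ordinary serial Monte Carlo run.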
Abstract:
Microsatellites are widely used in genetic analyses, many of which require reliable estimates of microsatellite mutation rates, yet the factors determining mutation rates are uncertain. The most straightforward and conclusive method by which to study mutation is direct observation of allele transmissions in parent-child pairs, and studies of this type suggest a positive, possibly exponential, relationship between mutation rate and allele size, together with a bias toward length increase. Except for microsatellites on the Y chromosome, however, previous analyses have not made full use of available data and may have introduced bias: mutations have been identified only where child genotypes could not be generated by transmission from parents' genotypes, so that the probability that a mutation is detected depends on the distribution of allele lengths and varies with allele length. We introduce a likelihood-based approach that has two key advantages over existing methods. First, we can make formal comparisons between competing models of microsatellite evolution; second, we obtain asymptotically unbiased and efficient parameter estimates. Application to data composed of 118,866 parent-offspring transmissions of AC microsatellites supports the hypothesis that mutation rate increases exponentially with microsatellite length, with a suggestion that contractions become more likely than expansions as length increases. This would lead to a stationary distribution for allele length maintained by mutational balance. There is no evidence that contractions and expansions differ in their step size distributions.
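The likelihood itself is not given in the abstract; the sketch below only illustrates the general shape of such an analysis under an assumed Bernoulli model in which the per-transmission mutation probability grows exponentially with allele length, mu(L) = exp(a + b*L). The simulated data, the parameter grid and the omission of the length-dependent detection problem discussed above are all simplifications.

    import math, random
    from collections import Counter

    # Hypothetical parent-offspring transmissions: allele length (repeat units)
    # and whether a mutation was observed; real data and detection issues differ.
    random.seed(0)
    def true_rate(L): return 1e-4 * math.exp(0.15 * (L - 10))
    totals, muts = Counter(), Counter()
    for L in random.choices(range(10, 31), k=100_000):
        totals[L] += 1
        muts[L] += random.random() < true_rate(L)

    def log_lik(a, b):
        """Bernoulli log-likelihood with mutation probability mu(L) = exp(a + b*L)."""
        ll = 0.0
        for L in totals:
            mu = min(math.exp(a + b * L), 1 - 1e-12)
            ll += muts[L] * math.log(mu) + (totals[L] - muts[L]) * math.log1p(-mu)
        return ll

    # Crude grid search for the maximum-likelihood exponential-rate parameters.
    grid = [(a / 10, b / 100) for a in range(-140, -79) for b in range(0, 31)]
    a_hat, b_hat = max(grid, key=lambda ab: log_lik(*ab))
    print("MLE: a =", a_hat, " b =", b_hat)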
Abstract:
The utility of an "ecologically rational" recognition-based decision rule in multichoice decision problems is analyzed, varying the type of judgment required (greater or lesser). The maximum size and range of a counterintuitive advantage associated with recognition-based judgment (the "less-is-more effect") is identified for a range of cue validity values. Greater ranges of the less-is-more effect occur when participants are asked which is the greatest of m choices (m > 2) than when asked which is the least. Less-is-more effects also have greater range for larger values of m. This implies that the classic two-alternative forced-choice task, as studied by Goldstein and Gigerenzer (2002), may not be the most appropriate test case for less-is-more effects.
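For reference, the baseline two-alternative case analysed by Goldstein and Gigerenzer (2002) is commonly written as follows, for N items of which n are recognised, with recognition validity \alpha and knowledge validity \beta; the multichoice (m > 2) and "least" judgments analysed above generalise this expression.

    f(n) = \frac{2n(N-n)}{N(N-1)}\,\alpha + \frac{n(n-1)}{N(N-1)}\,\beta + \frac{(N-n)(N-n-1)}{N(N-1)}\cdot\frac{1}{2}

A less-is-more effect corresponds to f(n) being non-monotone in n, which in this baseline two-alternative model arises only when \alpha > \beta.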
Abstract:
Inference on the basis of recognition alone is assumed to occur prior to accessing further information (Pachur & Hertwig, 2006). A counterintuitive result of this is the “less-is-more” effect: a drop in the accuracy of choosing which of two or more items scores highest on a given criterion as more items are learned (Frosch, Beaman & McCloy, 2007; Goldstein & Gigerenzer, 2002). In this paper, we show that less-is-more effects are not unique to recognition-based inference but can also be observed with a knowledge-based strategy, provided two assumptions, limited information and differential access, are met. The LINDA model, which embodies these assumptions, is presented. Analysis of the less-is-more effects predicted by LINDA and by recognition-driven inference shows that these occur for similar reasons, casting doubt upon the “special” nature of recognition-based inference. Suggestions are made for empirical tests to compare knowledge-based and recognition-based less-is-more effects.
Abstract:
Genetic association analyses of family-based studies with ordered categorical phenotypes are often conducted using methods either for quantitative or for binary traits, which can lead to suboptimal analyses. Here we present an alternative likelihood-based method of analysis for single nucleotide polymorphism (SNP) genotypes and ordered categorical phenotypes in nuclear families of any size. Our approach, which extends our previous work for binary phenotypes, permits straightforward inclusion of covariate, gene-gene and gene-covariate interaction terms in the likelihood, incorporates a simple model for ascertainment and allows for family-specific effects in the hypothesis test. Additionally, our method produces interpretable parameter estimates and valid confidence intervals. We assess the proposed method using simulated data, and apply it to a polymorphism in the C-reactive protein (CRP) gene typed in families collected to investigate human systemic lupus erythematosus. By including sex interactions in the analysis, we show that the polymorphism is associated with anti-nuclear autoantibody (ANA) production in females, while there appears to be no effect in males.
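The full family-based likelihood (with ascertainment, covariates and family-specific effects) is specific to the paper, but its core ingredient, a likelihood for an ordered categorical phenotype given a SNP genotype, can be sketched with a standard proportional-odds model. The thresholds, effect size and toy data below are illustrative assumptions, not the paper's estimates.

    import math

    def ordinal_loglik(thresholds, beta, data):
        """Proportional-odds log-likelihood: P(Y <= k | g) = logistic(theta_k - beta*g).
        `data` is a list of (genotype count 0/1/2, ordered category 0..K-1)."""
        def logistic(x): return 1.0 / (1.0 + math.exp(-x))
        cuts = [-math.inf] + list(thresholds) + [math.inf]
        ll = 0.0
        for g, y in data:
            upper = 1.0 if cuts[y + 1] == math.inf else logistic(cuts[y + 1] - beta * g)
            lower = 0.0 if cuts[y] == -math.inf else logistic(cuts[y] - beta * g)
            ll += math.log(upper - lower)
        return ll

    # Toy usage: three ordered phenotype categories, additive genotype effect.
    toy = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 1)]
    print(ordinal_loglik(thresholds=[-0.5, 0.8], beta=0.6, data=toy))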
Abstract:
We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it both from a homogeneous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate-variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.
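In broad strokes, a mixture model of this kind evaluates the likelihood at each alignment site as a weighted sum over component substitution processes; the notation here is generic rather than the paper's exact formulation.

    \ell = \sum_{i=1}^{S} \log \sum_{k=1}^{K} w_k \, \Pr\left( D_i \mid Q_k, T, \mathbf{b} \right), \qquad \sum_{k=1}^{K} w_k = 1

Here D_i is the data at site i, Q_k the k-th substitution-rate matrix, T the tree with branch lengths b, and w_k the mixture weights. With K = 1 this collapses to a homogeneous model, and with rate-scaled copies of a single Q it collapses to a rate-variability model, which is why the mixture can perform at least as well as those special cases.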
Abstract:
We describe and evaluate a new estimator of the effective population size (Ne), a critical parameter in evolutionary and conservation biology. This new "SummStat" Ne estimator is based upon the use of summary statistics in an approximate Bayesian computation framework to infer Ne. Simulations of a Wright-Fisher population with known Ne show that the SummStat estimator is useful across a realistic range of individuals and loci sampled, generations between samples, and Ne values. We also address the paucity of information about the relative performance of Ne estimators by comparing the SummStat estimator to two recently developed likelihood-based estimators and a traditional moment-based estimator. The SummStat estimator is the least biased of the four estimators compared. In 32 of 36 parameter combinations investigated using initial allele frequencies drawn from a Dirichlet distribution, it has the lowest bias. The relative mean square error (RMSE) of the SummStat estimator was generally intermediate to the others. All of the estimators had RMSE > 1 when small samples (n = 20, five loci) were collected a generation apart. In contrast, when samples were separated by three or more generations and Ne <= 50, the SummStat and likelihood-based estimators all had greatly reduced RMSE. Under the conditions simulated, SummStat confidence intervals were more conservative than those of the likelihood-based estimators and more likely to include true Ne. The greatest strength of the SummStat estimator is its flexible structure. This flexibility allows it to incorporate any potentially informative summary statistic from population genetic data.
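The SummStat machinery itself is not spelled out in the abstract; the toy ABC-rejection sketch below only conveys the general idea for a temporal Wright-Fisher setting: simulate drift under candidate Ne values, compare a summary of allele-frequency change against the observed one, and keep the Ne values that come close. The drift approximation, summary statistic, prior and tolerance are arbitrary illustrative choices, and sampling of individuals is ignored.

    import math, random

    random.seed(1)
    GENS, N_LOCI = 5, 10          # generations between samples, number of loci

    def drift(p, ne, gens):
        """Wright-Fisher drift, using a normal approximation to binomial sampling."""
        for _ in range(gens):
            p += random.gauss(0.0, math.sqrt(max(p * (1 - p), 1e-9) / (2 * ne)))
            p = min(max(p, 0.0), 1.0)
        return p

    def summary(p_then, p_now):
        """Mean standardized squared change in allele frequency across loci."""
        return sum((a - b) ** 2 / (((a + b) / 2) * (1 - (a + b) / 2) + 1e-9)
                   for a, b in zip(p_then, p_now)) / len(p_then)

    # "Observed" data simulated with a known Ne, purely for illustration.
    TRUE_NE = 40
    p0 = [random.uniform(0.2, 0.8) for _ in range(N_LOCI)]
    p1 = [drift(p, TRUE_NE, GENS) for p in p0]
    s_obs = summary(p0, p1)

    # ABC rejection: draw Ne from a uniform prior and keep draws whose simulated
    # summary statistic falls within a tolerance of the observed one.
    accepted = [ne for ne in (random.randint(10, 200) for _ in range(5000))
                if abs(summary(p0, [drift(p, ne, GENS) for p in p0]) - s_obs) < 0.02]
    print("accepted draws:", len(accepted),
          " posterior mean Ne ~", sum(accepted) / max(len(accepted), 1))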
Abstract:
Studies of ignorance-driven decision making have been employed either to analyse, on theoretical grounds, when ignorance should prove advantageous, or to examine whether human behaviour is consistent with an ignorance-driven inference strategy (e.g., the recognition heuristic). In the current study we examine whether, under conditions where such inferences might be expected, the advantages that theoretical analyses predict are evident in human performance data. A single experiment shows that, when asked to make relative wealth judgements, participants reliably use recognition as a basis for their judgements. Their wealth judgements under these conditions are reliably more accurate when some of the target names are unknown than when participants recognize all of the names (a "less-is-more effect"). These results are consistent across a number of variations: the number of options given to participants and the nature of the wealth judgement. A basic model of recognition-based inference predicts these effects.
Abstract:
“Fast & frugal” heuristics represent an appealing way of implementing bounded rationality and decision-making under pressure. The recognition heuristic is the simplest and most fundamental of these heuristics. Simulation and experimental studies have shown that this ignorance-driven heuristic inference can prove superior to knowledge-based inference (Borges, Goldstein, Ortmann & Gigerenzer, 1999; Goldstein & Gigerenzer, 2002) and have shown how the heuristic could develop from ACT-R’s forgetting function (Schooler & Hertwig, 2005). Mathematical analyses also demonstrate that, under certain conditions, a “less-is-more effect” will always occur (Goldstein & Gigerenzer, 2002). The further analyses presented in this paper show, however, that these conditions may constitute a special case and that the less-is-more effect in decision-making is subject to the moderating influence of the number of options to be considered and the framing of the question.
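As a companion to such analyses, a small Monte Carlo sketch can show how accuracy changes with the number of recognised items n and the number of options m. The environment, noise levels (which induce the recognition and knowledge validities) and decision rule below are illustrative assumptions, not the paper's model, so whether a less-is-more effect appears depends on those choices.

    import random

    random.seed(0)
    N = 100                       # total objects in the environment

    def accuracy(n_recognised, m, trials=20_000):
        """Monte Carlo accuracy of a simple recognition-driven rule in an
        m-alternative 'which scores highest?' task, when n of N objects are
        recognised. Recognition and knowledge are noisy reflections of the
        criterion; the noise levels are arbitrary illustrative choices."""
        criterion = [random.gauss(0, 1) for _ in range(N)]
        recog_signal = [c + random.gauss(0, 1.0) for c in criterion]
        knowledge = [c + random.gauss(0, 1.5) for c in criterion]
        recognised = set(sorted(range(N), key=lambda i: recog_signal[i])[N - n_recognised:])
        correct = 0
        for _ in range(trials):
            options = random.sample(range(N), m)
            known = [i for i in options if i in recognised]
            if known:             # recognised options dominate unrecognised ones
                choice = max(known, key=lambda i: knowledge[i])
            else:
                choice = random.choice(options)
            correct += choice == max(options, key=lambda i: criterion[i])
        return correct / trials

    for m in (2, 3):
        row = [round(accuracy(n, m), 3) for n in (0, 25, 50, 75, 100)]
        print(f"m={m}: accuracy for n = 0, 25, 50, 75, 100 ->", row)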
Abstract:
Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference.
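A toy two-stage sketch of this idea, under an assumed model in which the data are 20 draws from N(theta, 1): pilot simulations are used to regress the parameter on the (sorted) data values, the fitted linear combination is taken as the summary statistic (an estimate of the posterior mean), and a standard ABC rejection step then uses that summary. The model, prior, feature choice and tolerance are illustrative, not the paper's.

    import numpy as np

    rng = np.random.default_rng(0)
    N_OBS = 20

    def simulate(theta):
        """Toy model: the observed data are N_OBS draws from N(theta, 1)."""
        return rng.normal(theta, 1.0, N_OBS)

    # Stage 1: pilot simulations and a linear regression of theta on the data;
    # the fitted values approximate the posterior mean and serve as the summary.
    pilot_theta = rng.uniform(-5, 5, 2000)
    pilot_data = np.array([np.sort(simulate(t)) for t in pilot_theta])
    X = np.column_stack([np.ones(len(pilot_theta)), pilot_data])
    coef, *_ = np.linalg.lstsq(X, pilot_theta, rcond=None)

    def summary(data):
        return coef[0] + np.sort(data) @ coef[1:]

    # Stage 2: ordinary ABC rejection using the learned summary statistic.
    observed = rng.normal(1.5, 1.0, N_OBS)        # pretend the true theta is 1.5
    s_obs = summary(observed)
    draws = rng.uniform(-5, 5, 20000)
    sims = np.array([summary(simulate(t)) for t in draws])
    accepted = draws[np.abs(sims - s_obs) < 0.1]
    print("accepted:", accepted.size, " ABC posterior mean:", accepted.mean())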
Abstract:
Competency management is a very important part of a well-functioning organisation. Unfortunately, competency descriptions are not uniformly specified or defined across national, sectoral or organisational borders, leading to an opaque market of competency descriptions with a multitude of competency frameworks and competency benchmarks. An ontology is a formalised description of a domain that enables automated reasoning engines to be built which, by utilising the interrelations between entities, can make “intelligent” choices in different situations within the domain. By introducing formalised competency ontologies, automated tools such as skill gap analysis, training suggestion generation, and job search and recruitment can be developed that compare and contrast different competency descriptions at the semantic level. The major problem with defining a common formalised ontology for competencies is that there are so many viewpoints of competencies and competency frameworks. Work within the TRACE project has focused on finding common trends within different competency frameworks in order to allow an intermediate competency description to be made, which other frameworks can reference. This research has shown that competencies can be divided into “knowledge”, “skills” and what we call “others”. An ontology has been created on this basis, with a simple structure of different “kinds” of “knowledges” and “skills”, using semantic interrelations to define the basic semantic structure of the ontology. A prototype skill gap analysis tool has been developed: personal profiles can be produced using the tool, and a skill gap analysis is performed against a desired competency profile using an ontologically based inference engine, which is able to list the closest fit and possible proficiency gaps.
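To make the skill gap idea concrete, here is a much-simplified sketch: a tiny ontology-like structure with a "kind" and narrower/broader relations, hypothetical competency names and proficiency levels (none of which come from the TRACE ontology), and a gap analysis that accepts a more specific competency as satisfying a broader requirement.

    # Hypothetical, much-simplified competency structure; a real implementation
    # would reason over a full semantic ontology rather than a hard-coded dict.
    ONTOLOGY = {
        "programming":          {"kind": "skill",     "narrower": ["python-programming"]},
        "python-programming":   {"kind": "skill",     "narrower": []},
        "statistics":           {"kind": "knowledge", "narrower": ["bayesian-statistics"]},
        "bayesian-statistics":  {"kind": "knowledge", "narrower": []},
    }

    def satisfies(held, required):
        """A held competency satisfies a requirement if it is the same entry or
        a narrower (more specific) one, following the ontology's relations."""
        if held == required:
            return True
        return any(satisfies(held, n) for n in ONTOLOGY[required]["narrower"])

    def skill_gap(profile, desired):
        """Return required competencies (proficiency 1-5) not covered at the
        required level by the personal profile."""
        gaps = {}
        for req, level in desired.items():
            best = max((lvl for comp, lvl in profile.items() if satisfies(comp, req)),
                       default=0)
            if best < level:
                gaps[req] = level - best
        return gaps

    person = {"python-programming": 4, "statistics": 2}
    wanted = {"programming": 3, "bayesian-statistics": 3}
    print(skill_gap(person, wanted))   # -> {'bayesian-statistics': 3}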
Abstract:
Inferences consistent with “recognition-based” decision-making may be drawn for various reasons other than recognition alone. We demonstrate that, for 2-alternative forced-choice decision tasks, less-is-more effects (reduced performance with additional learning) are not restricted to recognition-based inference but can also be seen in circumstances where inference is knowledge-based but item knowledge is limited. One reason why such effects may not be observed more widely is the dependence of the effect on specific values for the validity of recognition and knowledge cues. We show that both recognition and knowledge validity may vary as a function of the number of items recognized. The implications of these findings for the special nature of recognition information, and for the investigation of recognition-based inference, are discussed.
Abstract:
The paper concerns the design and analysis of serial dilution assays to estimate the infectivity of a sample of tissue when it is assumed that the sample contains a finite number of indivisible infectious units such that a subsample will be infectious if it contains one or more of these units. The aim of the study is to estimate the number of infectious units in the original sample. The standard approach to the analysis of data from such a study is based on the assumption of independence of aliquots both at the same dilution level and at different dilution levels, so that the numbers of infectious units in the aliquots follow independent Poisson distributions. An alternative approach is based on calculation of the expected value of the total number of samples tested that are not infectious. We derive the likelihood for the data on the basis of the discrete number of infectious units, enabling calculation of the maximum likelihood estimate and likelihood-based confidence intervals. We use the exact probabilities that are obtained to compare the maximum likelihood estimate with those given by the other methods in terms of bias and standard error and to compare the coverage of the confidence intervals. We show that the methods have very similar properties and conclude that for practical use the method that is based on the Poisson assumption is to be recommended, since it can be implemented by using standard statistical software. Finally we consider the design of serial dilution assays, concluding that it is important that neither the dilution factor nor the number of samples that remain untested should be too large.
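Under the Poisson/independence assumption that the abstract recommends for practical use, the maximum likelihood estimate and a likelihood-ratio confidence interval can be computed directly; the dilution data, grid and setup below are invented for illustration, and the exact discrete-unit likelihood derived in the paper is not reproduced.

    import math

    # Hypothetical serial dilution data: (fraction of the original sample per
    # aliquot, number of aliquots tested, number found infectious).
    DATA = [(0.1, 6, 6), (0.01, 6, 4), (0.001, 6, 1), (0.0001, 6, 0)]

    def log_lik(lam):
        """Binomial log-likelihood under the Poisson assumption: an aliquot holding
        a fraction f of the sample is infectious with probability 1 - exp(-lam * f)."""
        ll = 0.0
        for f, n, k in DATA:
            p = min(max(1.0 - math.exp(-lam * f), 1e-12), 1 - 1e-12)
            ll += k * math.log(p) + (n - k) * math.log1p(-p)
        return ll

    # Maximum likelihood estimate of the number of infectious units by grid search,
    # and a likelihood-based ~95% interval (log-likelihood within 1.92 of the maximum).
    grid = [x / 10 for x in range(1, 5001)]
    mle = max(grid, key=log_lik)
    lmax = log_lik(mle)
    inside = [lam for lam in grid if log_lik(lam) > lmax - 1.92]
    print("MLE:", mle, " ~95% CI:", (min(inside), max(inside)))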
Abstract:
Micromorphological characters of the fruiting bodies, such as ascus-type and hymenial amyloidity, and secondary chemistry have been widely employed as key characters in Ascomycota classification. However, the evolution of these characters has not yet been studied using molecular phylogenies. We have used a combined Bayesian and maximum-likelihood-based approach to trace character evolution on a tree inferred from a combined analysis of nuclear and mitochondrial ribosomal DNA sequences. The maximum likelihood aspect overcomes simplifications inherent in maximum parsimony methods, whereas the Markov chain Monte Carlo aspect renders results independent of any particular phylogenetic tree. The results indicate that the evolution of the two chemical characters is quite different: the medullary lecanoric acid is stable once developed, whereas the cortical chlorinated xanthones appear to have been lost several times. The current ascus-types and the amyloidity of the hymenial gel in Pertusariaceae appear to have developed within the family. The basal ascus-type of pertusarialean fungi remains unknown.
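The combined Bayesian/maximum-likelihood machinery is not shown in the abstract; as a minimal building block, the sketch below computes the likelihood of a binary character (e.g. presence/absence of a chemical trait) on a fixed four-taxon tree via Felsenstein's pruning algorithm under a symmetric two-state model. The tree, branch lengths and tip states are invented for illustration, and the MCMC over trees that makes the paper's results tree-independent is not included.

    import math

    def p_transition(same, rate, t):
        """Symmetric two-state Markov model: probability of no change (or change)
        along a branch of length t."""
        stay = 0.5 + 0.5 * math.exp(-2.0 * rate * t)
        return stay if same else 1.0 - stay

    def likelihood(node, rate):
        """Felsenstein pruning: partial likelihoods for states 0/1 at `node`.
        A node is either ('tip', state) or ('internal', (child, branch_len), ...)."""
        if node[0] == "tip":
            return [1.0 if s == node[1] else 0.0 for s in (0, 1)]
        partial = [1.0, 1.0]
        for child, t in node[1:]:
            child_L = likelihood(child, rate)
            for s in (0, 1):
                partial[s] *= sum(p_transition(s == c, rate, t) * child_L[c] for c in (0, 1))
        return partial

    # Hypothetical four-taxon tree with a binary character at the tips.
    tree = ("internal",
            (("internal", (("tip", 1), 0.2), (("tip", 1), 0.3)), 0.1),
            (("internal", (("tip", 0), 0.25), (("tip", 1), 0.15)), 0.1))

    for rate in (0.1, 0.5, 2.0):
        root = likelihood(tree, rate)
        print(f"rate={rate}: log-likelihood =", math.log(0.5 * root[0] + 0.5 * root[1]))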