937 results for likelihood-based inference


Relevance:

100.00% 100.00%

Abstract:

Given a sample from a fully specified parametric model, let Zn be a given finite-dimensional statistic - for example, an initial estimator or a set of sample moments. We propose to (re-)estimate the parameters of the model by maximizing the likelihood of Zn. We call this the maximum indirect likelihood (MIL) estimator. We also propose a computationally tractable Bayesian version of the estimator which we refer to as a Bayesian Indirect Likelihood (BIL) estimator. In most cases, the density of the statistic will be of unknown form, and we develop simulated versions of the MIL and BIL estimators. We show that the indirect likelihood estimators are consistent and asymptotically normally distributed, with the same asymptotic variance as that of the corresponding efficient two-step GMM estimator based on the same statistic. However, our likelihood-based estimators, by taking into account the full finite-sample distribution of the statistic, are higher order efficient relative to GMM-type estimators. Furthermore, in many cases they enjoy a bias reduction property similar to that of the indirect inference estimator. Monte Carlo results for a number of applications including dynamic and nonlinear panel data models, a structural auction model and two DSGE models show that the proposed estimators indeed have attractive finite sample properties.

Relevance:

100.00% 100.00%

Abstract:

In the context of multivariate linear regression (MLR) and seemingly unrelated regressions (SURE) models, it is well known that commonly employed asymptotic test criteria are seriously biased towards overrejection. In this paper, we propose finite- and large-sample likelihood-based test procedures for possibly non-linear hypotheses on the coefficients of MLR and SURE systems.

Relevance:

100.00% 100.00%

Abstract:

The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notions of statistics, such as Information or Likelihood, can be identified in the algebraic structure of A2(P), together with their corresponding notions in compositional data analysis, such as the Aitchison distance or the centered log-ratio transform. In this way, very elaborate aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. For example, combinations of statistical information such as Bayesian updating, and combinations of likelihood and robust M-estimation functions, are simple additions/perturbations in A2(Pprior). Weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood-based statistics for general exponential families turn out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite-dimensional linear subspaces of A2(P), and they correspond to finite-dimensional subspaces formed by their posterior in the dual information space A2(Pprior). The Aitchison norm can be identified with mean Fisher information. The closing constant itself is identified with a generalization of the cumulant function and shown to be the Kullback-Leibler directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback-Leibler information, and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P)-valued random variables, such as estimation functions or likelihoods, gives a further interpretation of Fisher information as the expected squared norm of evidence and a scale-free understanding of unbiased reasoning.
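
The statement that Bayesian updating is a simple addition/perturbation can be illustrated in the finite (simplex) case. The following is a minimal sketch, not taken from the paper: the function names and the numerical prior and likelihood are hypothetical, and only the discrete three-category case is covered rather than the general A2(P) setting.

```python
import math

def closure(weights):
    """Rescale positive weights so they sum to 1 (the closing operation)."""
    total = sum(weights)
    return [w / total for w in weights]

def perturb(p, q):
    """Aitchison perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(p, q)])

def clr(p):
    """Centered log-ratio transform: log components minus their mean log."""
    logs = [math.log(a) for a in p]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical prior over three hypotheses and likelihood of one observation.
prior = [0.5, 0.3, 0.2]
likelihood = [0.10, 0.40, 0.40]

# Bayes' rule is exactly a perturbation of the prior by the likelihood.
posterior = perturb(prior, likelihood)

# In clr coordinates the same update is ordinary vector addition.
clr_sum = [a + b for a, b in zip(clr(prior), clr(likelihood))]
```

In this sketch `clr(posterior)` agrees with `clr_sum` up to rounding, which is the finite-dimensional version of "updating is addition".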

Relevance:

100.00% 100.00%

Abstract:

Inference on the basis of recognition alone is assumed to occur prior to accessing further information (Pachur & Hertwig, 2006). A counterintuitive result of this is the “less-is-more” effect: a drop in the accuracy with which choices are made as to which of two or more items scores highest on a given criterion as more items are learned (Frosch, Beaman & McCloy, 2007; Goldstein & Gigerenzer, 2002). In this paper, we show that less-is-more effects are not unique to recognition-based inference but can also be observed with a knowledge-based strategy provided two assumptions, limited information and differential access, are met. The LINDA model which embodies these assumptions is presented. Analysis of the less-is-more effects predicted by LINDA and by recognition-driven inference shows that these occur for similar reasons and casts doubt upon the “special” nature of recognition-based inference. Suggestions are made for empirical tests to compare knowledge-based and recognition-based less-is-more effects

Relevance:

100.00% 100.00%

Abstract:

Likelihood ratio tests can be substantially size distorted in small- and moderate-sized samples. In this paper, we apply Skovgaard's [Skovgaard, I.M., 2001. Likelihood asymptotics. Scandinavian Journal of Statistics 28, 3-321] adjusted likelihood ratio statistic to exponential family nonlinear models. We show that the adjustment term has a simple compact form that can be easily implemented in standard statistical software. The adjusted statistic is approximately distributed as χ² with a high degree of accuracy. It is applicable in wide generality since it allows both the parameter of interest and the nuisance parameter to be vector-valued. Unlike the modified profile likelihood ratio statistic obtained from Cox and Reid [Cox, D.R., Reid, N., 1987. Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society B 49, 1-39], the adjusted statistic proposed here does not require an orthogonal parameterization. Numerical comparison of likelihood-based tests of varying dispersion favors the test we propose and a Bartlett-corrected version of the modified profile likelihood ratio test recently obtained by Cysneiros and Ferrari [Cysneiros, A.H.M.A., Ferrari, S.L.P., 2006. An improved likelihood ratio test for varying dispersion in exponential family nonlinear models. Statistics and Probability Letters 76 (3), 255-265]. (C) 2008 Elsevier B.V. All rights reserved.
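
For orientation, the unadjusted likelihood ratio statistic that such corrections start from can be sketched as follows. This is an illustration only and does not implement Skovgaard's adjustment: it assumes a one-parameter exponential model, and the sample and the names `exp_loglik` and `lr_test` are hypothetical. The p-value uses the chi-squared reference distribution with one degree of freedom, which is exactly the large-sample approximation that can be unreliable in small samples.

```python
import math

def exp_loglik(rate, data):
    """Log-likelihood of an i.i.d. exponential sample with the given rate."""
    return len(data) * math.log(rate) - rate * sum(data)

def lr_test(data, rate0):
    """Unadjusted likelihood ratio test of H0: rate == rate0."""
    mle = len(data) / sum(data)  # maximum likelihood estimate of the rate
    stat = 2.0 * (exp_loglik(mle, data) - exp_loglik(rate0, data))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Survival function of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return mle, stat, p_value

# Hypothetical small sample; with n = 6 the chi-squared approximation is rough.
sample = [0.8, 1.9, 0.4, 2.7, 1.1, 0.6]
mle, stat, p_value = lr_test(sample, rate0=1.0)
```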

Relevance:

100.00% 100.00%

Abstract:

An extension of some standard likelihood based procedures to heteroscedastic nonlinear regression models under scale mixtures of skew-normal (SMSN) distributions is developed. This novel class of models provides a useful generalization of the heteroscedastic symmetrical nonlinear regression models (Cysneiros et al., 2010), since the random term distributions cover both symmetric as well as asymmetric and heavy-tailed distributions such as skew-t, skew-slash, skew-contaminated normal, among others. A simple EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters is presented and the observed information matrix is derived analytically. In order to examine the performance of the proposed methods, some simulation studies are presented to show the robust aspect of this flexible class against outlying and influential observations and that the maximum likelihood estimates based on the EM-type algorithm do provide good asymptotic properties. Furthermore, local influence measures and the one-step approximations of the estimates in the case-deletion model are obtained. Finally, an illustration of the methodology is given considering a data set previously analyzed under the homoscedastic skew-t nonlinear regression model. (C) 2012 Elsevier B.V. All rights reserved.

Relevance:

100.00% 100.00%

Abstract:

This thesis presents Bayesian solutions to inference problems for three types of social network data structures: a single observation of a social network, repeated observations on the same social network, and repeated observations on a social network developing through time. A social network is conceived as being a structure consisting of actors and their social interaction with each other. A common conceptualisation of social networks is to let the actors be represented by nodes in a graph with edges between pairs of nodes that are relationally tied to each other according to some definition. Statistical analysis of social networks is to a large extent concerned with modelling of these relational ties, which lends itself to empirical evaluation. The first paper deals with a family of statistical models for social networks called exponential random graphs that takes various structural features of the network into account. In general, the likelihood functions of exponential random graphs are only known up to a constant of proportionality. A procedure for performing Bayesian inference using Markov chain Monte Carlo (MCMC) methods is presented. The algorithm consists of two basic steps, one in which an ordinary Metropolis-Hastings updating step is used, and another in which an importance sampling scheme is used to calculate the acceptance probability of the Metropolis-Hastings step. In paper number two a method for modelling reports given by actors (or other informants) on their social interaction with others is investigated in a Bayesian framework. The model contains two basic ingredients: the unknown network structure and functions that link this unknown network structure to the reports given by the actors. These functions take the form of probit link functions.
An intrinsic problem is that the model is not identified, meaning that there are combinations of values on the unknown structure and the parameters in the probit link functions that are observationally equivalent. Instead of using restrictions for achieving identification, it is proposed that the different observationally equivalent combinations of parameters and unknown structure be investigated a posteriori. Estimation of parameters is carried out using Gibbs sampling with a switching device that enables transitions between posterior modal regions. The main goal of the procedures is to provide tools for comparisons of different model specifications. Papers 3 and 4 propose Bayesian methods for longitudinal social networks. The premise of the models investigated is that overall change in social networks occurs as a consequence of sequences of incremental changes. Models for the evolution of social networks using continuous-time Markov chains are meant to capture these dynamics. Paper 3 presents an MCMC algorithm for exploring the posteriors of parameters for such Markov chains. More specifically, the unobserved evolution of the network between observations is explicitly modelled, thereby avoiding the need to deal with explicit formulas for the transition probabilities. This enables likelihood-based parameter inference in a wider class of network evolution models than has been available before. Paper 4 builds on the proposed inference procedure of Paper 3 and demonstrates how to perform model selection for a class of network evolution models.
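
The ordinary Metropolis-Hastings updating step mentioned in this abstract can be sketched as below. This is a generic random-walk sampler under stated assumptions, not the thesis's algorithm: the target is a hypothetical one-dimensional density known only up to a constant that does not depend on the parameter, so the constant cancels in the acceptance ratio. The harder case described in the abstract, where the unknown constant depends on the parameter and the acceptance probability is estimated by importance sampling, is not handled here.

```python
import math
import random

def log_unnormalised_target(theta):
    """Hypothetical target: standard normal kernel without its constant."""
    return -0.5 * theta * theta

def metropolis_hastings(log_target, start, n_steps, step_size, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    draws = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_target(proposal)
        # Symmetric proposal: accept with probability min(1, target ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        draws.append(current)  # a rejected proposal repeats the current state
    return draws

draws = metropolis_hastings(log_unnormalised_target, start=0.0,
                            n_steps=20000, step_size=1.0, seed=1)
sample_mean = sum(draws) / len(draws)
```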

Relevance:

100.00% 100.00%

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance:

90.00% 90.00%

Abstract:

In this paper we propose exact likelihood-based mean-variance efficiency tests of the market portfolio in the context of the Capital Asset Pricing Model (CAPM), allowing for a wide class of error distributions which include normality as a special case. These tests are developed in the framework of multivariate linear regressions (MLR). It is well known, however, that despite their simple statistical structure, standard asymptotically justified MLR-based tests are unreliable. In financial econometrics, exact tests have been proposed for a few specific hypotheses [Jobson and Korkie (Journal of Financial Economics, 1982), MacKinlay (Journal of Financial Economics, 1987), Gibbons, Ross and Shanken (Econometrica, 1989), Zhou (Journal of Finance 1993)], most of which depend on normality. For the Gaussian model, our tests correspond to Gibbons, Ross and Shanken’s mean-variance efficiency tests. In non-Gaussian contexts, we reconsider mean-variance efficiency tests allowing for multivariate Student-t and Gaussian mixture errors. Our framework allows us to cast more evidence on whether the normality assumption is too restrictive when testing the CAPM. We also propose exact multivariate diagnostic checks (including tests for multivariate GARCH and a multivariate generalization of the well-known variance ratio tests) and goodness-of-fit tests, as well as a set estimate for the intervening nuisance parameters. Our results [over five-year subperiods] show the following: (i) multivariate normality is rejected in most subperiods, (ii) residual checks reveal no significant departures from the multivariate i.i.d. assumption, and (iii) mean-variance efficiency of the market portfolio is not rejected as frequently once allowance is made for the possibility of non-normal errors.

Relevance:

90.00% 90.00%

Abstract:

This paper, prepared for the Handbook of Statistics (Vol. 14: Statistical Methods in Finance), surveys the subject of stochastic volatility. The following subjects are covered: volatility in financial markets (instantaneous volatility of asset returns, implied volatilities in option prices and related stylized facts), statistical modelling in discrete and continuous time and, finally, statistical inference (methods of moments, quasi-maximum likelihood, likelihood-based and Bayesian methods and indirect inference).

Relevance:

90.00% 90.00%

Abstract:

This thesis comprises three articles, one of which is published and two of which are in preparation. The central subject of the thesis is the treatment of representative outliers in two important aspects of surveys: small area estimation and imputation in the presence of item non-response. Regarding small areas, robust estimators under unit-level models have been studied. Sinha & Rao (2009) propose a robust version of the empirical best linear unbiased predictor for the small area mean. Their robust estimator is of the plug-in type and, in light of the work of Chambers (1986), this estimator can be biased in some situations. Chambers et al. (2014) propose a bias-corrected estimator. In addition, a mean squared error estimator has been associated with these point estimators. Sinha & Rao (2009) propose a parametric bootstrap procedure for estimating the mean squared error. Analytical methods are proposed in Chambers et al. (2014). However, their theoretical validity has not been established and their empirical performance is not fully satisfactory. Here, we examine two new approaches for obtaining a robust version of the empirical best linear unbiased predictor: the first is founded on the work of Chambers (1986), and the second is based on the concept of conditional bias as a measure of the influence of a population unit. These two classes of robust small area estimators also include a bias correction term. However, both use the information available in all areas, unlike the estimator of Chambers et al. (2014), which uses only the information available in the area of interest.
In some situations, a non-negligible bias is possible for the estimator of Sinha & Rao (2009), whereas the proposed estimators exhibit a small bias for an appropriate choice of the influence function and of the robustness constant. Monte Carlo simulations are carried out, and comparisons are made between the proposed estimators and those of Sinha & Rao (2009) and Chambers et al. (2014). The results show that the estimators of Sinha & Rao (2009) and Chambers et al. (2014) can have a substantial bias, whereas the proposed estimators perform better in terms of bias and mean squared error. In addition, we propose a new bootstrap procedure for estimating the mean squared error of robust small area estimators. Unlike existing procedures, we formally show the asymptotic validity of the proposed bootstrap method. Moreover, the proposed method is semi-parametric, that is, it does not rely on an assumption about the distributions of the errors or of the random effects. It is therefore particularly attractive and more widely applicable. We examine the performance of our bootstrap procedure with Monte Carlo simulations. The results show that our procedure performs well and, in particular, performs better than all the competitors studied. An application of the proposed method is illustrated by analysing the real data of Battese, Harter & Fuller (1988), which contain outliers. Regarding imputation in the presence of item non-response, some forms of single imputation have been studied. Deterministic regression imputation within classes, which includes ratio imputation and mean imputation, is often used in surveys.
These imputation methods can lead to biased imputed estimators if the imputation model or the non-response model is not correctly specified. Doubly robust estimators have been developed in recent years. These estimators are unbiased if at least one of the imputation or non-response models is correctly specified. However, in the presence of outliers, doubly robust imputed estimators can be very unstable. Using the concept of conditional bias, we propose an outlier-robust version of the doubly robust estimator. The results of the simulation studies show that the proposed estimator performs well for an appropriate choice of the robustness constant.

Relevance:

90.00% 90.00%

Abstract:

The present study investigates the systematics and evolution of the Neotropical genus Deuterocohnia Mez (Bromeliaceae). It provides a comprehensive taxonomic revision as well as phylogenetic analyses based on chloroplast and nuclear DNA sequences and presents a hypothesis on the evolution of the genus. A broad morphological, anatomical, biogeographical and ecological overview of the genus is given in the first part of the study. For morphological character assessment more than 700 herbarium specimens from 39 herbaria as well as living plant material in the field and in the living collections of botanical gardens were carefully examined. The arid habitats, in which the species of Deuterocohnia grow, are reflected by the morphological and anatomical characters of the species. Important characters for species delimitation were identified, like the length of the inflorescence, the branching order, the density of flowers on partial inflorescences, the relation of the length of the primary bracts to that of the partial inflorescence, the sizes of floral bracts, sepals and petals, flower colour, the presence or absence of a pedicel, the curvature of the stamina and the petals during anthesis. After scrutinizing the nomenclatural history of the taxa belonging to Deuterocohnia – including the genus Abromeitiella, synonymized in 1992 – 17 species, 4 subspecies and 4 varieties are accepted in the present revision. Taxonomic changes were made in the following cases: (I) New combinations: A. abstrusa (A. Cast.) N. Schütz is re-established – as defined by Castellanos (1931) – and transferred to D. abstrusa; D. brevifolia (Griseb.) M.A. Spencer & L.B. Sm. includes accessions of the former D. lorentziana (Mez) M.A. Spencer & L.B. Sm., which are not assigned to D. abstrusa; D. bracteosa W. Till is synonymized to D. strobilifera Mez; D. meziana Kuntze ex Mez var. carmineo-viridiflora Rauh is classified as a subspecies of D. meziana (ssp. carmineo-viridiflora (Rauh) N. Schütz); D.
pedicellata W. Till is classified as a subspecies of D. meziana (ssp. pedicellata (W. Till) N. Schütz); D. scapigera (Rauh & L. Hrom.) M.A. Spencer & L.B. Sm. ssp. sanctae-crucis R. Vásquez & Ibisch is classified as a species (D. sanctae-crucis (R. Vásquez & Ibisch) N. Schütz); (II) New taxa: a new subspecies of D. meziana Kuntze ex Mez is established; a new variety of D. scapigera is established; (the new taxa will be validly published elsewhere); (III) New type: an epitype for D. longipetala was chosen. All other species were kept according to Spencer and Smith (1992) or – in the case of more recently described species – according to the protologue. Besides the nomenclatural notes and the detailed descriptions, information on distribution, habitat and ecology, etymology and taxonomic delimitation is provided for the genus and for each of its species. A key was constructed for the identification of currently accepted species, subspecies and varieties. The key is based on easily detectable morphological characters. The former synonymization of the genus Abromeitiella into Deuterocohnia (Spencer and Smith 1992) is re-evaluated in the present study. Morphological as well as molecular investigations revealed Deuterocohnia incl. Abromeitiella as being monophyletic, with some indications that a monophyletic Abromeitiella lineage arose from within Deuterocohnia. Thus the union of both genera is confirmed. The second part of the present thesis describes and discusses the molecular phylogenies and networks. Molecular analyses of three chloroplast intergenic spacers (rpl32-trnL, rps16-trnK, trnS-ycf3) were conducted with a sample set of 119 taxa. This set included 103 Deuterocohnia accessions from all 17 described species of the genus and 16 outgroup taxa from the remainder of Pitcairnioideae s.str. (Dyckia (8 sp.), Encholirium (2 sp.), Fosterella (4 sp.) and Pitcairnia (2 sp.)).
With its high sampling density, the present investigation by far represents the most comprehensive molecular study of Deuterocohnia up till now. All data sets were analyzed separately as well as in combination, and various optimality criteria for phylogenetic tree construction were applied (Maximum Parsimony, Maximum Likelihood, Bayesian inferences and the distance method Neighbour Joining). Congruent topologies were generally obtained with different algorithms and optimality criteria, but individual clades received different degrees of statistical support in some analyses. The rps16-trnK locus was the most informative among the three spacer regions examined. The results of the chloroplast DNA analyses revealed a highly supported paraphyly of Deuterocohnia. Thus, the cpDNA trees divide the genus into two subclades (A and B), of which Deuterocohnia subclade B is sister to the included Dyckia and Encholirium accessions, and both together are sister to Deuterocohnia subclade A. To further examine the relationship between Deuterocohnia and Dyckia/Encholirium at the generic level, two nuclear low copy markers (PRK exon2-5 and PHYC exon1) were analysed with a reduced taxon set. This set included 22 Deuterocohnia accessions (including members of both cpDNA subclades), 2 Dyckia, 2 Encholirium and 2 Fosterella species. Phylogenetic trees were constructed as described above, and for comparison the same reduced taxon set was also analysed at the three cpDNA data loci. In contrast to the cpDNA results, the nuclear DNA data strongly supported the monophyly of Deuterocohnia, which takes a sister position to a clade of Dyckia and Encholirium samples. As morphology as well as nuclear DNA data generated in the present study and in a former AFLP analysis (Horres 2003) all corroborate the monophyly of Deuterocohnia, the apparent paraphyly displayed in cpDNA analyses is interpreted to be the consequence of a chloroplast capture event. 
This involves the introgression of the chloroplast genome from the common ancestor of the Dyckia/Encholirium lineage into the ancestor of Deuterocohnia subclade B species. The chloroplast haplotypes are not species-specific in Deuterocohnia. Thus, one haplotype was sometimes shared by several species, while the same species may harbour different haplotypes. The arrangement of haplotypes followed geographical patterns rather than taxonomic boundaries, which may indicate some residual gene flow among populations from different Deuterocohnia species. Phenotypic species coherence on the background of ongoing gene flow may then be maintained by sets of co-adapted alleles, as was suggested by the porous genome concept (Wu 2001, Palma-Silva et al. 2011). The results of the present study suggest the following scenario for the evolution of Deuterocohnia and its species. Deuterocohnia longipetala may be envisaged as a representative of the ancestral state within the genus. This is supported by (1) the wide distribution of this species; (2) the overlap in distribution area with species of Dyckia; (3) the laxly flowered inflorescences, which are also typical for Dyckia; (4) the yellow petals with a greenish tip, present in most other Deuterocohnia species. The following six extant lineages within Deuterocohnia might have independently been derived from this ancestral state with a few changes each: (I) D. meziana, D. brevispicata and D. seramisiana (Bolivia, lowland to montane areas, mostly reddish-greenish coloured, very laxly to very densely flowered); (II) D. strobilifera (Bolivia, high Andean mountains, yellow flowers, densely flowered); (III) D. glandulosa (Bolivia, montane areas, yellow-greenish flowers, densely flowered); (IV) D. haumanii, D. schreiteri, D. digitata, and D. chrysantha (Argentina, Chile, E Andean mountains and Atacama desert, yellow-greenish flowers, densely flowered); (V) D.
recurvipetala (Argentina, foothills of the Andes, recurved yellow flowers, laxly flowered); (VI) D. gableana, D. scapigera, D. sanctae-crucis, D. abstrusa, D. brevifolia, D. lotteae (former Abromeitiella species, Bolivia, Argentina, higher Andean mountains, greenish-yellow flowers, inflorescence usually simple). Originating from the lower montane Andean regions, at least four lineages of the genus (I, II, IV, VI) adapted in part to higher altitudes by developing densely flowered partial inflorescences, shorter flowers and – in at least three lineages (II, IV, VI) – smaller rosettes, whereas species spreading into the lowlands (I, V) developed larger plants, laxly flowered, amply branched inflorescences and in part larger flowers (I).

Relevance:

90.00% 90.00%

Abstract:

Genetic association analyses of family-based studies with ordered categorical phenotypes are often conducted using methods either for quantitative or for binary traits, which can lead to suboptimal analyses. Here we present an alternative likelihood-based method of analysis for single nucleotide polymorphism (SNP) genotypes and ordered categorical phenotypes in nuclear families of any size. Our approach, which extends our previous work for binary phenotypes, permits straightforward inclusion of covariate, gene-gene and gene-covariate interaction terms in the likelihood, incorporates a simple model for ascertainment and allows for family-specific effects in the hypothesis test. Additionally, our method produces interpretable parameter estimates and valid confidence intervals. We assess the proposed method using simulated data, and apply it to a polymorphism in the c-reactive protein (CRP) gene typed in families collected to investigate human systemic lupus erythematosus. By including sex interactions in the analysis, we show that the polymorphism is associated with anti-nuclear autoantibody (ANA) production in females, while there appears to be no effect in males.

Relevance:

90.00% 90.00%

Abstract:

We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it from both a homogeneous process of evolution and from one characterized principally by differences in rates of evolution. We present studies to show that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning in both these data sets. We implement the mixture model such that it can simultaneously detect rate- and pattern-heterogeneity. The model simplifies to a homogeneous model or a rate-variability model as special cases, and therefore always performs at least as well as these two approaches, and often considerably improves upon them. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.
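
One common way to write a mixture likelihood over alignment sites is as a weighted sum of per-pattern site likelihoods, with the log-likelihood summed over sites. The sketch below assumes that form, which the abstract does not spell out. The per-site likelihood values are hypothetical inputs that would normally come from evaluating each evolutionary pattern on a tree, and `mixture_log_likelihood` is an illustrative name, not the authors' program.

```python
import math

def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a mixture of patterns.

    site_likelihoods[s][k] is the likelihood of site s under pattern k;
    weights[k] is the mixture weight of pattern k (weights sum to 1).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for per_pattern in site_likelihoods:
        # Each site is averaged over patterns, so no partitioning is needed.
        site_lik = sum(w * lik for w, lik in zip(weights, per_pattern))
        total += math.log(site_lik)
    return total

# Hypothetical likelihoods for three sites under two patterns.
site_likelihoods = [[0.20, 0.40], [0.05, 0.01], [0.30, 0.30]]
mixed = mixture_log_likelihood(site_likelihoods, [0.5, 0.5])
homogeneous = mixture_log_likelihood(site_likelihoods, [1.0, 0.0])
```

Putting all the weight on one pattern, as in `homogeneous`, recovers the single-pattern likelihood, which mirrors the abstract's remark that the homogeneous model is a special case.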