Digital Library

947 results for "Unconditional maximum likelihood criterion"

Estimating and evaluating the statistics of gapped local-alignment scores

Abstract:

We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
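The sketch below makes the estimation step concrete. It fits unrelated gapped local-alignment scores by maximum likelihood, under the common assumption that they follow a Gumbel (extreme value) distribution. It also includes the "ignore small E-values" refit that the abstract describes. The function names, the E-value definition (database size times p-value), the cutoff of 0.01 and the number of refit rounds are illustrative assumptions. The paper's own algorithm, including any correction for sequence length, is not reproduced here.

```python
import math
import random


def fit_gumbel_ml(scores):
    """ML fit of a Gumbel law P(S >= s) = 1 - exp(-exp(-lam * (s - mu))); returns (lam, mu)."""
    n = len(scores)
    s_min, mean = min(scores), sum(scores) / n
    if max(scores) == s_min:
        raise ValueError("scores must not all be equal")

    def weighted_mean(lam):
        # Weights exp(-lam * s), shifted by the minimum score to avoid overflow.
        w = [math.exp(-lam * (s - s_min)) for s in scores]
        return sum(wi * s for wi, s in zip(w, scores)) / sum(w)

    def f(lam):
        # Likelihood equation 1/lam = mean - weighted_mean(lam); f decreases in lam.
        return 1.0 / lam - (mean - weighted_mean(lam))

    lo, hi = 1e-8, 1.0
    while f(hi) > 0:
        hi *= 2.0
    for _ in range(60):  # bisection on the unique root
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # Closed-form location for the fitted lam.
    mu = s_min - math.log(sum(math.exp(-lam * (s - s_min)) for s in scores) / n) / lam
    return lam, mu


def gumbel_pvalue(s, lam, mu):
    """P(S >= s) under the fitted law; expm1 keeps very small p-values accurate."""
    z = -lam * (s - mu)
    if z > 700:  # far below mu: the p-value is 1 to double precision
        return 1.0
    return -math.expm1(-math.exp(z))


def fit_ignoring_small_evalues(scores, e_cutoff=0.01, rounds=3):
    """Refit after dropping scores whose E-value (N * p-value) is below e_cutoff."""
    n = len(scores)
    kept = list(scores)
    for _ in range(rounds):
        lam, mu = fit_gumbel_ml(kept)
        kept = [s for s in scores if n * gumbel_pvalue(s, lam, mu) >= e_cutoff]
    return fit_gumbel_ml(kept)


def simulate_gumbel(lam, mu, n, seed=0):
    """Hypothetical null scores: mu - log(E) / lam with E ~ Exp(1) is Gumbel(lam, mu)."""
    rng = random.Random(seed)
    return [mu - math.log(rng.expovariate(1.0)) / lam for _ in range(n)]
```

A handful of very high scores from true homologs inflates the first fit. They receive tiny E-values and are excluded from the refit, which is the behaviour the abstract reports as competitive with a mixture model.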

Enhancing the selection of a model-based clustering with external categorical variables

Abstract:

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables that are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters that both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. Note that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.
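The sketch below illustrates one ingredient. It computes the integrated log-likelihood of a single external categorical variable given a cluster partition. The per-cluster category proportions are integrated against a Dirichlet(1/2, ..., 1/2) prior, which gives a closed-form Dirichlet-multinomial term. Treating this term as part of the integrated joint likelihood is our assumption, since the abstract does not give the criterion's exact formula. The function name is hypothetical.

```python
import math
from collections import Counter


def log_integrated_likelihood_external(partition, external):
    """log p(u | z) for external labels u given cluster labels z, with the category
    proportions in each cluster integrated out under a Dirichlet(1/2, ..., 1/2) prior."""
    levels = set(external)
    n_levels = len(levels)
    joint = Counter(zip(partition, external))  # n_kj: cluster k, category j
    sizes = Counter(partition)                 # n_k
    total = 0.0
    for k, n_k in sizes.items():
        total += math.lgamma(n_levels / 2) - n_levels * math.lgamma(0.5)
        total += sum(math.lgamma(joint[(k, j)] + 0.5) for j in levels)
        total -= math.lgamma(n_k + n_levels / 2)
    return total
```

A partition that agrees with the external variable scores higher than one that cuts across it. In practice such a term would be evaluated for the MAP partition of each fitted mixture and compared across numbers of clusters.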

The serine repeat antigen (SERA) gene family phylogeny in Plasmodium: the impact of GC content and reconciliation of gene and species trees.

Abstract:

Plasmodium falciparum is the parasite responsible for the most acute form of malaria in humans. Recently, the serine repeat antigen (SERA) in P. falciparum has attracted attention as a potential vaccine and drug target, and it has been shown to be a member of a large gene family. To clarify the relationships among the numerous P. falciparum SERAs and to identify orthologs of SERA5 and SERA6 in Plasmodium species affecting rodents, gene trees were inferred from nucleotide and amino acid sequence data for 33 putative SERA homologs in seven species. (A distance method for nucleotide sequences that is specifically designed to accommodate differing GC content yielded results that were largely compatible with the amino acid tree. Standard distance and maximum-likelihood methods for nucleotide sequences, on the other hand, yielded gene trees that differed in important respects.) To infer the pattern of duplication, speciation, and gene loss events in the history of the SERA gene family, the resulting gene trees were then "reconciled" with two competing Plasmodium species tree topologies identified by previous phylogenetic studies. Parsimony of reconciliation was used as a criterion for selecting a gene tree/species tree pair and provided (1) support for one of the two species trees and for the core topology of the amino acid-derived gene tree, (2) a basis for critiquing fine detail in a poorly resolved region of the gene tree, (3) a set of predicted "missing genes" in some species, (4) clarification of the relationships among the P. falciparum SERAs, and (5) some information about SERA5 and SERA6 orthologs in the rodent malaria parasites. Parsimony of reconciliation and a second criterion, the implied mutational pattern at two key active sites in the SERA proteins, were also found to be useful supplements to standard "bootstrap" analysis of inferred topologies.
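Reconciliation by least-common-ancestor (LCA) mapping is the usual way to count the duplications and losses that a gene tree implies on a species tree. The following Python sketch makes several assumptions:

- Trees are rooted, binary and written as nested tuples.
- A dictionary maps each gene leaf to its species.
- The parsimony score weights duplications and losses equally.

It is not the software used in the study, and the names are hypothetical.

```python
def _index_species_tree(tree):
    """Parent and depth of every node of a rooted species tree given as nested tuples."""
    parent, depth = {tree: None}, {tree: 0}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def reconcile(gene_tree, species_tree, species_of):
    """LCA reconciliation of a binary gene tree; returns (duplications, losses)."""
    parent, depth = _index_species_tree(species_tree)

    def lca(a, b):
        while depth[a] > depth[b]:
            a = parent[a]
        while depth[b] > depth[a]:
            b = parent[b]
        while a != b:
            a, b = parent[a], parent[b]
        return a

    counts = {"dup": 0, "loss": 0}

    def walk(node):
        if isinstance(node, str):  # gene leaf: map to its species leaf
            return species_of[node]
        m_left, m_right = walk(node[0]), walk(node[1])
        m = lca(m_left, m_right)
        is_dup = m in (m_left, m_right)
        if is_dup:
            counts["dup"] += 1
        for child_map in (m_left, m_right):
            # Species-tree branches skipped between this node's image and its child's image.
            counts["loss"] += depth[child_map] - depth[m] - (0 if is_dup else 1)
        return m

    walk(gene_tree)
    return counts["dup"], counts["loss"]
```

Take the species tree ((A, B), C). The conflicting gene tree ((A1, C1), B1) costs one duplication and three losses, whereas ((A1, B1), C1) costs nothing. Parsimony of reconciliation makes the same kind of comparison when the same gene tree is scored against two candidate species trees.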

Low-density parity-check codes for nonergodic block-fading channels

Abstract:

We design powerful low-density parity-check (LDPC) codes with iterative decoding for the block-fading channel. We first study the case of maximum-likelihood decoding and show that the design criterion is rather straightforward. Since optimal constructions for maximum-likelihood decoding do not perform well under iterative decoding, we introduce a new family of full-diversity LDPC codes that exhibit near-outage-limit performance under iterative decoding for all block lengths. This family compares favorably with multiplexed parallel turbo codes for nonergodic channels.
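The Singleton-type bound that is standard for block-fading channels makes the notion of full diversity concrete; the abstract does not state it. A code of rate R over an alphabet X spanning n_f fading blocks has diversity at most 1 + floor(n_f (1 - R / log2|X|)). Below is a minimal Python sketch that uses exact fractions so that the floor is not disturbed by rounding. The function name is hypothetical.

```python
import math
from fractions import Fraction


def max_block_fading_diversity(n_blocks, rate, bits_per_symbol=1):
    """Singleton-type diversity bound 1 + floor(n_f * (1 - R / log2|X|)).

    rate should be given exactly, e.g. Fraction(1, 2), to avoid floating-point floor errors.
    """
    r = Fraction(rate) / bits_per_symbol
    return 1 + math.floor(n_blocks * (1 - r))
```

For binary codes of rate 1/2, the bound equals n_f only when n_f ≤ 2. For n_f = 4, a rate of 1/4 is needed to reach full diversity 4.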

Postglacial recolonization at a snail's pace (Trochulus villosus): confronting competing refugia hypotheses using model selection.

Abstract:

The localization of Last Glacial Maximum (LGM) refugia is crucial for understanding a species' history and predicting its response to future climate change. However, many phylogeographical studies lack sampling designs intensive enough to localize these refugia precisely. The hairy land snail Trochulus villosus has a small range centred on Switzerland, which could be covered intensively by sampling 455 individuals from 52 populations. Based on mitochondrial DNA sequences (COI and 16S), we identified two divergent lineages with distinct geographical distributions. Bayesian skyline plots suggested that both lineages expanded at the end of the LGM. To locate the source populations, we applied the principles of ancestral character reconstruction and identified a candidate refugium for each mtDNA lineage: the French Jura and Central Switzerland, both ice-free during the LGM. Additional refugia, however, could not be excluded, as suggested by the microsatellite analysis of a population subset. Modelling the LGM niche of T. villosus, we showed that suitable climatic conditions were expected in the inferred refugia, but potentially also on the nunataks of the alpine ice sheet. In a model selection approach, we compared several alternative recolonization scenarios by computing the Akaike information criterion for their respective maximum-likelihood migration rates. The 'two refugia' scenario received by far the best support given the distribution of genetic diversity in T. villosus populations. Provided that fine-scale sampling designs and various analytical approaches are combined, it is possible to refine our understanding of species responses to environmental change.
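The model-selection step works as follows. Each scenario's maximized log-likelihood and parameter count give an AIC value, and Akaike weights express the relative support for each scenario. The Python sketch below assumes that k counts the migration-rate parameters estimated for each scenario. Any scenario names or numbers passed to it are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Relative support per model: w_i proportional to exp(-(AIC_i - AIC_min) / 2)."""
    best = min(aic_values.values())
    raw = {name: math.exp(-(a - best) / 2) for name, a in aic_values.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}
```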

Molecular Identification of Trichoderma spp. in Garlic and Onion Fields and In Vitro Antagonism Trials on Sclerotium cepivorum

Abstract:

Trichoderma species are non-pathogenic microorganisms that protect against fungal diseases and contribute to increased crop yields. However, not all Trichoderma species have the same effect on a given crop or pathogen, so characterizing and identifying strains at the species level is the first step in using such a microorganism. The aim of this study was to identify, at the species level, five Trichoderma strains isolated from soil samples from garlic and onion fields in Costa Rica by analyzing the ITS1, 5.8S, and ITS2 ribosomal RNA regions, and to determine the individual antagonistic ability of each strain against S. cepivorum Berkeley. To distinguish the strains, the amplified products were analyzed with MEGA v6.0, computing genetic distances under the Tamura-Nei model and building the phylogenetic tree by the Maximum Likelihood method. The evaluated strains belonged to T. harzianum and T. asperellum; however, one of the analyzed strains could not be identified under the species criterion. To evaluate antagonistic ability, the dual culture technique, Bell’s scale, and the percentage inhibition of radial growth (PIRG) were used; one of the T. asperellum isolates gave the best yields under standard solid fermentation conditions.
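PIRG is usually computed from the pathogen's radial growth in a control plate (R1) and in the dual culture (R2) as 100 (R1 - R2) / R1. We assume this common form here, since the abstract does not state the exact formula used.

```python
def pirg(control_radius, dual_culture_radius):
    """Percentage inhibition of radial growth, 100 * (R1 - R2) / R1.

    R1: pathogen radial growth alone; R2: its growth facing the antagonist (same units).
    """
    if control_radius <= 0:
        raise ValueError("control radius must be positive")
    return 100.0 * (control_radius - dual_culture_radius) / control_radius
```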

A performance lower bound for quadratic timing recovery accounting for the symbol transition density

Abstract:

The symbol transition density in a digitally modulated signal affects the performance of practical synchronization schemes designed for timing recovery. This paper focuses on deriving simple performance limits for estimating the time delay of a noisy linearly modulated signal in the presence of various degrees of symbol correlation produced by different transition densities in the symbol streams. The paper develops high- and low-signal-to-noise-ratio (SNR) approximations of the so-called (Gaussian) unconditional Cramér–Rao bound (UCRB), as well as general expressions that are applicable over the whole SNR range. The derived bounds are valid only for the class of quadratic, non-data-aided (NDA) timing recovery schemes. To illustrate their validity, the bounds are compared with the actual performance achieved by some well-known quadratic NDA timing recovery schemes. The impact of the symbol transition density on the classical threshold effect present in NDA timing recovery schemes is also analyzed. Previous work on performance bounds for timing recovery by various authors is generalized and unified in this contribution.
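A well-known member of the class of quadratic NDA timing recovery schemes is the Oerder-Meyr square-law estimator. It reads the timing phase off the symbol-rate spectral line of |x(t)|^2. The Python sketch below makes these assumptions:

- noise-free BPSK;
- raised-cosine pulses with roll-off 0.5, truncated in time;
- four samples per symbol.

It shows the estimator only and does not compute the UCRB derived in the paper.

```python
import cmath
import math
import random


def raised_cosine(t, alpha=0.5):
    """Raised-cosine pulse with symbol period T = 1 and roll-off 0 < alpha <= 1."""
    if abs(t) < 1e-12:
        return 1.0
    if abs(abs(t) - 1 / (2 * alpha)) < 1e-9:
        # Removable singularity at t = +/- T / (2 * alpha).
        x = 1 / (2 * alpha)
        return (math.pi / 4) * math.sin(math.pi * x) / (math.pi * x)
    sinc = math.sin(math.pi * t) / (math.pi * t)
    return sinc * math.cos(math.pi * alpha * t) / (1 - (2 * alpha * t) ** 2)


def random_bpsk(n, seed=0):
    """Hypothetical i.i.d. equiprobable +/-1 symbols."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def sample_signal(symbols, tau, sps=4, alpha=0.5, span=8):
    """Noise-free samples x(k / sps) = sum_n a_n p(k / sps - n - tau), pulse cut at +/- span."""
    out = []
    for k in range(len(symbols) * sps):
        t = k / sps
        centre = round(t - tau)
        lo, hi = max(0, centre - span), min(len(symbols), centre + span + 1)
        out.append(sum(symbols[n] * raised_cosine(t - n - tau, alpha) for n in range(lo, hi)))
    return out


def oerder_meyr(samples, sps=4):
    """Square-law NDA timing estimate, as a fraction of T in [0, 1)."""
    # Fourier coefficient of |x|^2 at the symbol rate; its phase is -2*pi*tau/T.
    acc = sum(abs(x) ** 2 * cmath.exp(-2j * math.pi * k / sps) for k, x in enumerate(samples))
    return (-cmath.phase(acc) / (2 * math.pi)) % 1.0
```

Raised-cosine pulses sum to a constant. A stream with no symbol transitions, where all symbols are equal, therefore gives an essentially constant |x|^2 away from the ends of the block, and the spectral line that the estimator relies on disappears. This is one way to see why the transition density matters for this class of schemes.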

Joint array combining and MLSE for single-user receivers in multipath Gaussian multiuser channels

Abstract:

The well-known structure of an array combiner followed by a maximum likelihood sequence estimator (MLSE) receiver is the basis for deriving a space-time processor with good co-channel and intersymbol interference rejection properties. Using spatial diversity at the receiver front end together with a scalar MLSE implies a joint design of the spatial combiner and the impulse response for the sequence detector. This is addressed using the MMSE criterion under the constraint that the desired user's signal power is not cancelled, which yields an impulse response for the sequence detector that is matched to the channel and combiner response. The procedure maximizes the signal-to-noise ratio at the input of the detector and performs very well in realistic multipath channels.
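The scalar MLSE stage can be sketched with the Viterbi algorithm. This Python sketch assumes:

- BPSK symbols;
- a known real effective impulse response h (in the paper, the combined channel and combiner response);
- white Gaussian noise at the detector input;
- known symbols before time 0.

The joint MMSE design of the combiner is not shown, and the helper names are hypothetical.

```python
def isi_channel(symbols, h, preamble=-1.0):
    """Noise-free y_k = sum_i h[i] * a_{k-i}, with a_j = preamble for j < 0."""
    memory = len(h) - 1
    padded = [preamble] * memory + list(symbols)
    return [sum(h[i] * padded[k + memory - i] for i in range(len(h))) for k in range(len(symbols))]


def mlse_viterbi(received, h, alphabet=(-1.0, 1.0), preamble=-1.0):
    """Viterbi MLSE: minimizes sum_k (y_k - sum_i h[i] a_{k-i})^2, the ML metric in white Gaussian noise."""
    memory = len(h) - 1
    start = (preamble,) * memory
    costs, paths = {start: 0.0}, {start: []}
    for y in received:
        new_costs, new_paths = {}, {}
        for state, cost in costs.items():  # state = (a_{k-1}, ..., a_{k-memory})
            for a in alphabet:
                taps = (a,) + state
                pred = sum(hi * s for hi, s in zip(h, taps))
                metric = cost + (y - pred) ** 2
                nxt = taps[:memory]
                if nxt not in new_costs or metric < new_costs[nxt]:  # keep the survivor
                    new_costs[nxt] = metric
                    new_paths[nxt] = paths[state] + [a]
        costs, paths = new_costs, new_paths
    best = min(costs, key=costs.get)
    return paths[best]
```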

Indirect likelihood inference (revised)

Abstract:

Standard indirect inference (II) estimators take a given finite-dimensional statistic, Z_n, and estimate the parameters by matching the sample statistic with the model-implied population moment. We propose a novel estimation method that uses all the information contained in the distribution of Z_n, not just its first moment. This is done by computing the likelihood of Z_n and then estimating the parameters either by maximizing that likelihood or by computing the posterior mean for a given prior on the parameters. These are referred to as the maximum indirect likelihood (MIL) and Bayesian indirect likelihood (BIL) estimators, respectively. We show that the IL estimators are first-order equivalent to the corresponding moment-based II estimator that employs the optimal weighting matrix. However, owing to higher-order features of Z_n, the IL estimators are higher-order efficient relative to the standard II estimator. The likelihood of Z_n is in general unknown, so simulated versions of the IL estimators are developed. Monte Carlo results for a structural auction model and a DSGE model show that the proposed estimators indeed have attractive finite-sample properties.
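A toy problem, chosen by us rather than taken from the paper, illustrates a simulated MIL estimator. The data are exponential with unknown rate, and Z_n is the sample median. The unknown density of Z_n is replaced by a Gaussian kernel density estimate built from simulated medians, and the MIL estimate maximizes that simulated log-density over a grid. The kernel, bandwidth rule, number of simulations and function names are all assumptions.

```python
import math
import random
import statistics


def simulate_exp1_medians(n, n_sims, seed=0):
    """Sample medians of n_sims simulated Exp(1) data sets of size n."""
    rng = random.Random(seed)
    return [statistics.median(rng.expovariate(1.0) for _ in range(n)) for _ in range(n_sims)]


def kde_log_density(z, sims):
    """Log of a Gaussian kernel density estimate at z (Silverman's rule-of-thumb bandwidth)."""
    h = 1.06 * statistics.stdev(sims) * len(sims) ** (-0.2)
    total = sum(math.exp(-0.5 * ((z - x) / h) ** 2) for x in sims)
    return math.log(total / (len(sims) * h * math.sqrt(2 * math.pi)) + 1e-300)


def simulated_mil_rate(z_obs, n, grid, n_sims=500, seed=0):
    """Simulated MIL estimate of an exponential rate, using Z_n = sample median."""
    base = simulate_exp1_medians(n, n_sims, seed)

    def log_indirect_likelihood(theta):
        # Exp(theta) data equal Exp(1) data divided by theta, so the same draws are
        # reused for every theta (common random numbers keep the objective smooth).
        return kde_log_density(z_obs, [m / theta for m in base])

    return max(grid, key=log_indirect_likelihood)
```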

Estimators and Tests based on Likelihood-Depth with Application to Weibull Distribution, Gaussian and Gumbel Copula

Abstract:

In this thesis, (outlier-)robust estimators and tests for the unknown parameter of a continuous density function are developed using the likelihood depth introduced by Mizera and Müller (2004). The methods are then applied to three different distributions. For one-dimensional parameters, the likelihood depth of a parameter in a data set is computed as the minimum of two proportions: the proportion of observations for which the derivative of the log-likelihood function with respect to the parameter is non-negative, and the proportion for which this derivative is non-positive. The parameter with the greatest depth is therefore the one for which both counts are equal. This parameter is initially chosen as the estimator, since the likelihood depth is intended to measure how well a parameter fits the data set. Asymptotically, the parameter with the greatest depth is the one for which the probability that the derivative of the log-likelihood function with respect to the parameter is non-negative for an observation equals one half. If this does not hold for the true underlying parameter, the estimator based on the likelihood depth is biased. This thesis shows how this bias can be corrected so that the corrected estimators are consistent. To develop tests for the parameter, the simplicial likelihood depth developed by Müller (2005), which is a U-statistic, is used. It turns out that, for the same distributions for which the likelihood depth yields biased estimators, the simplicial likelihood depth is an unbiased U-statistic. In particular, its asymptotic distribution is therefore known, and tests for various hypotheses can be formulated. For some hypotheses, however, the shift in depth leads to poor power of the corresponding test. Corrected tests are therefore introduced, together with conditions under which they are consistent. The thesis consists of two parts.
The first part presents the general theory of the estimators and tests and shows their consistency. In the second part, the theory is applied to three distributions: the Weibull distribution and the Gaussian and Gumbel copulas. This shows how the methods of the first part can be used to derive (robust) consistent estimators and tests for the unknown parameter of the distribution. Overall, robust estimators and tests can be found for all three distributions using likelihood depth. On uncontaminated data, existing standard methods are in some cases superior, but the new methods show their advantage on contaminated data and data with outliers.
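A one-parameter example, chosen for illustration, shows the bias and its correction. Take exponential data, that is, a Weibull distribution with shape fixed at 1. The score is 1/λ - x, so the deepest λ is about 1/median. That value converges to λ/ln 2 rather than λ, and multiplying by ln 2 removes the bias. This Python sketch is not the thesis's general correction, and the helper names are hypothetical.

```python
import math
import random


def likelihood_depth(theta, data, score):
    """Likelihood depth of theta: min(share with score >= 0, share with score <= 0)."""
    values = [score(theta, x) for x in data]
    nonneg = sum(1 for v in values if v >= 0)
    nonpos = sum(1 for v in values if v <= 0)
    return min(nonneg, nonpos) / len(data)


def exp_score(rate, x):
    """Derivative of log(rate * exp(-rate * x)) with respect to rate."""
    return 1.0 / rate - x


def deepest_rate(data, grid):
    """Grid rate with maximal likelihood depth; close to 1 / median(data)."""
    return max(grid, key=lambda r: likelihood_depth(r, data, exp_score))


def contaminated_exponential(rate, n, n_outliers, outlier, seed=0):
    """Hypothetical test data: an Exp(rate) sample plus n_outliers copies of `outlier`."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)] + [outlier] * n_outliers
```

With about 1% gross outliers, the bias-corrected deepest rate, `math.log(2) * deepest_rate(...)`, stays close to the true rate. The ordinary maximum likelihood estimate 1/mean breaks down completely.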

A likelihood ratio approach to family-based association studies with covariates

Abstract:

We introduce a procedure for association-based analysis of nuclear families that allows for dichotomous and more general measurements of phenotype and the inclusion of covariate information. Standard generalized linear models are used to relate phenotype to its predictors. Our test procedure, based on the likelihood ratio, unifies the estimation of all parameters through the likelihood itself and yields maximum likelihood estimates of the genetic relative risk and interaction parameters. Our method has advantages in modelling the covariate and gene-covariate interaction terms over recently proposed conditional score tests that include covariate information via a two-stage modelling approach. We apply our method to a study of human systemic lupus erythematosus and C-reactive protein that includes sex as a covariate.
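Any such likelihood ratio test ends by comparing the maximized log-likelihoods of nested models. This Python sketch assumes the usual asymptotic chi-square reference distribution and implements closed-form p-values for 1 or 2 degrees of freedom only. The log-likelihood values would come from the fitted generalized linear models with and without the genetic terms, which are not shown; the function name is hypothetical.

```python
import math


def likelihood_ratio_test(loglik_null, loglik_alt, df=1):
    """LR statistic 2 * (l1 - l0) and its asymptotic chi-square p-value (df = 1 or 2)."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    elif df == 2:
        p = math.exp(-stat / 2.0)             # chi-square(2) upper tail
    else:
        raise ValueError("closed form implemented only for df = 1 or 2")
    return stat, p
```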

Non-stationary frequency analysis of extreme daily rainfall in Sao Paulo, Brazil

Abstract:

This work assesses the frequency of extreme values (EVs) of daily rainfall in the city of Sao Paulo, Brazil, over the period 1933-2005, based on the peaks-over-threshold (POT) and Generalized Pareto Distribution (GPD) approach. Usually, a GPD model is fitted to a sample of POT values selected with a constant threshold. In this work, however, we use time-dependent thresholds composed of relatively high p-quantiles (for example, p = 0.97) of daily rainfall amounts computed from all available data. Samples of POT values were extracted with several values of p. Four different GPD models (GPD-1, GPD-2, GPD-3, and GPD-4) were fitted to each of these samples by the maximum likelihood (ML) method. The shape parameter was assumed constant in all four models, but time-varying covariates were incorporated into the scale parameter of GPD-2, GPD-3, and GPD-4, describing an annual cycle in GPD-2, a linear trend in GPD-3, and both an annual cycle and a linear trend in GPD-4. GPD-1, with constant scale and shape parameters, is the simplest model. To identify the best of the four models, we used the rescaled Akaike Information Criterion (AIC) with second-order bias correction. This criterion selects GPD-3 as the best model, i.e. the one with a positive linear trend in the scale parameter. The slope of this trend is significant against the null hypothesis of no trend at about the 98% confidence level. The non-parametric Mann-Kendall test also showed a positive trend in the annual frequency of excesses over high thresholds, with a p-value of virtually zero. There is therefore strong evidence that high quantiles of daily rainfall in the city of Sao Paulo have been increasing in magnitude and frequency over time. For example, the 0.99 quantiles of daily rainfall amount increased by about 40 mm between 1933 and 2005. Copyright (C) 2008 Royal Meteorological Society
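The likelihood behind a GPD-3-type model, and the corrected AIC used to compare the models, can be written down directly. In this Python sketch the scale is assumed to be linear in a time covariate t, sigma(t) = sigma0 + sigma1 t, with constant shape xi. The parameter names are ours. Maximizing the log-likelihood would require a numerical optimizer, which is not shown.

```python
import math


def gpd_loglik_trend(excesses, times, sigma0, sigma1, xi):
    """GPD log-likelihood of threshold excesses with constant shape xi and
    assumed linear scale sigma(t) = sigma0 + sigma1 * t."""
    total = 0.0
    for y, t in zip(excesses, times):
        sigma = sigma0 + sigma1 * t
        if sigma <= 0:
            return -math.inf
        if abs(xi) < 1e-12:
            total += -math.log(sigma) - y / sigma  # exponential limit as xi -> 0
        else:
            z = 1 + xi * y / sigma
            if z <= 0:  # outside the support
                return -math.inf
            total += -math.log(sigma) - (1 / xi + 1) * math.log(z)
    return total


def aicc(loglik, n_params, n_obs):
    """AIC with the second-order small-sample correction 2k(k + 1) / (n - k - 1)."""
    k = n_params
    return -2 * loglik + 2 * k + 2 * k * (k + 1) / (n_obs - k - 1)
```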

Transformed symmetric models

Abstract:

We introduce, for the first time, a class of transformed symmetric models that extends the Box-Cox models to more general symmetric models. The new class includes all symmetric continuous distributions with a possibly non-linear structure for the mean and allows a wide range of models to be fitted to several types of data. The proposed methods offer more flexible alternatives to Box-Cox and other existing procedures. We derive a very simple iterative process for fitting these models by maximum likelihood, whereas direct unconditional maximization would be more difficult. We give simple formulae for estimating the parameter that indexes the transformation of the response variable and the moments of the original dependent variable, which generalize previously published results. We discuss inference on the model parameters. The usefulness of the new class of models is illustrated with an application to a real dataset.
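For reference, the classical Box-Cox special case (normal errors, no covariates) can be fitted by profiling out the mean and variance and maximizing over the transformation parameter. This Python sketch covers that textbook case only. It is not the paper's iterative algorithm for general symmetric errors, and the helper names are hypothetical.

```python
import math
import random


def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1) / lam, with the limit log(y) at lam = 0 (y > 0)."""
    log_y = math.log(y)
    if abs(lam) < 1e-12:
        return log_y
    return math.expm1(lam * log_y) / lam  # expm1 stays accurate for small lam


def box_cox_profile_loglik(ys, lam):
    """Profile log-likelihood of lam under iid normal errors, constants dropped:
    -(n/2) * log(sigma_hat^2) + (lam - 1) * sum(log y), the last term being the Jacobian."""
    n = len(ys)
    z = [box_cox(y, lam) for y in ys]
    mean = sum(z) / n
    sigma2 = sum((v - mean) ** 2 for v in z) / n
    return -0.5 * n * math.log(sigma2) + (lam - 1) * sum(math.log(y) for y in ys)


def simulate_lognormal(n, seed=0):
    """Hypothetical test data whose logarithm is N(0, 1), so the true lam is 0."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
```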

Improved likelihood inference for the shape parameter in Weibull regression

Abstract:

We obtain adjustments to the profile likelihood function in Weibull regression models with and without censoring. Specifically, we consider two different modified profile likelihoods: (i) the one proposed by Cox and Reid [Cox, D.R. and Reid, N., 1987, Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society B, 49, 1-39.], and (ii) an approximation to the one proposed by Barndorff-Nielsen [Barndorff-Nielsen, O.E., 1983, On a formula for the distribution of the maximum likelihood estimator. Biometrika, 70, 343-365.], the approximation having been obtained using the results by Fraser and Reid [Fraser, D.A.S. and Reid, N., 1995, Ancillaries and third-order significance. Utilitas Mathematica, 47, 33-53.] and by Fraser et al. [Fraser, D.A.S., Reid, N. and Wu, J., 1999, A simple formula for tail probabilities for frequentist and Bayesian inference. Biometrika, 86, 655-661.]. We focus on point estimation and likelihood ratio tests on the shape parameter in the class of Weibull regression models. We derive some distributional properties of the different maximum likelihood estimators and likelihood ratio tests. The numerical evidence presented in the paper favors the approximation to Barndorff-Nielsen's adjustment.
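The quantity being adjusted is the ordinary maximum likelihood estimator. This Python sketch computes the unadjusted MLE of the Weibull shape and scale for an uncensored sample without covariates, by solving the profile score equation for the shape. The Cox-Reid and Barndorff-Nielsen adjustments are not implemented. The plain shape MLE is known to be biased upward in small samples, which is one motivation for such adjustments. The helper names are hypothetical.

```python
import math
import random


def weibull_mle(xs):
    """Unadjusted MLE (shape k, scale) for an uncensored Weibull sample without covariates.

    Solves sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0 for k by bisection.
    """
    logs = [math.log(x) for x in xs]
    if max(logs) == min(logs):
        raise ValueError("need at least two distinct observations")
    n, top = len(xs), max(logs)
    mean_log = sum(logs) / n

    def g(k):
        # Weights x^k rescaled by max(x)^k to avoid overflow; g increases in k.
        w = [math.exp(k * (l - top)) for l in logs]
        return sum(wi * l for wi, l in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-6, 1.0
    while g(hi) < 0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    # Scale MLE: (mean of x^k)^(1/k), computed on the rescaled weights.
    scale = math.exp(top) * (sum(math.exp(k * (l - top)) for l in logs) / n) ** (1.0 / k)
    return k, scale


def simulate_weibull(scale, shape, n, seed=0):
    """Hypothetical test data; random.weibullvariate takes (scale, shape)."""
    rng = random.Random(seed)
    return [rng.weibullvariate(scale, shape) for _ in range(n)]
```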
