49 results for Maximum likelihood estimate

at Université de Lausanne, Switzerland


Relevance:

100.00%

Abstract:

Discrete data arise in various research fields, typically when the observations are count data. I propose a robust and efficient parametric procedure for the estimation of discrete distributions. The estimation is done in two phases. First, a very robust, but possibly inefficient, estimate of the model parameters is computed and used to identify outliers. Then the outliers are either removed from the sample or given low weights, and a weighted maximum likelihood estimate (WML) is computed. The weights are determined via an adaptive process such that, if the data follow the model, asymptotically no observation is downweighted. I prove that the final estimator inherits the breakdown point of the initial one, and that its influence function at the model is the same as that of the maximum likelihood estimator, which strongly suggests that it is asymptotically fully efficient. The initial estimator is a minimum disparity estimator (MDE). MDEs can be shown to have full asymptotic efficiency, and some MDEs have very high breakdown points and very low bias under contamination. Several initial estimators are considered, and the performance of the WML based on each of them is studied. In a great variety of situations, the WML substantially improves on the initial estimator, both in terms of finite-sample mean squared error and in terms of bias under contamination. Moreover, the performance of the WML is rather stable under a change of MDE, even when the MDEs behave very differently. Two applications of the WML to real data are considered. In both, the need for a robust estimator is clear: the maximum likelihood estimator is badly corrupted by the presence of a few outliers. The procedure is particularly natural in the discrete-distribution setting, but could be extended to the continuous case, for which a possible procedure is sketched.
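
As a rough illustration of the two-phase scheme, here is a minimal Python sketch for a Poisson model. The initial robust estimate is taken as the sample median rather than an actual minimum disparity estimator, and a fixed tail-probability rule stands in for the adaptive cut-off described above, so this is only a simplified stand-in for the WML procedure.

```python
import numpy as np
from scipy import stats

def weighted_mle_poisson(x, z=3.0):
    """Two-phase robust fit of a Poisson mean (illustrative sketch)."""
    x = np.asarray(x, dtype=float)

    # Phase 1: a crude but highly robust initial estimate of the mean.
    lam0 = np.median(x)

    # Identify outliers: observations far in the tails of the initial model
    # (a fixed rule here; the thesis uses an adaptive cut-off instead).
    lo = stats.poisson.ppf(stats.norm.cdf(-z), lam0)
    hi = stats.poisson.ppf(stats.norm.cdf(z), lam0)
    w = ((x >= lo) & (x <= hi)).astype(float)

    # Phase 2: weighted MLE; for the Poisson mean this is the weighted average.
    return np.sum(w * x) / np.sum(w), w

rng = np.random.default_rng(0)
sample = np.concatenate([rng.poisson(4, 95), np.full(5, 40)])  # 5 gross outliers
lam_hat, weights = weighted_mle_poisson(sample)
print(lam_hat, sample.mean())  # robust fit vs. the corrupted plain MLE
```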

Relevance:

100.00%

Abstract:

We extend PML theory to account for information on the conditional moments up to order four, but without assuming a parametric model, to avoid the risk of misspecifying the conditional distribution. The key statistical tool is the quartic exponential family, which allows us to generalize the PML2 and QGPML1 methods proposed in Gourieroux et al. (1984) to PML4 and QGPML2 methods, respectively. An asymptotic theory is developed. The key numerical tool is the Gauss-Freud integration scheme, which solves a computational problem previously raised in several fields. Simulation exercises demonstrate the feasibility and robustness of the methods. [Authors]
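
The quartic exponential family referred to above has densities of the form f(y) ∝ exp(θ1·y + θ2·y² + θ3·y³ + θ4·y⁴) with θ4 < 0. The sketch below evaluates such a log-density, computing the normalizing constant by ordinary numerical quadrature; the authors' Gauss-Freud scheme and the PML4/QGPML2 estimators built on this family are not reproduced here, and the parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad

def quartic_exp_logpdf(y, theta):
    """Log-density of the quartic exponential family
    f(y) = exp(t1*y + t2*y**2 + t3*y**3 + t4*y**4) / Z, with t4 < 0 so the
    density is integrable. Z is obtained here by generic quadrature rather
    than the Gauss-Freud scheme used in the paper."""
    t1, t2, t3, t4 = theta
    assert t4 < 0, "t4 must be negative for integrability"
    kernel = lambda u: np.exp(t1*u + t2*u**2 + t3*u**3 + t4*u**4)
    norm, _ = quad(kernel, -np.inf, np.inf)
    return t1*y + t2*y**2 + t3*y**3 + t4*y**4 - np.log(norm)

# Example: a mildly skewed density concentrated around zero.
print(quartic_exp_logpdf(0.5, theta=(0.1, -0.5, 0.05, -0.1)))
```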

Relevance:

100.00%

Abstract:

Robust estimators for accelerated failure time models with asymmetric (or symmetric) error distribution and censored observations are proposed. It is assumed that the error model belongs to a log-location-scale family of distributions and that the mean response is the parameter of interest. Since scale is a main component of the mean, scale is not treated as a nuisance parameter. A three-step procedure is proposed. In the first step, an initial high-breakdown-point S-estimate is computed. In the second step, observations that are unlikely under the estimated model are rejected or downweighted. Finally, a weighted maximum likelihood estimate is computed. To define the estimates, functions of censored residuals are replaced by their estimated conditional expectation given that the response is larger than the observed censored value. The rejection rule in the second step is based on an adaptive cut-off that, asymptotically, does not reject any observation when the data are generated according to the model. Therefore, the final estimate attains full efficiency at the model, with respect to the maximum likelihood estimate, while maintaining the breakdown point of the initial estimator. Asymptotic results are provided. The new procedure is evaluated with the help of Monte Carlo simulations. Two examples with real data are discussed.
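
One ingredient of this construction is the replacement of a censored residual by its conditional expectation given censoring. Under a normal error model (i.e., a log-normal AFT specification) that conditional mean is the inverse Mills ratio; the snippet below is a minimal sketch of just this step, not of the full three-step estimator.

```python
import numpy as np
from scipy import stats

def expected_residual_given_censoring(c):
    """E[r | r > c] for a standard normal error: the inverse Mills ratio.

    In a log-location-scale AFT model, a right-censored observation only
    tells us that the standardized residual exceeds c; the estimating
    equations replace the unobserved residual by this conditional mean.
    """
    return stats.norm.pdf(c) / stats.norm.sf(c)

# A residual censored at c = 1.0 has conditional mean about 1.53 under normality.
print(expected_residual_given_censoring(1.0))
```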

Relevance:

100.00%

Abstract:

Genome-wide association studies (GWAS) are conducted with the promise of discovering novel genetic variants associated with diverse traits. For most traits, associated markers individually explain just a modest fraction of the phenotypic variation, but their number can well be in the hundreds. We developed a maximum likelihood method that allows us to infer the distribution of associated variants even when many of them were missed by chance. Compared to previous approaches, the novelty of our method is that it (a) does not require an independent (unbiased) estimate of the effect sizes; (b) makes use of the complete distribution of P-values while allowing for the false discovery rate; and (c) takes into account allelic heterogeneity and the SNP pruning strategy. We applied our method to the latest GWAS meta-analysis results of the GIANT consortium. It revealed that while the explained variance of genome-wide (GW) significant SNPs is around 1% for waist-hip ratio (WHR), the observed P-values provide evidence for the existence of variants explaining 10% (CI = [8.5-11.5%]) of the phenotypic variance in total. Similarly, the total explained variance likely to exist for height is estimated to be 29% (CI = [28-30%]), three times higher than what the observed GW significant SNPs account for. This methodology also enables us to predict the benefit of future GWAS that aim to reveal more associated genetic markers via increased sample size.
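
To convey the flavor of inferring hidden associated variants from the full distribution of test statistics, here is a deliberately stripped-down sketch: a two-component mixture for GWAS z-scores is fitted by maximum likelihood, with the non-null component carrying extra variance. The mixture form, the absence of any FDR, pruning, or allelic-heterogeneity handling, and the simulated numbers are all simplifications and do not reproduce the authors' model.

```python
import numpy as np
from scipy import stats, optimize

def fit_z_mixture(z):
    """ML fit of a toy model for GWAS z-scores: with probability pi0 a SNP
    is null (z ~ N(0, 1)); otherwise the true effect adds variance tau2,
    so z ~ N(0, 1 + tau2)."""
    def negloglik(params):
        logit_pi0, log_tau2 = params
        pi0 = 1.0 / (1.0 + np.exp(-logit_pi0))
        tau2 = np.exp(log_tau2)
        dens = (pi0 * stats.norm.pdf(z, 0, 1)
                + (1 - pi0) * stats.norm.pdf(z, 0, np.sqrt(1 + tau2)))
        return -np.sum(np.log(dens))

    res = optimize.minimize(negloglik, x0=[2.0, 0.0], method="Nelder-Mead")
    pi0 = 1.0 / (1.0 + np.exp(-res.x[0]))
    return pi0, np.exp(res.x[1])

rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0, 1, 9000),              # null SNPs
                    rng.normal(0, np.sqrt(1 + 4), 1000)])  # associated SNPs
print(fit_z_mixture(z))  # should recover approximately (0.9, 4.0)
```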

Relevance:

90.00%

Abstract:

The antiretroviral protein TRIM5alpha is known to have evolved different restriction capacities against various retroviruses, driven by positive Darwinian selection. However, how these different specificities have evolved in the primate lineages is not fully understood. Here we used ancestral protein resurrection to estimate the evolution of antiviral restriction specificities of TRIM5alpha on the primate lineage leading to humans. We used TRIM5alpha coding sequences from 24 primates for the reconstruction of ancestral TRIM5alpha sequences using maximum-likelihood and Bayesian approaches. Ancestral sequences were transduced into HeLa and CRFK cells. Stable cell lines were generated and used to test restriction of a panel of extant retroviruses (human immunodeficiency virus type 1 [HIV-1] and HIV-2, simian immunodeficiency virus [SIV] variants SIV(mac) and SIV(agm), and murine leukemia virus [MLV] variants N-MLV and B-MLV). The resurrected TRIM5alpha variant from the common ancestor of Old World primates (Old World monkeys and apes, approximately 25 million years before present) was effective against present day HIV-1. In contrast to the HIV-1 restriction pattern, we show that the restriction efficacy against other retroviruses, such as a murine oncoretrovirus (N-MLV), is higher for more recent resurrected hominoid variants. Ancestral TRIM5alpha variants have generally limited efficacy against HIV-2, SIV(agm), and SIV(mac). Our study sheds new light on the evolution of the intrinsic antiviral defense machinery and illustrates the utility of functional evolutionary reconstruction for characterizing recently emerged protein differences.

Relevance:

90.00%

Abstract:

Nonlinear regression problems can often be reduced to linearity by transforming the response variable (e.g., using the Box-Cox family of transformations). The classic estimates of the parameter defining the transformation as well as of the regression coefficients are based on the maximum likelihood criterion, assuming homoscedastic normal errors for the transformed response. These estimates are nonrobust in the presence of outliers and can be inconsistent when the errors are nonnormal or heteroscedastic. This article proposes new robust estimates that are consistent and asymptotically normal for any unimodal and homoscedastic error distribution. For this purpose, a robust version of conditional expectation is introduced for which the prediction mean squared error is replaced with an M scale. This concept is then used to develop a nonparametric criterion to estimate the transformation parameter as well as the regression coefficients. A finite sample estimate of this criterion based on a robust version of smearing is also proposed. Monte Carlo experiments show that the new estimates compare favorably with respect to the available competitors.
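
A crude sketch of the idea of replacing the Gaussian likelihood criterion with a robust one when choosing a Box-Cox transformation: for each candidate lambda the response is transformed, a robust line is fitted, and the fit is scored by a robust residual scale (MAD here, rather than the M-scale and robust smearing of the article). The data, grid, and Theil-Sen fit are illustrative choices, not the authors' estimator.

```python
import numpy as np
from scipy import stats

def boxcox(y, lam):
    """Box-Cox transform; lam = 0 means log."""
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def robust_boxcox_lambda(x, y, grid=np.linspace(-1, 2, 61)):
    """Pick the transformation parameter minimizing a robust residual scale.

    For each candidate lambda, fit a line by Theil-Sen (robust slope) and
    score it by the MAD of the residuals, rescaled by the geometric-mean
    Jacobian correction so that different lambdas are comparable (as in the
    classical Box-Cox likelihood)."""
    best = None
    for lam in grid:
        t = boxcox(y, lam)
        slope, intercept, *_ = stats.theilslopes(t, x)
        resid = t - (intercept + slope * x)
        jac = np.exp((lam - 1.0) * np.mean(np.log(y)))
        score = stats.median_abs_deviation(resid) / jac
        if best is None or score < best[1]:
            best = (lam, score)
    return best[0]

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = np.exp(0.3 * x + rng.normal(0, 0.2, 200))  # log-linear truth -> lambda near 0
y[:5] *= 20                                     # a few gross outliers
print(robust_boxcox_lambda(x, y))
```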

Relevance:

90.00%

Abstract:

Predictive groundwater modeling requires accurate information about aquifer characteristics. Geophysical imaging is a powerful tool for delineating aquifer properties at an appropriate scale and resolution, but it suffers from problems of ambiguity. One way to overcome such limitations is to adopt a simultaneous multitechnique inversion strategy. We have developed a methodology for aquifer characterization based on structural joint inversion of multiple geophysical data sets, followed by clustering to form zones and subsequent inversion for zonal parameters. Joint inversions based on cross-gradient structural constraints require less restrictive assumptions than, say, applying predefined petrophysical relationships, and generally yield superior results. This approach has, for the first time, been applied to three geophysical data types in three dimensions. A classification scheme using maximum likelihood estimation is used to determine the parameters of a Gaussian mixture model that defines zonal geometries from joint-inversion tomograms. The resulting zones are used to estimate representative geophysical parameters of each zone, which are then used for field-scale petrophysical analysis. A synthetic study demonstrated how joint inversion of seismic and radar traveltimes and electrical resistance tomography (ERT) data greatly reduces misclassification of zones (down from 21.3% to 3.7%) and improves the accuracy of retrieved zonal parameters (error down from 1.8% to 0.3%) compared to individual inversions. We applied our scheme to a data set collected in northeastern Switzerland to delineate lithologic subunits within a gravel aquifer. The inversion models resolve three principal subhorizontal units along with some important 3D heterogeneity. Petrophysical analysis of the zonal parameters indicated approximately 30% variation in porosity within the gravel aquifer and an increasing fraction of finer sediments with depth.
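
The classification step can be pictured as fitting a Gaussian mixture model by maximum likelihood (EM) to co-located parameter values from the joint-inversion tomograms and taking each component as one zone. The sketch below uses scikit-learn's GaussianMixture on synthetic placeholder values; the feature choices and numbers are invented, and the actual zonation and petrophysical analysis are of course more involved.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Each row is one model cell from the co-located joint-inversion tomograms;
# the three columns stand for (say) log resistivity, radar slowness and
# seismic velocity. All values below are synthetic placeholders.
rng = np.random.default_rng(3)
zone_a = rng.normal([2.0, 0.08, 1500.0], [0.1, 0.005, 60.0], size=(400, 3))
zone_b = rng.normal([2.6, 0.07, 1800.0], [0.1, 0.005, 60.0], size=(300, 3))
zone_c = rng.normal([1.6, 0.09, 1200.0], [0.1, 0.005, 60.0], size=(300, 3))
cells = np.vstack([zone_a, zone_b, zone_c])

# Maximum likelihood fit (via EM) of a 3-component Gaussian mixture; each
# component defines one zone, and its mean vector gives the representative
# geophysical parameters used for the subsequent petrophysical analysis.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(cells)
labels = gmm.predict(cells)       # zonal geometry: cluster membership per cell
print(gmm.means_)                 # representative parameters of each zone
print(np.bincount(labels))        # number of cells assigned to each zone
```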

Relevance:

90.00%

Abstract:

Aims: Plasma concentrations of imatinib differ largely between patients despite the same dosage, owing to large inter-individual variability in pharmacokinetic (PK) parameters. As the drug concentration at the end of the dosage interval (Cmin) correlates with treatment response and tolerability, monitoring of Cmin is suggested for therapeutic drug monitoring (TDM) of imatinib. Owing to logistic difficulties, however, samples are often drawn at random times during the dosage interval in clinical practice, which makes the results uninformative about Cmin. Objectives: (I) To extrapolate randomly measured imatinib concentrations to more informative Cmin values using classical Bayesian forecasting. (II) To extend the classical Bayesian method to account for correlation between PK parameters. (III) To evaluate the predictive performance of both methods. Methods: 31 paired blood samples (random and trough levels) were obtained from 19 cancer patients under imatinib. Two Bayesian maximum a posteriori (MAP) methods were implemented: (A) a classical method ignoring correlation between PK parameters, and (B) an extended one accounting for correlation. Both methods were applied to estimate individual PK parameters, conditional on random observations and covariate-adjusted priors from a population PK model. The PK parameter estimates were used to calculate trough levels. Relative prediction errors (PE) were analyzed to evaluate accuracy (one-sample t-test) and to compare precision between the methods (F-test to compare variances). Results: Both Bayesian MAP methods allowed non-biased predictions of individual Cmin compared to observations: (A) −7% mean PE (CI95% −18 to 4%, p = 0.15) and (B) −4% mean PE (CI95% −18 to 10%, p = 0.69). Relative standard deviations of actual observations from predictions were 22% (A) and 30% (B), i.e. comparable to the reported intra-individual variability. Precision was not improved by taking into account correlation between PK parameters (p = 0.22). Conclusion: Clinical interpretation of randomly measured imatinib concentrations can be assisted by Bayesian extrapolation to the most likely Cmin. Classical Bayesian estimation can be applied for TDM without the need to include correlation between PK parameters. Both methods could be adapted in the future to evaluate other individual pharmacokinetic measures correlated with clinical outcomes, such as the area under the curve (AUC).
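
The classical MAP step can be sketched as follows: the individual's PK parameters are found by maximizing prior × likelihood given a single randomly timed concentration, then the fitted model is used to compute the trough. The one-compartment oral model, the parameter values, priors, and observation below are all invented for illustration and correspond to neither the actual population PK model nor the correlation-aware variant evaluated in the study.

```python
import numpy as np
from scipy.optimize import minimize

# One-compartment model with first-order absorption (illustrative values).
DOSE, KA, TAU = 400.0, 0.6, 24.0          # mg, 1/h, dosing interval (h)

def conc(t, cl, v):
    """Steady-state concentration at time t after dose."""
    ke = cl / v
    return (DOSE * KA / (v * (KA - ke))) * (
        np.exp(-ke * t) / (1 - np.exp(-ke * TAU)) -
        np.exp(-KA * t) / (1 - np.exp(-KA * TAU)))

# Lognormal population priors (typical value, variability) and a proportional
# residual error -- all invented for this sketch.
CL_POP, V_POP, OMEGA_CL, OMEGA_V, SIGMA = 14.0, 250.0, 0.35, 0.30, 0.25

def map_estimate(t_obs, c_obs):
    """Classical Bayesian MAP: maximize prior x likelihood for (CL, V),
    ignoring any correlation between the two random effects."""
    def neg_post(eta):
        cl, v = CL_POP * np.exp(eta[0]), V_POP * np.exp(eta[1])
        pred = conc(t_obs, cl, v)
        loglik = -0.5 * ((np.log(c_obs) - np.log(pred)) / SIGMA) ** 2
        logprior = -0.5 * (eta[0] / OMEGA_CL) ** 2 - 0.5 * (eta[1] / OMEGA_V) ** 2
        return -(loglik + logprior)
    eta = minimize(neg_post, x0=[0.0, 0.0], method="Nelder-Mead").x
    return CL_POP * np.exp(eta[0]), V_POP * np.exp(eta[1])

cl_i, v_i = map_estimate(t_obs=6.0, c_obs=1.8)   # random sample 6 h post-dose, 1.8 mg/L
print(conc(TAU, cl_i, v_i))                      # extrapolated trough (Cmin) in mg/L
```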

Relevance:

90.00%

Abstract:

We consider robust parametric procedures for univariate discrete distributions, focusing on the negative binomial model. The procedures are based on three steps: first, a very robust, but possibly inefficient, estimate of the model parameters is computed; second, this initial model is used to identify outliers, which are then removed from the sample; third, a corrected maximum likelihood estimator is computed with the remaining observations. The final estimate inherits the breakdown point (bdp) of the initial one, and its efficiency can be significantly higher. Analogous procedures were proposed in [1], [2], [5] for the continuous case. A comparison of the asymptotic bias of various estimates under point contamination points to the minimum Neyman's chi-squared disparity estimate as a good choice for the initial step. Various minimum disparity estimators were explored by Lindsay [4], who showed that the minimum Neyman's chi-squared estimate has a 50% bdp under point contamination; in addition, it is asymptotically fully efficient at the model. However, the finite-sample efficiency of this estimate under the uncontaminated negative binomial model is usually much lower than 100%, and the bias can be strong. We show that its performance can then be greatly improved using the three-step procedure outlined above. In addition, we compare the final estimate with the procedure described in
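
A minimal sketch of the first step only: a minimum Neyman's chi-squared disparity fit of a negative binomial, obtained by direct numerical minimization over the observed support. The parametrization, the optimizer, and the handling of empty cells (simply skipped here, rather than treated via Lindsay's residual-adjustment formulation) are illustrative simplifications.

```python
import numpy as np
from scipy import stats, optimize

def min_neyman_chisq_negbin(x):
    """Minimum Neyman's chi-squared disparity fit of a negative binomial.

    The disparity is sum_k (d_k - f_k)^2 / d_k over the observed support,
    where d_k are sample proportions and f_k model probabilities; because
    each cell contributes at most d_k, isolated outlying cells have a
    bounded influence on the fit."""
    x = np.asarray(x)
    counts = np.bincount(x)
    d = counts / counts.sum()
    support = np.flatnonzero(d > 0)

    def disparity(params):
        log_r, logit_p = params
        r, p = np.exp(log_r), 1.0 / (1.0 + np.exp(-logit_p))
        f = stats.nbinom.pmf(support, r, p)
        return np.sum((d[support] - f) ** 2 / d[support])

    res = optimize.minimize(disparity, x0=[0.0, 0.0], method="Nelder-Mead")
    return np.exp(res.x[0]), 1.0 / (1.0 + np.exp(-res.x[1]))

rng = np.random.default_rng(4)
data = np.concatenate([rng.negative_binomial(2, 0.3, 490), np.full(10, 60)])
print(min_neyman_chisq_negbin(data))   # typically lands near (2, 0.3) despite the outliers
```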

Relevance:

90.00%

Abstract:

The genus Prunus L. is large and economically important. However, phylogenetic relationships within Prunus at low taxonomic level, particularly in the subgenus Amygdalus L. s.l., remain poorly investigated. This paper attempts to document the evolutionary history of Amygdalus s.l. and to establish a temporal framework, by assembling molecular data from conservative and variable molecular markers. The nuclear s6pdh gene in combination with the plastid trnSG spacer is analyzed with Bayesian and maximum likelihood methods. Since previous phylogenetic analyses with these markers lacked resolution, we additionally analyzed 13 nuclear SSR loci with the δµ2 distance, followed by an unweighted pair group method with arithmetic averages (UPGMA) algorithm. Our phylogenetic analysis with both sequence and SSR loci confirms the split between sections Amygdalus and Persica, comprising almonds and peaches, respectively. This result is in agreement with biogeographic data showing that each of the two sections is naturally distributed on each side of the Central Asian Massif chain. Using coalescent-based estimations, divergence times between the two sections varied strongly when considering sequence data only or combined with SSR. The sequence-only estimate (5 million years ago) was congruent with the Central Asian Massif orogeny and subsequent climate change. Given the low level of differentiation within the two sections using both marker types, the utility of combining microsatellites and sequence data to address phylogenetic relationships at low taxonomic level within Amygdalus is discussed. The recent evolutionary histories of almond and peach are discussed in view of the domestication processes that arose in these two phenotypically diverging gene pools: almonds and peaches were domesticated from the Amygdalus s.s. and Persica sections, respectively. Such economically important crops may serve as good models for studying divergent domestication processes in closely related gene pools.
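
For the microsatellite part of the analysis, the δµ² distance between two populations is the squared difference in mean allele size, averaged over loci, and UPGMA is average-linkage clustering on the resulting distance matrix. The sketch below uses invented allele-size data for a handful of hypothetical taxa; it illustrates the computation only, not the actual Prunus data set.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

def delta_mu_sq(pop_a, pop_b):
    """(delta mu)^2 distance: squared difference of mean allele size,
    averaged over loci. pop_a, pop_b are (individuals x loci) arrays of
    microsatellite repeat counts."""
    return np.mean((pop_a.mean(axis=0) - pop_b.mean(axis=0)) ** 2)

# Toy allele-size data for four hypothetical taxa at 13 SSR loci.
rng = np.random.default_rng(5)
base = rng.integers(10, 30, size=13).astype(float)
taxa = {name: base + shift + rng.normal(0, 1.0, size=(20, 13))
        for name, shift in [("almond_1", 0), ("almond_2", 1), ("peach_1", 6), ("peach_2", 7)]}

names = list(taxa)
dist = np.array([[delta_mu_sq(taxa[a], taxa[b]) for b in names] for a in names])
np.fill_diagonal(dist, 0.0)

# UPGMA = average-linkage hierarchical clustering on the distance matrix.
tree = linkage(squareform(dist, checks=False), method="average")
print(dendrogram(tree, labels=names, no_plot=True)["ivl"])  # leaf order groups the sections
```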

Relevance:

80.00%

Abstract:

OBJECTIVES: In this population-based study, reference values were generated for renal length, and the heritability and factors associated with kidney length were assessed. METHODS: Anthropometric parameters and renal ultrasound measurements were assessed in randomly selected nuclear families of European ancestry (Switzerland). The adjusted narrow sense heritability of kidney size parameters was estimated by maximum likelihood assuming multivariate normality after power transformation. Gender-specific reference centiles were generated for renal length according to body height in the subset of non-diabetic non-obese participants with normal renal function. RESULTS: We included 374 men and 419 women (mean ± SD, age 47 ± 18 and 48 ± 17 years, BMI 26.2 ± 4 and 24.5 ± 5 kg/m(2), respectively) from 205 families. Renal length was 11.4 ± 0.8 cm in men and 10.7 ± 0.8 cm in women; there was no difference between right and left renal length. Body height, weight and estimated glomerular filtration rate (eGFR) were positively associated with renal length, kidney function negatively, age quadratically, whereas gender and hypertension were not. The adjusted heritability estimates of renal length and volume were 47.3 ± 8.5 % and 45.5 ± 8.8 %, respectively (P < 0.001). CONCLUSION: The significant heritability of renal length and volume highlights the familial aggregation of this trait, independently of age and body size. Population-based references for renal length provide a useful guide for clinicians. KEY POINTS: • Renal length and volume are heritable traits, independent of age and size. • Based on a European population, gender-specific reference values/percentiles are provided for renal length. • Renal length correlates positively with body length and weight. • There was no difference between right and left renal lengths in this study. • This negates general teaching that the left kidney is larger and longer.
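
As a toy illustration of maximum likelihood heritability estimation, the sketch below fits a bivariate normal model to full-sib phenotype pairs whose correlation is h²/2 (additive kinship for full sibs), ignoring shared environment, covariate adjustment, and the whole-family likelihood and power transformation used in the study; all numbers are simulated.

```python
import numpy as np
from scipy import stats, optimize

def h2_ml_sibpairs(pairs):
    """ML estimate of narrow-sense heritability from full-sib pairs,
    assuming the phenotype pair is bivariate normal with correlation
    h^2 / 2 and no shared-environment component.
    pairs: (n, 2) array of phenotype values."""
    pairs = np.asarray(pairs, dtype=float)

    def negloglik(params):
        mu, log_sd, logit_h2 = params
        sd = np.exp(log_sd)
        h2 = 1.0 / (1.0 + np.exp(-logit_h2))
        rho = h2 / 2.0
        cov = sd**2 * np.array([[1.0, rho], [rho, 1.0]])
        return -np.sum(stats.multivariate_normal.logpdf(pairs, mean=[mu, mu], cov=cov))

    x0 = [pairs.mean(), np.log(pairs.std()), 0.0]
    res = optimize.minimize(negloglik, x0, method="Nelder-Mead")
    return 1.0 / (1.0 + np.exp(-res.x[2]))

# Simulated sib pairs with true h^2 = 0.5 (phenotypic correlation 0.25).
rng = np.random.default_rng(6)
sibs = rng.multivariate_normal([11.0, 11.0], [[1.0, 0.25], [0.25, 1.0]], size=400)
print(h2_ml_sibpairs(sibs))   # should land somewhere near 0.5
```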

Relevance:

80.00%

Abstract:

C4 photosynthesis is an adaptation derived from the more common C3 photosynthetic pathway that confers a higher productivity under warm temperature and low atmospheric CO2 concentration [1, 2]. C4 evolution has been seen as a consequence of past atmospheric CO2 decline, such as the abrupt CO2 fall 32-25 million years ago (Mya) [3-6]. This relationship has never been tested rigorously, mainly because of a lack of accurate estimates of divergence times for the different C4 lineages [3]. In this study, we inferred a large phylogenetic tree for the grass family and estimated, through Bayesian molecular dating, the ages of the 17 to 18 independent grass C4 lineages. The first transition from C3 to C4 photosynthesis occurred in the Chloridoideae subfamily, 32.0-25.0 Mya. The link between CO2 decrease and transition to C4 photosynthesis was tested by a novel maximum likelihood approach. We showed that the model incorporating the atmospheric CO2 levels was significantly better than the null model, supporting the importance of CO2 decline on C4 photosynthesis evolvability. This finding is relevant for understanding the origin of C4 photosynthesis in grasses, which is one of the most successful ecological and evolutionary innovations in plant history.
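
The model comparison at the end boils down to a likelihood-ratio test: a model whose C4 origination rate depends on atmospheric CO2 is fitted by maximum likelihood and compared with a constant-rate null, with twice the log-likelihood gain referred to a chi-squared distribution. The sketch below reproduces only that logic on an invented Poisson toy data set, not the phylogenetic dating machinery of the study.

```python
import numpy as np
from scipy import stats, optimize

# Toy data: number of inferred C4 origins per 5-My time bin and the mean
# atmospheric CO2 level (ppm) in each bin; all values are invented.
origins = np.array([0, 0, 1, 4, 6, 4, 2, 1])
co2 = np.array([900.0, 800.0, 600.0, 350.0, 300.0, 320.0, 340.0, 380.0])
x = (co2 - co2.mean()) / 100.0          # centred, rescaled CO2 covariate

def negloglik_null(params):
    # Null model: a constant origination rate per bin.
    lam = np.exp(params[0])
    return -np.sum(stats.poisson.logpmf(origins, lam))

def negloglik_co2(params):
    # Alternative: the log origination rate depends linearly on CO2.
    lam = np.exp(params[0] + params[1] * x)
    return -np.sum(stats.poisson.logpmf(origins, lam))

start = np.log(origins.mean())
ll_null = -optimize.minimize(negloglik_null, [start], method="Nelder-Mead").fun
ll_co2 = -optimize.minimize(negloglik_co2, [start, 0.0], method="Nelder-Mead").fun

# Likelihood-ratio test: twice the log-likelihood gain, referred to a
# chi-squared distribution with 1 df (one extra parameter in the CO2 model).
lrt = 2.0 * (ll_co2 - ll_null)
print(lrt, stats.chi2.sf(lrt, df=1))
```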

Relevance:

80.00%

Abstract:

BACKGROUND: In vitro aggregating brain cell cultures containing all types of brain cells have been shown to be useful for neurotoxicological investigations. The cultures are used for the detection of nervous system-specific effects of compounds by measuring multiple endpoints, including changes in enzyme activities. Concentration-dependent neurotoxicity is determined at several time points. METHODS: A Markov model was set up to describe the dynamics of brain cell populations exposed to potentially neurotoxic compounds. Brain cells were assumed to be in either a healthy or a stressed state, with only stressed cells being susceptible to cell death. Cells could switch between these states or die, with concentration-dependent transition rates. Since cell numbers were not directly measurable, intracellular lactate dehydrogenase (LDH) activity was used as a surrogate. Assuming that changes in cell numbers are proportional to changes in intracellular LDH activity, stochastic enzyme activity models were derived. Maximum likelihood and least squares regression techniques were applied for estimation of the transition rates. Likelihood ratio tests were performed to test hypotheses about the transition rates. Simulation studies were used to investigate the performance of the transition rate estimators and to analyze the error rates of the likelihood ratio tests. The stochastic time-concentration activity model was applied to intracellular LDH activity measurements after 7 and 14 days of continuous exposure to propofol. The model describes transitions from healthy to stressed cells and from stressed cells to death. RESULTS: The model predicted that propofol would affect stressed cells more than healthy cells. Increasing the propofol concentration from 10 to 100 μM reduced the mean waiting time for transition to the stressed state by 50%, from 14 to 7 days, whereas the mean duration to cellular death decreased more dramatically, from 2.7 days to 6.5 hours. CONCLUSION: The proposed stochastic modeling approach can be used to discriminate between different biological hypotheses regarding the effect of a compound on the transition rates. The effects of different compounds on the transition rate estimates can be compared quantitatively. Data can be extrapolated to late measurement time points to investigate whether costly and time-consuming long-term experiments could possibly be eliminated.
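
A minimal sketch of the applied model (healthy → stressed → dead, with only stressed cells dying): the expected fraction of live cells, taken as proportional to intracellular LDH activity, is computed from the generator of the chain, and concentration-dependent rates are fitted to two measurement time points. The linear concentration dependence, the rate values, and the data below are invented for illustration, and least squares is used here in place of the full maximum likelihood machinery.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import least_squares

def live_fraction(t, k1, k2):
    """Expected fraction of live (healthy + stressed) cells at time t for the
    chain healthy --k1--> stressed --k2--> dead, via the matrix exponential of
    the generator; intracellular LDH activity is assumed proportional to it."""
    Q = np.array([[-k1,  k1, 0.0],
                  [0.0, -k2,  k2],
                  [0.0, 0.0, 0.0]])
    p = np.array([1.0, 0.0, 0.0]) @ expm(Q * t)
    return p[0] + p[1]

def fit_rates(times, ldh_rel, conc, base_rates=(0.02, 0.05)):
    """Least-squares fit of concentration-dependent transition rates,
    k_i(c) = k_i0 * (1 + b_i * c); the linear form is an assumption of
    this sketch, not the model of the study."""
    def resid(params):
        b1, b2 = params
        k10, k20 = base_rates
        pred = [live_fraction(t, k10 * (1 + b1 * conc), k20 * (1 + b2 * conc))
                for t in times]
        return np.asarray(pred) - ldh_rel
    return least_squares(resid, x0=[0.1, 0.1], bounds=(0, np.inf)).x

# Relative LDH activity after 7 and 14 days of exposure to 100 uM (invented values).
print(fit_rates(times=[7.0, 14.0], ldh_rel=np.array([0.75, 0.40]), conc=100.0))
```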