51 results for robust estimator
at Université de Lausanne, Switzerland
Abstract:
Summary
Discrete data arise in various research fields, typically when the observations are count data.
I propose a robust and efficient parametric procedure for the estimation of discrete distributions. The estimation is done in two phases. First, a very robust, but possibly inefficient, estimate of the model parameters is computed and used to identify outliers. Then the outliers are either removed from the sample or given low weights, and a weighted maximum likelihood estimate (WML) is computed.
The weights are determined via an adaptive process such that if the data follow the model, then asymptotically no observation is downweighted.
I prove that the final estimator inherits the breakdown point of the initial one, and that its influence function at the model is the same as that of the maximum likelihood estimator, which strongly suggests that it is asymptotically fully efficient.
The initial estimator is a minimum disparity estimator (MDE). MDEs can be shown to have full asymptotic efficiency, and some MDEs have very high breakdown points and very low bias under contamination. Several initial estimators are considered, and the performances of the WMLs based on each of them are studied.
It turns out that in a great variety of situations the WML substantially improves on the initial estimator, both in terms of finite-sample mean square error and in terms of bias under contamination. Moreover, the performance of the WML is rather stable under a change of the MDE, even if the MDEs have very different behaviors.
Two examples of application of the WML to real data are considered. In both of them, the need for a robust estimator is clear: the maximum likelihood estimator is badly corrupted by the presence of a few outliers.
This procedure is particularly natural in the discrete distribution setting, but could be extended to the continuous case, for which a possible procedure is sketched.
Résumé (translated from French)
Discrete data occur in various research fields, in particular when the observations are counts.
I propose a robust and efficient parametric method for the estimation of discrete distributions. The estimation is carried out in two phases. First, a very robust estimator of the model parameters is computed and used to detect outlying observations (outliers). This estimator is not necessarily efficient. Then, the outliers are either removed from the sample or given low weights, and a weighted maximum likelihood estimator (WML) is computed.
The weights are determined via an adaptive process such that, asymptotically, if the data follow the model, no observation is downweighted.
I prove that the breakdown point of the final estimator is at least as high as that of the initial estimator, and that its influence function at the model is the same as that of the maximum likelihood estimator, which suggests that this estimator is asymptotically fully efficient.
The initial estimator is a minimum disparity estimator (MDE). MDEs are asymptotically fully efficient, and some of them have a very high breakdown point and very low bias under contamination. I study the performance of the WML based on different MDEs.
The result is that, in a great variety of situations, the WML greatly improves on the performance of the initial estimator, both in terms of mean squared error and of bias under contamination. Moreover, the performance of the WML remains fairly stable when the initial estimator is changed, even if the different MDEs behave very differently.
I consider two examples of application of the WML to real data, in which the need for a robust estimator is evident: the maximum likelihood estimator is strongly corrupted by the presence of a few outliers.
The proposed method is particularly natural in the setting of discrete distributions, but could be extended to the continuous case.
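The two-phase idea in this abstract (robust start, outlier flagging, weighted maximum likelihood) can be illustrated with a minimal sketch for a Poisson model. Several simplifications are assumptions made here and are not taken from the thesis: the sample median stands in for the minimum disparity estimator, the cutoff is a fixed tail probability rather than the adaptive rule, the weights are 0 or 1, and no correction is applied for the truncation that rejection introduces. The names `tail_probability` and `two_phase_poisson` are hypothetical.

```python
import math
from statistics import median

def poisson_pmf(k, lam):
    # Poisson probability mass, computed on the log scale for stability.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def tail_probability(k, lam):
    """Smaller of P(X <= k) and P(X >= k) for X ~ Poisson(lam)."""
    lower = sum(poisson_pmf(j, lam) for j in range(k + 1))
    upper = 1.0 - lower + poisson_pmf(k, lam)
    return min(lower, upper)

def two_phase_poisson(counts, cutoff=1e-4):
    # Phase 1: crude robust start (ASSUMPTION: median, not an MDE).
    lam0 = max(median(counts), 0.5)
    # Flag observations that are very unlikely under the initial fit
    # (ASSUMPTION: fixed cutoff, not the adaptive process of the abstract).
    weights = [0.0 if tail_probability(k, lam0) < cutoff else 1.0 for k in counts]
    # Phase 2: weighted ML; for a Poisson mean this is the weighted average.
    lam_hat = sum(w * k for w, k in zip(weights, counts)) / sum(weights)
    return lam0, weights, lam_hat
```

For example, on the illustrative sample `[2, 3, 1, 4, 2, 3, 2, 5, 3, 2, 40, 38]` the plain mean is 8.75, while the sketch rejects the two large counts and returns 2.7.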
Abstract:
Pulse wave velocity (PWV) is a surrogate of arterial stiffness and represents a non-invasive marker of cardiovascular risk. The non-invasive measurement of PWV requires tracking the arrival time of pressure pulses recorded in vivo, commonly referred to as pulse arrival time (PAT). In the state of the art, PAT is estimated by identifying a characteristic point of the pressure pulse waveform. This paper demonstrates that for ambulatory scenarios, where signal-to-noise ratios are below 10 dB, the repeatability of PAT measurements obtained through characteristic-point identification degrades drastically. Hence, we introduce a novel family of PAT estimators based on the parametric modeling of the anacrotic phase of a pressure pulse. In particular, we propose a parametric PAT estimator (TANH) that shows a high correlation with the Complior® characteristic point D1 (CC = 0.99), increases noise robustness, and reduces fivefold the number of heartbeats required to obtain reliable PAT measurements.
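As a rough illustration of a parametric PAT estimate, the sketch below fits a hyperbolic-tangent model to a rising pressure edge and reads the arrival time off the fitted inflection point. The parametrization p(t) = a + b·tanh((t − mu)/s), the grid search, and the function name `fit_tanh_pat` are assumptions made for this example; the abstract does not specify the model form or the fitting method used by the authors.

```python
import math

def fit_tanh_pat(times, pressure, scales):
    """Least-squares fit of p(t) = a + b*tanh((t - mu)/s).

    mu is searched over the sample instants and s over `scales`;
    a and b are solved in closed form for each (mu, s) pair.
    Returns (mu, s, a, b); mu is taken as the pulse arrival time.
    """
    n = len(times)
    mean_p = sum(pressure) / n
    best = None
    for mu in times:
        for s in scales:
            g = [math.tanh((t - mu) / s) for t in times]
            mean_g = sum(g) / n
            sgg = sum((gi - mean_g) ** 2 for gi in g)
            if sgg == 0.0:
                continue  # degenerate regressor, skip this candidate
            sgp = sum((gi - mean_g) * (pi - mean_p) for gi, pi in zip(g, pressure))
            b = sgp / sgg
            a = mean_p - b * mean_g
            sse = sum((pi - a - b * gi) ** 2 for gi, pi in zip(g, pressure))
            if best is None or sse < best[0]:
                best = (sse, mu, s, a, b)
    return best[1:]
```

Because every sample of the rising edge contributes to the fit, such an estimate does not depend on locating a single characteristic point, which is the motivation the abstract gives for parametric estimators.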
Abstract:
To test whether quantitative traits are under directional or homogenizing selection, it is common practice to compare population differentiation estimates at molecular markers (F_ST) and quantitative traits (Q_ST). If the trait is neutral and its determinism is additive, then theory predicts that Q_ST = F_ST, while Q_ST > F_ST is predicted under directional selection for different local optima, and Q_ST < F_ST is predicted under homogenizing selection. However, nonadditive effects can alter these predictions. Here, we investigate the influence of dominance on the relation between Q_ST and F_ST for neutral traits. Using analytical results and computer simulations, we show that dominance generally deflates Q_ST relative to F_ST. Under inbreeding, the effect of dominance vanishes, and we show that for selfing species, a better estimate of Q_ST is obtained from selfed families than from half-sib families. We also compare several sampling designs and find that it is always best to sample many populations (>20) with few families (five) rather than few populations with many families. Provided that estimates of Q_ST are derived from individuals originating from many populations, we conclude that the pattern Q_ST > F_ST, and hence the inference of directional selection for different local optima, is robust to the effect of nonadditive gene action.
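The abstract does not spell out how Q_ST is computed. A minimal sketch, assuming the commonly used definition Q_ST = (1 + f)·V_B / ((1 + f)·V_B + 2·V_W), where V_B is the between-population variance component, V_W the additive genetic variance within populations, and f the inbreeding coefficient (f = 0 for random mating, f = 1 for complete selfing), could look as follows. The function name `q_st` is hypothetical.

```python
def q_st(var_between, var_within_additive, inbreeding=0.0):
    """Q_ST from variance components (ASSUMED standard definition)."""
    numerator = (1.0 + inbreeding) * var_between
    return numerator / (numerator + 2.0 * var_within_additive)
```

With illustrative components V_B = 1 and V_W = 2, this gives 0.2 under random mating and 1/3 under complete selfing, where the expression reduces to V_B / (V_B + V_W).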
Abstract:
Robust estimators for accelerated failure time models with asymmetric (or symmetric) error distribution and censored observations are proposed. It is assumed that the error model belongs to a log-location-scale family of distributions and that the mean response is the parameter of interest. Since scale is a main component of the mean, scale is not treated as a nuisance parameter. A three-step procedure is proposed. In the first step, an initial high breakdown point S estimate is computed. In the second step, observations that are unlikely under the estimated model are rejected or downweighted. Finally, a weighted maximum likelihood estimate is computed. To define the estimates, functions of censored residuals are replaced by their estimated conditional expectation given that the response is larger than the observed censored value. The rejection rule in the second step is based on an adaptive cut-off that, asymptotically, does not reject any observation when the data are generated according to the model. Therefore, the final estimate attains full efficiency at the model, with respect to the maximum likelihood estimate, while maintaining the breakdown point of the initial estimator. Asymptotic results are provided. The new procedure is evaluated with the help of Monte Carlo simulations. Two examples with real data are discussed.
Abstract:
We consider the problem of estimating the mean hospital cost of stays of a class of patients (e.g., a diagnosis-related group) as a function of patient characteristics. The statistical analysis is complicated by the asymmetry of the cost distribution, the possibility of censoring on the cost variable, and the occurrence of outliers. These problems have often been treated separately in the literature, and a method offering a joint solution to all of them is still missing. Indirect procedures have been proposed, combining an estimate of the duration distribution with an estimate of the conditional cost for a given duration. We propose a parametric version of this approach, allowing for asymmetry and censoring in the cost distribution and providing a mean cost estimator that is robust in the presence of extreme values. In addition, the new method takes covariate information into account.
Abstract:
We propose robust estimators of the generalized log-gamma distribution and, more generally, of location-shape-scale families of distributions. A (weighted) Q tau estimator minimizes a tau scale of the differences between empirical and theoretical quantiles. It is n^(1/2)-consistent; unfortunately, it is not asymptotically normal and, therefore, inconvenient for inference. However, it is a convenient starting point for a one-step weighted likelihood estimator, where the weights are based on a disparity measure between the model density and a kernel density estimate. The one-step weighted likelihood estimator is asymptotically normal and fully efficient under the model. It is also highly robust under outlier contamination. Supplementary materials are available online.
Abstract:
We consider robust parametric procedures for univariate discrete distributions, focusing on the negative binomial model. The procedures are based on three steps. First, a very robust, but possibly inefficient, estimate of the model parameters is computed. Second, this initial model is used to identify outliers, which are then removed from the sample. Third, a corrected maximum likelihood estimator is computed with the remaining observations. The final estimate inherits the breakdown point (bdp) of the initial one and its efficiency can be significantly higher. Analogous procedures were proposed in [1], [2], [5] for the continuous case. A comparison of the asymptotic bias of various estimates under point contamination points to the minimum Neyman's chi-squared disparity estimate as a good choice for the initial step. Various minimum disparity estimators were explored by Lindsay [4], who showed that the minimum Neyman's chi-squared estimate has a 50% bdp under point contamination; in addition, it is asymptotically fully efficient at the model. However, the finite-sample efficiency of this estimate under the uncontaminated negative binomial model is usually much lower than 100% and the bias can be strong. We show that its performance can then be greatly improved using the three-step procedure outlined above. In addition, we compare the final estimate with the procedure described in
Abstract:
Microsatellite instability (MSI) occurs in 10-20% of colorectal tumours and is associated with good prognosis. Here we describe the development and validation of a genomic signature that identifies colorectal cancer patients with MSI caused by DNA mismatch repair deficiency with high accuracy. Microsatellite status for 276 stage II and III colorectal tumours has been determined. Full-genome expression data was used to identify genes that correlate with MSI status. A subset of these samples (n = 73) had sequencing data for 615 genes available. An MSI gene signature of 64 genes was developed and validated in two independent validation sets: the first consisting of frozen samples from 132 stage II patients; and the second consisting of FFPE samples from the PETACC-3 trial (n = 625). The 64-gene MSI signature identified MSI patients in the first validation set with a sensitivity of 90.3% and an overall accuracy of 84.8%, with an AUC of 0.942 (95% CI, 0.888-0.975). In the second validation set, the signature also showed excellent performance, with a sensitivity of 94.3% and an overall accuracy of 90.6%, with an AUC of 0.965 (95% CI, 0.943-0.988). Besides correctly identifying MSI patients, the gene signature identified a group of MSI-like patients that were MSS by standard assessment but MSI by signature assessment. The MSI signature could be linked to a deficient MMR phenotype, as both MSI and MSI-like patients showed a high mutation frequency (8.2% and 6.4% of 615 genes assayed, respectively) as compared to patients classified as MSS (1.6% mutation frequency). The MSI signature showed prognostic power in stage II patients (n = 215) with a hazard ratio of 0.252 (p = 0.0145). Patients with an MSI-like phenotype also had improved survival when compared to MSS patients. The MSI signature was translated to a diagnostic microarray and technically and clinically validated in FFPE and frozen samples.
Abstract:
Humans can recognize categories of environmental sounds, including vocalizations produced by humans and animals and the sounds of man-made objects. Most neuroimaging investigations of environmental sound discrimination have studied subjects while consciously perceiving and often explicitly recognizing the stimuli. Consequently, it remains unclear to what extent auditory object processing occurs independently of task demands and consciousness. Studies in animal models have shown that environmental sound discrimination at a neural level persists even in anesthetized preparations, whereas data from anesthetized humans has thus far provided null results. Here, we studied comatose patients as a model of environmental sound discrimination capacities during unconsciousness. We included 19 comatose patients treated with therapeutic hypothermia (TH) during the first 2 days of coma, while recording nineteen-channel electroencephalography (EEG). At the level of each individual patient, we applied a decoding algorithm to quantify the differential EEG responses to human vs. animal vocalizations as well as to sounds of living vocalizations vs. man-made objects. Discrimination between vocalization types was accurate in 11 patients and discrimination between sounds from living and man-made sources in 10 patients. At the group level, the results were significant only for the comparison between vocalization types. These results lay the groundwork for disentangling truly preferential activations in response to auditory categories, and the contribution of awareness to auditory category discrimination.
Abstract:
Recently a new measure of the cooperative behavior of simultaneous time series was introduced (Carmeli et al. NeuroImage 2005). This measure, called the S-estimator, is defined from the embedding dimension in a state space. The S-estimator quantifies the amount of synchronization within a data set by comparing the actual dimensionality of the set with the expected full dimensionality of the asynchronous set. Being a multivariate measure, it has an advantage over the bivariate measures of synchronization traditionally used in systems neuroscience. Multivariate measures of synchronization are of particular interest for applications in the field of modern multichannel EEG research, since they easily allow mapping of local and/or regional synchronization and are compatible with other imaging techniques. We applied the S-estimator to the analysis of EEG synchronization in schizophrenia patients vs. matched controls. The whole-head mapping with the S-estimator revealed a specific pattern of local synchronization in schizophrenia patients. The differences in the landscape of synchronization included decreased local synchronization in the territories over occipital and midline areas and increased synchronization over temporal areas. In frontal areas, the S-estimator revealed a tendency for an asymmetry: decreased S-values over the left hemisphere were adjacent to increased values over the right hemisphere. Separate calculations showed reproducibility of this pattern across the main EEG frequency bands. The maintenance of the same synchronization landscape across EEG frequencies probably implies structural changes in the cortical circuitry of schizophrenia patients. These changes are regionally specific and suggest that schizophrenia is a misconnectivity disorder rather than a hypo- or hyper-connectivity disorder.
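The abstract does not give a formula for the S-estimator. The sketch below assumes the entropy-based form usually attributed to Carmeli et al. (2005): with lambda_i the eigenvalues of the N x N correlation matrix of the channels and lambda'_i = lambda_i / N, S = 1 + sum(lambda'_i * log(lambda'_i)) / log(N), so that S is 0 for uncorrelated channels and 1 for fully synchronized ones. Both this formula and the helper names are assumptions for illustration; the eigenvalues are obtained with a small Jacobi rotation routine to keep the example free of external libraries, and N must be at least 2.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-24):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break  # matrix is numerically diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def s_estimator(corr):
    """ASSUMED form: 1 + sum(l * log(l)) / log(N), with l = eigenvalue / N."""
    n = len(corr)
    eig = symmetric_eigenvalues(corr)
    # Terms with l close to 0 contribute nothing (limit of l*log(l) is 0).
    entropy_term = sum((lam / n) * math.log(lam / n) for lam in eig if lam / n > 1e-12)
    return 1.0 + entropy_term / math.log(n)
```

Applied to a 3 x 3 identity correlation matrix the sketch returns 0 (up to rounding), and to a matrix of all ones it returns 1, matching the two limiting cases described in the abstract.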
Abstract:
Robust Huber-type regression and testing of linear hypotheses are adapted to the statistical analysis of parallel line and slope ratio assays. They are applied in the evaluation of results of several experiments carried out in order to compare and validate alternatives to animal experimentation based on embryo and cell cultures. The computational procedures needed to apply the robust methods rely on the conversational statistical package ROBSYS. Special commands for the analysis of parallel line and slope ratio assays have been added to ROBSYS.
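To make the idea of Huber-type regression concrete, here is a minimal sketch of a straight-line fit by iteratively reweighted least squares with Huber weights. It is not the ROBSYS implementation, which the abstract does not describe; the tuning constant 1.345, the scale estimate (median absolute residual divided by 0.6745, recomputed at each iteration), and the function names are assumptions chosen for illustration.

```python
from statistics import median

def weighted_line_fit(x, y, w):
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ym - slope * xm, slope

def huber_line_fit(x, y, k=1.345, iterations=50):
    """Huber M-estimate of a line by iteratively reweighted least squares."""
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))  # OLS start
    for _ in range(iterations):
        r = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
        scale = median([abs(ri) for ri in r]) / 0.6745
        if scale == 0.0:
            break  # at least half the points are fitted exactly
        # Huber weights: 1 for small residuals, k*scale/|r| for large ones.
        w = [1.0 if ri == 0.0 else min(1.0, k * scale / abs(ri)) for ri in r]
        intercept, slope = weighted_line_fit(x, y, w)
    return intercept, slope
```

On illustrative data lying on y = 1 + 2x for x = 0, ..., 9 with the value at x = 4 replaced by 50, ordinary least squares is pulled toward the outlier, while the sketch returns a line close to the original one. In a parallel line assay the same weighting would be applied to a model with a common slope and separate intercepts per preparation.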
Abstract:
PURPOSE: Studies of diffuse large B-cell lymphoma (DLBCL) are typically evaluated by using a time-to-event approach with relapse, re-treatment, and death commonly used as the events. We evaluated the timing and type of events in newly diagnosed DLBCL and compared patient outcome with reference population data. PATIENTS AND METHODS: Patients with newly diagnosed DLBCL treated with immunochemotherapy were prospectively enrolled onto the University of Iowa/Mayo Clinic Specialized Program of Research Excellence Molecular Epidemiology Resource (MER) and the North Central Cancer Treatment Group NCCTG-N0489 clinical trial from 2002 to 2009. Patient outcomes were evaluated at diagnosis and in the subsets of patients achieving event-free status at 12 months (EFS12) and 24 months (EFS24) from diagnosis. Overall survival was compared with age- and sex-matched population data. Results were replicated in an external validation cohort from the Groupe d'Etude des Lymphomes de l'Adulte (GELA) Lymphome Non Hodgkinien 2003 (LNH2003) program and a registry based in Lyon, France. RESULTS: In all, 767 patients with newly diagnosed DLBCL who had a median age of 63 years were enrolled onto the MER and NCCTG studies. At a median follow-up of 60 months (range, 8 to 116 months), 299 patients had an event and 210 patients had died. Patients achieving EFS24 had an overall survival equivalent to that of the age- and sex-matched general population (standardized mortality ratio [SMR], 1.18; P = .25). This result was confirmed in 820 patients from the GELA study and registry in Lyon (SMR, 1.09; P = .71). Simulation studies showed that EFS24 has comparable power to continuous EFS when evaluating clinical trials in DLBCL. CONCLUSION: Patients with DLBCL who achieve EFS24 have a subsequent overall survival equivalent to that of the age- and sex-matched general population. EFS24 will be useful in patient counseling and should be considered as an end point for future studies of newly diagnosed DLBCL.
Abstract:
Background: Glutathione (GSH) dysregulation at the gene, protein and functional levels observed in schizophrenia patients, and schizophrenia-like anomalies in GSH deficit experimental models, suggest that genetic glutathione synthesis impairments represent one major risk factor for the disease (Do et al., 2009). In a randomized, double-blind, placebo-controlled, add-on clinical trial of 140 patients, the GSH precursor N-Acetyl-Cysteine (NAC, 2g/day, 6 months) significantly improved the negative symptoms and reduced side effects due to antipsychotics (Berk et al., 2008). In a subset of patients (n=7), NAC (2g/day, 2 months, cross-over design) also improved auditory evoked potentials, the NMDA-dependent mismatch negativity (Lavoie et al, 2008). Methods: To determine whether increased GSH levels would modulate the topography of functional brain connectivity, we applied a multivariate phase synchronization (MPS) estimator (Knyazeva et al, 2008) to dense-array EEGs recorded during rest with eyes closed at the protocol onset, the point of crossover, and at its end. Results: The whole-head imaging revealed a specific synchronization landscape in the NAC condition compared to the placebo condition. In particular, NAC increased MPS over frontal and left temporal regions in a frequency-specific manner. The topography and direction of MPS changes were similar and robust in all 7 patients. Moreover, these changes correlated with the changes in Liddle's score of disorganization, thus linking EEG synchronization to the improvement of the clinical picture. Conclusions: The data suggest an important pathway towards new therapeutic strategies that target GSH dysregulation in schizophrenia. They also show the utility of MPS mapping as a marker of treatment efficacy.