999 results for Truncated Data


Relevance:

80.00%

Abstract:

Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
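
The objective that an EM procedure for binned and truncated data maximizes can be sketched as follows. This is only a minimal univariate illustration, not the multivariate algorithm of the paper: it assumes a Gaussian mixture, assumes that observations falling outside the listed bins are truncated (never recorded), and uses the hypothetical names `binned_truncated_loglik`, `edges`, and `counts`. Each bin contributes its count times the log of the bin probability, renormalized by the total probability of the observed region.

```python
import math
from statistics import NormalDist


def binned_truncated_loglik(edges, counts, weights, means, sds):
    """Log-likelihood of bin counts under a univariate Gaussian mixture.

    edges  : sorted bin boundaries; bin j is [edges[j], edges[j + 1])
    counts : number of observations recorded in each bin
    Data outside [edges[0], edges[-1]] are assumed truncated (unobserved).
    """
    components = [NormalDist(m, s) for m, s in zip(means, sds)]

    def mixture_cdf(x):
        return sum(w * c.cdf(x) for w, c in zip(weights, components))

    # Probability mass the mixture assigns to each bin (a 1-D integral;
    # in the multivariate case this becomes a multidimensional integral).
    bin_probs = [mixture_cdf(b) - mixture_cdf(a)
                 for a, b in zip(edges[:-1], edges[1:])]
    # Truncation: renormalize by the mass of the observed region.
    observed_mass = sum(bin_probs)
    return sum(n * math.log(p / observed_mass)
               for n, p in zip(counts, bin_probs) if n > 0)


# Two symmetric bins around the mean of a single standard normal component.
ll = binned_truncated_loglik([-1.0, 0.0, 1.0], [5, 5], [1.0], [0.0], [1.0])
```

In this usage example each bin holds half of the observed mass, so the log-likelihood equals ten times the log of one half.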

Relevance:

60.00%

Abstract:

Background & Aims: An elevated transferrin saturation is the earliest phenotypic abnormality in hereditary hemochromatosis. Determination of transferrin saturation remains the most useful noninvasive screening test for affected individuals, but there is debate as to the appropriate screening level. The aims of this study were to estimate the mean transferrin saturation in hemochromatosis heterozygotes and normal individuals and to evaluate potential transferrin saturation screening levels. Methods: Statistical mixture modeling was applied to data from a survey of asymptomatic Australians to estimate the mean transferrin saturation in hemochromatosis heterozygotes and normal individuals. To evaluate potential transferrin saturation screening levels, modeling results were compared with data from identified hemochromatosis heterozygotes and homozygotes. Results: After removal of hemochromatosis homozygotes, two populations of transferrin saturation were identified in asymptomatic Australians (P < 0.01). In men, 88.2% of the truncated sample had a lower mean transferrin saturation of 24.1%, whereas 11.8% had an increased mean transferrin saturation of 37.3%. Similar results were found in women. A transferrin saturation threshold of 45% identified 98% of homozygotes without misidentifying any normal individuals. Conclusions: The results confirm that hemochromatosis heterozygotes form a distinct transferrin saturation subpopulation and support the use of transferrin saturation as an inexpensive screening test for hemochromatosis. In practice, a fasting transferrin saturation of greater than or equal to 45% identifies virtually all affected homozygous subjects without necessitating further investigation of unaffected normal individuals.

Relevance:

40.00%

Abstract:

Laboratory safety data are routinely collected in clinical studies for safety monitoring and assessment. We have developed a truncated robust multivariate outlier detection method for identifying subjects with clinically relevant abnormal laboratory measurements. The proposed method can be applied to historical clinical data to establish a multivariate decision boundary that can then be used for future clinical trial laboratory safety data monitoring and assessment. Simulations demonstrate that the proposed method has the ability to detect relevant outliers while automatically excluding irrelevant outliers. Two examples from actual clinical studies are used to illustrate the use of this method for identifying clinically relevant outliers.

Relevance:

40.00%

Abstract:

Estimation of population size with missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimation of the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable for count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In search of a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
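
The baseline step named in this abstract, fitting a zero-truncated homogeneous Poisson model by maximum likelihood and plugging the fit into a Horvitz-Thompson estimator, can be sketched as below. This is a minimal illustration for unclustered counts, not the clustered-data extension the abstract describes; the function names and the bisection solver are assumptions made for this example. The maximum likelihood estimate of the rate solves lambda / (1 - exp(-lambda)) = sample mean, and each observed unit is then weighted by the inverse of its probability of being observed, 1 - exp(-lambda).

```python
import math


def fit_zero_truncated_poisson(counts, tol=1e-12, max_iter=200):
    """MLE of the Poisson rate from counts that are all >= 1."""
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        return 0.0  # boundary case: every observed count equals 1
    # g(lam) = lam / (1 - exp(-lam)) is increasing with g(0+) = 1 and
    # g(lam) > lam, so the root of g(lam) = mean lies in (0, mean).
    lo, hi = 1e-12, mean
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def horvitz_thompson_size(counts):
    """Population size estimate n / P(count > 0) under the fitted model."""
    lam = fit_zero_truncated_poisson(counts)
    if lam == 0.0:
        raise ValueError("rate estimate is zero; population size is unbounded")
    return len(counts) / -math.expm1(-lam)


observed = [1, 1, 1, 2, 2, 3]  # made-up counts for illustration only
lam_hat = fit_zero_truncated_poisson(observed)
n_hat = horvitz_thompson_size(observed)
```

The estimate `n_hat` always exceeds the number of observed units, because the fitted model assigns positive probability to the unobserved zero class.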

Relevance:

40.00%

Abstract:

Prevalent sampling is an efficient and focused approach to the study of the natural history of disease. Right-censored time-to-event data observed from prospective prevalent cohort studies are often subject to left-truncated sampling. Left-truncated samples are not randomly selected from the population of interest and have a selection bias. Extensive studies have focused on estimating the unbiased distribution given left-truncated samples. However, in many applications, the exact date of disease onset was not observed. For example, in an HIV infection study, the exact HIV infection time is not observable. However, it is known that the HIV infection date occurred between two observable dates. Meeting these challenges motivated our study. We propose parametric models to estimate the unbiased distribution of left-truncated, right-censored time-to-event data with uncertain onset times. We first consider data from length-biased sampling, a special case of left-truncated sampling. Then we extend the proposed method to general left-truncated sampling. With a parametric model, we construct the full likelihood, given a biased sample with unobservable onset of disease. The parameters are estimated through the maximization of the constructed likelihood by adjusting the selection bias and unobservable exact onset. Simulations are conducted to evaluate the finite sample performance of the proposed methods. We apply the proposed method to an HIV infection study, estimating the unbiased survival function and covariance coefficients.

Relevance:

30.00%

Abstract:

In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subjected to some upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.

Relevance:

30.00%

Abstract:

The study of the interaction between hair filaments and formulations or peptides is of utmost importance in fields like cosmetic research. The structure of keratin intermediate filaments is not fully described, which limits molecular dynamics (MD) studies in this field despite their high potential to advance it. We developed a computational model of a truncated protofibril and simulated its behavior in alcohol-based formulations and with one peptide. The simulations showed a strong interaction between the benzyl alcohol molecules of the formulations and the model, leading to the disorganization of the keratin chains, which regresses upon removal of the alcohol molecules. This behavior can explain the increase of peptide uptake in hair shafts evidenced in fluorescence microscopy pictures. The model developed is valid to computationally reproduce the interaction between hair and alcohol-based formulations and provides a robust base for new MD studies about hair properties. It is shown that MD simulations can advance hair cosmetic research by helping to improve the uptake of a compound of interest.

Relevance:

30.00%

Abstract:

It has been argued that by truncating the sample space of the negative binomial and of the inverse Gaussian-Poisson mixture models at zero, one is allowed to extend the parameter space of the model. Here that is proved to be the case for the more general three parameter Tweedie-Poisson mixture model. It is also proved that the distributions in the extended part of the parameter space are not the zero truncation of mixed Poisson distributions and that, other than for the negative binomial, they are not mixtures of zero truncated Poisson distributions either. By extending the parameter space one can improve the fit when the frequency of one is larger and the right tail is heavier than is allowed by the unextended model. Considering the extended model also allows one to use the basic maximum likelihood based inference tools when parameter estimates fall in the extended part of the parameter space, and hence when the m.l.e. does not exist under the unextended model. This extended truncated Tweedie-Poisson model is proved to be useful in the analysis of words and species frequency count data.

Relevance:

30.00%

Abstract:

This paper investigates a simple procedure to estimate robustly the mean of an asymmetric distribution. The procedure removes the observations which are larger or smaller than certain limits and takes the arithmetic mean of the remaining observations, the limits being determined with the help of a parametric model, e.g., the Gamma, the Weibull or the Lognormal distribution. The breakdown point, the influence function, the (asymptotic) variance, and the contamination bias of this estimator are explored and compared numerically with those of competing estimates.
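
The procedure described here (discard observations beyond limits derived from a parametric model, then average the rest) might look like the sketch below. Several choices are assumptions made for this illustration and are not taken from the paper: the lognormal is used as the working model, its parameters are fitted robustly from the median and the scaled median absolute deviation of the log data, and the limits are the 0.5% and 99.5% quantiles of the fitted model. The name `truncated_mean_lognormal` is hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def truncated_mean_lognormal(xs, lower_p=0.005, upper_p=0.995):
    """Mean of the observations that fall inside model-based limits.

    Assumes positive data and a lognormal working model (illustrative).
    """
    logs = [math.log(x) for x in xs]
    # Robust location and scale on the log scale (assumed fitting method).
    mu = statistics.median(logs)
    mad = statistics.median([abs(v - mu) for v in logs])
    sigma = 1.4826 * mad  # makes the MAD consistent for the normal sd
    if sigma == 0.0:
        return statistics.fmean(xs)  # no spread: nothing to truncate
    model = NormalDist(mu, sigma)
    lower = math.exp(model.inv_cdf(lower_p))
    upper = math.exp(model.inv_cdf(upper_p))
    kept = [x for x in xs if lower <= x <= upper]
    return statistics.fmean(kept)


# Made-up sample with one gross outlier.
sample = [1.0, 1.2, 0.9, 1.1, 1.05, 0.95, 1000.0]
robust_mean = truncated_mean_lognormal(sample)
```

In this example the value 1000.0 lies far above the upper limit and is dropped, so the result is the plain mean of the six remaining observations.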

Relevance:

30.00%

Abstract:

The bacterial insertion sequence IS21 contains two genes, istA and istB, which are organized as an operon. IS21 spontaneously forms tandem repeats designated (IS21)2. Plasmids carrying (IS21)2 react efficiently with other replicons, producing cointegrates via a cut-and-paste mechanism. Here we show that transposition of a single IS21 element (simple insertion) and cointegrate formation involving (IS21)2 result from two distinct non-replicative pathways, which are essentially due to two differentiated IstA proteins, transposase and cointegrase. In Escherichia coli, transposase was characterized as the full-length, 46 kDa product of the istA gene, whereas the 45 kDa cointegrase was expressed, in-frame, from a natural internal translation start of istA. The istB gene, which could be experimentally disconnected from istA, provided a helper protein that strongly stimulated the transposase and cointegrase-driven reactions. Site-directed mutagenesis was used to express either cointegrase or transposase from the istA gene. Cointegrase promoted replicon fusion at high frequencies by acting on IS21 ends which were linked by 2, 3, or 4 bp junction sequences in (IS21)2. By contrast, cointegrase poorly catalyzed simple insertion of IS21 elements. Transposase had intermediate, uniform activity in both pathways. The ability of transposase to synapse two widely spaced IS21 ends may reside in the eight N-terminal amino acid residues which are absent from cointegrase. Given the 2 or 3 bp spacing in naturally occurring IS21 tandems and the specialization of cointegrase, the fulminant spread of IS21 via cointegration can now be understood.

Relevance:

30.00%

Abstract:

The fasting-induced adipose factor (FIAF, ANGPTL4, PGAR, HFARP) was previously identified as a novel adipocytokine that was up-regulated by fasting, by peroxisome proliferator-activated receptor agonists, and by hypoxia. To further characterize FIAF, we studied regulation of FIAF mRNA and protein in liver and adipose cell lines as well as in human and mouse plasma. Expression of FIAF mRNA was up-regulated by peroxisome proliferator-activated receptor alpha (PPARalpha) and PPARbeta/delta agonists in rat and human hepatoma cell lines and by PPARgamma and PPARbeta/delta agonists in mouse and human adipocytes. Transactivation, chromatin immunoprecipitation, and gel shift experiments identified a functional PPAR response element within intron 3 of the FIAF gene. At the protein level, in human and mouse blood plasma, FIAF was found to be present both as the native protein and in a truncated form. Differentiation of mouse 3T3-L1 adipocytes was associated with the production of truncated FIAF, whereas in human white adipose tissue and SGBS adipocytes, only native FIAF could be detected. Interestingly, truncated FIAF was produced by human liver. Treatment with fenofibrate, a potent PPARalpha agonist, markedly increased plasma levels of truncated FIAF, but not native FIAF, in humans. Levels of both truncated and native FIAF showed marked interindividual variation but were not associated with body mass index and were not influenced by prolonged semistarvation. Together, these data suggest that FIAF, similar to other adipocytokines such as adiponectin, may partially exert its function via a truncated form.

Relevance:

30.00%

Abstract:

Azospirillum brasilense is a nitrogen-fixing bacterium associated with important agricultural crops such as rice, wheat and maize. The expression of genes responsible for nitrogen fixation (nif genes) in this bacterium is dependent on the transcriptional activator NifA. This protein contains three structural domains: the N-terminal domain is responsible for the negative control by fixed nitrogen; the central domain interacts with the RNA polymerase σ54 co-factor and the C-terminal domain is involved in DNA binding. The central and C-terminal domains are linked by the interdomain linker (IDL). A conserved four-cysteine motif encompassing the end of the central domain and the IDL is probably involved in the oxygen-sensitivity of NifA. In the present study, we have expressed, purified and characterized an N-truncated form of A. brasilense NifA. The protein expression was carried out in Escherichia coli and the N-truncated NifA protein was purified by chromatography using an affinity metal-chelating resin followed by a heparin-bound resin. Protein homogeneity was determined by densitometric analysis. The N-truncated protein activated in vivo nifH::lacZ transcription regardless of fixed nitrogen concentration (absence or presence of 20 mM NH4Cl) but only under low oxygen levels. On the other hand, the aerobically purified N-truncated NifA protein bound to the nifB promoter, as demonstrated by an electrophoretic mobility shift assay, implying that DNA-binding activity is not strictly controlled by oxygen levels. Our data show that, while the N-truncated NifA is inactive in vivo under aerobic conditions, it still retains DNA-binding activity, suggesting that the oxidized form of NifA bound to DNA is not competent to activate transcription.

Relevance:

30.00%

Abstract:

While over-dispersion in capture–recapture studies is well known to lead to poor estimation of population size, current diagnostic tools to detect the presence of heterogeneity have not been specifically developed for capture–recapture studies. To address this, a simple and efficient method of testing for over-dispersion in zero-truncated count data is developed and evaluated. The proposed method generalizes an over-dispersion test previously suggested for un-truncated count data and may also be used for testing residual over-dispersion in zero-inflation data. Simulations suggest that the asymptotic distribution of the test statistic is standard normal and that this approximation is also reasonable for small sample sizes. The method is also shown to be more efficient than an existing test for over-dispersion adapted for the capture–recapture setting. Studies with zero-truncated and zero-inflated count data are used to illustrate the test procedures.

Relevance:

30.00%

Abstract:

In this paper we consider the estimation of population size from one-source capture–recapture data, that is, a list in which individuals can potentially be found repeatedly and where the question is how many individuals are missed by the list. As a typical example, we provide data from a drug user study in Bangkok from 2001 where the list consists of drug users who repeatedly contact treatment institutions. Drug users with 1, 2, 3, ... contacts occur, but drug users with zero contacts are not present, requiring the size of this group to be estimated. Statistically, these data can be considered as stemming from a zero-truncated count distribution. We revisit an estimator for the population size suggested by Zelterman that is known to be robust under potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a locally truncated Poisson likelihood which is equivalent to a binomial likelihood. This result allows the extension of the Zelterman estimator by means of logistic regression to include observed heterogeneity in the form of covariates. We also review an estimator proposed by Chao and explain why we are not able to obtain similar results for this estimator. The Zelterman estimator is applied in two case studies, the first a drug user study from Bangkok, the second an illegal immigrant study in the Netherlands. Our results suggest the new estimator should be used, in particular, if substantial unobserved heterogeneity is present.
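
A minimal sketch of the covariate-free Zelterman estimator may help make the binomial connection concrete. It uses only the frequencies f1 and f2 of units seen exactly once and exactly twice: the rate estimate is 2 * f2 / f1, and the population size estimate is n / (1 - exp(-rate)). Restricting a Poisson count to the values 1 and 2 gives P(2 | 1 or 2) = rate / (2 + rate), so the binomial proportion f2 / (f1 + f2) leads to the same rate estimate. The function name and the example counts are made up for this illustration, and the logistic regression extension described in the abstract is not shown.

```python
import math
from collections import Counter


def zelterman_estimate(counts):
    """Return (rate estimate, population size estimate) from counts >= 1."""
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    if f1 == 0 or f2 == 0:
        raise ValueError("need at least one unit with count 1 and one with count 2")
    # Equivalent to the binomial MLE p = f2 / (f1 + f2) with rate = 2p / (1 - p).
    rate = 2.0 * f2 / f1
    n = len(counts)
    # Horvitz-Thompson form: each observed unit stands for 1 / P(count > 0) units.
    size = n / (1.0 - math.exp(-rate))
    return rate, size


# Four units seen once, two seen twice, one seen three times (illustrative).
rate_hat, size_hat = zelterman_estimate([1, 1, 1, 1, 2, 2, 3])
```

Because only f1 and f2 enter the rate estimate, units with many contacts influence the result solely through the observed total n, which is the source of the estimator's robustness to unobserved heterogeneity.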