920 results for Random imputation


Relevance:

70.00%

Publisher:

Abstract:

Single imputation is very often used in surveys to compensate for item non-response. In some situations, the variable requiring imputation takes the value zero a very large number of times. This is very common in business surveys that collect economic variables. In this thesis, we study the properties of two imputation methods often used in practice and show that they generally produce biased imputed estimators. Motivated by a mixture model, we propose three imputation methods and study their properties in terms of bias. For these imputation methods, we consider a jackknife variance estimator that is consistent for the true variance, under the assumption that the sampling fraction is negligible. Finally, we carry out a simulation study to assess the performance of the point and variance estimators in terms of bias and mean squared error.

Relevance:

70.00%

Publisher:

Abstract:

Imputation is commonly used to compensate for item non-response in sample surveys. If we treat the imputed values as if they are true values, and then compute the variance estimates by using standard methods, such as the jackknife, we can seriously underestimate the true variances. We propose a modified jackknife variance estimator which is defined for any without-replacement unequal probability sampling design in the presence of imputation and non-negligible sampling fraction. Mean, ratio and random-imputation methods will be considered. The practical advantage of the method proposed is its breadth of applicability.
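The problem this abstract describes can be sketched in a few lines of Python. The sketch assumes simple random sampling with equal weights and an independent 30% non-response rate, both chosen here for illustration only. The function names `random_impute` and `naive_jackknife_variance` are hypothetical, and the jackknife shown is the naive one that treats imputed values as observed, not the modified estimator the authors propose.

```python
import random
import statistics


def random_impute(values, rng):
    """Replace each missing value (None) with a random draw from the observed values."""
    donors = [v for v in values if v is not None]
    if not donors:
        raise ValueError("no observed values to draw from")
    return [v if v is not None else rng.choice(donors) for v in values]


def naive_jackknife_variance(values):
    """Delete-one jackknife variance of the sample mean.

    Every value is treated as if it were truly observed, so the extra
    variability introduced by imputation is ignored.
    """
    n = len(values)
    total = sum(values)
    leave_one_out = [(total - v) / (n - 1) for v in values]
    center = sum(leave_one_out) / n
    return (n - 1) / n * sum((m - center) ** 2 for m in leave_one_out)


rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(200)]
# Assumed response mechanism: each item is missing independently with probability 0.3.
observed = [y if rng.random() > 0.3 else None for y in sample]
completed = random_impute(observed, rng)
naive_var = naive_jackknife_variance(completed)
```

For the sample mean, this naive jackknife reduces algebraically to the usual s²/n computed on the completed data set, which is why it cannot reflect the uncertainty added by the imputed values.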

Relevance:

30.00%

Publisher:

Abstract:

Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact that these methodologies have on the data analysis. Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then conducted a secondary data analysis using a mixed linear model to handle missing data with three methodologies: (a) complete case analysis, (b) multiple imputation with an explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with an explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoking, years in school, marital status, housing, race/ethnicity, and whether participants play on an athletic team). Several comparisons were conducted, including the following: 1) the motivational interviewing with feedback group (MIF) vs. the assessment only group (AO), the motivational interviewing group (MIO) vs. AO, and the feedback only group (FBO) vs. AO, 2) MIF vs. FBO, and 3) MIF vs. MIO. Results: We first evaluated the patterns of missingness in this study, which indicated that about 13% of participants showed monotone missing patterns and about 3.5% showed non-monotone missing patterns. We then evaluated the assumption of missing completely at random using Little's missing completely at random (MCAR) test, in which the chi-square test statistic was 167.8 with 125 degrees of freedom and the associated p-value was p=0.006, indicating that the data could not be assumed to be missing completely at random. After that, we compared whether the three strategies reached the same results. For the comparison between MIF and AO as well as the comparison between MIF and FBO, only the multiple imputation with additional covariates by uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results. Discussion: The study indicated that, first, missingness was crucial in this study. Second, understanding the assumptions of the model was important, since we could not identify whether the data were missing at random or missing not at random. Therefore, future research should focus on exploring more sensitivity analyses under the missing not at random assumption.
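The p-value reported for Little's MCAR test can be checked roughly from the stated statistic and degrees of freedom. The sketch below uses only the standard library and the Wilson-Hilferty normal approximation to the chi-square distribution. It is an approximation chosen here for convenience, not the computation the authors used, and the function name `chi2_sf_wilson_hilferty` is hypothetical.

```python
import math


def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(X > x) for X ~ chi-square(df) via the Wilson-Hilferty transform."""
    h = 2.0 / (9.0 * df)
    # The cube root of a scaled chi-square variable is approximately normal.
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Values taken from the abstract: statistic 167.8 with 125 degrees of freedom.
p_value = chi2_sf_wilson_hilferty(167.8, 125)
```

Under this approximation the result rounds to the reported p=0.006, so the rejection of MCAR at conventional significance levels is consistent with the stated statistic.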

Relevance:

30.00%

Publisher:

Abstract:

We investigate whether relative contributions of genetic and shared environmental factors are associated with an increased risk in melanoma. Data from the Queensland Familial Melanoma Project comprising 15,907 subjects arising from 1912 families were analyzed to estimate the additive genetic, common and unique environmental contributions to variation in the age at onset of melanoma. Two complementary approaches for analyzing correlated time-to-onset family data were considered: the generalized estimating equations (GEE) method in which one can estimate relationship-specific dependence simultaneously with regression coefficients that describe the average population response to changing covariates; and a subject-specific Bayesian mixed model in which heterogeneity in regression parameters is explicitly modeled and the different components of variation may be estimated directly. The proportional hazards and Weibull models were utilized, as both produce natural frameworks for estimating relative risks while adjusting for simultaneous effects of other covariates. A simple Markov Chain Monte Carlo method for covariate imputation of missing data was used and the actual implementation of the Bayesian model was based on Gibbs sampling using the freeware package BUGS. In addition, we also used a Bayesian model to investigate the relative contribution of genetic and environmental effects on the expression of naevi and freckles, which are known risk factors for melanoma.

Relevance:

30.00%

Publisher:

Abstract:

In large epidemiological studies missing data can be a problem, especially if information is sought on a sensitive topic or when a composite measure is calculated from several variables each affected by missing values. Multiple imputation is the method of choice for 'filling in' missing data based on associations among variables. Using an example about body mass index from the Australian Longitudinal Study on Women's Health, we identify a subset of variables that are particularly useful for imputing values for the target variables. Then we illustrate two uses of multiple imputation. The first is to examine and correct for bias when data are not missing completely at random. The second is to impute missing values for an important covariate; in this case omission from the imputation process of variables to be used in the analysis may introduce bias. We conclude with several recommendations for handling issues of missing data. Copyright (C) 2004 John Wiley & Sons, Ltd.

Relevance:

30.00%

Publisher:

Abstract:

Credible spatial information characterizing the structure and site quality of forests is critical to sustainable forest management and planning, especially given the increasing demands on and threats to forest products and services. Forest managers and planners are required to evaluate forest conditions over a broad range of scales, contingent on operational or reporting requirements. Traditionally, forest inventory estimates are generated via a design-based approach that involves generalizing sample plot measurements to characterize an unknown population across a larger area of interest. However, field plot measurements are costly and, as a consequence, spatial coverage is limited. Remote sensing technologies have shown remarkable success in augmenting limited sample plot data to generate stand- and landscape-level spatial predictions of forest inventory attributes. Further enhancement of forest inventory approaches that couple field measurements with cutting-edge remotely sensed and geospatial datasets is essential to sustainable forest management. We evaluated a novel Random Forest based k Nearest Neighbors (RF-kNN) imputation approach to couple remote sensing and geospatial data with field inventory collected by different sampling methods to generate forest inventory information across large spatial extents. The forest inventory data collected by the FIA program of the US Forest Service was integrated with optical remote sensing and other geospatial datasets to produce biomass distribution maps for a part of the Lake States and species-specific site index maps for the entire Lake State. Targeting small-area applications of state-of-the-art remote sensing, LiDAR (light detection and ranging) data was integrated with the field data collected by an inexpensive method, called variable plot sampling, in the Ford Forest of Michigan Tech to derive a standing volume map in a cost-effective way. The outputs of the RF-kNN imputation were compared with independent validation datasets and extant map products based on different sampling and modeling strategies. The RF-kNN modeling approach was found to be very effective, especially for large-area estimation, and produced results statistically equivalent to the field observations or the estimates derived from secondary data sources. The models are useful to resource managers for operational and strategic purposes.

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we address the problem of picking a subset of bids in a general combinatorial auction so as to maximize the overall profit using the first-price model. This winner determination problem assumes that a single bidding round is held to determine both the winners and prices to be paid. We introduce six variants of biased random-key genetic algorithms for this problem. Three of them use a novel initialization technique that makes use of solutions of intermediate linear programming relaxations of an exact mixed integer-linear programming model as initial chromosomes of the population. An experimental evaluation compares the effectiveness of the proposed algorithms with the standard mixed linear integer programming formulation, a specialized exact algorithm, and the best-performing heuristics proposed for this problem. The proposed algorithms are competitive and offer strong results, mainly for large-scale auctions.

Relevance:

20.00%

Publisher:

Abstract:

Isosorbide succinate moieties were incorporated into the poly(L-lactide) (PLLA) backbone in order to obtain a new class of biodegradable polymer with enhanced properties. This paper describes the synthesis and characterization of four types of low molecular weight copolymers. Copolymer I was obtained from monomer mixtures of L-lactide, isosorbide, and succinic anhydride; II from oligo(L-lactide) (PLLA), isosorbide, and succinic anhydride; III from oligo(isosorbide succinate) (PIS) and L-lactide; and IV from transesterification reactions between PLLA and PIS. MALDI-TOF MS and 13C-NMR analyses gave evidence that co-oligomerization was successfully attained in all cases. The data suggested that product I is a random co-oligomer and products II-IV are block co-oligomers.

Relevance:

20.00%

Publisher:

Abstract:

Consider a random medium consisting of N points randomly distributed so that there is no correlation among the distances separating them. This is the random link model, which is the high dimensionality limit (mean-field approximation) for the Euclidean random point structure. In the random link model, at discrete time steps, a walker moves to the nearest point that has not been visited in the last μ steps (memory), producing a deterministic partially self-avoiding walk (the tourist walk). We have analytically obtained the distribution of the number n of points explored by the walker with memory μ = 2, as well as the transient and period joint distribution. This result enables us to explain the abrupt change in the exploratory behavior between the cases μ = 1 (memoryless walker, driven by extreme value statistics) and μ = 2 (walker with memory, driven by combinatorial statistics). In the μ = 1 case, the mean number of newly visited points in the thermodynamic limit (N >> 1) is just ⟨n⟩ = e = 2.72..., while in the μ = 2 case, the mean number ⟨n⟩ of visited points grows proportionally to N^(1/2). Also, this result allows us to establish an equivalence between the random link model with μ = 2 and the random map (uncorrelated back and forth distances) with μ = 0, and to explain the abrupt change between the probabilities for null transient time and subsequent ones.
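The walk rule in this abstract can be simulated directly. The following is a minimal sketch, assuming uniform(0, 1) link distances and a walk that stops as soon as the walker's state (its last μ sites) repeats. The function names `random_link_distances` and `tourist_walk` are hypothetical, and nothing here reproduces the authors' analytical results.

```python
import random


def random_link_distances(n, rng):
    """Symmetric matrix of independent uniform distances (random link model)."""
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = rng.random()
    return dist


def tourist_walk(dist, start, mu):
    """Follow the deterministic tourist walk until the memory window repeats.

    At each step the walker moves to the nearest site that is not among the
    last mu visited sites (the current site included).
    """
    n = len(dist)
    if mu < 1 or n <= mu:
        raise ValueError("need 1 <= mu < number of sites")
    path = [start]
    seen_states = set()
    while True:
        memory = tuple(path[-mu:])
        if memory in seen_states:
            return path  # the dynamics are periodic from here on
        seen_states.add(memory)
        current = path[-1]
        candidates = [j for j in range(n) if j not in memory]
        path.append(min(candidates, key=lambda j: dist[current][j]))


rng = random.Random(1)
dist = random_link_distances(60, rng)
walk_mu1 = tourist_walk(dist, start=0, mu=1)
walk_mu2 = tourist_walk(dist, start=0, mu=2)
explored_mu1 = len(set(walk_mu1))
explored_mu2 = len(set(walk_mu2))
```

With μ = 1 the walker always ends trapped between a pair of mutually nearest neighbours. Averaging `explored_mu1` and `explored_mu2` over many starting points and distance matrices is one way to compare simulations against the scaling behaviour the abstract reports.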

Relevance:

20.00%

Publisher:

Abstract:

Background: Genome-wide association studies (GWAS) are becoming the approach of choice to identify genetic determinants of complex phenotypes and common diseases. The astonishing amount of generated data and the use of distinct genotyping platforms with variable genomic coverage are still analytical challenges. Imputation algorithms combine information from directly genotyped markers with the haplotypic structure of the population of interest to infer a poorly genotyped or missing marker, and are considered a near-zero-cost approach to allow the comparison and combination of data generated in different studies. Several reports have stated that imputed markers have an overall acceptable accuracy, but no published report has performed a pairwise comparison of imputed and empirical association statistics for a complete set of GWAS markers. Results: In this report we identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10^-5 for type 2 Diabetes Mellitus and compared them with results obtained based on empirical allelic frequencies. Interestingly, despite their overall high correlation, association statistics based on imputed frequencies were discordant in 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers. Conclusions: Our results suggest that association statistics from imputed markers that fall within a specific minor allele frequency (MAF) range, are located in weak linkage disequilibrium blocks or strongly deviate from local patterns of association are prone to inflated false positive association signals. The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.

Relevance:

20.00%

Publisher:

Abstract:

Objective: The aim of this study was to assess the effects of 830 and 670 nm laser on malondialdehyde (MDA) concentration in random skin-flap survival. Background Data: Low-level laser therapy (LLLT) has been reported to be successful in stimulating the formation of new blood vessels and activating superoxide-dismutase delivery, thus helping the inhibition of free-radical action and consequently reducing necrosis. Materials and Methods: Thirty Wistar rats were used and divided into three groups, with 10 rats in each one. A random skin flap was raised on the dorsum of each animal. Group 1 was the control group; group 2 received 830 nm laser radiation; and group 3 was submitted to 670 nm laser radiation. The animals underwent laser therapy with 36 J/cm² energy density immediately after surgery and on the 4 days subsequent to surgery. The application site of the laser radiation was 1 point, 2.5 cm from the flap's cranial base. The percentage of the skin-flap necrosis area was calculated 7 days postoperatively using the paper-template method, and a skin sample was collected immediately afterwards to determine the MDA concentration. Results: Statistically significant differences were found between the necrosis percentages, with higher values seen in group 1 compared with groups 2 and 3. Groups 2 and 3 did not present statistically significant differences (p > 0.05). Group 3 had a lower concentration of MDA values compared to the control group (p < 0.05). Conclusion: LLLT was effective in increasing the random skin-flap viability in rats, and the 670 nm laser was efficient in reducing the MDA concentration.

Relevance:

20.00%

Publisher:

Abstract:

Mature weight breeding values were estimated using a multi-trait animal model (MM) and a random regression animal model (RRM). Data consisted of 82 064 weight records from 8 145 animals, recorded from birth to eight years of age. Weights at standard ages were considered in the MM. All models included contemporary groups as fixed effects, and age of dam (linear and quadratic effects) and animal age as covariates. In the RRM, mean trends were modelled through a cubic regression on orthogonal polynomials of animal age, and maternal genetic effects and direct and maternal permanent environmental effects were also included as random. Legendre polynomials of orders 4, 3, 6 and 3 were used for animal and maternal genetic and permanent environmental effects, respectively, considering five classes of residual variances. Mature weight (five years) direct heritability estimates were 0.35 (MM) and 0.38 (RRM). Rank correlation between sires' breeding values estimated by MM and RRM was 0.82. However, when the top 2% (12) or 10% (62) of the young sires were selected based on the MM predicted breeding values, 71% and 80% of the same sires, respectively, would be selected if RRM estimates were used instead. The RRM modelled the changes in the (co)variances with age adequately, and larger breeding value accuracies can be expected using this model.

Relevance:

20.00%

Publisher:

Abstract:

Imprinted inactivation of the paternal X chromosome in marsupials is the primordial mechanism of dosage compensation for X-linked genes between females and males in Therians. In Eutherian mammals, X chromosome inactivation (XCI) evolved into a random process in cells from the embryo proper, where either the maternal or paternal X can be inactivated. However, species like mouse and bovine maintained imprinted XCI exclusively in extraembryonic tissues. The existence of imprinted XCI in humans remains controversial, with studies based on the analyses of only one or two X-linked genes in different extraembryonic tissues. Here we readdress this issue in human term placenta by performing a robust analysis of allele-specific expression of 22 X-linked genes, including XIST, using 27 SNPs in transcribed regions. We show that XCI is random in human placenta, and that this organ is arranged in relatively large patches of cells with either maternal or paternal inactive X. In addition, this analysis indicated heterogeneous maintenance of gene silencing along the inactive X, which, combined with the extensive mosaicism found in placenta, can explain the lack of agreement among previous studies. Our results illustrate the differences in the XCI mechanism between humans and mice, and highlight the importance of addressing the issue of imprinted XCI in other species in order to understand the evolution of dosage compensation in placental mammals.

Relevance:

20.00%

Publisher:

Abstract:

It is shown that the families of generalized matrix ensembles recently considered which give rise to an orthogonal invariant stable Lévy ensemble can be generated by the simple procedure of dividing Gaussian matrices by a random variable. The nonergodicity of this kind of disordered ensemble is investigated. It is shown that the same procedure applied to random graphs gives rise to a family that interpolates between the Erdős-Rényi and the scale-free models.

Relevance:

20.00%

Publisher:

Abstract:

A photoluminescence (PL) study of the individual electron states localized in a random potential is performed in artificially disordered superlattices embedded in a wide parabolic well. The valence band bowing of the parabolic potential provides a variation of the emission energies which splits the optical transitions corresponding to different wells within the random potential. The blueshift of the PL lines emitted by individual random wells, observed with increasing disorder strength, is demonstrated. The variation of temperature and magnetic field allowed for the behavior of the electrons localized in individual wells of the random potential to be distinguished.