213 results for Imputation


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Sequence analysis and optimal matching are useful heuristic tools for the descriptive analysis of heterogeneous individual pathways such as educational careers, job sequences or patterns of family formation. However, to date it remains unclear how to handle the inevitable problems caused by missing values with regard to such analysis. Multiple Imputation (MI) offers a possible solution for this problem but it has not been tested in the context of sequence analysis. Against this background, we contribute to the literature by assessing the potential of MI in the context of sequence analyses using an empirical example. Methodologically, we draw upon the work of Brendan Halpin and extend it to additional types of missing value patterns. Our empirical case is a sequence analysis of panel data with substantial attrition that examines the typical patterns and the persistence of sex segregation in school-to-work transitions in Switzerland. The preliminary results indicate that MI is a valuable methodology for handling missing values due to panel mortality in the context of sequence analysis. MI is especially useful in facilitating a sound interpretation of the resulting sequence types.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

BACKGROUND: A cost-effective strategy to increase the density of available markers within a population is to sequence a small proportion of the population and impute whole-genome sequence data for the remaining population. Increased densities of typed markers are advantageous for genome-wide association studies (GWAS) and genomic predictions. METHODS: We obtained genotypes for 54 602 SNPs (single nucleotide polymorphisms) in 1077 Franches-Montagnes (FM) horses and Illumina paired-end whole-genome sequencing data for 30 FM horses and 14 Warmblood horses. After variant calling, the sequence-derived SNP genotypes (~13 million SNPs) were used for genotype imputation with the software programs Beagle, Impute2 and FImpute. RESULTS: The mean imputation accuracy of FM horses using Impute2 was 92.0%. Imputation accuracy using Beagle and FImpute was 74.3% and 77.2%, respectively. In addition, for Impute2 we determined the imputation accuracy of all individual horses in the validation population, which ranged from 85.7% to 99.8%. The subsequent inclusion of Warmblood sequence data further increased the correlation between true and imputed genotypes for most horses, especially for horses with a high level of admixture. The final imputation accuracy of the horses ranged from 91.2% to 99.5%. CONCLUSIONS: Using Impute2, the imputation accuracy was higher than 91% for all horses in the validation population, which indicates that direct imputation of 50k SNP-chip data to sequence level genotypes is feasible in the FM population. The individual imputation accuracy depended mainly on the applied software and the level of admixture.
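The two accuracy measures mentioned in this abstract (the share of correctly imputed genotypes and the correlation between true and imputed genotypes) can be illustrated with a small Python sketch. The 0/1/2 allele-count coding, the function names and the toy data are assumptions made for illustration only; they are not taken from the study or from Beagle, Impute2 or FImpute.

```python
from math import sqrt

def concordance(true_gt, imputed_gt):
    """Fraction of genotypes that were imputed exactly right."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)

def pearson_r(x, y):
    """Pearson correlation between true and imputed allele counts."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical genotypes for one horse at ten SNPs (0, 1 or 2 copies of an allele).
true_gt = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
imputed_gt = [0, 1, 2, 1, 0, 2, 1, 0, 0, 2]

print(concordance(true_gt, imputed_gt))  # 9 of 10 genotypes match
print(pearson_r(true_gt, imputed_gt))
```

Concordance counts only exact matches, whereas the correlation also reflects how far off a wrong call is, which is one reason the two figures can differ for the same animal.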

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data is multiply imputed. Missing data of predictor variables is multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process.

For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data. With all data types, a fully-observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power.
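The abstract does not say how the regression coefficients from the imputed data sets were combined. A common choice is Rubin's rules, sketched below in Python under that assumption. The function name and the five example estimates and variances are hypothetical.

```python
from math import sqrt

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: the coefficient estimated in each imputed data set
    variances: the squared standard error of that coefficient in each data set
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = sum(estimates) / m                               # pooled point estimate
    within = sum(variances) / m                              # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between                   # total variance
    return q_bar, total, sqrt(total)

# Hypothetical logistic regression coefficient from m = 5 imputed data sets.
estimates = [0.50, 0.55, 0.45, 0.52, 0.48]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]

q_bar, total_var, pooled_se = pool_rubin(estimates, variances)
print(q_bar, total_var, pooled_se)
```

The between-imputation term is what widens the standard error to reflect uncertainty about the missing values; ignoring it would make tests on the coefficient too liberal.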

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact that these methodologies have on the data analysis.

Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then conducted a secondary data analysis using a mixed linear model, handling missing data with three methodologies: (a) complete case analysis, (b) multiple imputation with an explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with an explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoking, years in school, marital status, housing, race/ethnicity, and whether participants played on an athletic team). Several comparisons were conducted, including: 1) the motivational interviewing with feedback group (MIF) vs. the assessment only group (AO), the motivational interviewing group (MIO) vs. AO, and the feedback only group (FBO) vs. AO; 2) MIF vs. FBO; and 3) MIF vs. MIO.

Results: We first evaluated the patterns of missingness in this study, which indicated that about 13% of participants showed monotone missing patterns and about 3.5% showed non-monotone missing patterns. We then evaluated the assumption of missing completely at random with Little's missing completely at random (MCAR) test, in which the chi-square test statistic was 167.8 with 125 degrees of freedom and the associated p-value was 0.006, indicating that the data could not be assumed to be missing completely at random. After that, we compared whether the three strategies reached the same results. For the comparison between MIF and AO as well as the comparison between MIF and FBO, only the multiple imputation with additional covariates by uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results.

Discussion: The study indicated that, first, missingness was crucial in this study. Second, understanding the assumptions of the model was important, since we could not identify whether the data were missing at random or missing not at random. Therefore, future research should focus on exploring more sensitivity analyses under the missing not at random assumption.
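The p-value reported for Little's MCAR test can be checked from the test statistic and its degrees of freedom. The Python sketch below does this with the Wilson-Hilferty normal approximation to the chi-square distribution, chosen here only so that the example needs nothing beyond the standard library; it is an approximation, not the exact computation the authors would have used, and the function name is hypothetical.

```python
from math import erfc, sqrt

def chi2_upper_tail_approx(x, df):
    """Approximate P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the Wilson-Hilferty approximation: (X / df) ** (1/3) is roughly normal
    with mean 1 - 2 / (9 * df) and variance 2 / (9 * df).
    """
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / sqrt(c)
    return 0.5 * erfc(z / sqrt(2.0))  # upper tail of the standard normal

# Values reported in the abstract: statistic 167.8 with 125 degrees of freedom.
p_value = chi2_upper_tail_approx(167.8, 125)
print(round(p_value, 3))  # about 0.006, in line with the reported p-value
```

A p-value this small leads to rejecting MCAR, which is why complete case analysis alone would be hard to justify for these data.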

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The fuzzy min–max neural network classifier is a supervised learning method. This classifier takes the hybrid neural networks and fuzzy systems approach. All input variables in the network are required to correspond to continuously valued variables, and this can be a significant constraint in many real-world situations where there are not only quantitative but also categorical data. The usual way of dealing with this type of variable is to replace the categorical values with numerical ones and treat them as if they were continuously valued. But this method implicitly defines a possibly unsuitable metric for the categories. A number of different procedures have been proposed to tackle the problem. In this article, we present a new method. The procedure extends the fuzzy min–max neural network input to categorical variables by introducing new fuzzy sets, a new operation, and a new architecture. This provides for greater flexibility and wider application. The proposed method is then applied to missing data imputation in voting intention polls. The micro data (the set of the respondents' individual answers to the questions) of this type of poll are especially suited for evaluating the method since they include a large number of numerical and categorical attributes.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

There are many situations where input feature vectors are incomplete and methods to tackle the problem have been studied for a long time. A commonly used procedure is to replace each missing value with an imputation. This paper presents a method to perform categorical missing data imputation from numerical and categorical variables. The imputations are based on Simpson’s fuzzy min-max neural networks where the input variables for learning and classification are just numerical. The proposed method extends the input to categorical variables by introducing new fuzzy sets, a new operation and a new architecture. The procedure is tested and compared with others using opinion poll data.
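For readers unfamiliar with the numerical-only starting point mentioned above, the following Python sketch shows the hyperbox membership function as it is commonly given for Simpson's fuzzy min-max classifier. The sensitivity parameter `gamma`, the function name and the toy hyperbox are assumptions for illustration; the categorical extension proposed in the paper is not described in the abstract and is not shown here.

```python
def hyperbox_membership(a, v, w, gamma=4.0):
    """Degree to which point a belongs to the hyperbox with min corner v and max corner w.

    All coordinates are assumed to be scaled to [0, 1]. Membership is 1 inside
    the box and decreases with distance outside it, faster for larger gamma.
    """
    n = len(a)
    if not (n == len(v) == len(w)) or n == 0:
        raise ValueError("a, v and w must be non-empty and of equal length")
    total = 0.0
    for a_i, v_i, w_i in zip(a, v, w):
        # Penalty for exceeding the max corner in this dimension.
        total += max(0.0, 1.0 - max(0.0, gamma * min(1.0, a_i - w_i)))
        # Penalty for falling below the min corner in this dimension.
        total += max(0.0, 1.0 - max(0.0, gamma * min(1.0, v_i - a_i)))
    return total / (2 * n)

# Hypothetical two-dimensional hyperbox.
v = [0.1, 0.2]
w = [0.3, 0.6]

print(hyperbox_membership([0.2, 0.4], v, w))  # point inside the box
print(hyperbox_membership([0.5, 0.4], v, w))  # point outside along the first dimension
```

Because the formula subtracts coordinates, it presupposes a meaningful distance on every input, which is the restriction the paper sets out to lift for categorical variables.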

Relevance:

20.00% 20.00%

Publisher:

Abstract:

National Highway Traffic Safety Administration, Washington, D.C.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Mode of access: Internet.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A dividend imputation tax system provides shareholders with a credit (for corporate tax paid) that can be used to offset personal tax on dividend income. This paper shows how to infer the value of imputation tax credits from the prices of derivative securities that are unique to Australian retail markets. We also test whether a tax law amendment that was designed to prevent the trading of imputation credits affected their economic value. Before the amendment, tax credits were worth up to 50% of face value in large, high-yielding companies, but subsequently it is difficult to detect any value at all. (C) 2003 Elsevier B.V. All rights reserved.
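The mechanics of an imputation credit can be made concrete with a short worked example in Python. The tax rates, the dividend amount and the 50% valuation factor used below are hypothetical inputs chosen for illustration (the valuation factor echoes the "up to 50% of face value" figure in the abstract); the function names are likewise assumptions.

```python
def imputation_credit(cash_dividend, corporate_rate):
    """Face value of the credit attached to a fully franked cash dividend.

    The company paid tax at corporate_rate on the pre-tax profit, so the
    credit equals cash_dividend * rate / (1 - rate).
    """
    return cash_dividend * corporate_rate / (1.0 - corporate_rate)

def personal_tax_after_credit(cash_dividend, corporate_rate, personal_rate):
    """Personal tax owed on the grossed-up dividend after applying the credit.

    A negative result means the credit exceeds the personal tax liability.
    """
    credit = imputation_credit(cash_dividend, corporate_rate)
    grossed_up = cash_dividend + credit
    return grossed_up * personal_rate - credit

# Hypothetical figures: 70 cash dividend, 30% corporate rate, 45% personal rate.
credit = imputation_credit(70.0, 0.30)
top_up = personal_tax_after_credit(70.0, 0.30, 0.45)
market_value = 0.5 * credit  # credit priced at an assumed 50% of face value

print(credit, top_up, market_value)
```

With these inputs the dividend carries a credit of about 30, so the shareholder is taxed on 100 of income but pays only the difference between the personal and corporate tax; the paper's question is how much of that face value the market actually prices.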

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In large epidemiological studies missing data can be a problem, especially if information is sought on a sensitive topic or when a composite measure is calculated from several variables each affected by missing values. Multiple imputation is the method of choice for 'filling in' missing data based on associations among variables. Using an example about body mass index from the Australian Longitudinal Study on Women's Health, we identify a subset of variables that are particularly useful for imputing values for the target variables. Then we illustrate two uses of multiple imputation. The first is to examine and correct for bias when data are not missing completely at random. The second is to impute missing values for an important covariate; in this case omission from the imputation process of variables to be used in the analysis may introduce bias. We conclude with several recommendations for handling issues of missing data. Copyright (C) 2004 John Wiley & Sons, Ltd.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Objectives: To estimate differences in self-rated health by mode of administration and to assess the value of multiple imputation to make self-rated health comparable for telephone and mail. Methods: In 1996, Survey 1 of the Australian Longitudinal Study on Women's Health was answered by mail. In 1998, 706 and 11,595 mid-age women answered Survey 2 by telephone and mail, respectively. Self-rated health was measured by the physical and mental health scores of the SF-36. Mean changes in SF-36 scores between Surveys 1 and 2 were compared for telephone and mail respondents to Survey 2, before and after adjustment for socio-demographic and health characteristics. Missing values and SF-36 scores for telephone respondents at Survey 2 were imputed from SF-36 mail responses and telephone and mail responses to socio-demographic and health questions. Results: At Survey 2, self-rated health improved for telephone respondents but not mail respondents. After adjustment, mean changes in physical health and mental health scores remained higher (0.4 and 1.6, respectively) for telephone respondents compared with mail respondents (-1.2 and 0.1, respectively). Multiple imputation yielded adjusted changes in SF-36 scores that were similar for telephone and mail respondents. Conclusions and Implications: The effect of mode of administration on the change in mental health is important given that a difference of two points in SF-36 scores is accepted as clinically meaningful. Health evaluators should be aware of and adjust for the effects of mode of administration on self-rated health. Multiple imputation is one method that may be used to adjust SF-36 scores for mode of administration bias.