Digital Library

859 results for Imputed Values

Doubly robust inference in the presence of imputed data in surveys

Relevance:

60.00%

Publisher:

Abstract:

Imputation is often used in surveys to handle item non-response. It is well known that treating imputed values as observed values leads to a substantial underestimation of the variance of point estimators. To remedy this problem, several variance estimation methods have been proposed in the literature, including adapted resampling methods such as the bootstrap and the jackknife. We define the concept of double robustness for point and variance estimation under the non-response model approach and the imputation model approach. We focus on jackknife variance estimation, which is often used in practice. We study the properties of different jackknife variance estimators for deterministic and random regression imputation. We first consider the case of simple random sampling. The cases of stratified sampling and unequal probability sampling are also studied. A simulation study compares several jackknife variance estimation methods in terms of bias and relative stability when the sampling fraction is not negligible. Finally, we establish the asymptotic normality of imputed estimators for deterministic and random regression imputation.

Variance estimation in the presence of imputed data for high-entropy sampling designs

Relevance:

60.00%

Publisher:

Abstract:

This work deals with variance estimation when item non-response is handled by an imputation procedure. Treating imputed values as if they had been observed can lead to a substantial underestimation of the variance of point estimators. The usual variance estimators rely on the availability of second-order inclusion probabilities, which are sometimes difficult (or even impossible) to compute. We propose to examine the properties of variance estimators obtained by means of approximations of the second-order inclusion probabilities. These approximations are expressed as a function of the first-order inclusion probabilities and are generally valid for high-entropy designs. The results of a simulation study evaluating the properties of the proposed variance estimators in terms of bias and mean squared error will be presented.
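As a rough illustration of a variance estimator that needs only first-order inclusion probabilities, the sketch below implements a Hájek-type approximation for the Horvitz-Thompson total with complete data. The choice of this particular approximation and the function name `hajek_variance` are assumptions made for illustration; the abstract does not say which approximations were studied, and the sketch does not handle imputation.

```python
import numpy as np

def hajek_variance(y, pi):
    """Hajek-type variance estimate for the Horvitz-Thompson total.

    Assumed illustration: uses only first-order inclusion probabilities
    `pi`, which is reasonable for high-entropy designs. Complete data only.
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    n = y.size
    c = 1.0 - pi                         # weights (1 - pi_i)
    e = y / pi                           # expanded values y_i / pi_i
    b_hat = np.sum(c * e) / np.sum(c)    # weighted centre of the expanded values
    return n / (n - 1) * np.sum(c * (e - b_hat) ** 2)
```

Under simple random sampling without replacement (all `pi` equal to n/N), this expression reduces to the familiar N²(1 − n/N)s²/n, which gives a quick sanity check.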

Resampling methods in survey methodology

Relevance:

60.00%

Publisher:

Abstract:

The main subject of this thesis is the estimation of the variance of a statistic based on imputed survey data using the bootstrap (or the Cyrano method). Applying a bootstrap method designed for complete survey data (no non-response) in the presence of imputed values, and treating these as if they were true observations, can lead to an underestimation of the variance. In this context, Shao and Sitter (1996) introduced a bootstrap procedure in which the study variable and the response indicator are resampled together and the bootstrap non-respondents are imputed in the same way as in the original sample. The resulting bootstrap variance estimator is valid when the sampling fraction is small. In Chapter 1, we begin by reviewing existing bootstrap methods for survey data (complete and imputed) and present them in a unified framework for the first time in the literature. In Chapter 2, we introduce a new bootstrap procedure for estimating the variance under the non-response model approach when a uniform non-response mechanism is assumed. Using only information on the response rate, unlike Shao and Sitter (1996), which requires the individual response indicators, the bootstrap response indicator is generated for each bootstrap sample, leading to a bootstrap variance estimator that is valid even for non-negligible sampling fractions. In Chapter 3, we study pseudo-population bootstrap approaches and consider a more general class of non-response mechanisms. We develop two pseudo-population bootstrap procedures for estimating the variance of an imputed estimator with respect to the non-response model approach and the imputation model approach. These procedures are also valid for non-negligible sampling fractions.
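The resampling idea attributed above to Shao and Sitter (1996) can be sketched as follows: each unit's value and response status are resampled together, and every bootstrap sample is re-imputed before the estimate is computed. This is a minimal sketch under stated assumptions, not the thesis's procedure: it assumes i.i.d. with-replacement resampling (a small sampling fraction), mean imputation, and the sample mean as the estimator, and the function names are hypothetical.

```python
import numpy as np

def mean_impute(y, responded):
    """Replace missing values by the respondent mean (illustrative choice)."""
    filled = y.copy()
    filled[~responded] = y[responded].mean()
    return filled

def reimputing_bootstrap_variance(y, responded, n_boot=1000, seed=0):
    """Bootstrap variance of the imputed mean, re-imputing each resample."""
    rng = np.random.default_rng(seed)
    n = y.size
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample units with replacement
        y_b, r_b = y[idx], responded[idx]    # value and response status travel together
        if not r_b.any():                    # skip resamples with no respondents
            continue
        estimates.append(mean_impute(y_b, r_b).mean())  # re-impute, then estimate
    return np.var(estimates, ddof=1)
```

Comparing the result with the naive estimate `np.var(mean_impute(y, responded), ddof=1) / y.size` shows the underestimation the abstract describes.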

Adjusted jackknife for imputation under unequal probability sampling without replacement

Relevance:

60.00%

Publisher:

Abstract:

Imputation is commonly used to compensate for item non-response in sample surveys. If we treat the imputed values as if they are true values, and then compute the variance estimates by using standard methods, such as the jackknife, we can seriously underestimate the true variances. We propose a modified jackknife variance estimator which is defined for any without-replacement unequal probability sampling design in the presence of imputation and non-negligible sampling fraction. Mean, ratio and random-imputation methods will be considered. The practical advantage of the method proposed is its breadth of applicability.
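The underestimation described above can be seen in a small sketch that compares a naive jackknife, which freezes the imputed values, with a jackknife that re-imputes after each deletion. This is not the modified estimator proposed in the paper: it assumes equal-probability sampling, mean imputation, and a negligible sampling fraction, and the function names are hypothetical.

```python
import numpy as np

def jackknife_variance(replicates, full):
    """Delete-one jackknife variance from replicate estimates."""
    n = len(replicates)
    return (n - 1) / n * np.sum((np.asarray(replicates) - full) ** 2)

def naive_and_reimputed_jackknife(y, responded):
    """Return (naive, re-imputed) jackknife variances of the imputed mean."""
    n = y.size
    filled = np.where(responded, y, y[responded].mean())  # mean imputation
    full = filled.mean()
    naive, reimputed = [], []
    for j in range(n):
        keep = np.arange(n) != j
        naive.append(filled[keep].mean())                 # imputed values frozen
        y_j, r_j = y[keep], responded[keep]
        # re-impute with the respondent mean of the reduced sample
        reimputed.append(np.where(r_j, y_j, y_j[r_j].mean()).mean())
    return jackknife_variance(naive, full), jackknife_variance(reimputed, full)
```

With a 50% response rate the naive value is several times smaller than the re-imputed one, because the frozen imputed values add no variability of their own.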

On the influence of imputation in classification: practical issues

Relevance:

60.00%

Publisher:

Abstract:

The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has been usually assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.
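The conventional evaluation described above (hide known values, impute them, compare with the originals) can be sketched for majority imputation of a single categorical attribute. The function name, the 20% missing rate, and the use of accuracy as the comparison measure are assumptions for illustration. The article argues that this kind of measure alone does not reveal the effect on the final classifier.

```python
import random
from collections import Counter

def majority_imputation_accuracy(values, missing_rate=0.2, seed=0):
    """Hide a fraction of known values, impute them with the majority
    category of the remaining values, and return the share recovered."""
    rng = random.Random(seed)
    n_hidden = max(1, int(len(values) * missing_rate))
    hidden = set(rng.sample(range(len(values)), n_hidden))  # artificially missing
    observed = [v for i, v in enumerate(values) if i not in hidden]
    majority = Counter(observed).most_common(1)[0][0]       # imputed value
    hits = sum(values[i] == majority for i in hidden)
    return hits / n_hidden
```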

On the statistical analysis of the GS-NS0 cell proteome: Imputation, clustering and variability testing

Relevance:

60.00%

Publisher:

Abstract:

We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Due to the nature of gel-based experiments not all protein spots are detected across all samples in an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to calculate missing data points. Secondly, we combined a singular value decomposition based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results have shown that while imputation of missing data was a useful method to improve the statistical analysis of such data sets, this was of limited use in differentiating between the samples investigated, and highlighted a small number of candidate proteins for further investigation. (c) 2006 Elsevier B.V. All rights reserved.

Multivariate analysis of regional-scale geochemical data for environmental monitoring

Relevance:

60.00%

Publisher:

Abstract:

A compositional multivariate approach is used to analyse regional-scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20 cm depths on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI, initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or the superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques, including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis, were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PCs), implying “significant” structure in the data. Analysis of variance showed that only 10 PCs were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors.
Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons, and linear discriminant analysis (LDA) was undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat-covered areas since the creation of the superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.
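The centered log-ratio (clr) transform mentioned above subtracts the mean log value of each sample from its log concentrations. A minimal sketch, assuming strictly positive parts (the logarithm is undefined at zero, so values below detection have to be replaced first, as the abstract notes was done); the function name `clr` is a choice made here.

```python
import numpy as np

def clr(composition):
    """Centered log-ratio transform of one sample or a (samples x parts) array.

    Assumes every part is strictly positive.
    """
    x = np.asarray(composition, dtype=float)
    log_x = np.log(x)
    # subtracting the mean log equals dividing by the geometric mean
    return log_x - log_x.mean(axis=-1, keepdims=True)
```

Each transformed sample sums to zero, and rescaling a sample (for example from percent to mg/kg) leaves its clr values unchanged, which is how the transform sidesteps closure.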

Multiple Imputation of missing values in exploratory factor analysis of multidimensional scales: estimating latent trait scores

Relevance:

30.00%

Publisher:

Abstract:

Researchers frequently have to analyze scales in which some participants have failed to respond to some items. In this paper we focus on the exploratory factor analysis of multidimensional scales (i.e., scales that consist of a number of subscales) where each subscale is made up of a number of Likert-type items, and the aim of the analysis is to estimate participants' scores on the corresponding latent traits. We propose a new approach to deal with missing responses in such a situation that is based on (1) multiple imputation of non-responses and (2) simultaneous rotation of the imputed datasets. We applied the approach to a real dataset where missing responses were artificially introduced following a real pattern of non-responses, and in a simulation study based on artificial datasets. The results show that our approach (specifically, Hot-Deck multiple imputation followed by Consensus Promin rotation) was able to successfully compute factor score estimates even for participants who have missing data.

Reference Values For High-density Lipoprotein Particle Size And Volume By Dynamic Light Scattering In A Brazilian Population Sample And Their Relationships With Metabolic Parameters.

Relevance:

20.00%

Publisher:

Abstract:

Current data indicate that the size of high-density lipoprotein (HDL) may be considered an important marker for cardiovascular disease risk. We established reference values of mean HDL size and volume in an asymptomatic representative Brazilian population sample (n=590) and their associations with metabolic parameters by gender. Size and volume were determined in HDL isolated from plasma by polyethylene glycol precipitation of apoB-containing lipoproteins and measured using the dynamic light scattering (DLS) technique. Although the gender and age distributions agreed with other studies, the mean HDL size reference value was slightly lower than in some other populations. Both HDL size and volume were influenced by gender and varied according to age. HDL size was associated with age and HDL-C (total population); non-white ethnicity and CETP inversely (females); HDL-C and PLTP mass (males). On the other hand, HDL volume was determined only by HDL-C (total population and in both genders) and by PLTP mass (males). The reference values for mean HDL size and volume using the DLS technique were established in an asymptomatic and representative Brazilian population sample, as well as their related metabolic factors. HDL-C was a major determinant of HDL size and volume, which were differently modulated in females and in males.

Are torque values of preadjusted brackets precise?

Relevance:

20.00%

Publisher:

Abstract:

OBJECTIVE: The aim of the present study was to verify the torque precision of metallic brackets with MBT prescription using the canine brackets as the representative sample of six commercial brands. MATERIAL AND METHODS: Twenty maxillary and 20 mandibular canine brackets of each of the following commercial brands were selected: 3M Unitek, Abzil, American Orthodontics, TP Orthodontics, Morelli and Ortho Organizers. The torque angle, established by reference points and lines, was measured by an operator using an optical microscope coupled to a computer. The values were compared to those established by the MBT prescription. RESULTS: The results showed that for the maxillary canine brackets, only the Morelli torque (-3.33°) presented a statistically significant difference from the proposed values (-7°). For the mandibular canines, American Orthodontics (-6.34°) and Ortho Organizers (-6.25°) presented statistically significant differences from the standards (-6°). Comparing the brands, Morelli presented statistically significant differences in comparison with all the other brands for maxillary canine brackets. For the mandibular canine brackets, there was no statistically significant difference between the brands. CONCLUSIONS: There are significant variations in torque values of some of the brackets assessed, which would clinically compromise the buccolingual positioning of the tooth at the end of orthodontic treatment.

Estimation and prediction of parameters and breeding values in soybean using REML/BLUP and Least Squares

Relevance:

20.00%

Publisher:

Abstract:

The aim of this study was to compare REML/BLUP and least squares procedures in the prediction and estimation of genetic parameters and breeding values in soybean progenies. F(2:3) and F(4:5) progenies were evaluated in the 2005/06 growing season and the F(2:4) and F(4:6) generations derived from them were evaluated in 2006/07. These progenies originated from two semi-early experimental lines that differ in grain yield. The experiments were conducted in a lattice design and plots consisted of a 2 m row, spaced 0.5 m apart. The trait grain yield per plot was evaluated. It was observed that early selection is more efficient for the discrimination of the best lines from the F(4) generation onwards. No practical differences were observed between the least squares and REML/BLUP procedures in the case of the models and simplifications for REML/BLUP used here.

Predictive values of aspartate aminotransferase and gamma-glutamyl transferase for the hepatic accumulation of copper in cattle and buffalo

Relevance:

20.00%

Publisher:

Abstract:

Ten cattle and 10 buffalo were divided into 2 groups (control [n = 8] and experimental [n = 12]) that received daily administration of copper. Three hepatic biopsies and blood samples were performed on days 0, 45, and 105. The concentration of hepatic copper was determined by spectrophotometric atomic absorption, and the activities of aspartate aminotransferase (AST) and gamma-glutamyl transferase (GGT) were analyzed. Regression analyses were done to verify the possible existing relationship between enzymatic activity and concentration of hepatic copper. Sensitivity, specificity, accuracy, and positive and negative predictive values were determined. The serum activities of AST and GGT had coefficients of determination that were excellent predictive indicators of hepatic copper accumulation in cattle, while only GGT serum activity was predictive of hepatic copper accumulation in buffalo. Elevated serum GGT activity may be indicative of increased concentrations of hepatic copper even in cattle and buffalo that appear to be clinically healthy. Thus, prophylactic measures can be implemented to prevent the onset of a hemolytic crisis that is characteristic of copper intoxication.
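The diagnostic measures listed above follow directly from a 2x2 table that cross-classifies test results (for example, elevated versus normal enzyme activity) against true status (elevated versus normal hepatic copper). A minimal sketch, with a hypothetical function name; any counts passed in are placeholders, not data from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic measures from 2x2 counts.

    tp/fn: truly positive animals with a positive/negative test;
    fp/tn: truly negative animals with a positive/negative test.
    """
    return {
        "sensitivity": tp / (tp + fn),             # positives detected
        "specificity": tn / (tn + fp),             # negatives correctly cleared
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),                     # positive predictive value
        "npv": tn / (tn + fn),                     # negative predictive value
    }
```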

Both accurate and precise gf-values for Fe II lines

Relevance:

20.00%

Publisher:

Abstract:

We present a new set of oscillator strengths for 142 Fe II lines in the wavelength range 4000-8000 Å. Our gf-values are both accurate and precise, because each multiplet was globally normalized using laboratory data (accuracy), while the relative gf-values of individual lines within a given multiplet were obtained from theoretical calculations (precision). Our line list was tested with the Sun and high-resolution (R ≈ 10^5), high-S/N (≈ 700-900) Keck+HIRES spectra of the metal-poor stars HD 148816 and HD 140283, for which line-to-line scatter (σ) in the iron abundances from Fe II lines as low as 0.03, 0.04, and 0.05 dex are found, respectively. For these three stars the standard error in the mean iron abundance from Fe II lines is negligible (σ_mean ≤ 0.01 dex). The mean solar iron abundance obtained using our gf-values and different model atmospheres is A(Fe) = 7.45 (σ = 0.02).

Neutron Activation Analysis: A Primary (Ratio) Method to Determine SI-Traceable Values of Element Content in Complex Samples

Relevance:

20.00%

Publisher:

Abstract:

The metrological principles of neutron activation analysis are discussed. It has been demonstrated that this method can provide elemental amount of substance with values fully traceable to the SI. The method has been used by several laboratories worldwide in a number of CCQM key comparisons (interlaboratory comparison tests at the highest metrological level), supplying results equivalent to values from other methods for elemental or isotopic analysis in complex samples without the need to perform chemical destruction and dissolution of these samples. The CCQM therefore accepted in April 2007 the claim that neutron activation analysis should have a status similar to that of the methods originally listed by the CCQM as "primary methods of measurement". Analytical characteristics and scope of application are given.

Profiles of xylose reductase, xylitol dehydrogenase and xylitol production under different oxygen transfer volumetric coefficient values

Relevance:

20.00%

Publisher:

Abstract:

BACKGROUND: Xylitol is a sugar alcohol (polyalcohol) with many interesting properties for pharmaceutical and food products. It is currently produced by a chemical process, which has some disadvantages such as a high energy requirement. Therefore microbiological production of xylitol has been studied as an alternative, but its viability is dependent on optimisation of the fermentation variables. Among these, aeration is fundamental, because xylitol is produced only under adequate oxygen availability. In most experiments with xylitol-producing yeasts, low oxygen transfer volumetric coefficient (K(L)a) values are used to maintain microaerobic conditions. However, in the present study the use of relatively high K(L)a values resulted in high xylitol production. The effect of aeration was also evaluated via the profiles of xylose reductase (XR) and xylitol dehydrogenase (XD) activities during the experiments. RESULTS: The highest XR specific activity (1.45 ± 0.21 U mg(protein)(-1)) was achieved during the experiment with the lowest K(L)a value (12 h(-1)), while the highest XD specific activity (0.19 ± 0.03 U mg(protein)(-1)) was observed with a K(L)a value of 25 h(-1). Xylitol production was enhanced when K(L)a was increased from 12 to 50 h(-1), which resulted in the best condition observed, corresponding to a xylitol volumetric productivity of 1.50 ± 0.08 g(xylitol) L(-1) h(-1) and an efficiency of 71 ± 6.0%. CONCLUSION: The results showed that the enzyme activities during xylitol bioproduction depend greatly on the initial K(L)a value (oxygen availability). This finding supplies important information for further studies in molecular biology and genetic engineering aimed at improving xylitol bioproduction. (C) 2008 Society of Chemical Industry
