973 results for MISSING VALUE ESTIMATION


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Attrition in longitudinal studies can lead to biased results. The study is motivated by the unexpected observation that alcohol consumption decreased despite increased availability, which may be due to sample attrition of heavy drinkers. Several imputation methods have been proposed, but rarely compared in longitudinal studies of alcohol consumption. The imputation of consumption level measurements is particularly challenging computationally because alcohol consumption is a semi-continuous variable (dichotomous drinking status and continuous volume among drinkers) and because the data in the continuous part are non-normal. Data come from a longitudinal study in Denmark with four waves (2003-2006) and 1771 individuals at baseline. Five techniques for missing data are compared: last value carried forward (LVCF) was used as a single imputation method, and Hotdeck, Heckman modelling, multivariate imputation by chained equations (MICE), and a Bayesian approach were used as multiple imputation methods. Predictive mean matching was used to account for non-normality, where instead of imputing regression estimates, "real" observed values from similar cases are imputed. Methods were also compared by means of a simulated dataset. The simulation showed that the Bayesian approach yielded the most unbiased estimates for imputation. The finding of no increase in consumption levels despite a higher availability remained unaltered. Copyright (C) 2011 John Wiley & Sons, Ltd.
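
As an illustration of the simplest technique named in this abstract, last value carried forward, the following is a minimal sketch and not the authors' implementation. The function name `lvcf`, the use of `None` as the missing marker, and the sample values are assumptions made for the example.

```python
from typing import List, Optional


def lvcf(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing entry (None) with the most recent observed value."""
    filled = []
    last = None  # stays None until the first observed value appears
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical consumption volumes for one respondent over four waves;
# None marks a wave lost to attrition.
waves = [12.0, 9.5, None, None]
print(lvcf(waves))
```

Because LVCF freezes a respondent at the last observed level, it cannot reflect any change after dropout, which is one reason the abstract treats it only as a single-imputation baseline.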

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Learning disability (LD) is a neurological condition that affects a child’s brain and impairs the ability to carry out one or more specific tasks. LD affects about 10% of children enrolled in schools. There is no cure for learning disabilities, and they are lifelong. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Just as there are many different types of LD, there are a variety of tests that may be done to pinpoint the problem. The information gained from an evaluation is crucial for finding out how parents and school authorities can provide the best possible learning environment for the child. This paper proposes a new artificial neural network (ANN) approach for identifying LD in children at an early stage, so as to address the problems they face and to benefit the students, their parents and school authorities. In this study, we propose closest-fit data preprocessing combined with ANN classification to handle missing attribute values. The algorithm imputes the missing values in the preprocessing stage. Ignoring missing attribute values is common practice in classification algorithms. In this paper, however, we handle them systematically before classification, which gives a satisfactory result in the prediction of LD. It acts as a tool for predicting LD accurately, and good information of the child is made available to the concerned

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The substitution of missing values, also called imputation, is an important data preparation task for many domains. Ideally, the substitution of missing values should not insert biases into the dataset. This aspect has been usually assessed by some measures of the prediction capability of imputation methods. Such measures assume the simulation of missing entries for some attributes whose values are actually known. These artificially missing values are imputed and then compared with the original values. Although this evaluation is useful, it does not allow the influence of imputed values in the ultimate modelling task (e.g. in classification) to be inferred. We argue that imputation cannot be properly evaluated apart from the modelling task. Thus, alternative approaches are needed. This article elaborates on the influence of imputed values in classification. In particular, a practical procedure for estimating the inserted bias is described. As an additional contribution, we have used such a procedure to empirically illustrate the performance of three imputation methods (majority, naive Bayes and Bayesian networks) in three datasets. Three classifiers (decision tree, naive Bayes and nearest neighbours) have been used as modelling tools in our experiments. The achieved results illustrate a variety of situations that can take place in the data preparation practice.
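
The conventional evaluation this abstract describes (hide known values, impute them, compare with the originals) can be sketched as follows. This is a minimal sketch using only majority imputation; the function names, the seed and the toy column are assumptions for illustration, and it does not cover the article's own procedure for estimating the bias inserted into a classifier.

```python
import random
from collections import Counter


def majority_impute(column):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def imputation_accuracy(column, missing_rate, seed=0):
    """Hide a random subset of known values, impute them, and score the match."""
    rng = random.Random(seed)
    n_hide = max(1, int(len(column) * missing_rate))
    hidden = set(rng.sample(range(len(column)), n_hide))
    masked = [None if i in hidden else v for i, v in enumerate(column)]
    imputed = majority_impute(masked)
    hits = sum(imputed[i] == column[i] for i in hidden)
    return hits / n_hide


# Hypothetical categorical attribute with a dominant class.
attribute = ["a"] * 8 + ["b"] * 2
print(imputation_accuracy(attribute, missing_rate=0.3))
```

A high score here says only that the imputed values resemble the hidden ones; as the abstract argues, it does not by itself show how the imputed values affect a downstream classifier.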

Relevance:

90.00% 90.00%

Publisher:

Abstract:

There are many situations where input feature vectors are incomplete and methods to tackle the problem have been studied for a long time. A commonly used procedure is to replace each missing value with an imputation. This paper presents a method to perform categorical missing data imputation from numerical and categorical variables. The imputations are based on Simpson’s fuzzy min-max neural networks where the input variables for learning and classification are just numerical. The proposed method extends the input to categorical variables by introducing new fuzzy sets, a new operation and a new architecture. The procedure is tested and compared with others using opinion poll data.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Background: Oral itraconazole (ITRA) is used for the treatment of allergic bronchopulmonary aspergillosis in patients with cystic fibrosis (CF) because of its antifungal activity against Aspergillus species. ITRA has an active hydroxy-metabolite (OH-ITRA) which has similar antifungal activity. ITRA is a highly lipophilic drug which is available in two different oral formulations, a capsule and an oral solution. It is reported that the oral solution has a 60% higher relative bioavailability. The influence of altered gastric physiology associated with CF on the pharmacokinetics (PK) of ITRA and its metabolite has not been previously evaluated. Objectives: 1) To estimate the population (pop) PK parameters for ITRA and its active metabolite OH-ITRA including relative bioavailability of the parent after administration of the parent by both capsule and solution and 2) to assess the performance of the optimal design. Methods: The study was a cross-over design in which 30 patients received the capsule on the first occasion and 3 days later the solution formulation. The design was constrained to have a maximum of 4 blood samples per occasion for estimation of the popPK of both ITRA and OH-ITRA. The sampling times for the population model were optimized previously using POPT v.2.0.[1] POPT is a series of applications that run under MATLAB and provide an evaluation of the information matrix for a nonlinear mixed effects model given a particular design. In addition it can be used to optimize the design based on evaluation of the determinant of the information matrix. The model details for the design were based on prior information obtained from the literature, which suggested that ITRA may have either linear or non-linear elimination. The optimal sampling times were evaluated to provide information for both competing models for the parent and metabolite and for both capsule and solution simultaneously. 
Blood samples were assayed by validated HPLC.[2] PopPK modelling was performed using FOCE with interaction under NONMEM, version 5 (level 1.1; GloboMax LLC, Hanover, MD, USA). The PK of ITRA and OH‑ITRA was modelled simultaneously using ADVAN 5. Subsequently three methods were assessed for modelling concentrations less than the LOD (limit of detection). These methods (corresponding to methods 5, 6 & 4 from Beal[3], respectively) were (a) where all values less than LOD were assigned half the LOD, (b) where the closest missing value that is less than LOD was assigned half the LOD and all previous (if during absorption) or subsequent (if during elimination) missing samples were deleted, and (c) where the contribution of the expectation of each missing concentration to the likelihood is estimated. The LOD was 0.04 mg/L. The final model evaluation was performed via bootstrap with re-sampling and a visual predictive check. The optimal design and the sampling windows of the study were evaluated for execution errors and for agreement between the observed and predicted standard errors. Dosing regimens were simulated for the capsules and the oral solution to assess their ability to achieve ITRA target trough concentration (Cmin,ss of 0.5-2 mg/L) or a combined Cmin,ss for ITRA and OH-ITRA above 1.5 mg/L. Results and Discussion: A total of 241 blood samples were collected and analysed; 94% of them were taken within the defined optimal sampling windows, of which 31% were taken within 5 min of the exact optimal times. Forty-six per cent of the ITRA values and 28% of the OH-ITRA values were below LOD. The entire profile after administration of the capsule for five patients was below LOD and therefore the data from this occasion were omitted from estimation. A 2-compartment model with 1st order absorption and elimination best described ITRA PK, with 1st order metabolism of the parent to OH-ITRA. 
For ITRA the clearance (ClItra/F) was 31.5 L/h; apparent volumes of central and peripheral compartments were 56.7 L and 2090 L, respectively. Absorption rate constants for capsule (kacap) and solution (kasol) were 0.0315 h⁻¹ and 0.125 h⁻¹, respectively. Comparative bioavailability of the capsule was 0.82. There was no evidence of nonlinearity in the popPK of ITRA. No screened covariate significantly improved the fit to the data. The parameter estimates from the final model were comparable across the different methods of accounting for missing data (M4,5,6).[3] The prospective application of an optimal design was found to be successful. Due to the sampling windows, most of the samples could be collected within the daily hospital routine, but still at times that were near optimal for estimating the popPK parameters. The final model was one of the potential competing models considered in the original design. The asymptotic standard errors provided by NONMEM for the final model and empirical values from bootstrap were similar in magnitude to those predicted from the Fisher Information matrix associated with the D-optimal design. Simulations from the final model showed that the current dosing regimen of 200 mg twice daily (bd) would provide a target Cmin,ss (0.5-2 mg/L) for only 35% of patients when administered as the solution and 31% when administered as capsules. The optimal dosing schedule was 500 mg bd for both formulations. The target success for this dosing regimen was 87% for the solution with an NNT=4 compared to capsules. This means that for every 4 patients treated with the solution, one additional patient will achieve target success compared to capsules, but at an additional cost of AUD $220 per day. The therapeutic target, however, is still doubtful and potential risks of these dosing schedules need to be assessed on an individual basis. 
Conclusion: A model was developed which described the popPK of ITRA and its main active metabolite OH-ITRA in adult CF after administration of both capsule and solution. The relative bioavailability of ITRA from the capsule was 82% that of the solution, but considerably more variable. To incorporate missing data, using the simple Beal method 5 (using half LOD for all samples below LOD) provided comparable results to the more complex but theoretically better Beal method 4 (integration method). The optimal sparse design performed well for estimation of model parameters and provided a good fit to the data.
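
The simplest of the three below-LOD treatments described above, method (a), which assigns half the LOD to every below-LOD sample, can be sketched as follows. This is a minimal illustration, not the NONMEM workflow used in the study; the function name, the use of `None` to mark a below-LOD record, and the example profile are assumptions.

```python
LOD = 0.04  # mg/L, the limit of detection reported in the abstract


def substitute_half_lod(concentrations, lod=LOD):
    """Replace every below-LOD record (marked as None) with lod / 2."""
    return [lod / 2 if c is None else c for c in concentrations]


# Hypothetical concentration-time profile (mg/L) for one occasion;
# None marks samples reported as below the LOD.
profile = [0.31, 0.12, None, None]
print(substitute_half_lod(profile))
```

Methods (b) and (c) need more context (the position of each sample in the profile, or the likelihood of the fitted model) and are not reproduced here.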

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Purpose – Quantitative instruments to assess patient safety culture have been developed recently and a few review articles have been published. Measuring safety culture enables healthcare managers and staff to improve safety behaviours and outcomes for patients and staff. The study aims to determine the validity and reliability of the Portuguese version of the AHRQ Hospital Survey on Patient Safety Culture (HSPSC). Design/methodology/approach – A missing-value analysis and an item analysis were performed to identify problematic items. Reliability analysis, inter-item correlations and inter-scale correlations were used to check internal consistency and composite scores. Inter-correlations were examined to assess construct validity. A confirmatory factor analysis was performed to investigate the fit of the observed data to the dimensional structure proposed in the AHRQ HSPSC Portuguese version. To analyse differences between hospitals in composite scores, an ANOVA and multiple comparisons were carried out. Findings – Eight of 12 dimensions had Cronbach's alphas higher than 0.7. The instrument as a whole achieved a high Cronbach's alpha (0.91). Inter-correlations showed that no dimension has redundant items, although the internal consistency of dimension 10 increased when one item was removed. Originality/value – This study is the first to evaluate an American patient safety culture survey using Portuguese data. The survey has satisfactory reliability and construct validity.
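
The internal-consistency statistic reported above, Cronbach's alpha, follows the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below computes it for complete responses only; the function name and the toy scores are assumptions for illustration and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists aligned by respondent."""
    k = len(items)
    # Total score of each respondent across all k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 5-point responses of four respondents to three items.
items = [
    [4, 3, 5, 2],
    [4, 2, 5, 3],
    [5, 3, 4, 2],
]
print(round(cronbach_alpha(items), 3))
```

Respondents with missing answers would have to be dropped or imputed before this calculation, which is one reason a missing-value analysis precedes the reliability analysis.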

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data grows rapidly, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially data obtained from gene expression microarray experiments. First, we study ways to improve the quality of microarray data by replacing (imputing) the missing data entries with estimated values. Missing value imputation is commonly used to make incomplete data complete, thus making it easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Secondly, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on 8 publicly available microarray data sets. It was observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but some data sets called for more advanced imputation methods, such as Bayesian Principal Component Algorithm (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as those using gene microarray techniques. 
Such networks are typically very large and highly connected, thus there is a need for fast algorithms for producing visually pleasant layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within the regular force directed graph layout algorithm.
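
The k-NN imputation mentioned in this abstract can be sketched in plain Python as below. This is a generic illustration under stated assumptions, not the thesis's implementation: rows stand for genes, columns for arrays, `None` marks a missing entry, and the distance is a root mean squared difference over the columns observed in both rows.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of that column over the k nearest rows."""

    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours must have column j observed.
            donors = [r for n, r in enumerate(matrix) if n != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                completed[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return completed


# Hypothetical expression matrix: the first gene lacks its third measurement.
expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
print(knn_impute(expression, k=2)[0])
```

Neighbours are always searched in the original matrix, so an imputed value never influences another imputation in the same pass.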

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The software packages used are Splus and R.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Learning Disability (LD) is a classification including several disorders in which a child has difficulty learning in a typical manner, usually caused by an unknown factor or factors. LD affects about 15% of children enrolled in schools. Predicting learning disability is a complicated task, since LD must be identified from diverse features or signs. There is no cure for learning disabilities and they are life-long. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. The aim of this paper is to develop a new algorithm for imputing missing values and to determine the significance of the missing value imputation method and the dimensionality reduction method for the performance of fuzzy and neuro-fuzzy classifiers, with specific emphasis on the prediction of learning disabilities in school-age children. In the basic assessment method for prediction of LD, checklists are generally used; the data collected in this way depend heavily on the mood of the children and may also contain redundant as well as missing values. Therefore, in this study, we propose a new correlation-based algorithm for imputing the missing values and Principal Component Analysis (PCA) for removing irrelevant attributes. The study found that the preprocessing methods applied improve the quality of the data and thereby increase the accuracy of the classifiers. The system is implemented in MathWorks MATLAB 7.10. The results obtained from this study illustrate that the developed missing value imputation method is a very good contribution to the prediction system and is capable of improving the performance of a classifier.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

During knowledge discovery in databases, several problems can arise, such as the absence of a value for a given attribute in some instance. Such missing values can harm the final results of the process, because they directly affect the quality of the data submitted to a machine learning algorithm. The literature offers several proposals to mitigate this harm, among them data imputation, which estimates a plausible value to replace the missing one. Within this line of solutions to the missing-value problem, several studies were reviewed and some observations were made, such as the scarce use of synthetic datasets that simulate the main missing-data mechanisms and a recent trend toward bio-inspired algorithms for handling the problem. Against this background, this dissertation presents a data imputation method based on particle swarm optimization, which is little explored in the area, and applies it to synthetically generated datasets that cover the main missing-data mechanisms: MAR, MCAR and NMAR. The results obtained by comparing different configurations of the method with two other well-known methods (KNNImpute and SVMImpute) are promising for its use in missing-value treatment, since it achieved the best values in most of the experiments performed.
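
The three missing-data mechanisms named in this abstract (MCAR, MAR and NMAR) can be simulated on a synthetic dataset roughly as follows. This is a minimal sketch and not the dissertation's generator: the function name, the two-column layout `(x, y)`, the median cut-offs and the doubling of the rate in the upper half are assumptions chosen to keep the overall missing rate near `rate`.

```python
import random
from statistics import median


def make_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of (x, y) rows in which some y values are set to None."""
    rng = random.Random(seed)
    x_cut = median(x for x, _ in rows)
    y_cut = median(y for _, y in rows)
    result = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = rate                            # independent of the data
        elif mechanism == "MAR":
            p = 2 * rate if x > x_cut else 0.0  # depends only on the observed x
        elif mechanism == "NMAR":
            p = 2 * rate if y > y_cut else 0.0  # depends on the hidden y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        result.append((x, None if rng.random() < p else y))
    return result


# Hypothetical synthetic base: x runs from 0 to 99 and y = 100 - x.
base = [(i, 100 - i) for i in range(100)]
for name in ("MCAR", "MAR", "NMAR"):
    n_missing = sum(y is None for _, y in make_missing(base, name))
    print(name, n_missing)
```

Because the original values are kept in `base`, any imputation method can then be scored against the truth under each mechanism.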

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This study presents several methods for valuing environmental assets. These methods help estimate the economic value of environmental resources by simulating a hypothetical market, even when no market prices exist for them. The value obtained is a measure of individual preferences regarding environmental changes. Thus, environmental valuation methods do not convert a natural resource into a market product. The report evaluates the applicability of the valuation methods for determining the economic value of assets or resources. The challenge is to understand current economic and ecological thinking and its limitations, so as to improve the perception of natural phenomena and the pursuit of the economic goal of sustainable development.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Five different methods were critically examined to characterize the pore structure of the silica monoliths. The mesopore characterization was performed using: a) the classical BJH method applied to nitrogen sorption data, which overestimated the mesopore distribution and was improved by using the NLDFT method, b) the ISEC method implementing the PPM and PNM models, which were developed especially for monolithic silicas that, unlike particulate supports, show two inflection points in the ISEC curve, enabling the calculation of pore connectivity, a measure of the mass transfer kinetics in the mesopore network, c) mercury porosimetry using newly recommended mercury contact angle values.
The results of the characterization of mesopores of monolithic silica columns by the three methods indicated that all methods were useful with respect to the pore size distribution by volume, but only the ISEC method with implemented PPM and PNM models gave the average pore size and distribution based on the number average and the pore connectivity values.
The characterization of the flow-through pores was performed by two different methods: a) mercury porosimetry, which was used not only for estimating the average flow-through pore value, but also for assessing entrapment. It was found that the mass transfer from the flow-through pores to mesopores was not hindered in the case of small flow-through pores with a narrow distribution, b) liquid penetration, where the average flow-through pore values were obtained via existing equations and improved by additional methods developed according to Hagen-Poiseuille rules. The result was that it is not the flow-through pore size that influences the column back pressure; rather, the surface area to volume ratio of the silica skeleton is most decisive. Thus the monolith with the lowest ratio values will be the most permeable.
The flow-through pore characterization results obtained by mercury porosimetry and liquid permeability were compared with the ones from imaging and image analysis. All named methods enable a reliable characterization of the flow-through pore diameters for the monolithic silica columns, but special care should be taken in choosing the theoretical model.
The measured pore characterization parameters were then linked with the mass transfer properties of monolithic silica columns. As indicated by the ISEC results, no restrictions in mass transfer resistance were noticed in mesopores due to their high connectivity. The mercury porosimetry results also gave evidence that no restrictions occur for mass transfer from flow-through pores to mesopores in the small scaled silica monoliths with narrow distribution.
The optimum regimes of the pore structural parameters for the given target parameters in HPLC separations were predicted. It was found that a low mass transfer resistance in the mesopore volume is achieved when the nominal diameter of the number average size distribution of the mesopores is approximately an order of magnitude larger than the molecular radius of the analyte. The effective diffusion coefficient of an analyte molecule in the mesopore volume is strongly dependent on the value of the nominal pore diameter of the number averaged pore size distribution. The mesopore size has to be adapted to the molecular size of the analyte, in particular for peptides and proteins.
The study on flow-through pores of silica monoliths demonstrated that the surface-to-volume ratio of the skeletons and the external porosity are decisive for the column efficiency. The latter is independent of the flow-through pore diameter. The flow-through pore characteristics by direct and indirect approaches were assessed and theoretical column efficiency curves were derived.
The study showed that, next to the surface-to-volume ratio, the total porosity and its distribution between the flow-through pores and mesopores have a substantial effect on the column plate number, especially as the extent of adsorption increases. The column efficiency increases with decreasing flow-through pore diameter, decreases with external porosity, and increases with total porosity, although this tendency has a limit due to the heterogeneity of the studied monolithic samples. We found that the maximum efficiency of the studied monolithic research columns could be reached at a skeleton diameter of ~ 0.5 µm. Furthermore, when the intention is to maximize the column efficiency, more homogeneous monoliths should be prepared.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Sequence analysis and optimal matching are useful heuristic tools for the descriptive analysis of heterogeneous individual pathways such as educational careers, job sequences or patterns of family formation. However, to date it remains unclear how to handle the inevitable problems caused by missing values with regard to such analysis. Multiple Imputation (MI) offers a possible solution for this problem but it has not been tested in the context of sequence analysis. Against this background, we contribute to the literature by assessing the potential of MI in the context of sequence analyses using an empirical example. Methodologically, we draw upon the work of Brendan Halpin and extend it to additional types of missing value patterns. Our empirical case is a sequence analysis of panel data with substantial attrition that examines the typical patterns and the persistence of sex segregation in school-to-work transitions in Switzerland. The preliminary results indicate that MI is a valuable methodology for handling missing values due to panel mortality in the context of sequence analysis. MI is especially useful in facilitating a sound interpretation of the resulting sequence types.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of available information and often introduces bias. Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates, including partially observed binary covariates. The data were assumed missing at random (MAR). One method (PD) used the predictive distribution as a weight to calculate the average of the logistic regressions performed on all possible values of the missing observations, and the second method (RS) used a variant of a resampling technique. Seven additional methods were compared with these two approaches in a simulation study. They are: (1) Analysis based on only the complete cases, (2) Substituting the mean of the observed values for the missing value, (3) An imputation technique based on the proportions of observed data, (4) Regressing the partially observed covariates on the remaining continuous covariates, (5) Regressing the partially observed covariates on the remaining continuous covariates conditional on response variable, (6) Regressing the partially observed covariates on the remaining continuous covariates and response variable, and (7) EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus for analysis of large data sets with partially observed covariates, further refinement of these approaches is needed.
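
The two simplest comparison methods listed above, (1) complete-case analysis and (2) mean substitution, can be sketched as data-preparation steps as follows. This is an illustrative sketch only: the function names, the `None` missing marker and the toy rows are assumptions, and the proposed PD and RS methods are not reproduced.

```python
from statistics import mean


def complete_cases(rows):
    """Method (1): keep only the rows with no missing value."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_substitute(rows):
    """Method (2): replace each missing value with its column's observed mean."""
    columns = list(zip(*rows))
    # Assumes every column has at least one observed value.
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Hypothetical covariates: a continuous measurement and a binary indicator.
rows = [[1.0, 0], [3.0, None], [None, 1]]
print(complete_cases(rows))
print(mean_substitute(rows))
```

In this toy case deletion keeps one row out of three, which illustrates the inefficiency the abstract attributes to the complete-case approach, while mean substitution turns the binary indicator into a non-binary 0.5.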