949 results for missing data imputation


Relevance: 90.00%

Abstract:

OBJECTIVE: To describe the electronic medical databases used in antiretroviral therapy (ART) programmes in lower-income countries and assess the measures such programmes employ to maintain and improve data quality and reduce the loss of patients to follow-up. METHODS: In 15 countries of Africa, South America and Asia, a survey was conducted from December 2006 to February 2007 on the use of electronic medical record systems in ART programmes. Patients enrolled in the sites at the time of the survey but not seen during the previous 12 months were considered lost to follow-up. The quality of the data was assessed by computing the percentage of missing key variables (age, sex, clinical stage of HIV infection, CD4+ lymphocyte count and year of ART initiation). Associations between site characteristics (such as number of staff members dedicated to data management), measures to reduce loss to follow-up (such as the presence of staff dedicated to tracing patients) and data quality and loss to follow-up were analysed using multivariate logit models. FINDINGS: Twenty-one sites that together provided ART to 50 060 patients were included (median number of patients per site: 1000; interquartile range, IQR: 72-19 320). Eighteen sites (86%) used an electronic database for medical record-keeping; 15 (83%) such sites relied on software intended for personal or small business use. The median percentage of missing data for key variables per site was 10.9% (IQR: 2.0-18.9%) and declined with training in data management (odds ratio, OR: 0.58; 95% confidence interval, CI: 0.37-0.90) and weekly hours spent by a clerk on the database per 100 patients on ART (OR: 0.95; 95% CI: 0.90-0.99). About 10 weekly hours per 100 patients on ART were required to reduce missing data for key variables to below 10%. The median percentage of patients lost to follow-up 1 year after starting ART was 8.5% (IQR: 4.2-19.7%). Strategies to reduce loss to follow-up included outreach teams, community-based organizations and checking death registry data. Implementation of all three strategies substantially reduced losses to follow-up (OR: 0.17; 95% CI: 0.15-0.20). CONCLUSION: The quality of the data collected and the retention of patients in ART treatment programmes are unsatisfactory for many sites involved in the scale-up of ART in resource-limited settings, mainly because of insufficient staff trained to manage data and trace patients lost to follow-up.
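The data-quality metric used in this survey — the percentage of missing key variables per site — is simple to compute. A minimal sketch, using hypothetical patient records and field names (not the survey's actual schema):

```python
# Sketch: per-site percentage of missing key-variable cells, as used to
# grade data quality in the survey. Records and field names are invented.

KEY_VARS = ["age", "sex", "clinical_stage", "cd4_count", "art_start_year"]

def pct_missing_key_vars(records):
    """Percent of key-variable cells that are missing (None) across records."""
    total = len(records) * len(KEY_VARS)
    missing = sum(1 for r in records for v in KEY_VARS if r.get(v) is None)
    return 100.0 * missing / total if total else 0.0

site = [
    {"age": 34, "sex": "F", "clinical_stage": 3, "cd4_count": None, "art_start_year": 2006},
    {"age": None, "sex": "M", "clinical_stage": None, "cd4_count": 180, "art_start_year": 2005},
]
print(round(pct_missing_key_vars(site), 1))  # 3 missing of 10 cells -> 30.0
```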

Relevance: 90.00%

Abstract:

OBJECTIVE: To explore the feasibility and psychometric properties of a self-administered version of the 24-item Geriatric Pain Measure (GPM-24-SA). DESIGN: Secondary analysis of baseline data from the Prevention in Older People-Assessment in Generalists' practices trial, an international multi-center study of a health-risk appraisal system. PARTICIPANTS: One thousand seventy-two community dwelling nondisabled older adults self-reporting pain from London, UK; Hamburg, Germany; and Solothurn, Switzerland. OUTCOME MEASURES: GPM-24-SA as part of a multidimensional Health Risk Appraisal Questionnaire including self-reported demographic and health-related information. RESULTS: Among the 1,072 subjects, 655 had complete GPM-24-SA data, 404 had missing GPM-24-SA data, and 13 had >30% missing GPM-24-SA data. In psychometric analyses across the three European populations with complete GPM-24-SA data, the measure exhibited stable internal consistency, good convergent, divergent and discriminant validity, and produced stable pain measurements. However, factor analysis indicated differences in the GPM-24-SA across sites with discrepancies mainly related to items of a single subscale that failed to load appropriately. Analyses including imputation for subjects with missing data demonstrated psychometric properties comparable to complete data analyses suggesting that imputation in cases with missing GPM-24-SA data provides sufficient information to generate a valid score. CONCLUSION: The GPM-24-SA is a promising tool for self-administered assessment of pain in community dwelling older adults. However, because of incomplete response and uncertainty in factor structure, further refinement and psychometric evaluation of the GPM-24-SA is needed before it could be recommended for widespread use.
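The abstract's 30% cut-off for missing items is a common rule for scoring multi-item scales: impute from the person's own answered items when few are missing, and refuse to score otherwise. A minimal sketch using person-mean imputation; the scoring scheme here is hypothetical, not the actual GPM-24 algorithm:

```python
# Sketch: person-mean imputation for a 24-item scale when no more than
# 30% of items are missing. Hypothetical scoring, not the real GPM-24.

def scale_score(items, n_items=24, max_missing_frac=0.30):
    """Sum score with each missing item replaced by the person's item mean.
    Returns None when too many items are missing to impute credibly."""
    answered = [x for x in items if x is not None]
    n_missing = n_items - len(answered)
    if not answered or n_missing / n_items > max_missing_frac:
        return None
    person_mean = sum(answered) / len(answered)
    return sum(answered) + n_missing * person_mean

print(scale_score([1] * 24))              # complete response -> 24
print(scale_score([1] * 20 + [None] * 4)) # ~17% missing, imputed -> 24.0
print(scale_score([1] * 10 + [None] * 14))# ~58% missing -> None
```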

Relevance: 90.00%

Abstract:

Principal Component Analysis (PCA) is a popular method for dimension reduction used in many fields, including data compression, image processing and exploratory data analysis. However, traditional PCA has several drawbacks: it is not efficient for high-dimensional data and cannot compute sufficiently accurate principal components when a relatively large portion of the data is missing. In this report, we propose the EM-PCA method for dimension reduction of power system measurements with missing data, and provide a comparative study of the traditional PCA and EM-PCA methods. Our extensive experimental results show that EM-PCA is more effective and more accurate than traditional PCA for dimension reduction of power system measurement data when a large portion of the data set is missing.
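EM-style PCA with missing data alternates between filling the missing entries from the current low-rank reconstruction (E-step) and refitting the principal components (M-step). A minimal illustrative sketch of that alternating scheme — not the report's exact algorithm:

```python
import numpy as np

# Sketch of EM-style PCA imputation: alternate between (E) filling missing
# entries from the current rank-k reconstruction and (M) refitting the
# principal components via SVD. Illustrative only.

def em_pca_impute(X, n_components=1, n_iter=200):
    X = np.array(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        approx = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        filled[mask] = approx[mask]                     # E-step: update missing cells
    return filled

X = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [3.0, np.nan, 9.0]]   # rank-1 structure suggests the missing cell is near 6
print(em_pca_impute(X, n_components=1).round(2))
```

Observed cells are never overwritten; only the masked entries move toward the low-rank fit.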

Relevance: 90.00%

Abstract:

BACKGROUND Low-grade gliomas (LGGs) are rare brain neoplasms, with survival spanning up to a few decades. Thus, accurate evaluations on how biomarkers impact survival among patients with LGG require long-term studies on samples prospectively collected over a long period. METHODS The 210 adult LGGs collected in our databank were screened for IDH1 and IDH2 mutations (IDHmut), MGMT gene promoter methylation (MGMTmet), 1p/19q loss of heterozygosity (1p19qloh), and nuclear TP53 immunopositivity (TP53pos). Multivariate survival analyses with multiple imputation of missing data were performed using either histopathology or molecular markers. Both models were compared using Akaike's information criterion (AIC). The molecular model was reduced by stepwise model selection to filter out the most critical predictors. A third model was generated to assess for various marker combinations. RESULTS Molecular parameters were better survival predictors than histology (ΔAIC = 12.5, P< .001). Forty-five percent of studied patients died. MGMTmet was positively associated with IDHmut (P< .001). In the molecular model with marker combinations, IDHmut/MGMTmet combined status had a favorable impact on overall survival, compared with IDHwt (hazard ratio [HR] = 0.33, P< .01), and even more so the triple combination, IDHmut/MGMTmet/1p19qloh (HR = 0.18, P< .001). Furthermore, IDHmut/MGMTmet/TP53pos triple combination was a significant risk factor for malignant transformation (HR = 2.75, P< .05). CONCLUSION By integrating networks of activated molecular glioma pathways, the model based on genotype better predicts prognosis than histology and, therefore, provides a more reliable tool for standardizing future treatment strategies.

Relevance: 90.00%

Abstract:

Loss to follow-up (LTFU) is a common problem in many epidemiological studies. In antiretroviral treatment (ART) programs for patients with human immunodeficiency virus (HIV), mortality estimates can be biased if the LTFU mechanism is non-ignorable, that is, mortality differs between lost and retained patients. In this setting, routine procedures for handling missing data may lead to biased estimates. To appropriately deal with non-ignorable LTFU, explicit modeling of the missing data mechanism is needed. This can be based on additional outcome ascertainment for a sample of patients LTFU, for example, through linkage to national registries or through survey-based methods. In this paper, we demonstrate how this additional information can be used to construct estimators based on inverse probability weights (IPW) or multiple imputation. We use simulations to contrast the performance of the proposed estimators with methods widely used in HIV cohort research for dealing with missing data. The practical implications of our approach are illustrated using South African ART data, which are partially linkable to South African national vital registration data. Our results demonstrate that while IPWs and proper imputation procedures can be easily constructed from additional outcome ascertainment to obtain valid overall estimates, neglecting non-ignorable LTFU can result in substantial bias. We believe the proposed estimators are readily applicable to a growing number of studies where LTFU is appreciable, but additional outcome data are available through linkage or surveys of patients LTFU.
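The core IPW idea is easy to demonstrate: ascertain outcomes for a random sample of the patients lost to follow-up, and weight each traced patient by the inverse of the tracing probability. A minimal simulation sketch under stated assumptions (all data invented; not the paper's South African cohort):

```python
import random

# Sketch: mortality estimation under non-ignorable loss to follow-up (LTFU).
# Outcomes for a sampled subset of LTFU patients are ascertained (e.g. via
# registry linkage); each traced patient gets weight 1/p_trace. Simulated data.

random.seed(1)
N = 10_000
patients = []
for _ in range(N):
    lost = random.random() < 0.20                  # 20% LTFU
    # mortality is higher among the lost, so the LTFU mechanism is non-ignorable
    dead = random.random() < (0.30 if lost else 0.05)
    patients.append((lost, dead))
# true overall mortality: 0.8*0.05 + 0.2*0.30 = 0.10

p_trace = 0.25                                     # trace a 25% sample of LTFU
retained = [(d, 1.0) for lost, d in patients if not lost]
traced = [(d, 1.0 / p_trace) for lost, d in patients
          if lost and random.random() < p_trace]

naive = sum(d for d, _ in retained) / len(retained)            # ignores LTFU
weighted = retained + traced
ipw = sum(d * w for d, w in weighted) / sum(w for _, w in weighted)

print(f"complete-case: {naive:.3f}  IPW: {ipw:.3f}")  # IPW lands near 0.10
```

The complete-case estimate reflects only retained patients (about 5% mortality), while the IPW estimate recovers the overall rate despite tracing only a quarter of those lost.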

Relevance: 90.00%

Abstract:

BACKGROUND: Prognostic models for children starting antiretroviral therapy (ART) in Africa are lacking. We developed models to estimate the probability of death during the first year receiving ART in Southern Africa. METHODS: We analyzed data from children ≤10 years old who started ART in Malawi, South Africa, Zambia or Zimbabwe from 2004-2010. Children lost to follow-up or transferred were excluded. The primary outcome was all-cause mortality in the first year of ART. We used Weibull survival models to construct two prognostic models: one with CD4%, age, WHO clinical stage, weight-for-age z-score (WAZ) and anemia, and one without CD4%, because it is not routinely measured in many programs. We used multiple imputation to account for missing data. RESULTS: Among 12,655 children, 877 (6.9%) died in the first year of ART. 1,780 children were lost to follow-up or transferred and excluded from the main analyses; 10,875 children were included. With the CD4% model, the probability of death at 1 year ranged from 1.8% (95% CI: 1.5-2.3) in children 5-10 years with CD4% ≥10%, WHO stage I/II, WAZ ≥-2 and without severe anemia to 46.3% (95% CI: 38.2-55.2) in children <1 year with CD4% <5%, stage III/IV, WAZ <-3 and severe anemia. The corresponding range for the model without CD4% was 2.2% (95% CI: 1.8-2.7) to 33.4% (95% CI: 28.2-39.3). Agreement between predicted and observed mortality was good (C-statistics = 0.753 and 0.745 for the models with and without CD4%, respectively). CONCLUSION: These models may be useful to counsel children/caregivers, for program planning, and to assess program outcomes after allowing for differences in patient disease severity characteristics.
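A Weibull proportional-hazards model turns a patient's risk-factor profile into a death probability via S(t) = exp(-λ t^k e^(βx)). A minimal sketch of that mapping, with entirely invented coefficients — these are NOT the published model's estimates:

```python
import math

# Sketch: 1-year death probability from a Weibull PH model,
# S(t) = exp(-lam * t**k * exp(x'b)). All numbers below are invented
# for illustration, not the fitted prognostic model.

BASELINE_LAMBDA = 0.02   # hypothetical baseline cumulative hazard at t = 1 year
SHAPE = 0.6              # k < 1: hazard concentrated soon after ART start

# hypothetical log hazard ratios for the abstract's risk factors
COEFS = {"age_under_1": 0.8, "cd4_under_5pct": 0.6, "who_stage_3_4": 0.6,
         "waz_under_minus3": 0.5, "severe_anemia": 0.5}

def death_prob(risk_factors, t=1.0):
    """P(death by time t) for a child with the given set of risk factors."""
    xb = sum(COEFS[c] for c in risk_factors)
    return 1.0 - math.exp(-BASELINE_LAMBDA * t**SHAPE * math.exp(xb))

low = death_prob([])            # no risk factors -> roughly 2%
high = death_prob(list(COEFS))  # all risk factors present
print(f"low-risk: {low:.3f}  high-risk: {high:.3f}")
```

With these illustrative numbers the model spans roughly 2% to 33%, the same order of spread the abstract reports between its best and worst risk profiles.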

Relevance: 90.00%

Abstract:

Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited to detecting the association of common variants, but are less suitable for rare variants. This raises great challenges for sequence-based genetic studies of complex diseases.

This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus approach to collectively analyzing genome regions.

In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the associations of the entire spectrum of genetic variation within a segment of genome or a gene, regardless of whether the variants are common or rare.

Classical quantitative genetics suffers from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their applications, the functional linear models were applied to five quantitative traits in the Framingham heart studies.

This project also proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.

Relevance: 90.00%

Abstract:

In genotype-by-environment (G × E) experiments it is common to observe the behaviour of the genotypes with respect to several attributes in the environments considered. The analysis of this type of experiment has been widely addressed for the single-attribute case. This thesis presents some alternatives for analysis that consider genotypes, environments and attributes simultaneously. The first is based on the mixture maximum likelihood method of clustering (Mixclus) and three-mode principal component analysis (3MPCA), which allow the analysis of three-way tables; these two methods have been used extensively in psychology and chemistry, but little in agriculture. The second is a methodology that combines the additive main effects and multiplicative interaction (AMMI) model, an efficient model for analysing (G × E) experiments with a single attribute, with generalized Procrustes analysis, which makes it possible to compare configurations of points and provides a numerical measure of how much they differ. Finally, an alternative for imputing data in (G × E) experiments is presented, since missing data are a very frequent situation in these experiments. It is concluded that the proposed methodologies are useful tools for the analysis of multi-attribute (G × E) experiments.

Relevance: 90.00%

Abstract:

Biplot analyses based on additive main effects and multiplicative interaction (AMMI) models require complete data matrices, but multi-environment trials frequently contain missing values. This thesis proposes new single and multiple imputation methodologies that can be used to analyse unbalanced data in experiments with genotype-by-environment (G×E) interaction. The first is a new extension of the eigenvector cross-validation method (Bro et al., 2008). The second is a new non-parametric algorithm obtained through modifications of the single imputation method developed by Yan (2013). A study is also included that considers imputation systems recently reported in the literature and compares them with the classical procedure recommended for imputation in (G×E) trials, namely the combination of the Expectation-Maximization algorithm with AMMI models (EM-AMMI). Finally, generalizations are provided of the single imputation described by Arciniegas-Alarcón et al. (2010), which mixes regression with lower-rank approximation of a matrix. All the methodologies are based on the singular value decomposition (SVD) and are therefore free of distributional or structural assumptions. To assess the performance of the new imputation schemes, simulations were carried out based on real data sets from different species, with values deleted at random in different percentages, and the quality of the imputations was evaluated with various statistics. It was concluded that the SVD is a useful and flexible tool for building efficient techniques that overcome the problem of information loss in experimental matrices.
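SVD-based imputation for a G×E table works by alternating between a truncated SVD fit and refilling the missing cells from it, in the spirit of EM-AMMI. A minimal illustrative sketch — not the thesis's exact algorithms:

```python
import numpy as np

# Sketch of EM-AMMI-style imputation for a genotype-by-environment (G×E)
# matrix: alternate between refitting a truncated SVD and refilling the
# missing cells from the rank-k reconstruction. Illustrative only.

def svd_impute(Y, rank=1, n_iter=200):
    Y = np.array(Y, dtype=float)
    mask = np.isnan(Y)
    filled = np.where(mask, np.nanmean(Y), Y)      # start from the grand mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[mask] = approx[mask]                # refill only the missing cells
    return filled

# 3 genotypes x 3 environments with one missing yield; the observed cells
# follow an exact rank-1 pattern whose completion is 12.0
Y = [[2.0, 4.0, 6.0],
     [3.0, 6.0, 9.0],
     [4.0, 8.0, np.nan]]
print(svd_impute(Y, rank=1).round(2))
```

Because the procedure only ever touches the masked cells, the observed trial data are preserved exactly, and no distributional assumption is needed — consistent with the SVD-based framing above.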

Relevance: 90.00%

Abstract:

IMPORTANCE Obesity is a risk factor for deep vein thrombosis of the leg and pulmonary embolism. To date, however, whether obesity is associated with adult cerebral venous thrombosis (CVT) has not been assessed. OBJECTIVE To assess whether obesity is a risk factor for CVT. DESIGN, SETTING, AND PARTICIPANTS A case-control study was performed in consecutive adult patients with CVT admitted from July 1, 2006 (Amsterdam), and October 1, 2009 (Berne), through December 31, 2014, to the Academic Medical Center in Amsterdam, the Netherlands, or Inselspital University Hospital in Berne, Switzerland. The control group was composed of individuals from the control population of the Multiple Environmental and Genetic Assessment of Risk Factors for Venous Thrombosis study, a large Dutch case-control study performed from March 1, 1999, to September 30, 2004, in which risk factors for deep vein thrombosis and pulmonary embolism were assessed. Data analysis was performed from January 2 to July 12, 2015. MAIN OUTCOMES AND MEASURES Obesity was determined by body mass index (BMI). A BMI of 30 or greater was considered to indicate obesity, and a BMI of 25 to 29.99 was considered to indicate overweight. A multiple imputation procedure was used for missing data. We adjusted for sex, age, history of cancer, ethnicity, smoking status, and oral contraceptive use. Individuals with normal weight (BMI <25) were the reference category. RESULTS The study included 186 cases and 6134 controls. Cases were younger (median age, 40 vs 48 years), more often female (133 [71.5%] vs 3220 [52.5%]), more often used oral contraceptives (97 [72.9%] vs 758 [23.5%] of women), and more frequently had a history of cancer (17 [9.1%] vs 235 [3.8%]) compared with controls. Obesity (BMI ≥30) was associated with an increased risk of CVT (adjusted odds ratio [OR], 2.63; 95% CI, 1.53-4.54). Stratification by sex revealed a strong association between CVT and obesity in women (adjusted OR, 3.50; 95% CI, 2.00-6.14) but not in men (adjusted OR, 1.16; 95% CI, 0.25-5.30). Further stratification revealed that, in women who used oral contraceptives, overweight and obesity were associated with an increased risk of CVT in a dose-dependent manner (BMI 25.0-29.9: adjusted OR, 11.87; 95% CI, 5.94-23.74; BMI ≥30: adjusted OR, 29.26; 95% CI, 13.47-63.60). No association was found in women who did not use oral contraceptives. CONCLUSIONS AND RELEVANCE Obesity is a strong risk factor for CVT in women who use oral contraceptives.

Relevance: 90.00%

Abstract:

We investigate whether relative contributions of genetic and shared environmental factors are associated with an increased risk in melanoma. Data from the Queensland Familial Melanoma Project comprising 15,907 subjects arising from 1912 families were analyzed to estimate the additive genetic, common and unique environmental contributions to variation in the age at onset of melanoma. Two complementary approaches for analyzing correlated time-to-onset family data were considered: the generalized estimating equations (GEE) method, in which one can estimate relationship-specific dependence simultaneously with regression coefficients that describe the average population response to changing covariates; and a subject-specific Bayesian mixed model, in which heterogeneity in regression parameters is explicitly modeled and the different components of variation may be estimated directly. The proportional hazards and Weibull models were utilized, as both provide natural frameworks for estimating relative risks while adjusting for simultaneous effects of other covariates. A simple Markov chain Monte Carlo method for covariate imputation of missing data was used, and the actual implementation of the Bayesian model was based on Gibbs sampling using the freeware package BUGS. In addition, we also used a Bayesian model to investigate the relative contribution of genetic and environmental effects on the expression of naevi and freckles, which are known risk factors for melanoma.

Relevance: 90.00%

Abstract:

Objective Comparisons of the changing patterns of inequalities in occupational mortality provide one way to monitor the achievement of equity goals. However, previous comparisons have not corrected for numerator/denominator bias, which is a consequence of the different ways in which occupational details are recorded on death certificates and on census forms. The objective of this study was to measure the impact of this bias on mortality rates and ratios over time. Methods Using data provided by the Australian Bureau of Statistics, we examined the evidence for bias over the period 1981-2002, and used imputation methods to adjust for this bias. We compared unadjusted with imputed rates of mortality for manual/non-manual workers. Findings Unadjusted data indicate increasing inequality in the age-adjusted rates of mortality for manual/non-manual workers during 1981-2002. Imputed data suggest that there have been modest fluctuations in the ratios of mortality for manual/non-manual workers during this time, but with evidence that inequalities have increased only in recent years and are now at historic highs. Conclusion We found that imputation for missing data leads to changes in estimates of inequalities related to social class in mortality for some years but not for others. Occupational class comparisons should be imputed or otherwise adjusted for missing data on census or death certificates.

Relevance: 90.00%

Abstract:

The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.

Relevance: 90.00%

Abstract:

Objective: An estimation of cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed with the SPSS Clementine data mining system. Decision tree learners (C5 and CART) and a method for mining association rules (the GRI algorithm) were used. Fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are the input risk factors (independent variables), while diabetes onset (the 2h post glucose load >= 200 mg/dl) is the output (dependent variable). All three techniques were tested by cross-validation (89.8%). Results: The rules produced for diabetes diagnosis are: A- the GRI algorithm: (1) FPG >= 108.9 mg/dl, (2) FPG >= 107.1 mg/dl and age > 39.5 years. B- CART decision trees: FPG >= 110.7 mg/dl. C- the C5 decision tree learner: (1) FPG >= 95.5 mg/dl and age > 54 years, (2) FPG >= 106 mg/dl and BMI > 25.2 kg/m2, (3) FPG >= 106 mg/dl and <= 133 mg/dl. The three techniques produced rules covering a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and that individual risk factors such as age and BMI should be considered in defining the new cut-off value.
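The core step behind the CART/C5 cut-offs above is a threshold search: scan candidate split points on a single risk factor and pick the one that minimises weighted impurity. A minimal sketch on simulated data (the real study used the Oman survey and SPSS Clementine, not this code):

```python
import random

# Sketch: learning a diagnostic cut-off for one risk factor by minimising
# weighted Gini impurity - the split criterion a decision-tree learner uses.
# The (FPG, diabetic) pairs below are simulated for illustration.

random.seed(7)
data = [(random.gauss(95, 10), 0) for _ in range(300)] + \
       [(random.gauss(135, 15), 1) for _ in range(300)]

def gini(labels):
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_cutoff(pairs):
    """Return the threshold that minimises the weighted Gini of the split."""
    best = (float("inf"), None)
    for cut in sorted({x for x, _ in pairs}):
        left = [y for x, y in pairs if x < cut]
        right = [y for x, y in pairs if x >= cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        best = min(best, (score, cut))
    return best[1]

cut = best_cutoff(data)
print(f"learned FPG cut-off: {cut:.1f} mg/dl")  # lands between the two groups
```

With overlapping healthy and diabetic FPG distributions, the learned threshold settles near where the two densities cross, which is exactly why the discovered rules can differ from the conventional 126 mg/dl criterion.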

Relevance: 90.00%

Abstract:

Heterogeneous and incomplete datasets are common in many real-world visualisation applications. The probabilistic nature of the Generative Topographic Mapping (GTM), which was originally developed for complete continuous data, can be extended to model heterogeneous (i.e. containing both continuous and discrete values) and missing data. This paper describes and assesses the resulting model on both synthetic and real-world heterogeneous data with missing values.