949 results for missing data imputation
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
BACKGROUND: Health-related quality of life (HRQL) assessment is an important measure of the impact of a wide range of disease processes on an individual. To date, no HRQL tool has been evaluated in an Iranian population with cardiovascular disorders, specifically myocardial infarction, a major cause of mortality and morbidity. The MacNew Heart Disease Health-related Quality of Life instrument is a disease-specific HRQL questionnaire with satisfactory validity and reliability when applied cross-culturally. METHOD: A Persian version of the MacNew was prepared by forward and backward translation by bilingual translators, after which a feasibility test was performed. Consecutive patients (n = 51) admitted to a coronary care unit with acute myocardial infarction were recruited for measurement of their HRQL, with retest one month after discharge in the follow-up clinic. Principal components analysis, intra-class correlation reliability, internal consistency, and test-retest reliability were assessed. RESULTS: Trivial rates of missing data confirmed the acceptability of the tool. Principal component analysis revealed that the three domains, emotional, social and physical, performed as well as in the original studies. Internal consistency was high and comparable to other studies, ranging from 0.92 for the emotional and physical domains to 0.94 for the social domain and 0.95 for the Global score. Domain means of 5.0, 5.3 and 4.9 for the emotional, physical and social domains, respectively, indicate that our Iranian population has similar emotional and physical but worse social HRQL scores. Test-retest analysis showed significant correlation in the emotional and physical domains (P < 0.05). CONCLUSION: The Persian version of the MacNew questionnaire is comparable to the English version. It has high internal consistency and reasonable reproducibility, making it an appropriate disease-specific quality of life tool for population-based studies and clinical practice in Iran among patients who have survived an acute myocardial infarction. Further studies are needed to confirm its validity in larger populations with cardiovascular disease.
Abstract:
Causal inference with a continuous treatment is a relatively under-explored problem. In this dissertation, we adopt the potential outcomes framework. Potential outcomes are responses that would be seen for a unit under all possible treatments. In an observational study where the treatment is continuous, the potential outcomes are an uncountably infinite set indexed by treatment dose. We parameterize this unobservable set as a linear combination of a finite number of basis functions whose coefficients vary across units. This leads to new techniques for estimating the population average dose-response function (ADRF). Some techniques require a model for the treatment assignment given covariates, some require a model for predicting the potential outcomes from covariates, and some require both. We develop these techniques using a framework of estimating functions, compare them to existing methods for continuous treatments, and simulate their performance in a population where the ADRF is linear and the models for the treatment and/or outcomes may be misspecified. We also extend the comparisons to a data set of lottery winners in Massachusetts. Next, we describe the methods and functions in the R package causaldrf using data from the National Medical Expenditure Survey (NMES) and Infant Health and Development Program (IHDP) as examples. Additionally, we analyze the National Growth and Health Study (NGHS) data set and deal with the issue of missing data. Lastly, we discuss future research goals and possible extensions.
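Below is a rough Python sketch, with entirely synthetic data and hypothetical variable names, of one common ADRF estimator for a continuous treatment: a generalized propensity score (GPS) regression in the spirit of Hirano and Imbens. It is not the thesis's estimating-function methods and not the causaldrf package itself (which is written in R); it only illustrates the kind of estimator involved.

```python
# Illustrative GPS-regression sketch of an average dose-response function (ADRF)
# estimator for a continuous treatment; synthetic data, not the thesis's methods.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                              # a single confounder
t = 0.8 * x + rng.normal(size=n)                    # continuous treatment depends on x
y = 1.0 + 2.0 * t + 1.5 * x + rng.normal(size=n)    # true ADRF is linear with slope 2

# Step 1: model treatment given covariates, T | X ~ N(a + b*x, sigma^2)
X1 = np.column_stack([np.ones(n), x])
beta_t, *_ = np.linalg.lstsq(X1, t, rcond=None)
sigma = (t - X1 @ beta_t).std(ddof=2)
gps = stats.norm.pdf(t, loc=X1 @ beta_t, scale=sigma)   # estimated GPS at observed dose

# Step 2: regress the outcome on a flexible surface in (treatment, GPS)
D = np.column_stack([np.ones(n), t, t**2, gps, gps**2, t * gps])
alpha, *_ = np.linalg.lstsq(D, y, rcond=None)

# Step 3: for each dose level, average the fitted surface over the sample
doses = np.linspace(np.quantile(t, 0.05), np.quantile(t, 0.95), 20)
adrf = []
for d in doses:
    gps_d = stats.norm.pdf(d, loc=X1 @ beta_t, scale=sigma)
    Dd = np.column_stack([np.ones(n), np.full(n, d), np.full(n, d**2),
                          gps_d, gps_d**2, d * gps_d])
    adrf.append((Dd @ alpha).mean())

print(np.polyfit(doses, adrf, 1))   # fitted slope should be close to the true value 2
```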
Abstract:
The objective of this study was to gain an understanding of the effects of population heterogeneity, missing data, and causal relationships on parameter estimates from statistical models when analyzing change in medication use. From a public health perspective, two timely topics were addressed: the use and effects of statins in primary prevention of cardiovascular disease, and polypharmacy in the older population. Growth mixture models were applied to characterize the accumulation of cardiovascular and diabetes medications among an apparently healthy population of statin initiators. The causal effect of statin adherence on the incidence of acute cardiovascular events was estimated using marginal structural models in comparison with discrete-time hazards models. The impact of missing data on the growth estimates of the evolution of polypharmacy was examined by comparing statistical models under different assumptions about the missing data mechanism. The data came from Finnish administrative registers and from the population-based Geriatric Multidisciplinary Strategy for the Good Care of the Elderly study conducted in Kuopio, Finland, during 2004–07. Five distinct patterns of accumulating medications emerged among the population of apparently healthy statin initiators during the two years after statin initiation. Proper accounting for time-varying dependencies between adherence to statins and confounders using marginal structural models produced estimation results comparable with those from a discrete-time hazards model. The missing data mechanism was shown to be a key component when estimating the evolution of polypharmacy among older persons. In conclusion, population heterogeneity, missing data and causal relationships are important aspects of longitudinal studies that relate to the study question and should be critically assessed when performing statistical analyses. Analyses should be supplemented with sensitivity analyses of model assumptions.
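As a rough illustration of the marginal structural model machinery mentioned above, the simplified Python sketch below computes stabilized inverse-probability-of-treatment weights for a time-varying binary exposure (think statin adherence) with a single pooled model per weight component. It uses hypothetical simulated data and names and is not the study's actual register-based analysis.

```python
# Minimal sketch: stabilized inverse-probability weights for a time-varying
# binary exposure, the device underlying marginal structural models.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, k = 500, 4                                    # subjects, follow-up intervals
rows = []
for i in range(n):
    conf = rng.normal()                          # time-varying confounder
    prev = 0                                     # previous-interval adherence
    for t in range(k):
        p = 1 / (1 + np.exp(-(0.5 * conf + 0.8 * prev)))
        a = rng.binomial(1, p)                   # adherence this interval
        rows.append(dict(id=i, t=t, conf=conf, prev=prev, a=a))
        conf = conf + 0.3 * a + rng.normal(scale=0.5)
        prev = a
df = pd.DataFrame(rows)

# Denominator model: P(A_t | past treatment, time-varying confounder)
den = LogisticRegression().fit(df[["prev", "conf"]], df["a"])
p_den = np.where(df["a"] == 1,
                 den.predict_proba(df[["prev", "conf"]])[:, 1],
                 den.predict_proba(df[["prev", "conf"]])[:, 0])

# Numerator model: P(A_t | past treatment only), giving stabilized weights
num = LogisticRegression().fit(df[["prev"]], df["a"])
p_num = np.where(df["a"] == 1,
                 num.predict_proba(df[["prev"]])[:, 1],
                 num.predict_proba(df[["prev"]])[:, 0])

df["w"] = p_num / p_den
sw = df.groupby("id")["w"].prod()                # cumulative stabilized weight per subject
print(sw.describe())                             # weights should be centered near 1
```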
Abstract:
Three types of forecasts of the total Australian production of macadamia nuts (t nut-in-shell) have been produced early each year since 2001. The first is a long-term forecast, based on the expected production from the tree census data held by the Australian Macadamia Society, suitably scaled up for missing data and assumed new plantings each year. These long-term forecasts range out to 10 years into the future, and form a basis for industry and market planning. Secondly, a statistical adjustment (termed the climate-adjusted forecast) is made annually for the coming crop. As the name suggests, climatic influences are the dominant factors in this adjustment process; however, other terms such as biennial bearing, prices and orchard ageing are also incorporated. Thirdly, industry personnel are surveyed early each year, with their estimates integrated into a growers' and pest-scouts' forecast. Initially conducted on a 'whole-country' basis, these models are now constructed separately for the six main production regions of Australia, with these being combined for national totals. Ensembles or suites of forward stepwise regression models using biologically relevant variables have been the main statistical method adopted; however, developing methodologies such as nearest-neighbour techniques, generalized additive models and random forests are continually being evaluated in parallel. The overall error rates average 14% for the climate forecasts and 12% for the growers' forecasts. These compare with 7.8% for USDA almond forecasts (based on extensive early-crop sampling) and 6.8% for coconut forecasts in Sri Lanka. However, our somewhat disappointing results were mainly due to a series of poor crops attributed to human factors, which have now been factored into the models. Notably, the 2012 and 2013 forecasts averaged 7.8% and 4.9% errors, respectively. Future models should also show continuing improvement as more data-years become available.
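The sketch below shows, with entirely synthetic data and hypothetical predictor names, how one member of such a forecasting ensemble (here a random forest on climate-style features) might be fitted and scored by leave-one-out percentage error; it does not reproduce the Australian Macadamia Society models or data.

```python
# Hedged sketch: one ensemble member for a regional crop forecast, scored by
# leave-one-out mean absolute percentage error; data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(2)
years = 20
X = np.column_stack([
    rng.normal(800, 150, years),   # seasonal rainfall (mm), hypothetical
    rng.normal(24, 2, years),      # mean temperature (deg C), hypothetical
    np.arange(years) % 2,          # crude biennial-bearing indicator
])
crop = (30 + 0.01 * X[:, 0] - 0.5 * (X[:, 1] - 24)
        + 3 * X[:, 2] + rng.normal(0, 1, years))   # regional crop (kt), synthetic

model = RandomForestRegressor(n_estimators=500, random_state=0)
pred = cross_val_predict(model, X, crop, cv=LeaveOneOut())
mape = np.mean(np.abs(pred - crop) / crop) * 100
print(f"leave-one-out mean absolute percentage error: {mape:.1f}%")
```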
Abstract:
Introduction: The inadequate reporting of cross-sectional studies, as in the case of the prevalence of metabolic syndrome, can cause problems in the synthesis of new evidence and lead to errors in the formulation of public policies. Objective: To evaluate the reporting quality of articles on metabolic syndrome prevalence in Peruvian adults using the STROBE recommendations. Methods: We conducted a thorough literature search with the terms "Metabolic Syndrome", "Sindrome Metabolico" and "Peru" in MEDLINE/PubMed, LILACS, SciELO, LIPECS and BVS-Peru up to December 2014. We selected population-based observational studies with randomized sampling that reported the prevalence of metabolic syndrome in adults aged 18 years or older of both sexes. Information was analysed through the STROBE score per item and recommendation. Results: Seventeen articles were included in this study. All articles met the recommendations related to reporting the study's rationale, design, and summary measures. The recommendations with the lowest scores were those related to sensitivity analysis (8%, n = 1/17), the participant flowchart (18%, n = 3/17), missing data analysis (24%, n = 4/17), and the number of participants in each study phase (24%, n = 4/17). Conclusion: Cross-sectional studies on the prevalence of metabolic syndrome in Peruvian adults show inadequate reporting in the methods and results sections. We identified a clear need to improve the quality of such studies.
Abstract:
Let (X, Y) be a bivariate normal random vector representing the responses under Treatment 1 and Treatment 2. Statistical inference about the bivariate normal distribution parameters is considered when both treatment samples contain missing data. Assuming the correlation coefficient ρ of the bivariate population is known, the MLEs of the population means and variance (ξ, η, and σ²) are obtained, and inferences about these parameters are presented. Procedures for constructing a confidence interval for the difference of population means ξ − η and for testing hypotheses about ξ − η are established. The performance of the new estimators and testing procedure is compared numerically with the method proposed in Looney and Jones (2003) on the basis of extensive Monte Carlo simulation. Simulation studies indicate that the testing power of the method proposed in this thesis is higher.
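A numerical sketch of this setting is given below: rather than the closed-form MLEs derived in the thesis, it simply maximizes the likelihood for (ξ, η, σ²) with known ρ when some pairs have only X or only Y observed, using simulated data.

```python
# Numerical (not closed-form) maximum likelihood for a bivariate normal with
# known correlation rho, common variance sigma^2, and missingness in both arms.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(3)
rho, xi, eta, sigma = 0.6, 1.0, 2.0, 1.5
cov = sigma**2 * np.array([[1, rho], [rho, 1]])
data = rng.multivariate_normal([xi, eta], cov, size=300)
x, y = data[:, 0].copy(), data[:, 1].copy()
x[rng.random(300) < 0.2] = np.nan            # MCAR missingness in the X arm
y[rng.random(300) < 0.2] = np.nan            # MCAR missingness in the Y arm

both = ~np.isnan(x) & ~np.isnan(y)
only_x = ~np.isnan(x) & np.isnan(y)
only_y = np.isnan(x) & ~np.isnan(y)

def neg_loglik(theta):
    m1, m2, log_s = theta
    s = np.exp(log_s)
    c = s**2 * np.array([[1, rho], [rho, 1]])
    ll = stats.multivariate_normal.logpdf(
        np.column_stack([x[both], y[both]]), mean=[m1, m2], cov=c).sum()
    ll += stats.norm.logpdf(x[only_x], m1, s).sum()   # X-only pairs
    ll += stats.norm.logpdf(y[only_y], m2, s).sum()   # Y-only pairs
    return -ll

res = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
m1_hat, m2_hat, s_hat = res.x[0], res.x[1], np.exp(res.x[2])
print(m1_hat, m2_hat, s_hat, m1_hat - m2_hat)   # last value estimates xi - eta
```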
Abstract:
My study investigated internal consistency estimates from psychometric surveys as an operationalization of the state of measurement precision of constructs in industrial and organizational (I/O) psychology. Analyses were conducted on samples used in research articles published in the Journal of Applied Psychology between 1975 and 2010 at five-year intervals (K = 934 samples from 480 articles), yielding 1,427 coefficients. Articles and their respective samples were coded for test-taker characteristics (e.g., age, gender, and ethnicity), research settings (e.g., lab and field studies), and actual tests (e.g., number of items and scale anchor points). A repository of reliability and inter-item correlation estimates was developed for I/O variables and construct groups. Personality measures had significantly lower inter-item correlations than other construct groups. Internal consistency estimates and reporting practices were also evaluated over time, demonstrating improvement in measurement precision and in the reporting of missing data.
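For readers unfamiliar with the quantities being surveyed, the short sketch below computes Cronbach's alpha (one common internal consistency estimate) and the mean inter-item correlation for a simulated multi-item scale; the item data are purely illustrative.

```python
# Cronbach's alpha and mean inter-item correlation for a k-item scale.
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 6
latent = rng.normal(size=(n, 1))
items = latent + rng.normal(scale=1.0, size=(n, k))   # six noisy indicators of one construct

def cronbach_alpha(X):
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()            # sum of item variances
    total_var = X.sum(axis=1).var(ddof=1)              # variance of the total score
    return k / (k - 1) * (1 - item_vars / total_var)

corr = np.corrcoef(items, rowvar=False)
mean_inter_item_r = corr[np.triu_indices(k, 1)].mean()
print(cronbach_alpha(items), mean_inter_item_r)
```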
Abstract:
The K-means algorithm is one of the most popular clustering algorithms in current use, as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions whilst remaining almost as fast and simple. This novel algorithm, which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous, as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example binary, count or ordinal data. It can also efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means, with MAP-DP convergence typically achieved on the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross-validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.
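The published MAP-DP algorithm is more involved than can be shown here, but the following DP-means-style sketch (a much simpler hard-clustering relative that adds a new-cluster penalty lam) conveys the key contrast with K-means: the number of clusters is inferred from the data rather than fixed in advance. All data are synthetic.

```python
# Not MAP-DP itself: a batch DP-means-style sketch in which a new cluster is
# opened whenever a point lies farther than lam (squared distance) from all centers.
import numpy as np

def dp_means(X, lam, n_iter=100):
    centers = [X.mean(axis=0)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # squared distance of every point to every current center
        d = np.stack([((X - c) ** 2).sum(axis=1) for c in centers], axis=1)
        labels = d.argmin(axis=1)
        nearest = d.min(axis=1)
        if nearest.max() > lam:
            # open a new cluster at the point farthest from all centers
            centers.append(X[np.argmax(nearest)].copy())
            continue
        # otherwise update each center as the mean of its assigned points
        centers = [X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                   for j in range(len(centers))]
    return np.array(centers), labels

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(m, 0.3, size=(100, 2))
               for m in ([0, 0], [3, 3], [0, 4])])
centers, labels = dp_means(X, lam=2.0)
print(len(centers))   # the number of clusters is discovered, not pre-set
```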
Abstract:
This study comprises an investigation of how a hybrid power grid has performed, in this case at the Ihushi Development Center located in Tanzania. The measurements carried out followed a standard so that they can be used universally in a continued study or compared directly with other hybrid grids of similar configuration and conditions. During the course of the work, a new model was developed to conveniently analyze the raw data and compute the necessary parameters; this also facilitates future work on this hybrid grid. The work also involved a form of simulation, since many scattered and continuous errors in the data had to be handled; the affected values then had to be estimated from different sources and methods before being used. Efficiencies and performance figures were then calculated for different parts of the system; these are commented upon and can be used directly for evaluation in a future study. These results were partly compared with an earlier evaluation, but since information was missing from the previous report, a complete comparison and conclusion was not possible.
Abstract:
Solar radiation data are crucial for the design of energy systems based on the solar resource. Since diffuse radiation measurements are not always available in archived data series, whether due to the absence of measuring equipment, shading device misplacement or missing data, models to generate these data are needed. In this work, one year of hourly and daily horizontal solar global and diffuse irradiation measurements in Évora is used to establish a new relation between the diffuse radiation and the clearness index. The proposed model includes a fitting parameter, which was adjusted through a simple optimization procedure to minimize the least-squares error with respect to the measurements. A comparison against several other fitting models presented in the literature was also carried out using the root mean square error as the statistical indicator, and it was found that the present model is more accurate than the previous fitting models for the diffuse radiation data in Évora.
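Since the paper's exact functional form is not reproduced here, the sketch below assumes a generic one-parameter logistic relation between the diffuse fraction and the clearness index and fits its parameter by least squares, mirroring the simple optimization procedure described above; all data are synthetic placeholders.

```python
# Hedged sketch: fit one parameter of an assumed diffuse-fraction vs clearness-index
# relation by minimizing the sum of squared errors against (synthetic) measurements.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
kt = rng.uniform(0.05, 0.85, 500)                      # hourly clearness index
kd_true = 1 / (1 + np.exp(8.0 * (kt - 0.55)))          # "measured" diffuse fraction
kd = np.clip(kd_true + rng.normal(0, 0.05, kt.size), 0, 1)

def sse(a):
    model = 1 / (1 + np.exp(a * (kt - 0.55)))          # assumed one-parameter form
    return ((model - kd) ** 2).sum()

res = minimize_scalar(sse, bounds=(0.1, 30.0), method="bounded")
a_hat = res.x
rmse = np.sqrt(sse(a_hat) / kt.size)
print(a_hat, rmse)   # fitted steepness parameter and RMSE against the measurements
```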
Abstract:
New powertrain design is highly influenced by CO2 and pollutant limits defined by legislation, the demand for fuel economy under real driving conditions, high performance and acceptable cost. To meet the requirements coming from both end-users and legislation, several powertrain architectures and engine technologies are possible (e.g. SI or CI engines), with many new technologies, new fuels, and different degrees of electrification. The benefits and costs of the possible architectures and technology mixes must be accurately evaluated by means of objective procedures and tools in order to choose among the best alternatives. This work presents a basic design methodology and a concept-level comparison of the main powertrain architectures and technologies that are currently being developed, considering their technical benefits and cost-effectiveness. The analysis is carried out on the basis of studies from the technical literature, integrating missing data with evaluations performed by means of simplified powertrain-vehicle models, and considering the most important powertrain architectures. Technology pathways for passenger cars up to 2025 and beyond have been defined. After that, with the support of more detailed models and experiments, the investigation focused on the most promising technologies for improving the internal combustion engine, such as water injection, low-temperature combustion and heat recovery systems.
Abstract:
In this thesis, the viability of Dynamic Mode Decomposition (DMD) as a technique to analyze and model complex dynamic real-world systems is presented. This method derives, directly from data, computationally efficient reduced-order models (ROMs) that can replace high-fidelity physics-based models which are too onerous to run or simply unavailable. Optimizations and extensions to the standard implementation of the methodology are proposed, investigating diverse case studies related to the decoding of complex flow phenomena. The flexibility of this data-driven technique allows its application to high-fidelity fluid dynamics simulations as well as to time series of observations of real systems. The resulting ROMs are tested against two tasks: (i) reducing the storage requirements of high-fidelity simulations or observations; (ii) interpolating and extrapolating missing data. The capabilities of DMD can also be exploited to alleviate the cost of onerous studies that require many simulations, such as uncertainty quantification analysis, especially when dealing with complex high-dimensional systems. In this context, a novel approach is proposed to address parameter variability when modeling systems with space- and time-variant responses. Specifically, DMD is merged with another model-reduction technique, namely Polynomial Chaos Expansion, for uncertainty quantification purposes. Useful guidelines for DMD deployment result from the study, together with a demonstration of its potential to ease diagnosis and scenario analysis when complex flow processes are involved.
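A minimal sketch of standard exact DMD (not the optimized and extended variants developed in the thesis) is shown below; it assumes one state snapshot per column, uniform time sampling, and a user-chosen truncation rank, and uses a simple synthetic traveling-wave data set.

```python
# Minimal exact DMD on a snapshot matrix: one state snapshot per column,
# uniformly sampled in time, with a user-chosen truncation rank r.
import numpy as np

def dmd(X, r):
    X1, X2 = X[:, :-1], X[:, 1:]                     # paired snapshot matrices
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]            # rank-r truncation
    A_tilde = U.conj().T @ X2 @ Vh.conj().T / s      # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ Vh.conj().T @ np.diag(1.0 / s) @ W  # exact DMD modes
    return eigvals, modes

# toy data: two superposed traveling waves on 128 spatial points, 60 snapshots
x = np.linspace(0, 2 * np.pi, 128)
t = np.linspace(0, 6, 60)
X = (np.sin(x[:, None] - 2.0 * t[None, :])
     + 0.5 * np.cos(3.0 * x[:, None] + 5.0 * t[None, :]))
eigvals, modes = dmd(X, r=4)
print(np.abs(eigvals))   # magnitudes near 1 indicate purely oscillatory dynamics
```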