910 results for Missing Data


Relevance: 60.00%

Abstract:

BACKGROUND: Health-related quality of life (HRQL) assessment is an important measure of the impact of a wide range of disease processes on an individual. To date, no HRQL tool has been evaluated in an Iranian population with cardiovascular disorders, specifically myocardial infarction, a major cause of mortality and morbidity. The MacNew Heart Disease Health-related Quality of Life instrument is a disease-specific HRQL questionnaire with satisfactory validity and reliability when applied cross-culturally. METHOD: A Persian version of MacNew was prepared by both forward and backward translation by bilinguals, after which a feasibility test was performed. Consecutive patients (n = 51) admitted to a coronary care unit with acute myocardial infarction were recruited for measurement of their HRQL, with a retest one month after discharge in the follow-up clinic. Principal components analysis, intra-class correlation reliability, internal consistency, and test-retest reliability were assessed. RESULTS: Trivial rates of missing data confirmed the acceptability of the tool. Principal component analysis revealed that the three domains, emotional, social and physical, performed as well as in the original studies. Internal consistency was high and comparable to other studies, ranging from 0.92 for the emotional and physical domains, to 0.94 for the social domain, and to 0.95 for the Global score. Domain means of 5, 5.3 and 4.9 for emotional, physical and social respectively indicate that our Iranian population has similar emotional and physical but worse social HRQL scores. Test-retest analysis showed significant correlation in emotional and physical domains (P < 0.05). CONCLUSION: The Persian version of the MacNew questionnaire is comparable to the English version. It has high internal consistency and reasonable reproducibility, making it an appropriate specific quality of life tool for population-based studies and clinical practice in Iran in patients who have survived an acute myocardial infarction. Further studies are needed to confirm its validity in larger populations with cardiovascular disease.
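Internal consistency of a questionnaire domain is commonly summarised with Cronbach's alpha. The abstract does not say which coefficient was used, so the following is only a minimal sketch under that assumption. The function name `cronbach_alpha` and the response matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for complete responses.

    scores: list of respondents, each a list of item scores
    (assumes at least two items and no missing values).
    """
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: four respondents, three items on a 1-7 scale.
responses = [[5, 6, 5], [3, 3, 4], [6, 7, 6], [2, 2, 3]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values close to 1 indicate that the items in a domain vary together. Respondents with missing items would have to be dropped or imputed before calling this function.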

Relevance: 60.00%

Abstract:

Causal inference with a continuous treatment is a relatively under-explored problem. In this dissertation, we adopt the potential outcomes framework. Potential outcomes are responses that would be seen for a unit under all possible treatments. In an observational study where the treatment is continuous, the potential outcomes are an uncountably infinite set indexed by treatment dose. We parameterize this unobservable set as a linear combination of a finite number of basis functions whose coefficients vary across units. This leads to new techniques for estimating the population average dose-response function (ADRF). Some techniques require a model for the treatment assignment given covariates, some require a model for predicting the potential outcomes from covariates, and some require both. We develop these techniques using a framework of estimating functions, compare them to existing methods for continuous treatments, and simulate their performance in a population where the ADRF is linear and the models for the treatment and/or outcomes may be misspecified. We also extend the comparisons to a data set of lottery winners in Massachusetts. Next, we describe the methods and functions in the R package causaldrf using data from the National Medical Expenditure Survey (NMES) and Infant Health and Development Program (IHDP) as examples. Additionally, we analyze the National Growth and Health Study (NGHS) data set and deal with the issue of missing data. Lastly, we discuss future research goals and possible extensions.

Relevance: 60.00%

Abstract:

The objective of this study was to gain an understanding of the effects of population heterogeneity, missing data, and causal relationships on parameter estimates from statistical models when analyzing change in medication use. From a public health perspective, two timely topics were addressed: the use and effects of statins in primary prevention of cardiovascular disease, and polypharmacy in the older population. Growth mixture models were applied to characterize the accumulation of cardiovascular and diabetes medications among an apparently healthy population of statin initiators. The causal effect of statin adherence on the incidence of acute cardiovascular events was estimated using marginal structural models in comparison with discrete-time hazards models. The impact of missing data on the growth estimates of the evolution of polypharmacy was examined by comparing statistical models under different assumptions about the missing data mechanism. The data came from Finnish administrative registers and from the population-based Geriatric Multidisciplinary Strategy for the Good Care of the Elderly study conducted in Kuopio, Finland, during 2004–07. Five distinct patterns of accumulating medications emerged among the population of apparently healthy statin initiators during the two years after statin initiation. Proper accounting for time-varying dependencies between adherence to statins and confounders using marginal structural models produced estimation results comparable with those from a discrete-time hazards model. The missing data mechanism was shown to be a key component when estimating the evolution of polypharmacy among older persons. In conclusion, population heterogeneity, missing data and causal relationships are important aspects of longitudinal studies that are associated with the study question and should be critically assessed when performing statistical analyses. Analyses should be supplemented with sensitivity analyses of the model assumptions.

Relevance: 60.00%

Abstract:

Snapper (Pagrus auratus) is widely distributed throughout subtropical and temperate southern oceans and forms a significant recreational and commercial fishery in Queensland, Australia. Using data from government reports, media sources, popular publications and a government fisheries survey carried out in 1910, we compiled information on individual snapper fishing trips that took place prior to the commencement of fishery-wide organized data collection, from 1871 to 1939. In addition to extracting all available quantitative data, we translated qualitative information into bounded estimates and used multiple imputation to handle missing values, forming 287 records for which catch rate (snapper fisher⁻¹ h⁻¹) could be derived. Uncertainty was handled through a parametric maximum likelihood framework (a transformed trivariate Gaussian), which facilitated statistical comparisons between data sources. No statistically significant differences in catch rates were found among media sources and the government fisheries survey. Catch rates remained stable throughout the time series, averaging 3.75 snapper fisher⁻¹ h⁻¹ (95% confidence interval, 3.42–4.09) as the fishery expanded into new grounds. In comparison, a contemporary (1993–2002) south-east Queensland charter fishery produced an average catch rate of 0.4 snapper fisher⁻¹ h⁻¹ (95% confidence interval, 0.31–0.58). These data illustrate the productivity of a fishery during its earliest years of development and represent the earliest catch rate data globally for this species. By adopting a formalized approach to address issues common to many historical records – missing data, a lack of quantitative information and reporting bias – our analysis demonstrates the potential for historical narratives to contribute to contemporary fisheries management.

Relevance: 60.00%

Abstract:

Three types of forecasts of the total Australian production of macadamia nuts (t nut-in-shell) have been produced early each year since 2001. The first is a long-term forecast, based on the expected production from the tree census data held by the Australian Macadamia Society, suitably scaled up for missing data and assumed new plantings each year. These long-term forecasts range out to 10 years in the future, and form a basis for industry and market planning. Secondly, a statistical adjustment (termed the climate-adjusted forecast) is made annually for the coming crop. As the name suggests, climatic influences are the dominant factors in this adjustment process; however, other terms such as bienniality of bearing, prices and orchard aging are also incorporated. Thirdly, industry personnel are surveyed early each year, with their estimates integrated into a growers and pest-scouts forecast. Initially conducted on a 'whole-country' basis, these models are now constructed separately for the six main production regions of Australia, with these being combined for national totals. Ensembles or suites of step-forward regression models using biologically-relevant variables have been the major statistical method adopted; however, developing methodologies such as nearest-neighbour techniques, general additive models and random forests are continually being evaluated in parallel. The overall error rates average 14% for the climate forecasts, and 12% for the growers' forecasts. These compare with 7.8% for USDA almond forecasts (based on extensive early-crop sampling) and 6.8% for coconut forecasts in Sri Lanka. However, our somewhat disappointing results were mainly due to a series of poor crops attributed to human reasons, which have now been factored into the models. Notably, the 2012 and 2013 forecasts averaged 7.8 and 4.9% errors, respectively. Future models should also show continuing improvement, as more data-years become available.

Relevance: 60.00%

Abstract:

Introduction: The inadequate reporting of cross-sectional studies, as in the case of the prevalence of metabolic syndrome, could cause problems in the synthesis of new evidence and lead to errors in the formulation of public policies. Objective: To evaluate the reporting quality of the articles regarding metabolic syndrome prevalence in Peruvian adults using the STROBE recommendations. Methods: We conducted a thorough literature search with the terms "Metabolic Syndrome", "Sindrome Metabolico" and "Peru" in MEDLINE/PubMed, LILACS, SciELO, LIPECS and BVS-Peru until December 2014. We selected population-based observational studies with randomized sampling that reported the prevalence of metabolic syndrome in adults aged 18 or more of both sexes. Information was analysed through the STROBE score per item and recommendation. Results: Seventeen articles were included in this study. All articles met the recommendations related to the report of the study’s rationale, design, and provision of summary measures. The recommendations with the lowest scores were those related to the sensitivity analysis (8%, n= 1/17), participant flowchart (18%, n= 3/17), missing data analysis (24%, n= 4/17), and number of participants in each study phase (24%, n= 4/17). Conclusion: Cross-sectional studies regarding the prevalence of metabolic syndrome in Peruvian adults are inadequately reported in the methods and results sections. We identified a clear need to improve the quality of such studies.

Relevance: 60.00%

Abstract:

When it comes to information sets in real life, often pieces of the whole set may not be available. This problem can arise for various reasons, which give rise to different patterns. In the literature, this problem is known as Missing Data. This issue can be handled in various ways, from not taking incomplete observations into consideration, to guessing what those values originally were, or just ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first one is to determine whether any kind of interaction exists between Missing Data, Imputation Methods and Supervised Classification algorithms when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, where discrete means that no relation is assumed between observations. These datasets underwent processes involving different combinations of the three components mentioned. The outcome showed that the missing data pattern strongly influences the outcome produced by a classifier. Also, in some of the cases, the complex imputation techniques investigated in the thesis were able to obtain better results than simple ones. The second goal of this work is to propose a new imputation strategy, but this time we constrain the specifications of the previous problem to a special kind of dataset, the multivariate Time Series. We designed new imputation techniques for this particular domain, and combined them with some of the contrasted strategies tested in the previous chapter of this thesis. The time series were also subjected to processes involving missing data and imputation, to finally propose an overall better imputation method. In the final chapter of this work, a real-world example is presented, describing a water quality prediction problem. The databases that characterized this problem had their own original latent values, which provides a real-world benchmark to test the algorithms developed in this thesis.
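The difference between discarding incomplete observations and imputing them can be sketched in a few lines. This is a minimal illustration only, not one of the imputation methods developed in the thesis. The function names `complete_cases` and `mean_impute` and the toy data are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from statistics import mean


def complete_cases(rows):
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))
    # Rebuild every row, substituting the column mean where a value is missing.
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical dataset: four observations, two features, two missing entries.
data = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0]]
deleted = complete_cases(data)
imputed = mean_impute(data)
print(deleted)
print(imputed)
```

Listwise deletion keeps only two of the four observations here, while mean imputation keeps all four at the cost of shrinking the variance of each imputed column. That trade-off is one reason the choice of method can interact with a downstream classifier.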

Relevance: 60.00%

Abstract:

Let (X, Y) be bivariate normal random vectors which represent the responses as a result of Treatment 1 and Treatment 2. The statistical inference about the bivariate normal distribution parameters involving missing data with both treatment samples is considered. Assuming the correlation coefficient ρ of the bivariate population is known, the MLEs of the population means and variance (ξ, η, and σ²) are obtained. Inferences about these parameters are presented. Procedures for constructing a confidence interval for the difference of population means ξ – η and for testing hypotheses about ξ – η are established. The performances of the new estimators and testing procedure are compared numerically with the method proposed in Looney and Jones (2003) on the basis of extensive Monte Carlo simulation. Simulation studies indicate that the testing power of the method proposed in this thesis study is higher.

Relevance: 60.00%

Abstract:

My study investigated internal consistency estimates of psychometric surveys as an operationalization of the state of measurement precision of constructs in industrial and organizational (I/O) psychology. Analyses were conducted of samples used in research articles published in the Journal of Applied Psychology between 1975 and 2010 in five-year intervals (K = 934) from 480 articles yielding 1427 coefficients. Articles and their respective samples were coded for test-taker characteristics (e.g., age, gender, and ethnicity), research settings (e.g., lab and field studies), and actual tests (e.g., number of items and scale anchor points). A reliability and inter-item correlations depository was developed for I/O variables and construct groups. Personality measures had significantly lower inter-item correlations than other construct groups. Also, internal consistency estimates and reporting practices were evaluated over time, demonstrating an improvement in measurement precision and missing data.

Relevance: 60.00%

Abstract:

The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.

Relevance: 60.00%

Abstract:

This study examines how a hybrid power grid has performed, in this case the one at the Ihushi Development Center in Tanzania. The measurements followed a standard so that they can be used universally in a follow-up study or compared directly with other hybrid grids with a similar configuration and similar conditions. In the course of the work, a new model was developed to make it easy to analyse the raw data and calculate the necessary parameters. This also facilitates future work on this hybrid grid. The work also became a kind of simulation, since many different types of scattered and continuous errors had to be handled. The affected values had to be estimated from various sources and methods before they could be used. Efficiencies and performance were then calculated for different parts of the system; these are commented on and can be used directly for an evaluation in a future study. The results have in part been compared with an earlier evaluation, but because information was missing from the previous report, a complete comparison and conclusion was not possible.

Relevance: 60.00%

Abstract:

Solar radiation data is crucial for the design of energy systems based on the solar resource. Since diffuse radiation measurements are not always available in the archive data series, either due to the inexistence of measuring equipment, shading device misplacement or missing data, models to generate these data are needed. In this work, one year of hourly and daily horizontal solar global and diffuse irradiation measurements in Évora are used to establish a new relation between the diffuse radiation and the clearness index. The proposed model includes a fitting parameter, which was adjusted through a simple optimization procedure to minimize the Least Square Error as compared to measurements. A comparison against several other fitting models presented in the literature was also carried out using the Root Mean Square Error as statistical indicator, and it was found that the present model is more accurate than the previous fitting models for the diffuse radiation data in Évora.

Relevance: 40.00%

Abstract:

One cannot help but be impressed by the inroads that digital oilfield technologies have made into the exploration and production (E&P) industry in the past decade. Today’s production systems can be monitored by “smart” sensors that allow engineers to observe almost any aspect of performance in real time. Our understanding of how reservoirs are behaving has improved considerably since the dawn of this revolution, and the industry has been able to move away from point answers to more holistic “big picture” integrated solutions. Indeed, the industry has already reaped the rewards of many of these kinds of investments. Many billions of dollars of value have been delivered by this heightened awareness of what is going on within our assets and the world around them (Van Den Berg et al. 2010).

Relevance: 40.00%

Abstract:

Troxel, Lipsitz, and Brennan (1997, Biometrics 53, 857-869) considered parameter estimation from survey data with nonignorable nonresponse and proposed weighted estimating equations to remove the biases in the complete-case analysis that ignores missing observations. This paper suggests two alternative modifications for unbiased estimation of regression parameters when a binary outcome is potentially observed at successive time points. The weighting approach of Robins, Rotnitzky, and Zhao (1995, Journal of the American Statistical Association 90, 106-121) is also modified to obtain unbiased estimating functions. The suggested estimating functions are unbiased only when the missingness probability is correctly specified, and misspecification of the missingness model will result in biases in the estimates. Simulation studies are carried out to assess the performance of different methods when the covariate is binary or normal. For the simulation models used, the relative efficiency of the two new methods to the weighting methods is about 3.0 for the slope parameter and about 2.0 for the intercept parameter when the covariate is continuous and the missingness probability is correctly specified. All methods produce substantial biases in the estimates when the missingness model is misspecified or underspecified. Analysis of data from a medical survey illustrates the use and possible differences of these estimating functions.
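The idea behind weighting observed responses by the inverse of their response probability can be shown on a simple mean rather than on regression estimating equations. This is a minimal sketch, not the estimators proposed in the paper. It assumes the response probabilities are known and correctly specified, and the function names `ipw_mean` and `complete_case_mean` and the toy data are hypothetical.

```python
def complete_case_mean(outcomes):
    """Mean of the observed outcomes only (missing values are None)."""
    observed = [y for y in outcomes if y is not None]
    return sum(observed) / len(observed)


def ipw_mean(outcomes, response_probs):
    """Inverse-probability-weighted mean with normalised weights.

    Each observed outcome is weighted by 1 / P(observed); the
    probabilities are assumed known here, which is rarely true in practice.
    """
    num = 0.0
    den = 0.0
    for y, p in zip(outcomes, response_probs):
        if y is None:
            continue  # missing outcomes contribute nothing directly
        num += y / p
        den += 1.0 / p
    return num / den


# Hypothetical population of four units with true mean 0.5.
# Units with outcome 1 always respond; units with outcome 0 respond
# with probability 0.5, so missingness depends on the outcome itself.
outcomes = [1, 1, 0, None]
response_probs = [1.0, 1.0, 0.5, 0.5]
cc = complete_case_mean(outcomes)
ipw = ipw_mean(outcomes, response_probs)
print(cc, ipw)
```

In this toy case the complete-case mean overstates the true mean because non-responders differ systematically from responders, while the weighted mean recovers it. If the probabilities were misspecified, the weighted estimate would also be biased, in line with the abstract's warning.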