980 results for Missing values structures
Abstract:
Objectives: To estimate differences in self-rated health by mode of administration and to assess the value of multiple imputation to make self-rated health comparable for telephone and mail. Methods: In 1996, Survey 1 of the Australian Longitudinal Study on Women's Health was answered by mail. In 1998, 706 and 11,595 mid-age women answered Survey 2 by telephone and mail respectively. Self-rated health was measured by the physical and mental health scores of the SF-36. Mean changes in SF-36 scores between Surveys 1 and 2 were compared for telephone and mail respondents to Survey 2, before and after adjustment for socio-demographic and health characteristics. Missing values and SF-36 scores for telephone respondents at Survey 2 were imputed from SF-36 mail responses and telephone and mail responses to socio-demographic and health questions. Results: At Survey 2, self-rated health improved for telephone respondents but not mail respondents. After adjustment, mean changes in physical health and mental health scores remained higher (0.4 and 1.6 respectively) for telephone respondents compared with mail respondents (-1.2 and 0.1 respectively). Multiple imputation yielded adjusted changes in SF-36 scores that were similar for telephone and mail respondents. Conclusions and Implications: The effect of mode of administration on the change in mental health is important given that a difference of two points in SF-36 scores is accepted as clinically meaningful. Health evaluators should be aware of and adjust for the effects of mode of administration on self-rated health. Multiple imputation is one method that may be used to adjust SF-36 scores for mode of administration bias.
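The abstract does not say how estimates from the imputed data sets were combined. A common choice in multiple imputation is Rubin's rules, and the sketch below assumes that approach. The function name `pool_rubin` and all numbers are illustrative and are not taken from the study.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical mean changes in a score from three imputed data sets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The between-imputation term is what separates this from simply averaging the standard errors: it widens the interval to reflect uncertainty about the imputed values.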
Abstract:
We analyse how the Generative Topographic Mapping (GTM) can be modified to cope with missing values in the training data. Our approach is based on an Expectation-Maximisation (EM) method which estimates the parameters of the mixture components and at the same time deals with the missing values. We incorporate this algorithm into a hierarchical GTM. We verify the method on a toy data set (using a single GTM) and a realistic data set (using a hierarchical GTM). The results show our algorithm can help to construct informative visualisation plots, even when some of the training points are corrupted with missing values.
Abstract:
Heterogeneous and incomplete datasets are common in many real-world visualisation applications. The probabilistic nature of the Generative Topographic Mapping (GTM), which was originally developed for complete continuous data, can be extended to model heterogeneous (i.e. containing both continuous and discrete values) and missing data. This paper describes and assesses the resulting model on both synthetic and real-world heterogeneous data with missing values.
Abstract:
Background: The aim of this study was to describe bilateral visual outcomes and the effect of incomplete follow-up after 3 years of ranibizumab therapy for neovascular age-related macular degeneration. Secondarily, the demands on service provision over a 3-year period were described. Methods: Data on visual acuity, hospital visits, and injections were collected over 36 months on consecutive patients commencing treatment over a 9-month period. Visual outcome was determined for 1) all patients, using last observation carried forward for missed visits due to early discontinuation and 2) only those patients completing full 36-month follow-up. Results: Over 3 years, 120 patients cumulatively attended hospital for 1,823 noninjection visits and 1,365 injection visits. A visual acuity loss of <15 letters (L) was experienced by 78.2% of patients. For all patients (n=120), there was a mean loss of 1.68 L using last observation carried forward for missing values. Excluding five patients who died and 30 who discontinued follow-up, mean gain was 1.47 L. In bilateral cases, final acuity was on average 9 L better in second eyes compared to first eyes. Also, 91% of better-seeing eyes continued to be the better-seeing eye. Conclusion: We have demonstrated our approach to describing the long-term service provision and visual outcomes of ranibizumab therapy for neovascular age-related macular degeneration in a consecutive cohort of patients. Although there was a heavy burden with very frequent injections and clinic visits, patients can expect a good level of visual stability and a very high chance of maintaining their better-seeing eye for up to 3 years. © 2014 Chavan et al. This work is published by Dove Medical Press Limited.
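Last observation carried forward, as used above for missed visits, can be sketched in a few lines. This is a generic illustration under the assumption that each patient's visits form an ordered list with `None` for a missed visit. The function name and the letter scores are hypothetical and are not the study's data.

```python
from typing import List, Optional

def locf(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing visit (None) with the most recent observed value.

    Leading missing values stay None because nothing precedes them.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value  # remember the latest real observation
        filled.append(last)
    return filled

# Hypothetical visual acuity letter scores at successive visits.
visits = [55.0, 57.0, None, None, 52.0, None]
print(locf(visits))  # [55.0, 57.0, 57.0, 57.0, 52.0, 52.0]
```

Because a patient who drops out keeps their last recorded score, the all-patient estimate can differ from the completers-only estimate, as the two results reported in the abstract show.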
Abstract:
The paper reviews some additive and multiplicative properties of ranking procedures used for generalized tournaments with missing values and multiple comparisons. The methods analysed are the score, generalised row sum and least squares as well as fair bets and its variants. It is argued that generalised row sum should be applied not with a fixed parameter, but a variable one proportional to the number of known comparisons. It is shown that a natural additive property has strong links to independence of irrelevant matches, an axiom judged unfavourable when players have different opponents.
Abstract:
The paper reviews some axioms of additivity concerning ranking methods used for generalized tournaments with possible missing values and multiple comparisons. It is shown that one of the most natural properties, called consistency, has strong links to independence of irrelevant comparisons, an axiom judged unfavourable when players have different opponents. Therefore some directions of weakening consistency are suggested, and several ranking methods, namely the score, generalized row sum and least squares as well as fair bets and its two variants (one of them entirely new), are analysed to determine whether they satisfy the properties discussed. It turns out that least squares and generalized row sum with an appropriate parameter choice preserve the relative ranking of two objects if the ranking problems added have the same comparison structure.
Abstract:
This research analyses the components of the organizational structure of the UFRN (Rio Grande do Norte Federal University) and to what extent they affect organizational performance. The study, classified as exploratory and descriptive, was conducted in two phases. The first phase consisted of a pilot test to refine the research instrument and to identify the latent components of the organizational structure; the second characterized these components and established their relationships with organizational performance. In the first phase, the research was conducted in 20 UFRN organizational units with the participation of 84 employees, both technical-administrative staff and teachers, after accounting for missing values and outliers. The second phase occurred in two stages: one conducted with 279 valid cases, consisting of technical-administrative staff and teachers from 37 UFRN units, and another with 112 managers of the institution in the 49 units identified in this research. The instrument adopted in the first phase was composed of 36 indicators of organizational structure, six extracted and adapted from the instrument developed by Medeiros (2003) and 30 prepared from the literature review, drawing on Mintzberg (2012), Hall (1984), Vasconcellos and Hemsley (1997) and Seiffert and Costa (2007), and 7 performance indicators adapted from Fleury and Mills (2006), Vieira and Vieira (2003) and Kaplan and Norton (1997), from the self-assessment instrument in use by the university. In this stage the data were analyzed using the techniques of factor analysis and reliability analysis by means of Cronbach’s alpha, aiming to extract the factors representing the components of the organizational structure.
In step 1 of the second phase, the instrument, refined and reduced in the previous phase to 24 organizational structure variables and 6 performance variables, was used, while in step 2 a semi-structured interview guide, with questions organized into nine organizational structure elements, was adopted to gather information on the relationship between structure and performance at the UFRN. The techniques used in the second phase as a whole were factor analysis and reliability analysis, to characterize the components extracted in the previous phase and to validate the performance variables, and correlation analysis, regression and content analysis, to establish and understand the relationship between structure and performance. The results showed, in the two stages, six latent components of organizational structure in the context under study: training and internalization, communication, hierarchy, decentralization, formalization and departmentalization, all with high Cronbach's alpha values, which can thereby be characterized as components of the UFRN structure. Six performance indicators were validated in this study and shown to be efficient and highly reliable. Finally, it was found that the formalization, communication, decentralization, and training and internalization components positively affect UFRN performance, while departmentalization has an adverse effect and hierarchy did not show a significant relationship. The results achieved in this work are important for future studies aimed at developing a structural model that represents the specifics of the university.
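For readers unfamiliar with the reliability statistic mentioned above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It assumes complete responses with no missing values. The function name and the toy scores are illustrative and are not the study's data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists, one per questionnaire item, each holding the
           scores of the same respondents in the same order.
    """
    k = len(items)
    item_variance_sum = sum(variance(scores) for scores in items)
    # Total score per respondent, summed across the k items.
    totals = [sum(respondent) for respondent in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Two hypothetical items answered identically by four respondents:
# perfectly consistent items give alpha = 1.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
```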
Abstract:
Surveys can collect important data that inform policy decisions and drive social science research. Large government surveys collect information from the U.S. population on a wide range of topics, including demographics, education, employment, and lifestyle. Analysis of survey data presents unique challenges. In particular, one needs to account for missing data, for complex sampling designs, and for measurement error. Conceptually, a survey organization could spend lots of resources getting high-quality responses from a simple random sample, resulting in survey data that are easy to analyze. However, this scenario often is not realistic. To address these practical issues, survey organizations can leverage the information available from other sources of data. For example, in longitudinal studies that suffer from attrition, they can use the information from refreshment samples to correct for potential attrition bias. They can use information from known marginal distributions or survey design to improve inferences. They can use information from gold standard sources to correct for measurement error.
This thesis presents novel approaches to combining information from multiple sources that address the three problems described above.
The first method addresses nonignorable unit nonresponse and attrition in a panel survey with a refreshment sample. Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analysts must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst's ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences, corrected for panel attrition, are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.
The second method incorporates informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data.
We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
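The data augmentation step of the second method can be illustrated for a single margin. The sketch below covers only the record construction, not the latent class model or the MCMC sampler, and it assumes records are dictionaries with `None` marking a missing value. All function names, variable names and probabilities are hypothetical.

```python
import math

def allocate_counts(prior_probs, n):
    """Split n records across levels in proportion to prior_probs
    (largest-remainder rounding so the counts sum exactly to n)."""
    raw = {level: p * n for level, p in prior_probs.items()}
    counts = {level: math.floor(x) for level, x in raw.items()}
    leftover = n - sum(counts.values())
    by_remainder = sorted(raw, key=lambda lvl: raw[lvl] - counts[lvl], reverse=True)
    for level in by_remainder[:leftover]:
        counts[level] += 1
    return counts

def augment(records, variable, prior_probs, n_augmented):
    """Append synthetic records whose `variable` margin matches the prior;
    every other variable is left missing (None)."""
    other_vars = [name for name in records[0] if name != variable]
    synthetic = []
    for level, count in allocate_counts(prior_probs, n_augmented).items():
        for _ in range(count):
            row = {name: None for name in other_vars}
            row[variable] = level
            synthetic.append(row)
    return records + synthetic

survey = [{"sex": "F", "edu": "BA"}, {"sex": "M", "edu": "MA"}]
prior = {"BA": 0.5, "MA": 0.25, "PhD": 0.25}  # assumed prior margin for "edu"
augmented = augment(survey, "edu", prior, n_augmented=8)
print(len(augmented))  # 10
```

A larger `n_augmented` corresponds to a more confident prior, in line with the abstract's statement that the number of augmented records controls the degree of prior uncertainty.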
The third method leverages the information from a gold standard survey to model reporting error. Survey data are subject to reporting error when respondents misunderstand the question or accidentally select the wrong response. Sometimes survey respondents knowingly select the wrong response, for example, by reporting a higher level of education than they actually have attained. We present an approach that allows an analyst to model reporting error by incorporating information from a gold standard survey. The analyst can specify various reporting error models and assess how sensitive their conclusions are to different assumptions about the reporting error process. We illustrate the approach using simulations based on data from the 1993 National Survey of College Graduates. We use the method to impute error-corrected educational attainments in the 2010 American Community Survey using the 2010 National Survey of College Graduates as the gold standard survey.
Abstract:
Clustering algorithms, pattern mining techniques and associated quality metrics have emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specifics of the available data, such as missing values, extreme values or outliers, make it challenging to extract user models that are significant from an educational perspective. In this paper we introduce a pattern detection mechanism within our data analytics tool based on k-means clustering and on the SSE, silhouette, Dunn index and Xie-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the performed monitoring activities created a strong basis for generating automatic feedback to learners in terms of their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely actions.
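To make the first of these quality metrics concrete, here is a minimal sketch of k-means (Lloyd's algorithm) with the SSE computed on the result. It is not the authors' tool: it assumes complete numeric feature vectors and caller-supplied initial centroids, and the points are made-up learner features.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, initial_centroids, n_iter=20):
    """Alternate assignment and centroid update for n_iter rounds."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centroids), centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical two-feature learner profiles forming two obvious groups.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels, centroids = kmeans(points, initial_centroids=[(0, 0), (10, 0)])
print(labels, sse(points, labels, centroids))  # [0, 0, 1, 1] 4.0
```

A lower SSE means tighter clusters, but it always falls as the number of clusters grows, which is one reason to read it alongside metrics such as the silhouette.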
Abstract:
This study tests the psychometric properties of a questionnaire that assesses students' perceptions of teacher feedback; students' school identification; school trajectories (facts and expectations); and students' perceptions of their school engagement. The questionnaire was administered to 1089 students in the 6th, 7th, 9th and 10th grades (M=13.4, SD=1.7), 52% of whom were female. The sample consists mainly of students of Portuguese nationality (95.9%). Based on the results of the factor analysis and following the theoretical rationale, a structure composed of eight main dimensions was obtained. The QFITE shows good internal consistency, with seven of the eight main dimensions obtaining values between .77 and .89. The psychometric analyses thus yield satisfactory values, and it is concluded that the QFITE is a useful and adequate instrument for assessing students' school identification, behavioural school engagement, and students' perceptions of teacher feedback.
Abstract:
Snapper (Pagrus auratus) is widely distributed throughout subtropical and temperate southern oceans and forms a significant recreational and commercial fishery in Queensland, Australia. Using data from government reports, media sources, popular publications and a government fisheries survey carried out in 1910, we compiled information on individual snapper fishing trips that took place prior to the commencement of fishery-wide organized data collection, from 1871 to 1939. In addition to extracting all available quantitative data, we translated qualitative information into bounded estimates and used multiple imputation to handle missing values, forming 287 records for which catch rate (snapper fisher⁻¹ h⁻¹) could be derived. Uncertainty was handled through a parametric maximum likelihood framework (a transformed trivariate Gaussian), which facilitated statistical comparisons between data sources. No statistically significant differences in catch rates were found among media sources and the government fisheries survey. Catch rates remained stable throughout the time series, averaging 3.75 snapper fisher⁻¹ h⁻¹ (95% confidence interval, 3.42–4.09) as the fishery expanded into new grounds. In comparison, a contemporary (1993–2002) south-east Queensland charter fishery produced an average catch rate of 0.4 snapper fisher⁻¹ h⁻¹ (95% confidence interval, 0.31–0.58). These data illustrate the productivity of a fishery during its earliest years of development and represent the earliest catch rate data globally for this species. By adopting a formalized approach to address issues common to many historical records (missing data, a lack of quantitative information and reporting bias), our analysis demonstrates the potential for historical narratives to contribute to contemporary fisheries management.
Abstract:
Objectives: To investigate the association between effort-reward imbalance (ERI) at work and sedentary lifestyle. Methods: Cross-sectional data from the ongoing Finnish Public Sector Study related to 30 433 women and 7718 men aged 17-64 were used (n = 35 918 after exclusion of participants with missing values in covariates). From the responses to a questionnaire, an aggregated mean score for ERI in a work unit was assigned to each participant. The outcome was sedentary lifestyle defined as <2.00 metabolic equivalent task (MET) hours/day. Logistic regression with generalized estimating equations was used as an analysis method to include both individual and work unit level predictors in the models. Adjustments were made for age, marital status, occupational status, job contract, smoking, and heavy drinking. Results: Twenty five percent of women and 27% of men had a sedentary lifestyle. High individual level ERI was associated with a higher likelihood of sedentary lifestyle both among women (odds ratio (OR) = 1.08, 95% CI 1.01 to 1.16) and men (OR = 1.17, 95% CI 1.02 to 1.33). These associations were not explained by relevant confounders and they were also independent of work unit level job strain measured as a ratio of job demands and control. Conclusions: A mismatch between high occupational effort spent and low reward received in turn seems to be associated with an elevated risk of sedentary lifestyle, although this association is relatively weak.
Abstract:
National Highway Traffic Safety Administration, Washington, D.C.