990 results for complex survey weights


Relevance:

100.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance:

100.00%

Publisher:

Abstract:

Thesis (Master's)--University of Washington, 2016-06

Relevance:

90.00%

Publisher:

Abstract:

I introduce the new mgof command to compute distributional tests for discrete (categorical, multinomial) variables. The command supports large-sample tests for complex survey designs and exact tests for small samples, as well as classic large-sample χ²-approximation tests based on Pearson's X², the likelihood ratio, or any other statistic from the power-divergence family (Cressie and Read, 1984, Journal of the Royal Statistical Society, Series B (Methodological) 46: 440–464). The complex survey correction is based on the approach by Rao and Scott (1981, Journal of the American Statistical Association 76: 221–230) and parallels the survey design correction used for independence tests in svy: tabulate. mgof computes the exact tests by using Monte Carlo methods or exhaustive enumeration. mgof also provides an exact one-sample Kolmogorov–Smirnov test for discrete data.
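The power-divergence family referenced in this abstract can be illustrated outside Stata. The sketch below, in Python, uses scipy.stats.power_divergence to compute the Pearson, likelihood-ratio, and Cressie-Read versions of the goodness-of-fit test for made-up categorical counts, and then applies a simple first-order design-effect deflation in the spirit of Rao and Scott (1981); the counts, null proportions, and design effect are illustrative assumptions, not values from the abstract.

```python
import numpy as np
from scipy.stats import power_divergence, chi2

# Hypothetical observed counts for a 4-category variable and H0 proportions
f_obs = np.array([18, 25, 31, 26])
p0 = np.array([0.25, 0.25, 0.25, 0.25])
f_exp = p0 * f_obs.sum()

# Classic large-sample tests from the power-divergence family
for lam, label in [(1, "Pearson X2"), (0, "likelihood ratio G2"), (2/3, "Cressie-Read")]:
    stat, p = power_divergence(f_obs, f_exp, lambda_=lam)
    print(f"{label}: stat={stat:.3f}, p={p:.4f}")

# Crude first-order survey correction (Rao-Scott spirit): deflate the statistic
# by an assumed generalized design effect before referring it to chi-square.
deff = 1.8  # placeholder design effect
x2, _ = power_divergence(f_obs, f_exp, lambda_=1)
x2_corr = x2 / deff
p_corr = chi2.sf(x2_corr, df=len(f_obs) - 1)
print(f"design-corrected X2={x2_corr:.3f}, p={p_corr:.4f}")
```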

Relevance:

90.00%

Publisher:

Abstract:

In recent years, disaster preparedness through assessment of medical and special needs persons (MSNP) has taken center stage in the public eye, owing to frequent natural disasters such as hurricanes, storm surges, and tsunamis driven by climate change and increased human activity on our planet. Statistical methods for complex survey design and analysis have gained significance as a consequence. However, many challenges remain in inferring such assessments over the target population for policy-level advocacy and implementation.

Objective. This study discusses the use of statistical methods for disaster preparedness and medical needs assessment to support local and state governments in policy-level decision making and logistic support, so as to avoid loss of life and property in future calamities.

Methods. To obtain precise and unbiased estimates of Medical Special Needs Persons (MSNP) and disaster preparedness for evacuation in the Rio Grande Valley (RGV) of Texas, a stratified, cluster-randomized, multi-stage sampling design was implemented. The UT School of Public Health, Brownsville, surveyed 3,088 households in three counties: Cameron, Hidalgo, and Willacy. Multiple statistical methods were implemented and estimates were obtained taking into account the probability of selection and clustering effects. The methods discussed were Multivariate Linear Regression (MLR), Survey Linear Regression (Svy-Reg), Generalized Estimating Equations (GEE), and Multilevel Mixed Models (MLM), all with and without sampling weights.

Results. The estimated population of the RGV was 1,146,796. The population was 51.5% female, 90% Hispanic, 73% married, and 56% unemployed, and 37% had personal transport. Forty percent attained at most an elementary education, another 42% reached high school, and only 18% attended college. Median household income was less than $15,000/year. MSNP were estimated at 44,196 (3.98%) [95% CI: 39,029; 51,123]. All statistical models were in concordance, with MSNP estimates ranging from 44,000 to 48,000. MSNP estimates by method were: MLR (47,707; 95% CI: 42,462; 52,999), MLR with weights (45,882; 95% CI: 39,792; 51,972), Bootstrap Regression (47,730; 95% CI: 41,629; 53,785), GEE (47,649; 95% CI: 41,629; 53,670), GEE with weights (45,076; 95% CI: 39,029; 51,123), Svy-Reg (44,196; 95% CI: 40,004; 48,390), and MLM (46,513; 95% CI: 39,869; 53,157).

Conclusion. The RGV is a flood zone, highly susceptible to hurricanes and other natural disasters. People in the region are mostly Hispanic and under-educated, with among the lowest income levels in the U.S. In the event of a disaster the population is largely incapacitated, with only 37% having personal transport to take care of MSNP. Local and state government intervention in planning, preparation, and evacuation support is necessary in any such disaster to avoid loss of precious human life.

Key words: complex surveys, statistical methods, multilevel models, cluster randomized, sampling weights, raking, survey regression, generalized estimating equations (GEE), random effects, intracluster correlation coefficient (ICC).
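As a rough illustration of the design-based estimation described in the Methods paragraph, the Python sketch below computes a sampling-weighted MSNP prevalence with a cluster-bootstrap confidence interval; the simulated household data and the column names (cluster, weight, msnp) are invented for the example and are not from the study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical household-level data: cluster id, sampling weight, and an
# MSNP indicator (1 = household reports a medical special needs person).
n = 3088
df = pd.DataFrame({
    "cluster": rng.integers(0, 60, size=n),
    "weight": rng.uniform(200, 600, size=n),
    "msnp": rng.binomial(1, 0.04, size=n),
})

def weighted_prevalence(d):
    return np.average(d["msnp"], weights=d["weight"])

point = weighted_prevalence(df)

# Cluster bootstrap: resample whole clusters with replacement so that the
# interval reflects the clustered sampling design.
clusters = df["cluster"].unique()
boot = []
for _ in range(500):
    picked = rng.choice(clusters, size=len(clusters), replace=True)
    resampled = pd.concat([df[df["cluster"] == c] for c in picked])
    boot.append(weighted_prevalence(resampled))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"weighted MSNP prevalence: {point:.4f} (95% CI {lo:.4f}, {hi:.4f})")
```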

Relevance:

90.00%

Publisher:

Abstract:

A new Stata command called -mgof- is introduced. The command is used to compute distributional tests for discrete (categorical, multinomial) variables. Apart from classic large-sample χ²-approximation tests based on Pearson's X², the likelihood ratio, or any other statistic from the power-divergence family (Cressie and Read 1984), large-sample tests for complex survey designs and exact tests for small samples are supported. The complex survey correction is based on the approach by Rao and Scott (1981) and parallels the survey design correction used for independence tests in -svy: tabulate-. The exact tests are computed using Monte Carlo methods or exhaustive enumeration. An exact Kolmogorov–Smirnov test for discrete data is also provided.
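The Monte Carlo flavor of the exact tests mentioned above can be sketched in a few lines of Python: draw multinomial samples under the hypothesized distribution, recompute the test statistic for each draw, and take the share of simulated statistics at least as large as the observed one as the p-value. The counts and null proportions below are illustrative only, not part of the command's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

f_obs = np.array([3, 9, 2, 6])           # small-sample observed counts
p0 = np.array([0.25, 0.25, 0.25, 0.25])  # hypothesized distribution
n = f_obs.sum()

def pearson_x2(counts, p, n):
    expected = p * n
    return ((counts - expected) ** 2 / expected).sum()

x2_obs = pearson_x2(f_obs, p0, n)

# Monte Carlo exact test: simulate the multinomial null distribution of X2
# and count how often the simulated statistic reaches the observed value.
reps = 10000
sims = rng.multinomial(n, p0, size=reps)
x2_sim = np.array([pearson_x2(s, p0, n) for s in sims])
p_value = (x2_sim >= x2_obs).mean()
print(f"observed X2 = {x2_obs:.3f}, Monte Carlo exact p = {p_value:.4f}")
```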

Relevance:

80.00%

Publisher:

Abstract:

Modern sample surveys started to spread after statisticians at the U.S. Bureau of the Census developed a sampling design for the Current Population Survey (CPS) in the 1940s. A significant factor was also that digital computers became available to statisticians. In the early 1950s, the theory was documented in textbooks on survey sampling. This thesis is about the development of statistical inference for sample surveys. The idea of statistical inference was first enunciated by the French scientist P. S. Laplace. In 1781, he published a plan for a partial investigation in which he determined the sample size needed to reach the desired accuracy in estimation. The plan was based on Laplace's Principle of Inverse Probability and on his derivation of the Central Limit Theorem. These were published in a memoir in 1774, which is one of the origins of statistical inference. Laplace's inference model was based on Bernoulli trials and binomial probabilities. He assumed that populations were changing constantly, which he depicted by assuming a priori distributions for parameters. Laplace's inference model dominated statistical thinking for a century. Sample selection in Laplace's investigations was purposive. In 1894, at the International Statistical Institute meeting, the Norwegian Anders Kiaer presented the idea of the Representative Method to draw samples. Its idea was that the sample would be a miniature of the population; it is still prevailing. The virtues of random sampling were known, but practical problems of sample selection and data collection hindered its use. Arthur Bowley realized the potential of Kiaer's method and, at the beginning of the 20th century, carried out several surveys in the UK. He also developed the theory of statistical inference for finite populations, based on Laplace's inference model. R. A. Fisher's contributions in the 1920s constitute a watershed in statistical science: he revolutionized the theory of statistics. In addition, he introduced a new statistical inference model which is still the prevailing paradigm. The essential idea is to draw samples repeatedly from the same population, under the assumption that population parameters are constants. Fisher's theory did not include a priori probabilities. Jerzy Neyman adopted Fisher's inference model and applied it to finite populations, with the difference that Neyman's inference model does not include any assumptions about the distributions of the study variables. Applying Fisher's fiducial argument, he developed the theory of confidence intervals. Neyman's last contribution to survey sampling presented a theory for double sampling. This gave the central idea for statisticians at the U.S. Census Bureau to develop the complex survey design for the CPS. An important criterion was to have a method in which the costs of data collection were acceptable and which provided approximately equal interviewer workloads, besides sufficient accuracy in estimation.

Relevance:

80.00%

Publisher:

Abstract:

BACKGROUND: Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study.

METHODS: The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators.

RESULTS: After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and with current smokers (p < 0.001).

CONCLUSION: The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
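A minimal Python sketch of the three-step hybrid described above: multiple imputation by chained equations, a boosted-tree screen of candidate biomarkers, and a final weighted logistic model for the retained variables. It is written with generic scikit-learn and statsmodels tools rather than the authors' original pipeline, and the column names (phq9_depressed, survey_weight, the biomarker list) and the selection thresholds are assumptions for illustration.

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import GradientBoostingClassifier

def hybrid_biomarker_selection(df, biomarkers, outcome="phq9_depressed",
                               weight_col="survey_weight", n_imputations=20,
                               n_keep=21):
    """Three-step sketch: impute -> boosted-tree screen -> weighted logistic model."""
    selected_counts = pd.Series(0, index=biomarkers)

    for seed in range(n_imputations):
        # Step 1: one chained-equations imputation of the biomarker block.
        imputer = IterativeImputer(random_state=seed, sample_posterior=True)
        X = pd.DataFrame(imputer.fit_transform(df[biomarkers]),
                         columns=biomarkers, index=df.index)

        # Step 2: boosted regression trees rank biomarkers by importance.
        gbm = GradientBoostingClassifier(random_state=seed)
        gbm.fit(X, df[outcome])
        top = pd.Series(gbm.feature_importances_, index=biomarkers)
        selected_counts[top.nlargest(n_keep).index] += 1

    # Keep biomarkers that were screened in most of the imputed data sets.
    keep = selected_counts[selected_counts >= n_imputations / 2].index.tolist()

    # Step 3: weighted logistic regression on the retained biomarkers
    # (sampling weights treated as frequency weights for illustration).
    X_final = sm.add_constant(df[keep].fillna(df[keep].median()))
    model = sm.GLM(df[outcome], X_final,
                   family=sm.families.Binomial(),
                   freq_weights=df[weight_col]).fit()
    return keep, model
```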

Relevance:

80.00%

Publisher:

Abstract:

Introduction. Research has shown that individuals infer their group-efficacy beliefs from the group's ability to perform specific tasks. Group abilities also seem to affect team members' performance motivation, adding a psychological advantage to teams already high in task-relevant abilities. In a recent study we found the effect of group abilities on individual performance motivation to be partially mediated by the team members' individual group-efficacy beliefs, an example of how group-level attributes can affect individual-level parameters.

Objectives. The study aimed to test whether the direct and mediated effects of low group abilities on performance motivation can be reduced by making individual contributions to group performance more visible through a separate ranking of individual performances.

Method. Forty-seven students (M = 22.83 years, SD = 2.83, 34% women) of the University of Bern participated in the study. At three collection points (t1-t3), subjects were given information about fictive team members with whom they had to imagine performing a group triathlon. Three levels (low, medium, high) of the other team members' abilities in their parts of the triathlon (swimming and biking) were combined in a 3×3 full factorial design, yielding nine groups with different ability profiles. At t1, subjects rated their confidence that the teams would perform well in the triathlon task; at t2 and t3, they rated how motivated they were to perform at their best in the respective groups. At t3, the presence of an individual performance ranking was mentioned in the cover story. Mixed linear models (SPSS) and structural equation models for complex survey data (Mplus) were specified to estimate the effects of the individual performance rankings on the relationship between group-efficacy beliefs and performance motivation.

Results. A significant interaction effect of individual group-efficacy beliefs and triathlon condition on performance motivation was found; the effect of group-efficacy beliefs on performance motivation was smaller when individual performance rankings were available. The partial mediation of group attributes on performance motivation by group-efficacy beliefs disappeared with the announcement of individual performance rankings.

Conclusion. In teams low in task-relevant abilities, the disadvantageous effect of group-efficacy beliefs on performance motivation might be reduced by providing means of evaluating individual performances apart from the group's overall performance. While a common group goal is believed to be a core criterion for a well-performing sports group, future studies should also examine the possible benefit of individualized goal setting in groups.
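The mixed-linear-model part of the analysis can be sketched in Python with statsmodels (the authors used SPSS and Mplus); the long-format columns (subject, motivation, group_efficacy, ranking_present) and the simulated data are hypothetical stand-ins for the study's variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical long-format data: repeated motivation ratings nested in subjects.
n_subj, n_profiles = 47, 9
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_profiles),
    "group_efficacy": rng.normal(0, 1, n_subj * n_profiles),
    "ranking_present": np.tile([0] * 5 + [1] * 4, n_subj),
})
df["motivation"] = (0.6 * df["group_efficacy"]
                    - 0.3 * df["group_efficacy"] * df["ranking_present"]
                    + rng.normal(0, 1, len(df)))

# Random intercept per subject; the interaction term tests whether the
# group-efficacy effect shrinks when an individual ranking is announced.
model = smf.mixedlm("motivation ~ group_efficacy * ranking_present",
                    data=df, groups=df["subject"]).fit()
print(model.summary())
```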

Relevance:

80.00%

Publisher:

Abstract:

Phthalates are industrial chemicals used primarily as plasticizers and are found in a myriad of consumer goods such as children's toys, food packaging, dental sealants, cosmetics, pharmaceuticals, perfumes, and building materials. US biomonitoring data show that more than 75% of the population have exposure to mono-n-butyl phthalate (MBP), mono-ethyl phthalate (MEP), mono-(2-ethylhexyl) phthalate (MEHP), and mono-benzyl phthalate (MBzP). Reproductive toxicity from phthalate exposure in animal models has raised concerns about similar effects on fertility in humans. This dissertation research focuses on phthalate exposures in the US population and investigates the plausibility of an exposure-response relationship between phthalates and the endocrine hormones essential for ovulation among US women. The objective of this research is to determine the relationship between levels of the gonadotropins, follicle stimulating hormone (FSH) and luteinizing hormone (LH), and the urinary phthalate monoester metabolites MBP, MEP, MEHP, and MBzP among National Health and Nutrition Examination Survey (NHANES) 1999-2002 women aged 35 to 60 years. Using biomarker data from a one-third sub-sample of NHANES participants, log-transformed serum FSH and serum LH, respectively, were regressed on phthalates, controlling for age, body mass index, smoking, and creatinine, and taking into consideration the complex survey design (n=385). Models were stratified by reproductive status: reproductive (n=185), menopause transition (n=49), and post-menopausal (n=125). A decrease in FSH associated with increasing MBzP (beta=-0.094, p<0.05) was observed for all participants, but no statistical association between log FSH and MBP, MEP, or MEHP was seen. A decrease in LH (beta=-0.125, p<0.05) was also observed with increasing MBzP for all participants, though there was no relationship between levels of LH and MBP, MEP, or MEHP. The observed associations between FSH, LH and MBzP did not persist when stratified by reproductive status. Thus, the present study shows a change in endocrine hormones related to ovulation with increasing urinary MBzP among a representative sample of US women from 1999-2002, though this observed exposure-response relationship does not remain after stratification by reproductive status.
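A sketch of the exposure-response regression described above, written with Python's statsmodels rather than the survey procedures used in the dissertation: log FSH is regressed on a urinary metabolite and covariates, with sampling weights and PSU-cluster-robust standard errors as a rough stand-in for full design-based variance estimation. The data frame and column names (log_fsh, mbzp, psu, weight, and the covariates) are assumptions, not NHANES variable names.

```python
import statsmodels.formula.api as smf

def fit_hormone_model(df):
    """Weighted least squares of log FSH on MBzP and covariates, with
    cluster-robust standard errors by primary sampling unit (PSU)."""
    model = smf.wls(
        "log_fsh ~ mbzp + age + bmi + smoker + log_creatinine",
        data=df,
        weights=df["weight"],
    )
    # Cluster-robust covariance by PSU approximates the design-based variance.
    return model.fit(cov_type="cluster", cov_kwds={"groups": df["psu"]})

# Usage (df is a pandas DataFrame holding the assumed columns):
# res = fit_hormone_model(df)
# print(res.params["mbzp"], res.conf_int().loc["mbzp"])
```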

Relevance:

80.00%

Publisher:

Abstract:

Despite research showing the benefits of glycemic control, it remains suboptimal among adults with diabetes in the United States. Possible reasons include unaddressed risk factors as well as lack of awareness of its immediate and long-term consequences. The objectives of this study were, using cross-sectional data, to (1) ascertain the association between suboptimal (Hemoglobin A1c (HbA1c) ≥7%), borderline (HbA1c 7-8.9%), and poor (HbA1c ≥9%) glycemic control and potentially new risk factors (e.g. work characteristics), and (2) assess whether aspects of poor health and well-being such as poor health related quality of life (HRQOL), unemployment, and missed work are associated with glycemic control; and (3) using prospective data, to assess the relationship between mortality risk and glycemic control in US adults with type 2 diabetes. Data from the 1988-1994 and 1999-2004 National Health and Nutrition Examination Surveys were used. HbA1c values were used to create dichotomous glycemic control indicators. Binary logistic regression models were used to assess relationships between risk factors, employment status, and glycemic control. Multinomial logistic regression analyses were conducted to assess relationships between glycemic control and HRQOL variables. Zero-inflated Poisson regression models were used to assess relationships between missed work days and glycemic control. Cox proportional hazards models were used to assess effects of glycemic control on mortality risk. Using Stata software, analyses were weighted to account for complex survey design and non-response. Multivariable models adjusted for socio-demographics and body mass index, among other variables. Results revealed that being a farm worker and working over 40 hours/week were risk factors for suboptimal glycemic control. Having more days of poor mental health was associated with suboptimal, borderline, and poor glycemic control. Having more days of inactivity was associated with poor glycemic control, while having more days of poor physical health was associated with borderline glycemic control. There were no statistically significant relationships between glycemic control and self-reported general health, employment, or missed work. Finally, having an HbA1c value less than 6.5% was protective against mortality. The findings suggest that work-related factors are important in a person's ability to reach optimal diabetes management levels. Poor glycemic control appears to have significant detrimental effects on HRQOL.
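One of the models described above, a binary logistic regression of suboptimal glycemic control on work-related risk factors with survey weighting, might be sketched in Python as follows; the column names (suboptimal_control, farm_worker, hours_over_40, psu, weight, race) are placeholders, and treating the sampling weights as frequency weights with PSU-clustered errors is a simplification of the full design-based analysis.

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

def fit_glycemic_model(df):
    """Survey-weighted logistic model of suboptimal glycemic control (HbA1c >= 7%)."""
    model = smf.glm(
        "suboptimal_control ~ farm_worker + hours_over_40 + age + bmi + C(race)",
        data=df,
        family=sm.families.Binomial(),
        freq_weights=df["weight"],   # sampling weights used as frequency weights
    )
    # PSU-cluster-robust errors approximate the complex-design variance.
    return model.fit(cov_type="cluster", cov_kwds={"groups": df["psu"]})

# Usage:
# res = fit_glycemic_model(df)
# print(res.summary())
```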

Relevance:

80.00%

Publisher:

Abstract:

The primary purpose of this study was to investigate agreement among five equations by which clinicians estimate water requirements (EWR) and to determine how well these equations predict total water intake (TWI). The Institute of Medicine has used TWI as a measure of water requirements. A secondary goal of this study was to develop practical equations to predict TWI. These equations could then be considered accurate predictors of an individual's water requirement.

Regressions were performed to determine agreement between the five equations, and between the five equations and TWI, using NHANES 1999-2004. The criteria for agreement were (1) strong correlation coefficients for all comparisons and (2) a regression line not significantly different from the line of equality (y = x), i.e., the 95% CIs of the slope and intercept had to include one and zero, respectively. Correlations were performed to determine the association between fat-free mass (FFM) and TWI. Clinically significant variables were selected to build equations for predicting TWI. All analyses were performed with SAS software and were weighted to account for the complex survey design and for oversampling.

Results showed that the five EWR equations were strongly correlated but did not agree with each other. Further, the EWR equations were all weakly associated with TWI and lacked agreement with TWI. The strongest agreement, between the NRC equation and TWI, explained only 8.1% of the variability of TWI. Fat-free mass was positively correlated with TWI. Two models were created to predict TWI. Both models included race/ethnicity, kcals, age, and height, but one model also included FFM and gender, while the other included BMI and osmolality. Neither model accounted for more than 28% of the variability of TWI. These results provide evidence that estimates of water requirements would vary depending upon which EWR equation was selected by the clinician. None of the existing EWR equations predicted TWI, nor could a prediction equation be created that explained a satisfactory amount of the variance in TWI. A good estimate of water requirements may not be predicted by TWI. Future research should focus on using more valid measures to predict water requirements.
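The agreement criterion described above, that the regression line should not differ from the line of equality, can be checked with a short Python routine: regress TWI on an EWR estimate and see whether the confidence interval for the slope contains one and the interval for the intercept contains zero. The column names (twi, ewr) are placeholders for illustration.

```python
import statsmodels.formula.api as smf

def agreement_with_equality_line(df, alpha=0.05):
    """Regress total water intake (twi) on an estimated requirement (ewr) and
    test agreement with the line of equality y = x."""
    res = smf.ols("twi ~ ewr", data=df).fit()
    ci = res.conf_int(alpha)
    slope_ok = ci.loc["ewr", 0] <= 1 <= ci.loc["ewr", 1]
    intercept_ok = ci.loc["Intercept", 0] <= 0 <= ci.loc["Intercept", 1]
    return {
        "r_squared": res.rsquared,
        "slope_ci": tuple(ci.loc["ewr"]),
        "intercept_ci": tuple(ci.loc["Intercept"]),
        "agrees_with_equality_line": slope_ok and intercept_ok,
    }
```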

Relevance:

40.00%

Publisher:

Abstract:

NMFS bottom trawl survey data were used to describe changes in distribution, abundance, and rates of population change occurring in the Gulf of Maine–Georges Bank herring (Clupea harengus) complex during 1963–98. Herring in the region have fully recovered following severe overfishing during the 1960s and 1970s. Three distinct but seasonally intermingling components, from the Gulf of Maine, Nantucket Shoals (Great South Channel area), and Georges Bank, appear to compose the herring resource in the region. Distribution ranges contracted as herring biomass declined in the late 1970s and then expanded in the 1990s as herring increased. Analysis of research survey data suggests that herring are currently at high levels of abundance and biomass. All three components of the stock complex, including the Georges Bank component, have recovered to pre-1960s abundance. Survey data support the theory that herring recolonized the Georges Bank region in stages from adjacent components during the late 1980s, most likely from herring spawning in the Gulf of Maine.

Relevance:

40.00%

Publisher:

Abstract:

The purpose of this paper is to present an empirical analysis of complex sample data with regard to the biasing effect of nonindependence of observations on standard error parameter estimates. In a two-factor confirmatory factor analysis model, using real data, we show how the bias in standard errors can be derived when the nonindependence is ignored. We demonstrate that the standard error bias produced by the nonindependence of observations can be considerable and we briefly discuss solutions to overcome the problem.
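The biasing effect described in this abstract can be demonstrated with a short simulation: generate clustered observations, fit an ordinary regression that ignores the clustering, and compare its standard error with a cluster-robust one. The Python sketch below is a generic illustration of the phenomenon, not a reproduction of the authors' two-factor confirmatory factor analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)

# Simulate 50 clusters of 20 observations with a shared cluster effect,
# which induces nonindependence (positive intracluster correlation).
n_clusters, cluster_size = 50, 20
cluster = np.repeat(np.arange(n_clusters), cluster_size)
x = rng.normal(size=n_clusters * cluster_size) + rng.normal(size=n_clusters)[cluster]
u = rng.normal(scale=1.0, size=n_clusters)[cluster]   # cluster-level error
y = 0.5 * x + u + rng.normal(size=n_clusters * cluster_size)
df = pd.DataFrame({"y": y, "x": x, "cluster": cluster})

naive = smf.ols("y ~ x", data=df).fit()
robust = smf.ols("y ~ x", data=df).fit(cov_type="cluster",
                                       cov_kwds={"groups": df["cluster"]})

print(f"naive SE(x):          {naive.bse['x']:.4f}")
print(f"cluster-robust SE(x): {robust.bse['x']:.4f}")
# The naive standard error understates the uncertainty because the
# observations within a cluster are not independent.
```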