5 results for American Housing Survey
at Duke University
Abstract:
The third wave of the National Congregations Study (NCS-III) was conducted in 2012. The 2012 General Social Survey asked respondents who attend religious services to name their religious congregation, producing a nationally representative cross-section of congregations from across the religious spectrum. Data about these congregations were collected via a 50-minute interview with one key informant from each of 1,331 congregations. Information was gathered about multiple aspects of congregations’ social composition, structure, activities, and programming. Approximately two-thirds of the NCS-III questionnaire replicates items from the 1998 or 2006-07 NCS waves. Each congregation was geocoded, and selected data from the 2010 United States census or American Community Survey have been appended. We describe NCS-III methodology and use the cumulative NCS dataset (containing 4,071 cases) to describe five trends: more ethnic diversity, greater acceptance of gays and lesbians, increasingly informal worship styles, declining size (but not from the perspective of the average attendee), and declining denominational affiliation.
Abstract:
Surveys collect important data that inform policy decisions and drive social science research. Large government surveys gather information from the U.S. population on a wide range of topics, including demographics, education, employment, and lifestyle. Analysis of survey data presents unique challenges: one must account for missing data, for complex sampling designs, and for measurement error. Conceptually, a survey organization could devote extensive resources to obtaining high-quality responses from a simple random sample, yielding survey data that are easy to analyze. In practice, this scenario is rarely realistic. To address these practical issues, survey organizations can leverage information available from other data sources. For example, in longitudinal studies that suffer from attrition, they can use information from refreshment samples to correct for potential attrition bias. They can use information from known marginal distributions or the survey design to improve inferences. They can use information from gold standard sources to correct for measurement error.
This thesis presents novel approaches to combining information from multiple sources that address the three problems described above.
The first method addresses nonignorable unit nonresponse and attrition in a panel survey with a refreshment sample. Panel surveys typically suffer from attrition, which can lead to biased inference when analysis is based only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot reveal the extent of the bias due to attrition, so analysts must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be used to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst's ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences, corrected for panel attrition, are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.
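As a concrete illustration of the refreshment-sample idea (not the specific model developed in this thesis), the sketch below simulates a two-wave panel with binary outcomes and nonignorable attrition, then fits the additive nonignorable selection model of Hirano, Imbens, Ridder, and Rubin (2001) by maximum likelihood. All variable names, parameter values, and sample sizes are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)

# Simulate a two-wave panel with nonignorable attrition (invented values).
n_panel, n_refresh = 20_000, 10_000
y1 = rng.binomial(1, 0.5, n_panel)                     # wave-1 outcome
y2 = rng.binomial(1, expit(-0.5 + 1.5 * y1))           # wave-2 outcome
w = rng.binomial(1, expit(0.3 + 0.5 * y1 - 1.0 * y2))  # 1 = completed wave 2

# Refreshment sample: fresh random draws, observed at wave 2 only.
y2_r = rng.binomial(1, expit(-0.5 + 1.5 * rng.binomial(1, 0.5, n_refresh)))

def safelog(x):
    return np.log(np.clip(x, 1e-12, None))

def neg_loglik(theta):
    """Additive nonignorable model: logit P(W=1 | Y1, Y2) = g0 + g1*Y1 + g2*Y2."""
    a, b0, b1, g0, g1, g2 = theta
    p1 = expit(a)                                      # P(Y1 = 1)
    p2 = lambda v: expit(b0 + b1 * v)                  # P(Y2 = 1 | Y1 = v)
    pw = lambda v1, v2: expit(g0 + g1 * v1 + g2 * v2)  # P(W = 1 | Y1, Y2)
    m = w == 1                                         # completers: full likelihood
    ll = np.sum(safelog((p1 * y1[m] + (1 - p1) * (1 - y1[m]))
                        * (p2(y1[m]) * y2[m] + (1 - p2(y1[m])) * (1 - y2[m]))
                        * pw(y1[m], y2[m])))
    m = w == 0                                         # attriters: sum out missing y2
    ll += np.sum(safelog((p1 * y1[m] + (1 - p1) * (1 - y1[m]))
                         * (p2(y1[m]) * (1 - pw(y1[m], 1.0))
                            + (1 - p2(y1[m])) * (1 - pw(y1[m], 0.0)))))
    q1 = p1 * p2(1.0) + (1 - p1) * p2(0.0)             # refreshment: sum out y1
    ll += np.sum(safelog(q1 * y2_r + (1 - q1) * (1 - y2_r)))
    return -ll

est = minimize(neg_loglik, np.zeros(6), method="BFGS").x
p1_hat = expit(est[0])
p2_hat = lambda v: expit(est[1] + est[2] * v)
print("complete-case P(Y2=1):", round(y2[w == 1].mean(), 3))  # biased
print("corrected P(Y2=1):   ",
      round(p1_hat * p2_hat(1.0) + (1 - p1_hat) * p2_hat(0.0), 3))
print("true P(Y2=1):        ", round(y2.mean(), 3))
```

Because the attrition model excludes a Y1-by-Y2 interaction, it is just-identified once the refreshment sample pins down the wave-2 margin; the corrected estimate then recovers the population quantity that the complete-case estimate misses.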
The second method incorporates informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data.
We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
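A minimal sketch of the augmented-records idea, assuming binary variables, a prior belief about a single margin, and a basic Gibbs sampler for the latent class model (all simplifications relative to the thesis's development; the data and the 0.3 prior margin are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 1000, 4, 3                      # records, binary items, latent classes

# Original data from an arbitrary latent class population (invented).
true_theta = rng.uniform(0.1, 0.9, (K, p))
X = rng.binomial(1, true_theta[rng.integers(0, K, n)]).astype(float)

# Augmented records encoding the prior belief P(X_0 = 1) ~ 0.3:
# the target margin is filled in, every other variable is left missing.
m = 200                                   # number of rows = prior weight
aug = np.full((m, p), np.nan)
aug[:, 0] = np.arange(m) < 0.3 * m        # empirical margin matches the prior
X_full = np.vstack([X, aug])
miss = np.isnan(X_full)
X_imp = np.where(miss, rng.random(X_full.shape) < 0.5, X_full)

# Gibbs sampler for the latent class model on the concatenated data.
pi = np.full(K, 1.0 / K)
theta = rng.uniform(0.2, 0.8, (K, p))
for it in range(500):
    # 1. Class indicators given parameters (using current imputations).
    logp = np.log(pi) + X_imp @ np.log(theta.T) + (1 - X_imp) @ np.log(1 - theta.T)
    prob = np.exp(logp - logp.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    z = (prob.cumsum(axis=1) > rng.random((len(X_imp), 1))).argmax(axis=1)
    # 2. Impute the missing entries given class membership.
    X_imp[miss] = rng.random(miss.sum()) < theta[z][miss]
    # 3. Conjugate updates for pi and theta with uniform priors.
    counts = np.bincount(z, minlength=K)
    pi = rng.dirichlet(1 + counts)
    for k in range(K):
        ones = X_imp[z == k].sum(axis=0)
        theta[k] = rng.beta(1 + ones, 1 + counts[k] - ones)

# A posterior draw of the first margin, pulled toward the 0.3 prior belief.
print("observed-data margin: ", X[:, 0].mean())
print("posterior-draw margin:", (pi @ theta)[0])
```

The prior weight enters only through the number of appended rows, so strengthening or weakening the prior is a matter of changing m rather than re-deriving the sampler.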
The third method leverages information from a gold standard survey to model reporting error. Survey data are subject to reporting error when respondents misunderstand a question or accidentally select the wrong response. Sometimes respondents knowingly select the wrong response, for example by reporting a higher level of education than they have actually attained. We present an approach that allows an analyst to model reporting error by incorporating information from a gold standard survey. The analyst can specify various reporting error models and assess how sensitive the resulting conclusions are to different assumptions about the reporting error process. We illustrate the approach using simulations based on data from the 1993 National Survey of College Graduates. We use the method to impute error-corrected educational attainments in the 2010 American Community Survey, using the 2010 National Survey of College Graduates as the gold standard survey.
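For intuition, here is a simplified sketch of the gold standard idea, assuming (unlike the NSCG/ACS setting) a single sample in which both the reported and the true category are observed, and a misclassification-matrix reporting error model; the categories and error rates are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
cats = ["HS", "BA", "MA", "PhD"]          # invented categories
K = len(cats)

# Invented misclassification matrix M[r, y] = P(report r | truth y):
# mostly accurate, with some over-reporting of education.
M = np.array([[0.90, 0.02, 0.01, 0.00],
              [0.08, 0.90, 0.04, 0.01],
              [0.02, 0.07, 0.90, 0.09],
              [0.00, 0.01, 0.05, 0.90]])
p_true = np.array([0.40, 0.35, 0.18, 0.07])

# Gold standard sample: both truth and report observed, so M is estimable.
y_g = rng.choice(K, 5_000, p=p_true)
r_g = np.array([rng.choice(K, p=M[:, y]) for y in y_g])
M_hat = np.zeros((K, K))
for r, y in zip(r_g, y_g):
    M_hat[r, y] += 1
M_hat /= M_hat.sum(axis=0, keepdims=True)  # normalize columns over each truth

# Large survey: only reports observed.
y_s = rng.choice(K, 50_000, p=p_true)
r_s = np.array([rng.choice(K, p=M[:, y]) for y in y_s])
p_report = np.bincount(r_s, minlength=K) / len(r_s)

# Deconvolve p_report = M_hat @ p_truth, clipping any sampling-noise negatives.
p_corrected = np.clip(np.linalg.solve(M_hat, p_report), 0, None)
p_corrected /= p_corrected.sum()

# Impute error-corrected categories via Bayes' rule: P(y | r) ~ M[r, y] p(y).
post = M_hat * p_corrected
post /= post.sum(axis=1, keepdims=True)
y_imp = np.array([rng.choice(K, p=post[r]) for r in r_s])

print("reported shares: ", np.round(p_report, 3))
print("corrected shares:", np.round(p_corrected, 3))
print("imputed shares:  ", np.round(np.bincount(y_imp, minlength=K) / len(y_imp), 3))
print("true shares:     ", np.round(p_true, 3))
```

In the actual setting the truth is never observed alongside the report, so the reporting error model must be specified rather than estimated, which is what motivates the sensitivity analyses described above.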
Abstract:
Continuous variables are among the major data types collected by survey organizations. They can be incomplete, so that data collectors need to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to aggregate values into cells defined by combinations of features. In this thesis, I present novel methods of multiple imputation (MI) that can be applied to impute missing values and to synthesize confidential values for continuous and magnitude data.
The first method limits the disclosure risk of continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. I also present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. An illustration using a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
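The exact-totals guarantee can be illustrated with the classical fact that independent Poisson counts, conditioned on their sum, follow a multinomial distribution. The sketch below uses a single-component Poisson model and invented data rather than the thesis's full mixture:

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented magnitude microdata: establishments (rows) by items (columns).
orig = rng.poisson(lam=[5.0, 20.0, 8.0], size=(10, 3))
totals = orig.sum(axis=1)                 # the totals to preserve exactly

# Fit simple Poisson rates per column (a mixture of Poissons would assign
# component-specific rate vectors instead).
lam = orig.mean(axis=0)
probs = lam / lam.sum()

# Conditional on a total t, independent Poisson(lam_j) counts are
# Multinomial(t, lam / sum(lam)), so every synthetic row sums to t exactly.
synth = np.vstack([rng.multinomial(t, probs) for t in totals])

assert (synth.sum(axis=1) == totals).all()
print("original row totals: ", totals)
print("synthetic row totals:", synth.sum(axis=1))
```

In a mixture version, each row would presumably first draw a component-specific rate vector, but the conditioning step that fixes the totals works the same way.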
The second method releases synthetic continuous microdata via a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. The resulting disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on protective intervals. Its basic idea is to estimate the model parameters from these intervals rather than from the confidential values. Encouraging results from simple simulation studies suggest the potential of this new approach for limiting the posterior disclosure risk.
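A minimal sketch of estimating a synthesis model from intervals rather than values, assuming lognormal data, multiplicative protective intervals, and an interval-censored likelihood (all illustrative choices, not the thesis's specification):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
y = rng.lognormal(mean=3.0, sigma=0.5, size=500)   # confidential values

# Protective intervals: only [0.8*y, 1.2*y] enters the estimation below.
llo, lhi = np.log(0.8 * y), np.log(1.2 * y)        # work on the log scale

def neg_loglik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    # Each record contributes P(llo < log Y < lhi), never the exact value.
    p = norm.cdf(lhi, mu, sigma) - norm.cdf(llo, mu, sigma)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

mid = (llo + lhi) / 2                              # interval-only starting values
fit = minimize(neg_loglik, x0=[mid.mean(), np.log(mid.std())],
               method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])

# Multiple synthetic datasets drawn from the interval-based fit.
synthetic = [rng.lognormal(mu_hat, sigma_hat, size=len(y)) for _ in range(5)]
print(f"interval-based fit: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
print("generating values:  mu=3.00, sigma=0.50")
```

Because the likelihood never touches the exact confidential values, an intruder who reverse-engineers the fitted model learns only what the released intervals already reveal.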
The third method imputes missing values in continuous and categorical variables. It extends a hierarchically coupled mixture model with local dependence, but separates the variables into non-focused (e.g., almost fully observed) and focused (e.g., subject to heavy missingness) groups. The sub-model structure for the focused variables is more complex than that for the non-focused ones; the two groups' cluster indicators are linked together by a tensor factorization, and the focused continuous variables depend locally on non-focused values. The model's properties suggest that moving strongly associated non-focused variables to the focused side can help improve estimation accuracy, which is examined in several simulation studies. The method is applied to data from the American Community Survey.
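The generative structure (though not the estimation algorithm) can be sketched as follows; the block sizes, the factorization P(s, t) = sum_k pi_k A[k, s] B[k, t] linking the two blocks' cluster indicators, and all parameter values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n, K, S, T = 2000, 2, 3, 3      # records; top classes; block-level clusters

# Factorized tensor linking the two blocks' cluster indicators:
# P(s, t) = sum_k pi_k * A[k, s] * B[k, t].
pi = np.array([0.6, 0.4])
A = rng.dirichlet(np.ones(S), size=K)   # P(s | k), non-focused block
B = rng.dirichlet(np.ones(T), size=K)   # P(t | k), focused block

k = rng.choice(K, n, p=pi)
s = np.array([rng.choice(S, p=A[ki]) for ki in k])
t = np.array([rng.choice(T, p=B[ki]) for ki in k])

# Non-focused block: one nearly fully observed continuous variable per s.
mu_s = np.array([-2.0, 0.0, 2.0])
x_nf = rng.normal(mu_s[s], 1.0)

# Focused block: depends on its own cluster t AND, locally, on x_nf.
alpha_t = np.array([0.0, 3.0, 6.0])
beta_t = np.array([0.5, -0.5, 1.0])     # local dependence on non-focused values
x_f = rng.normal(alpha_t[t] + beta_t[t] * x_nf, 0.5)

# Heavy missingness on the focused variable only.
x_f_obs = np.where(rng.random(n) < 0.6, x_f, np.nan)
print("focused-variable missing rate:", np.isnan(x_f_obs).mean())
```

The asymmetry is the point: the nearly complete non-focused block anchors the clustering, and the factorized tensor channels that information into imputations for the heavily missing focused block.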
Abstract:
In the United States, poverty has historically been higher in the American South, where it remains disproportionately concentrated. Despite this fact, much of the conventional poverty literature in the United States has focused on urban poverty, particularly in cities of the Northeast and Midwest. Relatively less American poverty research has focused on the enduring economic distress in the South, which Wimberley (2008:899) calls “a neglected regional crisis of historic and contemporary urgency.” Accordingly, this dissertation contributes to the inequality literature by focusing much-needed attention on poverty in the South.
Each empirical chapter focuses on a different aspect of poverty in the South. Chapter 2 examines why poverty is higher in the South relative to the Non-South. Chapter 3 focuses on poverty predictors within the South and whether there are differences in the sub-regions of the Deep South and Peripheral South. These two chapters compare the roles of family demography, economic structure, racial/ethnic composition and heterogeneity, and power resources in shaping poverty. Chapter 4 examines whether poverty in the South has been shaped by historical racial regimes.
The Luxembourg Income Study (LIS) United States datasets (2000, 2004, 2007, 2010, and 2013), derived from the U.S. Census Current Population Survey (CPS) Annual Social and Economic Supplement, provide all the individual-level data for this study. The LIS sample of 745,135 individuals is nested in rich economic, political, and racial state-level data compiled from multiple sources (e.g., the U.S. Census Bureau, the U.S. Department of Agriculture, and the University of Kentucky Center for Poverty Research). Analyses involve a combination of techniques, including linear probability regression models to predict poverty and binary decomposition of poverty differences.
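As a sketch of the two named techniques, the code below fits linear probability models by OLS on simulated (not LIS) data for two regions and performs a Blinder-Oaxaca-style binary decomposition of the poverty-rate gap into a composition component and a coefficients component; the single covariate and all parameter values are invented:

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate(n, edu_mean, beta):
    edu = rng.normal(edu_mean, 2.0, n)        # years of schooling (invented)
    X = np.column_stack([np.ones(n), edu])
    poor = rng.binomial(1, np.clip(X @ beta, 0.02, 0.98))
    return X, poor

# Region A ("South"): lower education and a steeper education penalty.
X_a, y_a = simulate(5_000, 12.0, np.array([0.90, -0.050]))
X_b, y_b = simulate(5_000, 13.5, np.array([0.80, -0.045]))

# Linear probability models: OLS on the 0/1 poverty indicator.
beta_a = np.linalg.lstsq(X_a, y_a, rcond=None)[0]
beta_b = np.linalg.lstsq(X_b, y_b, rcond=None)[0]

# Binary decomposition of the poverty-rate gap (Blinder-Oaxaca form).
gap = y_a.mean() - y_b.mean()
xbar_a, xbar_b = X_a.mean(axis=0), X_b.mean(axis=0)
explained = (xbar_a - xbar_b) @ beta_b        # composition differences
unexplained = xbar_a @ (beta_a - beta_b)      # coefficient differences

print(f"poverty gap:               {gap:.3f}")
print(f"  explained (composition): {explained:.3f}")
print(f"  unexplained (coeffs):    {unexplained:.3f}")
```

Because each OLS fit with an intercept passes exactly through its group means, the two components sum to the raw gap by construction.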
Chapter 2 results suggest that power resources, followed by economic structure, are most important in explaining the higher poverty in the South. This underscores the salience of political and economic contexts in shaping poverty across place. Chapter 3 results indicate that individual-level economic factors are the largest predictors of poverty within the South, even more so in the Deep South. Moreover, divergent results between the South, Deep South, and Peripheral South illustrate how the impact of poverty predictors can vary across contexts. Chapter 4 results show significant bivariate associations between historical race regimes and poverty among Southern states, although regression models fail to yield significant effects. However, historical race regimes do have a small but significant effect in explaining the Black-White poverty gap. Results also suggest that employment and education are key to understanding poverty among Blacks and the Black-White poverty gap. Collectively, these chapters underscore why place is so important for understanding poverty and inequality, and they illustrate how micro- and macro-level characteristics of place help create, maintain, and reproduce systems of inequality.
Abstract:
What role do state party organizations play in twenty-first century American politics? What is the nature of the relationship between the state and national party organizations in contemporary elections? These questions frame the three studies presented in this dissertation. More specifically, I examine the organizational development of the state party organizations and the strategic interactions and connections between the state and national party organizations in contemporary elections.
In the first empirical chapter, I argue that the Internet Age represents a significant transitional period for state party organizations. Using data collected from surveys of state party leaders, this chapter reevaluates and updates existing theories of party organizational strength and demonstrates the importance of new indicators of party technological capacity to our understanding of party organizational development in the early twenty-first century. In the second chapter, I ask whether the national parties utilize different strategies in deciding how to allocate resources to state parties through fund transfers and through the 50-state-strategy party-building programs that both the Democratic and Republican National Committees advertised during the 2010 elections. Analyzing data collected from my 2011 state party survey and party-fund-transfer data collected from the Federal Election Commission, I find that the national parties considered a combination of state and national electoral concerns in directing assistance to the state parties through their 50-state strategies, as opposed to the strict battleground-state strategy that explains party fund transfers. In my last chapter, I examine the relationships between platforms issued by Democratic and Republican state and national parties and the strategic considerations that explain why state platforms vary in their degree of similarity to the national platform. I analyze an extensive platform dataset, using cluster analysis and document similarity measures to compare platform content across the 1952 to 2014 period. The analysis shows that, as a group, Democratic and Republican state platforms exhibit greater intra-party homogeneity and inter-party heterogeneity starting in the early 1990s, and state-national platform similarity is higher in states that are key players in presidential elections, among other factors. Together, these three studies demonstrate the significance of the state party organizations and the state-national party partnership in contemporary politics.
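For concreteness, here is a toy sketch of the document-similarity step described above, assuming TF-IDF vectors with cosine similarity and agglomerative clustering of the platforms (requires scikit-learn >= 1.2 for the metric argument); the platform snippets are invented stand-ins for the actual 1952-2014 corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import AgglomerativeClustering

# Invented snippets standing in for state and national platform texts.
platforms = {
    "national_dem": "expand health care raise the minimum wage protect voting rights",
    "state_dem_1":  "protect voting rights expand health care fund public schools",
    "state_dem_2":  "fund rural broadband expand health care support family farms",
    "national_rep": "cut taxes reduce regulation strengthen border security",
    "state_rep_1":  "cut taxes strengthen border security protect gun rights",
}

names = list(platforms)
vecs = TfidfVectorizer().fit_transform(platforms.values())
sim = cosine_similarity(vecs)

# State-national similarity: the quantity compared across states and years.
j = names.index("national_dem")
for i, name in enumerate(names):
    if name.startswith("state_dem"):
        print(f"{name} vs national_dem: {sim[i, j]:.2f}")

# Cluster on dissimilarity; two clusters should separate the parties.
labels = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                 linkage="average").fit_predict(1 - sim)
print(dict(zip(names, labels)))
```

On a real corpus, tracking these similarity scores by state and year is what reveals the intra-party homogenization and the battleground-state pattern reported above.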