50 results for Research data
Abstract:
There are two main types of data sources of income distributions in China: household survey data and grouped data. Household survey data are typically available for isolated years and individual provinces. In comparison, aggregate or grouped data are typically available more frequently and usually have national coverage. In principle, grouped data allow investigation of the change of inequality over longer, continuous periods of time, and the identification of patterns of inequality across broader regions. Nevertheless, a major limitation of grouped data is that only mean (average) income and income shares of quintile or decile groups of the population are reported. Directly using grouped data reported in this format is equivalent to assuming that all individuals in a quintile or decile group have the same income. This potentially distorts the estimate of inequality within each region. The aim of this paper is to apply an improved econometric method designed to use grouped data to study income inequality in China. A generalized beta distribution is employed to model income inequality in China at various levels and periods of time. The generalized beta distribution is more general and flexible than the lognormal distribution that has been used in past research, and also relaxes the assumption of a uniform distribution of income within quintile and decile groups of populations. The paper studies the nature and extent of inequality in rural and urban China over the period 1978 to 2002. Income inequality in the whole of China is then modeled using a mixture of province-specific distributions. The estimated results are used to study the trends in national inequality, and to discuss the empirical findings in the light of economic reforms, regional policies, and globalization of the Chinese economy.
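The limitation described above can be made concrete with a small sketch. It computes a Gini coefficient directly from quintile income shares, which amounts to assuming that everyone in a quintile has the same income. The function name `grouped_gini` and the share values are hypothetical illustrations, not figures from the paper. Because inequality within each group is ignored, the result understates the Gini of the underlying distribution.

```python
def grouped_gini(shares):
    """Gini coefficient from income shares of equal-sized population groups.

    `shares` must be ordered from poorest to richest group. Every member of a
    group is assumed to earn the group mean, so the Lorenz curve is piecewise
    linear and within-group inequality is ignored.
    """
    total = sum(shares)
    n = len(shares)
    cum_prev = 0.0  # cumulative income share up to the previous group
    area = 0.0      # twice the area under the piecewise-linear Lorenz curve
    for s in shares:
        cum = cum_prev + s / total
        area += (cum_prev + cum) / n  # trapezoid with population width 1/n
        cum_prev = cum
    return 1.0 - area


# Hypothetical quintile shares (poorest to richest), for illustration only.
example_shares = [0.05, 0.10, 0.15, 0.25, 0.45]
example_gini = grouped_gini(example_shares)
```

A parametric model such as the generalized beta distribution mentioned in the abstract replaces the piecewise-linear Lorenz curve with a smooth one, which is how it relaxes the equal-income assumption.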
Abstract:
This article investigates the researcher's work in the coproduction (or not) of complaint sequences in research interviews. Using a conversation analytic approach, we show how the interviewer's management of complaint sequences in a research setting is consequential for subsequent talk and thus directly affects the data generated. In the examples shown here, researchers sharing cocategorial incumbency with respondents may well provide spaces for research participants to formulate complaints. This article examines sequences of talk surrounding complaints to show how researchers generate complaints (or not) and handle unsafe complaints. Researchers are able to provoke specific types of accounts from respondents, whereas their respondents may actively resist the researchers' direction. For researchers using the interview as a method of data generation, examination of complaint sequences and how these appear in interview data provides insight into how interview talk is coproduced and managed within a socially situated setting.
Abstract:
Objective: To assess consent to record linkage, describe the characteristics of consenters and compare self-report versus Medicare records of general practitioner use. Method: Almost 40,000 women in the Australian Longitudinal Study on Women's Health were sent a request by mail for permission to link their Medicare records and survey data. Results: 19,700 women consented: 37% of young (18-23 years), 59% of mid-age (45-50 years) and 53% of older women (70-75 years). Consenters tended to have higher levels of education and, among the older cohort, were in better health than nonconsenters. Women tended to under-report the number of visits to general practitioners. Conclusions: Record linkage of survey and Medicare data on a large scale is feasible. The linked data provide information on health and socio-economic status which are valuable for understanding health service utilisation. Implications: Linked records provide a powerful tool for health care research, particularly in longitudinal studies.
Abstract:
When the data consist of certain attributes measured on the same set of items in different situations, they would be described as a three-mode three-way array. A mixture likelihood approach can be implemented to cluster the items (i.e., one of the modes) on the basis of both of the other modes simultaneously (i.e., the attributes measured in different situations). In this paper, it is shown that this approach can be extended to handle three-mode three-way arrays where some of the data values are missing at random in the sense of Little and Rubin (1987). The methodology is illustrated by clustering the genotypes in a three-way soybean data set where various attributes were measured on genotypes grown in several environments.
Abstract:
Objective: To describe and analyse the study design and manuscript deficiencies in original research articles submitted to Emergency Medicine. Methods: This was a retrospective, analytical study. Articles were enrolled if the reports of the Section Editor and two reviewers were available. Data were extracted from these reports only. Outcome measures were the mean number and nature of the deficiencies and the mean reviewers’ assessment score. Results: Fifty-seven articles were evaluated (28 accepted for publication, 19 rejected, 10 pending revision). The mean (± SD) number of deficiencies was 18.1 ± 6.9, 16.4 ± 6.5 and 18.4 ± 6.7 for all articles, articles accepted for publication and articles rejected, respectively (P = 0.31 between accepted and rejected articles). The mean assessment scores (0–10) were 5.5 ± 1.5, 5.9 ± 1.5 and 4.7 ± 1.4 for all articles, articles accepted for publication and articles rejected, respectively. Accepted articles had a significantly higher assessment score than rejected articles (P = 0.006). For each group, there was a negative correlation between the number of deficiencies and the mean assessment score (P > 0.05). Significantly more rejected articles ‘… did not further our knowledge’ (P = 0.0014) and ‘… did not describe background information adequately’ (P = 0.049). Many rejected articles had ‘… findings that were not clinically or socially significant’ (P = 0.07). Common deficiencies among all articles included ambiguity of the methods (77%) and results (68%), conclusions not warranted by the data (72%), poor referencing (56%), inadequate study design description (51%), unclear tables (49%), an overly long discussion (49%), limitations of the study not described (51%), inadequate definition of terms (49%) and subject selection bias (40%). Conclusions: Researchers should undertake studies that are likely to further our knowledge and be clinically or socially significant. 
Deficiencies in manuscript preparation are more frequent than mistakes in study design and execution. Specific training or assistance in manuscript preparation is indicated.
Abstract:
Qualitative data analysis (QDA) is often a time-consuming and laborious process usually involving the management of large quantities of textual data. Recently developed computer programs offer great advances in the efficiency of the processes of QDA. In this paper we report on an innovative use of a combination of extant computer software technologies to further enhance and simplify QDA. Used in appropriate circumstances, we believe that this innovation greatly enhances the speed with which theoretical and descriptive ideas can be abstracted from rich, complex, and chaotic qualitative data. © 2001 Human Sciences Press, Inc.
Abstract:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
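The gene-screening step described above rests on a likelihood ratio statistic for one versus two mixture components. The sketch below illustrates that statistic for a single gene's expression values. It is not the EMMIX-GENE implementation: it swaps in normal components for the t components named in the abstract, uses a plain EM loop, and all function names, the variance floor, and the simulated data are assumptions made for illustration.

```python
import math
import random

VAR_FLOOR = 1e-3  # assumed floor that keeps component variances away from zero


def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def mean_var(vals):
    m = sum(vals) / len(vals)
    v = max(sum((x - m) ** 2 for x in vals) / len(vals), VAR_FLOOR)
    return m, v


def loglik_one_component(xs):
    mu, var = mean_var(xs)
    return sum(normal_logpdf(x, mu, var) for x in xs)


def loglik_two_components(xs, iters=200):
    """Fit a two-component normal mixture by EM and return its log-likelihood."""
    ordered = sorted(xs)
    half = len(ordered) // 2
    # Initialise the components from the lower and upper halves of the data.
    mu1, var1 = mean_var(ordered[:half])
    mu2, var2 = mean_var(ordered[half:])
    pi1 = 0.5
    n = len(xs)
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for x in xs:
            a = math.log(pi1) + normal_logpdf(x, mu1, var1)
            b = math.log(1.0 - pi1) + normal_logpdf(x, mu2, var2)
            m = max(a, b)
            ea, eb = math.exp(a - m), math.exp(b - m)
            resp.append(ea / (ea + eb))
        # M-step: weighted updates of proportions, means, and variances.
        n1 = max(sum(resp), 1e-9)
        n2 = max(n - n1, 1e-9)
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n2
        var1 = max(sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1, VAR_FLOOR)
        var2 = max(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / n2, VAR_FLOOR)
        pi1 = min(max(n1 / n, 1e-6), 1.0 - 1e-6)
    total = 0.0
    for x in xs:
        a = math.log(pi1) + normal_logpdf(x, mu1, var1)
        b = math.log(1.0 - pi1) + normal_logpdf(x, mu2, var2)
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def lrt_statistic(xs):
    """-2 log(lambda) for one versus two components."""
    return 2.0 * (loglik_two_components(xs) - loglik_one_component(xs))


# Simulated expression values for two hypothetical genes.
rng = random.Random(0)
unimodal_gene = [rng.gauss(0.0, 1.0) for _ in range(200)]
bimodal_gene = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(6.0, 1.0) for _ in range(100)]
```

Computing `lrt_statistic` for each gene gives the quantity on which genes are ranked and thresholded; a gene whose values separate the tissues into two groups, like `bimodal_gene` here, should score much higher than one that does not.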
Abstract:
Within the information systems field, the task of conceptual modeling involves building a representation of selected phenomena in some domain. High-quality conceptual-modeling work is important because it facilitates early detection and correction of system development errors. It also plays an increasingly important role in activities like business process reengineering and documentation of best-practice data and process models in enterprise resource planning systems. Yet little research has been undertaken on many aspects of conceptual modeling. In this paper, we propose a framework to motivate research that addresses the following fundamental question: How can we model the world to better facilitate our developing, implementing, using, and maintaining more valuable information systems? The framework comprises four elements: conceptual-modeling grammars, conceptual-modeling methods, conceptual-modeling scripts, and conceptual-modeling contexts. We provide examples of the types of research that have already been undertaken on each element and illustrate research opportunities that exist.
Abstract:
Genetic research on risk of alcohol, tobacco or drug dependence must make allowance for the partial overlap of risk-factors for initiation of use, and risk-factors for dependence or other outcomes in users. Except in the extreme cases where genetic and environmental risk-factors for initiation and dependence overlap completely or are uncorrelated, there is no consensus about how best to estimate the magnitude of genetic or environmental correlations between Initiation and Dependence in twin and family data. We explore by computer simulation the biases to estimates of genetic and environmental parameters caused by model misspecification when Initiation can only be defined as a binary variable. For plausible simulated parameter values, the two-stage genetic models that we consider yield estimates of genetic and environmental variances for Dependence that, although biased, are not very discrepant from the true values. However, estimates of genetic (or environmental) correlations between Initiation and Dependence may be seriously biased, and may differ markedly under different two-stage models. Such estimates may have little credibility unless external data favor selection of one particular model. These problems can be avoided if Initiation can be assessed as a multiple-category variable (e.g. never versus early-onset versus later onset user), with at least two categories measurable in users at risk for dependence. Under these conditions, under certain distributional assumptions, recovery of simulated genetic and environmental correlations becomes possible. Illustrative application of the model to Australian twin data on smoking confirmed substantial heritability of smoking persistence (42%) with minimal overlap with genetic influences on initiation.
Abstract:
Background. Based on the well-described excess of schizophrenia births in winter and spring, we hypothesised that individuals with schizophrenia (a) would be more likely to be born during periods of decreased perinatal sunshine, and (b) those born during periods of less sunshine would have an earlier age of first registration. Methods. We undertook an ecological analysis of long-term trends in perinatal sunshine duration and schizophrenia birth rates based on two mental health registers (Queensland, Australia n = 6630; The Netherlands n = 24,474). For each of the 480 months between 1931 and 1970, the agreement between slopes of the trends in psychosis and long-term sunshine duration series was assessed. Age at first registration was assessed by quartiles of long-term trends in perinatal sunshine duration. Males and females were assessed separately. Results. Both the Dutch and Australian data showed a statistically significant association between falling long-term trends in sunshine duration around the time of birth and rising schizophrenia birth rates for males only. In both the Dutch and Australian data there were significant associations between earlier age of first registration and reduced long-term trends in sunshine duration around the time of birth for both males and females. Conclusions. A measure of long-term trends in perinatal sunshine duration was associated with two epidemiological features of schizophrenia in two separate data sets. Exposures related to sunshine duration warrant further consideration in schizophrenia research. © 2002 Elsevier Science B.V. All rights reserved.
Abstract:
We compare Bayesian methodology utilizing free-ware BUGS (Bayesian Inference Using Gibbs Sampling) with the traditional structural equation modelling approach based on another free-ware package, Mx. Dichotomous and ordinal (three category) twin data were simulated according to different additive genetic and common environment models for phenotypic variation. Practical issues are discussed in using Gibbs sampling as implemented by BUGS to fit subject-specific Bayesian generalized linear models, where the components of variation may be estimated directly. The simulation study (based on 2000 twin pairs) indicated that there is a consistent advantage in using the Bayesian method to detect a correct model under certain specifications of additive genetics and common environmental effects. For binary data, both methods had difficulty in detecting the correct model when the additive genetic effect was low (between 10 and 20%) or of moderate range (between 20 and 40%). Furthermore, neither method could adequately detect a correct model that included a modest common environmental effect (20%) even when the additive genetic effect was large (50%). Power was significantly improved with ordinal data for most scenarios, except for the case of low heritability under a true ACE model. We illustrate and compare both methods using data from 1239 twin pairs over the age of 50 years, who were registered with the Australian National Health and Medical Research Council Twin Registry (ATR) and presented symptoms associated with osteoarthritis occurring in joints of the hand.