28 results for exploratory data analysis
Abstract:
Objective: To compare rates of self-reported use of health services between rural, remote and urban South Australians. Methods: Secondary data analysis from a population-based survey to assess health and well-being, conducted in South Australia in 2000. In all, 2,454 adults were randomly selected and interviewed using the computer-assisted telephone interview (CATI) system. We analysed health service use by Accessibility and Remoteness Index of Australia (ARIA) category. Results: There was no statistically significant difference in the median number of uses of the four types of health services studied across ARIA categories. Significantly fewer residents of highly accessible areas reported never using primary care services (14.4% vs. 22.2% in very remote areas), and significantly more reported high use (≥6 visits, 29.3% vs. 21.5%). Fewer residents of remote areas reported never attending hospital (65.6% vs. 73.8% in highly accessible areas). Frequency of use of mental health services was not statistically significantly different across ARIA categories. Very remote residents were more likely to spend at least one night in a public hospital (15.8%) than were residents of other areas (e.g. 5.9% for highly accessible areas). Conclusion: The self-reported frequency of use of a range of health services in South Australia was broadly similar across ARIA categories. However, use of primary care services was higher among residents of highly accessible areas and public hospital use increased with increasing remoteness. There is no evidence for systematic rural disadvantage in terms of self-reported health service utilisation in this State.
Abstract:
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
Abstract:
Developed, piloted, and examined the psychometric properties of the Child and Adolescent Social and Adaptive Functioning Scale (CASAFS), a self-report measure designed to examine the social functioning of young people in the areas of school performance, peer relationships, family relationships, and home duties/self-care. The findings of confirmatory and exploratory factor analysis support a 4-factor solution consistent with the hypothesized domains. Fit indexes suggested that the 4-correlated factor model represented a satisfactory solution for the data, with the covariation between factors being satisfactorily explained by a single, higher order factor reflecting social and adaptive functioning in general. The internal consistency and 12-month test-retest reliability of the total scale was acceptable. A significant, negative correlation was found between the CASAFS and a measure of depressive symptoms, showing that high levels of social functioning are associated with low levels of depression. Significant differences in CASAFS total and subscale scores were found between clinically depressed adolescents and a matched sample of nonclinical controls. Adolescents who reported elevated but subclinical levels of depression also reported lower levels of social functioning in comparison to nonclinical controls.
Abstract:
An increasing number of studies shows that the glycogen-accumulating organisms (GAOs) can survive and may indeed proliferate under the alternating anaerobic/aerobic conditions found in EBPR systems, thus forming a strong competitor of the polyphosphate-accumulating organisms (PAOs). Understanding their behaviors in a mixed PAO and GAO culture under various operational conditions is essential for developing operating strategies that disadvantage the growth of this group of unwanted organisms. A model-based data analysis method is developed in this paper for the study of the anaerobic PAO and GAO activities in a mixed PAO and GAO culture. The method primarily makes use of the hydrogen ion production rate and the carbon dioxide transfer rate resulting from the acetate uptake processes by PAOs and GAOs, measured with a recently developed titration and off-gas analysis (TOGA) sensor. The method is demonstrated using the data from a laboratory-scale sequencing batch reactor (SBR) operated under alternating anaerobic and aerobic conditions. The data analysis using the proposed method strongly indicates a coexistence of PAOs and GAOs in the system, which was independently confirmed by fluorescent in situ hybridization (FISH) measurement. The model-based analysis also allowed the identification of the respective acetate uptake rates by PAOs and GAOs, along with a number of kinetic and stoichiometric parameters involved in the PAO and GAO models. The excellent fit between the model predictions and the experimental data not involved in parameter identification shows that the parameter values found are reliable and accurate. It also demonstrates that the current anaerobic PAO and GAO models are able to accurately characterize the PAO/GAO mixed culture obtained in this study. 
This is of major importance as no pure culture of either PAOs or GAOs has been reported to date, and hence the current PAO and GAO models were developed for the interpretation of experimental results of mixed cultures. The proposed method is readily applicable for detailed investigations of the competition between PAOs and GAOs in enriched cultures. However, the fermentation of organic substrates carried out by ordinary heterotrophs needs to be accounted for when the method is applied to the study of PAO and GAO competition in full-scale sludges. (C) 2003 Wiley Periodicals, Inc.
Abstract:
This paper develops an Internet geographical information system (GIS) and spatial model application that provides socio-economic information and exploratory spatial data analysis for local government authorities (LGAs) in Queensland, Australia. The application aims to improve the means by which large quantities of data may be analysed, manipulated and displayed in order to highlight trends and patterns as well as provide performance benchmarking that is readily understandable and easily accessible for decision-makers. Measures of attribute similarity and spatial proximity are combined in a clustering model with a spatial autocorrelation index for exploratory spatial data analysis to support the identification of spatial patterns of change. Analysis of socio-economic changes in Queensland is presented. The results demonstrate the usefulness and potential appeal of the Internet GIS applications as a tool to inform the process of regional analysis, planning and policy.
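The abstract does not name the spatial autocorrelation index it uses. As an illustration only, the sketch below assumes Moran's I, a common global index of this kind; the function name `morans_i` and the toy adjacency matrix are hypothetical and are not taken from the paper.

```python
def morans_i(values, weights):
    """Global Moran's I for one attribute observed over n regions.

    values  -- list of n attribute values (e.g. a socio-economic indicator)
    weights -- n x n list of lists; weights[i][j] > 0 when regions i and j
               are treated as neighbours, 0 otherwise (diagonal is 0)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Sum of all spatial weights.
    w_sum = sum(sum(row) for row in weights)

    # Cross-products of deviations between neighbouring regions.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    # Total squared deviation of the attribute.
    den = sum(d * d for d in dev)

    return (n / w_sum) * (num / den)


# Four regions arranged in a line: 0-1, 1-2, 2-3 are neighbours.
line_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = morans_i([1, 2, 3, 4], line_adjacency)         # similar neighbours
alternating = morans_i([1, -1, 1, -1], line_adjacency)  # dissimilar neighbours
```

Positive values indicate that neighbouring regions tend to have similar attribute values; negative values indicate that neighbours tend to differ.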
Abstract:
Aims: The aims of this study are to develop and validate a measure to screen for a range of gambling-related cognitions (GRC) in gamblers. Design and participants: A total of 968 volunteers were recruited from a community-based population. They were divided randomly into two groups. Principal axis factoring with varimax rotation was performed on group one and confirmatory factor analysis (CFA) was used on group two to confirm the best-fitted solution. Measurements: The Gambling Related Cognition Scale (GRCS) was developed for this study and the South Oaks Gambling Screen (SOGS), the Motivation Towards Gambling Scale (MTGS) and the Depression Anxiety Stress Scale (DASS-21) were used for validation. Findings: Exploratory factor analysis performed using half the sample indicated five factors, which included interpretative control/bias (GRCS-IB), illusion of control (GRCS-IC), predictive control (GRCS-PC), gambling-related expectancies (GRCS-GE) and a perceived inability to stop gambling (GRCS-IS). These accounted for 70% of the total variance. Using the other half of the sample, CFA confirmed that the five-factor solution fitted the data most effectively. Cronbach's alpha coefficients for the factors ranged from 0.77 to 0.91, and 0.93 for the overall scale. Conclusions: This paper demonstrated that the 23-item GRCS has good psychometric properties and thus is a useful instrument for identifying GRC among non-clinical gamblers. It provides the first step towards devising/adapting similar tools for problem gamblers as well as developing more specialized instruments to assess particular domains of GRC.
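For readers unfamiliar with the reliability statistic reported above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy scores are hypothetical; they are not the study's data or code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items -- list of k lists; items[i][r] is respondent r's score on item i.
             All inner lists must have the same length.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")

    # Total scale score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]

    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(col) for col in items)

    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Toy data: three items answered by four respondents. The items move in
# lockstep, so the scale is perfectly internally consistent.
toy_items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 2, 3, 4],
]
alpha = cronbach_alpha(toy_items)
```

Values closer to 1 indicate that the items vary together, which is how coefficients such as the 0.77 to 0.93 reported here are usually read.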
Abstract:
Predictive genetic testing for serious, mature-onset genetic illness represents a unique context in health decision making. This article presents findings from an exploratory qualitative Australian-based study into the decision making of individuals at risk for Huntington's disease (HD) with regard to predictive genetic testing. Sixteen in-depth interviews were conducted with a range of at-risk individuals. Data analysis revealed four discrete decision-making positions rather than a 'to test' or 'not to test' dichotomy. A conceptual dimension of (non-)openness and (non-)engagement characterized the various decisions. Processes of decision making and a concept of 'test readiness' were identified. Findings from this research, while not generalizable, are discussed in relation to theoretical frameworks and stage models of health decision making, as well as possible clinical implications.
Abstract:
Quantile computation has many applications including data mining and financial data analysis. It has been shown that an ε-approximate summary can be maintained so that, given a quantile query d (φ, ε), the data item at rank [φN] may be approximately obtained within the rank error precision εN over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different φ and ε poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
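The precision guarantee described above can be made concrete with a small check. This is only a sketch of the definition, not the paper's summary structure or clustering algorithm: it assumes the target rank is the ceiling of φN and that ranks are 1-based, and the function name `within_rank_error` is hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def within_rank_error(sorted_items, answer, phi, eps):
    """Return True if `answer` is an acceptable reply to a (phi, eps) query.

    sorted_items -- all N data items in ascending order
    answer       -- the item returned for the query
    phi, eps     -- quantile and rank-error precision, both in (0, 1]
    """
    n = len(sorted_items)
    target = math.ceil(phi * n)  # assumed target rank (1-based)

    # Range of 1-based ranks that `answer` occupies (duplicates share a range).
    lowest = bisect_left(sorted_items, answer) + 1
    highest = bisect_right(sorted_items, answer)
    if lowest > highest:
        return False  # the answer is not one of the data items

    # Distance from the target rank to the nearest rank held by `answer`.
    distance = max(lowest - target, target - highest, 0)
    return distance <= eps * n


data = list(range(1, 101))  # N = 100, item value equals its rank
close_enough = within_rank_error(data, 53, phi=0.5, eps=0.05)  # rank 53 vs target 50
too_far = within_rank_error(data, 60, phi=0.5, eps=0.05)       # rank 60 vs target 50
```

Under this definition a single returned item can satisfy several queries with nearby φ and looser ε, which is the intuition behind processing a cluster of queries as one.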
Abstract:
We investigated cross-cultural differences in the factor structure and psychometric properties of the 75-item Young Schema Questionnaire-Short Form (YSQ-SF). Participants were 833 South Korean and 271 Australian undergraduate students. The South Korean sample was randomly divided into two sub-samples. Sample A was used for Exploratory Factor Analysis (EFA) and sample B was used for Confirmatory Factor Analysis (CFA). EFA for the South Korean sample revealed a 13-factor solution to be the best fit for the data, and CFA on the data from sample B confirmed this result. CFA on the data from the Australian sample also revealed a 13-factor solution. The overall scale of the YSQ-SF demonstrated a high level of internal consistency in the South Korean and Australian groups. Furthermore, adequate internal consistencies for all subscales in the South Korean and Australian samples were demonstrated. In conclusion, the results showed that the YSQ-SF with 13 factors has good psychometric properties and reliability for South Korean and Australian university students. The South Korean sample had significantly higher YSQ-SF scores on most of the 13 subscales than the Australian sample. However, limitations of the current study preclude the generalisability of the findings beyond undergraduate student populations. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
Background: In 1992, Frisch et al (Psychol Assess. 1992;4:92-101) developed the Quality of Life Inventory (QOLI) to measure the concept of quality of life (QOL) because it has long been thought to be related to both physical and emotional well-being. However, the psychometric properties of the QOLI in clinical populations are still in debate. The present study examined the factor structure of the QOLI and reported its validity and reliability in a clinical sample. Method: Two hundred seventeen patients with anxiety and depressive disorders completed the QOLI; additional questionnaires measuring symptoms (Zung Self-rating Depression Scale, Beck Anxiety Inventory, Fear Questionnaire, Depression Anxiety Stress Scale-Stress) and subjective well-being (Satisfaction With Life Scale) were also used. Results: Exploratory factor analysis via the principal components method, with oblique rotation, revealed a 2-factor structure that accounted for 42.73% of the total variance, and a subsequent confirmatory factor analysis suggested a moderate fit of the data to this model. The 2 factors appeared to describe self-oriented QOL and externally oriented QOL. The Cronbach alpha coefficients were 0.85 for the overall QOLI score, 0.81 for the first factor, and 0.75 for the second factor. Conclusion: Consistent evidence was also found to support the concurrent, discriminant, predictive, and criterion-related validity of the QOLI. (c) 2006 Elsevier Inc. All rights reserved.
Abstract:
The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
Abstract:
The importance of availability of comparable real income aggregates and their components to applied economic research is highlighted by the popularity of the Penn World Tables. Any methodology designed to achieve such a task requires the combination of data from several sources. The first is purchasing power parities (PPP) data available from the International Comparisons Project roughly every five years since the 1970s. The second is national level data on a range of variables that explain the behaviour of the ratio of PPP to market exchange rates. The final source of data is the national accounts publications of different countries which include estimates of gross domestic product and various price deflators. In this paper we present a method to construct a consistent panel of comparable real incomes by specifying the problem in state-space form. We present our completed work as well as briefly indicate our work in progress.