48 resultados para Automated data analysis
Resumo:
In an investigation intended to determine training needs of night crews, Bowers et al. (1998, this issue) report two studies showing that the patterning of communication is a better discriminator of good and poor crews than is the content of communication. Bowers et al. characterize their studies as intended to generate hypotheses for training needs and draw connections with Exploratory Sequential Data Analysis (ESDA). Although applauding the intentions of Bowers ct al., we point out some concerns with their characterization and implementation of ESDA. Our principal concern is that the Bowers et al. exploration of the data does not convincingly lead them back to a better fundamental understanding of the original phenomena they are investigating.
Resumo:
This paper develops an interactive approach for exploratory spatial data analysis. Measures of attribute similarity and spatial proximity are combined in a clustering model to support the identification of patterns in spatial information. Relationships between the developed clustering approach, spatial data mining and choropleth display are discussed. Analysis of property crime rates in Brisbane, Australia is presented. A surprising finding in this research is that there are substantial inconsistencies in standard choropleth display options found in two widely used commercial geographical information systems, both in terms of definition and performance. The comparative results demonstrate the usefulness and appeal of the developed approach in a geographical information system environment for exploratory spatial data analysis.
Resumo:
Disease resistance is associated with a plant defense response that involves an integrated set of signal transduction pathways. Changes in the expression patterns of 2.375 selected genes were examined simultaneously by cDNA microarray analysis in Arabidopsis thaliana after inoculation with an incompatible fungal pathogen Alternaria brassicicola or treatment with the defense-related signaling molecules salicylic acid (SA), methyl jasmonate (MJ), or ethylene, Substantial changes (up- and down-regulation) in the steady-state abundance of 705 mRNAs were observed in response to one or more of the treatments, including known and putative defense-related genes and 106 genes with no previously described function or homology, In leaf tissue inoculated with A. brassicicola, the abundance of 168 mRNAs was increased more than 2.5-fold, whereas that of 39 mRNAs was reduced. Similarly, the abundance of 192, 221, and 55 mRNAs was highly (>2.5-fold) increased after treatment with SA, MJ, and ethylene, respectively. Data analysis revealed a surprising level of coordinated defense responses, including 169 mRNAs regulated by multiple treatments/defense pathways. The largest number of genes coinduced (one of four induced genes) and corepressed was found after treatments with SA and MJ. In addition, 50% of the genes induced by ethylene treatment were also induced by MJ treatment. These results indicated the existence of a substantial network of regulatory interactions and coordination occurring during plant defense among the different defense signaling pathways, notably between the salicylate and jasmonate pathways that were previously thought to act in an antagonistic fashion.
Resumo:
Objective: To compare rates of self-reported use of health services between rural, remote and urban South Australians. Methods: Secondary data analysis from a population-based survey to assess health and well-being, conducted in South Australia in 2000. In all, 2,454 adults were randomly selected and interviewed using the computer-assisted telephone interview (CATI) system. We analysed health service use by Accessibility and Remoteness Index of Australia (ARIA) category. Results: There was no statistically significant difference in the median number of uses of the four types of health services studied across ARIA categories. Significantly fewer residents of highly accessible areas reported never using primary care services (14.4% vs. 22.2% in very remote areas), and significantly more reported high use ( greater than or equal to6 visits, 29.3% vs. 21.5%). Fewer residents of remote areas reported never attending hospital (65.6% vs. 73.8% in highly accessible areas). Frequency of use of mental health services was not statistically significantly different across ARIA categories. Very remote residents were more likely to spend at least one night in a public hospital (15.8%) than were residents of other areas (e.g. 5.9% for highly accessible areas). Conclusion: The self-reported frequency of use of a range of health services in South Australia was broadly similar across ARIA categories. However, use of primary care services was higher among residents of highly accessible areas and public hospital use increased with increasing remoteness. There is no evidence for systematic rural disadvantage in terms of self-reported health service utilisation in this State.
Resumo:
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
Resumo:
An increasing number of studies shows that the glycogen-accumulating organisms (GAOs) can survive and may indeed proliferate under the alternating anaerobic/aerobic conditions found in EBPR systems, thus forming a strong competitor of the polyphosphate-accumulating organisms (PAOs). Understanding their behaviors in a mixed PAO and GAO culture under various operational conditions is essential for developing operating strategies that disadvantage the growth of this group of unwanted organisms. A model-based data analysis method is developed in this paper for the study of the anaerobic PAO and GAO activities in a mixed PAO and GAO culture. The method primarily makes use of the hydrogen ion production rate and the carbon dioxide transfer rate resulting from the acetate uptake processes by PAOs and GAOs, measured with a recently developed titration and off-gas analysis (TOGA) sensor. The method is demonstrated using the data from a laboratory-scale sequencing batch reactor (SBR) operated under alternating anaerobic and aerobic conditions. The data analysis using the proposed method strongly indicates a coexistence of PAOs and GAOs in the system, which was independently confirmed by fluorescent in situ hybridization (FISH) measurement. The model-based analysis also allowed the identification of the respective acetate uptake rates by PAOs and GAOs, along with a number of kinetic and stoichiometric parameters involved in the PAO and GAO models. The excellent fit between the model predictions and the experimental data not involved in parameter identification shows that the parameter values found are reliable and accurate. It also demonstrates that the current anaerobic PAO and GAO models are able to accurately characterize the PAO/GAO mixed culture obtained in this study. This is of major importance as no pure culture of either PAOs or GAOs has been reported to date, and hence the current PAO and GAO models were developed for the interpretation of experimental results of mixed cultures. The proposed method is readily applicable for detailed investigations of the competition between PAOs and GAOs in enriched cultures. However, the fermentation of organic substrates carried out by ordinary heterotrophs needs to be accounted for when the method is applied to the study of PAO and GAO competition in full-scale sludges. (C) 2003 Wiley Periodicals, Inc.
Resumo:
Quantile computation has many applications including data mining and financial data analysis. It has been shown that an is an element of-approximate summary can be maintained so that, given a quantile query d (phi, is an element of), the data item at rank [phi N] may be approximately obtained within the rank error precision is an element of N over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different phi and is an element of poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
Resumo:
The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
Resumo:
One approach to microbial genotyping is to make use of sets of single-nucleotide polymorphisms (SNPs) in combination with binary markers. Here we report the modification and automation of a SNP-plus-binary-marker-based approach to the genotyping of Staphylococcus aureus and its application to 391 S. aureus isolates from southeast Queensland, Australia. The SNPs used were arcC210, tpi243, arcC162, gmk318, pta294, tpi36, tpi241, and pta383. These provide a Simpson's index of diversity (D) of 0.95 with respect to the S. aureus multilocus sequence typing database and define 61 genotypes and the major clonal complexes. The binary markers used were pvl, cna, sdrE, pT181, and pUB110. Two novel real-time PCR formats for interrogating these markers were compared. One of these makes use of light upon extension (LUX) primers and biplexed reactions, while the other is a streamlined modification of kinetic PCR using SYBR green. The latter format proved to be more robust. In addition, automated methods for DNA template preparation, reaction setup, and data analysis were developed. A single SNP-based method for ST-93 (Queensland clone) identification was also devised. The genotyping revealed the numerical importance of the South West Pacific and Queensland community-acquired methicillin-resistant S. aureus (MRSA) clones and the clonal complex 239 Aus-1/Aus-2 hospital-associated MRSA. There was a strong association between the community-acquired clones and pvl.
Resumo:
We present an application of Mathematical Morphology (MM) for the classification of astronomical objects, both for star/galaxy differentiation and galaxy morphology classification. We demonstrate that, for CCD images, 99.3 +/- 3.8% of galaxies can be separated from stars using MM, with 19.4 +/- 7.9% of the stars being misclassified. We demonstrate that, for photographic plate images, the number of galaxies correctly separated from the stars can be increased using our MM diffraction spike tool, which allows 51.0 +/- 6.0% of the high-brightness galaxies that are inseparable in current techniques to be correctly classified, with only 1.4 +/- 0.5% of the high-brightness stars contaminating the population. We demonstrate that elliptical (E) and late-type spiral (Sc-Sd) galaxies can be classified using MM with an accuracy of 91.4 +/- 7.8%. It is a method involving fewer 'free parameters' than current techniques, especially automated machine learning algorithms. The limitation of MM galaxy morphology classification based on seeing and distance is also presented. We examine various star/galaxy differentiation and galaxy morphology classification techniques commonly used today, and show that our MM techniques compare very favourably.
Resumo:
The importance of availability of comparable real income aggregates and their components to applied economic research is highlighted by the popularity of the Penn World Tables. Any methodology designed to achieve such a task requires the combination of data from several sources. The first is purchasing power parities (PPP) data available from the International Comparisons Project roughly every five years since the 1970s. The second is national level data on a range of variables that explain the behaviour of the ratio of PPP to market exchange rates. The final source of data is the national accounts publications of different countries which include estimates of gross domestic product and various price deflators. In this paper we present a method to construct a consistent panel of comparable real incomes by specifying the problem in state-space form. We present our completed work as well as briefly indicate our work in progress.
Resumo:
The results of empirical studies are limited to particular contexts, difficult to generalise and the studies themselves are expensive to perform. Despite these problems, empirical studies in software engineering can be made effective and they are important to both researchers and practitioners. The key to their effectiveness lies in the maximisation of the information that can be gained by examining existing studies, conducting power analyses for an accurate minimum sample size and benefiting from previous studies through replication. This approach was applied in a controlled experiment examining the combination of automated static analysis tools and code inspection in the context of verification and validation (V&V) of concurrent Java components. The combination of these V&V technologies was shown to be cost-effective despite the size of the study, which thus contributes to research in V&V technology evaluation.
Resumo:
Traditionally the basal ganglia have been implicated in motor behavior, as they are involved in both the execution of automatic actions and the modification of ongoing actions in novel contexts. Corresponding to cognition, the role of the basal ganglia has not been defined as explicitly. Relative to linguistic processes, contemporary theories of subcortical participation in language have endorsed a role for the globus pallidus internus (GPi) in the control of lexical-semantic operations. However, attempts to empirically validate these postulates have been largely limited to neuropsychological investigations of verbal fluency abilities subsequent to pallidotomy. We evaluated the impact of bilateral posteroventral pallidotomy (BPVP) on language function across a range of general and high-level linguistic abilities, and validated/extended working theories of pallidal participation in language. Comprehensive linguistic profiles were compiled up to 1 month before and 3 months after BPVP in 6 subjects with Parkinson's disease (PD). Commensurate linguistic profiles were also gathered over a 3-month period for a nonsurgical control cohort of 16 subjects with PD and a group of 16 non-neurologically impaired controls (NC). Nonparametric between-groups comparisons were conducted and reliable change indices calculated, relative to baseline/3-month follow-up difference scores. Group-wise statistical comparisons between the three groups failed to reveal significant postoperative changes in language performance. Case-by-case data analysis relative to clinically consequential change indices revealed reliable alterations in performance across several language variables as a consequence of BPVP. These findings lend support to models of subcortical participation in language, which promote a role for the GPi in lexical-semantic manipulation mechanisms. Concomitant improvements and decrements in postoperative performance were interpreted within the context of additive and subtractive postlesional effects. Relative to parkinsonian cohorts, clinically reliable versus statistically significant changes on a case by case basis may provide the most accurate method of characterizing the way in which pathophysiologically divergent basal ganglia linguistic circuits respond to BPVP.
Resumo:
Observational longitudinal research is particularly useful for assessing etiology and prognosis and for providing evidence for clinical decision making. However, there are no structured reporting requirements for studies of this design to assist authors, editors, and readers. The authors developed and tested a checklist of criteria related to threats to the internal and external validity of observational longitudinal studies. The checklist criteria concerned recruitment, data collection, biases, and data analysis and descriptive issues relevant to study rationale, study population, and generalizability. Two raters independently assessed 49 randomly selected articles describing stroke research published from 1999 to 2003 in six journals: American Journal of Epidemiology, Journal of Epidemiology and Community Health, Stroke, Annals of Neurology, Archives of Physical Medicine and Rehabilitation, and American Journal of Physical Medicine and Rehabilitation. On average, 17 of the 33 checklist criteria were reported. Criteria describing the study design were better reported than those related to internal validity. No relation was found between study type (etiologic or prognostic) or word count and quality of reporting. A flow diagram for summarizing participant flow through a study was developed. Editors and authors should consider using a checklist and flow diagram when reporting on observational longitudinal research.