186 results for data complexity
at University of Queensland eSpace - Australia
Abstract:
The schema of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. Obtaining quickly the appropriate data increases the likelihood that an organization will make good decisions and respond adeptly to challenges. This research presents and validates a methodology for evaluating, ex ante, the relative desirability of alternative instantiations of a model of data. In contrast to prior research, each instantiation is based on a different formal theory. This research theorizes that the instantiation that yields the lowest weighted average query complexity for a representative sample of information requests is the most desirable instantiation for end-user queries. The theory was validated by an experiment that compared end-user performance using an instantiation of a data structure based on the relational model of data with performance using the corresponding instantiation of the data structure based on the object-relational model of data. Complexity was measured using three different Halstead metrics: program length, difficulty, and effort. For a representative sample of queries, the average complexity using each instantiation was calculated. As theorized, end users querying the instantiation with the lower average complexity made fewer semantic errors, i.e., were more effective at composing queries. (c) 2005 Elsevier B.V. All rights reserved.
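The three Halstead metrics named above have simple closed forms built from operator and operand counts. A minimal sketch, assuming the query has already been tokenised into operators and operands (the sample SQL tokens below are illustrative, not taken from the study):

```python
import math

def halstead(operators, operands):
    """Compute Halstead program length, difficulty, and effort
    from lists of operator and operand tokens."""
    n1, n2 = len(set(operators)), len(set(operands))  # distinct operators/operands
    N1, N2 = len(operators), len(operands)            # total occurrences
    length = N1 + N2                                  # program length N
    volume = length * math.log2(n1 + n2)              # volume V = N * log2(n)
    difficulty = (n1 / 2) * (N2 / n2)                 # difficulty D
    effort = difficulty * volume                      # effort E = D * V
    return length, difficulty, effort

# Tokenised toy query: SELECT name FROM staff WHERE dept = 'IT'
ops = ["SELECT", "FROM", "WHERE", "="]
opnds = ["name", "staff", "dept", "'IT'"]
print(halstead(ops, opnds))
```

Averaging such per-query values over a representative sample of information requests gives the weighted average complexity the theory compares across instantiations.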
Abstract:
Since their discovery 150 years ago, Neanderthals have been considered incapable of behavioural change and innovation. Traditional synchronic approaches to the study of Neanderthal behaviour have perpetuated this view and shaped our understanding of their lifeways and eventual extinction. In this thesis I implement an innovative diachronic approach to the analysis of Neanderthal faunal extraction, technology and symbolic behaviour as contained in the archaeological record of the critical period between 80,000 and 30,000 years BP. The thesis demonstrates patterns of change in Neanderthal behaviour which are at odds with traditional perspectives and which are consistent with an interpretation of increasing behavioural complexity over time, an idea that has been suggested but never thoroughly explored in Neanderthal archaeology. Demonstrating an increase in behavioural complexity in Neanderthals provides much needed new data with which to fuel the debate over the behavioural capacities of Neanderthals and the first appearance of Modern Human Behaviour in Europe. It supports the notion that Neanderthal populations were active agents of behavioural innovation prior to the arrival of Anatomically Modern Humans in Europe and, ultimately, that they produced an early Upper Palaeolithic cultural assemblage (the Châtelperronian) independent of modern humans. Overall, this thesis provides an initial step towards the development of a quantitative approach to measuring behavioural complexity which provides fresh insights into the cognitive and behavioural capabilities of Neanderthals.
Abstract:
This paper presents an analysis of dysfluencies in two oral tellings of a familiar children's story by a young boy with autism. Thurber & Tager-Flusberg (1993) postulate a lower degree of cognitive and communicative investment to explain a lower frequency of non-grammatical pauses observed in elicited narratives of children with autism in comparison to typically developing and intellectually disabled controls. We also found a very low frequency of non-grammatical pauses in our data, but indications of high engagement and cognitive and communicative investment. We point to a wider range of dysfluencies as indicators of cognitive load, and show that the kind and location of dysfluencies produced may reveal which aspects of the narrative task are creating the greatest cognitive demand: here, mental state ascription, perspectivization, and adherence to story schema. This paper thus generates analytical options and hypotheses that can be explored further in a larger population of children with autism and typically developing controls.
Abstract:
This paper presents a new relative measure of signal complexity, referred to here as relative structural complexity, which is based on the matching pursuit (MP) decomposition. By relative, we refer to the fact that this new measure is highly dependent on the decomposition dictionary used by MP. The structural part of the definition points to the fact that this new measure is related to the structure, or composition, of the signal under analysis. After a formal definition, the proposed relative structural complexity measure is used in the analysis of newborn EEG. To do this, firstly, a time-frequency (TF) decomposition dictionary is specifically designed to compactly represent the newborn EEG seizure state using MP. We then show, through the analysis of synthetic and real newborn EEG data, that the relative structural complexity measure can indicate changes in EEG structure as it transitions between the two EEG states; namely seizure and background (non-seizure).
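The matching pursuit loop the measure builds on is short; a minimal sketch in plain NumPy, assuming the dictionary rows are unit-norm atoms (the identity dictionary below is a stand-in, not the time-frequency dictionary designed in the paper):

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter=10):
    """Greedy MP: at each step pick the atom most correlated with the
    residual and subtract its projection. Rows of `dictionary` are
    assumed to be unit-norm atoms."""
    residual = np.asarray(signal, dtype=float).copy()
    coeffs = np.zeros(len(dictionary))
    for _ in range(n_iter):
        corr = dictionary @ residual         # inner products with all atoms
        k = int(np.argmax(np.abs(corr)))     # best-matching atom
        coeffs[k] += corr[k]
        residual = residual - corr[k] * dictionary[k]
    return coeffs, residual

# Toy example: an orthonormal (identity) dictionary recovers the signal exactly.
D = np.eye(3)
coeffs, residual = matching_pursuit([3.0, 0.0, 4.0], D, n_iter=3)
print(coeffs, np.linalg.norm(residual))
```

Because the decomposition depends entirely on the dictionary, any complexity measure derived from it (for example, how many atoms are needed to drive the residual energy below a threshold) is relative to that dictionary, which is the point of the "relative" in the measure's name.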
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time variant, noisy, and high dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is used mainly for predictive modelling. So far, there have been a number of classification algorithms in practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbours (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
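Of the example-based methods listed above, k-nearest neighbours is compact enough to sketch in full. A minimal version with made-up 2-D points (the data are illustrative, not from any cited study):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among the k training points
    closest in Euclidean distance. `train` is a list of (point, label)."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Illustrative toy data: two clusters, one query near the "a" cluster.
train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b")]
print(knn_classify(train, (0, 0.5)))  # the two "a" neighbours outvote "b"
```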
Abstract:
In the past century, the debate over whether or not density-dependent factors regulate populations has generally focused on changes in mean population density, ignoring the spatial variance around the mean as unimportant noise. In an attempt to provide a different framework for understanding population dynamics based on individual fitness, this paper discusses the crucial role of spatial variability itself on the stability of insect populations. The advantages of this method are the following: (1) it is founded on evolutionary principles rather than post hoc assumptions; (2) it erects hypotheses that can be tested; and (3) it links disparate ecological schools, including spatial dynamics, behavioral ecology, preference-performance, and plant apparency into an overall framework. At the core of this framework, habitat complexity governs insect spatial variance, which in turn determines population stability. First, the minimum risk distribution (MRD) is defined as the spatial distribution of individuals that results in the minimum number of premature deaths in a population given the distribution of mortality risk in the habitat (and, therefore, leading to maximized population growth). The greater the divergence of actual spatial patterns of individuals from the MRD, the greater the reduction of population growth and size from high, unstable levels. Then, based on extensive data from 29 populations of the processionary caterpillar, Ochrogaster lunifer, four steps are used to test the effect of habitat interference on population growth rates. (1) The costs (increasing the risk of scramble competition) and benefits (decreasing the risk of inverse density-dependent predation) of egg and larval aggregation are quantified. (2) These costs and benefits, along with the distribution of resources, are used to construct the MRD for each habitat. (3) The MRD is used as a benchmark against which the actual spatial pattern of individuals is compared.
The degree of divergence of the actual spatial pattern from the MRD is quantified for each of the 29 habitats. (4) Finally, indices of habitat complexity are used to provide highly accurate predictions of spatial divergence from the MRD, showing that habitat interference reduces population growth rates from high, unstable levels. The reason for the divergence appears to be that high levels of background vegetation (vegetation other than host plants) interfere with female host-searching behavior. This leads to a spatial distribution of egg batches with high mortality risk, and therefore lower population growth. Knowledge of the MRD in other species should be a highly effective means of predicting trends in population dynamics. Species with high divergence between their actual spatial distribution and their MRD may display relatively stable dynamics at low population levels. In contrast, species with low divergence should experience high levels of intragenerational population growth leading to frequent habitat-wide outbreaks and unstable dynamics in the long term. Six hypotheses, erected under the framework of spatial interference, are discussed, and future tests are suggested.
Abstract:
We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, it allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments. (C) 2002 Elsevier Science B.V. All rights reserved.
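The parameter-count trade-off described above can be made concrete. A sketch counting the free parameters in a single p x p component-covariance matrix under the three structures (the q(q-1)/2 rotational redundancy of the loading matrix is ignored here for simplicity):

```python
def covariance_params(p, q):
    """Free parameters in one component-covariance matrix under three models:
    isotropic (sigma^2 * I), factor-analytic (Lambda Lambda' + Psi with a
    p x q loading matrix and diagonal Psi), and unrestricted full covariance."""
    return {
        "isotropic": 1,
        "factor_analytic": p * q + p,   # loadings plus diagonal uniquenesses
        "full": p * (p + 1) // 2,
    }

# e.g. p = 1000 gene expressions, q = 5 latent factors (illustrative sizes)
print(covariance_params(1000, 5))
```

With p large relative to n, the factor-analytic count sits between the two extremes, which is what makes the mixture fittable when a full-covariance normal mixture would be hopelessly over-parameterized.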
Abstract:
Children aged between 3 and 7 years were taught simple and dimension-abstracted oddity discrimination using learning-set training techniques, in which isomorphic problems with varying content were presented with verbal explanation and feedback. Following the training phase, simple oddity (SO), dimension-abstracted oddity with one or two irrelevant dimensions, and non-oddity (NO) tasks were presented (without feedback) to determine the basis of solution. Although dimension-abstracted oddity requires discrimination based on a stimulus that is different from the others, which are all the same as each other on the relevant dimension, this was not the major strategy. The data were more consistent with use of a simple oddity strategy by 3- to 4-year-olds, and a most different strategy by 6- to 7-year-olds. These strategies are interpreted as reducing task complexity. (C) 2002 Elsevier Science Inc. All rights reserved.
Abstract:
In simultaneous analyses of multiple data partitions, the trees relevant when measuring support for a clade are the optimal tree, and the best tree lacking the clade (i.e., the most reasonable alternative). The parsimony-based method of partitioned branch support (PBS) forces each data set to arbitrate between the two relevant trees. This value is the amount each data set contributes to clade support in the combined analysis, and can be very different from the support apparent in separate analyses. The approach used in PBS can also be employed in likelihood: a simultaneous analysis of all data retrieves the maximum likelihood tree, and the best tree without the clade of interest is also found. Each data set is fitted to the two trees and the log-likelihood difference calculated, giving partitioned likelihood support (PLS) for each data set. These calculations can be performed regardless of the complexity of the ML model adopted. The significance of PLS can be evaluated using a variety of resampling methods, such as the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, or likelihood weights, although the appropriateness and assumptions of these tests remain debated.
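Once each partition has been fitted to both trees, the PLS calculation itself reduces to per-partition log-likelihood differences. A minimal sketch with invented partition names and log-likelihoods (not real data):

```python
def partitioned_likelihood_support(lnL_ml_tree, lnL_alt_tree):
    """PLS for each data partition: log-likelihood on the ML tree minus
    log-likelihood on the best tree lacking the clade of interest.
    Positive values mean the partition supports the clade."""
    return {part: lnL_ml_tree[part] - lnL_alt_tree[part] for part in lnL_ml_tree}

# Hypothetical per-partition log-likelihoods under the two trees.
ml  = {"mtDNA": -1520.3, "nuclear": -2210.7, "morphology": -310.2}
alt = {"mtDNA": -1524.8, "nuclear": -2209.9, "morphology": -312.0}
pls = partitioned_likelihood_support(ml, alt)
print(pls)  # nuclear slightly opposes the clade; the other two support it
```

The per-partition values sum to the total log-likelihood difference between the two trees, mirroring how PBS values sum to the Bremer support in parsimony.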
Abstract:
Capturing the voices of women when the issue is of a sensitive nature has been a major concern of feminist researchers. It has often been argued that interpretive methods are the most appropriate way to collect such information, but there are other appropriate ways to approach the design of research. This article explores the use of a mixed-method approach to collect data on incontinence in older women and argues for the use of a variety of creative approaches to collect and analyze data.
Abstract:
The data structure of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. This research develops a methodology for evaluating, ex ante, the relative desirability of alternative data structures for end user queries. This research theorizes that the data structure that yields the lowest weighted average complexity for a representative sample of information requests is the most desirable data structure for end user queries. The theory was tested in an experiment that compared queries from two different relational database schemas. As theorized, end users querying the data structure associated with the less complex queries performed better. Complexity was measured using three different Halstead metrics. Each of the three metrics provided excellent predictions of end user performance. This research supplies strong evidence that organizations can use complexity metrics to evaluate, ex ante, the desirability of alternate data structures. Organizations can use these evaluations to enhance the efficient and effective retrieval of information by creating data structures that minimize end user query complexity.
Abstract:
This paper provides an analysis of data from a state-wide survey of statutory child protection workers, adult mental health workers, and child mental health workers. Respondents provided details of their experience of collaboration on cases where a parent had mental health problems and there were serious child protection concerns. The survey was conducted as part of a large mixed-method research project on developing best practice at the intersection of child protection and mental health services. Descriptions of 300 cases were provided by 122 respondents. Analyses revealed that a great deal of collaboration occurred across a wide range of government and community-based agencies; that collaborative processes were often positive and rewarding for workers; and that collaboration was most difficult when the nature of the parental mental illness or the need for child protection intervention was contested. The difficulties experienced included communication, role clarity, competing primary focus, contested parental mental health needs, contested child protection needs, and resources. (C) 2004 Elsevier Ltd. All rights reserved.
Abstract:
New tools derived from advances in molecular biology have not been widely adopted in plant breeding for complex traits because of the inability to connect information at gene level to the phenotype in a manner that is useful for selection. In this study, we explored whether physiological dissection and integrative modelling of complex traits could link phenotype complexity to underlying genetic systems in a way that enhanced the power of molecular breeding strategies. A crop and breeding system simulation study on sorghum, which involved variation in 4 key adaptive traits (phenology, osmotic adjustment, transpiration efficiency, and stay-green) and a broad range of production environments in north-eastern Australia, was used. The full matrix of simulated phenotypes, which consisted of 547 location-season combinations and 4235 genotypic expression states, was analysed for genetic and environmental effects. The analysis was conducted in stages assuming gradually increased understanding of gene-to-phenotype relationships, which would arise from physiological dissection and modelling. It was found that environmental characterisation and physiological knowledge helped to explain and unravel gene and environment context dependencies in the data. Based on the analyses of gene effects, a range of marker-assisted selection breeding strategies was simulated. It was shown that the inclusion of knowledge resulting from trait physiology and modelling generated an enhanced rate of yield advance over cycles of selection. This occurred because the knowledge associated with component trait physiology and extrapolation to the target population of environments by modelling removed confounding effects associated with environment and gene context dependencies for the markers used.
Developing and implementing this gene-to-phenotype capability in crop improvement requires enhanced attention to phenotyping, ecophysiological modelling, and validation studies to test the stability of candidate genetic regions.
Abstract:
Objective: Recent data from Education Queensland has identified rising numbers of children receiving diagnoses of autistic spectrum disorder (ASD). Faced with funding-related diagnostic pressures, in clinical situations that are complex and inherently uncertain, it is possible that specialists err on the side of a positive diagnosis. This study examines the extent to which possible overinclusion of ASD diagnosis may exist in the presence of uncertainty and factors potentially related to this practice in Queensland. Methods: Using anonymous self-report, all Queensland child psychiatrists and paediatricians who see paediatric patients with developmental/behavioural problems were surveyed and asked whether they had ever specified an ASD diagnosis in the presence of diagnostic uncertainty. Using logistic regression, elicited responses to the diagnostic uncertainty questions were related to other clinical- and practice-related characteristics. Results: Overall, 58% of surveyed psychiatrists and paediatricians indicated that, in the face of diagnostic uncertainty, they had erred on the side of providing an ASD diagnosis for educational ascertainment, and 36% of clinicians had provided an autism diagnosis for Carer's Allowance when Centrelink diagnostic specifications had not been met. Conclusion: In the absence of definitive biological markers, ASD remains a behavioural diagnosis that is often complex and uncertain. In response to systems that demand a categorical diagnostic response, specialists are providing ASD diagnoses, even when uncertain. The motivation for this practice appears to be a clinical risk/benefit analysis of what will achieve the best outcomes for children. It is likely that these practices will continue unless systems base eligibility for funding on functional impairment rather than medical diagnostic categories.