906 results for Data Interpretation, Statistical
Abstract:
Time perception is studied with subjective or semi-objective psychophysical methods. With subjective methods, observers provide quantitative estimates of duration and data depict the psychophysical function relating subjective duration to objective duration. With semi-objective methods, observers provide categorical or comparative judgments of duration and data depict the psychometric function relating the probability of a certain judgment to objective duration. Both approaches are used to study whether subjective and objective time run at the same pace or whether time flies or slows down under certain conditions. We analyze theoretical aspects affecting the interpretation of data gathered with the most widely used semi-objective methods, including single-presentation and paired-comparison methods. For this purpose, a formal model of psychophysical performance is used in which subjective duration is represented via a psychophysical function and the scalar property. This provides the timing component of the model, which is invariant across methods. A decisional component that varies across methods reflects how observers use subjective durations to make judgments and give the responses requested under each method. Application of the model shows that psychometric functions in single-presentation methods are uninterpretable because the various influences on observed performance are inextricably confounded in the data. In contrast, data gathered with paired-comparison methods permit separating out those influences. Prevalent approaches to fitting psychometric functions to data are also discussed and shown to be inconsistent with widely accepted principles of time perception, implicitly assuming instead that subjective time equals objective time and that observed differences across conditions do not reflect differences in perceived duration but criterion shifts. 
These analyses prompt evidence-based recommendations for best methodological practice in studies on time perception.
Error, Bias, and Long-Branch Attraction in Data for Two Chloroplast Photosystem Genes in Seed Plants
Abstract:
Sequences of two chloroplast photosystem genes, psaA and psbB, together comprising about 3,500 bp, were obtained for all five major groups of extant seed plants and several outgroups among other vascular plants. Strongly supported, but significantly conflicting, phylogenetic signals were obtained in parsimony analyses from partitions of the data into first and second codon positions versus third positions. In the former, both genes agreed on a monophyletic gymnosperms, with Gnetales closely related to certain conifers. In the latter, Gnetales are inferred to be the sister group of all other seed plants, with gymnosperms paraphyletic. None of the data supported the modern “anthophyte hypothesis,” which places Gnetales as the sister group of flowering plants. A series of simulation studies was undertaken to examine the error rate for parsimony inference. Three kinds of errors were examined: random error, systematic bias (both properties of finite data sets), and statistical inconsistency owing to long-branch attraction (an asymptotic property). Parsimony reconstructions were extremely biased for third-position data for psbB. Regardless of the true underlying tree, a tree in which Gnetales are sister to all other seed plants was likely to be reconstructed for these data. None of the combinations of genes or partitions permits the anthophyte tree to be reconstructed with high probability. Simulations of progressively larger data sets indicate the existence of long-branch attraction (statistical inconsistency) for third-position psbB data if either the anthophyte tree or the gymnosperm tree is correct. This is also true for the anthophyte tree using either psaA third positions or psbB first and second positions. A factor contributing to bias and inconsistency is extremely short branches at the base of the seed plant radiation, coupled with extremely high rates in Gnetales and nonseed plant outgroups.
M. J. Sanderson, M. F. Wojciechowski, J.-M. Hu, T. Sher Khan, and S. G. Brady
Abstract:
With the advent of Service Oriented Architecture, Web services have gained tremendous popularity. Because a large number of Web services are available, finding an appropriate Web service that meets the user's requirement is a challenge. This warrants an effective and reliable process of Web service discovery. A considerable body of research has emerged on methods that improve the accuracy of Web service discovery so that the best service is matched. The process of Web service discovery typically suggests many individual services that only partially fulfil the user’s interest. Considering the semantic relationships of the words used to describe the services, together with their input and output parameters, can lead to more accurate Web service discovery. Appropriate linking of individual matched services should fully satisfy the user's requirements. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology is proposed. The first phase performs match-making to find semantically similar Web services for a user query. To perform semantic analysis on the content of the Web service description language document, a support-based latent semantic kernel is constructed using an innovative concept of binning and merging on a large quantity of text documents covering diverse knowledge domains. The use of a generic latent semantic kernel constructed with a large number of terms helps to find the hidden meaning of the query terms, which otherwise could not be found. Sometimes a single Web service is unable to fully satisfy the requirement of the user. In such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking the possibility of linking multiple Web services is done in the second phase.
Once the feasibility of linking Web services is checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an all-pairs shortest-path algorithm is applied to find the optimum path at the minimum traversal cost. The third phase, system integration, combines the results from the preceding two phases by using an original fusion algorithm in the fusion engine. Finally, the recommendation engine, which is an integral part of the system integration phase, makes the final recommendations, including individual and composite Web services, to the user. To evaluate the performance of the proposed method, extensive experimentation has been performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with the results of the standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both the information-retrieval and the machine-learning based methods. Experimental results and statistical analysis also show that the best Web service compositions are obtained by considering 10 to 15 of the Web services found in phase-I for linking. Empirical results also confirm that the fusion engine boosts the accuracy of Web service discovery by combining the inputs from both the semantic analysis (phase-I) and the link analysis (phase-II) in a systematic fashion. Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.
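The link analysis step described in this abstract can be illustrated with a minimal sketch. The abstract does not name the specific all-pairs algorithm, so Floyd-Warshall is assumed here as one standard choice; the function name, the service indices, and the edge costs are hypothetical and are not taken from the thesis.

```python
import math

def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall over n nodes; edges are (source, target, cost) triples."""
    # Start with "no known path" everywhere except from a node to itself.
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    # A directed edge u -> v means the output of service u can feed service v.
    for u, v, cost in edges:
        dist[u][v] = min(dist[u][v], cost)
    # Allow each node k in turn to act as an intermediate service.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist

# Hypothetical graph: four services (0..3) with made-up linking costs.
edges = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 5.0), (2, 3, 1.5)]
dist = all_pairs_shortest_paths(4, edges)
print(dist[0][3])  # cheapest cost of composing service 0 through to service 3
```

In this toy graph the cheapest composition from service 0 to service 3 goes through services 1 and 2 rather than over the direct 0 to 2 link. Pairs with no feasible linking keep an infinite distance.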
Abstract:
Background The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods. Methods This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution. Results Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior. Conclusion In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.
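The sensitivity to the number of silent comparisons noted in this abstract can be seen in a small sketch of the standard Dunn-Sidak formulas. The function names, the raw p-value, and the candidate comparison counts are illustrative assumptions, not values from the re-analysed cluster investigations, and the formulas assume independent comparisons.

```python
def sidak_adjusted_p(p, m):
    """Dunn-Sidak adjusted p-value for one test out of m independent comparisons."""
    return 1.0 - (1.0 - p) ** m

def sidak_per_test_alpha(alpha, m):
    """Per-comparison significance level that keeps the family-wise level at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)

# Hypothetical raw p-value for a reported cluster, with guessed comparison counts.
raw_p = 0.001
for m in (1, 10, 100, 1000):
    print(m, round(sidak_adjusted_p(raw_p, m), 4))
```

Because the adjusted p-value rises steadily with the assumed number of comparisons, the same raw result can look significant or unremarkable depending on a count that the analyst can only guess.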
Abstract:
Reliable budget/cost estimates for road maintenance and rehabilitation are subject to uncertainties and variability in road asset condition and in the characteristics of road users. The CRC CI research project 2003-029-C ‘Maintenance Cost Prediction for Road’ developed a method for assessing variation and reliability in budget/cost estimates for road maintenance and rehabilitation. The method is based on probability-based reliability theory and statistical methods. The next stage of the current project is to apply the developed method to predict maintenance/rehabilitation budgets/costs of large networks for strategic investment. The first task is to assess the variability of road data. This report presents initial results of that analysis. A case study for dry non-reactive soil is presented to demonstrate the concept of analysing the variability of road data for large road networks. In assessing the variability of road data, large road networks were divided into categories with common characteristics according to soil and climatic conditions, pavement conditions, pavement types, surface types and annual average daily traffic. The probability distributions, statistical means, and standard deviations of asset conditions and annual average daily traffic for each category were quantified. The probability distributions and the statistical information obtained in this analysis will be used to assess the variation and reliability in budget/cost estimates at a later stage. Usually, the mean values of asset data in each category are used as input values for investment analysis, and the variability of asset data within each category is not taken into account. This analysis demonstrated that the method can be used in practice to take the variability of road data into account when analysing large road networks for maintenance/rehabilitation investment.
Abstract:
Psychologists investigating dreams in non-Western cultures have generally not considered the meanings of dreams within the unique meaning-structure of the person in his or her societal context. The majority of dream studies in African societies are no exception. Researchers approaching dreams within rural Xhosa and Zulu speaking societies have adopted either an anthropological or a psychodynamic orientation. The latter approach particularly imposes a Western perspective in the interpretation of dream material. There have been no comparable studies of dream interpretation among urban blacks participating in the African Independent Church Movement. The present study focuses on the rural Xhosa speaking people and the urban black population who speak one of the Nguni languages and identify with the African Independent Church Movement. The study is concerned with understanding the meanings of dreams within the cultural context in which they occur. The specific aims of the study are: 1. To explicate the indigenous system of dream interpretation as revealed by acknowledged dream experts. 2. To examine the commonalities and the differences between the interpretation of dreams in two groups, drawn from a rural and urban setting respectively. 3. To elaborate upon the life-world of the participants by the interpretations gained from the above investigation. One hundred dreams and interpretations are collected from two categories of participants referred to as the Rural Group and the Urban Group. The Rural Group is made up of amagqira [traditional healers] and their clients, while the Urban Group consists of prophets and members of the African Independent Churches. Each group includes acknowledged dream experts. A phenomenological methodology is adopted in explicating the data. The methodological procedure involves a number of rigorous stages of explication whereby the original data is reduced to Constituent Profiles leading to the construction of a Thematic Index File.
By searching and reflecting upon the data, interpretative themes are identified. These themes are explicated to provide a rigorous description of the interpretative-reality of each group. Themes explicated within the Rural Group are: the physiognomy of the dreamer's life-world as revealed by ithongo, the interpretation of ithongo as revealed through action, the dream relationship as an anticipatory mode-of-existence, iphupha as disclosing a vulnerable mode-of-being, human bodiliness as revealed in dream interpretations and the legitimation of the interpretative-reality within the life-world. Themes explicated within the Urban Group are: the physiognomy of the dreamer's life-world revealed in their dream-existence, the interpretative-reality revealed through the enaction of dreams, tension between the newer Christian-based cosmology and the traditional cultural-based cosmology, a moral imperative, prophetic perception, human bodiliness as revealed in dream interpretations, and the legitimation of the interpretative-reality within the life-world. The essence of the interpretative-reality of both groups is very similar and is expressed in the notion of relatedness to a cosmic mode-of-being. The cosmic mode-of-being includes a numinous dimension which is expressed through divine presence in the form of ancestors, Holy Spirit or God. These notions cannot be apprehended by theoretical constructs alone but may be grasped and given form in meaning-disclosing intuitions which are expressed in the life-world in terms of bodiliness, revelatory knowledge, action and healing. Some differences between the two groups are evident and reveal some conflict between the monotheistic Christian cosmology and the traditional cosmology. Unique aspects of the interpretative-reality of the Urban Group are expressed in terms of difficulties in the urban social environment and the notion of a moral imperative.
It is observed that cultural self-expression based upon traditional ideas continues to play a significant role in the urban environment. The apparent conflict revealed between the respective cosmologies underlies an integration of the traditional meanings with Christian concepts. This finding is consistent with the literature suggesting that the African Independent Church is a syncretic movement. The life-world is based upon the immediate and vivid experience of the numinous as revealed in the dream phenomenon. The participants' approach to dreams is not based upon an explicit theory, but upon an immediate and pathic understanding of the dream phenomenon. The understanding is based upon the interpreter's concrete understanding of the life-world, which includes the possibility of cosmic integration and continuity between the personal and transpersonal realms of being. The approach is characterized as an expression of man's primordial attunement with the cosmos. The approach of the participants to dreams may not be consistent with a Western rational orientation, but nevertheless, it is a valid approach. The validity is based upon the immediate life-world of experience which is intelligible, coherent, and above all, it is meaning-giving in revealing life-possibility within the context of human existence.
Abstract:
Migraine is a painful disorder for which the etiology remains obscure. Diagnosis is largely based on International Headache Society criteria. However, no feature occurs in all patients who meet these criteria, and no single symptom is required for diagnosis. Consequently, this definition may not accurately reflect the phenotypic heterogeneity or genetic basis of the disorder. Such phenotypic uncertainty is typical for complex genetic disorders and has encouraged interest in multivariate statistical methods for classifying disease phenotypes. We applied three popular statistical phenotyping methods—latent class analysis, grade of membership and grade of membership “fuzzy” clustering (Fanny)—to migraine symptom data, and compared heritability and genome-wide linkage results obtained using each approach. Our results demonstrate that different methodologies produce different clustering structures and non-negligible differences in subsequent analyses. We therefore urge caution in the use of any single approach and suggest that multiple phenotyping methods be used.
Abstract:
An educational priority of many nations is to enhance mathematical learning in early childhood. One area in need of special attention is that of statistics. This paper argues for a renewed focus on statistical reasoning in the beginning school years, with opportunities for children to engage in data modelling activities. Such modelling involves investigations of meaningful phenomena, deciding what is worthy of attention (i.e., identifying complex attributes), and then progressing to organising, structuring, visualising, and representing data. Results are reported from the first year of a three-year longitudinal study in which three classes of first-grade children and their teachers engaged in activities that required the creation of data models. The theme of “Looking after our Environment,” a component of the children’s science curriculum at the time, provided the context for the activities. Findings focus on how the children dealt with given complex attributes and how they generated their own attributes in classifying broad data sets, and the nature of the models the children created in organising, structuring, and representing their data.
Abstract:
Understanding the complexities that are involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are also a number of other complicating factors. Genetic variants associated with age of disease onset may be different from those variants associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein coding genetic paradigm. Latent Variable Models are well suited for the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, and results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this oft thought useless vestige remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for the association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age-at-onset and account for the effect of smoking. This thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression.
This method finds evidence for a different subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation to the Bayesian multiple changepoint model on aligned DNA sequences for more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant, and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function such as GC content and type of mutation into the segmentation model is developed within this thesis. Evidence of fine-structured segmental variation is presented.
Abstract:
Seasonal patterns have been found in a remarkable range of health conditions, including birth defects, respiratory infections and cardiovascular disease. Accurately estimating the size and timing of seasonal peaks in disease incidence is an aid to understanding the causes and possibly to developing interventions. With global warming increasing the intensity of seasonal weather patterns around the world, a review of the methods for estimating seasonal effects on health is timely. This is the first book on statistical methods for seasonal data written for a health audience. It describes methods for a range of outcomes (including continuous, count and binomial data) and demonstrates appropriate techniques for summarising and modelling these data. It has a practical focus and uses interesting examples to motivate and illustrate the methods. The statistical procedures and example data sets are available in an R package called ‘season’. Adrian Barnett is a senior research fellow at Queensland University of Technology, Australia. Annette Dobson is a Professor of Biostatistics at The University of Queensland, Australia. Both are experienced medical statisticians with a commitment to statistical education and have previously collaborated in research in the methodological developments and applications of biostatistics, especially to time series data. Among other projects, they worked together on revising the well-known textbook "An Introduction to Generalized Linear Models," third edition, Chapman Hall/CRC, 2008. In their new book they share their knowledge of statistical methods for examining seasonal patterns in health.
Abstract:
Emerging data streaming applications in Wireless Sensor Networks require reliable and energy-efficient Transport Protocols. Our recent Wireless Sensor Network deployment in the Burdekin delta, Australia, for water monitoring [T. Le Dinh, W. Hu, P. Sikka, P. Corke, L. Overs, S. Brosnan, Design and deployment of a remote robust sensor network: experiences from an outdoor water quality monitoring network, in: Second IEEE Workshop on Practical Issues in Building Sensor Network Applications (SenseApp 2007), Dublin, Ireland, 2007] is one such example. This application involves periodically streaming sensed data such as pressure, water flow rate, and salinity from many scattered sensors to the sink node, which in turn relays them via an IP network to a remote site for archiving, processing, and presentation. While latency is not a primary concern in this class of application (the sampling rate is usually in terms of minutes or hours), energy-efficiency is. Continuous long-term operation and reliable delivery of the sensed data to the sink are also desirable. This paper proposes ERTP, an Energy-efficient and Reliable Transport Protocol for Wireless Sensor Networks. ERTP is designed for data streaming applications, in which sensor readings are transmitted from one or more sensor sources to a base station (or sink). ERTP uses a statistical reliability metric which ensures the number of data packets delivered to the sink exceeds the defined threshold. Our extensive discrete event simulations and experimental evaluations show that ERTP is significantly more energy-efficient than current approaches and can reduce energy consumption by more than 45%. Consequently, sensor nodes are more energy-efficient and the lifespan of the unattended WSN is increased.