991 results for Data-cleaning


Relevance: 20.00%

Abstract:

This report is an update of an earlier one produced in September 2009 (see Carrington et al. 2009), which remains available as an ePrint through the project’s home page. The report focuses on our examination of extant data on self-harm and suicide among males living in regional and remote Australia that were available in public databases at the time of production. Specific areas of concern regarding elevated rates of suicide among rural males, and data anomalies that emerged during our examination of these data, are also discussed.

Relevance: 20.00%

Abstract:

This report is an update of an earlier one produced in January 2010 (see Carrington et al. 2010), which remains available as an ePrint through the project’s home page. This report focuses on our examination of extant data on intentional violence perpetrated or experienced by males living in regional and remote Australia that were available in public databases at the time of production. Intentional violent acts can be physical, sexual or psychological, or can involve deprivation or neglect.

Relevance: 20.00%

Abstract:

This report is an update of an earlier one produced in January 2010 (see Carrington et al. 2010), which remains available as an ePrint through the project’s home page. This report focuses on our examination of extant data on unintentional serious and violent harm, including injuries, to males living in regional and remote Australia that were available in public databases at the time of production. Such harm may be caused by, for example, transport accidents, occupational exposures and hazards, or burns. Unintentional violent harm can thus cause physical trauma whose consequences can lead to chronic conditions, including psychological harm or substance abuse.

Relevance: 20.00%

Abstract:

This report is an update of an earlier one produced in January 2010 (see Carrington et al. 2010), which remains available as an ePrint through the project’s home page. The report focuses on our examination of extant data on personally and socially risky behaviour associated with males living in regional and remote Australia that were available in public databases at the time of production.

Relevance: 20.00%

Abstract:

This report is an update of an earlier one produced in January 2010 (see Carrington et al. 2010), which remains available as an ePrint through the project’s home page. The report considers extant data on some of the consequences of violent acts, incidents, harms and risky behaviour involving males living in regional and remote Australia that were available in public databases at the time of production.

Relevance: 20.00%

Abstract:

Various time-memory tradeoff attacks on stream ciphers have been proposed over the years. However, the claimed success of these attacks assumes that the initialisation process of the stream cipher is one-to-one. Some stream cipher proposals do not have a one-to-one initialisation process. In this paper, we examine the impact of this on the success of time-memory-data tradeoff attacks. Under these circumstances, some attacks are more successful than previously claimed, while others are less so. The conditions for both cases are established.
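To make "not one-to-one" concrete, the following minimal sketch models an initialisation process as a uniformly random function from N key/IV inputs to N internal states. This is an idealising assumption for illustration, not a property of any particular cipher. It measures how many distinct states are actually reached. For a random function the expected reachable fraction is 1 - (1 - 1/N)^N, which approaches 1 - 1/e ≈ 0.632.

```python
import random

def image_fraction(n_states: int, seed: int = 1) -> float:
    """Fraction of states reached when each of n_states inputs (key/IV pairs)
    is mapped to a uniformly random state (assumed random-function model)."""
    rng = random.Random(seed)
    reached = {rng.randrange(n_states) for _ in range(n_states)}
    return len(reached) / n_states

def expected_fraction(n_states: int) -> float:
    """Expected image fraction of a random function: 1 - (1 - 1/N)^N."""
    return 1 - (1 - 1 / n_states) ** n_states

if __name__ == "__main__":
    n = 2 ** 16
    print(f"simulated: {image_fraction(n):.4f}")
    print(f"expected:  {expected_fraction(n):.4f}")  # close to 1 - 1/e
```

A smaller set of reachable states changes the effective search space that a tradeoff attack works over. Whether that helps or hinders a given attack is what the conditions established in the paper determine.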

Relevance: 20.00%

Abstract:

With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information in these documents. Data mining techniques are used to derive this information. Because XML documents are semi-structured, mining them is strongly affected by the model used to represent them. Hence, in this chapter we present an overview of the various models of XML documents, how these models have been used for mining, and some of the issues and challenges they raise. In addition, this chapter provides some insights into future models of XML documents that can effectively capture the two features of XML documents most important for mining, namely structure and content.
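As a small illustration of how a document model captures both structure and content, the sketch below represents an XML document as a list of (root-to-leaf tag path, leaf text) pairs. The path-based model is only one of the possible models and is chosen here for brevity; the function name `leaf_paths` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def leaf_paths(xml_text: str) -> list[tuple[str, str]]:
    """Represent an XML document as (root-to-leaf tag path, leaf text) pairs.

    The tag paths capture structure; the leaf text captures content.
    """
    root = ET.fromstring(xml_text)
    pairs: list[tuple[str, str]] = []

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        children = list(elem)
        if not children:  # leaf element: record its path and text
            pairs.append((path, (elem.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(root, "")
    return pairs

doc = "<book><title>XML Mining</title><author><name>Lee</name></author></book>"
for path, text in leaf_paths(doc):
    print(path, "->", text)
# /book/title -> XML Mining
# /book/author/name -> Lee
```

Dropping the text yields a purely structural model, while dropping the paths yields a bag of content, which is one way to see why the choice of model shapes what mining can find.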

Relevance: 20.00%

Abstract:

This special issue of the Journal of Urban Technology brings together five articles based on presentations given at the Street Computing workshop, held on 24 November 2009 in Melbourne in conjunction with the Australian Computer-Human Interaction conference (OZCHI 2009). Our own article introduces the Street Computing vision and explores the potential, challenges and foundations of this research vision. To do so, we first look at the currently available sources of information and discuss their links to existing research efforts. Section 2 then introduces the notion of Street Computing and our research approach in more detail. Section 3 looks beyond the core concept itself and summarises related work in this field.

Relevance: 20.00%

Abstract:

In this paper we present a sequential Monte Carlo algorithm for Bayesian sequential experimental design applied to generalised non-linear models for discrete data. The approach is computationally convenient in that the information from newly observed data can be incorporated through a simple re-weighting step. We also consider a flexible parametric model for the stimulus-response relationship, together with a newly developed hybrid design utility that can produce more robust estimates of the target stimulus in the presence of substantial model and parameter uncertainty. The algorithm is applied to hypothetical clinical trial or bioassay scenarios. In the discussion, potential generalisations of the algorithm are suggested that may extend its applicability to a wide variety of scenarios.
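The re-weighting step can be sketched as follows. This is a minimal illustration that assumes a two-parameter logistic stimulus-response model with a binary outcome and arbitrary priors; the paper's own model is a more flexible parametric form. A full SMC design algorithm would also resample and move particles when the effective sample size drops, and would choose the next stimulus by maximising a design utility, none of which is shown here.

```python
import math
import random

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def reweight(particles, weights, x, y):
    """Incorporate one binary observation y (0 or 1) at stimulus x.

    particles: list of (intercept, slope) draws; weights: normalised weights.
    Each weight is multiplied by the Bernoulli likelihood and renormalised.
    """
    new = []
    for (a, b), w in zip(particles, weights):
        p = logistic(a + b * x)  # assumed logistic response probability
        new.append(w * (p if y == 1 else 1.0 - p))
    total = sum(new)
    return [w / total for w in new]

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2); a low value signals that resampling is needed."""
    return 1.0 / sum(w * w for w in weights)

rng = random.Random(0)
# Illustrative priors only: intercept ~ N(0, 1), slope ~ |N(1, 0.5)|.
particles = [(rng.gauss(0, 1), abs(rng.gauss(1, 0.5))) for _ in range(1000)]
weights = [1.0 / len(particles)] * len(particles)
for x, y in [(0.5, 1), (-1.0, 0), (1.5, 1)]:  # hypothetical observations
    weights = reweight(particles, weights, x, y)
print(f"ESS after 3 observations: {effective_sample_size(weights):.1f}")
```

Because each new observation only multiplies existing weights, the cost per update is linear in the number of particles, which is the computational convenience the abstract refers to.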

Relevance: 20.00%

Abstract:

Several authors stress that data are a crucial foundation for operational, tactical and strategic decisions (e.g., Redman 1998, Tee et al. 2007). Data provides the basis for decision making, as data collection and processing are typically associated with reducing uncertainty in order to make more effective decisions (Daft and Lengel 1986). While the first investments in Information Systems/Information Technology (IS/IT) in organizations improved data collection, restricted computational capacity and limited processing power created challenges (Simon 1960). Fifty years on, capacity and processing constraints are increasingly less relevant; in fact, the opposite problem exists. Increased data capture and storage capacity, together with continual improvements in information processing capability, make it harder to determine which data are relevant and useful. As the IT landscape changes, businesses are inundated with ever-increasing volumes of data from both internal and external sources, available on both an ad-hoc and a real-time basis. More data, however, does not necessarily translate into more effective and efficient organizations, nor does it increase the likelihood of better or timelier decisions. This raises questions about what data managers require to support their decision-making processes.

Relevance: 20.00%

Abstract:

Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In medical studies, mixtures have the potential to greatly enhance our understanding of patient responses by identifying clinically meaningful clusters that, given the complexity of many data sources, might otherwise be intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means of capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis investigates the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants of the mixture model, namely finite mixtures, Dirichlet Process mixtures and hidden Markov models. Beyond developing these models and applying them to different sources of data, this thesis also focuses on modelling different aspects of uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and uncertainty in the true number of clusters present. Finally, this thesis aims to address, and propose solutions to, the task of comparing clustering solutions, whether this means comparing patients or observations assigned to different subgroups or comparing clustering solutions across multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered.
The first source of data, symptoms associated with PD recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS), is the subject of the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centres on the problems of unsupervised detection and sorting of action potentials, or "spikes", in recordings of multiple cell activity, which provides valuable information on real-time neural activity in the brain.
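Uncertainty in a patient's true cluster membership can be made concrete with a finite mixture. The sketch below assumes a univariate Gaussian mixture with fixed, hypothetical parameters and computes each observation's posterior membership probabilities, P(z = k | x) ∝ π_k N(x | μ_k, σ_k). In a Bayesian analysis such as the one described, these probabilities would instead be averaged over posterior draws of the parameters rather than computed at a single fixed value.

```python
import math

def normal_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def membership_probs(x, weights, means, sds):
    """P(z = k | x) for a finite univariate Gaussian mixture with fixed parameters."""
    logs = [math.log(w) + normal_logpdf(x, m, s)
            for w, m, s in zip(weights, means, sds)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-cluster mixture (e.g. two symptom-score subgroups).
weights, means, sds = [0.6, 0.4], [10.0, 25.0], [4.0, 4.0]
for score in (8.0, 17.5, 30.0):
    probs = membership_probs(score, weights, means, sds)
    print(score, [round(p, 3) for p in probs])
```

A score far from the boundary is assigned almost entirely to one cluster, whereas a score midway between the means keeps a split membership (here equal to the mixing weights), which is the kind of uncertainty a Bayesian clustering can carry forward instead of discarding through a hard assignment.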

Relevance: 20.00%

Abstract:

In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. The large number of approaches is due to the different applications that require XML data to be clustered. These applications need data grouped according to similar contents, tags, paths, structures and semantics. In this paper, we first outline the application contexts in which clustering is useful; we then survey the approaches proposed so far according to the abstract representation of data (instances or schemas), the similarity measure, and the clustering algorithm. This survey leads to a taxonomy in which current approaches can be classified and compared. We aim to introduce an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. Finally, the paper describes future trends and research issues that still need to be addressed.
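The three survey dimensions (data representation, similarity measure, clustering algorithm) can be seen together in a toy pipeline. The sketch below is a hypothetical example, not an approach taken from the survey: it represents each document by its set of tag paths, uses Jaccard similarity, and groups documents by single linkage at an assumed threshold using union-find.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

def tag_paths(xml_text: str) -> set[str]:
    """Representation: all root-to-node tag paths (structure only, no content)."""
    paths: set[str] = set()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths

def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity measure: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 1.0

def cluster(docs: list[str], threshold: float) -> list[int]:
    """Algorithm: link documents with Jaccard >= threshold (single linkage);
    connected components, tracked with union-find, become clusters."""
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    paths = [tag_paths(d) for d in docs]
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(paths[i], paths[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(docs))]

docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<movie><title/><director/></movie>",
]
print(cluster(docs, threshold=0.5))  # the two book documents share a label
```

Swapping any one component (for example, tree edit distance instead of path sets, or hierarchical clustering instead of a fixed threshold) gives a different point in the kind of taxonomy the paper draws.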

Relevance: 20.00%

Abstract:

This paper argues for a renewed focus on statistical reasoning in the beginning school years, with opportunities for children to engage in data modelling. Results are reported from the first year of a 3-year longitudinal study in which three classes of first-grade children (6-year-olds) and their teachers engaged in data modelling activities. The theme of Looking after our Environment, part of the children’s science curriculum, provided the task context. The goals for the two activities addressed here included engaging children in core components of data modelling, namely selecting attributes, structuring and representing data, identifying variation in data, and making predictions from given data. Results include the various ways in which children represented and re-represented collected data, including attribute selection, and the metarepresentational competence they displayed in doing so. The “data lenses” through which the children dealt with informal inference (variation and prediction) are also reported.

Relevance: 20.00%

Abstract:

In response to the need to leverage private finance and the lack of competition in some parts of the Australian public sector infrastructure market, especially in the very large economic infrastructure sector procured using Public Private Partnerships, the Australian Federal government has demonstrated its desire to attract new sources of inbound foreign direct investment (FDI). This paper reports on progress in an investigation into the determinants of multinational contractors’ willingness to bid for Australian public sector major infrastructure projects. This research applies Dunning’s eclectic theory for the first time to inbound FDI by multinational contractors into Australia. Elsewhere, the authors have developed Dunning’s principal hypothesis to suit the context of this research and to address a weakness in that hypothesis: it takes a nominal approach to the factors in Dunning's eclectic framework and therefore fails to speak to the relative explanatory power of these factors. In this paper, a first-stage test of the authors' development of Dunning's hypothesis is presented by way of an initial review of secondary data on the selected sector (roads and bridges) in Australia (as the host location) with respect to four selected home countries (China, Japan, Spain and the US). In doing so, the next stage of the research method, concerning sampling and case studies, is also developed further and described. In conclusion, the extent to which the initial review of secondary data suggests the relative importance of the factors in the eclectic framework is considered. More robust conclusions are expected from the future planned stages of the research, including primary data from the case studies and a global survey of the world’s largest contractors, which is briefly previewed.
Finally, beyond the theoretical contributions expected from the overall approach to developing and testing Dunning’s framework, other expected contributions concerning research method and practical implications are outlined.

Relevance: 20.00%

Abstract:

A rule-based approach for classifying previously identified medical concepts in clinical free text into an assertion category is presented. There are six assertion categories for the task: Present, Absent, Possible, Conditional, Hypothetical and Not associated with the patient. The assertion classification algorithms were largely based on extending the popular NegEx and Context algorithms. In addition, SNOMED CT, a clinical health terminology, and other publicly available dictionaries were used to classify assertions that did not fit the NegEx/Context model. The data for this task include discharge summaries from Partners HealthCare and from Beth Israel Deaconess Medical Centre, as well as discharge summaries and progress notes from University of Pittsburgh Medical Centre. The set consists of 349 discharge reports for system development, each with paired ground-truth concept and assertion files, and 477 reports for evaluation. The system’s performance on the evaluation data set was 0.83 for each of recall, precision and F1-measure. Although the rule-based system shows promise, further improvements could be made by incorporating machine learning approaches.
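The core idea behind NegEx-style assertion rules, looking for trigger phrases within a short window before a concept, can be sketched as follows. This is a deliberately simplified illustration, not the system described above: the trigger lists, the five-token window and the first-match-wins ordering are all assumptions, and the Conditional and Not-associated-with-the-patient categories, post-concept triggers and scope terminators are not handled.

```python
import re

# Small illustrative trigger lists (hypothetical subsets; the real NegEx and
# Context lexicons are much larger). Categories are checked in this order.
TRIGGERS = {
    "Absent": ["no", "denies", "without", "negative for"],
    "Possible": ["possible", "probable", "may have", "suspicious for"],
    "Hypothetical": ["if", "should he develop", "return for"],
}
WINDOW = 5  # assumed: number of tokens before the concept a trigger can reach

def classify_assertion(sentence: str, concept: str) -> str:
    """Assign an assertion category to `concept` using pre-concept triggers."""
    text = sentence.lower()
    start = text.find(concept.lower())
    if start < 0:
        raise ValueError("concept not found in sentence")
    preceding = re.findall(r"[a-z]+", text[:start])[-WINDOW:]
    window_text = " " + " ".join(preceding) + " "  # pad to match whole words
    for category, phrases in TRIGGERS.items():
        if any(f" {p} " in window_text for p in phrases):
            return category
    return "Present"  # default when no trigger applies

print(classify_assertion("Patient denies chest pain.", "chest pain"))           # Absent
print(classify_assertion("Possible pneumonia in the left lobe.", "pneumonia"))  # Possible
print(classify_assertion("Patient has a cough.", "cough"))                      # Present
```

Rules of this kind are transparent and easy to extend with dictionary lookups, but each missed phrasing needs a new trigger, which is one reason the abstract points to machine learning for further gains.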