869 results for surrogate data
Abstract:
Rapid recursive estimation of hidden Markov model (HMM) parameters is important in applications that emphasise the early availability of reasonable estimates (e.g. for change detection) rather than longer-term asymptotic properties (such as convergence, convergence rate, and consistency). In the context of vision-based aircraft (image-plane) heading estimation, this paper suggests and evaluates the short-data estimation properties of three recursive HMM parameter estimation techniques (a recursive maximum likelihood estimator, an online EM HMM estimator, and a relative entropy based estimator). On both simulated and real data, our studies illustrate the feasibility of rapid recursive heading estimation, but also demonstrate the need for careful step-size design of HMM recursive estimation techniques when these techniques are intended for use in applications where short-data behaviour is paramount.
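The step-size sensitivity mentioned above can be illustrated on a much simpler problem than HMM estimation. The sketch below is not one of the three estimators evaluated in the paper: it assumes a fully observed scalar sequence and a hypothetical helper, `recursive_estimate`, that applies the generic stochastic-approximation update theta_k = theta_{k-1} + gamma_k (x_k - theta_{k-1}). The data are invented for illustration.

```python
from typing import Callable, Iterable, List


def recursive_estimate(
    samples: Iterable[float],
    step_size: Callable[[int], float],
    initial: float = 0.5,
) -> List[float]:
    """Return the running estimates theta_1, theta_2, ... for a stream of samples.

    Illustrative only: a generic stochastic-approximation update, not one of
    the HMM estimators discussed in the abstract.
    """
    theta = initial
    history = []
    for k, x in enumerate(samples, start=1):
        gamma = step_size(k)                 # step size used at iteration k
        theta = theta + gamma * (x - theta)  # move the estimate towards the new sample
        history.append(theta)
    return history


# Hypothetical data: the underlying level switches from 0 to 1 halfway through.
data = [0.0] * 20 + [1.0] * 20

decreasing = recursive_estimate(data, lambda k: 1.0 / k)  # reproduces the running mean
constant = recursive_estimate(data, lambda k: 0.2)        # discounts old samples geometrically
```

With the 1/k schedule the estimate is the running mean, so it reacts slowly after the change; the constant step tracks the change quickly but would keep fluctuating on noisy data. This is the kind of trade-off that makes step-size design matter when only a short record is available.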
Abstract:
The upstream oil & gas industry has been contending with massive data sets and monolithic files for many years, but “Big Data”—that is, the ability to apply more sophisticated types of analytical tools to information in a way that extracts new insights or creates new forms of value—is a relatively new concept that has the potential to significantly re-shape the industry. Despite the impressive amount of value that is being realized by Big Data technologies in other parts of the marketplace, however, much of the data collected within the oil & gas sector tends to be discarded, ignored, or analyzed in a very cursory way. This paper examines existing data management practices in the upstream oil & gas industry, and compares them to practices and philosophies that have emerged in organizations that are leading the Big Data revolution. The comparison shows that, in companies that are leading the Big Data revolution, data is regarded as a valuable asset. The presented evidence also shows, however, that this is usually not true within the oil & gas industry insofar as data is frequently regarded there as descriptive information about a physical asset rather than something that is valuable in and of itself. The paper then discusses how upstream oil & gas companies could potentially extract more value from data, and concludes with a series of specific technical and management-related recommendations to this end.
Abstract:
Heterogeneous health data is a critical issue when managing health information for quality decision-making processes. In this paper we examine the efficient aggregation of lifestyle information through a data warehousing architecture lens. We present a proof of concept for a clinical data warehouse architecture that enables evidence-based decision-making processes by integrating and organising disparate data silos in support of healthcare services improvement paradigms.
Abstract:
Identifying product families has been considered an effective way to accommodate increasing product variety across diverse market niches. In this paper, we propose a novel framework for identifying product families by using a similarity measure for common product design data, the BOM (Bill of Materials), based on data mining techniques such as frequent mining and clustering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that captures not only the content and topology but also the frequent structural dependencies among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and used as a clustering input to group the product families. When applied to real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.
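To make the idea of comparing BOMs through matrix representations concrete, here is a minimal sketch. It uses a plain parent-child adjacency matrix and cosine similarity, not the EAAM described in the abstract (which also encodes frequent structural dependencies). The part names and the helpers `adjacency_matrix` and `cosine_similarity` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]  # (parent part, child part)


def adjacency_matrix(edges: Sequence[Edge], parts: Sequence[str]) -> List[List[int]]:
    """Build a 0/1 parent-to-child adjacency matrix over a shared part vocabulary."""
    index: Dict[str, int] = {part: i for i, part in enumerate(parts)}
    matrix = [[0] * len(parts) for _ in parts]
    for parent, child in edges:
        matrix[index[parent]][index[child]] = 1
    return matrix


def cosine_similarity(a: List[List[int]], b: List[List[int]]) -> float:
    """Cosine similarity between two equally sized matrices, flattened to vectors."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm else 0.0


# Hypothetical BOMs for two product variants that share most of their structure.
bom_a = [("bike", "frame"), ("bike", "wheel"), ("wheel", "tyre")]
bom_b = [("bike", "frame"), ("bike", "wheel"), ("wheel", "rim")]
parts = sorted({p for edge in bom_a + bom_b for p in edge})

similarity = cosine_similarity(
    adjacency_matrix(bom_a, parts), adjacency_matrix(bom_b, parts)
)
```

Pairwise similarities computed this way could be collected into a matrix and passed to a clustering routine to group products into candidate families.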
Abstract:
Interpolation techniques for spatial data have been applied frequently in various fields of geosciences. Although most conventional interpolation methods assume that it is sufficient to use first- and second-order statistics to characterize random fields, researchers have now realized that these methods cannot always provide reliable interpolation results, since geological and environmental phenomena tend to be very complex, exhibiting non-Gaussian distributions and/or non-linear inter-variable relationships. This paper proposes a new approach to the interpolation of spatial data, which can be applied with great flexibility. Suitable cross-variable higher-order spatial statistics are developed to measure the spatial relationship between the random variable at an unsampled location and those in its neighbourhood. Given the computed cross-variable higher-order spatial statistics, the conditional probability density function (CPDF) is approximated via polynomial expansions and is then used to determine the interpolated value at the unsampled location as an expectation. In addition, the uncertainty associated with the interpolation is quantified by constructing prediction intervals of interpolated values. The proposed method is applied to a mineral deposit dataset, and the results demonstrate that it outperforms kriging methods in uncertainty quantification. The introduction of the cross-variable higher-order spatial statistics noticeably improves the quality of the interpolation since it enriches the information that can be extracted from the observed data, and this benefit is substantial when working with data that are sparse or have non-trivial dependence structures.
Abstract:
The Echology: Making Sense of Data initiative seeks to break new ground in arts practice by asking artists to innovate with respect to a) the possible forms of data representation in public art and b) the artist's role in engaging publics on environmental sustainability in new urban developments. Initiated by ANAT and Carbon Arts in 2011, Echology has seen three artists selected by national competition in 2012 for Lend Lease sites across Australia. In 2013 commissioning of one of these works, the Mussel Choir by Natalie Jeremijenko, began in Melbourne's Victoria Harbour development. This emerging practice of data-driven and environmentally engaged public artwork presents multiple challenges to established systems of public arts production and management, while also offering new avenues for artists to forge new modes of collaboration. The experience of Echology and, in particular, the Mussel Choir is examined here to reveal opportunities for expansion of this practice through identification of the factors that lead to a resilient 'ecology of partnership' between stakeholders that include science and technology researchers, education providers, city administrators, and urban developers.
Abstract:
Discovering the means to prevent and cure schizophrenia is a vision that motivates many scientists. But in order to achieve this goal, we need to understand its neurobiological basis. The emergent metadiscipline of cognitive neuroscience fields an impressive array of tools that can be marshaled towards achieving this goal, including powerful new methods of imaging the brain (both structural and functional) as well as assessments of perceptual and cognitive capacities based on psychophysical procedures, experimental tasks and models developed by cognitive science. We believe that the integration of data from this array of tools offers the greatest potential for advancing understanding of the neural basis of not only normal cognition but also the cognitive impairments that are fundamental to schizophrenia. Since sufficient expertise in the application of these tools and methods rarely resides in a single individual, or even a single laboratory, collaboration is a key element in this endeavor. Here, we review some of the products of our integrative efforts in collaboration with our colleagues on the East Coast of Australia and the Pacific Rim. This research focuses on the neural basis of executive function deficits and impairments in early auditory processing in patients using various combinations of performance indices (from perceptual and cognitive paradigms), ERPs, fMRI and sMRI. In each case, integration of two or more sources of information provides more information than any one source alone by revealing new insights into structure-function relationships. Furthermore, the addition of other imaging methodologies (such as DTI) and approaches (such as computational models of cognition) offers new horizons in human brain imaging research and in understanding human behavior.
Abstract:
Although there are many potential new insights to be gained through advancing research on the clients of male sex workers, significant social, ethical and methodological challenges to accessing this population exist. This research project case explores our attempts to recruit a population that does not typically form a cohesive or coherent 'community' and often avoids self-identifying to mitigate the stigma attached to buying sex. We used an arms-length recruitment campaign that focussed on directing potential participants to our study website, which could in turn lead them to participate in an anonymous telephone interview. Barriers to reaching male sex-work clients, however, demanded the evolution of our recruitment strategy. New technologies are part of the solution to accessing a hard-to-reach population, but they only work if researchers engage responsively. We also show how we conducted an in-depth interview with a client and discuss the value of using secondary data.
Abstract:
A tag-based item recommendation method generates an ordered list of items likely to interest a particular user, using the user's past tagging behaviour. However, users' tagging behaviour varies across tagging systems. A potential problem in generating quality recommendations is how to build user profiles that interpret user behaviour so that it can be used effectively in recommendation models. Generally, recommendation methods are made to work with specific types of user profiles, and may not work well with different datasets. In this paper, we investigate several tagging data interpretation and representation schemes that can lead to building an effective user profile. We discuss the various benefits a scheme brings to a recommendation method by highlighting the representative features of user tagging behaviours on a specific dataset. Empirical analysis shows that each interpretation scheme forms a distinct data representation which eventually affects the recommendation result. Results on various datasets show that an interpretation scheme should be selected based on the dominant usage in the tagging data (i.e. either a higher number of tags or a higher number of items present). The usage represents the characteristic of user tagging behaviour in the system. The results also demonstrate how the scheme is able to address the cold-start user problem.
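As a rough illustration of how a user profile built from tagging data can drive recommendation, the sketch below builds a tag-frequency profile and ranks unseen items by the summed profile weight of their tags. This is one simple interpretation scheme chosen for illustration, not necessarily one of the schemes evaluated in the paper. The helper names (`build_profile`, `recommend`) and all data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Tagging = Tuple[str, str]  # (item, tag) assigned by one user


def build_profile(taggings: List[Tagging]) -> Counter:
    """Tag-frequency profile: how often the user applied each tag."""
    return Counter(tag for _, tag in taggings)


def recommend(
    profile: Counter,
    item_tags: Dict[str, Set[str]],
    seen: Set[str],
    top_n: int = 3,
) -> List[str]:
    """Rank unseen items by the summed profile weight of their tags."""
    scores = {
        item: sum(profile[tag] for tag in tags)  # unknown tags contribute 0
        for item, tags in item_tags.items()
        if item not in seen
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0][:top_n]


# Hypothetical tagging history and item catalogue.
user_taggings = [
    ("paper1", "hmm"),
    ("paper2", "hmm"),
    ("paper2", "estimation"),
    ("paper3", "clustering"),
]
item_tags = {
    "paper1": {"hmm"},
    "paper4": {"hmm", "estimation"},
    "paper5": {"clustering"},
    "paper6": {"ecology"},
}

seen_items = {item for item, _ in user_taggings}
recommendations = recommend(build_profile(user_taggings), item_tags, seen_items)
```

A user with no tagging history yields an empty profile and therefore an empty list here, which is the cold-start situation the abstract refers to; a practical scheme would need a fallback for that case.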
Abstract:
Bactrocera papayae Drew & Hancock, Bactrocera philippinensis Drew & Hancock, Bactrocera carambolae Drew & Hancock, and Bactrocera invadens Drew, Tsuruta & White are four horticultural pest tephritid fruit fly species that are highly similar, morphologically and genetically, to the destructive pest, the Oriental fruit fly, Bactrocera dorsalis (Hendel) (Diptera: Tephritidae). This similarity has rendered the discovery of reliable diagnostic characters problematic, which, in view of the economic importance of these taxa and the international trade implications, has resulted in ongoing difficulties for many areas of plant protection and food security. Consequently, a major international collaborative and integrated multidisciplinary research effort was initiated in 2009 to build upon existing literature with the specific aim of resolving biological species limits among B. papayae, B. philippinensis, B. carambolae, B. invadens and B. dorsalis to overcome constraints to pest management and international trade. Bactrocera philippinensis has recently been synonymized with B. papayae as a result of this initiative and this review corroborates that finding; however, the other names remain in use. While consistent characters have been found to reliably distinguish B. carambolae from B. dorsalis, B. invadens and B. papayae, no such characters have been found to differentiate the latter three putative species. We conclude that B. carambolae is a valid species and that the remaining taxa, B. dorsalis, B. invadens and B. papayae, represent the same species. Thus, we consider B. dorsalis (Hendel) as the senior synonym of B. papayae Drew and Hancock syn.n. and B. invadens Drew, Tsuruta & White syn.n. A redescription of B. dorsalis is provided. Given the agricultural importance of B. dorsalis, this taxonomic decision will have significant global plant biosecurity implications, affecting pest management, quarantine, international trade, postharvest treatment and basic research. 
Throughout the paper, we emphasize the value of independent and multidisciplinary tools in delimiting species, particularly in complicated cases involving morphologically cryptic taxa.
Abstract:
This thesis presents a novel program parallelization technique incorporating both dynamic and static scheduling. It utilizes a problem-specific pattern developed from prior knowledge of the targeted problem abstraction. Suitable for solving complex parallelization problems such as data-intensive all-to-all comparison constrained by memory, the technique delivers more robust and faster task scheduling than state-of-the-art techniques. The technique achieves good performance in data-intensive bioinformatics applications.
Abstract:
This pilot project investigated the existing practices and processes of Proficient, Highly Accomplished and Lead teachers in the interpretation, analysis and implementation of National Assessment Program – Literacy and Numeracy (NAPLAN) data. A qualitative case study approach was the chosen methodology, with nine teachers across a variety of school sectors interviewed. Themes and sub-themes were identified from the participants’ interview responses, revealing the ways in which Queensland teachers work with NAPLAN data. The data showed that individual schools and teachers generally adopted their own ways of working with data, with approaches ranging from individual/ad hoc to hierarchical or whole-school. Findings also revealed that data are the responsibility of various persons within the school hierarchy, some working with the data electronically whilst others rely on manual manipulation. Data are manipulated for various purposes, including tracking performance, value adding and targeting programmes for specific groups of students, for example the gifted and talented. Whilst all participants had knowledge of intervention programmes and how practice could be modified, there were large inconsistencies in knowledge and skills across schools. Some see the use of data as a mechanism for accountability, whilst others mention data with regard to changing the school culture and identifying best practice. Overall, the findings showed inconsistencies in approach to focus area 5.4. Recommendations therefore include a more national approach to the use of educational data.
Abstract:
In this chapter, we draw out the relevant themes from a range of critical scholarship from the small body of digital media and software studies work that has focused on the politics of Twitter data and the sociotechnical means by which access is regulated. We highlight in particular the contested relationships between social media research (in both academic and non-academic contexts) and the data wholesale, retail, and analytics industries that feed on them. In the second major section of the chapter we discuss in detail the pragmatic edge of these politics in terms of what kinds of scientific research are and are not possible in the current political economy of Twitter data access. Finally, at the end of the chapter we return to the much broader implications of these issues for the politics of knowledge, demonstrating how the apparently microscopic level of how the Twitter API mediates access to Twitter data actually inscribes and influences the macro level of the global political economy of science itself, through re-inscribing institutional and traditional disciplinary privilege. We conclude with some speculations about future developments in data rights and data philanthropy that may at least mitigate some of these negative impacts.
Abstract:
Monitoring the environment with acoustic sensors is an effective method for understanding changes in ecosystems. Through extensive monitoring, large-scale, ecologically relevant datasets can be produced that can inform environmental policy. The collection of acoustic sensor data is a solved problem; the current challenge is the management and analysis of raw audio data to produce useful datasets for ecologists. This paper presents the applied research we use to analyze big acoustic datasets. Its core contribution is the presentation of practical large-scale acoustic data analysis methodologies. We describe details of the data workflows we use to provide both citizen scientists and researchers with practical access to large volumes of ecoacoustic data. Finally, we propose a work-in-progress large-scale architecture for analysis driven by a hybrid cloud-and-local production-grade website.